Closed Bug 1306856 Opened 9 years ago Closed 9 years ago

Intermittent TEST-UNEXPECTED-TIMEOUT | dom/media/tests/mochitest/test_peerConnection_verifyVideoAfterRenegotiation.html | application timed out after 330 seconds with no output

Categories

(Core :: WebRTC, defect, P3)

defect

Tracking

()

RESOLVED DUPLICATE of bug 1304519

People

(Reporter: intermittent-bug-filer, Unassigned)

References

Details

(Keywords: intermittent-failure)

Same thing, do you know what's up with that? I see the following: > TEST-FAIL | dom/media/tests/mochitest/test_peerConnection_verifyVideoAfterRenegotiation.html | ICE gathering state has reached completed but I don't know if it's relevant.
Flags: needinfo?(pehrson)
I think that's irrelevant. It's interesting that this is a `application timed out after 330 seconds with no output` failure though. That would indicate a deadlock or something similar. This is the interesting bit: 04:57:52 INFO - Thread 17 04:57:52 INFO - 0 ntdll.dll!KiFastSystemCallRet + 0x0 04:57:52 INFO - eip = 0x77cd70b4 esp = 0x07c5f5b0 ebp = 0x07c5f614 ebx = 0x00000000 04:57:52 INFO - esi = 0x01000044 edi = 0x00000000 eax = 0x00b09f18 ecx = 0x07c5f610 04:57:52 INFO - edx = 0x00000000 efl = 0x00000213 04:57:52 INFO - Found by: given as instruction pointer in context 04:57:52 INFO - 1 ntdll.dll!NtWaitForSingleObject + 0xc 04:57:52 INFO - eip = 0x77cd6a24 esp = 0x07c5f5b4 ebp = 0x07c5f614 04:57:52 INFO - Found by: call frame info 04:57:52 INFO - 2 ntdll.dll!RtlpWaitOnCriticalSection + 0xc4 04:57:52 INFO - eip = 0x77cc2264 esp = 0x07c5f5b8 ebp = 0x07c5f614 04:57:52 INFO - Found by: call frame info 04:57:52 INFO - 3 ntdll.dll!EtwEventEnabled + 0xd9 04:57:52 INFO - eip = 0x77cc2148 esp = 0x07c5f61c ebp = 0x07c5f63c ebx = 0x00000020 04:57:52 INFO - Found by: call frame info 04:57:52 INFO - 4 mozglue.dll!moz_xmalloc [mozalloc.cpp:aa4fe3cf49e0 : 83 + 0xaa] 04:57:52 INFO - eip = 0x69f32cb0 esp = 0x07c5f644 ebp = 0x07c5f660 ebx = 0x00000001 04:57:52 INFO - Found by: call frame info 04:57:52 INFO - 5 xul.dll!mozilla::NewRunnableMethod<mozilla::net::nsSocketTransportService *,void ( mozilla::net::nsSocketTransportService::*)(void)>(mozilla::net::nsSocketTransportService *,void ( mozilla::net::nsSocketTransportService::*)(void)) [nsThreadUtils.h:aa4fe3cf49e0 : 793 + 0x6] 04:57:52 INFO - eip = 0x617cf955 esp = 0x07c5f668 ebp = 0x07c5f674 04:57:52 INFO - Found by: call frame info 04:57:52 INFO - 6 xul.dll!mozilla::net::nsSocketTransportService::Run() [nsSocketTransportService2.cpp:aa4fe3cf49e0 : 918 + 0x15] 04:57:52 INFO - eip = 0x6152c283 esp = 0x07c5f67c ebp = 0x07c5f844 04:57:52 INFO - Found by: call frame info I'm forwarding this to Byron.
Flags: needinfo?(pehrson) → needinfo?(docfaraday)
Rank: 35
Priority: -- → P3
An assert in the memory allocator. Maybe OOM?
Flags: needinfo?(docfaraday)
Isn't that just a symptom of the harness force-killing the application after the timeout?
Flags: needinfo?(docfaraday)
(In reply to Andreas Pehrson [:pehrsons] (Telenor) from comment #4) > Isn't that just a symptom of the harness force-killing the application after > the timeout? I'm the wrong person to ask about that. Not sure who would be...
Flags: needinfo?(docfaraday)
On non-Windows platform it's SIGABRT, but on Win it's this: https://dxr.mozilla.org/mozilla-central/source/obj-x86_64-pc-linux-gnu/testing/mochitest/automation.py#373 Which leads us to this here: https://dxr.mozilla.org/mozilla-central/source/build/win32/crashinjectdll/crashinjectdll.cpp#16 This does not look like what I would expect to see from the code above: 04:57:52 INFO - Crash reason: EXCEPTION_BREAKPOINT 04:57:52 INFO - Crash address: 0x69f40a6c 04:57:52 INFO - Assertion: Unknown assertion type 0x00000000 04:57:52 INFO - Process uptime: 140 seconds 04:57:52 INFO - Thread 0 (crashed) I think the most interesting log part is this: 04:52:07 INFO - (ice/NOTICE) ICE(PC:1475322726874000 (id=2147483936 url=http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_verifyVideoAfter): peer (PC:1475322726874000 (id=2147483936 url=http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_verifyVideoAfter:default) is now checking 04:52:07 INFO - (ice/INFO) ICE-PEER(PC:1475322726876000 (id=21474[Parent 1596] WARNING: pipe error: 232: file c:/builds/moz2_slave/autoland-w32-pgo-0000000000000/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 513 04:52:07 INFO - ###!!! [Parent][MessageChannel] Error: (msgtype=0x300093,name=PBrowser::Msg_UpdateNativeWindowHandle) Channel error: cannot send/recv 04:52:07 INFO - ###!!! [Parent][MessageChannel] Error: (msgtype=0x300084,name=PBrowser::Msg_Destroy) Channel error: cannot send/recv Not sure if that is the reason for the STS crash, or if IPC is unhappy because the STS thread dis-appeared.
(In reply to Andreas Pehrson [:pehrsons] (Telenor) from comment #4) > Isn't that just a symptom of the harness force-killing the application after > the timeout? Yes. The problem here is a timeout.
(In reply to Nils Ohlmeier [:drno] from comment #6) > On non-Windows platform it's SIGABRT, but on Win it's this: > https://dxr.mozilla.org/mozilla-central/source/obj-x86_64-pc-linux-gnu/ > testing/mochitest/automation.py#373 > Which leads us to this here: > https://dxr.mozilla.org/mozilla-central/source/build/win32/crashinjectdll/ > crashinjectdll.cpp#16 > > This does not look like what I would expect to see from the code above: > > 04:57:52 INFO - Crash reason: EXCEPTION_BREAKPOINT > 04:57:52 INFO - Crash address: 0x69f40a6c > 04:57:52 INFO - Assertion: Unknown assertion type 0x00000000 > 04:57:52 INFO - Process uptime: 140 seconds > 04:57:52 INFO - Thread 0 (crashed) > > I think the most interesting log part is this: > > 04:52:07 INFO - (ice/NOTICE) ICE(PC:1475322726874000 (id=2147483936 > url=http://mochi.test:8888/tests/dom/media/tests/mochitest/ > test_peerConnection_verifyVideoAfter): peer (PC:1475322726874000 > (id=2147483936 > url=http://mochi.test:8888/tests/dom/media/tests/mochitest/ > test_peerConnection_verifyVideoAfter:default) is now checking > 04:52:07 INFO - (ice/INFO) ICE-PEER(PC:1475322726876000 > (id=21474[Parent 1596] WARNING: pipe error: 232: file > c:/builds/moz2_slave/autoland-w32-pgo-0000000000000/build/src/ipc/chromium/ > src/chrome/common/ipc_channel_win.cc, line 513 > 04:52:07 INFO - ###!!! [Parent][MessageChannel] Error: > (msgtype=0x300093,name=PBrowser::Msg_UpdateNativeWindowHandle) Channel > error: cannot send/recv > 04:52:07 INFO - ###!!! [Parent][MessageChannel] Error: > (msgtype=0x300084,name=PBrowser::Msg_Destroy) Channel error: cannot send/recv > > Not sure if that is the reason for the STS crash, or if IPC is unhappy > because the STS thread dis-appeared. Actually, the order of these log lines doesn't make sense. We have the nICEr logging before any of the mochitest logging. This must mean the mochitest logging is being batched up separately, and it might also mean those e10s messages happened later in the test too. What I think we're seeing here is the result of the IPC pipe between parent and content falling down. That would definitely cause timeouts without any evidence of deadlock.
(In reply to Byron Campen [:bwc] from comment #8) > Actually, the order of these log lines doesn't make sense. We have the nICEr > logging before any of the mochitest logging. This must mean the mochitest > logging is being batched up separately, and it might also mean those e10s > messages happened later in the test too. That's how it works, yeah. I think you can get things in the right order if the test does SimpleTest.requestCompleteLog() - since it doesn't have to think about snipping stuff.
SimpleTest.requestCompleteLog() turns off the buffering of mochitest log messages and as a result the messages get dumped to stdout/stderr (not sure which) in real time. Although (obviously) that still does not guarantee that messages appear in the actual order as things get logged at/from different places. nICEr logs to stderr, which gets to the logging right away. According to this: https://dxr.mozilla.org/mozilla-central/rev/ea104eeb14cc54da9a06c3766da63f73117723a0/ipc/glue/MessageChannel.cpp#588 the IPC also writes its error message to stderr. And then the mochitest messages only get dumped by SimpleTest once the test timeout has occurred. So I agree with Byron this looks like an IPC failure in the middle of the test. Which then explains why it times out. And it fails when trying to send this: https://dxr.mozilla.org/mozilla-central/source/dom/ipc/PBrowser.ipdl#872 which does not really give me a clue of what is going on here. I guess it's time to call in some IPC experts. Jesup do you know whom we could NI for this?
Flags: needinfo?(rjesup)
(In reply to Nils Ohlmeier [:drno] from comment #10) ... > And it fails when trying to send this: > https://dxr.mozilla.org/mozilla-central/source/dom/ipc/PBrowser.ipdl#872 > which does not really give me a clue of what is going on here. > > I guess it's time to call in some IPC experts. Jesup do you know whom we > could NI for this? let's try billm. Bill?
Flags: needinfo?(rjesup) → needinfo?(wmccloskey)
No longer blocks: 1308246
No longer blocks: 1307696
No longer blocks: 1306884
There we go. And it's supposedly fixed.
Blocks: 1218576
Status: NEW → RESOLVED
Closed: 9 years ago
Flags: needinfo?(wmccloskey)
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.