Closed
Bug 1306856
Opened 9 years ago
Closed 9 years ago
Intermittent TEST-UNEXPECTED-TIMEOUT | dom/media/tests/mochitest/test_peerConnection_verifyVideoAfterRenegotiation.html | application timed out after 330 seconds with no output
Categories
(Core :: WebRTC, defect, P3)
Core
WebRTC
Tracking
()
RESOLVED
DUPLICATE
of bug 1304519
People
(Reporter: intermittent-bug-filer, Unassigned)
References
Details
(Keywords: intermittent-failure)
Comment 1•9 years ago
|
||
Same thing, do you know what's up with that? I see the following:
> TEST-FAIL | dom/media/tests/mochitest/test_peerConnection_verifyVideoAfterRenegotiation.html | ICE gathering state has reached completed
but I don't know if it's relevant.
Flags: needinfo?(pehrson)
Comment 2•9 years ago
|
||
I think that's irrelevant.
It's interesting that this is a `application timed out after 330 seconds with no output` failure though. That would indicate a deadlock or something similar.
This is the interesting bit:
04:57:52 INFO - Thread 17
04:57:52 INFO - 0 ntdll.dll!KiFastSystemCallRet + 0x0
04:57:52 INFO - eip = 0x77cd70b4 esp = 0x07c5f5b0 ebp = 0x07c5f614 ebx = 0x00000000
04:57:52 INFO - esi = 0x01000044 edi = 0x00000000 eax = 0x00b09f18 ecx = 0x07c5f610
04:57:52 INFO - edx = 0x00000000 efl = 0x00000213
04:57:52 INFO - Found by: given as instruction pointer in context
04:57:52 INFO - 1 ntdll.dll!NtWaitForSingleObject + 0xc
04:57:52 INFO - eip = 0x77cd6a24 esp = 0x07c5f5b4 ebp = 0x07c5f614
04:57:52 INFO - Found by: call frame info
04:57:52 INFO - 2 ntdll.dll!RtlpWaitOnCriticalSection + 0xc4
04:57:52 INFO - eip = 0x77cc2264 esp = 0x07c5f5b8 ebp = 0x07c5f614
04:57:52 INFO - Found by: call frame info
04:57:52 INFO - 3 ntdll.dll!EtwEventEnabled + 0xd9
04:57:52 INFO - eip = 0x77cc2148 esp = 0x07c5f61c ebp = 0x07c5f63c ebx = 0x00000020
04:57:52 INFO - Found by: call frame info
04:57:52 INFO - 4 mozglue.dll!moz_xmalloc [mozalloc.cpp:aa4fe3cf49e0 : 83 + 0xaa]
04:57:52 INFO - eip = 0x69f32cb0 esp = 0x07c5f644 ebp = 0x07c5f660 ebx = 0x00000001
04:57:52 INFO - Found by: call frame info
04:57:52 INFO - 5 xul.dll!mozilla::NewRunnableMethod<mozilla::net::nsSocketTransportService *,void ( mozilla::net::nsSocketTransportService::*)(void)>(mozilla::net::nsSocketTransportService *,void ( mozilla::net::nsSocketTransportService::*)(void)) [nsThreadUtils.h:aa4fe3cf49e0 : 793 + 0x6]
04:57:52 INFO - eip = 0x617cf955 esp = 0x07c5f668 ebp = 0x07c5f674
04:57:52 INFO - Found by: call frame info
04:57:52 INFO - 6 xul.dll!mozilla::net::nsSocketTransportService::Run() [nsSocketTransportService2.cpp:aa4fe3cf49e0 : 918 + 0x15]
04:57:52 INFO - eip = 0x6152c283 esp = 0x07c5f67c ebp = 0x07c5f844
04:57:52 INFO - Found by: call frame info
I'm forwarding this to Byron.
Flags: needinfo?(pehrson) → needinfo?(docfaraday)
Updated•9 years ago
|
Rank: 35
Priority: -- → P3
Comment 4•9 years ago
|
||
Isn't that just a symptom of the harness force-killing the application after the timeout?
Flags: needinfo?(docfaraday)
Comment 5•9 years ago
|
||
(In reply to Andreas Pehrson [:pehrsons] (Telenor) from comment #4)
> Isn't that just a symptom of the harness force-killing the application after
> the timeout?
I'm the wrong person to ask about that. Not sure who would be...
Flags: needinfo?(docfaraday)
Comment 6•9 years ago
|
||
On non-Windows platform it's SIGABRT, but on Win it's this:
https://dxr.mozilla.org/mozilla-central/source/obj-x86_64-pc-linux-gnu/testing/mochitest/automation.py#373
Which leads us to this here:
https://dxr.mozilla.org/mozilla-central/source/build/win32/crashinjectdll/crashinjectdll.cpp#16
This does not look like what I would expect to see from the code above:
04:57:52 INFO - Crash reason: EXCEPTION_BREAKPOINT
04:57:52 INFO - Crash address: 0x69f40a6c
04:57:52 INFO - Assertion: Unknown assertion type 0x00000000
04:57:52 INFO - Process uptime: 140 seconds
04:57:52 INFO - Thread 0 (crashed)
I think the most interesting log part is this:
04:52:07 INFO - (ice/NOTICE) ICE(PC:1475322726874000 (id=2147483936 url=http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_verifyVideoAfter): peer (PC:1475322726874000 (id=2147483936 url=http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_verifyVideoAfter:default) is now checking
04:52:07 INFO - (ice/INFO) ICE-PEER(PC:1475322726876000 (id=21474[Parent 1596] WARNING: pipe error: 232: file c:/builds/moz2_slave/autoland-w32-pgo-0000000000000/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 513
04:52:07 INFO - ###!!! [Parent][MessageChannel] Error: (msgtype=0x300093,name=PBrowser::Msg_UpdateNativeWindowHandle) Channel error: cannot send/recv
04:52:07 INFO - ###!!! [Parent][MessageChannel] Error: (msgtype=0x300084,name=PBrowser::Msg_Destroy) Channel error: cannot send/recv
Not sure if that is the reason for the STS crash, or if IPC is unhappy because the STS thread dis-appeared.
Comment 7•9 years ago
|
||
(In reply to Andreas Pehrson [:pehrsons] (Telenor) from comment #4)
> Isn't that just a symptom of the harness force-killing the application after
> the timeout?
Yes. The problem here is a timeout.
Comment 8•9 years ago
|
||
(In reply to Nils Ohlmeier [:drno] from comment #6)
> On non-Windows platform it's SIGABRT, but on Win it's this:
> https://dxr.mozilla.org/mozilla-central/source/obj-x86_64-pc-linux-gnu/
> testing/mochitest/automation.py#373
> Which leads us to this here:
> https://dxr.mozilla.org/mozilla-central/source/build/win32/crashinjectdll/
> crashinjectdll.cpp#16
>
> This does not look like what I would expect to see from the code above:
>
> 04:57:52 INFO - Crash reason: EXCEPTION_BREAKPOINT
> 04:57:52 INFO - Crash address: 0x69f40a6c
> 04:57:52 INFO - Assertion: Unknown assertion type 0x00000000
> 04:57:52 INFO - Process uptime: 140 seconds
> 04:57:52 INFO - Thread 0 (crashed)
>
> I think the most interesting log part is this:
>
> 04:52:07 INFO - (ice/NOTICE) ICE(PC:1475322726874000 (id=2147483936
> url=http://mochi.test:8888/tests/dom/media/tests/mochitest/
> test_peerConnection_verifyVideoAfter): peer (PC:1475322726874000
> (id=2147483936
> url=http://mochi.test:8888/tests/dom/media/tests/mochitest/
> test_peerConnection_verifyVideoAfter:default) is now checking
> 04:52:07 INFO - (ice/INFO) ICE-PEER(PC:1475322726876000
> (id=21474[Parent 1596] WARNING: pipe error: 232: file
> c:/builds/moz2_slave/autoland-w32-pgo-0000000000000/build/src/ipc/chromium/
> src/chrome/common/ipc_channel_win.cc, line 513
> 04:52:07 INFO - ###!!! [Parent][MessageChannel] Error:
> (msgtype=0x300093,name=PBrowser::Msg_UpdateNativeWindowHandle) Channel
> error: cannot send/recv
> 04:52:07 INFO - ###!!! [Parent][MessageChannel] Error:
> (msgtype=0x300084,name=PBrowser::Msg_Destroy) Channel error: cannot send/recv
>
> Not sure if that is the reason for the STS crash, or if IPC is unhappy
> because the STS thread dis-appeared.
Actually, the order of these log lines doesn't make sense. We have the nICEr logging before any of the mochitest logging. This must mean the mochitest logging is being batched up separately, and it might also mean those e10s messages happened later in the test too.
What I think we're seeing here is the result of the IPC pipe between parent and content falling down. That would definitely cause timeouts without any evidence of deadlock.
Comment 9•9 years ago
|
||
(In reply to Byron Campen [:bwc] from comment #8)
> Actually, the order of these log lines doesn't make sense. We have the nICEr
> logging before any of the mochitest logging. This must mean the mochitest
> logging is being batched up separately, and it might also mean those e10s
> messages happened later in the test too.
That's how it works, yeah. I think you can get things in the right order if the test does SimpleTest.requestCompleteLog() - since it doesn't have to think about snipping stuff.
Comment 10•9 years ago
|
||
SimpleTest.requestCompleteLog() turns off the buffering of mochitest log messages and as a result the messages get dumped to stdout/stderr (not sure which) in real time. Although (obviously) that still does not guarantee that messages appear in the actual order as things get logged at/from different places.
nICEr logs to stderr, which gets to the logging right away. According to this:
https://dxr.mozilla.org/mozilla-central/rev/ea104eeb14cc54da9a06c3766da63f73117723a0/ipc/glue/MessageChannel.cpp#588
the IPC also writes its error message to stderr. And then the mochitest messages only get dumped by SimpleTest once the test timeout has occurred.
So I agree with Byron this looks like an IPC failure in the middle of the test. Which then explains why it times out.
And it fails when trying to send this:
https://dxr.mozilla.org/mozilla-central/source/dom/ipc/PBrowser.ipdl#872
which does not really give me a clue of what is going on here.
I guess it's time to call in some IPC experts. Jesup do you know whom we could NI for this?
Flags: needinfo?(rjesup)
Comment 11•9 years ago
|
||
(In reply to Nils Ohlmeier [:drno] from comment #10)
...
> And it fails when trying to send this:
> https://dxr.mozilla.org/mozilla-central/source/dom/ipc/PBrowser.ipdl#872
> which does not really give me a clue of what is going on here.
>
> I guess it's time to call in some IPC experts. Jesup do you know whom we
> could NI for this?
let's try billm. Bill?
Flags: needinfo?(rjesup) → needinfo?(wmccloskey)
Comment 12•9 years ago
|
||
There we go. And it's supposedly fixed.
Blocks: 1218576
Status: NEW → RESOLVED
Closed: 9 years ago
Flags: needinfo?(wmccloskey)
Resolution: --- → DUPLICATE
You need to log in
before you can comment on or make changes to this bug.
Description
•