Intermittent TEST-UNEXPECTED-TIMEOUT | dom/media/tests/mochitest/test_peerConnection_verifyVideoAfterRenegotiation.html | application timed out after 330 seconds with no output

RESOLVED DUPLICATE of bug 1304519

Status

()

Core
WebRTC
P3
normal
Rank:
35
RESOLVED DUPLICATE of bug 1304519
2 years ago
2 years ago

People

(Reporter: Treeherder Bug Filer, Unassigned)

Tracking

({intermittent-failure})

Firefox Tracking Flags

(Not tracked)

Details

Same thing, do you know what's up with that? I see the following:

> TEST-FAIL | dom/media/tests/mochitest/test_peerConnection_verifyVideoAfterRenegotiation.html | ICE gathering state has reached completed

but I don't know if it's relevant.
Flags: needinfo?(pehrson)
I think that's irrelevant.

It's interesting that this is a `application timed out after 330 seconds with no output` failure though. That would indicate a deadlock or something similar.

This is the interesting bit:

04:57:52     INFO -  Thread 17
04:57:52     INFO -   0  ntdll.dll!KiFastSystemCallRet + 0x0
04:57:52     INFO -      eip = 0x77cd70b4   esp = 0x07c5f5b0   ebp = 0x07c5f614   ebx = 0x00000000
04:57:52     INFO -      esi = 0x01000044   edi = 0x00000000   eax = 0x00b09f18   ecx = 0x07c5f610
04:57:52     INFO -      edx = 0x00000000   efl = 0x00000213
04:57:52     INFO -      Found by: given as instruction pointer in context
04:57:52     INFO -   1  ntdll.dll!NtWaitForSingleObject + 0xc
04:57:52     INFO -      eip = 0x77cd6a24   esp = 0x07c5f5b4   ebp = 0x07c5f614
04:57:52     INFO -      Found by: call frame info
04:57:52     INFO -   2  ntdll.dll!RtlpWaitOnCriticalSection + 0xc4
04:57:52     INFO -      eip = 0x77cc2264   esp = 0x07c5f5b8   ebp = 0x07c5f614
04:57:52     INFO -      Found by: call frame info
04:57:52     INFO -   3  ntdll.dll!EtwEventEnabled + 0xd9
04:57:52     INFO -      eip = 0x77cc2148   esp = 0x07c5f61c   ebp = 0x07c5f63c   ebx = 0x00000020
04:57:52     INFO -      Found by: call frame info
04:57:52     INFO -   4  mozglue.dll!moz_xmalloc [mozalloc.cpp:aa4fe3cf49e0 : 83 + 0xaa]
04:57:52     INFO -      eip = 0x69f32cb0   esp = 0x07c5f644   ebp = 0x07c5f660   ebx = 0x00000001
04:57:52     INFO -      Found by: call frame info
04:57:52     INFO -   5  xul.dll!mozilla::NewRunnableMethod<mozilla::net::nsSocketTransportService *,void ( mozilla::net::nsSocketTransportService::*)(void)>(mozilla::net::nsSocketTransportService *,void ( mozilla::net::nsSocketTransportService::*)(void)) [nsThreadUtils.h:aa4fe3cf49e0 : 793 + 0x6]
04:57:52     INFO -      eip = 0x617cf955   esp = 0x07c5f668   ebp = 0x07c5f674
04:57:52     INFO -      Found by: call frame info
04:57:52     INFO -   6  xul.dll!mozilla::net::nsSocketTransportService::Run() [nsSocketTransportService2.cpp:aa4fe3cf49e0 : 918 + 0x15]
04:57:52     INFO -      eip = 0x6152c283   esp = 0x07c5f67c   ebp = 0x07c5f844
04:57:52     INFO -      Found by: call frame info


I'm forwarding this to Byron.
Flags: needinfo?(pehrson) → needinfo?(docfaraday)

Updated

2 years ago
Rank: 35
Priority: -- → P3

Comment 3

2 years ago
An assert in the memory allocator. Maybe OOM?
Flags: needinfo?(docfaraday)
Isn't that just a symptom of the harness force-killing the application after the timeout?
Flags: needinfo?(docfaraday)

Comment 5

2 years ago
(In reply to Andreas Pehrson [:pehrsons] (Telenor) from comment #4)
> Isn't that just a symptom of the harness force-killing the application after
> the timeout?

I'm the wrong person to ask about that. Not sure who would be...
Flags: needinfo?(docfaraday)
On non-Windows platform it's SIGABRT, but on Win it's this:
https://dxr.mozilla.org/mozilla-central/source/obj-x86_64-pc-linux-gnu/testing/mochitest/automation.py#373
Which leads us to this here:
https://dxr.mozilla.org/mozilla-central/source/build/win32/crashinjectdll/crashinjectdll.cpp#16

This does not look like what I would expect to see from the code above:

04:57:52     INFO -  Crash reason:  EXCEPTION_BREAKPOINT
04:57:52     INFO -  Crash address: 0x69f40a6c
04:57:52     INFO -  Assertion: Unknown assertion type 0x00000000
04:57:52     INFO -  Process uptime: 140 seconds
04:57:52     INFO -  Thread 0 (crashed)

I think the most interesting log part is this:

04:52:07     INFO -  (ice/NOTICE) ICE(PC:1475322726874000 (id=2147483936 url=http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_verifyVideoAfter): peer (PC:1475322726874000 (id=2147483936 url=http://mochi.test:8888/tests/dom/media/tests/mochitest/test_peerConnection_verifyVideoAfter:default) is now checking
04:52:07     INFO -  (ice/INFO) ICE-PEER(PC:1475322726876000 (id=21474[Parent 1596] WARNING: pipe error: 232: file c:/builds/moz2_slave/autoland-w32-pgo-0000000000000/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 513
04:52:07     INFO -  ###!!! [Parent][MessageChannel] Error: (msgtype=0x300093,name=PBrowser::Msg_UpdateNativeWindowHandle) Channel error: cannot send/recv
04:52:07     INFO -  ###!!! [Parent][MessageChannel] Error: (msgtype=0x300084,name=PBrowser::Msg_Destroy) Channel error: cannot send/recv

Not sure if that is the reason for the STS crash, or if IPC is unhappy because the STS thread dis-appeared.
(In reply to Andreas Pehrson [:pehrsons] (Telenor) from comment #4)
> Isn't that just a symptom of the harness force-killing the application after
> the timeout?

Yes.  The problem here is a timeout.

Comment 8

2 years ago
(In reply to Nils Ohlmeier [:drno] from comment #6)
> On non-Windows platform it's SIGABRT, but on Win it's this:
> https://dxr.mozilla.org/mozilla-central/source/obj-x86_64-pc-linux-gnu/
> testing/mochitest/automation.py#373
> Which leads us to this here:
> https://dxr.mozilla.org/mozilla-central/source/build/win32/crashinjectdll/
> crashinjectdll.cpp#16
> 
> This does not look like what I would expect to see from the code above:
> 
> 04:57:52     INFO -  Crash reason:  EXCEPTION_BREAKPOINT
> 04:57:52     INFO -  Crash address: 0x69f40a6c
> 04:57:52     INFO -  Assertion: Unknown assertion type 0x00000000
> 04:57:52     INFO -  Process uptime: 140 seconds
> 04:57:52     INFO -  Thread 0 (crashed)
> 
> I think the most interesting log part is this:
> 
> 04:52:07     INFO -  (ice/NOTICE) ICE(PC:1475322726874000 (id=2147483936
> url=http://mochi.test:8888/tests/dom/media/tests/mochitest/
> test_peerConnection_verifyVideoAfter): peer (PC:1475322726874000
> (id=2147483936
> url=http://mochi.test:8888/tests/dom/media/tests/mochitest/
> test_peerConnection_verifyVideoAfter:default) is now checking
> 04:52:07     INFO -  (ice/INFO) ICE-PEER(PC:1475322726876000
> (id=21474[Parent 1596] WARNING: pipe error: 232: file
> c:/builds/moz2_slave/autoland-w32-pgo-0000000000000/build/src/ipc/chromium/
> src/chrome/common/ipc_channel_win.cc, line 513
> 04:52:07     INFO -  ###!!! [Parent][MessageChannel] Error:
> (msgtype=0x300093,name=PBrowser::Msg_UpdateNativeWindowHandle) Channel
> error: cannot send/recv
> 04:52:07     INFO -  ###!!! [Parent][MessageChannel] Error:
> (msgtype=0x300084,name=PBrowser::Msg_Destroy) Channel error: cannot send/recv
> 
> Not sure if that is the reason for the STS crash, or if IPC is unhappy
> because the STS thread dis-appeared.

Actually, the order of these log lines doesn't make sense. We have the nICEr logging before any of the mochitest logging. This must mean the mochitest logging is being batched up separately, and it might also mean those e10s messages happened later in the test too.

What I think we're seeing here is the result of the IPC pipe between parent and content falling down. That would definitely cause timeouts without any evidence of deadlock.
(In reply to Byron Campen [:bwc] from comment #8)
> Actually, the order of these log lines doesn't make sense. We have the nICEr
> logging before any of the mochitest logging. This must mean the mochitest
> logging is being batched up separately, and it might also mean those e10s
> messages happened later in the test too.

That's how it works, yeah. I think you can get things in the right order if the test does SimpleTest.requestCompleteLog() - since it doesn't have to think about snipping stuff.
SimpleTest.requestCompleteLog() turns off the buffering of mochitest log messages and as a result the messages get dumped to stdout/stderr (not sure which) in real time. Although (obviously) that still does not guarantee that messages appear in the actual order as things get logged at/from different places.

nICEr logs to stderr, which gets to the logging right away. According to this:
https://dxr.mozilla.org/mozilla-central/rev/ea104eeb14cc54da9a06c3766da63f73117723a0/ipc/glue/MessageChannel.cpp#588
the IPC also writes its error message to stderr. And then the mochitest messages only get dumped by SimpleTest once the test timeout has occurred.

So I agree with Byron this looks like an IPC failure in the middle of the test. Which then explains why it times out.

And it fails when trying to send this:
https://dxr.mozilla.org/mozilla-central/source/dom/ipc/PBrowser.ipdl#872
which does not really give me a clue of what is going on here.

I guess it's time to call in some IPC experts. Jesup do you know whom we could NI for this?
Flags: needinfo?(rjesup)
(In reply to Nils Ohlmeier [:drno] from comment #10)
...
> And it fails when trying to send this:
> https://dxr.mozilla.org/mozilla-central/source/dom/ipc/PBrowser.ipdl#872
> which does not really give me a clue of what is going on here.
> 
> I guess it's time to call in some IPC experts. Jesup do you know whom we
> could NI for this?

let's try billm.  Bill?
Flags: needinfo?(rjesup) → needinfo?(wmccloskey)
Blocks: 1306884
Blocks: 1307696
Blocks: 1308246
No longer blocks: 1308246
No longer blocks: 1307696
No longer blocks: 1306884
There we go. And it's supposedly fixed.
Blocks: 1218576
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Flags: needinfo?(wmccloskey)
Resolution: --- → DUPLICATE
Duplicate of bug: 1304519
You need to log in before you can comment on or make changes to this bug.