Closed Bug 1073310 Opened 10 years ago Closed 8 years ago

intermittent failure in test_peerConnection_basicH264Video when we don't crash in shutdown

Categories

(Core :: WebRTC, defect, P4)

x86
Windows XP

RESOLVED WORKSFORME

People

(Reporter: mccr8, Unassigned)

With the patch in bug 1035454 applied (it prevents us from crashing in a child process at the time of the xpcom-shutdown message), I'm getting a fairly frequent intermittent Mochitest-3 failure, only on Windows XP.

19:36:50     INFO -  285 INFO TEST-PASS | /tests/dom/media/tests/mochitest/test_peerConnection_basicH264Video.html | PeerConnectionWrapper (pcRemote): legal ICE state transition from new to checking
19:36:50     INFO -  286 INFO PeerConnectionWrapper (pcRemote): 'onaddstream' event fired for {}
19:36:50     INFO -  287 INFO Got media stream: video (remote)
19:36:50     INFO -  288 INFO canplaythrough fired for media element pcRemote_remote_video
19:36:50     INFO -  289 INFO timeupdate fired for media element pcRemote_remote_video
19:36:50     INFO -  290 INFO time passed for media element pcRemote_remote_video
19:36:50     INFO -  291 INFO TEST-UNEXPECTED-FAIL | /tests/dom/media/tests/mochitest/test_peerConnection_basicH264Video.html | Unexpected callback for 'INTERNAL_ERROR' with message = 'Cannot start media channels cause = OK' at ["PCW_setLocalDescription@http://mochi.test:8888/tests/dom/media/tests/mochitest/pc.js:1841:1","PCT_setLocalDescription@http://mochi.test:8888/tests/dom/media/tests/mochitest/pc.js:746:3","commandsPeerConnection<@http://mochi.test:8888/tests/dom/media/tests/mochitest/templates.js:305:1","_executeNext@http://mochi.test:8888/tests/dom/media/tests/mochitest/pc.js:103:9",""] - expected PASS
19:36:50     INFO -  292 INFO TEST-OK | /tests/dom/media/tests/mochitest/test_peerConnection_basicH264Video.html | took 34546ms
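
For triaging logs like the excerpt above, the result lines can be filtered mechanically. The following is a minimal sketch in Python, not part of the Mozilla test harness: the function name unexpected_failures and the regular expression are assumptions made for illustration, based only on the line format visible in the excerpt. The sample lines are taken from the excerpt, with the stack portion of the failure message left out for brevity.

```python
import re

# Assumed line shape, taken from the excerpt: "... INFO TEST-<STATUS> | <test path> | <message>".
# The regex is illustrative and may not cover every line a real harness emits.
RESULT_RE = re.compile(r"\bINFO (TEST-[A-Z-]+) \| (\S+) \| (.*)$")


def unexpected_failures(log_text):
    """Return (test_path, message) for every TEST-UNEXPECTED-* result line."""
    found = []
    for line in log_text.splitlines():
        match = RESULT_RE.search(line)
        if match and match.group(1).startswith("TEST-UNEXPECTED"):
            found.append((match.group(2), match.group(3)))
    return found


TEST_PATH = "/tests/dom/media/tests/mochitest/test_peerConnection_basicH264Video.html"

# Three lines from the excerpt; the failure message is abridged (stack omitted).
SAMPLE_LOG = "\n".join([
    "19:36:50     INFO -  290 INFO time passed for media element pcRemote_remote_video",
    "19:36:50     INFO -  291 INFO TEST-UNEXPECTED-FAIL | " + TEST_PATH
    + " | Unexpected callback for 'INTERNAL_ERROR' with message = "
    + "'Cannot start media channels cause = OK'",
    "19:36:50     INFO -  292 INFO TEST-OK | " + TEST_PATH + " | took 34546ms",
])

if __name__ == "__main__":
    for path, message in unexpected_failures(SAMPLE_LOG):
        print(path)
        print("  " + message)
```

Only the TEST-UNEXPECTED-FAIL line is reported; plain INFO lines and the TEST-OK summary are skipped.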

I'd assume this failure means we're somehow getting a message from the child process that we don't expect. That seems odd to me after xpcom-shutdown, though: I thought we were deep enough in shutdown that the child can no longer send messages, so I don't know how the parent would even observe that something is wrong, assuming this is happening in the parent.
Do you have any idea what that error might mean, jib, even in just vague terms?  Thanks.
Flags: needinfo?(jib)
The error message comes from here:
http://dxr.mozilla.org/mozilla-central/source/media/webrtc/signaling/src/sipcc/core/gsm/fsmdef.c#3172

It looks like the code there believes the plugin claims to have no resources to encode or decode the stream.

But how did you get to the conclusion "xpcom-shutdown has happened"? The error message popped up even before the transport layer (ICE) for the call was established, so relatively early in the test. "xpcom-shutdown" sounds to me like you are assuming this is at the end of the test, or is that a misinterpretation on my side?
Well, the only thing the patch from bug 1035454 does is stop us from crashing during xpcom-shutdown in child processes, so my theory was that whatever is going awry must be happening during xpcom-shutdown or later, but maybe not.
I agree with Nils. Looks like the peerConnection is closed two seconds later in https://tbpl.mozilla.org/php/getParsedLog.php?id=48431702&tree=Try#error0
Flags: needinfo?(jib)
I did a bunch of bisection using Try, and it looks like something is going awry in the component manager shutdown, i.e. the (nsComponentManagerImpl::gComponentManager)->Shutdown() call.

good: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=1962460370e2
bad: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=598fbe339e91
I've worked around this for now by inserting an exit(0) in ShutdownXPCOM on Windows, in bug 1035454, but this will need to be fixed to get shutdown leak logging on Windows in child processes.
Blocks: 1051230
No longer blocks: 1035454
Blocks: 1091917
No longer blocks: 1051230
backlog: --- → webRTC+
Rank: 45
Priority: -- → P4
I'm removing the early exit in bug 1242119. This intermittent failure seems to have gone away at some point since this was filed in 2014. I'll close this for now.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WORKSFORME