Closed Bug 821884 Opened 12 years ago Closed 12 years ago

When WebRTC mochitests are run Firefox quit with: "Shutdown | Exited with code 11|-11|-1073741819 during test run"

Categories

(Core :: WebRTC, defect, P3)

x86
All
defect

RESOLVED INCOMPLETE

People

(Reporter: whimboo, Assigned: abr)

Details

(Keywords: intermittent-failure, Whiteboard: [WebRTC][retest on m-c][blocking-webrtc-])

Firefox exits with an error code of -11 on Linux64 debug and OS X 10.8 debug. This only happens when the WebRTC mochitests are run:

https://tbpl.mozilla.org/php/getParsedLog.php?id=17951050&tree=Alder

WARNING: not an nsIRDFRemoteDataSource: 'remote != nullptr', file ../../../../rdf/datasource/src/nsLocalStore.cpp, line 279
WARNING: NS_ENSURE_TRUE(mMainThread) failed: file ../../../xpcom/threads/nsThreadManager.cpp, line 259
WARNING: NS_ENSURE_TRUE(thread) failed: file ../../../../netwerk/base/src/nsSocketTransportService2.cpp, line 115
WARNING: Leaking the RDF Service.: file ../../../rdf/build/nsRDFModule.cpp, line 165
WARNING: NS_ENSURE_TRUE(compMgr) failed: file nsComponentManagerUtils.cpp, line 58
WARNING: NS_ENSURE_TRUE(mTextInputHandler) failed: file ../../../widget/cocoa/nsChildView.mm, line 3864
TEST-UNEXPECTED-FAIL | Shutdown | Exited with code -11 during test run
INFO | automation.py | Application ran for: 0:18:06.522218
INFO | automation.py | Reading PID log: /var/folders/3n/n3hy7ypj7vv84_r7xppycg7h00000w/T/tmpKqddc4pidlog

INFO | runtests.py | Running tests: end.
program finished with exit code 245
elapsedTime=1093.999389
TinderboxPrint: mochitest-plain-3<br/>11455/0/223
Unknown Error: command finished with exit code: 245
========= Finished 'python mochitest/runtests.py ...' warnings (results: 1, elapsed: 18 mins, 14 secs) (at 2012-12-14 12:11:59.377724) =========

Not sure if that means we crash on shutdown. Ted, do you know?
Windows platforms report similar behavior but with a different exit code. I would say the two failures are related.

Exited with code -1073741819 during test run

TEST-UNEXPECTED-FAIL | Shutdown | Exited with code -1073741819 during test run
INFO | automation.py | Application ran for: 0:29:59.657000
INFO | automation.py | Reading PID log: c:\docume~1\cltbld\locals~1\temp\tmplo92bcpidlog
==> process 3064 launched child process 3676
==> process 3064 launched child process 1864
==> process 3064 launched child process 3512
==> process 3064 launched child process 3896
==> process 3064 launched child process 1836
==> process 3064 launched child process 1180
INFO | automation.py | Checking for orphan process with PID: 3676
INFO | automation.py | Checking for orphan process with PID: 1864
INFO | automation.py | Checking for orphan process with PID: 3512
INFO | automation.py | Checking for orphan process with PID: 3896
INFO | automation.py | Checking for orphan process with PID: 1836
INFO | automation.py | Checking for orphan process with PID: 1180
SUCCESS: The process with PID 3056 has been terminated.
SUCCESS: The process with PID 564 has been terminated.
SUCCESS: The process with PID 1176 has been terminated.

INFO | runtests.py | Running tests: end.
program finished with exit code -1073741819
elapsedTime=1807.546000
TinderboxPrint: mochitest-plain-2<br/>220373/0/18398
Unknown Error: command finished with exit code: -1073741819
========= Finished 'python mochitest/runtests.py ...' warnings (results: 1, elapsed: 30 mins, 16 secs) (at 2012-12-14 11:49:54.228754) =========
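
For reference, the exit codes in the two logs above can be decoded with a few lines of Python. This is only a sketch: the helper names `as_signed_8bit` and `describe_exit_code` are made up for illustration and are not part of the mochitest harness. It assumes the harness reports death-by-signal as a negative signal number (the convention Python's subprocess module uses on POSIX) and that the Windows value is an NTSTATUS code printed as a signed 32-bit integer. Under those assumptions, -11 is consistent with a segmentation fault, 245 is the same value truncated to an 8-bit exit status, and -1073741819 is 0xC0000005 (an access violation).

```python
import signal


def as_signed_8bit(status):
    """Undo 8-bit truncation of an exit status (for example 245 -> -11)."""
    return status - 256 if status > 127 else status


def describe_exit_code(code):
    """Best-effort reading of an exit code as printed in the harness logs."""
    if code < -255:
        # Assumption: a Windows NTSTATUS value shown as a signed 32-bit integer.
        return "NTSTATUS 0x%08X" % (code & 0xFFFFFFFF)
    if code < 0:
        # Assumption: negative value means "terminated by signal -code".
        try:
            name = signal.Signals(-code).name
        except ValueError:
            name = "unknown"
        return "terminated by signal %d (%s)" % (-code, name)
    return "exit status %d" % code


if __name__ == "__main__":
    for raw in (-11, as_signed_8bit(245), -1073741819):
        print(raw, "->", describe_exit_code(raw))
```

If this reading is right, both platforms are reporting an invalid memory access rather than an ordinary non-zero exit.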
Summary: When WebRTC mochitests are run Firefox quit with: "Shutdown | Exited with code -11 during test run" → When WebRTC mochitests are run Firefox quit with: "Shutdown | Exited with code -11 during test run" or "Exited with code -1073741819 during test run"
TBPL won't see this without the keyword "intermittent-failure".
Assignee: nobody → adam
Priority: -- → P1
Whiteboard: [WebRTC][automation-blocked] → [WebRTC][automation-blocked][blocking-webrtc+]
I'm going to back-burner this bug until Bug 824851 and Bug 824359 are resolved. Currently, attempts to run mochi tests (at least on my machine) end up crashing about 50% to 75% of the time with backtraces pointing to those two bugs.

If the issue described in this bug still exists, it appears to be impossible (from a practical perspective) to reproduce until the mochi tests are otherwise mostly non-crashing.
Depends on: 824851, 824359
Status: NEW → ASSIGNED
https://tbpl.mozilla.org/php/getParsedLog.php?id=18338067&tree=Alder:

88 INFO TEST-PASS | /tests/dom/media/tests/mochitest/test_peerConnection_basicAudioVideoCombined.html | Remote video stream for remote peer is accessible
89 INFO TEST-INFO | /tests/dom/media/tests/mochitest/test_peerConnection_basicAudioVideoCombined.html | For now simply disconnect. We will add checks for media in a follow-up bug
TEST-UNEXPECTED-FAIL | /tests/dom/media/tests/mochitest/test_peerConnection_basicAudioVideoCombined.html | Exited with code 11 during test run
Summary: When WebRTC mochitests are run Firefox quit with: "Shutdown | Exited with code -11 during test run" or "Exited with code -1073741819 during test run" → When WebRTC mochitests are run Firefox quit with: "Shutdown | Exited with code 11|-11|-1073741819 during test run"
Blocks: 796463
This bug remains elusive. To date, I have taken the following steps:

- I've run the peerconnection mochi tests in a tight loop for several hundred iterations (exact number unknown, as the gdb counter I was using appears to roll over at 256), under OS X 10.8 debug (as indicated in comment 1).

- I've repeatedly run a full suite of mochi tests on a local machine to ensure that prior tests aren't leaving behind an environment that triggers the crash.

- I've pushed to try on alder in an attempt to get the try servers to reproduce the failure.

- Based on the log from Comment 4, I've read through substantial portions of the WebRTC and libvpx code (including quite a bit of analysis in the debugger) to ensure that the VP8 encoder can't be deallocated while still in use. Everything I've read so far seems to provide good synchronization between lifetime management and usage of this codec.
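
For anyone repeating the tight-loop step above, a small driver script avoids relying on a debugger counter that wraps at 256. The following is a minimal sketch, not the procedure actually used here: `run_until_failure` is a hypothetical helper, and the command passed to it is a placeholder that would need to be replaced with the real mochitest invocation for the local tree.

```python
import subprocess
import sys


def run_until_failure(cmd, max_runs):
    """Run cmd up to max_runs times.

    Returns (runs_started, exit_code), where exit_code is the first
    non-zero return code seen, or None if every run exited with 0.
    """
    for run in range(1, max_runs + 1):
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return run, result.returncode
    return max_runs, None


if __name__ == "__main__":
    # Placeholder command; substitute the real test invocation.
    placeholder_cmd = [sys.executable, "-c", "pass"]
    runs, code = run_until_failure(placeholder_cmd, 500)
    if code is None:
        print("no failure in %d runs" % runs)
    else:
        print("run %d exited with code %d" % (runs, code))
```

Because the iteration count is an ordinary Python integer, the exact number of passing runs before a failure is preserved.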

===

In practice, I suspect that the attached logs represent two different problems, and I am fairly comfortable that the first one no longer exists.

- In the first log (comment 0), the WebRTC tests themselves don't show any anomalies, and other mochi tests run both before and after them. The process also appears to exit relatively gracefully, except for the offending error code. While this doesn't necessarily rule out WebRTC functionality as the root cause, it does point to the problem possibly being in another part of the tree, where it might have been fixed already (especially since it failed on four platforms for this try -- which would imply a very high repeatability -- and my repeated attempts to get it to fail have been unsuccessful).

- The full log for comment 1 (https://tbpl.mozilla.org/php/getParsedLog.php?id=17950337&tree=Alder&full=1) shows a nearly identical pattern. I concur that it is likely to be the same problem as described in comment 0.

- The log linked from comment 4, on which I've based most of my analysis, involves an actual crash when the VP8 encoder attempts to write an error message into the log. This points to potential memory corruption, which, in theory, could be coming from anywhere. I've done a debugger-assisted code review to check for races involving the implicated objects, and haven't found any unsafe behavior.

I'm not convinced that the second problem is gone (although I'm at a loss as to how to replicate it), so I would suggest leaving this bug open for at least a while after we have turned the WebRTC mochi tests on for m-c. Due to the highly elusive nature of this bug, however (and after consulting with mreavy), I have decided to remove this issue from the webrtc blocker list and the corresponding umbrella tracker bug (bug 796463). I'm also re-prioritizing it at P3.

I'm happy to revisit these changes to this bug if any new information becomes available.
No longer blocks: 796463
Priority: P1 → P3
Whiteboard: [WebRTC][automation-blocked][blocking-webrtc+] → [WebRTC][automation-blocked][blocking-webrtc-]
I'm going to mark qawanted and mark for retest when we get this on m-c.
Keywords: qawanted
Whiteboard: [WebRTC][automation-blocked][blocking-webrtc-] → [WebRTC][retest on m-c][blocking-webrtc-]
Actually, let's just reopen if this reproduces while these tests are running on m-c. Sounds like there's nothing actionable at this point, so I'm closing as incomplete.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Keywords: qawanted
Resolution: --- → INCOMPLETE