Closed
Bug 885640
Opened 12 years ago
Closed 12 years ago
sdp_unittests has a shutdown race resulting in PROCESS-CRASH | sdp_unittests | application crashed [@ mozalloc_abort(char const*)]
Categories
(Core :: WebRTC: Signaling, defect, P1)
Tracking
()
RESOLVED
FIXED
mozilla25
People
(Reporter: abr, Assigned: abr)
Details
(Keywords: intermittent-failure, Whiteboard: [WebRTC] [blocking-webrtc-])
Attachments
(2 files)
3.24 KB, text/plain
815 bytes, patch (ekr: review+; bajaj: approval-mozilla-aurora+)
There appears to be some kind of shutdown race that can occur with sdp_unittests, resulting in either an abort or deadlock.
See the Fedora debug and Fedora64 debug build failures here:
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=9695f620df74&onlyunstarred=1
In both cases, things went wrong during shutdown. One run hangs and is killed after 300 seconds. The other aborts with a stack indicating a call chain of PR_Lock -> PR_Assert. The failed assertion is:
Assertion failure: 0 == rv, at nsprpub/pr/src/pthreads/ptsynch.c:175
In this case, that means a call to "pthread_mutex_lock(&lock->mutex);" returned a nonzero error code. The most likely culprit is that the underlying memory is no longer a valid mutex structure.
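For illustration only: pthread_mutex_lock reports failure through its return value, and the assertion quoted above requires that value to be zero. The sketch below is not NSPR code. It uses a hypothetical helper, RelockErrorCheckingMutex, and an error-checking mutex to provoke a nonzero return in a well-defined way (a relock by the owning thread). Locking a mutex whose memory has been destroyed, the suspected cause here, is undefined behavior and cannot be demonstrated safely.

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>

// Hypothetical helper (not part of NSPR): returns the result of a second
// pthread_mutex_lock call made by the thread that already owns the mutex.
int RelockErrorCheckingMutex() {
  pthread_mutexattr_t attr;
  pthread_mutexattr_init(&attr);
  // An error-checking mutex reports misuse instead of deadlocking.
  pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);

  pthread_mutex_t mutex;
  pthread_mutex_init(&mutex, &attr);
  pthread_mutexattr_destroy(&attr);

  int first = pthread_mutex_lock(&mutex);   // normal acquisition
  int second = pthread_mutex_lock(&mutex);  // relock by the owner fails

  if (first == 0) {
    pthread_mutex_unlock(&mutex);
  }
  pthread_mutex_destroy(&mutex);

  // A check of the form "0 == rv" would fail on this value.
  return second;
}
```

Any nonzero value returned here is the kind of result that would trip a "0 == rv" check after the lock call.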
Comment 1 • 12 years ago (Assignee)
Attaching stack trace at failure point, so that it outlives the tbpl logs.
Updated • 12 years ago (Assignee)
Attachment #765781 -
Attachment description: Stack Trace (and some diagnostics) for Intermittent Assert → Stack Trace at failure point (during shutdown)
Comment 2 • 12 years ago
Keywords: intermittent-failure
Summary: sdp_unittests has a shutdown race → sdp_unittests has a shutdown race resulting in PROCESS-CRASH | sdp_unittests | application crashed [@ mozalloc_abort(char const*)]
Updated • 12 years ago (Assignee)
Priority: P4 → P1
Comment 35 • 12 years ago (Assignee)
Comment on attachment 769715
Clean up PeerConnectionCtx before exiting
I haven't yet managed to chase this problem all the way down to why it's failing the way it is, but I strongly suspect the root cause is that I missed cleaning up the PeerConnectionCtx (PCCtx) in my original patch.
Given (1) it may take a while to chase this to ground; (2) the problem is happening several times a day, so it should be easy to tell if it stops; and (3) the change in this patch makes the unit test more correct in any case, I'd like to go ahead and land this fix to see if the problem goes away.
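A minimal sketch of the general pattern, not the actual patch: a process-wide context that owns a lock and a worker thread is torn down explicitly before the test driver returns, so nothing can touch the lock after its memory is freed. FakeCtx, its members, and RunDriver are hypothetical names used only for illustration; the real PeerConnectionCtx API is not shown here.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a process-wide context (not PeerConnectionCtx).
class FakeCtx {
 public:
  static FakeCtx* GetInstance() {
    if (!sInstance) {
      sInstance = new FakeCtx();
    }
    return sInstance;
  }

  // Stop and join the worker first, then free the object that owns the lock.
  static void Destroy() {
    if (!sInstance) {
      return;  // safe to call more than once
    }
    sInstance->mRunning = false;
    sInstance->mWorker.join();
    delete sInstance;
    sInstance = nullptr;
  }

  static bool Exists() { return sInstance != nullptr; }

 private:
  FakeCtx() : mRunning(true), mTicks(0), mWorker([this] { Run(); }) {}

  void Run() {
    while (mRunning) {
      {
        std::lock_guard<std::mutex> guard(mLock);
        ++mTicks;
      }
      std::this_thread::yield();
    }
  }

  std::mutex mLock;
  std::atomic<bool> mRunning;
  int mTicks;
  std::thread mWorker;  // declared last so the other members exist first
  static FakeCtx* sInstance;
};

FakeCtx* FakeCtx::sInstance = nullptr;

// Hypothetical driver body: create the context, run tests, clean up, return.
int RunDriver() {
  FakeCtx::GetInstance();
  int result = 0;      // stand-in for the test run's result
  FakeCtx::Destroy();  // explicit cleanup before exiting
  return result;
}
```

In a real driver the Destroy() call would sit between the test run and the return from main, which is the ordering the patch title describes.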
Attachment #769715 -
Flags: review?(ekr)
Comment 36 • 12 years ago
Comment on attachment 769715
Clean up PeerConnectionCtx before exiting
Review of attachment 769715:
lgtm
Attachment #769715 -
Flags: review?(ekr) → review+
Comment 37 • 12 years ago (Assignee)
https://hg.mozilla.org/integration/mozilla-inbound/rev/b6d569b7cc8c
Marking leave-open until we verify that the problems on Linux have gone away.
Whiteboard: [WebRTC] [blocking-webrtc-] → [WebRTC] [blocking-webrtc-] [leave-open]
Comment 40 • 12 years ago (Assignee)
I have verified that the crashes for comment 38 and comment 39 were from pushes prior to this patch landing. I would expect no crashes on m-i after this comment.
Comment 42 • 12 years ago
Yeah, we're stuck with mozilla-central closed, so we can't merge from inbound to everywhere else, and you're probably going to see a few more days of starring from other trees.
Comment 44 • 12 years ago (Assignee)
Comment on attachment 769715
Clean up PeerConnectionCtx before exiting
This patch fixes a test driver that is run by the builders. It is not user-visible, but it impacts developers and sheriffs by causing sporadic build failures (cf. comment 43).
[Approval Request Comment]
> Bug caused by (feature/regressing bug #):
Bug 880067
> User impact if declined:
None
> Testing completed (on m-c, etc.):
Landed on mozilla-inbound. No test failures seen since then.
> Risk to taking this patch (and alternatives if risky):
No risk -- it only changes a unit test. No distributed binaries are impacted.
> String or IDL/UUID changes made by this patch:
None
Attachment #769715 -
Flags: approval-mozilla-aurora?
Updated • 12 years ago
Attachment #769715 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Updated • 12 years ago
status-firefox24: --- → fixed
status-firefox25: --- → fixed
Whiteboard: [WebRTC] [blocking-webrtc-] [leave-open] → [WebRTC] [blocking-webrtc-]
Comment 46 • 12 years ago
Status: NEW → RESOLVED
Closed: 12 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla25
Comment 50 • 12 years ago (Assignee)
philor -- comment 49 is something very different from this bug. There is no process crash in that log, simply a test failure. There are actually a bunch of failures here in a bunch of places, of which sdp_unittests is simply one.
Perhaps this is a new bug?
Comment 51 • 12 years ago (Assignee)
Comment 48 is also crash-free and therefore not this bug either.