Closed Bug 885640 Opened 11 years ago Closed 11 years ago

sdp_unittests has a shutdown race resulting in PROCESS-CRASH | sdp_unittests | application crashed [@ mozalloc_abort(char const*)]

Categories

(Core :: WebRTC: Signaling, defect, P1)

x86
Linux
defect

Tracking

()

RESOLVED FIXED
mozilla25
Tracking Status
firefox24 --- fixed
firefox25 --- fixed

People

(Reporter: abr, Assigned: abr)

Details

(Keywords: intermittent-failure, Whiteboard: [WebRTC] [blocking-webrtc-])

Attachments

(2 files)

There appears to be some kind of shutdown race that can occur with sdp_unittests, resulting in either an abort or a deadlock.

See the Fedora debug and Fedora64 debug build failures here:

https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=9695f620df74&onlyunstarred=1

In both cases, things went wrong during shutdown. One run hangs and is killed after 300 seconds. The other aborts with a stack showing a call chain of PR_Lock -> PR_Assert. The failed assertion is:

  Assertion failure: 0 == rv, at nsprpub/pr/src/pthreads/ptsynch.c:175

In this case, that means a call to "pthread_mutex_lock(&lock->mutex);" returned a nonzero error code. The most likely culprit is that the underlying memory is no longer a valid mutex structure.
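
To illustrate the check behind that assertion: pthread_mutex_lock reports failure through its return value, and the assertion quoted above requires that value to be 0. Locking through freed or destroyed memory is undefined behavior and cannot be reproduced reliably, so this minimal C++ sketch provokes a different, well-defined failure (relocking an error-checking mutex from the thread that already holds it) purely to show the call returning a nonzero code. The helper name relock_result is hypothetical and is not taken from NSPR or the unit test.

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>

// Hypothetical helper: returns the raw pthread_mutex_lock result obtained when
// the calling thread tries to lock an error-checking mutex it already holds.
int relock_result() {
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);

    pthread_mutex_t mutex;
    pthread_mutex_init(&mutex, &attr);
    pthread_mutexattr_destroy(&attr);

    int first = pthread_mutex_lock(&mutex);   // succeeds: returns 0
    assert(first == 0);
    int second = pthread_mutex_lock(&mutex);  // fails: returns EDEADLK

    // Release and destroy the mutex while it is still valid.
    pthread_mutex_unlock(&mutex);
    pthread_mutex_destroy(&mutex);
    return second;
}
```

A check of the form "0 == rv" applied to the second call would fire, which is the same shape of failure as the assertion in the log, although the cause suspected in this bug is invalid mutex memory rather than a relock.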
Attaching stack trace at failure point, so that it outlives the tbpl logs.
Attachment #765781 - Attachment description: Stack Trace (and some diagnostics) for Intermittent Assert → Stack Trace at failure point (during shutdown)
https://tbpl.mozilla.org/php/getParsedLog.php?id=24434655&tree=Mozilla-Inbound
Summary: sdp_unittests has a shutdown race → sdp_unittests has a shutdown race resulting in PROCESS-CRASH | sdp_unittests | application crashed [@ mozalloc_abort(char const*)]
Priority: P4 → P1
Comment on attachment 769715 [details] [diff] [review]
Clean up PeerConnectionCtx before exiting

I haven't yet managed to chase this problem all the way down to why it's failing the way it is, but I strongly suspect the root cause is that I missed cleaning up the PCCtx (PeerConnectionCtx) in my original patch.

Given (1) it may take a while to chase this to ground; (2) the problem is happening several times a day, so it should be easy to tell if it stops; and (3) the change in this patch makes the unit test more correct in any case, I'd like to go ahead and land this fix to see if the problem goes away.
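
As a rough illustration of why cleaning up a process-wide context before exiting can matter, here is a minimal C++ sketch. It is not the real PeerConnectionCtx and not the contents of the attached patch: FakeContext, its methods, and RunDriver are all hypothetical names. It assumes the context owns a mutex and a worker thread, and shows an explicit Destroy() that stops and joins the worker before the mutex is freed, so nothing can lock it once the process starts shutting down.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a process-wide context that owns a lock and a
// background worker. GetInstance/Destroy are meant for the main thread only.
class FakeContext {
public:
    // Lazily creates the singleton, which starts the worker thread.
    static FakeContext* GetInstance() {
        if (!instance_) instance_ = new FakeContext();
        return instance_;
    }

    // Stops and joins the worker, then frees the singleton. Safe to call twice.
    static void Destroy() {
        if (!instance_) return;
        instance_->running_ = false;
        instance_->worker_.join();  // after this, no thread can touch lock_
        delete instance_;
        instance_ = nullptr;
    }

    static bool Exists() { return instance_ != nullptr; }

    int ticks() {
        std::lock_guard<std::mutex> hold(lock_);
        return ticks_;
    }

private:
    FakeContext() : running_(true), ticks_(0), worker_([this] { Run(); }) {}

    // Worker loop: repeatedly takes the lock until asked to stop.
    void Run() {
        while (running_) {
            {
                std::lock_guard<std::mutex> hold(lock_);
                ++ticks_;
            }
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }

    static FakeContext* instance_;
    std::mutex lock_;
    std::atomic<bool> running_;
    int ticks_;
    std::thread worker_;  // declared last so the other members exist before it starts
};

FakeContext* FakeContext::instance_ = nullptr;

// Hypothetical test-driver body: tear the context down before returning, so
// the worker is gone before process shutdown begins.
int RunDriver() {
    FakeContext::GetInstance();
    int result = 0;  // a real driver would run its test suite here
    FakeContext::Destroy();
    return result;
}
```

Without the Destroy() call, the worker in this sketch would still be running and taking the lock when main returns, which is the kind of window in which an abort or a hang at shutdown becomes possible.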
Attachment #769715 - Flags: review?(ekr)
Comment on attachment 769715 [details] [diff] [review]
Clean up PeerConnectionCtx before exiting

Review of attachment 769715 [details] [diff] [review]:
-----------------------------------------------------------------

lgtm
Attachment #769715 - Flags: review?(ekr) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/b6d569b7cc8c

Marking leave-open until we verify that the problems on Linux have gone away.
Whiteboard: [WebRTC] [blocking-webrtc-] → [WebRTC] [blocking-webrtc-] [leave-open]
I have verified that the crashes for comment 38 and comment 39 were from pushes prior to this patch landing. I would expect no crashes on m-i after this comment.
Yeah, we're stuck with mozilla-central closed so we can't merge from inbound to everywhere else, so you're probably going to see a few days more starring from other trees.
Comment on attachment 769715 [details] [diff] [review]
Clean up PeerConnectionCtx before exiting

This patch fixes a test driver that is run by the builders. It is not user-visible, but it impacts developers and sheriffs by causing sporadic build failures (cf. comment 43).

[Approval Request Comment]
> Bug caused by (feature/regressing bug #): 

Bug 880067

> User impact if declined: 

None

> Testing completed (on m-c, etc.): 

Landed on mozilla-inbound. No test failures seen since then.

> Risk to taking this patch (and alternatives if risky): 

No risk -- it only changes a unit test. No distributed binaries are impacted.

> String or IDL/UUID changes made by this patch:

None
Attachment #769715 - Flags: approval-mozilla-aurora?
Attachment #769715 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Aurora: https://hg.mozilla.org/releases/mozilla-aurora/rev/cdc01273863f
Whiteboard: [WebRTC] [blocking-webrtc-] [leave-open] → [WebRTC] [blocking-webrtc-]
https://hg.mozilla.org/mozilla-central/rev/b6d569b7cc8c
Status: NEW → RESOLVED
Closed: 11 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla25
philor -- comment 49 is something very different from this bug. There is no process crash in that log, just a test failure. That log actually shows a bunch of failures in a bunch of places, of which sdp_unittests is only one.

Perhaps this is a new bug?
Comment 48 is also crash-free and therefore not this bug either.