Closed Bug 850773 Opened 11 years ago Closed 11 years ago

Assertion failure and crash on shutdown: ((bool)(__builtin_expect(!!(!NS_FAILED_impl(rv)), 1))), at media/webrtc/signaling/../../../media/mtransport/runnable_ [@ mozilla::MediaPipeline::Shutdown()]

Categories

(Core :: WebRTC: Signaling, defect)

22 Branch
x86_64
macOS
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla22
Tracking Status
firefox22 + fixed

People

(Reporter: whimboo, Assigned: jesup)

References

Details

(Keywords: assertion, crash, reproducible, Whiteboard: [WebRTC][blocking-webrtc+][qa-])

Crash Data

We also hit the assertion logged in bug 835851 and bug 845088, but with a different stack, given below. I see this while running my local Mochitests for the datachannel connections, and I can reliably reproduce it.

Crash reason:  EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
Crash address: 0x0

1:22.55 Crash reason:  EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
 1:22.55 Crash address: 0x0
 1:22.55 
 1:22.55 Thread 0 (crashed)
 1:22.55 0  XUL!mozilla::RUN_ON_THREAD [runnable_utils.h:1574542c3999 : 48 + 0x0]
 1:22.55 rbx = 0x00007fff7ea7c630   r12 = 0x0000000000000000
 1:22.55 r13 = 0x00000001193f9fa0   r14 = 0x0000000000000001
 1:22.55 r15 = 0x000000010831c8b8   rip = 0x00000001030e581a
 1:22.55 rsp = 0x00007fff5fbfe6d0   rbp = 0x00007fff5fbfe6f0
 1:22.55 Found by: given as instruction pointer in context
 1:22.55 1  XUL!mozilla::MediaPipeline::Shutdown() [MediaPipeline.h:1574542c3999 : 109 + 0x7]
 1:22.55 rip = 0x00000001030e54ef   rsp = 0x00007fff5fbfe700
 1:22.55 rbp = 0x00007fff5fbfe710
 1:22.55 Found by: stack scanning
 1:22.55 2  XUL!sipcc::PeerConnectionMedia::DisconnectMediaStreams() [PeerConnectionMedia.h:1574542c3999 : 191 + 0x4]
 1:22.55 rip = 0x00000001030e3ed9   rsp = 0x00007fff5fbfe720
 1:22.55 rbp = 0x00007fff5fbfe760
 1:22.55 Found by: stack scanning
 1:22.55 3  XUL!(anonymous namespace)::verb_to_gl_path_cmd(SkPath::Verb)::gTable + 0x43172
 1:22.55 rip = 0x00000001039560c2   rsp = 0x00007fff5fbfe740
 1:22.55 rbp = 0x00007fff5fbfe760
 1:22.55 Found by: stack scanning
 1:22.55 4  XUL!(anonymous namespace)::verb_to_gl_path_cmd(SkPath::Verb)::gTable + 0x42f60
 1:22.56 rip = 0x0000000103955eb0   rsp = 0x00007fff5fbfe748
 1:22.56 rbp = 0x00007fff5fbfe760
 1:22.56 Found by: stack scanning
 1:22.56 5  XUL!(anonymous namespace)::verb_to_gl_path_cmd(SkPath::Verb)::gTable + 0x42e3b
 1:22.56 rip = 0x0000000103955d8b   rsp = 0x00007fff5fbfe760
 1:22.56 rbp = 0x00007fff5fbfe760
 1:22.56 Found by: stack scanning
 1:22.56 6  XUL!sipcc::PeerConnectionMedia::SelfDestruct() [PeerConnectionMedia.cpp:1574542c3999 : 230 + 0x7]
 1:22.56 rip = 0x00000001030e3c80   rsp = 0x00007fff5fbfe770
 1:22.56 rbp = 0x00007fff5fbfe7a0
 1:22.56 Found by: stack scanning
 1:22.56 7  libnspr4.dylib!PR_GetCurrentThread [ptthread.c:1574542c3999 : 621 + 0xb]
 1:22.56 rip = 0x000000010009ddf3   rsp = 0x00007fff5fbfe780
 1:22.56 rbp = 0x00007fff5fbfe7a0
 1:22.56 Found by: stack scanning

I haven't tested with Aurora yet, so someone might want to do that.
Whiteboard: [WebRTC][blocking-webrtc?] → [WebRTC][blocking-webrtc+]
I'm working on a simplified testcase. I should have it up later.

Randell and Eric, which NSPR logs would be helpful here?
Crash Signature: [@ mozilla::MediaPipeline::Shutdown()] → [@ mozilla::RUN_ON_THREAD] [@ mozilla::MediaPipeline::Shutdown()]
Flags: needinfo?(rjesup)
Flags: needinfo?(ekr)
This is strange. When I save the test file again, even without modifications, the assertion goes away. It takes a while before I see it again, but then it happens on every run until I save the file again. I'm currently not sure what the initial trigger is.
I wouldn't try to debug this. We think that this is a problem with reentrancy. If you can get a reliable test case, we can verify that we have fixed it.
Flags: needinfo?(ekr)
(In reply to Eric Rescorla (:ekr) from comment #3)
> I wouldn't try to debug this. We think that this is a problem with
> reentrancy. If you can get a reliable test case, we can verify that we have
> fixed it.

Fixed by which patch or bug? I haven't recompiled today but I have a build from 2 days ago.
bug 844493

It's not fixed yet, but if you create a test...
I don't have a test which reliably shows that, given that I can't find the initial trigger.
Depends on: 844493
If we're looking for a reliable test-case for the re-entrancy-on-shutdown problems, I recommend Bug 842749, as it always either asserts or hangs for me, and is fairly well understood. Though, building a test from it would require automated triggering of shutdown while the JS is busy, which I don't know whether the harness supports.
Jan, do all those assertions with different call stacks have the same underlying issue? Or would that be different manifestations we have to handle separately?
It's unclear. There seem to be a number of issues which likely are related but nobody knows.
The JS re-entrancy during shutdown issue is a known intermittent race: it happens when the browser is shut down while JS is executing inside peerconnection. Each time it happens, things shut down in an unplanned order, which provokes a number of asserts and crashes. Debugging these is likely not of value, since they are reactions to the bad shutdown order (once we fix the order they go away).

Other intermittent asserts in good shutdowns (from "JS idle" state) are unrelated. Telling them apart is tricky (if the JS involved has a for-loop creating 70 peerconnections, it's most definitely related; but if it doesn't rely on loops to stay busy then it'll sit mostly idle since peerconnections don't block, and it's likely unrelated).

Best may be to track them separately and see how many fall away once the patch lands.
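
To illustrate why a bad shutdown order can surface as a null dereference in a dispatch helper, here is a minimal C++ sketch. It is not the patch from bug 844493 and does not use Mozilla's real classes: `FakeThread`, `RunOnThread`, `Pipeline` and `ThreadGone` are hypothetical stand-ins, and the assumption that the dispatch target has already been cleared when `Shutdown()` runs is only one possible way to reach a crash at address 0x0.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for an event target; not a Mozilla class.
struct FakeThread {
  std::vector<std::function<void()>> queue;

  void Dispatch(std::function<void()> task) {
    assert(task);  // refuse empty tasks
    queue.push_back(std::move(task));
  }

  // Run everything queued so far.
  void Drain() {
    std::vector<std::function<void()>> pending;
    pending.swap(queue);
    for (auto& task : pending) task();
  }
};

enum class Result { Ok, NoThread, AlreadyShutDown };

// Defensive dispatch: report failure instead of dereferencing a null target.
bool RunOnThread(FakeThread* target, std::function<void()> task) {
  if (!target) return false;
  target->Dispatch(std::move(task));
  return true;
}

// Hypothetical pipeline whose teardown normally hops to another thread.
class Pipeline {
 public:
  explicit Pipeline(FakeThread* target) : target_(target) {}

  // Models the bad ordering: the target thread went away before us.
  void ThreadGone() { target_ = nullptr; }

  Result Shutdown() {
    if (shut_down_) return Result::AlreadyShutDown;  // idempotent
    shut_down_ = true;
    if (!RunOnThread(target_, [this] { detached_ = true; })) {
      detached_ = true;  // no thread to hop to: clean up inline
      return Result::NoThread;
    }
    return Result::Ok;
  }

  bool detached() const { return detached_; }

 private:
  FakeThread* target_;
  bool shut_down_ = false;
  bool detached_ = false;
};
```

In the planned order, `Shutdown()` queues the teardown and it completes when the target drains its queue. In the unplanned order (`ThreadGone()` called first), the sketch falls back to inline cleanup rather than dispatching through a null pointer. Guards like this only hide the symptom; fixing the shutdown order itself is still what makes the asserts go away.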
Is FF21 affected?
Assignee: nobody → rjesup
We believe this bug should be fixed now that the patch for bug 844493 landed. Please reopen if you see it again.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Flags: needinfo?(rjesup)
Whiteboard: [WebRTC][blocking-webrtc+] → [WebRTC][blocking-webrtc+][qa-]
Flags: in-testsuite-
Target Milestone: --- → mozilla22