Closed Bug 816640 Opened 7 years ago Closed 7 years ago

Assertion failure and shutdown crash: "_mOwningThread.GetThread() == PR_GetCurrentThread() (nrappkitTimerCallback not thread-safe)"

Categories

(Core :: WebRTC: Networking, defect, P1, critical)

20 Branch
defect

Tracking

()

VERIFIED FIXED
mozilla20

People

(Reporter: whimboo, Assigned: jesup)

Details

(Keywords: assertion, crash, Whiteboard: [WebRTC][blocking-webrtc+][qa-])

Attachments

(3 files)

While working on my first peer connection Mochitest I hit this assertion and shutdown crash. The reduced testcase can be found attached. I will give more details once I got a better stack as the one below.

This crash only happens because the variable of the peer connection instance is declared in the global scope. Once I move it into the runTest() method no crash happens anymore.

Thread 0 (crashed)
 0  XUL!mozilla::nrappkitTimerCallback::Release() [nr_timer.cpp : 46 + 0x0]
    rbx = 0x0000000000000000   r12 = 0x0000000000000000
    r13 = 0x0000000000000000   r14 = 0x0000000000000000
    r15 = 0x0000000000000000   rip = 0x00000001043f907e
    rsp = 0x00007fff5fbfe280   rbp = 0x00007fff5fbfe2c0
    Found by: given as instruction pointer in context
 1  XUL!nsTArray<nsTimerImpl*, nsTArrayDefaultAllocator>::ElementAt(unsigned int) [nsTArray.h : 539 + 0xc]
    rip = 0x00000001037c3ac1   rsp = 0x00007fff5fbfe2a0
    rbp = 0x00007fff5fbfe2c0
    Found by: stack scanning
 2  XUL!nsTimerImpl::ReleaseCallback() [nsTimerImpl.h:eb8da53904eb : 78 + 0x13]
    rip = 0x00000001037c1112   rsp = 0x00007fff5fbfe2d0
    rbp = 0x00007fff5fbfe2f0
    Found by: stack scanning
 3  XUL!TimerThread::Shutdown() [TimerThread.cpp:eb8da53904eb : 154 + 0x8]
    rip = 0x00000001037c2108   rsp = 0x00007fff5fbfe300
    rbp = 0x00007fff5fbfe360
    Found by: stack scanning
 4  libnspr4.dylib!_PR_Mach_GetInterval [darwin.c:eb8da53904eb : 36 + 0x8]
    rip = 0x000000010053d8ad   rsp = 0x00007fff5fbfe330
    rbp = 0x00007fff5fbfe360
    Found by: stack scanning
 5  XUL!nsTimerImpl::Shutdown() [nsTimerImpl.cpp:eb8da53904eb : 247 + 0xb]
    rip = 0x00000001037bf9c0   rsp = 0x00007fff5fbfe370
    rbp = 0x00007fff5fbfe390
    Found by: stack scanning
 6  XUL!mozilla::ShutdownXPCOM(nsIServiceManager*) [nsXPComInit.cpp:eb8da53904eb : 570 + 0xa]
    rip = 0x000000010372d48f   rsp = 0x00007fff5fbfe3a0
    rbp = 0x00007fff5fbfe4f0
    Found by: stack scanning
 7  XUL!nsObserverService::Release() [nsObserverService.cpp:eb8da53904eb : 51 + 0x2c]
    rip = 0x0000000103751740   rsp = 0x00007fff5fbfe3c0
    rbp = 0x00007fff5fbfe4f0
    Found by: stack scanning
 8  XUL!nsCOMPtr<nsIObserverService>::~nsCOMPtr() [nsCOMPtr.h : 494 + 0x12]
    rip = 0x000000010127705b   rsp = 0x00007fff5fbfe400
    rbp = 0x00007fff5fbfe4f0
    Found by: stack scanning
 9  XUL!nsCOMPtr<nsIObserverService>::~nsCOMPtr() [nsCOMPtr.h : 491 + 0x4]
    rip = 0x0000000101273315   rsp = 0x00007fff5fbfe430
    rbp = 0x00007fff5fbfe4f0
    Found by: stack scanning
10  XUL + 0x24f3faf
    rip = 0x0000000103751fb0   rsp = 0x00007fff5fbfe470
    rbp = 0x00007fff5fbfe4f0
    Found by: stack scanning
11  XUL!NS_ShutdownXPCOM_P [nsXPComInit.cpp:eb8da53904eb : 513 + 0x8]
    rip = 0x000000010372d1c5   rsp = 0x00007fff5fbfe500
    rbp = 0x00007fff5fbfe510
    Found by: stack scanning
Attached patch testcaseSplinter Review
Steps to reproduce:

1. Apply the patch
2. Go to 'obj_dir/dom/media/tests/mochitest and run 'make'
3. Run the test from the src folder: 'TEST_PATH=dom/media/tests/mochitest/ make -C ${OBJ_DIR} mochitest-plain'
OS: Mac OS X → All
Priority: -- → P1
Whiteboard: [WebRTC][automation-blocked] → [WebRTC][blocking-webrtc+][automation-blocked]
This is a timer that is being destroyed without being executed. The proximal fix is to change the NS_IMPL_ISUPPORTS1 on line 46 of nr_timer.cpp to NS_IMPL_THREADSAFE_ISUPPORTS1. However, it's important to realize that in many cases a timer is holding a strong reference to an object in a form that isn't a RefPtr. For instance, often the timer callback itself does the destruction of the object. If timers are being destroyed without being fired, that's going to make things leak. Do we need to write the timers so that the Release does the action?
Note: this is only an issue on full system shutdown during periods when timers are live.
Attachment #689116 - Flags: review?(ekr)
Comment on attachment 689116 [details] [diff] [review]
nrappkit timers need to be threadsafe

Review of attachment 689116 [details] [diff] [review]:
-----------------------------------------------------------------

lgtm
Attachment #689116 - Flags: review?(ekr) → review+
Flags: in-testsuite?
https://hg.mozilla.org/mozilla-central/rev/0e0f68fef5c4
Assignee: nobody → rjesup
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Whiteboard: [WebRTC][blocking-webrtc+][automation-blocked] → [WebRTC][blocking-webrtc+][automation-blocked][qa-]
Works fine now and no assertion or crash can be seen on shutdown.

I don't think the in-testsuite flag is applicable here given that we can't shutdown the browser. Could be something for a Mozmill test.
Status: RESOLVED → VERIFIED
Flags: in-testsuite?
Flags: in-testsuite-
Flags: in-qa-testsuite?(hskupin)
Whiteboard: [WebRTC][blocking-webrtc+][automation-blocked][qa-] → [WebRTC][blocking-webrtc+][qa-]
Nils, do you think its worth creating a test for?
Flags: needinfo?(drno)
Flags: in-qa-testsuite?(hskupin)
Flags: in-qa-testsuite?
Please see https://bugzilla.mozilla.org/show_bug.cgi?id=799142#c54 for my request of writing one Mozmill test to cover these scenarios.
Flags: needinfo?(drno)
You need to log in before you can comment on or make changes to this bug.