245.05 KB, text/plain
7.22 KB, text/plain
5.31 KB, text/plain
18.85 KB, patch
|Details | Diff | Splinter Review|
Created attachment 730380 [details] Log output + callstack Reproduce steps: 1. Open URL 2. Quit the browser using Ctrl+Q (Cmd+Q on Mac) within 10 seconds or so. These are the same reproduce steps as Bug 842749, except that now that the JS re-entrancy has been fixed, the browser is ironically much less responsive during crashtests/822197.html. This is presumably "more correct" since this crashtest contains JS creating 70 PeerConnections in a for-loop. E.g. you have to work hard on a Mac now to right-click the Nightly icon in the taskbar while this is running and see anything other than "Force Quit". If I wait it out (takes about 30 seconds), e.g. until right-click on Nightly produces the "Quit" option, or until the beachball-of-death on the mac stops, then Firefox exits cleanly. However, if I manage to get the Quit in early, either by right-clicking on Nightly just as the page loaded, or hit Cmd+Q sometime during the run, then it hangs every time for me in the same place: #0 __psynch_mutexwait () #1 pthread_mutex_lock () #2 in PR_Lock () #3 mozilla::Mutex::Lock()@BlockingResourceBase.cpp:228 #4 mozilla::BaseAutoLock<mozilla::Mutex>::BaseAutoLock() @Mutex.h:153 #5 mozilla::BaseAutoLock<mozilla::Mutex>::BaseAutoLock() @Mutex.h:154 #6 CSF::CC_SIPCCService::removeCCObserver() @CC_SIPCCService.cpp:742 #7 CSF::CallControlManagerImpl::disconnect() @CallControlManagerImpl.cpp:234 #8 CSF::CallControlManagerImpl::destroy() @CallControlManagerImpl.cpp:57 #9 sipcc::PeerConnectionCtx::Cleanup() @PeerConnectionCtx.cpp:221
Ekr, can you take a look at this? It looks like this is fallout from your DISPATCH patch landing, and it's currently marked a blocker to release. Thanks.
Assignee: nobody → ekr
Duplicate of this bug: 835851
If you prefer a non-interactive reproduce case, see Bug 835851 (download attachments and run crashtest).
Can you provide a backtrace of all the threads? We may be able to guess who is holding this mutex.
Here's a backtrace of all threads from Bug 835851 (easiest steps): attachment 730791 [details].
Created attachment 730854 [details] This output shows the deadlock. The SIPCC thread does a sync dispatch to main in CC_SIPCCService::onCallEvent() while holding its m_lock. The Main thread is busy shutting down in CC_SIPCCService::removeCCObserver where it waits on the m_lock.
Severity: normal → critical
Crash Signature: [@ mozilla::Mutex::Lock()]
Created attachment 732972 [details] [diff] [review] Async onEvent dispatch to PC on main. WIP. Update. Work in progress. As discussed last week on IRC, our solution to the hang is: (1) Fix PeerConnectionCtx::onCallEvent() to dispatch NORMAL (async) to main. (2) remove unused aCall; (3) make onCallEvent_m static; and (4) pass in strdup(aCall->getPeerConnection()), effectively a weak ref to the pc to onCallEvent_m, to handle running late in shutdown (pc gone). Open issues: - CC_*Ptr is not thread-safe so we must not manipulate the ref count on multiple threads at once, the way it does here. Previously, SyncRunnable enforced this. Now that we are async, this must be addressed. We can either change C_*Ptr to threadsafe, or... Since it seems like pc->onEvent just gets aInfo->callStateToString(event) anyway, I can just duplicate the string and pass that in to avoid the reference altogether. - I think I can make Bug 835851 into a crashtest.
(In reply to Jan-Ivar Bruaroey [:jib] from comment #7) > - CC_*Ptr is not thread-safe Just as a minor correction here: as of Bug 848423, these pointers are thread-safe.
> Just as a minor correction here: as of Bug 848423, these pointers are thread-safe. Good news.
Created attachment 733052 [details] Progress, but now callstacks showing new hang in cprJoinThread() The patch seems to work, but exposes another hang in cprJoinThread(). I may as well address that at the same time. Current thinking on IRC is to have the GSM thread et al. use a new BreakableSyncRunnable derivative that takes a PRInt32 argument, so that ccUnload() can break through on shutdown.
Created attachment 734162 [details] [diff] [review] Async onEvent to PC on main + safe join of SyncRunnable-using SIPCC threads. Now fixes SIPCC join-hang on shutdown, in addition to earlier fix. To recap, this patch changes two things: 1) PeerConnectionCtx::onCallEvent() is dispatched NORMAL (async) to main. 2) Safe joining of SyncRunnable-using SIPCC threads is now a feature of thread_monitor.c Try - https://tbpl.mozilla.org/?tree=Try&rev=220d2c4d600e
Comment on attachment 734162 [details] [diff] [review] Async onEvent to PC on main + safe join of SyncRunnable-using SIPCC threads. Review of attachment 734162 [details] [diff] [review]: ----------------------------------------------------------------- r+, but please add some comments about why it's done this way and the implications -- especially about spinning the event loop if we were called from JS code.
Attachment #734162 - Flags: review?(rjesup) → review+
Created attachment 734237 [details] [diff] [review] Async onEvent to PC on main + safe join of SyncRunnable-using SIPCC threads. r=jesup Comment-only update. Carrying forward r+ from Randell.
Whiteboard: [webrtc][blocking-webrtc+] → [webrtc][blocking-webrtc+][webrtc-uplift]
Target Milestone: --- → mozilla23
Status: ASSIGNED → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Verified on 4/8 with STR in comment 0 - not seeing a hang.
Status: RESOLVED → VERIFIED
Comment on attachment 734237 [details] [diff] [review] Async onEvent to PC on main + safe join of SyncRunnable-using SIPCC threads. r=jesup [Approval Request Comment] Bug caused by (feature/regressing bug #): n/a User impact if declined: Possible deadlocks during shutdown, especially when webrtc is overloaded Testing completed (on m-c, etc.): on m-c for a while. Manual testing of reproducible hang. Risk to taking this patch (and alternatives if risky): Low risk. SIPCC/signaling shutdown is currently only done at browser-quit time. It's unlikely even in a bug in this code would do more than leak objects or deadlock shutdown. String or IDL/UUID changes made by this patch: none
Attachment #734237 - Flags: approval-mozilla-aurora?
Comment on attachment 734237 [details] [diff] [review] Async onEvent to PC on main + safe join of SyncRunnable-using SIPCC threads. r=jesup low risk, well tested patch . Been on m-c for a few days and the hang is explicitly verified by QA .
Attachment #734237 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
status-firefox22: --- → fixed
Whiteboard: [webrtc][blocking-webrtc+][webrtc-uplift] → [webrtc][blocking-webrtc+]
Target Milestone: mozilla23 → mozilla22
The m-c version where it landed is 23.
Target Milestone: mozilla22 → mozilla23
You need to log in before you can comment on or make changes to this bug.