Closed
Bug 855498
Opened 12 years ago
Closed 12 years ago
Hang in CC_SIPCCService::mLock on shutdown if Ctrl/Cmd+Q hit during long crashtests/822197.html
Categories
(Core :: WebRTC: Signaling, defect, P2)
Tracking
()
VERIFIED
FIXED
mozilla23
Tracking | Status | |
---|---|---|
firefox22 | --- | fixed |
People
(Reporter: jib, Assigned: jib)
References
()
Details
(Keywords: hang, Whiteboard: [webrtc][blocking-webrtc+])
Crash Data
Attachments
(4 files, 2 obsolete files)
245.05 KB,
text/plain
|
Details | |
7.22 KB,
text/plain
|
Details | |
5.31 KB,
text/plain
|
Details | |
18.85 KB,
patch
|
jib
:
review+
bajaj
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
Reproduce steps:
1. Open URL
2. Quit the browser using Ctrl+Q (Cmd+Q on Mac) within 10 seconds or so.
These are the same reproduce steps as Bug 842749, except that now that the JS re-entrancy has been fixed, the browser is ironically much less responsive during crashtests/822197.html.
This is presumably "more correct" since this crashtest contains JS creating 70 PeerConnections in a for-loop. E.g. you have to work hard on a Mac now to right-click the Nightly icon in the taskbar while this is running and see anything other than "Force Quit".
If I wait it out (takes about 30 seconds), e.g. until right-click on Nightly produces the "Quit" option, or until the beachball-of-death on the mac stops, then Firefox exits cleanly.
However, if I manage to get the Quit in early, either by right-clicking on Nightly just as the page loaded, or hit Cmd+Q sometime during the run, then it hangs every time for me in the same place:
#0 __psynch_mutexwait ()
#1 pthread_mutex_lock ()
#2 in PR_Lock ()
#3 mozilla::Mutex::Lock()@BlockingResourceBase.cpp:228
#4 mozilla::BaseAutoLock<mozilla::Mutex>::BaseAutoLock() @Mutex.h:153
#5 mozilla::BaseAutoLock<mozilla::Mutex>::BaseAutoLock() @Mutex.h:154
#6 CSF::CC_SIPCCService::removeCCObserver() @CC_SIPCCService.cpp:742
#7 CSF::CallControlManagerImpl::disconnect() @CallControlManagerImpl.cpp:234
#8 CSF::CallControlManagerImpl::destroy() @CallControlManagerImpl.cpp:57
#9 sipcc::PeerConnectionCtx::Cleanup() @PeerConnectionCtx.cpp:221
Assignee | ||
Updated•12 years ago
|
Whiteboard: [webrtc][blocking-webrtc+]
Comment 1•12 years ago
|
||
Ekr, can you take a look at this? It looks like this is fallout from your DISPATCH patch landing, and it's currently marked a blocker to release. Thanks.
Assignee: nobody → ekr
Assignee | ||
Comment 3•12 years ago
|
||
If you prefer a non-interactive reproduce case, see Bug 835851 (download attachments and run crashtest).
Comment 4•12 years ago
|
||
Can you provide a backtrace of all the threads? We may be able to guess who is holding this mutex.
Flags: needinfo?(jib)
Assignee | ||
Comment 5•12 years ago
|
||
Here's a backtrace of all threads from Bug 835851 (easiest steps): attachment 730791 [details].
Flags: needinfo?(jib)
Assignee | ||
Comment 6•12 years ago
|
||
The SIPCC thread does a sync dispatch to main in CC_SIPCCService::onCallEvent() while holding its m_lock.
The Main thread is busy shutting down in CC_SIPCCService::removeCCObserver where it waits on the m_lock.
Assignee | ||
Updated•12 years ago
|
Assignee: ekr → jib
Updated•12 years ago
|
Assignee | ||
Updated•12 years ago
|
Status: NEW → ASSIGNED
Assignee | ||
Comment 7•12 years ago
|
||
Update. Work in progress.
As discussed last week on IRC, our solution to the hang is:
(1) Fix PeerConnectionCtx::onCallEvent() to dispatch NORMAL (async) to main.
(2) remove unused aCall;
(3) make onCallEvent_m static; and
(4) pass in strdup(aCall->getPeerConnection()), effectively a weak ref to
the pc to onCallEvent_m, to handle running late in shutdown (pc gone).
Open issues:
- CC_*Ptr is not thread-safe so we must not manipulate the ref count on
multiple threads at once, the way it does here. Previously, SyncRunnable
enforced this. Now that we are async, this must be addressed. We can
either change C_*Ptr to threadsafe, or...
Since it seems like pc->onEvent just gets aInfo->callStateToString(event)
anyway, I can just duplicate the string and pass that in to avoid the
reference altogether.
- I think I can make Bug 835851 into a crashtest.
Comment 8•12 years ago
|
||
(In reply to Jan-Ivar Bruaroey [:jib] from comment #7)
> - CC_*Ptr is not thread-safe
Just as a minor correction here: as of Bug 848423, these pointers are thread-safe.
Assignee | ||
Comment 9•12 years ago
|
||
> Just as a minor correction here: as of Bug 848423, these pointers are thread-safe.
Good news.
Assignee | ||
Comment 10•12 years ago
|
||
The patch seems to work, but exposes another hang in cprJoinThread().
I may as well address that at the same time.
Current thinking on IRC is to have the GSM thread et al. use a new BreakableSyncRunnable derivative that takes a PRInt32 argument, so that ccUnload() can break through on shutdown.
Assignee | ||
Comment 11•12 years ago
|
||
Now fixes SIPCC join-hang on shutdown, in addition to earlier fix.
To recap, this patch changes two things:
1) PeerConnectionCtx::onCallEvent() is dispatched NORMAL (async) to main.
2) Safe joining of SyncRunnable-using SIPCC threads is now a feature of
thread_monitor.c
Try - https://tbpl.mozilla.org/?tree=Try&rev=220d2c4d600e
Attachment #732972 -
Attachment is obsolete: true
Attachment #734162 -
Flags: review?(rjesup)
Comment 12•12 years ago
|
||
Comment on attachment 734162 [details] [diff] [review]
Async onEvent to PC on main + safe join of SyncRunnable-using SIPCC threads.
Review of attachment 734162 [details] [diff] [review]:
-----------------------------------------------------------------
r+, but please add some comments about why it's done this way and the implications -- especially about spinning the event loop if we were called from JS code.
Attachment #734162 -
Flags: review?(rjesup) → review+
Assignee | ||
Comment 13•12 years ago
|
||
Comment-only update. Carrying forward r+ from Randell.
Attachment #734162 -
Attachment is obsolete: true
Attachment #734237 -
Flags: review+
Assignee | ||
Updated•12 years ago
|
Keywords: checkin-needed
Comment 14•12 years ago
|
||
Whiteboard: [webrtc][blocking-webrtc+] → [webrtc][blocking-webrtc+][webrtc-uplift]
Target Milestone: --- → mozilla23
Updated•12 years ago
|
Keywords: checkin-needed
Comment 15•12 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Updated•12 years ago
|
Flags: in-testsuite-
Comment 16•12 years ago
|
||
Verified on 4/8 with STR in comment 0 - not seeing a hang.
Status: RESOLVED → VERIFIED
Keywords: verifyme
Comment 17•12 years ago
|
||
Comment on attachment 734237 [details] [diff] [review]
Async onEvent to PC on main + safe join of SyncRunnable-using SIPCC threads. r=jesup
[Approval Request Comment]
Bug caused by (feature/regressing bug #): n/a
User impact if declined: Possible deadlocks during shutdown, especially when webrtc is overloaded
Testing completed (on m-c, etc.): on m-c for a while. Manual testing of reproducible hang.
Risk to taking this patch (and alternatives if risky): Low risk. SIPCC/signaling shutdown is currently only done at browser-quit time. It's unlikely even in a bug in this code would do more than leak objects or deadlock shutdown.
String or IDL/UUID changes made by this patch: none
Attachment #734237 -
Flags: approval-mozilla-aurora?
Comment 18•12 years ago
|
||
Comment on attachment 734237 [details] [diff] [review]
Async onEvent to PC on main + safe join of SyncRunnable-using SIPCC threads. r=jesup
low risk, well tested patch . Been on m-c for a few days and the hang is explicitly verified by QA .
Attachment #734237 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 19•12 years ago
|
||
status-firefox22:
--- → fixed
Whiteboard: [webrtc][blocking-webrtc+][webrtc-uplift] → [webrtc][blocking-webrtc+]
Target Milestone: mozilla23 → mozilla22
Comment 20•12 years ago
|
||
The m-c version where it landed is 23.
Target Milestone: mozilla22 → mozilla23
You need to log in
before you can comment on or make changes to this bug.
Description
•