Closed Bug 855498 Opened 12 years ago Closed 12 years ago

Hang in CC_SIPCCService::mLock on shutdown if Ctrl/Cmd+Q hit during long crashtests/822197.html

Tracking

()

Status:

VERIFIED FIXED

Milestone:

mozilla23

Tracking Flags:

Tracking

Status

firefox22

---

fixed

People

(Reporter: jib, Assigned: jib)

References

(
URL
)

Details

(Keywords: hang, Whiteboard: [webrtc][blocking-webrtc+])

Crash Data

Attachments

(4 files, 2 obsolete files)

Log output + callstack 12 years ago Jan-Ivar Bruaroey [:jib] (needinfo? me) 245.05 KB, text/plain		Details
This output shows the deadlock. 12 years ago Jan-Ivar Bruaroey [:jib] (needinfo? me) 7.22 KB, text/plain		Details
Async onEvent dispatch to PC on main. WIP. 12 years ago Jan-Ivar Bruaroey [:jib] (needinfo? me) 6.55 KB, patch		Details \| Diff \| Splinter Review
Progress, but now callstacks showing new hang in cprJoinThread() 12 years ago Jan-Ivar Bruaroey [:jib] (needinfo? me) 5.31 KB, text/plain		Details
Async onEvent to PC on main + safe join of SyncRunnable-using SIPCC threads. 12 years ago Jan-Ivar Bruaroey [:jib] (needinfo? me) 17.59 KB, patch	jesup : review+	Details \| Diff \| Splinter Review
Async onEvent to PC on main + safe join of SyncRunnable-using SIPCC threads. r=jesup 12 years ago Jan-Ivar Bruaroey [:jib] (needinfo? me) 18.85 KB, patch	jib : review+ bajaj : approval-mozilla-aurora+	Details \| Diff \| Splinter Review

Jan-Ivar Bruaroey [:jib] (needinfo? me)

Assignee

Description

•

12 years ago

Attached file Log output + callstack — Details

Reproduce steps: 1. Open URL 2. Quit the browser using Ctrl+Q (Cmd+Q on Mac) within 10 seconds or so. These are the same reproduce steps as Bug 842749, except that now that the JS re-entrancy has been fixed, the browser is ironically much less responsive during crashtests/822197.html. This is presumably "more correct" since this crashtest contains JS creating 70 PeerConnections in a for-loop. E.g. you have to work hard on a Mac now to right-click the Nightly icon in the taskbar while this is running and see anything other than "Force Quit". If I wait it out (takes about 30 seconds), e.g. until right-click on Nightly produces the "Quit" option, or until the beachball-of-death on the mac stops, then Firefox exits cleanly. However, if I manage to get the Quit in early, either by right-clicking on Nightly just as the page loaded, or hit Cmd+Q sometime during the run, then it hangs every time for me in the same place: #0 __psynch_mutexwait () #1 pthread_mutex_lock () #2 in PR_Lock () #3 mozilla::Mutex::Lock()@BlockingResourceBase.cpp:228 #4 mozilla::BaseAutoLock<mozilla::Mutex>::BaseAutoLock() @Mutex.h:153 #5 mozilla::BaseAutoLock<mozilla::Mutex>::BaseAutoLock() @Mutex.h:154 #6 CSF::CC_SIPCCService::removeCCObserver() @CC_SIPCCService.cpp:742 #7 CSF::CallControlManagerImpl::disconnect() @CallControlManagerImpl.cpp:234 #8 CSF::CallControlManagerImpl::destroy() @CallControlManagerImpl.cpp:57 #9 sipcc::PeerConnectionCtx::Cleanup() @PeerConnectionCtx.cpp:221

Jan-Ivar Bruaroey [:jib] (needinfo? me)

Assignee

Updated

•

12 years ago

Depends on: 842749

Jan-Ivar Bruaroey [:jib] (needinfo? me)

Assignee

Updated

•

12 years ago

Whiteboard: [webrtc][blocking-webrtc+]

Maire Reavy [:mreavy]

Comment 1

•

12 years ago

Ekr, can you take a look at this? It looks like this is fallout from your DISPATCH patch landing, and it's currently marked a blocker to release. Thanks.

Assignee: nobody → ekr

Jan-Ivar Bruaroey [:jib] (needinfo? me)

Assignee

Comment 3

•

12 years ago

If you prefer a non-interactive reproduce case, see Bug 835851 (download attachments and run crashtest).

Eric Rescorla (:ekr)

Comment 4

•

12 years ago

Can you provide a backtrace of all the threads? We may be able to guess who is holding this mutex.

Flags: needinfo?(jib)

Jan-Ivar Bruaroey [:jib] (needinfo? me)

Assignee

Comment 5

•

12 years ago

Here's a backtrace of all threads from Bug 835851 (easiest steps): attachment 730791 [details].

Flags: needinfo?(jib)

Jan-Ivar Bruaroey [:jib] (needinfo? me)

Assignee

Comment 6

•

12 years ago

Attached file This output shows the deadlock. — Details

The SIPCC thread does a sync dispatch to main in CC_SIPCCService::onCallEvent() while holding its m_lock. The Main thread is busy shutting down in CC_SIPCCService::removeCCObserver where it waits on the m_lock.

Jan-Ivar Bruaroey [:jib] (needinfo? me)

Assignee

Updated

•

12 years ago

Assignee: ekr → jib

Scoobidiver (away)

Updated

•

12 years ago

Severity: normal → critical

Crash Signature: [@ mozilla::Mutex::Lock()]

Keywords: hang

Jan-Ivar Bruaroey [:jib] (needinfo? me)

Assignee

Updated

•

12 years ago

Status: NEW → ASSIGNED

Jan-Ivar Bruaroey [:jib] (needinfo? me)

Assignee

Comment 7

•

12 years ago

Attached patch Async onEvent dispatch to PC on main. WIP. (obsolete) — Details — Splinter Review

Update. Work in progress. As discussed last week on IRC, our solution to the hang is: (1) Fix PeerConnectionCtx::onCallEvent() to dispatch NORMAL (async) to main. (2) remove unused aCall; (3) make onCallEvent_m static; and (4) pass in strdup(aCall->getPeerConnection()), effectively a weak ref to the pc to onCallEvent_m, to handle running late in shutdown (pc gone). Open issues: - CC_*Ptr is not thread-safe so we must not manipulate the ref count on multiple threads at once, the way it does here. Previously, SyncRunnable enforced this. Now that we are async, this must be addressed. We can either change C_*Ptr to threadsafe, or... Since it seems like pc->onEvent just gets aInfo->callStateToString(event) anyway, I can just duplicate the string and pass that in to avoid the reference altogether. - I think I can make Bug 835851 into a crashtest.

Adam Roach [:abr]

Comment 8

•

12 years ago

(In reply to Jan-Ivar Bruaroey [:jib] from comment #7) > - CC_*Ptr is not thread-safe Just as a minor correction here: as of Bug 848423, these pointers are thread-safe.

Jan-Ivar Bruaroey [:jib] (needinfo? me)

Assignee

Comment 9

•

12 years ago

> Just as a minor correction here: as of Bug 848423, these pointers are thread-safe. Good news.

Jan-Ivar Bruaroey [:jib] (needinfo? me)

Assignee

Comment 10

•

12 years ago

Attached file Progress, but now callstacks showing new hang in cprJoinThread() — Details

The patch seems to work, but exposes another hang in cprJoinThread(). I may as well address that at the same time. Current thinking on IRC is to have the GSM thread et al. use a new BreakableSyncRunnable derivative that takes a PRInt32 argument, so that ccUnload() can break through on shutdown.

Jan-Ivar Bruaroey [:jib] (needinfo? me)

Assignee

Comment 11

•

12 years ago

Attached patch Async onEvent to PC on main + safe join of SyncRunnable-using SIPCC threads. (obsolete) — Details — Splinter Review

Now fixes SIPCC join-hang on shutdown, in addition to earlier fix. To recap, this patch changes two things: 1) PeerConnectionCtx::onCallEvent() is dispatched NORMAL (async) to main. 2) Safe joining of SyncRunnable-using SIPCC threads is now a feature of thread_monitor.c Try - https://tbpl.mozilla.org/?tree=Try&rev=220d2c4d600e

Attachment #732972 - Attachment is obsolete: true

Attachment #734162 - Flags: review?(rjesup)

Randell Jesup [:jesup] (needinfo me)

Comment 12

•

12 years ago

Comment on attachment 734162 [details] [diff] [review] Async onEvent to PC on main + safe join of SyncRunnable-using SIPCC threads. Review of attachment 734162 [details] [diff] [review]: ----------------------------------------------------------------- r+, but please add some comments about why it's done this way and the implications -- especially about spinning the event loop if we were called from JS code.

Attachment #734162 - Flags: review?(rjesup) → review+

Jan-Ivar Bruaroey [:jib] (needinfo? me)

Assignee

Comment 13

•

12 years ago

Attached patch Async onEvent to PC on main + safe join of SyncRunnable-using SIPCC threads. r=jesup — Details — Splinter Review

Comment-only update. Carrying forward r+ from Randell.

Attachment #734162 - Attachment is obsolete: true

Attachment #734237 - Flags: review+

Jan-Ivar Bruaroey [:jib] (needinfo? me)

Assignee

Updated

•

12 years ago

Keywords: checkin-needed

Randell Jesup [:jesup] (needinfo me)

Comment 14

•

12 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/332061725181

Whiteboard: [webrtc][blocking-webrtc+] → [webrtc][blocking-webrtc+][webrtc-uplift]

Target Milestone: --- → mozilla23

Randell Jesup [:jesup] (needinfo me)

Updated

•

12 years ago

Keywords: checkin-needed

Phil Ringnalda (:philor)

Comment 15

•

12 years ago

https://hg.mozilla.org/mozilla-central/rev/332061725181

Status: ASSIGNED → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

Jason Smith [:jsmith]

Updated

•

12 years ago

Flags: in-testsuite-

Jason Smith [:jsmith]

Updated

•

12 years ago

Keywords: verifyme

Jason Smith [:jsmith]

Comment 16

•

12 years ago

Verified on 4/8 with STR in comment 0 - not seeing a hang.

Status: RESOLVED → VERIFIED

Keywords: verifyme

Randell Jesup [:jesup] (needinfo me)

Comment 17

•

12 years ago

Comment on attachment 734237 [details] [diff] [review] Async onEvent to PC on main + safe join of SyncRunnable-using SIPCC threads. r=jesup [Approval Request Comment] Bug caused by (feature/regressing bug #): n/a User impact if declined: Possible deadlocks during shutdown, especially when webrtc is overloaded Testing completed (on m-c, etc.): on m-c for a while. Manual testing of reproducible hang. Risk to taking this patch (and alternatives if risky): Low risk. SIPCC/signaling shutdown is currently only done at browser-quit time. It's unlikely even in a bug in this code would do more than leak objects or deadlock shutdown. String or IDL/UUID changes made by this patch: none

Attachment #734237 - Flags: approval-mozilla-aurora?

bhavana bajaj [:bajaj]

Comment 18

•

12 years ago

Comment on attachment 734237 [details] [diff] [review] Async onEvent to PC on main + safe join of SyncRunnable-using SIPCC threads. r=jesup low risk, well tested patch . Been on m-c for a few days and the hang is explicitly verified by QA .

Attachment #734237 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Randell Jesup [:jesup] (needinfo me)

Comment 19

•

12 years ago

https://hg.mozilla.org/releases/mozilla-aurora/rev/2534a95ecee7

status-firefox22: --- → fixed

Whiteboard: [webrtc][blocking-webrtc+][webrtc-uplift] → [webrtc][blocking-webrtc+]

Target Milestone: mozilla23 → mozilla22

Scoobidiver (away)

Comment 20

•

12 years ago

The m-c version where it landed is 23.

Target Milestone: mozilla22 → mozilla23

You need to log in before you can comment on or make changes to this bug.