Closed Bug 1407415 Opened 2 years ago Closed 5 months ago

CamerasParent::StopVideoCapture() should try and avoid blocking the IPDL Background ("PBackground") thread

Categories

(Core :: WebRTC: Audio/Video, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla68
Tracking Status
firefox-esr60 --- wontfix
firefox65 --- wontfix
firefox66 --- wontfix
firefox67 + wontfix
firefox68 --- fixed

People

(Reporter: asuth, Assigned: jib)

References

(Regressed 1 open bug)

Details

Crash Data

Attachments

(3 files)

In bug 1403692 we're seeing a shutdown hang due to "profile-before-change" being unable to advance due to a content-process PBackground-actor shutdown resulting in CamerasParent::StopVideoCapture() blocking the background thread while waiting on a monitor.  (This prevents ServiceWorkerRegistrar's shutdown from advancing because its PBackground message does not get serviced in a timely fashion.)

The rationale in that method for waiting on a monitor in that method is:
  // Hold here until the WebRTC thread is gone. We need to dispatch
  // the thread deletion *now*, or there will be no more possibility
  // to get to the main thread.
in relation to the comment from above:
  // We are called from the main thread (xpcom-shutdown) or
  // from PBackground (when the Actor shuts down).
  // Shut down the WebRTC stack (on the capture thread)

However, I don't think the claim holds up.  Specifically, ThreadEventTarget::Dispatch explicitly allows dispatch to the main-thread after "xpcom-shutdown-threads"[1].  If there's concern about dispatching between other threads, that's legit, but the right thing to do in that case is to adopt use of nsIAsyncShutdown[2].  Additional shutdown phases can be added if the only alternative is blocking a monitor or spinning a nested event loop that's not the one async shutdown already spins.

1: https://searchfox.org/mozilla-central/rev/b53e29293c9e9a2905f4849f4e3c415e2013f0cb/xpcom/threads/ThreadEventTarget.cpp#135
2: https://searchfox.org/mozilla-central/source/toolkit/components/asyncshutdown/nsIAsyncShutdown.idl
@jib can you prioritize this please.
Component: WebRTC → WebRTC: Audio/Video
Flags: needinfo?(jib)
Gian-Carlo, can you comment on the analysis here and in bug 1403692 comment 5?

Andrew, the code in comment 0 landed in 44 (bug 1209987) yet bug 1403692 says FF56 is unaffected. Smells like a regression, yet I don't think much has changed in CamerasParent. What changed?
Rank: 17
Flags: needinfo?(jib) → needinfo?(gpascutto)
Priority: -- → P2
The Part 2 patch from bug 1399646 that landed in Firefox 57 changed ServiceWorkerRegistrar to use nsIAsyncShutdown.  nsIAsyncShutdown triggers crashes on "long" waits (I think previously 1sec, now 8sec).  Previously ServiceWorkerRegistrar would spin its own nested event loop which would not trigger crashes on its own.  Instead, the general shutdown hang timer which has a much longer duration would be the one, if any, to trigger a crash.
(In reply to Jan-Ivar Bruaroey [:jib] (on PTO Oct 11) from comment #2)
> Gian-Carlo, can you comment on the analysis here and in bug 1403692 comment
> 5?

Anything I understand (understood!) about this problem is documented in the comments or the implementation bugs. Given that the argument here is that the comments are wrong, I don't see what I could sensibly add.

Bug 1209987, specifically comment 6 and comment 20 discuss the different implementation strategies that were considered in order to get the thread dance right (or I should say "not obviously wrong"), and comment 20 deals with the issue that lead to this bug.

(In reply to Andrew Sutherland [:asuth] from comment #0)
> However, I don't think the claim holds up.  Specifically,
> ThreadEventTarget::Dispatch explicitly allows dispatch to the main-thread
> after "xpcom-shutdown-threads"[1]. 
> 1:
> https://searchfox.org/mozilla-central/rev/
> b53e29293c9e9a2905f4849f4e3c415e2013f0cb/xpcom/threads/ThreadEventTarget.
> cpp#135

Maybe I'm still a bit drowsy in the early morning but isn't that code doing the exact opposite as what you say?

if (gXPCOMThreadsShutDown && !mIsMainThread) {
    return NS_ERROR_ILLEGAL_DURING_SHUTDOWN;
}
Flags: needinfo?(gpascutto)
Correction about nsAsyncShutdown's crash timeout duration.  I got tricked by a diff, currently the crash delay is 1 minute as controlled by the "toolkit.asyncshutdown.crash_timeout" (unit=ms) pref, see https://searchfox.org/mozilla-central/search?q=toolkit.asyncshutdown.crash_timeout.

(In reply to Gian-Carlo Pascutto [:gcp] from comment #4)
> Bug 1209987, specifically comment 6 and comment 20 discuss the different
> implementation strategies that were considered in order to get the thread
> dance right (or I should say "not obviously wrong"), and comment 20 deals
> with the issue that lead to this bug.

Thank you for the context; I love comments like https://bugzilla.mozilla.org/show_bug.cgi?id=1209987#c20 that explain the design rationale!

Based on the discussion there, it seems feasible to move the shutdown observer from "xpcom-shutdown" to an nsIAsyncShutdown "profile-before-change" blocker.  This provides a strong guarantee that all threads should still be available for the duration of the shutdown.  I suggest "profile-before-change" because this is when many other things shutdown, and content should be dead/mooted by this point, so it's not clear what benefit there is to keeping WebRTC alive beyond that point.

The obvious disclaimer is that the implication of this bug is that WebRTC shutdown may sometimes take a very long time.  The upside to moving to async shutdown is that it allows us to further parallelize the shutdown process, but if a single WebRTC shutdown can take >1 minute, there's still a problem.  It's worth noting that in bug 1403692 I did notice multiple "VideoCaptureThread" instances in at least some cases, so perhaps it's possible that WebRTC shutdown itself could benefit from the parallelization.  However, it still might be necessary to further enhance shutdown, depending on why the video capture thread shutdown is taking so long.  (Maybe resources should just be leaked at some point, relying on process termination to free them?)
 
> Maybe I'm still a bit drowsy in the early morning but isn't that code doing
> the exact opposite as what you say?
> 
> if (gXPCOMThreadsShutDown && !mIsMainThread) {
>     return NS_ERROR_ILLEGAL_DURING_SHUTDOWN;
> }

The Dispatch() method is invoked on the thread we want to run the runnable, so mIsMainThread==true for the main thread.  Following the completion of the "xpcom-shutdown-threads" observer notification when gXPCOMThreadsShutDown is set to true, this results in `if (true && !true)` which simplifies to `if (false)`, so the error will not be thrown and dispatch will succeed.

However, if an async shutdown blocker is used like proposed above, this ceases to be a problem because shutdown can't advance before "profile-before-change" until the blocker is released.
(In reply to Andrew Sutherland [:asuth] from comment #5)
> shutdown can't advance before "profile-before-change"
s/before/beyond/
Who's submitting patches?

Jib could I ask you to take a look at creating a patch for this (because it's blocking the long standing bug 1403692)?

Flags: needinfo?(jib)

Tracking this since it seems to be the cause for a long standing top crash.

Assignee: nobody → jib

Putting it on my plate.

Flags: needinfo?(jib)

Any luck or can we defer this to 67?

Flags: needinfo?(jib)

67 is probably a good idea at this point. Sorry for sitting on this for so long.

Flags: needinfo?(jib)
Duplicate of this bug: 1403692
Crash Signature: [@ AsyncShutdownTimeout | profile-before-change | ServiceWorkerRegistrar: Flushing data]

(In reply to Jan-Ivar Bruaroey [:jib] (needinfo? me) from comment #12)

67 is probably a good idea at this point. Sorry for sitting on this for so long.

Any news on this one?

Flags: needinfo?(jib)

Odd, this compiles locally. Will fix.

CamerasParent.h:73:1: error: Refcounted record 'CamerasParent' has multiple mRefCnt members
Flags: needinfo?(jib)
Pushed by jbruaroey@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5edef21b353c
Add a using statement to reduce word wrapping. r=dminor
https://hg.mozilla.org/integration/autoland/rev/5dbc2ba75388
Consolidate media::GetShutdownBarrier() use and return a RefPtr r=dminor
https://hg.mozilla.org/integration/autoland/rev/bd790e80a2f7
Add a CamerasParent shutdown blocker. r=dminor,asuth
Regressions: 1549383

The fix for this crash seems too heavy for me for an uplift in RC week and we won't have enough volume and time on Nightly to know if it is effective, marking as wontfix for 67.

Regressions: 1552755
Regressions: 1576335
You need to log in before you can comment on or make changes to this bug.