Closed Bug 1403692 Opened 7 years ago Closed 5 years ago

ShutdownHang with CamerasParent::StopVideoCapture() blocking PBackground thread manifests as [Crash in AsyncShutdownTimeout | profile-before-change | ServiceWorkerRegistrar: Flushing data]

Categories

(Core :: DOM: Service Workers, defect, P2)

57 Branch
defect

Tracking

()

RESOLVED DUPLICATE of bug 1407415
Tracking Status
firefox-esr52 --- unaffected
firefox-esr60 --- wontfix
firefox55 --- unaffected
firefox56 --- unaffected
firefox57 --- wontfix
firefox58 --- wontfix
firefox59 --- wontfix
firefox60 --- wontfix
firefox61 --- wontfix
firefox62 --- wontfix
firefox63 --- wontfix
firefox64 --- wontfix
firefox65 --- wontfix
firefox66 - wontfix
firefox67 --- wontfix
firefox68 --- fixed

People

(Reporter: philipp, Assigned: asuth)

References

Details

(Keywords: crash, regression)

Crash Data

This bug was filed from the Socorro interface and is 
report bp-931d961e-72eb-46b6-90a4-809dd0170926.
=============================================================
###!!! ABORT: file /builds/worker/workspace/build/src/dom/workers/ServiceWorkerRegistrar.cpp, line 1027

these async shutdown timeout reports are starting to show up in firefox 57 across platforms. i haven't spotted the signature in 58 though...
I don't think this is caused by bug 1399646. I was already running into ServiceWorkerRegistrar shutdown issues before that bug, which is why I made those changes. Bug 1399646 just changes the failure mode and gives us better diagnostics.
My interpretation of comment 1 is that this is a signature change, instead of a new regression. However, it's still good to see if we know what happened according to the information newly added in bug 1399646. Any thoughts, Ben?
Flags: needinfo?(bkelly)
Andrew, have you looked at this code recently?
Flags: needinfo?(bkelly) → needinfo?(bugmail)
(In reply to Ben Kelly [:bkelly] from comment #3)
> Andrew, have you looked at this code recently?

Yes, in bug 1402215 and previously I've ended up delving into AsyncShutdown a bit.  Taking, leaving needinfo for investigation response.
Assignee: nobody → bugmail
Status: NEW → ASSIGNED
Priority: -- → P2
It looks like the problem is a race involving content-process PBackground channel teardown and normal browser shutdown, with mozilla::camera::CamerasParent::StopVideoCapture() wedging the PBackground thread by blocking on a monitor.

Every report I've randomly sampled shows that the "IPDL Background" thread is stuck in `mozilla::camera::CamerasParent::StopVideoCapture()` on its mThreadMonitor.Wait() call.  This monitor in turn seems to be waiting on a VideoCaptureThread to shutdown.  This shutdown is triggered by PBackgroundParent::DestroySubtree() via MessageChannel::OnNotifyMaybeChannelError() thanks to someone/something calling MessageChannel::PostErrorNotifyTask().  Unfortunately because of the IsOnCxxStack() re-entrancy guard that uses a 10ms delay, we miss out on seeing some context on the source of the error.

We do, however, know that this isn't PBackground self-shutdown because PBackground only shuts itself down on "xpcom-shutdown-threads", which cannot have happened yet, because we're still in "profile-before-change".  Also, under e10s, the PCameras request would have come from a content process, not the main process.

So basically what happens is:
- ServiceWorkerRegistrar::ProfileStopped() gets invoked.
- We end up in the `!!child` path and we invoke BackgroundChild*::SendShutdownServiceWorkerRegistrar.
- BackgroundParentImpl::RecvShutdownServiceWorkerRegistrar never gets a chance to run because PBackground is janked hard by capture.

I'm not sure there's a clean mitigation that ServiceWorkerRegistrar can perform.  I'm going to file a new bug about CamerasParent::StopVideoCapture() blocking PBackground for content-child termination that blocks this one.  The method's comments hand-wave the monitor since the single code-path is used for both xpcom-shutdown and content child PBackground shutdown.  It also posits how "there will be no more possibility to get to the main thread", but I don't think that checks out given that you can always dispatch to the main thread because of deadlock concerns.  (The various thread methods are all like "oh, this is the main thread? yeah, that's cool.")
Summary: Crash in AsyncShutdownTimeout | profile-before-change | ServiceWorkerRegistrar: Flushing data → ShutdownHang with CamerasParent::StopVideoCapture() blocking PBackground thread manifests as [Crash in AsyncShutdownTimeout | profile-before-change | ServiceWorkerRegistrar: Flushing data
Flags: needinfo?(bugmail)
Summary: ShutdownHang with CamerasParent::StopVideoCapture() blocking PBackground thread manifests as [Crash in AsyncShutdownTimeout | profile-before-change | ServiceWorkerRegistrar: Flushing data → ShutdownHang with CamerasParent::StopVideoCapture() blocking PBackground thread manifests as [Crash in AsyncShutdownTimeout | profile-before-change | ServiceWorkerRegistrar: Flushing data]
Depends on: 1407415
I can hit shutdownhang easily (near 100% reproduced rate) by opening many Indexeddb database, when I debug Bug 1411908 and Bug 1427238.

STR:
1. Open a tab; Go to weather.com
2. Load another web page which opens a lot of indexeddb databases. Like: https://shawnjohnjr.github.io/persist-demo/indexedbtest.html
3. Close browser.
4. Wait for shutdownhang.
I have been hitting that for a few days (roughly a week ?) on each shutdown to apply updates.
Too late for a fix in 63. We can still take a patch in 65, and potentially in 64 beta.
This doesn't look as high volume in 64 release as it was in 63. 
It's still happening in 65 beta, though - I was able to reproduce it with the steps from comment 6: https://crash-stats.mozilla.com/report/index/b76b1322-4333-415d-8651-5fa8a0181220

Tracking this in bug 1407415 instead. Maybe we could dupe this to that bug.

Andrew, are you actively working on this for 67?

Flags: needinfo?(bugmail)

No, just going to dupe this over to bug 1407415 as proposed in comment 10.

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Flags: needinfo?(bugmail)
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.