Closed Bug 1517482 Opened 5 years ago Closed 5 years ago

Crash in shutdownhang | ntdll.dll | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::ShutdownObserver::Observe

Categories

(Core :: Storage: Quota Manager, defect)

Unspecified
Windows 10
defect
Not set
critical

Tracking

()

RESOLVED DUPLICATE of bug 1487194

People

(Reporter: marcia, Unassigned)

References

Details

(Keywords: crash, regression, topcrash)

Crash Data

[Tracking Requested - why for this release]: Need to keep on eye on this for 65.

This bug was filed from the Socorro interface and is
report bp-7716ccb5-f882-415e-bec2-8afc30190102.
=============================================================

Seen while looking at crash stats: https://bit.ly/2F3oLHE. This crash has been around in earlier releases back to at least 63 but there has been a marked movement up the crash stats list - it is now #3 top browser crash on 64.

Some URLs in the URL list seem to be prefaced by wyciwyg
* wyciwyg://0/http://www.weatherzone.com.au/sa/adelaide/prospect ://0/http://www.weatherzone.com.au/sa/adelaide/prospect 

Sadly there are no correlations showing on release. We should probably try to figure out what caused this signature to increase in 64.

Top 10 frames of crashing thread:

0 ntdll.dll ntdll.dll@0x9e294 
1 ntdll.dll ntdll.dll@0x25f28 
2 kernelbase.dll SleepConditionVariableSRW 
3 mozglue.dll mozilla::detail::ConditionVariableImpl::wait mozglue/misc/ConditionVariable_windows.cpp:58
4 xul.dll mozilla::ThreadEventQueue<mozilla::PrioritizedEventQueue<mozilla::EventQueue> >::GetEvent xpcom/threads/ThreadEventQueue.cpp:168
5 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1172
6 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:530
7 xul.dll static bool mozilla::SpinEventLoopUntil<mozilla::ProcessFailureBehavior::ReportToCaller, `lambda at z:/build/build/src/dom/quota/ActorsParent.cpp:2884:3'> xpcom/threads/nsThreadUtils.h:347
8 xul.dll mozilla::dom::quota::QuotaManager::ShutdownObserver::Observe dom/quota/ActorsParent.cpp:2884
9 xul.dll nsObserverService::NotifyObservers xpcom/ds/nsObserverService.cpp:295

=============================================================
Andrew, is this something under your purview these days?
Flags: needinfo?(bugmail)
Or Jan I'd wager :)
Flags: needinfo?(jvarga)
So the comment 0 crash signature is the main thread doing QuotaManager shutdown while the PBackground thread is QuotaManager waiting on the QuotaManager I/O thread to do an initialization sweep triggered by an attempt to open an IndexedDB database.

I wonder if this problem might be exacerbated by things like the newtab page and other chrome callers (remotesettings and friends) using IndexedDB more now, so if these are profiles with a busted QM profile, and since the I/O required to do an init sweep takes a while, it's possible for user actions related to initiating shutdown to end up in an I/O bound scenario that delays shutdown long enough to cause a problem.

In particular, I'd be concerned about chrome consumers just because content pages should be dead by this point in shutdown, but there are 2 DOM Workers in the parent process alive which are presumably chrome, plus there could be other JSM's also doing IDB stuff.

Which is mainly to say I think there are potentially 2 things to address here:
1. QM issues.  This is being addressed by https://wiki.mozilla.org/DOM/Workers-Storage/projects/WSIF which has a stack of patches actively in review, but we're also trying to take a conservative path here.  In the short term we might be able to attempt to mitigate this specific crasher by declaring that if QuotaManager hasn't been initialized by the time the "quit-application" observer notification is generated that QM should just insta-fail all requests to initialize.  Note that this would not prevent IDB database opens from happening in a healthy profile; those would continue to be allowed.  It's just in a broken profile where it's clear that the profile isn't magically going to get healthy during shutdown.
2. Maybe we need to make sure that we're cutting off attempts to open IndexedDB databases to chrome at an earlier stage than we are?  This crash is by definition beyond "profile-before-change", so there is a question of when the request was actually initiated.

Tom, thoughts on the first point?  (Also, the second point is certainly something to consider.)
Flags: needinfo?(shes050117)
Flags: needinfo?(jvarga)
Flags: needinfo?(bugmail)
(In reply to Andrew Sutherland [:asuth] from comment #3)
> So the comment 0 crash signature is the main thread doing QuotaManager
> shutdown while the PBackground thread is QuotaManager waiting on the
> QuotaManager I/O thread to do an initialization sweep triggered by an
> attempt to open an IndexedDB database.
> 
> I wonder if this problem might be exacerbated by things like the newtab page
> and other chrome callers (remotesettings and friends) using IndexedDB more
> now, so if these are profiles with a busted QM profile, and since the I/O
> required to do an init sweep takes a while, it's possible for user actions
> related to initiating shutdown to end up in an I/O bound scenario that
> delays shutdown long enough to cause a problem.
> 
> In particular, I'd be concerned about chrome consumers just because content
> pages should be dead by this point in shutdown, but there are 2 DOM Workers
> in the parent process alive which are presumably chrome, plus there could be
> other JSM's also doing IDB stuff.
> 
> Which is mainly to say I think there are potentially 2 things to address
> here:
> 1. QM issues.  This is being addressed by
> https://wiki.mozilla.org/DOM/Workers-Storage/projects/WSIF which has a stack
> of patches actively in review, but we're also trying to take a conservative
> path here.  In the short term we might be able to attempt to mitigate this
> specific crasher by declaring that if QuotaManager hasn't been initialized
> by the time the "quit-application" observer notification is generated that
> QM should just insta-fail all requests to initialize.  Note that this would
> not prevent IDB database opens from happening in a healthy profile; those
> would continue to be allowed.  It's just in a broken profile where it's
> clear that the profile isn't magically going to get healthy during shutdown.

This idea looks really good to me.

I also have a naive thought about initialization: Having milestones/checkpoints for sweeping so that we don't need to sweep or do the whole IOs for everything again, but I'm not sure whether it is really useful because it only helps for failure cases. Also, it may create some unexpected issues.

> 2. Maybe we need to make sure that we're cutting off attempts to open
> IndexedDB databases to chrome at an earlier stage than we are?  This crash
> is by definition beyond "profile-before-change", so there is a question of
> when the request was actually initiated.

Could that be during the sweeping? I checked the previous state before going to both Cache [1] and IndexedDB (OpenDatabase) [2]. Both of them check the cancel flag so that perhaps it's already earlier enough? However, it seems that we don't have a way to abort sweeping.

Also, not sure if it's possible to pass IsShuttingDown() in [3].

[1] https://searchfox.org/mozilla-central/rev/ecf61f8f3904549f5d65a8a511dbd7ea4fd1a51d/dom/cache/Context.cpp#241-244
[2] https://searchfox.org/mozilla-central/rev/ecf61f8f3904549f5d65a8a511dbd7ea4fd1a51d/dom/indexedDB/ActorsParent.cpp#18491-18495
[3] https://searchfox.org/mozilla-central/rev/ecf61f8f3904549f5d65a8a511dbd7ea4fd1a51d/dom/quota/ActorsParent.cpp#3841
Flags: needinfo?(shes050117)

these crashes are looking like a likely signature shift from bug 1487194

See Also: → 1487194

I'm going to mirror the 64 flag from bug 1487194 here.

Can we dupe this over to bug 1487194?

Flags: needinfo?(mozillamarcia.knous)

Duping to Bug 1487194 based on Comment 5 and triage discussion. Seems to be a signature change and that bug has an assignee.

Status: NEW → RESOLVED
Closed: 5 years ago
Flags: needinfo?(mozillamarcia.knous)
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.