Closed Bug 1487194 Opened 6 years ago Closed 5 years ago

Crash in shutdownhang | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::ShutdownObserver::Observe

Categories

(Core :: Storage: Quota Manager, defect, P1)

defect

Tracking

()

RESOLVED WONTFIX
Tracking Status
firefox-esr52 --- wontfix
firefox-esr60 --- wontfix
firefox61 --- wontfix
firefox62 --- wontfix
firefox63 --- wontfix
firefox64 --- wontfix
firefox65 --- wontfix
firefox66 --- wontfix
firefox67 --- wontfix
firefox68 - wontfix

People

(Reporter: marcia, Assigned: janv)

References

Details

(Keywords: crash, regression, Whiteboard: DWS_NEXT)

Crash Data

This bug was filed from the Socorro interface and is
report bp-e8b77a57-9fb8-4c81-9f8b-7222d0180829.
=============================================================

I reprocessed a few crashes after we got symbols back (See Bug 1483909), and it appears the first 2 I reprocessed are signatures that are still happening even after the fix for Bug 1411908. 

Top 10 frames of crashing thread:

0 ntdll.dll NtWaitForAlertByThreadId 
1 ntdll.dll RtlSleepConditionVariableSRW 
2 kernelbase.dll SleepConditionVariableSRW 
3 mozglue.dll mozilla::detail::ConditionVariableImpl::wait mozglue/misc/ConditionVariable_windows.cpp:58
4 xul.dll mozilla::ThreadEventQueue<mozilla::PrioritizedEventQueue<mozilla::EventQueue> >::GetEvent xpcom/threads/ThreadEventQueue.cpp:168
5 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1099
6 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:519
7 xul.dll static bool mozilla::SpinEventLoopUntil<mozilla::ProcessFailureBehavior::ReportToCaller, `lambda at z:/build/build/src/dom/quota/ActorsParent.cpp:2862:3'> xpcom/threads/nsThreadUtils.h:324
8 xul.dll mozilla::dom::quota::QuotaManager::ShutdownObserver::Observe dom/quota/ActorsParent.cpp:2862
9 xul.dll nsObserverService::NotifyObservers xpcom/ds/nsObserverService.cpp:295

=============================================================
Flags: needinfo?(jvarga)
Priority: -- → P1
Adding a signature that was present when we didn't have symbols.
Crash Signature: [@ shutdownhang | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::ShutdownObserver::Observe] [@ shutdownhang | mozilla::dom::quota::QuotaManager::ShutdownObserver::Observe] → [@ shutdownhang | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::ShutdownObserver::Observe] [@ shutdownhang | mozilla::dom::quota::QuotaManager::ShutdownObserver::Observe] [@ shutdownhang | ntdll.dll@0x9221a]
Adding the additional signatures carried over from the other bug that are still happening. Note that [@ shutdownhang | static bool mozilla::SpinEventLoopUntil<T>] signature did go away, so that was not added to this bug.

Most of the signatures are Win only, with the exception of [@ mozilla::dom::workerinternals::RuntimeService::CrashIfHanging] which is cross platform.
Crash Signature: [@ shutdownhang | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::ShutdownObserver::Observe] [@ shutdownhang | mozilla::dom::quota::QuotaManager::ShutdownObserver::Observe] [@ shutdownhang | ntdll.dll@0x9221a] → [@ shutdownhang | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::ShutdownObserver::Observe] [@ shutdownhang | static bool mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::ShutdownObserver::Observe] [@ shutdownhang…
OS: Windows 10 → All
Hardware: Unspecified → All
QA Whiteboard: DWS_NEXT
QA Whiteboard: DWS_NEXT
Whiteboard: DWS_NEXT
Note to myself, fixing bug 1252409 may help here.
Assignee: nobody → jvarga
Implementation of AbortOperationsForProcess by all quota clients may help too.
Priority: P1 → P2
Flags: needinfo?(jvarga)
This is currently #6 overall top crash on Nightly 65.
I'll take a look again after I fix other critical bugs.
Flags: needinfo?(jvarga)
Won't fixing for 63/64 per the triage team.
I've been hanging and shutdown crashing on my Windows machine with this signature for a few weeks now, which is making my Nightly restarts rather painful.

Can I assume, based on the previous conversation, that this is not something that is affecting most other Nightly users? If so, is there something I can do to my profile to avoid this shutdown hang?
I don't know, maybe you have "Clear history when Firefox closes" turned on, so it tries to clear everything which is managed by quota manager before shutdown.

:ytausky may look into this bug to find more data
> maybe you have "Clear history when Firefox closes" turned on

I don't have this set - I keep my history from session to session.
See Also: → 1517482

(In reply to Mike Conley (:mconley) (:⚙️) from comment #9)

I've been hanging and shutdown crashing on my Windows machine with this
signature for a few weeks now, which is making my Nightly restarts rather
painful.

Can I assume, based on the previous conversation, that this is not something
that is affecting most other Nightly users? If so, is there something I can
do to my profile to avoid this shutdown hang?

Tom, you've been looking at these sorts of issues - can you take a look here, too? Thankfully Mike is able to let us run any diagnostics on his profile if necessary.

Flags: needinfo?(jvarga) → needinfo?(shes050117)

(In reply to Andrew Overholt [:overholt] from comment #12)

Tom, you've been looking at these sorts of issues - can you take a look here, too? Thankfully Mike is able to let us run any diagnostics on his profile if necessary.

Sure, I am going to check if our hypothesis of this issue is correct or there might be another issue involving.


Mike, before moving a bit forward, I want to get more information from your profile. Could you test your profile by running [1]? Thanks! This is a website to test whether your QuotaManager related storage APIs work or not. If you see "Good: Totally Working. (fullyOperational)" for all the APIs, then it means your QuotaManager can complete the storage initialization.

If your result of that is positive, could you tell me how big is your QM_REPOSITORIES_INITIALIZATION_TIME on "about:telemetry" page? It's a probe to check how many time does your profile take to be initialized by the QuotaManager.
Note, it seems that this probe only can be shown on about:telemetry for a short while, so if you cannot find it on that page, I guess you probably need to restart the Firefox and initializing your profile by [1] and see whether the result shows up.

[1] https://firefox-storage-test.glitch.me/

Flags: needinfo?(shes050117) → needinfo?(mconley)

Hi ttung,

I'm afraid this stopped reproducing a few days ago. I no longer crash on shutdown! That's good for me, but less good I think for diagnosing this bug. :/

Just in case it's still useful, I ran https://firefox-storage-test.glitch.me/, everything came out fullyOperational, and QM_REPOSITORIES_INITIALIZATION_TIME was 824.

Flags: needinfo?(mconley)
Crash Signature: shutdownhang | mozilla::dom::quota::QuotaManager::ShutdownObserver::Observe] [@ mozilla::dom::workerinternals::RuntimeService::CrashIfHanging] [@ shutdownhang | ntdll.dll@0x9221a] → shutdownhang | mozilla::dom::quota::QuotaManager::ShutdownObserver::Observe] [@ mozilla::dom::workerinternals::RuntimeService::CrashIfHanging] [@ shutdownhang | ntdll.dll@0x9221a] [@ shutdownhang | ntdll.dll | kernel32.dll | mozilla::SpinEventLoopUnti…

(In reply to Mike Conley (:mconley) (:⚙️) from comment #14)

I'm afraid this stopped reproducing a few days ago. I no longer crash on shutdown! That's good for me, but less good I think for diagnosing this bug. :/

That's true, but it's a good new that it's recoverable

Just in case it's still useful, I ran https://firefox-storage-test.glitch.me/, everything came out fullyOperational, and QM_REPOSITORIES_INITIALIZATION_TIME was 824.

Have you tried to clean up your profile/storage after hitting this issue? Also, do you happen to keep the crash id?

From the crash reports, there seem to have more than one symptoms. It would be useful to know that even having an initialization time around 800ms, users might still run into this crash.

(In reply to Tom Tung [:tt, :ttung] from comment #16)

(In reply to Mike Conley (:mconley) (:⚙️) from comment #14)
Oops, I forgot to say: Thanks for your information! :/

Crash Signature: mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::ShutdownObserver::Observe] → mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::ShutdownObserver::Observe] [@ shutdownhang | ntdll.dll | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::ShutdownObserver::Observe]

After checking 10 recent Nightly reports, I found:

Some of their QuotaManager IO threads are initializing: [1] [4] [5] [6] [7]
All their PBackground threads (IPDL Background) is waiting in IndexedDB's SpinEventLoopUntil() [11] [12]

For the five initializing cases ([1] [4] [5] [6] [7]), reducing the frequency for failure initialization might help because their bottleneck seems to be QuotaManager IO thread. (SpinEventLoopUntil in indexedDB is waiting for gFactory and gLiveDatabaseHashTable empty which should be cleaned after finishing the operation for opening a database)

However, if some of them have unbroken profiles, I guess having a mechanism to abort the initialization should be considered.

I'm thinking whether can we extend the timeout for the watchdog. Right now, we have 30s to shut down and [13] shows that the 95th shortest initialization time is in the 20s~25s for Release users. Also, the initialization time of my current daily used profile is around 8s and it increases to 15s after turning on the pref for the local storage. I guess the current timeout might be too short for users who have huge profiles.

[1] https://crash-stats.mozilla.com/report/index/3d12cd4e-69db-4f78-940f-d00c00190121#allthreads
[2] https://crash-stats.mozilla.com/report/index/af08cb07-f9d0-43ac-aac6-fdf310190120#allthreads
[3] https://crash-stats.mozilla.com/report/index/6d05e628-1541-45b8-92cd-90fab0190120#allthreads
[4] https://crash-stats.mozilla.com/report/index/180b561e-60c3-4ab4-ae0f-219ea0190120#allthreads
[5] https://crash-stats.mozilla.com/report/index/ae7fe052-1029-47e4-9407-2c6a20190120#allthreads
[6] https://crash-stats.mozilla.com/report/index/2a493c20-97f4-42bb-899d-027ec0190120#allthreads
[7] https://crash-stats.mozilla.com/report/index/52d685f0-d430-47fd-a7dc-ab7140190120#allthreads
[8] https://crash-stats.mozilla.com/report/index/83f336c8-8822-43f0-87bd-ebef30190120#allthreads
[9] https://crash-stats.mozilla.com/report/index/7315a730-32f0-4ac9-8913-07bf40190120#allthreads
[10] https://crash-stats.mozilla.com/report/index/31a0a111-b18a-4ba4-905f-88e6d0190120#allthreads
[11] https://searchfox.org/mozilla-central/rev/330daedbeac2bba296d663668e0e0cf248bc6823/dom/quota/ActorsParent.cpp#3234
[12] https://searchfox.org/mozilla-central/rev/78e8de968efd77ef3b30ea081bce07614cd6776b/dom/indexedDB/ActorsParent.cpp#15779
[13] https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2019-01-20&include_spill=0&keys=__none__!__none__!__none__&max_channel_version=nightly%252F66&measure=QM_REPOSITORIES_INITIALIZATION_TIME&min_channel_version=nightly%252F63&processType=*&product=Firefox&sanitize=1&sort_keys=submissions&start_date=2018-12-10&table=0&trim=1&use_submission_date=0

(In reply to Mike Conley (:mconley) (:⚙️) from comment #19)
Thank you for the reply and for providing more information! They do help me to confirm my suspicions.

(In reply to Tom Tung [:tt, :ttung] from comment #16)

Have you tried to clean up your profile/storage after hitting this issue? Also, do you happen to keep the crash id?

I don't recall ever cleaning up my profile or storage between the times where I was hitting this, and when I stopped hitting it.

Considering the initialization time is quite small (824ms v.s. 30s shutdown timer), we should be aware of that an unbroken profile with a small amount of initialization time can still run into this crash.

Here are some crash IDs:

https://crash-stats.mozilla.org/report/index/1a8ad063-5470-4095-a1b4-b9ea30190108
https://crash-stats.mozilla.org/report/index/1941409f-b07a-433b-bd0f-3b9870181221
https://crash-stats.mozilla.org/report/index/74367742-1986-4c71-a366-5f78e0181220
https://crash-stats.mozilla.org/report/index/22ac0311-2561-46bf-8b0f-36d980181217
https://crash-stats.mozilla.org/report/index/89da0b7c-d7ac-4ea5-a7f4-509ff0181212
https://crash-stats.mozilla.org/report/index/76131ce8-d330-4302-a43e-8dd420181212
https://crash-stats.mozilla.org/report/index/bb735040-ff8d-4df8-8443-fc8e80181211

Let me know if you need more.

The reports show that even when the QuotaManager IO thread has nothing to do (I cannot distinguish if it's done the initialization), PBackground thread still gets stuck. It looks like having the same symptom with cases [2][3][8][9][10] in comment 18. We probably need to figure out the reason for this.

I'm completely going off on a hunch here, but could this be related to bug 1521869? If Observe() doesn't see the write to done when the queue is empty it could wait on the event queue forever.

Talking with Yaron.

I think it's probably more related to another SpinEventLoop() in IndexedDB [1].

He also mentioned that Bug 1499719 could be related to. I believe it could be at least one of the reasons for crashing because it's related to whether |gLiveDatabaseHashtable->Count()| becomes zero. And, it might be the reason why these crashing PBackground threads were waiting for.

[1] https://searchfox.org/mozilla-central/rev/78e8de968efd77ef3b30ea081bce07614cd6776b/dom/indexedDB/ActorsParent.cpp#15779

Adding 67 as affected.

FYI, this is the #2 overall top crash for 65 at the moment.

Flags: needinfo?(shes050117)

I'm going to reduce the initialization tries for certain failure cases in bug 1521541 and that might help to reduce the number of crashes. Also, I suspect the gLiveDatabaseHashtable still holds database pointer even after the pointer is a nullptr, so maybe we could iterate the elements of gLiveDatabaseHashtable during shutdown.

Flags: needinfo?(shes050117)

(In reply to Tom Tung [:tt, :ttung] from comment #25)

I'm going to reduce the initialization tries for certain failure cases in bug 1521541 and that might help to reduce the number of crashes. Also, I suspect the gLiveDatabaseHashtable still holds database pointer even after the pointer is a nullptr, so maybe

Note that for the cases that users shut down their Firefox in their first initialization, bug 1521541 won't be helpful.

Tracking since this is a top crash.

Tom's investigating here so I'm going to update the assignee. While he's making great progress, it's unlikely we'll have something to uplift to 66 so I'm going to mark it fix-optional for 66.

Assignee: jvarga → shes050117
See Also: → 1527503

Given we have multiple symptoms (at least two) based on crash reports and we are going to land something which might change the signature a bit, I don't think we can fix this on Firefox 66 completely. But, it's no doubt that we will try to mitigate the issue and fix it as soon as possible.

After meeting with Jan, I will try to catch the issue for gLiveDatabaseHashtable by a testing build first.

Depends on: 1530284

If there's anything we can uplift to mitigate this in 66, I can still take a patch.

Flags: needinfo?(shes050117)

(In reply to Liz Henry (:lizzard) (use needinfo) from comment #31)

If there's anything we can uplift to mitigate this in 66, I can still take a patch.

No, we have some patches for mitigating this, but they are on the review and it'd be safer to put them on the Nightly for a while after getting r+.

Flags: needinfo?(shes050117)
Crash Signature: mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::ShutdownObserver::Observe] [@ shutdownhang | ntdll.dll | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::ShutdownObserver::Observe] → mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::ShutdownObserver::Observe] [@ shutdownhang | ntdll.dll | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::ShutdownObserver::Observe] [@ shutdownhang | mozilla::SpinE…
Crash Signature: mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::Observer::Observe] → mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::Observer::Observe] [@ shutdownhang | ntdll.dll | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::Observer::Observe]
Crash Signature: mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::Observer::Observe] [@ shutdownhang | ntdll.dll | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::Observer::Observe] → mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::Observer::Observe] [@ shutdownhang | ntdll.dll | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::Observer::Observe] [@ shutdownhang | __psynch_cvwait | <name omitte…
Crash Signature: omitted> | mozilla::dom::quota::QuotaManager::Observer::Observe] → omitted> | mozilla::dom::quota::QuotaManager::Observer::Observe] [@ shutdownhang | ntdll.dll | kernelbase.dll | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::Observer::Observe]

Tom, which bug are the patches in that you mention in comment 32? Are those ready to land?

Flags: needinfo?(shes050117)

Yes, I landed patches in bug 1423917 today and plan to land patches in bug 1521541 tomorrow. Will work on bug 1530284 (it seems to be related to this bug).

Flags: needinfo?(shes050117)

I'm doubting this is going to be upliftable to release but maybe it can make it into beta 67. It's a very high volume crash.

Priority: P2 → P1

We are going to add separate timers for IndexedDB and LocalStorage shutdown, bug 1540668 and 1539750.
It means the signature of the crash will change again, but it should be easier to identify such crashes in future and we should have a bit more information about conditions that caused a shutdown hang (using new crash annotations).

Ok, IndexedDB related shutdown crashes are now showing under bug 1541776.

LocalStorage (LSNG) related shutdown crashes are now showing under bug 1541928.

The reason for shutdown hang on indexedDB might be caught on bug 1543089 and localstroage (only if turning the pref on) on bug 1543209

Taking this back.

Assignee: shes050117 → jvarga
Status: NEW → ASSIGNED

Jan, have we made progress on this bug? Would uplifting bug 1543089 help reduce crashes on beta?

Flags: needinfo?(jvarga)

(In reply to Pascal Chevrel:pascalc from comment #42)

Jan, have we made progress on this bug? Would uplifting bug 1543089 help reduce crashes on beta?

No, that would just transform some shutdown hangs to normal (safe) crashes. That bug just catches possible problems.
A fix for that should be submitted to bug 1544870. However the volume is quite low, so it isn't a high priority.

Anyway, multiple bugs were fixed that could cause IndexedDB shutdown hangs: bug 1538619, bug 1539393, bug 1542541
I think first two had a negligible impact on IndexedDB shutdown hangs, but the last one noticeably reduced indexedDB shutdown hangs on nightly and beta (but we need to wait a few more days to say how much it reduced the hangs on beta).
IndexedDB shutdown hangs on Nightly are tracked in bug 1541776.
IndexedDB shutdown hangs on beta are best visible in bug 1542485.
Release and ESR are best visible in this bug.

Sorry, but we had to separate IndexedDB shutdown hangs from other shutdown hangs, otherwise we were not able to make a progress.

The next step is to add more debugging info, bug 1542478.

Flags: needinfo?(jvarga)

[Tracking Requested - why for this release]:
Looks unlikely for 67. Probably track for 68.

(For those playing along at home, this isn't a new shutdown hang in 67 or 68 but we're closer now to root cause fixes than we have been so that's :+1:.)

I don't think this needs tracking for 68. Yay for comment 45 though.

Crash Signature: mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::ShutdownObserver::Observe] [@ shutdownhang | ntdll.dll | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::ShutdownObserver::Observe] [@ shutdownhang | mozilla::SpinE… → mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::ShutdownObserver::Observe] [@ shutdownhang | ntdll.dll | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::ShutdownObserver::Observe]
Crash Signature: shutdownhang | mozilla::dom::quota::QuotaManager::ShutdownObserver::Observe] [@ mozilla::dom::workerinternals::RuntimeService::CrashIfHanging] [@ shutdownhang | ntdll.dll@0x9221a] [@ shutdownhang | ntdll.dll | kernel32.dll | mozilla::SpinEventLoopUnti… → shutdownhang | mozilla::dom::quota::QuotaManager::ShutdownObserver::Observe] [@ mozilla::dom::workerinternals::RuntimeService::CrashIfHanging] [@ shutdownhang | ntdll.dll | kernel32.dll | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManag…
Crash Signature: [@ shutdownhang | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::ShutdownObserver::Observe] [@ shutdownhang | static bool mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::ShutdownObserver::Observe] [@ shutdownhang… → [@ shutdownhang | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::ShutdownObserver::Observe] [@ shutdownhang | mozilla::dom::quota::QuotaManager::ShutdownObserver::Observe] [@ mozilla::dom::workerinternals::RuntimeService::CrashIfHan…

I know the crash in "mozilla::dom::workerinternals::RuntimeService::CrashIfHanging" is affected by this, but let's keep it separate.
Also, the "mozilla::dom::quota::QuotaManager::ShutdownObserver::Observe" signature doesn't exist in 67 and following versions, so I'm going to WONTFIX this bug. Remaining work is tracked in bug 1542485 (and also bug 1541928 and bug 1541776).

Status: ASSIGNED → RESOLVED
Crash Signature: [@ shutdownhang | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::ShutdownObserver::Observe] [@ shutdownhang | mozilla::dom::quota::QuotaManager::ShutdownObserver::Observe] [@ mozilla::dom::workerinternals::RuntimeService::CrashIfHan… → [@ shutdownhang | mozilla::SpinEventLoopUntil<T> | mozilla::dom::quota::QuotaManager::ShutdownObserver::Observe] [@ shutdownhang | mozilla::dom::quota::QuotaManager::ShutdownObserver::Observe] [@ shutdownhang | ntdll.dll | kernel32.dll | mozilla::SpinEv…
Closed: 5 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.