Closed Bug 1542181 Opened 7 months ago Closed 3 months ago

"AsyncShutdownTimeout | profile-change-teardown | Extension shutdown:" at state "Shutdown: Storage"

Categories

(WebExtensions :: General, defect, P1)

defect

Tracking

(firefox-esr60 unaffected, firefox-esr68 wontfix, firefox68 wontfix, firefox69 verified, firefox70 verified)

RESOLVED FIXED
mozilla70
Tracking Status
firefox-esr60 --- unaffected
firefox-esr68 --- wontfix
firefox68 --- wontfix
firefox69 --- verified
firefox70 --- verified

People

(Reporter: robwu, Assigned: rpl)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: regression)

Attachments

(3 files)

Crash reports for pre-release channels, at async shutdown timeout state "Shutdown: Storage":

These numbers may seem similar, but they are not. 66 has been around for much longer (a full release cycle with 13 releases), whereas 67 beta only had 7 releases in the past 2.5 weeks. The crashes happen on all desktop platforms, with the majority (96 - 98%) on Windows.

Flags: needinfo?(lgreco)
Assignee: nobody → lgreco
Flags: needinfo?(lgreco)
Priority: -- → P1

Changing to P2 because of no clear path forward.

Priority: P1 → P2
Priority: P2 → P1
Pushed by luca.greco@alcacoop.it:
https://hg.mozilla.org/integration/autoland/rev/fe7f6df2adb8
IndexedDB.jsm Transaction promiseCompleted should reject on a transaction abort event. r=kmag
https://hg.mozilla.org/integration/autoland/rev/108e6ade4cd8
Abort storage.local IDB data migration if the extension or app is shutting down. r=robwu,mixedpuppy
https://hg.mozilla.org/integration/autoland/rev/b7a9e6822e73
Mark newly installed extensions as migrated to the storage.local IDB backend. r=robwu,mixedpuppy

Can you please provide some steps to manually verify this issue? If no manual testing is needed please mark this bug as "qe-verify-"

Flags: needinfo?(lgreco)

(In reply to Madalin Cotetiu from comment #7)

Can you please provide some steps to manually verify this issue? If no manual testing is needed please mark this bug as "qe-verify-"

My apologies for time I needed to come back to this, I've been trying to think of some reasonable ways to be able to manually trigger this issue consistently.

The only scenario that we are sure about how to trigger consistently is related to a QuotaExceededError error raised while migrating existing data from the JSONFile to the IndexedDB backend , which is also tested in a newly added automated tests landed as part of Bug 1555491, and so I'm marking this as qe-verify-.

Flags: needinfo?(lgreco) → qe-verify-

Is this something we should consider uplifting to Beta for Fx69?

Flags: needinfo?(lgreco)

Is this something we should consider uplifting to Beta for Fx69?

Hi Ryan, my apologies for the lag time on this, I've been away on PTO until yesterday.

Ideally I would like to request an uplift to Fx69 for this, but first I'm going to take another look to the crash stats and telemetry related to the storage.local API data migration results (to see if there is already any visible effect of this change while it has been baking on nightly, or if I notice anything suspicious).

I'm leaving the needinfo flag assigned to me, and I'll come back to this asap to clear it (and likely create the uplift request at the same time).

Flags: needinfo?(lgreco)
Flags: needinfo?(lgreco)

Comment on attachment 9075782 [details]
Bug 1542181 - IndexedDB.jsm Transaction promiseCompleted should reject on a transaction abort event. r?kmag!

Beta/Release Uplift Approval Request

  • User impact if declined: If an IndexedDB transaction (related to a storage.local API call, or a storage.local data migration) is aborted, the related promise will get stuck (never get rejected or resolved).
    The changes applied could also help on reducing the number of async shutdown timeouts crashes related to the storage.local data migrations.

The other two patches do also aim to reduce the async shutdown timeouts related to storage.local data migrations by:

  • abort an ongoing storage.local data migration is the application (or the extension) is already in the process of shutting down
  • avoid to look for existing storage.local data (as part of a storage.local data migration) for newly installed extensions
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: Yes
  • If yes, steps to reproduce: It would be reasonable to still test that a built in and a third party extension are able to successfully using the storage.local API on the main platforms and that it is possible for a user with general IndexedDB issues to still fallback to the JSONFile backend by manually flipping the related about:config pref.

STR for the scenario "users without any general IndexedDB issue":

  • create a new Firefox profile
  • open a new tab and trigger the screenshot extension
  • check that no unexpected IndexedDB (or ExtensionStorageIDB) errors have been logged in the Browser Console
  • install AdBlock Plus and configure it from its options page (e.g. toggle "Block additional tracking" and disable "Allow Acceptable Ads")
  • restart the Firefox instance and check that the AdBlock plus configurations have been persisted

STR for the scenario "users with general IndexedDB issues":

  • create a new Firefox profile
  • create some unexpected files in the "<PROFILE>/storage/default" directory
  • open the Browser Console
  • install AdBlock Plus and configure it
  • check that QuotaManagerService / IndexedDB errors have been logged in the Browser Console
Quota Something (<NAME_OF_THE_UNEXPECTED_FILE_CREATED>) in the directory that doesn't belong!: ActorsParent.cpp:4114
IndexedDB UnknownError: ActorsParent.cpp:581
  • open about:config in a new tab
  • turn the "ExtensionStorageIDB.migrated.{d10d0bf8-f5b5-c8b4-a8b2-2b9879e08c5d}" preference to false
  • restart AdBlock Plus (e.g. by disabling and re-enabling it) and configure the extension from its options page
  • restart the Firefox instance
  • check that the expected AdBlock Plus options have been persisted
  • List of other uplifts needed: None
  • Risk to taking this patch: Medium
  • Why is the change risky/not risky? (and alternatives if risky): The risk of this change shouldn't be high, as it has been backing on nightly for a while, there haven't been new bug reports related to the storage.local API and/or the related data migration (and based on the telemetry events collected the DataMigrationAborted error introduced by this change has been actually triggered), the change is also covered by new test cases.

I'm currently setting the change risk to "medium" mainly because of one of the patches:

  • "Bug 1542181 - Mark newly installed extensions as migrated to the storage.local IDB backend"

which avoids the data migrations for newly installed extensions, but as a side-effect it would prevent users that have still general issues with IndexedDB (in particular the one related to Bug 944918) to fall back automatically to the JSONFile backend.
The affected user would still be able to force the storage.local backend fallback manually by setting the "ExtensionStorageIDB.migrated.<EXT_ID>" to false and restarting the browser (or reloading the extension).

As an alternative to reduce the change risk to low, we could uplift only 2 of the 3 patches (skipping "Bug 1542181 - Mark newly installed extensions as migrated to the storage.local IDB backend" from the uplifted patches).

  • String changes made/needed:
Flags: needinfo?(lgreco)
Attachment #9075782 - Flags: approval-mozilla-beta?
Attachment #9074213 - Flags: approval-mozilla-beta?
Attachment #9074214 - Flags: approval-mozilla-beta?
Flags: qe-verify- → qe-verify+
QA Whiteboard: [qa-triaged]

Hey :ttung,
I've been giving another look to the crash reports related to this issue and I noticed that the "QuotaManager IO" thread seems to be busy on mozilla::dom::quota::QuotaManager::EnsureOriginIsInitialized (in some of the stack traces it seems to be still iterating through the files on disk while initializing the repository).

The part of the stack trace common to the crash reports (at least the ones that I checked) looks like this:

  • nsresult mozilla::dom::quota::QuotaManager::InitializeRepository(mozilla::dom::quota::PersistenceType)
  • mozilla::dom::quota::QuotaManager::EnsureTemporaryStorageIsInitialized()
  • mozilla::dom::quota::QuotaManager::EnsureOriginIsInitializedInternal(mozilla::dom::quota::PersistenceType, nsTSubstring<char> const&, nsTSubstring<char> const&, nsTSubstring<char> const&, nsIFile**, bool*)
  • mozilla::dom::quota::QuotaManager::EnsureOriginIsInitialized(mozilla::dom::quota::PersistenceType, nsTSubstring<char> const&, nsTSubstring<char> const&, nsTSubstring<char> const&, nsIFile**)
  • nsresult mozilla::dom::indexedDB::`anonymous namespace'::OpenDatabaseOp::DoDatabaseWork()
  • nsresult mozilla::dom::indexedDB::`anonymous namespace'::FactoryOp::Run()
  • nsThread::ProcessNextEvent(bool, bool*)
  • ...

I guess that the QuotaManager initialization is being triggered by one of the IndexedDB API call part of the extension data migration (e.g. this ones), unfortunately none of the changes we already landed is going to be able to abort the data migration while we are already waiting for an IndexedDB call to resolve.

Do you think there is something we could do on the QuotaManager (and/or IndexedDB side) to abort the ongoing work if Firefox is already in the process of shutting down?

Follows links to some of the crash reports I've been looking:

Flags: needinfo?(shes050117)

(In reply to Luca Greco [:rpl] [:luca] [:lgreco] from comment #12)

Do you think there is something we could do on the QuotaManager (and/or IndexedDB side) to abort the ongoing work if Firefox is already in the process of shutting down?

Hey Luca, thanks for the detialed information!

We found that some of users seem to take more than 50 seconds to initialze their profiles, so we have had a way to abort the initlaization. QM can abort the initialization while it receives a shutdown signal in https://bugzilla.mozilla.org/show_bug.cgi?id=1556730. It's "profile-before-change-qm" [1-2].

The summary of this bug says "profile-change-teardown" (also shows on the crash reports). And, it seems that this topic is fired before the topic for shutting down QuotaManager? [3] If it's, maybe we have to shutdown quotamanager eariler.

Andrew, and Jan do you have idea on this? Thanks in advance!

[1] https://searchfox.org/mozilla-central/rev/9775cca0a10a9b5c5f4e15c8f7b3eff5bf91bbd0/dom/quota/ActorsParent.cpp#110
[2] https://searchfox.org/mozilla-central/rev/9775cca0a10a9b5c5f4e15c8f7b3eff5bf91bbd0/dom/quota/ActorsParent.cpp#4089-4092
[3] https://searchfox.org/mozilla-central/source/toolkit/components/places/Shutdown.h#21

Flags: needinfo?(shes050117)
Flags: needinfo?(jvarga)
Flags: needinfo?(bugmail)

I couldn't reproduce the steps from "users without any general IndexedDB issue" on Firefox Nightly 68.0a1 (2019-05-01), the same results are also on the latest Firefox Nightly. No unexpected IndexedDB (or ExtensionStorageIDB) errors have been logged in the Browser Console and AdBlock plus configurations have been persisted.

For "users without any general IndexedDB issue": I couldn't find any differences here either between Firefox Nightly 68.0a1 (2019-05-01) and the latest Firefox Nightly. I couldn't find QuotaManagerService / IndexedDB errors in the browser console but AdBlock plus configurations have been persisted at the end.

I might have a couple of question that could help:

  1. Do you refer to "take a screenshot" when you said to trigger the screenshot extension?
  2. What kind of unexpected file should I create in "<PROFILE>/storage/default" directory?

Thanks.

Flags: needinfo?(lgreco)

(In reply to Hani Yacoub from comment #14)

I couldn't reproduce the steps from "users without any general IndexedDB issue" on Firefox Nightly 68.0a1 (2019-05-01), the same results are also on the latest Firefox Nightly. No unexpected IndexedDB (or ExtensionStorageIDB) errors have been logged in the Browser Console and AdBlock plus configurations have been persisted.

For "users without any general IndexedDB issue": I couldn't find any differences here either between Firefox Nightly 68.0a1 (2019-05-01) and the latest Firefox Nightly. I couldn't find QuotaManagerService / IndexedDB errors in the browser console but AdBlock plus configurations have been persisted at the end.

That's good, the "user without any general IndexedDB issue" STR had only the purpose of double-checking that the storage API still works as expected with and without these changes.

I might have a couple of question that could help:

  1. Do you refer to "take a screenshot" when you said to trigger the screenshot extension?

yes

  1. What kind of unexpected file should I create in "<PROFILE>/storage/default" directory?

Any unexpected file that is not already ignores should work (from what I recall the file that starts with a dot char are currently ignored, everything else should still trigger the issue), I usually just create a "THIS_FILE_IS_BREAKING_STORAGE" empty file inside that directory.

Flags: needinfo?(lgreco)

(In reply to Tom Tung [:tt, :ttung] from comment #13)

The summary of this bug says "profile-change-teardown" (also shows on the crash reports). And, it seems that this topic is fired before the topic for shutting down QuotaManager? [3] If it's, maybe we have to shutdown quotamanager eariler.

QuotaManager needs to remain operational through its current "profile-before-change-qm" phase[1]. However, I think it's reasonable for us to require that any callers that want to use QuotaManager storage beyond the "quit-application-granted" phase (https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XPCOM/Observer_Notifications#Application_shutdown) explicitly provide the nsIAsyncShutdownClient that represents the phase that is being blocked and the nsIAsyncShutdownBlocker that has been added as a blocker to the client.

Any DirectoryLock/IDB transaction/etc. that doesn't have an associated client/blocker will be terminated when the "quit-application-granted" phase starts with an error-code indicating shutdown. Any lock/transaction that does have a client/blocker will be terminated when that phase starts. Any request to open a new directorylock/etc. whose phase has already started will immediately fail with an error-code indicating shutdown. Some level of being able to dynamically update the associated phases may potentially be necessary. For example, the IndexedDB commit() method that was recently added to the spec (https://w3c.github.io/IndexedDB/#dom-idbtransaction-commit) is intended to be a durable commit, and it might make sense to allow the commit some extra leeway to commit as long as the commit is likely to conclude in a timely fashion. (That is, if the amount of data is reasonable and we've already opened the database, etc.)

So in the case of extension data migration, given that it already bails if nsIAppStartup.shuttingDown is true and that flag gets set just prior to triggering "quit-application-granted", the extension code wouldn't need to change at all because our new default behavior would handle this. It's likely that this is also the right decision for most other consumers as well, especially those that haven't thought about shutdown in much depth.

Note that the proposed choice of quit-application-granted may be a little bit too aggressive in the case of content principals. We will need to audit shutdown[2] to make sure that all of the window globals in question will have been reliably terminated, with DOMEventTargetHelper::DisconnectFromOwner() having been invoked. Our explicit goal would be that the global needs to not be alive to process any error message that result from the automatic termination of pending QuotaManager client requests. It's possible that this might mean working with other teams to move up when window globals are torn down, etc.

1: In particular, QuotaManager exists as an important system primitive and if we want to see it widely and consistently used, it needs to be usable until late in the shutdown process. That said, as the rest of the above text suggests, we do want to encourage consumers of QuotaManager to shutdown earlier.
2: My shutdown notes are at https://github.com/asutherland/asuth-gecko-notes/blob/master/ipc/SHUTDOWN.md but haven't been updated for a while at this moment and were never 100% thorough.

Flags: needinfo?(bugmail)

Comment on attachment 9074213 [details]
Bug 1542181 - Abort storage.local IDB data migration if the extension or app is shutting down.

Fixes a potential source of shutdown hangs. Let's see how it goes on Beta. Approved for 69.0b13.

Attachment #9074213 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #9074214 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #9075782 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

(In reply to Andrew Sutherland [:asuth] (he/him) from comment #16)
Thanks for clarifying and the explanation! Andrew has already answered and proposed a better solution, so I'm going to remove Jan's ni.

I was worried about the case where a profile needs to take more than 50 seconds to be initialized its storage. If the caller comes from a website, then it would be taken care of when the "profile-before-change-qm" is fired.

If the caller comes from the extension (ExtensionStorageIDB), now is taken care of by the patches in this bug. (And the operation of initialization will be aborted in "profile-before-change-qm")

For the future, maybe handling the abort/decline the incoming request while shutdown on QuotaManager for internal request or extension is needed if it's possible. Handle that in "quit-application-granted" sounds good to me, but I will check that on our team meeting or arrange a meeting for more discussion. "A profile needs to take more than 50 seconds to init storage" would still be a problem in this case, but it might soonly be taken care of by Jan's work on QM init speedup.

Flags: needinfo?(jvarga)

It appears kinto.js had ported or independently re-created the missing abort handling. I've filed https://github.com/Kinto/kinto.js/issues/1043 for that since I think the code in question only exists in mozilla-central as a compiled file at https://searchfox.org/mozilla-central/source/services/common/kinto-offline-client.js.

See Also: → 1573828

Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0
Build ID: 20190814094216

Verified as fixed on the latest Nightly 70.0a1 and Firefox 69 beta 13. Verified by using the STR from Comment 11 and the additional indications from Comment 15. Tested on Windows 10x64, Ubuntu 18.04 x64 bit and Mac OS X 10.14.

You need to log in before you can comment on or make changes to this bug.