Enable remote settings blocklist to ride the train after the beta experiment has finished successfully
Categories
(Toolkit :: Blocklist Implementation, task, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox72 | --- | fixed |
People
(Reporter: Gijs, Assigned: Gijs)
References
(Depends on 1 open bug)
Details
Attachments
(3 files)
We'll want to switch 72 nightly over and have it ride the train with that if there are no issues found as a result of the beta experiment.
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 1•5 years ago
|
||
Assignee | ||
Updated•5 years ago
|
Pushed by gijskruitbosch@gmail.com: https://hg.mozilla.org/integration/autoland/rev/6e3d285d14ee enable remote settings blocklist on nightly, r=mconley
Comment 3•5 years ago
|
||
Backed out changeset 6e3d285d14ee (bug 1594521) for toolkit/mozapps/extensions/test/browser/browser_CTP_plugins.js. CLOSED TREE
Log:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=276218423&repo=autoland&lineNumber=12295
Push with failures:
https://treeherder.mozilla.org/#/jobs?repo=autoland&group_state=expanded&revision=6e3d285d14eeffbf8861d39c664b796bd9640322
Backout:
https://hg.mozilla.org/integration/autoland/rev/170b730233cce76d5fea91f337eecf38b0bfd1e3
Assignee | ||
Updated•5 years ago
|
Pushed by gijskruitbosch@gmail.com: https://hg.mozilla.org/integration/autoland/rev/dea221c2e93e enable remote settings blocklist on nightly, r=mconley
Comment 5•5 years ago
|
||
bugherder |
Comment 6•5 years ago
|
||
Backed out for xpcshell failures on test_isDebuggable.js
Backout link: https://hg.mozilla.org/integration/autoland/rev/c1c84c2869be73727aa38e6220792aa83cb25e6a
Log link: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=276494898&repo=autoland&lineNumber=4862
Assignee | ||
Comment 7•5 years ago
•
|
||
I can reproduce this locally, though only if I run the entire manifest (toolkit/mozapps/extensions/test/xpcshell/xpcshell.ini
), which makes it really difficult to get any meaningful debug output for the specific test that's failing. The test is passing, and reaches the end of the test task cleanly, but somehow fails to exit and hangs on shutdown. I expect it's nothing to do with this specific test, but somehow remote settings and/or its indexeddb access does not play nicely with the xpcshell test infrastructure these tests use. Any clues as to where I should start debugging?
We're hoping to ship this with 72, and this is a bit of a nasty surprise and the only blocker to the otherwise straightforward pref flip.
Comment 8•5 years ago
|
||
My best guess is that there's an async shutdown blocker preventing shutdown from finishing. It might help to turn on AsyncShutdown logging. The following message in the log is also suspicious:
[task 2019-11-15T21:33:01.321Z] 21:33:01 INFO - PID 5331 | ###!!! [Parent][DispatchAsyncMessage] Error: PBackgroundIDBFactory::Msg_PBackgroundIDBFactoryRequestConstructor Value error: message was deserialized, but contained an illegal value
and makes me thing indexedDB is doing the blocking (possibly not even via AsyncShutdown, alas...)
Assignee | ||
Comment 9•5 years ago
|
||
It seems like there's some kind of deadlock? I don't know how to figure out more though. :-(
Assignee | ||
Comment 10•5 years ago
|
||
Finding the DOM worker being present in my stacks suspicious. Is it possible that we just need to terminate https://searchfox.org/mozilla-central/rev/b58e44b74ef2b4a44bdfb4140c2565ac852504be/services/settings/RemoteSettingsWorker.jsm#91-93 on xpcom-shutdown
(or perhaps even earlier?). It seems to work locally, but not sure if that's just a wallpaper...
Assignee | ||
Comment 11•5 years ago
|
||
(In reply to Kris Maglione [:kmag] from comment #8)
My best guess is that there's an async shutdown blocker preventing shutdown from finishing. It might help to turn on AsyncShutdown logging.
I tried this by just setting the pref in the test that's breaking, and that works when running that individual test that fails, but I get no output when running the entire manifest.
The following message in the log is also suspicious:
[task 2019-11-15T21:33:01.321Z] 21:33:01 INFO - PID 5331 | ###!!! [Parent][DispatchAsyncMessage] Error: PBackgroundIDBFactory::Msg_PBackgroundIDBFactoryRequestConstructor Value error: message was deserialized, but contained an illegal value
and makes me thing indexedDB is doing the blocking (possibly not even via AsyncShutdown, alas...)
Between the error and the main purpose of that worker being IDB access, maybe some of the indexeddb folks have ideas about what's happening here. (context: comment #6 and later; when running some extension tests, things hang on shutdown when using the new (indexeddb-via-remotesettings-based) blocklist backend).
Comment 12•5 years ago
|
||
I'll take a deeper look later, but re xpcom-shutdown
, IndexedDB is no longer usable once "profile-before-change" has concluded. QuotaManager and all clients get torn down during "profile-before-change-qm" which happens after "profile-before-change". It seems very believable that an IndexedDB request might hang if it's issued after QuotaManager begins shutdown.
From the stack trace, XPCOMInit.cpp:632 is https://searchfox.org/mozilla-central/rev/b58e44b74ef2b4a44bdfb4140c2565ac852504be/xpcom/build/XPCOMInit.cpp#632 is "xpcom-shutdown-threads" which is very late and it's indicating PBackground is failing to shutdown, which also seems reasonable to get broken by a late IndexedDB request. (The problem could also be related to bug 1542884?)
One idea I've been kicking around in this space is that consumers of APIs like ChromeWorkers or SQLite.jsm or other uses that really need to be aware of shutdown need to provide a ShutdownBlocker at creation time that indicates what phase they are expected to terminate in. We'd pick a default like "profile-before-change" if nothing is specified. And in that case we'd hard terminate the ChromeWorker and deny creation of any new ChromeWorkers once profile-before-change occurs.
Comment 14•5 years ago
|
||
Backout got merged: https://hg.mozilla.org/mozilla-central/rev/c1c84c2869be
Assignee | ||
Comment 15•5 years ago
|
||
OK. I've confirmed that when this happens it's because the remote settings worker sticks around (by finding a frame in which I could inspect the WorkerPrivate
instance on the DOM Worker
thread in xcode, and its mScriptURL
is the remote settings worker script).
What I'm not sure about is why. As far as I can tell nothing normally shuts down the worker - it just happens "naturally" at some point (?) when things work. Is that expected? I don't see other worker consumers (like the session store worker) do anything in particular to stop its worker.
If that is expected, then why isn't it shutting down here? As far as I can tell from the stacks in the attachment, all threads are stuck waiting for "something". And yes, in the case of the BackgroundImpl stuff, that's spinning an event loop inside xpcom-shutdown-threads
, but it looks like that's supposed to time things out after 10 seconds, which (apparently?) isn't happening here, for reasons I don't understand. Like, is it just because there's JS alive waiting for one of the IDB promises (pretty much everything in https://searchfox.org/mozilla-central/source/services/settings/RemoteSettingsWorker.js seems to be async IDB methods) and it's never resolving? Something else?
What's even less clear to me is how wallpaper-y it would be to make the remote settings module force-terminate()
its worker at profile-before-change (and refuse to start it once that's happened). If there's a fundamental issue with indexeddb and how it handles shutdown here, just nuking the worker from orbit, potentially mid-operation, would be Bad. But I don't really understand, nor do I see how I can find out, whether that's the case. I looked at the BackgroundImpl stuff a bit but I don't see how I can log something meaningful from it (like names of actors or something).
Ideas very welcome...
Comment 16•5 years ago
•
|
||
Oh, this may just be an instance of bug 1594572, then...
Assignee | ||
Comment 17•5 years ago
|
||
(In reply to Kris Maglione [:kmag] from comment #16)
Oh, this may just be an instance of bug 1594521, then...
You meant to copy a different bug number, I assume? :-)
Comment 18•5 years ago
|
||
(In reply to :Gijs (he/him) from comment #17)
(In reply to Kris Maglione [:kmag] from comment #16)
Oh, this may just be an instance of bug 1594521, then...
You meant to copy a different bug number, I assume? :-)
Bah. Bug 1594572.
Comment 19•5 years ago
•
|
||
Right, so I think there are a few things going on here:
- The worker is getting created late in the shutdown process and racing the worker's RuntimeService's compulsory shutdown of all workers. This is a bug in the RuntimeService, bug 1594572.
- Something is going wrong in IndexedDB/PBackground when IDB requests are issued against a dying PBackground. I've filed bug 1597312 on this.
- Either RemoteSettings or a RemoteSettings consumer is doing things well after shutdown has started, or not using shutdown blockers to avoid things it started before shutdown from bleeding into shutdown.
I'll focus on the 3rd point for the remainder of the comment here. I do want to emphasize that I acknowledge both the first two as important and serious bugs; and as situations that workers and IndexedDB have historically lucked out about because content consumers are destroyed earlier in cleanup. System JS, and in particular JSMs, present an interesting complication because their lifetime is basically forever. And ChromeWorkers as well, by default, at least until the runtime service terminates the worker. And all of this is because shutdown is super confusing and not really documented or owned in Firefox/Gecko.
(In reply to :Gijs (he/him) from comment #15)
What's even less clear to me is how wallpaper-y it would be to make the remote settings module force-
terminate()
its worker at profile-before-change (and refuse to start it once that's happened).
It's not wallpaper-y for a system service to be aware of system shutdown. When the runtime service forces workers to terminate, they are aborted without regard for what they are doing. Ideally the owner of the worker would use an appropriate shutdown blocker and:
- Block shutdown if the operation is important.
- Abort any outstanding transactions if they are not. (While IndexedDB will force-abort transactions at profile-before-change-qm, this is very late in the shutdown process, and allows a potentially huge amount of wasted work to happen which has its own overhead as any pending transactions are rolled back.)
Besides the improved shutdown performance, a major benefit for services that report telemetry need to be able to distinguish between unexpected errors and expected situations like shutdown. For example, Activity Stream's telemetry reports a tremendous number of TRANSACTION_FAILED errors that occur in its use of IndexedDB. Hopefully these are all happening at shutdown when QuotaManager shuts down at profile-before-change-qm
and aborts all in-progress transactions, but it's not clear.
It's my hope that we can fix both the workers and IDB bug in this cycle, but I would strongly encourage adding a shutdown blocker, especially if the remote settings API wants to have invariants about its storage being reliable in the face of browser shutdown.
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 20•5 years ago
|
||
Depends on D52939
Comment 21•5 years ago
|
||
Pushed by gijskruitbosch@gmail.com: https://hg.mozilla.org/integration/autoland/rev/579519d2db34 enable remote settings blocklist on nightly, r=mconley https://hg.mozilla.org/integration/autoland/rev/7cfb4262cf1c avoid instantiating remote settings worker after shutdown, r=leplatrem,asuth
Comment 22•5 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/579519d2db34
https://hg.mozilla.org/mozilla-central/rev/7cfb4262cf1c
Description
•