abort() pending read transactions in remote settings' Database.jsm when we shutdown to enable faster shutdown
Categories
(Firefox :: Remote Settings Client, defect)
Tracking
()
People
(Reporter: Gijs, Assigned: Gijs)
References
(Blocks 3 open bugs)
Details
(Whiteboard: [fxperf:p1])
Attachments
(2 files)
(In reply to Andrew Sutherland [:asuth] (he/him) from comment #6)
(In reply to :Gijs (he/him) from comment #3)
- It feels like read requests to indexeddb shouldn't be blocking shutdown (because performance), but I'm unclear on whether, per bug 1545438 comment 26, we need to do that at least for now to avoid a much longer hang (I think so.). :asuth, can you clarify whether I'm right in thinking we're better off blocking the completion of
profile-before-changeon those IDB requests completing for now?If you're not going to wait for the reads, it's appropriate to invoke abort() on any IDBTransaction instances that have already been created and then call close().
It seems like we could do this fairly easily in Database.jsm by making executeIDB take an option to indicate the transaction can be aborted, and then adding that option to callers that do read-only transactions. executeIDB can use that option to keep track of pending read transactions, and when the async shutdown promise getter trips we can call abort on all those transactions. Per the spec, it seems this should cause an abort event which executeIDB can listen for, and it can reject its promise with an error in that case.
After the changes in bug 1598924 doing so should automatically throw and stop processing whatever we were doing, and then call .close(). I think this should always be fine for reads on the RS databases, though we should perhaps audit the callsites of the read-only methods from the previous paragraph to ensure there aren't problematic consumer patterns where they do multiple writes interspersed with reads that aren't in the same transaction - of course, that would already be bad, because both the writes and reads can already be aborted/throw for other reasons.
| Assignee | ||
Updated•1 year ago
|
| Assignee | ||
Updated•1 year ago
|
| Assignee | ||
Comment 1•1 year ago
|
||
Pushed by gijskruitbosch@gmail.com: https://hg.mozilla.org/integration/autoland/rev/15a52d0bf116 abort pending read transactions from the DB when quitting, r=asuth,leplatrem
Backout by csabou@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/c012372b0db2 Backed out changeset 15a52d0bf116 for bc failures on browser_RecipeRunner.js
Comment 4•1 year ago
|
||
Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception&revision=15a52d0bf11640e0fbbbd30c09d7a7e822e0524a&selectedJob=295107068
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=295107068&repo=autoland
https://treeherder.mozilla.org/logviewer.html#?job_id=295096429&repo=autoland
https://treeherder.mozilla.org/logviewer.html#?job_id=295096602&repo=autoland
Backout link: https://hg.mozilla.org/integration/autoland/rev/c012372b0db2
| Assignee | ||
Comment 5•1 year ago
|
||
Note: there were also xpcshell failures:
TEST-UNEXPECTED-FAIL | xpcshell-legacyconfig.ini:toolkit/components/search/tests/xpcshell/test_selectedEngine.js | test_fallback_kept_after_restart - [test_fallback_kept_after_restart : 265] A promise chain failed to handle a rejection: A mutation operation was attempted on a database that did not allow mutations. - stack: ensureShutdownBlocker/<@resource://services-settings/Database.jsm:534:21
| Assignee | ||
Comment 6•1 year ago
|
||
The issues with browser_RecipeRunner.js are easy to fix - there we can just await the calls to db.close() (which they're not right now), and then we stop trying to reuse a db we're in the middle of closing.
However, fundamentally all the methods on Database (now InternalDatabase) share their access to _idb. And all of the ones that want that object call await this.open(). That's great. The problem is that a bunch of the operations they then proceed to do are asynchronous, and so if there are multiple consumers using the DB via remotesettings, one of them can end up calling this.db.close() while one of the others (using the same db!) is mid-operation. This does not go well.
Mathieu, how do you want this part of remote settings to work? Should we somehow keep track of pending operations and make Database.close a no-op as long as at least one operation is still ongoing?
Comment 9•1 year ago
|
||
As a general FYI, the IndexedDB backend will automatically close the underlying database connections if they're idle for a while and reopen it on demand. It would probably be fine to keep the database around once the consumer triggers the open, and then hold onto it until shutdown as long as there's an expectation that there will be future database accesses.
Also, re: database closing, note that IDB's close() doesn't actually take effect "until all outstanding transactions have completed" so as long as things are happening within a transaction, things should generally keep working.
| Assignee | ||
Comment 10•1 year ago
|
||
Ah, I think I've figured it out. I think this is a data race in my patch. This happens:
- call A calls close on the DB (which async-awaits the cached
_idbref) - call B asks for a DB (which also async-awaits the cached
_idbref)
(microtask checkpoint runs, and A+B get their db)
- call A closes the DB
- call B tries to start a transaction on the closed DB
If calling close clears the this._idb property sync, instead of only doing so after await'ing it, this problem should go away, because then in step 2, B gets a fresh ref.
If the microtask queue goes the "other way", so B runs before A, this already can't happen:
- call B asks for a DB (awaits this._idb)
- call A calls close, clearing _idb and await'ing it
(microtask checkpoint runs, and A+B get their db)
- call B starts a transaction on the DB.
- call A calls close, which per Andrew in comment #9 will wait for the transaction to finish.
| Assignee | ||
Comment 11•1 year ago
|
||
So apparently comment #10, while sounding plausible, is still wrong - it's possible for the close to happen between when we call open() (getting a cached promise) and when the promise resolves and we actually try to use said DB...
| Assignee | ||
Comment 12•1 year ago
|
||
(In reply to :Gijs (he/him) from comment #11)
So apparently comment #10, while sounding plausible, is still wrong - it's possible for the close to happen between when we call
open()(getting a cached promise) and when the promise resolves and we actually try to use said DB...
OK, this took me a surprising amount of time to work out, but it's because open is declared async. See also this (to me, surprising) minimal example: https://jsbin.com/ceyiquxeqe/edit?html,js,output
Declaring open as a sync function (that just returns a promise) makes this issue go away. The fact that we take an extra turn through the event loop / extra microtask queue to get the DB if open is declared async is... surprising, I think, and feels fragile.
I'm not really sure what to do about that though, short of re-architecting Database.jsm and remote settings to somehow never call close, or keep track of pending transactions and delaying close until all pending transactions have finished...
(In reply to Andrew Sutherland [:asuth] (he/him) from comment #9)
As a general FYI, the IndexedDB backend will automatically close the underlying database connections if they're idle for a while and reopen it on demand. It would probably be fine to keep the database around once the consumer triggers the open, and then hold onto it until shutdown as long as there's an expectation that there will be future database accesses.
I would do this (ie just not call close) if it wasn't for the other issues that we've had in this space, where IDB seems to take 30 seconds, even during shutdown, to decide the connection is idle and to close it...
That said, not all the operations in RS currently call close when done, so this generally feels like it's already messy.
Comment 13•1 year ago
|
||
(In reply to :Gijs (he/him) from comment #12)
I would do this (ie just not call
close) if it wasn't for the other issues that we've had in this space, where IDB seems to take 30 seconds, even during shutdown, to decide the connection is idle and to close it...
To clarify, I meant calling close (and abort) at your component's shutdown, not never calling close.
Updated•1 year ago
|
| Assignee | ||
Comment 15•1 year ago
|
||
If I can't manage to get this into 76 before we branch I'll probably be looking to uplift.
| Assignee | ||
Updated•1 year ago
|
| Assignee | ||
Comment 16•1 year ago
|
||
Andrew, Mathieu: just to check, this should now be back in your queues...
| Assignee | ||
Comment 18•1 year ago
|
||
I'm now running into this assertion failure:
GECKO(1226) | Assertion failure: data.mRecursionDepth > mBaseRecursionDepth, at /builds/worker/checkouts/gecko/xpcom/base/CycleCollectedJSContext.cpp:546
and not 100% sure what to do about that...
| Assignee | ||
Updated•1 year ago
|
Comment 19•1 year ago
|
||
Pushed by gijskruitbosch@gmail.com: https://hg.mozilla.org/integration/autoland/rev/7312dfafef58 abort pending read transactions from the DB when quitting, r=asuth,leplatrem
| Assignee | ||
Comment 21•1 year ago
|
||
Comment 22•1 year ago
|
||
Pushed by malexandru@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/825f035018c2 follow-up: add missing Array.from call a=fix
Comment 23•1 year ago
|
||
| bugherder | ||
https://hg.mozilla.org/mozilla-central/rev/7312dfafef58
https://hg.mozilla.org/mozilla-central/rev/825f035018c2
Comment 24•1 year ago
|
||
This is not fixed as it continued appearing after the last follow-up:
https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=success%2Ctestfailed%2Cbusted%2Cexception&searchStr=xpc&revision=09fc2941dd6844a6404834ec3bd76ed4f2e44102&selectedJob=296019366
https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=success%2Ctestfailed%2Cbusted%2Cexception&searchStr=xpc&tochange=7f069649d6cf0d7d8cba14cfd987e835509b22fa&fromchange=825f035018c2a7c4526ad68909d96ddd585b7a71
Gijs, could you please take a look at this?
| Assignee | ||
Comment 25•1 year ago
|
||
(In reply to Noemi Erli[:noemi_erli] from comment #24)
This is not fixed as it continued appearing after the last follow-up:
https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=success%2Ctestfailed%2Cbusted%2Cexception&searchStr=xpc&revision=09fc2941dd6844a6404834ec3bd76ed4f2e44102&selectedJob=296019366
https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=success%2Ctestfailed%2Cbusted%2Cexception&searchStr=xpc&tochange=7f069649d6cf0d7d8cba14cfd987e835509b22fa&fromchange=825f035018c2a7c4526ad68909d96ddd585b7a71Gijs, could you please take a look at this?
Thanks for the heads-up, I filed bug 1627179 for these failures.
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
Description
•