Closed Bug 1624983 Opened 5 years ago Closed 5 years ago

abort() pending read transactions in remote settings' Database.jsm when we shutdown to enable faster shutdown

Categories

(Firefox :: Remote Settings Client, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
Firefox 76
Tracking Status
firefox74 --- wontfix
firefox75 --- wontfix
firefox76 --- fixed

People

(Reporter: Gijs, Assigned: Gijs)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [fxperf:p1])

Attachments

(2 files)

Per bug 1621759 comment 6:

(In reply to Andrew Sutherland [:asuth] (he/him) from comment #6)

(In reply to :Gijs (he/him) from comment #3)

  1. It feels like read requests to indexeddb shouldn't be blocking shutdown (because performance), but I'm unclear on whether, per bug 1545438 comment 26, we need to do that at least for now to avoid a much longer hang (I think so.). :asuth, can you clarify whether I'm right in thinking we're better off blocking the completion of profile-before-change on those IDB requests completing for now?

If you're not going to wait for the reads, it's appropriate to invoke abort() on any IDBTransaction instances that have already been created and then call close().

It seems like we could do this fairly easily in Database.jsm by making executeIDB take an option to indicate the transaction can be aborted, and then adding that option to callers that do read-only transactions. executeIDB can use that option to keep track of pending read transactions, and when the async shutdown promise getter trips we can call abort on all those transactions. Per the spec, it seems this should cause an abort event which executeIDB can listen for, and it can reject its promise with an error in that case.

After the changes in bug 1598924 doing so should automatically throw and stop processing whatever we were doing, and then call .close(). I think this should always be fine for reads on the RS databases, though we should perhaps audit the callsites of the read-only methods from the previous paragraph to ensure there aren't problematic consumer patterns where they do multiple writes interspersed with reads that aren't in the same transaction - of course, that would already be bad, because both the writes and reads can already be aborted/throw for other reasons.

See Also: → 1621759
Whiteboard: [fxperf]
Blocks: 1624878
Assignee: nobody → gijskruitbosch+bugs
Status: NEW → ASSIGNED
Pushed by gijskruitbosch@gmail.com: https://hg.mozilla.org/integration/autoland/rev/15a52d0bf116 abort pending read transactions from the DB when quitting, r=asuth,leplatrem
Backout by csabou@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/c012372b0db2 Backed out changeset 15a52d0bf116 for bc failures on browser_RecipeRunner.js

Note: there were also xpcshell failures:

TEST-UNEXPECTED-FAIL | xpcshell-legacyconfig.ini:toolkit/components/search/tests/xpcshell/test_selectedEngine.js | test_fallback_kept_after_restart - [test_fallback_kept_after_restart : 265] A promise chain failed to handle a rejection: A mutation operation was attempted on a database that did not allow mutations. - stack: ensureShutdownBlocker/<@resource://services-settings/Database.jsm:534:21

The issues with browser_RecipeRunner.js are easy to fix - there we can just await the calls to db.close() (which they're not right now), and then we stop trying to reuse a db we're in the middle of closing.

However, fundamentally all the methods on Database (now InternalDatabase) share their access to _idb. And all of the ones that want that object call await this.open(). That's great. The problem is that a bunch of the operations they then proceed to do are asynchronous, and so if there are multiple consumers using the DB via remotesettings, one of them can end up calling this.db.close() while one of the others (using the same db!) is mid-operation. This does not go well.

Mathieu, how do you want this part of remote settings to work? Should we somehow keep track of pending operations and make Database.close a no-op as long as at least one operation is still ongoing?

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(mathieu)

As a general FYI, the IndexedDB backend will automatically close the underlying database connections if they're idle for a while and reopen it on demand. It would probably be fine to keep the database around once the consumer triggers the open, and then hold onto it until shutdown as long as there's an expectation that there will be future database accesses.

Also, re: database closing, note that IDB's close() doesn't actually take effect "until all outstanding transactions have completed" so as long as things are happening within a transaction, things should generally keep working.

Ah, I think I've figured it out. I think this is a data race in my patch. This happens:

  1. call A calls close on the DB (which async-awaits the cached _idb ref)
  2. call B asks for a DB (which also async-awaits the cached _idb ref)

(microtask checkpoint runs, and A+B get their db)

  1. call A closes the DB
  2. call B tries to start a transaction on the closed DB

If calling close clears the this._idb property sync, instead of only doing so after await'ing it, this problem should go away, because then in step 2, B gets a fresh ref.

If the microtask queue goes the "other way", so B runs before A, this already can't happen:

  1. call B asks for a DB (awaits this._idb)
  2. call A calls close, clearing _idb and await'ing it

(microtask checkpoint runs, and A+B get their db)

  1. call B starts a transaction on the DB.
  2. call A calls close, which per Andrew in comment #9 will wait for the transaction to finish.

So apparently comment #10, while sounding plausible, is still wrong - it's possible for the close to happen between when we call open() (getting a cached promise) and when the promise resolves and we actually try to use said DB...

(In reply to :Gijs (he/him) from comment #11)

So apparently comment #10, while sounding plausible, is still wrong - it's possible for the close to happen between when we call open() (getting a cached promise) and when the promise resolves and we actually try to use said DB...

OK, this took me a surprising amount of time to work out, but it's because open is declared async. See also this (to me, surprising) minimal example: https://jsbin.com/ceyiquxeqe/edit?html,js,output

Declaring open as a sync function (that just returns a promise) makes this issue go away. The fact that we take an extra turn through the event loop / extra microtask queue to get the DB if open is declared async is... surprising, I think, and feels fragile.

I'm not really sure what to do about that though, short of re-architecting Database.jsm and remote settings to somehow never call close, or keep track of pending transactions and delaying close until all pending transactions have finished...

(In reply to Andrew Sutherland [:asuth] (he/him) from comment #9)

As a general FYI, the IndexedDB backend will automatically close the underlying database connections if they're idle for a while and reopen it on demand. It would probably be fine to keep the database around once the consumer triggers the open, and then hold onto it until shutdown as long as there's an expectation that there will be future database accesses.

I would do this (ie just not call close) if it wasn't for the other issues that we've had in this space, where IDB seems to take 30 seconds, even during shutdown, to decide the connection is idle and to close it...

That said, not all the operations in RS currently call close when done, so this generally feels like it's already messy.

(In reply to :Gijs (he/him) from comment #12)

I would do this (ie just not call close) if it wasn't for the other issues that we've had in this space, where IDB seems to take 30 seconds, even during shutdown, to decide the connection is idle and to close it...

To clarify, I meant calling close (and abort) at your component's shutdown, not never calling close.

Whiteboard: [fxperf] → [fxperf:p1]

If I can't manage to get this into 76 before we branch I'll probably be looking to uplift.

Flags: needinfo?(mathieu)

Andrew, Mathieu: just to check, this should now be back in your queues...

Flags: needinfo?(mathieu)
Flags: needinfo?(bugmail)

I'm now running into this assertion failure:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=a99771ff5245ab7e6acd02e4b8320a3bda05270a&selectedJob=295911786

GECKO(1226) | Assertion failure: data.mRecursionDepth > mBaseRecursionDepth, at /builds/worker/checkouts/gecko/xpcom/base/CycleCollectedJSContext.cpp:546

and not 100% sure what to do about that...

Flags: needinfo?(mathieu)
See Also: → 1626935
Flags: needinfo?(bugmail)
Pushed by gijskruitbosch@gmail.com: https://hg.mozilla.org/integration/autoland/rev/7312dfafef58 abort pending read transactions from the DB when quitting, r=asuth,leplatrem
Pushed by malexandru@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/825f035018c2 follow-up: add missing Array.from call a=fix
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 76
Regressions: 1627179
Blocks: 1621759
See Also: 1621759
Blocks: 1629005
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: