Open Bug 1619480 Opened 6 years ago Updated 1 year ago

Decide on how to handle nested transactions

Tracking

()

Status:

NEW

People

(Reporter: lina, Unassigned)

References

Details

Lina Butler (ex-Mozilla)

Reporter

Description

•

6 years ago

•

Edited

Currently, a mozStorageConnection can be used either synchronously or asynchronously, from three different types of consumers—C++ (via the mozStorage XPCOM API), JS (via Sqlite.jsm), and Rust (via the mozStorage Rust bindings). Allowing sync and async statements to run on the same connection has implications for how we handle transactions, which we haven't really formalized. Let's use this bug to document what we currently have, the problems with it, and some possible solutions and their trade-offs. We might eventually close this as WONTFIX, or morph it into a metabug, with concrete bugs as dependencies.

To recap, mozStorage has two different types of connections: sync and async:

Sync connections implement mozIStorageConnection, which inherits from mozIStorageAsyncConnection. These provide access to extra synchronous APIs, which execute SQL statements directly on the current thread. Using these sync APIs from the main thread is allowed, but a Bad Idea. They also implement the async API, and can be used asynchronously just like async-only connections.
Async-only connections manage their own background thread, which they use to execute all SQL statements. You get an async-only connection if you call asyncClone on an existing storage connection (sync or async), or if you call Services.storage.openAsyncDatabase. The async API is callback-based, and a little clunky to use from JS, so we also provide a Sqlite.jsm wrapper that exposes an idiomatic API on top of it. Async-only connections also implement mozIStorageConnection (and thus expose sync methods), but attempting to call one from the main thread asserts. However, it's perfectly fine to call sync methods on an async-only connection from a background thread! History does this (FixAndDecayFrecencyRunnable is executed on the connection's async thread), as does bookmark sync.

So, what happens when a storage connection—like the Places connection, which is synchronous—has multiple consumers: some in C++, some in JS, some in Rust? The consumers are unaware of one another, which is fine for the most part, until we go to run a transaction. Consider this case:

Sqlite.jsm executes BEGIN TRANSACTION on the connection's async thread. This succeeds.
On the next tick of the event loop, a C++ consumer tries to use the connection synchronously, from the main thread. It tries to execute BEGIN, which fails because we already opened a transaction in (1), and SQLite transactions can't nest!
On the next tick, Sqlite.jsm executes an INSERT, UPDATE, or DELETE. This runs another statement on the async thread.
Two more ticks pass, with more INSERT and UPDATEs executed on the async thread.
One more tick passes, and Sqlite.jsm issues a COMMIT.

This seems reasonable, but what if the call in step 2 was "insert this bookmark"? (Thankfully, we've removed most of the sync methods from Places in bug 1083465, but still keep some around to support the tagging service—see bug 424160 for that). Now we've lost a write, and with no real way to communicate that to the user! Or, what if step 2 issued a ROLLBACK instead? That would work, since Sqlite.jsm opened a transaction, but now we've lost isolation, because steps 3 and 4 aren't protected by a transaction. Another consumer might run its own statements between (3) and (4), and see that partial state. This is called a dirty read. And step 5 would fail with a "no transaction is active" error.

Sqlite.jsm does manage its own per-connection transaction queue, so two independent calls to executeTransaction will be serialized. However, that queue doesn't know anything about C++ or Rust consumers, or, indeed, whether a transaction is currently active.

We can also invert the above scenario slightly:

A C++ caller executes BEGIN on one thread.
On the same tick of the event loop—this time, we're using the connection synchronously—C++ executes an INSERT, then yields.
One tick passes, and Sqlite.jsm goes to start its own transaction. As above, this fails.
Another tick passes, and control returns to C++, which executes COMMIT. This succeeds.

So it appears our choices are to either allow dirty reads and writes, or prevent some writes from succeeding, potentially losing data. What do we do about this now?

The C++ API has a mozStorageTransaction RAII helper that tries to BEGIN a transaction. However, if that fails, _it assumes a transaction is already open_, and goes ahead and makes any changes as part of the active transaction. When it's done, it sees it didn't open a transaction, and doesn't issue a COMMIT. That last part is important; if it were to issue a COMMIT (or ROLLBACK), it would change the transaction out from underneath whoever started it, like Sqlite.jsm, allowing for dirty reads as explained above. But, in exchange, this becomes a dirty write instead of a dirty read. Our changes might have invalidated the assumptions of the Sqlite.jsm consumer, and might result in writing inconsistent data when Sqlite.jsm commits. Worse, if Sqlite.jsm rolls the transaction back, our changes will be rolled back, too.
Sqlite.jsm has a transaction queue for all connections. For connections opened via Sqlite.openConnection, attempting to BEGIN a transaction when one is already in progress will throw, which is good. However, for existing mozStorage connections wrapped via Sqlite.wrapStorageConnection (the Places connection does this!), we have the same problem as C++—Sqlite.jsm can't know if a transaction is active or not. So it does the same thing as mozStorageTransaction, and plows ahead and makes changes as part of the current transaction.
Rust doesn't do anything special to manage transactions, or handle wrapped vs. non-wrapped connections, and will return an error if one is already open.

Ugh. So the problem isn't just that transactions can't nest, it's that using a connection from multiple threads (sync and async) that aren't aware of each other's transactions causes problems.

Mak summarized a few solutions in this excellent review comment, and I'll restate them here for completeness. 😊

What about SAVEPOINTs? Those can nest, but we run into the same issues mixing sync and async calls. If a sync statement starts a savepoint, then an async statement starts another savepoint, then the event loop yields back to another sync statement...it'll make whatever changes as part of the (wrong) async savepoint. However, savepoints will work for async-only connections, and mean we won't have to maintain our own transaction queue.

What if we go all the way and swallow all errors when we BEGIN or COMMIT? Now we end up with dirty writes and dirty writes, and a consumer can't know if the changes it's making are actually isolated from other consumers. It means another consumer can commit your partial changes, or roll them back out from underneath you.

What if we implement our own transaction API, and transparently call BEGIN and COMMIT for you? This avoids the problem above, where a transaction might get committed or rolled back without the consumer knowing, but still has the dirty writes problem, since any nested changes can't be rolled back—they're not part of a transaction—and a RAISE(ROLLBACK, ...) in a trigger or an automatic rollback by SQLite (search for the "Response To Errors Within A Transaction" heading on that page) can still roll a transaction back underneath us.

Marco Bonardo [:mak]

Comment 1

•

6 years ago

Thank you for the well written summary about this problem.

Priority: -- → P3

Lina Butler (ex-Mozilla)

Reporter

Updated

•

6 years ago

Comment 2

•

6 years ago

Yes, excellent write-up!

I think the best practice right now for SQLite usage in-tree is to have a database connection held by a single thread/task queue which C++/rust runnables are dispatched at and which synchronously run their transaction in what one might call an "immediate mode" fashion. And if there are reads that may need to happen which can't wait for that transaction, a WAL database is used and a separate thread/task queue gets its own SQLite connection which leverages the concurrency provided by WAL mode and also runs in "immediate mode".

Historically it sounded like Places has been unable to do this because of:

The need to do some things synchronously on the main thread which inherently would need to avoid blocking on transactions because that would jank the main thread more than necessary.
JS code that lives on the main thread for various reasons. (Presumably XPConnect only being main-thread being a significant factor, but presumably callers might also already have been main thread and the implications of remoting not trivial.)
(mozStorage)

Any plan for having a transaction queue or a write baton isn't going to work if Places needs to poke holes through it.

I expect asking for a similar Places write-up is likely an unreasonably large request, so it's probably most practical if a Places expert could indicate what stumbling blocks there are to implementing something like the following:

Create some kind of transaction queue helper for mozStorage with an API surface like:
- Promise requestAsyncWriteTransaction(Promise completionPromise): This is perhaps a bad signature, but the basic idea would be that JS callers could receive a promise that resolves when it's their turn to use the underling async storage connection, and when they're done (or if they give up on waiting for the database) they resolve the completionPromise.
- queueWriteTransactionRunnable(nsIRunnable *aRunnable): Provide a runnable that will be run on the async database thread with exclusive access to the database until it's done. The main thing this does beyond that status quo is avoid weird interleaving with the requestAsyncWriteTransaction consumers.
There could also be read-only variants of the above API calls so that a separate read-only connection could be used and could be wrapped in a read-only transaction so that it sees a coherent database state the entire time.
The nefarious main-thread code could use whatever connection suits it.

I should also raise that it's totally possible for us to expose a synchronous mozStorage API to ChromeWorkers via WebIDL, and that the ChromeWorker thread could be the/a database thread. (We can't randomly run workers on any thread, but we can dispatch random runnables at worker threads.)

BMO Automation

Updated

•

3 years ago

Severity: normal → S3

BMO Automation

Updated

•

1 year ago

Product: Toolkit → Core

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Decide on how to handle nested transactions

Categories

(Core :: SQLite and Embedded Database Bindings, task, P3)

Tracking

()

People

(Reporter: lina, Unassigned)

References

Details

Crash Data

Security

(public)

User Story

Description

Comment 1

Updated

Comment 2

Updated

Updated