Bug 1619480 Comment 0 Edit History

Note: The actual edited comment in the bug view page will always show the original commenter’s name and original timestamp.

Original comment by

Lina Butler (ex-Mozilla)

on 2020-03-02 17:47:11 PST

Currently, a `mozStorageConnection` can be used either synchronously or asynchronously, from three different types of consumers—C++ (via the mozStorage [XPCOM API](https://searchfox.org/mozilla-central/rev/91f6c02fcf4c16f78fdc4417f61f192688294066/storage/mozIStorageConnection.idl)), JS (via [`Sqlite.jsm`](https://searchfox.org/mozilla-central/rev/91f6c02fcf4c16f78fdc4417f61f192688294066/toolkit/modules/Sqlite.jsm)), and Rust (via the [mozStorage Rust bindings](https://searchfox.org/mozilla-central/rev/91f6c02fcf4c16f78fdc4417f61f192688294066/storage/rust/src/lib.rs)). Allowing sync and async statements to run on the same connection has implications for how we handle transactions, which we haven't really formalized. Let's use this bug to document what we currently have, the problems with it, and some possible solutions and their trade-offs. We might eventually close this as `WONTFIX`, or morph it into a metabug, with concrete bugs as dependencies.

To recap, mozStorage has two different types of connections: sync and async:

* Sync connections implement [`mozIStorageConnection`](https://searchfox.org/mozilla-central/rev/91f6c02fcf4c16f78fdc4417f61f192688294066/storage/mozIStorageConnection.idl), which inherits from [`mozIStorageAsyncConnection`](https://searchfox.org/mozilla-central/rev/91f6c02fcf4c16f78fdc4417f61f192688294066/storage/mozIStorageAsyncConnection.idl). These provide access to extra synchronous APIs, which execute SQL statements directly on the current thread. Using these sync APIs from the main thread is allowed, but a Bad Idea. They _also_ implement the async API, and can be used asynchronously just like async-only connections.
* Async-only connections manage their own [background thread](https://searchfox.org/mozilla-central/rev/91f6c02fcf4c16f78fdc4417f61f192688294066/storage/mozStorageConnection.cpp#574-596), which they use to execute all SQL statements. You get an async-only connection if you call [`asyncClone`](https://searchfox.org/mozilla-central/rev/91f6c02fcf4c16f78fdc4417f61f192688294066/storage/mozStorageConnection.cpp#1427-1429,1450-1453) on an existing storage connection (sync or async), or if you call [`Services.storage.openAsyncDatabase`](https://searchfox.org/mozilla-central/rev/91f6c02fcf4c16f78fdc4417f61f192688294066/storage/mozStorageService.cpp#498-501,581-582). The async API is callback-based, and a little clunky to use from JS, so we also provide a `Sqlite.jsm` wrapper that exposes an idiomatic API on top of it. Async-only connections _also_ implement `mozIStorageConnection` (and thus expose sync methods), but attempting to [call one from the _main thread_ asserts](https://searchfox.org/mozilla-central/rev/91f6c02fcf4c16f78fdc4417f61f192688294066/storage/mozStorageConnection.cpp#952-962). However, it's perfectly fine to call sync methods on an async-only connection from a background thread! History does this ([`FixAndDecayFrecencyRunnable`](https://searchfox.org/mozilla-central/rev/91f6c02fcf4c16f78fdc4417f61f192688294066/toolkit/components/places/nsNavHistory.cpp#216-220,238-252,254-260,267-272,274-280,282-283) is executed on the connection's async thread), as does [bookmark sync](https://searchfox.org/mozilla-central/rev/91f6c02fcf4c16f78fdc4417f61f192688294066/toolkit/components/places/bookmark_sync/src/merger.rs#96-111,202-205).

So, what happens when a storage connection—like the Places connection, which is synchronous—has multiple consumers: some in C++, some in JS, some in Rust? The consumers are unaware of one another, which is fine for the most part, _until_ we go to run a transaction. Consider this case:

1. `Sqlite.jsm` executes `BEGIN TRANSACTION` on the connection's async thread. This succeeds.
2. On the next tick of the event loop, a C++ consumer tries to use the connection synchronously, from the main thread. It tries to execute `BEGIN`, which fails because we already opened a transaction in (1), and SQLite transactions can't nest!
3. On the next tick, `Sqlite.jsm` executes an `INSERT`, `UPDATE`, or `DELETE`. This runs another statement on the async thread.
4. Two more ticks pass, with more `INSERT` and `UPDATE`s executed on the async thread.
5. One more tick passes, and `Sqlite.jsm` issues a `COMMIT`.

This seems reasonable, but what if the call in step 2 was "insert this bookmark"? (Thankfully, we've removed most of the sync methods from Places in bug 1083465, but still keep some around to [support the tagging service](https://searchfox.org/mozilla-central/rev/91f6c02fcf4c16f78fdc4417f61f192688294066/toolkit/components/places/TaggingService.jsm#44-50,173-180,184-188,194,196,226,262,412)—see bug 424160 for that). Now we've lost a write, and with no real way to communicate that to the user! Or, what if step 2 issued a `ROLLBACK` instead? That _would_ work, since `Sqlite.jsm` opened a transaction, but now we've lost isolation, because steps 3 and 4 aren't protected by a transaction. Another consumer might run its own statements between (3) and (4), and see that partial state. This is called a [_dirty read_](https://en.wikipedia.org/wiki/Isolation_%28database_systems%29#Dirty_reads). And step 5 would fail with a "no transaction is active" error.

`Sqlite.jsm` _does_ [manage its own per-connection _transaction queue_](https://searchfox.org/mozilla-central/rev/91f6c02fcf4c16f78fdc4417f61f192688294066/toolkit/modules/Sqlite.jsm#269-271,611,711-715), so two independent calls to `executeTransaction` will be serialized. However, that queue doesn't know anything about C++ or Rust consumers, or, indeed, whether a transaction is currently active.

We can also invert the above scenario slightly:

1. A C++ caller executes `BEGIN` on one thread.
2. On the same tick of the event loop—this time, we're using the connection synchronously—C++ executes an `INSERT`, then yields.
3. One tick passes, and `Sqlite.jsm` goes to start its own transaction. As above, this fails.
4. Another tick passes, and control returns to C++, which executes `COMMIT`. This succeeds.

So it appears our choices are to either allow dirty reads and writes, or prevent some writes from succeeding, potentially losing data. What do we do about this now?

* The C++ API has a [`mozStorageTransaction`](https://searchfox.org/mozilla-central/rev/91f6c02fcf4c16f78fdc4417f61f192688294066/storage/mozStorageHelper.h#57) [RAII](https://en.wikipedia.org/wiki/Resource_acquisition_is_initialization) helper that tries to `BEGIN` a transaction. However, if that fails, _it [assumes a transaction is already open_](https://searchfox.org/mozilla-central/rev/91f6c02fcf4c16f78fdc4417f61f192688294066/storage/mozStorageHelper.h#87-89), and **goes ahead and makes any changes as part of the active transaction**. When it's done, it sees it didn't open a transaction, and [_doesn't_ issue a `COMMIT`[(https://searchfox.org/mozilla-central/rev/91f6c02fcf4c16f78fdc4417f61f192688294066/storage/mozStorageHelper.h#94). That last part is important; if it were to issue a `COMMIT` (or `ROLLBACK`), it would change the transaction out from underneath whoever started it, like `Sqlite.jsm`, allowing for dirty reads as explained above. But, in exchange, this becomes a _dirty write_ instead of a dirty read. Our changes might have invalidated the assumptions of the `Sqlite.jsm` consumer, and might result in writing inconsistent data when `Sqlite.jsm` commits. Worse, if `Sqlite.jsm` rolls the transaction back, _our changes will be rolled back, too_.
* `Sqlite.jsm` has a transaction queue for all connections. For connections opened via `Sqlite.openConnection`, attempting to `BEGIN` a transaction when one is already in progress [will throw](https://searchfox.org/mozilla-central/rev/91f6c02fcf4c16f78fdc4417f61f192688294066/toolkit/modules/Sqlite.jsm#644-648), which is good. However, for existing mozStorage connections wrapped via `Sqlite.wrapStorageConnection` (the [Places connection](https://searchfox.org/mozilla-central/rev/91f6c02fcf4c16f78fdc4417f61f192688294066/toolkit/components/places/PlacesUtils.jsm#2049-2058) does this!), we have the same problem as C++—`Sqlite.jsm` can't know if a transaction is active or not. So it [does the same thing as `mozStorageTransaction`](https://searchfox.org/mozilla-central/rev/91f6c02fcf4c16f78fdc4417f61f192688294066/toolkit/modules/Sqlite.jsm#630-642,666,689), and plows ahead and makes changes as part of the current transaction.
* Rust [doesn't do anything special](https://searchfox.org/mozilla-central/rev/91f6c02fcf4c16f78fdc4417f61f192688294066/storage/rust/src/lib.rs#197-204) to manage transactions, or handle wrapped vs. non-wrapped connections, and will return an error if one is already open.

Ugh. So the problem isn't just that transactions can't nest, it's that using a connection from multiple threads (sync and async) that aren't aware of each other's transactions causes problems.

Mak summarized a few solutions in [this excellent review comment](https://phabricator.services.mozilla.com/D63732#1972744), and I'll restate them here for completeness. 😊

What about [`SAVEPOINT`s](https://sqlite.org/lang_savepoint.html)? Those _can_ nest, but we run into the same issues mixing sync and async calls. If a sync statement starts a savepoint, then an async statement starts another savepoint, then the event loop yields back to another sync statement...it'll make whatever changes as part of the (wrong) async savepoint. However, savepoints _will_ work for async-only connections, and mean we won't have to maintain our own transaction queue.

What if we go all the way and swallow all errors when we `BEGIN` or `COMMIT`? Now we end up with dirty writes and dirty writes, and a consumer can't know if the changes it's making are actually isolated from other consumers. It means another consumer can commit your partial changes, or roll them back _out from underneath you_.

What if we implement our own transaction API, and transparently call `BEGIN` and `COMMIT` for you? This avoids the problem above, where a transaction might get committed or rolled back without the consumer knowing, but still has the dirty writes problem, since any nested changes can't be rolled back—they're not part of a transaction—and a [`RAISE(ROLLBACK, ...)` in a trigger](https://sqlite.org/lang_createtrigger.html#raise) or an [automatic rollback by SQLite](https://www.sqlite.org/lang_transaction.html) (search for the "Response To Errors Within A Transaction" heading on that page) can still roll a transaction back underneath us.

Revision 1 by

Lina Butler (ex-Mozilla)

on 2020-03-02 17:51:12 PST

To recap, mozStorage has two different types of connections: sync and async:

We can also invert the above scenario slightly:

So it appears our choices are to either allow dirty reads and writes, or prevent some writes from succeeding, potentially losing data. What do we do about this now?

* The C++ API has a [`mozStorageTransaction`](https://searchfox.org/mozilla-central/rev/91f6c02fcf4c16f78fdc4417f61f192688294066/storage/mozStorageHelper.h#57) [RAII](https://en.wikipedia.org/wiki/Resource_acquisition_is_initialization) helper that tries to `BEGIN` a transaction. However, if that fails, _it [assumes a transaction is already open_](https://searchfox.org/mozilla-central/rev/91f6c02fcf4c16f78fdc4417f61f192688294066/storage/mozStorageHelper.h#87-89), and **goes ahead and makes any changes as part of the active transaction**. When it's done, it sees it didn't open a transaction, and [_doesn't_ issue a `COMMIT`](https://searchfox.org/mozilla-central/rev/91f6c02fcf4c16f78fdc4417f61f192688294066/storage/mozStorageHelper.h#94). That last part is important; if it were to issue a `COMMIT` (or `ROLLBACK`), it would change the transaction out from underneath whoever started it, like `Sqlite.jsm`, allowing for dirty reads as explained above. But, in exchange, this becomes a _dirty write_ instead of a dirty read. Our changes might have invalidated the assumptions of the `Sqlite.jsm` consumer, and might result in writing inconsistent data when `Sqlite.jsm` commits. Worse, if `Sqlite.jsm` rolls the transaction back, _our changes will be rolled back, too_.
* `Sqlite.jsm` has a transaction queue for all connections. For connections opened via `Sqlite.openConnection`, attempting to `BEGIN` a transaction when one is already in progress [will throw](https://searchfox.org/mozilla-central/rev/91f6c02fcf4c16f78fdc4417f61f192688294066/toolkit/modules/Sqlite.jsm#644-648), which is good. However, for existing mozStorage connections wrapped via `Sqlite.wrapStorageConnection` (the [Places connection](https://searchfox.org/mozilla-central/rev/91f6c02fcf4c16f78fdc4417f61f192688294066/toolkit/components/places/PlacesUtils.jsm#2049-2058) does this!), we have the same problem as C++—`Sqlite.jsm` can't know if a transaction is active or not. So it [does the same thing as `mozStorageTransaction`](https://searchfox.org/mozilla-central/rev/91f6c02fcf4c16f78fdc4417f61f192688294066/toolkit/modules/Sqlite.jsm#630-642,666,689), and plows ahead and makes changes as part of the current transaction.
* Rust [doesn't do anything special](https://searchfox.org/mozilla-central/rev/91f6c02fcf4c16f78fdc4417f61f192688294066/storage/rust/src/lib.rs#197-204) to manage transactions, or handle wrapped vs. non-wrapped connections, and will return an error if one is already open.

Ugh. So the problem isn't just that transactions can't nest, it's that using a connection from multiple threads (sync and async) that aren't aware of each other's transactions causes problems.

Mak summarized a few solutions in [this excellent review comment](https://phabricator.services.mozilla.com/D63732#1972744), and I'll restate them here for completeness. 😊

Back to Bug 1619480 Comment 0