Use WAL journal mode for IndexedDB

RESOLVED FIXED in Firefox 40

Status

()

Core
DOM: IndexedDB
RESOLVED FIXED
4 years ago
2 years ago

People

(Reporter: gps, Assigned: Ben Turner (not reading bugmail, use the needinfo flag!))

Tracking

Trunk
mozilla40
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox40 fixed)

Details

Attachments

(2 attachments, 1 obsolete attachment)

(Reporter)

Description

4 years ago
I was poking around the IndexedDB source and noticed it isn't using SQLite's WAL journal. The WAL journal will likely make things much, much faster and will incur fewer fsync()s during writes. I suggest IndexedDB switch to using WAL mode.

See bug 830492 for some discussion on WAL and what it can do.
We didn't do this originally because WAL was brand new when we were writing IndexedDB. We also use a few options together that we weren't convinced would be compatible with it (unlock_notify, 'immediate' transactions, shared cache, and lots and lots of threads). I don't think we have ever tried WAL with that combination before so we could of course find bugs, but I've chatted with the SQLite folks and there's at least one other big project that uses similar options. Given that I guess we could give it a try.

However, it doesn't seem clear that IndexedDB is really going to benefit from it.

IndexedDB is supposed to be a generic database solution for web apps so it's hard to guess whether they will be doing more reading or more writing. Figuring out a checkpoint strategy sounds hard. Furthermore, IndexedDB promises durability so every commit must be followed by a call to fsync. I guess the rollback journal requires two per commit, but they should happen in quick succession so I wonder if that makes much difference. There are also warnings in the docs about using WAL with large write transaction that could easily happen since we can't predict web app behavior.

And the other main benefit I see (MVCC) is not something that IndexedDB can really use completely. It's nice that we could allow a write to start while a previous read is in progress, but the opposite case (write in progress when a read begins) is something that we would have to block.

So yeah, we may want to play with this, but I don't yet see a compelling case for it. I imagine some web apps could benefit and some wouldn't.
(Reporter)

Comment 2

4 years ago
When you see "large writes" in SQLite's docs, they are usually referring to transactions >10MB if not >100MB. I suspect their definition of "large" might be a magnitude larger than IndexedDB's. Although, I literally have no clue how IndexedDB is being used in the wild.
Sure... But syncing my email account could do that easily.
Blocks: 1131766
Created attachment 8562311 [details] [diff] [review]
Patch, v1

This switches us to use WAL mode for IndexedDB. I'd rather not land this by itself, but together with bug 1112702.
Assignee: nobody → bent.mozilla
Status: NEW → ASSIGNED
Attachment #8562311 - Flags: review?(Jan.Varga)
(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #4)
> I'd rather not land this by itself, but together with bug 1112702.

Oops, meant bug 1131766.
Blocks: 1112702
Depends on: 1131776
I'd suggest to evaluate PRAGMA journal_size_limit, or the wal will never shrink, and in some cases it can grow up to 7 times the size of the database (clearly then something should be done for the statement causing that, but still...)
(In reply to Marco Bonardo [::mak] (needinfo? me) from comment #6)
> I'd suggest to evaluate PRAGMA journal_size_limit, or the wal will never
> shrink

Yeah, initially I did this, but the final approach doesn't require it. We have two mechanisms that control the WAL size:

1. Quota control. The WAL file for IDB will be placed under quota control, so if it gets larger than the allotted space for the origin that is using it we will fail and roll everything back.
2. TRUNCATE-level checkpointing happens a few seconds after the database goes idle. This will reclaim all space used by the WAL, but will avoid doing so if the db is under heavy use. This is implemented in bug 1131766.

Comment 8

3 years ago
Comment on attachment 8562311 [details] [diff] [review]
Patch, v1

Review of attachment 8562311 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good to me, although I think I found a bug in TelemetryVFS.cpp

I ran all the tests and only test_temporary_storage.js didn't pass. The test assumes that we track quota for database files only. This patch adds WAL to quota tracking which confuses the test.
However I think there's more serious problem with WAL and quota tracking. So far we deleted sqlite databases manually in our IDB implementation, so we could also update quota accordingly. With the WAL, the virtual file system calls xDelete for the WAL when it's deleted. We don't have any hook in xDelete() at the moment, so the quota object is never updated!

::: dom/indexedDB/ActorsParent.cpp
@@ +2289,5 @@
> +    if (NS_WARN_IF(NS_FAILED(rv))) {
> +      return rv;
> +    }
> +  }
> +#endif 

extra space

@@ +2400,5 @@
> +#endif
> +};
> +
> +template <template <class> class SmartPtr, class FileOrURLType>
> +struct StorageOpenTraits<SmartPtr<FileOrURLType>> 

extra space

@@ +9609,5 @@
>      rv = file->GetLeafName(leafName);
>      if (NS_WARN_IF(NS_FAILED(rv))) {
>        return rv;
>      }
>  

This will skip even directories named .DS_Store, *.sqlite-journal, etc.
Not a big deal, but is there a reason why you want it before the isDirectory check ?

@@ +13258,5 @@
>  
>      rv = fmDirectory->Remove(true);
>      if (NS_WARN_IF(NS_FAILED(rv))) {
> +      // We may have deleted some files, check if we can and update quota
> +      // information before returning the error.

Good catch

::: storage/src/TelemetryVFS.cpp
@@ +161,5 @@
> +   *     - \0
> +   *   - \0
> +   *   - Journal Path
> +   *   - \0
> +   *   - WAL Path (zName)

zName or zWALName ?
Attachment #8562311 - Flags: review?(Jan.Varga) → review+

Comment 9

3 years ago
(In reply to Jan Varga [:janv] from comment #8)

> I ran all the tests and only test_temporary_storage.js didn't pass. The test
> assumes that we track quota for database files only. This patch adds WAL to
> quota tracking which confuses the test.
> However I think there's more serious problem with WAL and quota tracking. So
> far we deleted sqlite databases manually in our IDB implementation, so we
> could also update quota accordingly. With the WAL, the virtual file system
> calls xDelete for the WAL when it's deleted. We don't have any hook in
> xDelete() at the moment, so the quota object is never updated!

Fortunately, we can use DatabasePathFromWALPath() in xDelete() to get the quota info.

The former problem with test_temporary_storage.js (WAL added to quota tracking) is a bit harder.
The simplest solution would be to just disable WAL quota tracking for this test. In that case we need a new test that catches the xDelete() problem.
(In reply to Jan Varga [:janv] from comment #8)
> This looks good to me, although I think I found a bug in TelemetryVFS.cpp

Yep, indeed! Thanks for finding that. I'll fix it.

> This will skip even directories named .DS_Store, *.sqlite-journal, etc.
> Not a big deal, but is there a reason why you want it before the isDirectory
> check ?

At the time I was thinking that it was a waste to stat the files, but meh. I changed the order back.

> zName or zWALName ?

zWALName!
Created attachment 8568880 [details] [diff] [review]
Review fixes [not for checkin]
Attachment #8562311 - Attachment is obsolete: true
Created attachment 8568881 [details] [diff] [review]
Patch, v1.1
Attachment #8568881 - Flags: review+
Attachment #8568880 - Attachment description: Review fixes → Review fixes [not for checkin]

Comment 13

3 years ago
Comment on attachment 8568880 [details] [diff] [review]
Review fixes [not for checkin]

Review of attachment 8568880 [details] [diff] [review]:
-----------------------------------------------------------------

::: storage/src/TelemetryVFS.cpp
@@ +305,5 @@
> +  QuotaManager *quotaManager = QuotaManager::Get();
> +  MOZ_ASSERT(quotaManager);
> +
> +  return quotaManager->GetQuotaObject(
> +      PersistenceTypeFromText(nsDependentCString(persistenceType)),

Nit: indentation
Attachment #8568880 - Flags: review+

Comment 14

3 years ago
Comment on attachment 8568881 [details] [diff] [review]
Patch, v1.1

Review of attachment 8568881 [details] [diff] [review]:
-----------------------------------------------------------------

This patch doesn't have the temp_store pragma. Are there any problems with it ?

Comment 15

3 years ago
Comment on attachment 8568881 [details] [diff] [review]
Patch, v1.1

Review of attachment 8568881 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/indexedDB/ActorsParent.cpp
@@ +9777,5 @@
> +      rv = initInfo.mDatabaseWALFile->GetFileSize(&fileSize);
> +      if (NS_SUCCEEDED(rv)) {
> +        MOZ_ASSERT(fileSize >= 0);
> +        aUsageInfo->AppendToDatabaseUsage(uint64_t(fileSize));
> +      } else if (NS_WARN_IF(NS_FAILED(rv) && rv != NS_ERROR_FILE_NOT_FOUND)) {

You also need to check NS_ERROR_FILE_TARGET_DOES_NOT_EXIST, otherwise the method fails on mac.
(In reply to Jan Varga [:janv] from comment #14)
> This patch doesn't have the temp_store pragma. Are there any problems with
> it ?

No, it was an experiment that I didn't mean to leave in. SQLite does as much in memory as it can and then spills to disk in extreme cases with the default setting. Changing it to memory-only mode gives you OOM errors instead, so it doesn't seem worth it to me.
(In reply to Jan Varga [:janv] from comment #15)
> You also need to check NS_ERROR_FILE_TARGET_DOES_NOT_EXIST, otherwise the
> method fails on mac.

Sadface.

Comment 18

3 years ago
I found another bug while looking at test_temporary_storage.js
When a new database is being created, WAL is created first. Let's say that WAL is created successfully.
Then during checkpointing the database file is created. However if sqlite can't finish creating of the database file we can end up with very small .sqlite file which seems to be good enough to open in sqlite3, but it doesn't have our IDB schema. In my testing the file size 8192 and it has really simple schema:
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
COMMIT;

So then when I try to delete such database from JS, I see this error in the console:
0:01.27 PROCESS_OUTPUT: Thread-1 (pid:23326) "Assertion failure: (((bool)(__builtin_expect(!!(!NS_FAILED_impl( connection->CreateStatement(static_cast<const nsLiteralCString&>(nsLiteralCString( "SELECT name " "FROM database")), getter_AddRefs(stmt)))), 1)))), at /Users/varga/Sources/Mozilla/dom/indexedDB/ActorsParent.cpp:18110"

Clearly, we were able to open mozStorageConnection, but the database table doesn't exist.
(In reply to Jan Varga [:janv] from comment #18)
> Clearly, we were able to open mozStorageConnection, but the database table
> doesn't exist.

This can actually happen with non-WAL too. I filed bug 1143830 on it.
https://hg.mozilla.org/integration/mozilla-inbound/rev/cbd862dd036f
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/beb016173f6f - something in there appears to have made browser_storage_dynamic_windows.js (in devtools-chrome-4) on Win7 debug (almost all the time) and Win8 debug (more rarely) hit "Assertion failure: !mAsyncExecutionThread (AsyncClose has not been invoked on this connection!)".
Flags: needinfo?(bent.mozilla)
Hrm, IndexedDB doesn't use async mozStorage... mak, any idea what's going on with that failure?
Flags: needinfo?(mak77)
I took a brief look earlier today but unfortunately without logging enabled and given our assertion not telling us the database in question, it's really not clear from the logs what code went wrong.  (But it is clear some code is active that creates an asynchronous connection or uses a connection in an asynchronous way and then "leaks" it without calling AsyncClose.  I'd accuse some JS code.)

I'd suggest running locally with the patch-set applied for that test to see what is doing suspicious stuff.  Or somehow making a try build do that.
(In reply to Andrew Sutherland [:asuth] from comment #23)
> I'd suggest running locally with the patch-set applied for that test to see
> what is doing suspicious stuff.  Or somehow making a try build do that.

That is, with storage PR_LOG logging enabled to stdout/a file.
(In reply to Andrew Sutherland [:asuth] from comment #24)

This is a storage problem, bug 1071360.
Flags: needinfo?(bent.mozilla)
I don't have any more insight than what Andrew provided.  I guess investigation will continue in bug 1071360.
Flags: needinfo?(mak77)
https://hg.mozilla.org/integration/mozilla-inbound/rev/8a8e8efa5d38
Backed out for frequent bc1 oranges
https://treeherder.mozilla.org/logviewer.html#?job_id=8256324&repo=mozilla-inbound

https://hg.mozilla.org/integration/mozilla-inbound/rev/65b1df3a23c7
Flags: needinfo?(bent.mozilla)
(In reply to Wes Kocher (:KWierso) from comment #28)
> Backed out for frequent bc1 oranges
> https://treeherder.mozilla.org/logviewer.html#?job_id=8256324&repo=mozilla-
> inbound

Actually bug 1112620
Flags: needinfo?(bent.mozilla)
https://hg.mozilla.org/integration/mozilla-inbound/rev/46c549bb6343
https://hg.mozilla.org/mozilla-central/rev/46c549bb6343
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
status-firefox40: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
Sorry for the late question, since you moved to WAL, did you consider locking_mode = EXCLUSIVE? It would remove the need for shm files (that have been reported to be problematic on some network fs) and a tiny perf gain. I think we're still in a situation where only one process touches the connection, right?
Flags: needinfo?(bent.mozilla)
(In reply to Marco Bonardo [::mak] from comment #32)

Yep, I tried to do this early on. The problem is that we still grab the database from two different threads, and EXCLUSIVE doesn't allow that. Hopefully we're going to drop that second thread in the near future and then I can use EXCLUSIVE.
Flags: needinfo?(bent.mozilla)
You need to log in before you can comment on or make changes to this bug.