Last Comment Bug 866846 - Use WAL journal mode for IndexedDB
: Use WAL journal mode for IndexedDB
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: DOM: IndexedDB (show other bugs)
: Trunk
: All All
-- normal (vote)
: mozilla40
Assigned To: Ben Turner (not reading bugmail, use the needinfo flag!)
:
: Hsin-Yi Tsai [:hsinyi]
Mentors:
Depends on: 1131776
Blocks: 1112702 1131766
  Show dependency treegraph
 
Reported: 2013-04-29 11:53 PDT by Gregory Szorc [:gps] (away until 2017-03-20)
Modified: 2015-04-02 12:14 PDT (History)
9 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
fixed


Attachments
Patch, v1 (49.19 KB, patch)
2015-02-10 13:08 PST, Ben Turner (not reading bugmail, use the needinfo flag!)
jvarga: review+
Details | Diff | Splinter Review
Review fixes [not for checkin] (4.49 KB, patch)
2015-02-24 16:35 PST, Ben Turner (not reading bugmail, use the needinfo flag!)
jvarga: review+
Details | Diff | Splinter Review
Patch, v1.1 (50.06 KB, patch)
2015-02-24 16:36 PST, Ben Turner (not reading bugmail, use the needinfo flag!)
bent.mozilla: review+
Details | Diff | Splinter Review

Description User image Gregory Szorc [:gps] (away until 2017-03-20) 2013-04-29 11:53:32 PDT
I was poking around the IndexedDB source and noticed it isn't using SQLite's WAL journal. The WAL journal will likely make things much, much faster and will incur fewer fsync()s during writes. I suggest IndexedDB switch to using WAL mode.

See bug 830492 for some discussion on WAL and what it can do.
Comment 1 User image Ben Turner (not reading bugmail, use the needinfo flag!) 2013-04-29 17:01:04 PDT
We didn't do this originally because WAL was brand new when we were writing IndexedDB. We also use a few options together that we weren't convinced would be compatible with it (unlock_notify, 'immediate' transactions, shared cache, and lots and lots of threads). I don't think we have ever tried WAL with that combination before so we could of course find bugs, but I've chatted with the SQLite folks and there's at least one other big project that uses similar options. Given that I guess we could give it a try.

However, it doesn't seem clear that IndexedDB is really going to benefit from it.

IndexedDB is supposed to be a generic database solution for web apps so it's hard to guess whether they will be doing more reading or more writing. Figuring out a checkpoint strategy sounds hard. Furthermore, IndexedDB promises durability so every commit must be followed by a call to fsync. I guess the rollback journal requires two per commit, but they should happen in quick succession so I wonder if that makes much difference. There are also warnings in the docs about using WAL with large write transaction that could easily happen since we can't predict web app behavior.

And the other main benefit I see (MVCC) is not something that IndexedDB can really use completely. It's nice that we could allow a write to start while a previous read is in progress, but the opposite case (write in progress when a read begins) is something that we would have to block.

So yeah, we may want to play with this, but I don't yet see a compelling case for it. I imagine some web apps could benefit and some wouldn't.
Comment 2 User image Gregory Szorc [:gps] (away until 2017-03-20) 2013-04-29 17:03:46 PDT
When you see "large writes" in SQLite's docs, they are usually referring to transactions >10MB if not >100MB. I suspect their definition of "large" might be a magnitude larger than IndexedDB's. Although, I literally have no clue how IndexedDB is being used in the wild.
Comment 3 User image Ben Turner (not reading bugmail, use the needinfo flag!) 2013-04-29 17:08:18 PDT
Sure... But syncing my email account could do that easily.
Comment 4 User image Ben Turner (not reading bugmail, use the needinfo flag!) 2015-02-10 13:08:20 PST
Created attachment 8562311 [details] [diff] [review]
Patch, v1

This switches us to use WAL mode for IndexedDB. I'd rather not land this by itself, but together with bug 1112702.
Comment 5 User image Ben Turner (not reading bugmail, use the needinfo flag!) 2015-02-10 13:09:25 PST
(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #4)
> I'd rather not land this by itself, but together with bug 1112702.

Oops, meant bug 1131766.
Comment 6 User image Marco Bonardo [::mak] 2015-02-11 00:57:45 PST
I'd suggest to evaluate PRAGMA journal_size_limit, or the wal will never shrink, and in some cases it can grow up to 7 times the size of the database (clearly then something should be done for the statement causing that, but still...)
Comment 7 User image Ben Turner (not reading bugmail, use the needinfo flag!) 2015-02-11 09:32:45 PST
(In reply to Marco Bonardo [::mak] (needinfo? me) from comment #6)
> I'd suggest to evaluate PRAGMA journal_size_limit, or the wal will never
> shrink

Yeah, initially I did this, but the final approach doesn't require it. We have two mechanisms that control the WAL size:

1. Quota control. The WAL file for IDB will be placed under quota control, so if it gets larger than the allotted space for the origin that is using it we will fail and roll everything back.
2. TRUNCATE-level checkpointing happens a few seconds after the database goes idle. This will reclaim all space used by the WAL, but will avoid doing so if the db is under heavy use. This is implemented in bug 1131766.
Comment 8 User image Jan Varga [:janv] 2015-02-23 15:55:02 PST
Comment on attachment 8562311 [details] [diff] [review]
Patch, v1

Review of attachment 8562311 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good to me, although I think I found a bug in TelemetryVFS.cpp

I ran all the tests and only test_temporary_storage.js didn't pass. The test assumes that we track quota for database files only. This patch adds WAL to quota tracking which confuses the test.
However I think there's more serious problem with WAL and quota tracking. So far we deleted sqlite databases manually in our IDB implementation, so we could also update quota accordingly. With the WAL, the virtual file system calls xDelete for the WAL when it's deleted. We don't have any hook in xDelete() at the moment, so the quota object is never updated!

::: dom/indexedDB/ActorsParent.cpp
@@ +2289,5 @@
> +    if (NS_WARN_IF(NS_FAILED(rv))) {
> +      return rv;
> +    }
> +  }
> +#endif 

extra space

@@ +2400,5 @@
> +#endif
> +};
> +
> +template <template <class> class SmartPtr, class FileOrURLType>
> +struct StorageOpenTraits<SmartPtr<FileOrURLType>> 

extra space

@@ +9609,5 @@
>      rv = file->GetLeafName(leafName);
>      if (NS_WARN_IF(NS_FAILED(rv))) {
>        return rv;
>      }
>  

This will skip even directories named .DS_Store, *.sqlite-journal, etc.
Not a big deal, but is there a reason why you want it before the isDirectory check ?

@@ +13258,5 @@
>  
>      rv = fmDirectory->Remove(true);
>      if (NS_WARN_IF(NS_FAILED(rv))) {
> +      // We may have deleted some files, check if we can and update quota
> +      // information before returning the error.

Good catch

::: storage/src/TelemetryVFS.cpp
@@ +161,5 @@
> +   *     - \0
> +   *   - \0
> +   *   - Journal Path
> +   *   - \0
> +   *   - WAL Path (zName)

zName or zWALName ?
Comment 9 User image Jan Varga [:janv] 2015-02-24 02:42:27 PST
(In reply to Jan Varga [:janv] from comment #8)

> I ran all the tests and only test_temporary_storage.js didn't pass. The test
> assumes that we track quota for database files only. This patch adds WAL to
> quota tracking which confuses the test.
> However I think there's more serious problem with WAL and quota tracking. So
> far we deleted sqlite databases manually in our IDB implementation, so we
> could also update quota accordingly. With the WAL, the virtual file system
> calls xDelete for the WAL when it's deleted. We don't have any hook in
> xDelete() at the moment, so the quota object is never updated!

Fortunately, we can use DatabasePathFromWALPath() in xDelete() to get the quota info.

The former problem with test_temporary_storage.js (WAL added to quota tracking) is a bit harder.
The simplest solution would be to just disable WAL quota tracking for this test. In that case we need a new test that catches the xDelete() problem.
Comment 10 User image Ben Turner (not reading bugmail, use the needinfo flag!) 2015-02-24 14:58:39 PST
(In reply to Jan Varga [:janv] from comment #8)
> This looks good to me, although I think I found a bug in TelemetryVFS.cpp

Yep, indeed! Thanks for finding that. I'll fix it.

> This will skip even directories named .DS_Store, *.sqlite-journal, etc.
> Not a big deal, but is there a reason why you want it before the isDirectory
> check ?

At the time I was thinking that it was a waste to stat the files, but meh. I changed the order back.

> zName or zWALName ?

zWALName!
Comment 11 User image Ben Turner (not reading bugmail, use the needinfo flag!) 2015-02-24 16:35:34 PST
Created attachment 8568880 [details] [diff] [review]
Review fixes [not for checkin]
Comment 12 User image Ben Turner (not reading bugmail, use the needinfo flag!) 2015-02-24 16:36:21 PST
Created attachment 8568881 [details] [diff] [review]
Patch, v1.1
Comment 13 User image Jan Varga [:janv] 2015-02-25 01:45:24 PST
Comment on attachment 8568880 [details] [diff] [review]
Review fixes [not for checkin]

Review of attachment 8568880 [details] [diff] [review]:
-----------------------------------------------------------------

::: storage/src/TelemetryVFS.cpp
@@ +305,5 @@
> +  QuotaManager *quotaManager = QuotaManager::Get();
> +  MOZ_ASSERT(quotaManager);
> +
> +  return quotaManager->GetQuotaObject(
> +      PersistenceTypeFromText(nsDependentCString(persistenceType)),

Nit: indentation
Comment 14 User image Jan Varga [:janv] 2015-02-26 07:15:28 PST
Comment on attachment 8568881 [details] [diff] [review]
Patch, v1.1

Review of attachment 8568881 [details] [diff] [review]:
-----------------------------------------------------------------

This patch doesn't have the temp_store pragma. Are there any problems with it ?
Comment 15 User image Jan Varga [:janv] 2015-03-11 09:18:46 PDT
Comment on attachment 8568881 [details] [diff] [review]
Patch, v1.1

Review of attachment 8568881 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/indexedDB/ActorsParent.cpp
@@ +9777,5 @@
> +      rv = initInfo.mDatabaseWALFile->GetFileSize(&fileSize);
> +      if (NS_SUCCEEDED(rv)) {
> +        MOZ_ASSERT(fileSize >= 0);
> +        aUsageInfo->AppendToDatabaseUsage(uint64_t(fileSize));
> +      } else if (NS_WARN_IF(NS_FAILED(rv) && rv != NS_ERROR_FILE_NOT_FOUND)) {

You also need to check NS_ERROR_FILE_TARGET_DOES_NOT_EXIST, otherwise the method fails on mac.
Comment 16 User image Ben Turner (not reading bugmail, use the needinfo flag!) 2015-03-14 08:52:51 PDT
(In reply to Jan Varga [:janv] from comment #14)
> This patch doesn't have the temp_store pragma. Are there any problems with
> it ?

No, it was an experiment that I didn't mean to leave in. SQLite does as much in memory as it can and then spills to disk in extreme cases with the default setting. Changing it to memory-only mode gives you OOM errors instead, so it doesn't seem worth it to me.
Comment 17 User image Ben Turner (not reading bugmail, use the needinfo flag!) 2015-03-14 08:55:02 PDT
(In reply to Jan Varga [:janv] from comment #15)
> You also need to check NS_ERROR_FILE_TARGET_DOES_NOT_EXIST, otherwise the
> method fails on mac.

Sadface.
Comment 18 User image Jan Varga [:janv] 2015-03-16 12:59:59 PDT
I found another bug while looking at test_temporary_storage.js
When a new database is being created, WAL is created first. Let's say that WAL is created successfully.
Then during checkpointing the database file is created. However if sqlite can't finish creating of the database file we can end up with very small .sqlite file which seems to be good enough to open in sqlite3, but it doesn't have our IDB schema. In my testing the file size 8192 and it has really simple schema:
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
COMMIT;

So then when I try to delete such database from JS, I see this error in the console:
0:01.27 PROCESS_OUTPUT: Thread-1 (pid:23326) "Assertion failure: (((bool)(__builtin_expect(!!(!NS_FAILED_impl( connection->CreateStatement(static_cast<const nsLiteralCString&>(nsLiteralCString( "SELECT name " "FROM database")), getter_AddRefs(stmt)))), 1)))), at /Users/varga/Sources/Mozilla/dom/indexedDB/ActorsParent.cpp:18110"

Clearly, we were able to open mozStorageConnection, but the database table doesn't exist.
Comment 19 User image Ben Turner (not reading bugmail, use the needinfo flag!) 2015-03-16 13:06:42 PDT
(In reply to Jan Varga [:janv] from comment #18)
> Clearly, we were able to open mozStorageConnection, but the database table
> doesn't exist.

This can actually happen with non-WAL too. I filed bug 1143830 on it.
Comment 20 User image Ben Turner (not reading bugmail, use the needinfo flag!) 2015-03-28 18:30:38 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/cbd862dd036f
Comment 21 User image Phil Ringnalda (:philor) 2015-03-29 13:17:24 PDT
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/beb016173f6f - something in there appears to have made browser_storage_dynamic_windows.js (in devtools-chrome-4) on Win7 debug (almost all the time) and Win8 debug (more rarely) hit "Assertion failure: !mAsyncExecutionThread (AsyncClose has not been invoked on this connection!)".
Comment 22 User image Ben Turner (not reading bugmail, use the needinfo flag!) 2015-03-29 20:45:57 PDT
Hrm, IndexedDB doesn't use async mozStorage... mak, any idea what's going on with that failure?
Comment 23 User image Andrew Sutherland [:asuth] 2015-03-29 20:58:00 PDT
I took a brief look earlier today but unfortunately without logging enabled and given our assertion not telling us the database in question, it's really not clear from the logs what code went wrong.  (But it is clear some code is active that creates an asynchronous connection or uses a connection in an asynchronous way and then "leaks" it without calling AsyncClose.  I'd accuse some JS code.)

I'd suggest running locally with the patch-set applied for that test to see what is doing suspicious stuff.  Or somehow making a try build do that.
Comment 24 User image Andrew Sutherland [:asuth] 2015-03-29 20:59:11 PDT
(In reply to Andrew Sutherland [:asuth] from comment #23)
> I'd suggest running locally with the patch-set applied for that test to see
> what is doing suspicious stuff.  Or somehow making a try build do that.

That is, with storage PR_LOG logging enabled to stdout/a file.
Comment 25 User image Ben Turner (not reading bugmail, use the needinfo flag!) 2015-03-29 23:59:22 PDT
(In reply to Andrew Sutherland [:asuth] from comment #24)

This is a storage problem, bug 1071360.
Comment 26 User image Marco Bonardo [::mak] 2015-03-30 08:45:49 PDT
I don't have any more insight than what Andrew provided.  I guess investigation will continue in bug 1071360.
Comment 27 User image Ben Turner (not reading bugmail, use the needinfo flag!) 2015-03-30 15:09:34 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/8a8e8efa5d38
Comment 29 User image Ben Turner (not reading bugmail, use the needinfo flag!) 2015-03-31 14:32:04 PDT
(In reply to Wes Kocher (:KWierso) from comment #28)
> Backed out for frequent bc1 oranges
> https://treeherder.mozilla.org/logviewer.html#?job_id=8256324&repo=mozilla-
> inbound

Actually bug 1112620
Comment 30 User image Ben Turner (not reading bugmail, use the needinfo flag!) 2015-03-31 16:05:13 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/46c549bb6343
Comment 31 User image Ryan VanderMeulen [:RyanVM] 2015-04-01 09:47:53 PDT
https://hg.mozilla.org/mozilla-central/rev/46c549bb6343
Comment 32 User image Marco Bonardo [::mak] 2015-04-02 05:20:57 PDT
Sorry for the late question, since you moved to WAL, did you consider locking_mode = EXCLUSIVE? It would remove the need for shm files (that have been reported to be problematic on some network fs) and a tiny perf gain. I think we're still in a situation where only one process touches the connection, right?
Comment 33 User image Ben Turner (not reading bugmail, use the needinfo flag!) 2015-04-02 12:14:44 PDT
(In reply to Marco Bonardo [::mak] from comment #32)

Yep, I tried to do this early on. The problem is that we still grab the database from two different threads, and EXCLUSIVE doesn't allow that. Hopefully we're going to drop that second thread in the near future and then I can use EXCLUSIVE.

Note You need to log in before you can comment on or make changes to this bug.