Closed Bug 573492 Opened 10 years ago Closed 9 years ago

Use WAL journaling for places.sqlite

Categories

(Toolkit :: Places, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla2.0b9
Tracking Status
blocking2.0 --- betaN+

People

(Reporter: mak, Assigned: sdwilsh)

References

(Blocks 1 open bug)

Details

(Keywords: perf, Whiteboard: [fixed-in-places])

Attachments

(1 file, 11 obsolete files)

the new journal mode could reduce fsyncs without large changes on our side.
Depends on: SQLite3.7.0.1
Summary: Investigate using WAL journal for places.sqlite → Use WAL journaling for places.sqlite
Assignee: nobody → mak77
Status: NEW → ASSIGNED
Attached patch patch v1.0 (obsolete) — Splinter Review
Attached patch patch v1.1 (obsolete) — Splinter Review
minor change.

Notice this applies on top of bug 552023, landing this before will require moving the shutdown journal setup to PlacesDBFlush.
Attachment #459835 - Attachment is obsolete: true
Comment on attachment 459920 [details] [diff] [review]
patch v1.1

asking feedback instead of review for the dependency. It should be ready to review apart that but you could need to unbitrot it if willing to push before I'm back.
Attachment #459920 - Flags: feedback?(sdwilsh)
tentative blocking for I/O (fsyncs) win.
WAL is a new journaling mode introduced in sqlite 3.7.0 that reduces I/O and fragmentation.
blocking2.0: --- → ?
Attached patch patch v1.2 (obsolete) — Splinter Review
Attachment #459920 - Attachment is obsolete: true
Attachment #460037 - Flags: feedback?(sdwilsh)
Attachment #459920 - Flags: feedback?(sdwilsh)
Comment on attachment 460037 [details] [diff] [review]
patch v1.2

This looks fine.  I'll do a proper review of it too.
Attachment #460037 - Flags: review?(sdwilsh)
Attachment #460037 - Flags: feedback?(sdwilsh)
Attachment #460037 - Flags: feedback+
OS/2 guys: maybe you want to see if this patch works on OS/2. As we don't have WAL support implemented in our SQLite port, it should fall-back, but might be good to check this before this lands.

> nsresult
>+nsNavHistory::SetJournaling(const nsCString& aJournalMode) {
Just curious: is Mozilla (or places) migrating to a Java-like end-of-the line brace style?
Note that while using WAL will result in less fsyncs (one instead of three), it'll still sync on every commit.

To take full advantage, the synchronous mode should be changed to normal ("PRAGMA synchronous = NORMAL"). I believe this setting is already mapped to a hidden pref ("toolkit.storage.synchronous").

With that setting, sqlite will only fsync the database when checkpointing the journal. This trades off durability (the last seconds of changes before a crash may be lost). If I read the documentation correctly, in WAL mode it's safe to use NORMAL (it doesn't increase the risk of corruption like it supposedly does for non-WAL databases).

Finally since checkpointing does sync and will therefore block, it should be done in some background thread periodically instead of relying on the automatic mechanism which triggers when the WAL file grows to 1000 pages and will cause an unexpected hitch. Fortunately the checkpointing doesn't block the database, that is, you can read and write to it while it's taking place.
(In reply to comment #7)
> OS/2 guys: maybe you want to see if this patch works on OS/2. As we don't have
> WAL support implemented in our SQLite port, it should fall-back, but might be
> good to check this before this lands.
Please let us know if we'll need to ifdef this out or make special code for you guys.

> Just curious: is Mozilla (or places) migrating to a Java-like end-of-the line
> brace style?
No, but that's something I'd catch in review.

(In reply to comment #8)
> Note that while using WAL will result in less fsyncs (one instead of three),
> it'll still sync on every commit.
Right, we are OK with having fsyncs (we want durability), but reducing them is a win.

> To take full advantage, the synchronous mode should be changed to normal
> ("PRAGMA synchronous = NORMAL"). I believe this setting is already mapped to a
> hidden pref ("toolkit.storage.synchronous").
We will not be doing that though because we've seen an increase in database corruption in the past when we had it set to that.  Places is always at synchronous = FULL.

> Finally since checkpointing does sync and will therefore block, it should be
> done in some background thread periodically instead of relying on the automatic
> mechanism which triggers when the WAL file grows to 1000 pages and will cause
> an unexpected hitch. Fortunately the checkpointing doesn't block the database,
> that is, you can read and write to it while it's taking place.
A large portion of places SQLite I/O is done on a background thread already to avoid fsyncs, writes, and reads on the main thread.
blocking2.0: ? → betaN+
(In reply to comment #6)
> Comment on attachment 460037 [details] [diff] [review]
> patch v1.2
(In reply to comment #7)
> OS/2 guys: maybe you want to see if this patch works on OS/2. As we don't have
> WAL support implemented in our SQLite port, it should fall-back, but might be
> good to check this before this lands.
After a slight correction of the patch for bit rot it built fine, the browser comes up and stays alive. Any stress test I can do to find out whether there might be a not so obvious regression?
Ideally running the tests should do it.  Also making sure history is saved when you restart.
Comment on attachment 460037 [details] [diff] [review]
patch v1.2

>+++ b/toolkit/components/places/src/nsNavHistory.cpp
>@@ -142,9 +143,12 @@ using namespace mozilla::places;
> // Filename used to backup corrupt databases.
> #define DATABASE_CORRUPT_FILENAME NS_LITERAL_STRING("places.sqlite.corrupt")
> 
>-// We use the TRUNCATE journal mode to reduce the number of fsyncs.  Without
>-// this setting we had a Ts hit on Linux.  See bug 460315 for details.
>-#define DATABASE_JOURNAL_MODE "TRUNCATE"
>+// We use the WAL journal mode to reduce the number of fsyncs.
>+#define DATABASE_DEFAULT_JOURNAL_MODE NS_LITERAL_CSTRING("wal")
>+// If the system does not support WAL, or we cannot use it, we will fallback
>+// to this setting.  Other journal modes caused a Ts hit on Linux.
>+// See bug 460315 for details.
>+#define DATABASE_FALLBACK_JOURNAL_MODE NS_LITERAL_CSTRING("truncate")
We should use an enum for this at this point.

> nsresult
>+nsNavHistory::SetJournaling(const nsCString& aJournalMode) {
>+  // Note: binding seems to be not working on pragma statements.
yeah, it does not work, and I'm not sure the comment is really needed here.

>+  // WARNING:
>+  //  Don't leave unfinalized statements around till journal mode has been set,
>+  //  some journaling mode does not like them and will fail to be set.
nit: wording is a bit awkward

>+  // Be sure to set journaling after cache_size and page_size, some mode will
>+  // prevent that kind of changes.
nit: journal mode

>-    rv = mDBConn->ExecuteSimpleSQL(NS_LITERAL_CSTRING(
>-        "PRAGMA journal_mode = MEMORY"));
>-    NS_ENSURE_SUCCESS(rv, rv);
>+    (void)SetJournaling(NS_LITERAL_CSTRING("memory"));
We used to check the result here.  I think we still should.

>-    rv = mDBConn->ExecuteSimpleSQL(NS_LITERAL_CSTRING(
>-        "PRAGMA journal_mode = " DATABASE_JOURNAL_MODE));
>-    NS_ENSURE_SUCCESS(rv, rv);
>+    if (NS_FAILED(SetJournaling(DATABASE_DEFAULT_JOURNAL_MODE))) {
>+      (void)SetJournaling(DATABASE_FALLBACK_JOURNAL_MODE);
>+    }
Ditto.

>+    // Set back journaling to a backwards compatible value.  Newer journal modes
>+    // like WAL make the database incomatible with old versions of the browser.
>+    (void)SetJournaling(NS_LITERAL_CSTRING("memory"));
We should link to the SQLite documentation about this too.

>@@ -5483,38 +5527,14 @@ nsNavHistory::VacuumDatabase()
>                                              nsnull);
>     }
> 
>-    // Actually vacuuming a database is a slow operation, since it could take
>-    // seconds.  Part of the time is spent in updating the journal file on disk
>-    // and this is particularly bad on devices with slow I/O.  Temporary
>-    // moving the journal to memory could increase a bit the possibility of
>-    // corruption if we crash during this time, but makes the process really
>-    // faster.
>-    nsCOMPtr<mozIStorageStatement> journalToMemory;
>-    rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING(
>-        "PRAGMA journal_mode = MEMORY"),
>-      getter_AddRefs(journalToMemory));
>-    NS_ENSURE_SUCCESS(rv, rv);
>-
>-    nsCOMPtr<mozIStorageStatement> vacuum;
>-    rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING("VACUUM"),
>-                                  getter_AddRefs(vacuum));
>-    NS_ENSURE_SUCCESS(rv, rv);
>-
>-    nsCOMPtr<mozIStorageStatement> journalToDefault;
>-    rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING(
>-        "PRAGMA journal_mode = " DATABASE_JOURNAL_MODE),
>-      getter_AddRefs(journalToDefault));
>-    NS_ENSURE_SUCCESS(rv, rv);
>-
>-    mozIStorageBaseStatement *stmts[] = {
>-      journalToMemory,
>-      vacuum,
>-      journalToDefault
>-    };
So, why are we nuking all of this?

>+++ b/toolkit/components/places/src/nsNavHistory.h
>+  nsresult SetJournaling(const nsCString& aJournalMode);
>+  nsCString mCurrentJournalMode;
Both of these should take the previously mentioned enum I think.
Attachment #460037 - Flags: review?(sdwilsh) → review-
Just a comment about the memory-journal-during-vacuum thing. The only difference between a memory journal and no journal is that the former allows you to rollback a transaction, whereas the latter doesn't. In both cases, a interruption will corrupt the database almost every time (1). A memory journal doesn't protect the database integrity at all.

In the particular case of vacuum, there are two phases (creating the clean db and copying it back). A crash during the second phase without a disk journal will corrupt the database practically 100% of the time (the way everything is rearranged during a vacuum the chances of survival are almost zero).

Moreover vacuum can't be inside any other transactions, and should have no reason to rollback (at the point where writes to the actual database file start no more disk space is needed). So a memory journal gives you exactly nothing. Either you risk the database during that time (journal off) or you don't (disk journal). Currently you do, and I don't think playing that lottery once a month or so is a reasonable game.

Also, a memory journal during vacuum has an additional risk: a vacuum journal will reach the size of the database itself. If the db is large and the machine has little RAM and starts swapping, the time needed and therefore the risk window would be widened a lot (and the user will be more likely to end the process at that time since it stops responding).

SQLite could probably be improved to vacuum faster on filesystems that support atomic rename (right now it generates a defragmented db on a temp file, and then copies it back over the original file; it could just rename the temporary file over the original one). This would also help defragment the files on-disk.

(1) http://www.sqlite.org/pragma.html#pragma_journal_mode
And also the fact setting journal to another mode, vacuum, and go back to WAL will fail, because we have unfinalized statements. This will be a problem because we need to do vacuum out of wal if the page size is not the expected one (page size cannot be changed after entering wal mode).
This issue should be moved to the global vacuum component bug, most likely when a vacuum+page_size change happens we will have to fallback to truncate mode till the browser is restarted to correctly do the page_size change.

Btw, regarding the crash->corruption problems we are aware of them, but the chance of having a crash while copying back the file were smaller than the perf penalty we were paying. Actually I redid the same tests and looks like vacuum is now faster also in truncate mode (and almost as fast in wal mode), so the memory thing can go away with recent sqlite versions.
(In reply to comment #12)
> >-    rv = mDBConn->ExecuteSimpleSQL(NS_LITERAL_CSTRING(
> >-        "PRAGMA journal_mode = MEMORY"));
> >-    NS_ENSURE_SUCCESS(rv, rv);
> >+    (void)SetJournaling(NS_LITERAL_CSTRING("memory"));
> We used to check the result here.  I think we still should.
> 
> >-    rv = mDBConn->ExecuteSimpleSQL(NS_LITERAL_CSTRING(
> >-        "PRAGMA journal_mode = " DATABASE_JOURNAL_MODE));
> >-    NS_ENSURE_SUCCESS(rv, rv);
> >+    if (NS_FAILED(SetJournaling(DATABASE_DEFAULT_JOURNAL_MODE))) {
> >+      (void)SetJournaling(DATABASE_FALLBACK_JOURNAL_MODE);
> >+    }
> Ditto.


I'm not sure makes sense to check failures there, if we fail setting mode we will stay to the default DELETE mode, and browser can still work, the code will already warn in debug builds.
If we check failure the database could be marked as corrupt, when maybe it's not and is instead just a filesystem not supporting that mode. That will cause us to be completely not working.
Imo warnings are enough there.
Alright, I'm fine with it then.
Attached patch patch v2.0 (obsolete) — Splinter Review
Addresses comments and can land before temp tables removal.
Attachment #460037 - Attachment is obsolete: true
Attachment #462339 - Flags: review?(sdwilsh)
Probably the suggestion to run checkpointing in a separate thread makes sense... but it would be hard to detect each transaction without using sqlite3_wal_hook(), so that should probably be exposed in Storage, or storage should automatically do checkpointing in a separate thread for all users (that seems to make sense)
btw, some checkpointing could already happen in a separate thread... if the query that causes wal to go over limit is an async query for example, checkpointing happens in that thread... since we are supposed to use async stmts more than sync ones, the problem could be fixed by itself.
Comment on attachment 462339 [details] [diff] [review]
patch v2.0

For a review with more context with the comments, please see http://reviews.visophyte.org/r/462339/.

on file: toolkit/components/places/src/nsNavHistory.h line 126
>     JOURNAL_DELETE = 0
>     // Can reduce fsyncs on Linux when journal is deleted (See bug 460315).
>     // We fallback to this mode when WAL is unavailable.
>   , JOURNAL_TRUNCATE
>     // Journal is never deleted, can cause a privacy hit.
>   , JOURNAL_PERSIST
>     // Unsafe in case of crashes on database swap or low memory.
>   , JOURNAL_MEMORY
>     // Can globally reduce number of fsyncs.  We try to use this mode by default.
>   , JOURNAL_WAL
>     // Unsafe mode due to corruption on crashes.
>   , JOURNAL_OFF

Not sure it's worthwhile to have modes here that we do not use.


r=sdwilsh
Attachment #462339 - Flags: review?(sdwilsh) → review+
Blocks: 563538
Attached patch patch v2.1 (obsolete) — Splinter Review
Attachment #462339 - Attachment is obsolete: true
Attached patch patch v2.2 (obsolete) — Splinter Review
Attachment #464813 - Attachment is obsolete: true
Attached patch patch v2.3 (obsolete) — Splinter Review
forgot to qrefresh.
Attachment #464825 - Attachment is obsolete: true
Looks like this regressed Ts on linux:

Regression: Ts increase 25.7% on Linux Firefox
   Previous results:
       450.368 from build 20100813121430 of revision da59cce7bfa2 at 2010-08-13 12:20:20 on talos-r3-fed-018 run # 0
   New results:
       566.211 from build 20100813121912 of revision dfd35987dd82 at 2010-08-13 12:50:31 on talos-r3-fed-004 run # 0
   http://mzl.la/benerL
   http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=da59cce7bfa2&tochange=dfd35987dd82
(In reply to comment #24)
> Looks like this regressed Ts on linux:

noticed, we are waiting for numbers from all platforms to decide.
forgot to mark
http://hg.mozilla.org/mozilla-central/rev/dfd35987dd82
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla2.0b4
Just to annotate things down, since I think it's useful for everybody, while looking around for known perf issues on ext4, I found an article saying ext4 with barriers on can regress sqlite performances, and seeing how wal and barrries are working that could be a bad interaction.
I asked catlee if it would be possible to run a Ts test on a linux box with barrier=0 on ext4, he said should be possible. If the regression is linux only that test would be really interesting.
But I can't tell more till I see other filesystems results.
the global effect was Ts regressions on Linux and Windows, some of them are worrying too like a 52% on Win7...

But a 10% Ts win on OS X.

Seeing the fact max dirty profile loses less than min dirty profile, sounds like there is some fixed "setup time" to create the wal that hurts more small profiles in percentage.
Btw, due to the fact this hurts 2 systems out of 3 i'm going to backout and we should ask SQLite team if they have evidence of these times.
Backed out
http://hg.mozilla.org/mozilla-central/rev/41d486ae06b2
http://hg.mozilla.org/mozilla-central/rev/2fa5b630f3bc

Shawn, can we contact someone at SQLite and try to figure out if this expensiveness on creation is expected/fixable somehow?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
To summarize, we saw the following regressions (sorted by platform):
Ts increase 25.7% on Linux Firefox
Ts, MIN Dirty Profile increase 23.8% on Linux Firefox
Ts, MED Dirty Profile increase 24.7% on Linux Firefox
Ts, MAX Dirty Profile increase 20.7% on Linux Firefox
Ts Shutdown, Cold MIN Dirty Profile increase 9.3% on Linux Firefox

Ts increase 30.8% on Linux x64 Firefox
Ts, MIN Dirty Profile increase 21.8% on Linux x64 Firefox
Ts, MED Dirty Profile increase 20.6% on Linux x64 Firefox
Ts, MAX Dirty Profile increase 21.4% on Linux x64 Firefox

Ts increase 37.3% on XP Firefox
Ts, MIN Dirty Profile increase 27.4% on XP Firefox
Ts, MED Dirty Profile increase 29.4% on XP Firefox
Ts, MAX Dirty Profile increase 28.9% on XP Firefox
Ts, Cold MIN Dirty Profile increase 21.6% on XP Firefox
Ts, Cold MED Dirty Profile increase 30.6% on XP Firefox
Ts, Cold MAX Dirty Profile increase 28% on XP Firefox

Ts increase 52.7% on Win7 Firefox
Ts, MIN Dirty Profile increase 44.4% on Win7 Firefox
Ts, MED Dirty Profile increase 28% on Win7 Firefox
Ts, MAX Dirty Profile increase 9.97% on Win7 Firefox
Ts, Cold MIN Dirty Profile increase 48.4% on Win7 Firefox
Ts, Cold MED Dirty Profile increase 26.9% on Win7 Firefox
Ts, Cold MAX Dirty Profile increase 9.8% on Win7 Firefox
Ts Shutdown increase 19.7% on Win7 Firefox
Tp4 (Memset) increase 1.67% on Win7 Firefox
Tp4 (%CPU) increase 10.4% on Win7 Firefox


And the following improvements:
Ts, MIN Dirty Profile decrease 10% on MacOSX 10.6.2 Firefox
Ts, MAX Dirty Profile decrease 8.93% on MacOSX 10.6.2 Firefox
Dromaeo (V8) increase 9.92% on XP Firefox (weird...probably unrelated to this change)
SQLite folks are looking into this.
Attached file Just enabling WAL (obsolete) —
I pushed this minimal changeset to the places branch (http://hg.mozilla.org/projects/places//rev/b3c24cff8af1) and this is the perf data comparison with changesets on mozilla-central (116f2046b9ef, 007d994cac53, 2dbb1278a15c, 34737d9895f8, and 40fa4ebeacbb) that should be a good baseline.

It looks like Ts is basically even (small regressions at worst).  What's more interesting is the Ts_generated results that show a regression for min, but med and max are OK.  Cannot really explain that yet.
Attached file Just enabling WAL (better comparisons) (obsolete) —
OK, this fixes some issues with the compare-talos stuff and displays more information.  I also had some more talos runs kicked off this morning (not sure if windows is reporting that yet though, but mac and linux should be).  Same changesets as before.

There's only one small possible regression here on windows for Ts_generated_max.
Attachment #467040 - Attachment is obsolete: true
Attachment #467194 - Attachment is obsolete: true
OK, just going to provide links at this point instead of doing HTML attachments.

Just enabling WAL (http://hg.mozilla.org/projects/places/rev/b3c24cff8af1):
http://tinyurl.com/2fugkpq

Just enabling WAL with SQLite 3.7.1 alpha build (http://hg.mozilla.org/projects/places/rev/a8a5613f5855):
http://tinyurl.com/23rvzot

Converting the database at shutdown back to TRUNCATE on SQLite 3.7.0.1 (http://hg.mozilla.org/projects/places/rev/3760a7e09c3b):
http://tinyurl.com/24ygezf (I will be filing a bug to get two more talos runs on this change shortly)
Interesting thing to note so far is that we aren't seeing the regressions from comment 30 (although we are seeing different regressions - weird).
Er...fun times.  I've been testing against SQLite 3.6.23.1...

Interesting that tests were not failing.  Anywho, I just pushed mak's original change, and now I'm going to reland 3.7.1, and we'll get numbers on that.
All of these are with a snapshot of 3.7.1.

mak's original set of changes:
http://tinyurl.com/27zgkqd

Just enabling WAL:
http://tinyurl.com/23rvzot

Converting the database at shutdown back to TRUNCATE:
http://tinyurl.com/27tu36p
Based on comment 37, it looks like the regression is caused by switching back to TRUNCATE on shutdown.
Marco, I just talked to LegNeato about the possibility of getting 3.7.1 on 1.9.2 and he said that shipping it in beta 5 and letting people beat on it for a little bit would be enough for him to be comfortable.  However, he is concerned about folks downgrading from 4.0 to 3.6, but wants to talk to Cww to see how often people downgrade.  If push comes to shove, we can do a change to places2.sqlite as well, but I think we clearly cannot do the TRUNCATE switch on shutdown.
I have done some tests about switching out of WAL on an Win XP machine. Going to MEMORY or OFF instead of TRUNCATE roughly doubles the speed of the switch. I think this is safe.

Using PRAGMA synchronous = OFF and then switching to any non-WAL mode is instantaneous, but that's probably not safe for the data if there's a crash at that very moment.
(In reply to comment #40)
> I have done some tests about switching out of WAL on an Win XP machine. Going
> to MEMORY or OFF instead of TRUNCATE roughly doubles the speed of the switch. I
> think this is safe.
If a crash were to happen, we'd be out of luck then though (at least for downgrades; might be OK for staying on the same version).

> Using PRAGMA synchronous = OFF and then switching to any non-WAL mode is
> instantaneous, but that's probably not safe for the data if there's a crash at
> that very moment.
Yeah, that's risky.
I think the problem is slightly different. enabling WAL has a cost clearly, but since Talos discards the best and worst result, if you switch to TRUNCATE each startup will pay the WAL cost, if you don't do that, then only the first startup will pay that cost and that result will be discarded, so you won't see a regression (but it was there just once). If the problem would be setting TRUNCATE on shutdown you'd see a regression in Ts Shutdown but not in Ts.
So, having an insight in why creating a wal journal is so expensive for SQLite would still be good.

If there is no way to speed up the wal creation on SQLite side, the only solutions to the regression on our side are:
- push sqlite 3.7.1 to 3.5.x and 3.6.x and pay the wal cost just once (or ignore 3.5.x since it's unlilely someone will downgrade to that one).
- make places2.sqlite, manually migrate history from places.sqlite and delete it (expensive from a coding point of view)
(In reply to comment #42)
> I think the problem is slightly different. enabling WAL has a cost clearly, but
> since Talos discards the best and worst result, if you switch to TRUNCATE each
> startup will pay the WAL cost, if you don't do that, then only the first
> startup will pay that cost and that result will be discarded, so you won't see
> a regression (but it was there just once). If the problem would be setting
> TRUNCATE on shutdown you'd see a regression in Ts Shutdown but not in Ts.
> So, having an insight in why creating a wal journal is so expensive for SQLite
> would still be good.
Right, this is what I was trying to say.  Sorry if it came across differently.

> - push sqlite 3.7.1 to 3.5.x and 3.6.x and pay the wal cost just once (or
> ignore 3.5.x since it's unlilely someone will downgrade to that one).
I personally prefer this due to costs (and not worry about 1.9.1).
Shawn, so what's current plan for sqlite 3.7.1 on 1.9.2?
Changing my patch to avoid truncate change at shutdown is pretty trivial but this is blocked by the downgrade compatibility issue.
Whiteboard: [waiting for sqlite3.7.1 on 1.9.2 branch][breaks backwards compatibility]
Depends on: SQLite3.7.1
No longer depends on: SQLite3.7.0.1
(In reply to comment #44)
> Shawn, so what's current plan for sqlite 3.7.1 on 1.9.2?
> Changing my patch to avoid truncate change at shutdown is pretty trivial but
> this is blocked by the downgrade compatibility issue.
Needs to bake in a beta before we can land it there.  We should take this change though but need to make sure we backport before final.
Stealing this.
Assignee: mak77 → sdwilsh
Target Milestone: mozilla2.0b4 → ---
Attached patch v2.4 (obsolete) — Splinter Review
Removes the bit rot (this isn't ready to land yet)
Attachment #464826 - Attachment is obsolete: true
Attached patch v2.5 (obsolete) — Splinter Review
This just does the startup stuff and does not convert at shutdown.  Going to have mak review this just to make sure I didn't do anything stupid, but this passes our xpchshell tests with flying colors.
Attachment #474235 - Attachment is obsolete: true
Attachment #474264 - Flags: review?(mak77)
Status: REOPENED → ASSIGNED
Whiteboard: [waiting for sqlite3.7.1 on 1.9.2 branch][breaks backwards compatibility] → [needs review mak]
Keywords: perf
Blocks: 595574
Comment on attachment 474264 [details] [diff] [review]
v2.5

># HG changeset patch
># Parent a293b6b40cece8a2fed0121bdef9c2b8eba72878
># User Marco Bonardo <mbonardo@mozilla.com>, Shawn Wilsher <me@shawnwilsher.com>

didn't know setting multiple users was working this way. How does this appear in pushlog?


> nsresult
>+nsNavHistory::SetJournalMode(enum JournalMode aJournalMode) {
>+  nsCAutoString journalMode;
>+  switch (aJournalMode) {
>+    case JOURNAL_DELETE:
>+      journalMode.AssignLiteral("delete");
>+      break;
>+    case JOURNAL_TRUNCATE:
>+      journalMode.AssignLiteral("truncate");
>+      break;
>+    case JOURNAL_MEMORY:
>+      journalMode.AssignLiteral("memory");
>+      break;
>+    case JOURNAL_WAL:
>+      journalMode.AssignLiteral("wal");
>+      break;
>+    default:
>+      NS_ABORT_IF_FALSE(false, "Trying to set an unknown journal mode.");

should be NS_NOTREACHED
Also it's probably better to set default to a default mode, or we will try to ser journal to an empty string

>+  NS_WARN_IF_FALSE(succeeded,
>+                   nsPrintfCString(128, "Setting journal mode failed: %s",
>+                                   PromiseFlatCString(journalMode).get()).get());
>+  if (succeeded) {
>+    mCurrentJournalMode = aJournalMode;
>+  }
>+  else {
>+    return NS_ERROR_FAILURE;
>+  }

since there is already an if (succeeded) I'd convert the NS_WARN_IF_FALSE to a NS_WARNING inside the if

>+nsresult
> nsNavHistory::InitDB()
> {

>@@ -1832,9 +1887,7 @@ nsNavHistory::MigrateV9Up(mozIStorageCon
>     // reducing write times by a half, but will temporary consume more memory
>     // and increase risks of corruption if we should crash in the middle of this
>     // update.
>-    rv = mDBConn->ExecuteSimpleSQL(NS_LITERAL_CSTRING(
>-        "PRAGMA journal_mode = MEMORY"));
>-    NS_ENSURE_SUCCESS(rv, rv);
>+    (void)SetJournalMode(JOURNAL_MEMORY);
> 
>     rv = mDBConn->ExecuteSimpleSQL(NS_LITERAL_CSTRING(
>         "UPDATE moz_places SET last_visit_date = "
>@@ -1844,9 +1897,12 @@ nsNavHistory::MigrateV9Up(mozIStorageCon
>     NS_ENSURE_SUCCESS(rv, rv);
> 
>     // Restore the default journal mode.
>-    rv = mDBConn->ExecuteSimpleSQL(NS_LITERAL_CSTRING(
>-        "PRAGMA journal_mode = " DATABASE_JOURNAL_MODE));
>-    NS_ENSURE_SUCCESS(rv, rv);
>+    if (NS_FAILED(SetJournalMode(JOURNAL_WAL))) {
>+      // Ignore errors, if we fail here the database could be considered corrupt
>+      // and we won't be able to go on, even if it's just matter of a bogus file
>+      // system.  The default mode (DELETE) will be fine in such a case.
>+      (void)SetJournalMode(JOURNAL_TRUNCATE);
>+    }

I'd remove these temporary journal changes in migrateV9Up:
- first because from my testing latest sqlite versions are almost as fast, so the gain is ignorable
- second because this code path is only touched by users upgrading from FX3.5.x, that is a small percentage
- third because they are risky both for not being able to restore WAL, but also for dataloss on crashes. Since now tha gain is minor, the risk does not pay for it.


>+    // If journal mode is WAL, a VACUUM cannot upgrade page_size value.
>+    // If current page_size is not the expected one, journal mode must be
>+    // changed to a rollback one.  Once done we won't be able to go back to WAL
>+    // mode though, since unfinalized statements exist.  Just keep using
>+    // compatible mode till next restart.

from what you said, unfinalized is wrong, should be not-resetted

>     nsCOMPtr<mozIStoragePendingStatement> ps;
>     nsRefPtr<VacuumDBListener> vacuumDBListener =
>       new VacuumDBListener(mPrefBranch);
>-    rv = mDBConn->ExecuteAsync(stmts, NS_ARRAY_LENGTH(stmts),
>-                               vacuumDBListener, getter_AddRefs(ps));
>+    rv = vacuum->ExecuteAsync(nsnull, getter_AddRefs(ps));

the callback should be vacuumDBListener, not nsnull

r=me with the above fixed/answered
Attachment #474264 - Flags: review?(mak77) → review+
(In reply to comment #49)
> didn't know setting multiple users was working this way. How does this appear
> in pushlog?
Shows up as the first person on the web, but hg stores it properly so if that ever gets fixed, things show up right.  On the changeset page is also shows up correctly.
Whiteboard: [needs review mak] → [needs updated patch]
Perf comparisons will show up here: http://tinyurl.com/29tcotl
Attached patch v2.6Splinter Review
Addresses review comments
Attachment #474264 - Attachment is obsolete: true
Whiteboard: [needs updated patch] → [needs perf data]
Depends on: 596970
Also includes Tp4 numbers: http://tinyurl.com/2632hze
Depends on: 598153
Depends on: 598179
(In reply to comment #49)
> Comment on attachment 474264 [details] [diff] [review]
> >+      NS_ABORT_IF_FALSE(false, "Trying to set an unknown journal mode.");
> 
> should be NS_NOTREACHED

We really should be using ABORTs instead of ASSERTIONs for new code.
where is this documented? If this is what we want, we should blog about it and add some deprecate annotation to nsDebug.h

btw, using NS_ABORT here is fine.
(In reply to comment #54)
> We really should be using ABORTs instead of ASSERTIONs for new code.
Why?
It looks to me like we get the following performance wins with this patch:
Tp4 win on windows and linux (32 and 64).
ts_cold on linux (32 and 64).

...and the following regressions:
memory usage regressions
tp4 on osx 64-bit only (32-bit is fine)

Notes:
1) the Ts generated places database results are weird and don't make any sense.  This could be due to an artifact in how the databases are created (they aren't created with WAL).  Some of the smaller sized databases are reporting regressions, and the bigger ones are reporting wins, which is not what I would expect.  I've never really trusted these numbers because of their noise, and I'm not convinced they are real regressions (getting more data points on the before and after would likely help, but is difficult to do).
2) The increase in memory usage is expected since WAL uses shared memory to write to the files (mmap on unix, and I have no idea what it uses on windows).  These are expected, but give us wins (and let us use less memory when we use more than one connection to a database like we will in bug 582703).

Comment 53 has a link to the page I'm using to compare runs and.  I'm going to ask for explicit approval to land this and to accept these regressions.  If stuff shakes out that we didn't expect on mozilla-central, then I'll obviously need to backout and do more investigation.
Whiteboard: [needs perf data] → [needs approval]
Attachment #476849 - Flags: approval2.0?
And for the sake of interesting numbers, this is bug 573492 (WAL) and bug 552023 (kill temp tables) perf numbers: http://tinyurl.com/2dodkwm
See bug 583611 comment 19 on issues with upgrading to SQLite 3.7.1 on branch.  We are in a bit of a sticky situation...
Blocks: 599980
(In reply to comment #57)
> ...and the following regressions:
> memory usage regressions
> tp4 on osx 64-bit only (32-bit is fine)
Sorry, that's ts_cold.  tp4 looks fine.
Blocks: 600980
I might be a bit late to the table here... but I've been reading about WAL at http://www.sqlite.org/draft/wal.html and it very specifically says:

"When a reader needs a page of content, it first checks the WAL to see if that page appears there, and if so it pulls in the last copy of the page that occurs in the WAL prior to the reader's end mark. If no copy of the page exists in the WAL prior to the reader's end mark, then the page is read from the original database file. Readers can exist in separate processes, so to avoid forcing every reader to scan the entire WAL looking for pages (the WAL file can grow to multiple megabytes, depending on how often checkpoints are run), a data structure called the "wal-index" is maintained in shared memory which helps readers locate pages in the WAL quickly and with a minimum of I/O. The wal-index greatly improves the performance of readers, but the use of shared memory means that all readers must exist on the same machine. This is why the write-ahead log implementation will not work on a network filesystem."

How do we cater for people who do actually use a network filesystem? I'm sure there are folks who are using NFS...
(In reply to comment #61)
> How do we cater for people who do actually use a network filesystem? I'm sure
> there are folks who are using NFS...
Firefox has never supported using the same profile in more than one running instance of Firefox, so this isn't a problem here.
Somebody want to tell this guy?

https://blogs.cs.umbc.edu/willm1/2009/02/08/firefox-3-over-nfs-sucks/

But understood... though where is this in the documentation?
(In reply to comment #63)
> Somebody want to tell this guy?
> 
> https://blogs.cs.umbc.edu/willm1/2009/02/08/firefox-3-over-nfs-sucks/
That's a completely different issue.

> But understood... though where is this in the documentation?
To be clear, you can use a profile over NFS, you just can't have more than one copy of Firefox using that profile.  This applies always, regardless of where your profile is located.
Ah! OK, that makes more sense now.
http://hg.mozilla.org/projects/places/rev/31c960a57b46
Whiteboard: [needs approval] → [landed in places]
Whiteboard: [landed in places] → [fixed-in-places]
Depends on: 602729
Shawn, regarding the Ts reg we see on Linux, could still make sense to investigate comment 27, it is still possible ext4 with barriers is hurting us, and that would explain why we suffer here an not on osx or win. Also if that would be the case, I'd not care much about perf when the user explicitly decided to drop perf for data coherence.
I believe that when I asked someone from releng, they said we were running ext3, not ext4 on the talos boxes.  Should probably confirm that though...
well they told me the exact opposite on August, i.e. all boxes are on ext4 :(
(In reply to comment #69)
> well they told me the exact opposite on August, i.e. all boxes are on ext4 :(
Alright, let's check again and then report back here :)
<jhford-buildduty>	sdwilsh: fedora talos slaves have barriers enabled
that "could" explain why sometimes linux shows regressions not shown by other OS, slower writes could do it.
Blocks: 609122
Comment on attachment 476849 [details] [diff] [review]
v2.6

<johnath> sdwilsh: do I want to approve the patch on bug 573492?
<sdwilsh> johnath: it's part of the places branch...when that's ready to go in, it's going to be merged.  By itself, it's a regression, with all of our stuff...well, see http://etherpad.mozilla.com:9000/places-branch
<sdwilsh> johnath: can probably cancel the request, fwiw
Attachment #476849 - Flags: approval2.0?
http://hg.mozilla.org/mozilla-central/rev/31c960a57b46
Status: ASSIGNED → RESOLVED
Closed: 10 years ago9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla2.0b9
You need to log in before you can comment on or make changes to this bug.