Bug 1356220 Opened 7 years ago Closed 7 years ago

Set journal_size_limit on favicons.sqlite

Categories

(Toolkit :: Places, enhancement, P1)

Tracking

RESOLVED FIXED
mozilla55
Tracking Status
firefox55 --- fixed

People

(Reporter: mak, Assigned: mak)

References

Details

(Whiteboard: [fxsearch])

Attachments

(1 file)

journal_size_limit only applies to a single db, not to all the attached dbs, so we should also apply it to favicons.sqlite.

It's unclear whether the same applies to wal_autocheckpoint; the SQLite documentation doesn't say anything about that.
Richard, does wal_autocheckpoint apply to all the attached databases?
Flags: needinfo?(drh)
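A minimal sketch of the issue in raw SQLite C API terms (the file names and the limit value are illustrative, not the actual Places setup): journal_size_limit has to be repeated with the schema prefix for every attached database, since the unqualified PRAGMA only affects the main db.

#include <sqlite3.h>

int main(void) {
  sqlite3 *db;
  if (sqlite3_open("places.sqlite", &db) != SQLITE_OK) return 1;
  sqlite3_exec(db, "ATTACH DATABASE 'favicons.sqlite' AS favicons;", 0, 0, 0);

  /* Affects the main database only. */
  sqlite3_exec(db, "PRAGMA journal_size_limit = 1572864;", 0, 0, 0);
  /* Must be repeated, schema-qualified, for the attached database. */
  sqlite3_exec(db, "PRAGMA favicons.journal_size_limit = 1572864;", 0, 0, 0);

  sqlite3_close(db);
  return 0;
}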
The "PRAGMA wal_autocheckpoint" is on by default with a value of 1000, for all attached databases.  If you change the setting to something other than 1000, that change also applies to all attached databases.  In other words, all attached databases have the same autocheckpoint setting.

The default autocheckpoint value of 1000 has worked so well for so many people for so long, that we have never seen any need to provide more fine-grained control.  That could be done, though, in theory and if there is a genuine need.  What problem are you trying to solve?
Flags: needinfo?(drh)
(In reply to D. Richard Hipp from comment #1)
> The default autocheckpoint value of 1000 has worked so well for so many
> people for so long, that we have never seen any need to provide more
> fine-grained control.  That could be done, though, in theory and if there is
> a genuine need.  What problem are you trying to solve?

We use a default page size of 32768; with 1000 pages our WAL may grow quite a bit (~32MB).
For the default page size of 4096 it'd be 8 times smaller.
We set wal_autocheckpoint to 16 pages to limit it to 512KiB. The point is that the default is too large for our page size.

It could even be increased; maybe by doing that we'd gain some perf on large transactions. What do you think? Do you have data about the relation between perf and WAL size?

Now that we have introduced the first attached database, here I'm mostly trying to have consistent settings between the main db and the attached one; I don't want the attached db's WAL to grow to 1000 pages.
To give a complete picture, we set wal_autocheckpoint to 16 pages (512KiB) and journal_size_limit to 3 times that size.
Flags: needinfo?(drh)
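A hedged sketch of the pages-vs-bytes arithmetic above (the helper name is invented for illustration): wal_autocheckpoint counts pages, not bytes, so a byte budget has to be converted using the connection's page size. 524288 / 32768 = 16 pages, while the 1000-page default means ~32MB at 32KiB pages versus ~4MB at 4096.

#include <stdio.h>
#include <sqlite3.h>

/* Hypothetical helper: express the WAL checkpoint budget in bytes and
   derive the page count from the connection's actual page size. */
static int set_wal_autocheckpoint_bytes(sqlite3 *db, sqlite3_int64 bytes) {
  sqlite3_stmt *stmt;
  int page_size = 32768, pages, rc;
  char sql[64];

  rc = sqlite3_prepare_v2(db, "PRAGMA page_size;", -1, &stmt, 0);
  if (rc != SQLITE_OK) return rc;
  if (sqlite3_step(stmt) == SQLITE_ROW)
    page_size = sqlite3_column_int(stmt, 0);
  sqlite3_finalize(stmt);

  pages = (int)(bytes / page_size);
  if (pages < 1) pages = 1;  /* e.g. 524288 / 32768 = 16 pages */
  snprintf(sql, sizeof(sql), "PRAGMA wal_autocheckpoint = %d;", pages);
  return sqlite3_exec(db, sql, 0, 0, 0);
}

Calling set_wal_autocheckpoint_bytes(db, 512 * 1024) would reproduce the 16-page / 512KiB setting described above.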
Performance is better with larger autocheckpoint sizes (up to a point).  This comes about because with a larger autocheckpoint, more transactions will fit into the WAL file at once, and so when the checkpoint does run, pages that are changed by multiple transactions are only written once, instead of once per transaction.  Also, there is some overhead associated with starting and ending the checkpoint operation, so the less often you run checkpoint, the less overhead.

With too large of a WAL file, query performance begins to fall off (slightly) because SQLite has to spend more time hunting for previously modified pages in the WAL file.  That effect only comes into play for really big autocheckpoint sizes - larger than 1000 - so should not be a factor here.

For best performance, I think you should make wal_autocheckpoint as large as you can (with an upper bound of a few thousand) subject to your temporary disk space usage constraints.

If you want to get really sophisticated, you can use the sqlite3_wal_hook() interface (https://www.sqlite.org/c3ref/wal_hook.html) to register your own function to determine when checkpoints are run, perhaps based on such factors as the amount of free disk space currently available.  The "PRAGMA wal_autocheckpoint" is implemented by using sqlite3_wal_hook(), so if you use sqlite3_wal_hook() directly, that will disable any "PRAGMA wal_autocheckpoint" settings you might have previously made.  The decision function used by "PRAGMA wal_autocheckpoint" can be seen at (https://www.sqlite.org/src/artifact/158326243c5?ln=2001-2019).  It simply invokes sqlite3_wal_checkpoint() if the current size of the WAL file exceeds the "PRAGMA wal_autocheckpoint" setting.  But you can see how you might extend this to take into account which database file is being written, as well as other factors such as available disk space.

NB: The calls to sqlite3BeginBenignMalloc() and sqlite3EndBenignMalloc() are used for testing and are no-ops in a production environment.  You can ignore those two lines of the example - lines 2014 and 2016.
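For illustration, a hedged sketch of such a custom hook; the thresholds, the "favicons" schema name, and the use of sqlite3_wal_checkpoint_v2() in PASSIVE mode are choices made for this example, not taken from the linked code.

#include <string.h>
#include <sqlite3.h>

/* Example policy: checkpoint the main db aggressively, but let the
   attached "favicons" schema accumulate more WAL pages first. */
static int wal_hook(void *arg, sqlite3 *db, const char *zDb, int nPages) {
  int threshold = (strcmp(zDb, "favicons") == 0) ? 64 : 16;
  (void)arg;
  if (nPages >= threshold) {
    /* PASSIVE checkpoints never block readers or writers. */
    sqlite3_wal_checkpoint_v2(db, zDb, SQLITE_CHECKPOINT_PASSIVE, 0, 0);
  }
  return SQLITE_OK;
}

/* Registering it disables any prior PRAGMA wal_autocheckpoint setting:
   sqlite3_wal_hook(db, wal_hook, NULL); */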

OT: Have you done performance studies lately to see if 32768 still makes sense as the page size for FF?  There has been a lot of optimization work done since that decision was made, years and years ago.  Is it still the best choice?
Flags: needinfo?(drh)
(In reply to D. Richard Hipp from comment #4)
> OT: Have you done performance studies lately to see if 32768 still makes
> sense as the page size for FF?  There has been a lot of optimization work
> done since that decision was made, years and years ago.  Is it still the
> best choice?

No, the last study was done some years ago by the Firefox Perf Team (Taras Glek at the time), when we moved from 4096 to 32768. It was based on our tier-1 platforms (Windows, Mac, Linux) and their respective filesystems.
It's likely that things have changed in the last years, with all the great effort put into SQLite performance.

There are various reasons that may prevent us from running a new measurement soon:
1. Resources are very limited: the whole company is concentrated on Photon (new UI design) and Quantum (backend performance), and there's not much room for other investigations. To spend resources on this kind of measurement we'd need some incentive, like a sign that things could improve in a meaningful way for users.
2. It's still hard to change the page size of WAL databases, since that requires changing the journal, changing the page size and vacuuming. We haven't yet found the resources and a good UI approach to convert all of our users to 32k pages; indeed we still have users on 1K and 4K pages (bug 746486), even if it's a minor part of our user base.

Thank you for all the suggestions. I think I will probably increase the WAL size a little bit here; considering places.sqlite is larger than the other dbs in the profile folder (up to 60MiB), it seems to make total sense. Maybe I'll just use the default WAL journal size of 4MiB. Sounds like it could help some of our larger transactions (history removals, bookmarks imports...)
(In reply to Marco Bonardo [::mak] from comment #5)
> considering places.sqlite is larger than the other dbs
> in the profile folder (up to 60MiB)

It can even take 80MB for sure, at least in my case, with over 75k bookmarks and over 100k in history ;)
(In reply to Virtual_ManPL [:Virtual] - (please needinfo? me - so I will see your comment/reply/question/etc.) from comment #6)
> It can even take 80MB for sure, at least in my case, with over 75k bookmarks
> and over 100k in history ;)

Before we moved favicons out of it, sure, mine was 80MB as well; now the limit is set to 60MB.
Now I remember why we set a small wal_autocheckpoint: we use WAL with synchronous = NORMAL to limit the number of fsyncs, and we're trying to limit the number of lost transactions in case of a crash.
It is my understanding that in NORMAL mode it's possible to lose multiple transactions in the WAL journal. But each transaction also has a checksum, so most of the ones successfully written before the crash should be recoverable in any case.

Based on this, I think I will bump the wal_autocheckpoint to 2MB and run some Talos tests to see what happens.
While looking into this, I also found that our journal_size_limit was bogus: it was very low and likely to cause a truncate at every checkpoint :( I'll set it 2MB over the WAL checkpoint limit.

TL;DR: the current plan is wal_autocheckpoint = 2MB, journal_size_limit = 4MB.
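In raw PRAGMA terms the plan would look roughly like this (a sketch, assuming a 32KiB page size and that the attached db is named "favicons"; the real change lives in the Places connection setup):

#include <sqlite3.h>

static void apply_journal_settings(sqlite3 *db) {
  /* 2MB checkpoint threshold: 2097152 / 32768 = 64 pages. */
  sqlite3_exec(db, "PRAGMA wal_autocheckpoint = 64;", 0, 0, 0);
  /* 4MB cap on the journal, for the main and attached dbs alike. */
  sqlite3_exec(db, "PRAGMA journal_size_limit = 4194304;", 0, 0, 0);
  sqlite3_exec(db, "PRAGMA favicons.journal_size_limit = 4194304;", 0, 0, 0);
}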
note: I'm running a Talos comparison on the Try server.
(In reply to Marco Bonardo [::mak] from comment #8)
> Now I remember why we set a small wal_autocheckpoint: we use WAL with
> synchronous = NORMAL to limit the number of fsyncs, and we're trying to
> limit the number of lost transactions in case of a crash.
> It is my understanding that in NORMAL mode it's possible to lose multiple
> transactions in the WAL journal. But each transaction also has a checksum,
> so most of the ones successfully written before the crash should be
> recoverable in any case.

If by "crash" you mean a power loss event, then your understanding is correct.  But if FF dies abruptly due to a SIGKILL, for example, nothing will be lost.  Also, the multiple transaction loss due to a power failure is theoretical, and is much less likely to happen on modern filesystems.
Comment on attachment 8860465 [details]
Bug 1356220 - Set a journal_size_limit on favicons.sqlite and improve the wal autocheckpoint value.

https://reviewboard.mozilla.org/r/132460/#review135394
Attachment #8860465 - Flags: review?(adw) → review+
Talos shows an improvement in tp5n nonmain_normal_fileio

(In reply to D. Richard Hipp from comment #11)
> Also, the multiple transaction loss due to a power failure is
> theoretical, and is much less likely to happen on modern filesystems.

Thanks, this makes me more comfortable.
Pushed by mak77@bonardo.net:
https://hg.mozilla.org/integration/autoland/rev/4719f83be952
Set a journal_size_limit on favicons.sqlite and improve the wal autocheckpoint value. r=adw
https://hg.mozilla.org/mozilla-central/rev/4719f83be952
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
And we see Talos performance improvements :)

== Change summary for alert #6138 (as of April 22 2017 08:13 UTC) ==

Improvements:

 16%  tp5n nonmain_normal_fileio windows7-32 pgo e10s     412840441.42 -> 347366520.62
 16%  tp5n nonmain_normal_fileio windows7-32 pgo          412385991.56 -> 347133076.33
 16%  tp5n nonmain_normal_fileio windows7-32 opt          411477009.75 -> 346549722.75
 16%  tp5n nonmain_normal_fileio windows7-32 opt e10s     410590352.17 -> 346451660.08

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=6138