Closed Bug 1001579 Opened 10 years ago Closed 4 months ago

Optimize SQLite connection settings

Categories

(MailNews Core :: Database, defect)

defect

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: gps, Unassigned)

References

Details

(Keywords: perf)

I was poking around severe performance issues with my Thunderbird profile. I noticed my global-messages-db.sqlite file was ~1GB and thought "hmm, that's a big database. I should explore that."

I started poking around the code and I've identified some suboptimal SQLite settings.

For reference, the SQLite connection is established at https://hg.mozilla.org/comm-central/file/734c1ddd7443/mailnews/db/gloda/modules/datastore.js#l1297. Essentially:

1) Set custom cache size
2) PRAGMA SYNCHRONOUS=FULL

There is a P1 performance problem around SYNCHRONOUS=FULL. SYNCHRONOUS=FULL means that data is flushed to disk (fflush() on POSIX) after every transaction. (Remember that SQLite creates an implicit transaction for statements not explicitly executed after BEGIN TRANSACTION.) While fflush() is supposedly file/stream level, on Linux it is a global filesystem flush, effectively an fsync(). Continuous fsync()'s are notoriously bad for performance. Bug 421482 tracked a major Firefox performance bug caused by SYNCHRONOUS=FULL. Thunderbird is currently guilty of the same sins.

Next, global-messages-db is not using the WAL (https://sqlite.org/wal.html). We've enabled the WAL on pretty much every major SQLite database in Firefox/Gecko. Unless Thunderbird has a compelling reason not to use the WAL, it should enable WAL mode.

Thunderbird is using the default PRAGMA JOURNAL_MODE of DELETE. TRUNCATE should be used as the default because it is usually faster. Note that TRUNCATE should only be used if WAL could not be used.

We should probably issue PRAGMA TEMP_STORE=memory to ensure that temporary databases are created in memory, not on disk. You may not want to do this if Thunderbird is creating large temporary tables. I don't know Thunderbird well enough to know if it is.

I don't see anywhere in Thunderbird's code base where vacuuming occurs. See https://sqlite.org/lang_vacuum.html. I manually vacuumed my global-messages-db database and the new database was over 100 MB smaller! And I don't delete a lot of my messages. For people that commonly delete many messages, the lack of vacuuming will lead to much worse degraded performance over time due to non-contiguous data blocks, hurting performance where random I/O isn't fast. I recommend Thunderbird periodically perform a VACUUM statement to optimize the database and gain better performance. Please note that vacuuming a database can take ages. It's best to perform the VACUUM very occasionally (once a month or so) and only during idle.

Finally (and this is likely out of scope), you may wish to switch to SQLite.jsm. It helps reduce the foot guns associated with SQLite and will likely make your SQLite code much easier to read and maintain by hiding a lot of the implementation details. It also has a shrink memory API!

You will likely want to use https://hg.mozilla.org/mozilla-central/file/09a19b25b9cf/services/metrics/storage.jsm#l1107 as a reference for implementing the connection init logic. Please note that FHR has manual WAL checkpoint control. WAL checkpointing is disabled during init and schema creation for performance reasons. The WAL is checkpointed after key operations to minimize data loss risks. The manual WAL checkpointing is mostly done out of paranoia. I believe for places.sqlite (and Thunderbird's database by extension), relying on the WAL by itself should be sufficient. In terms of data loss protection, switching from SYNCHRONOUS=FULL to JOURNAL_MODE=WAL, SYNCHRONOUS=NORMAL sacrifices very little (essentially immediate power loss scenarios). We've made the determination for most of Firefox's databases that that the added protection of immediate power loss scenarios doesn't outweigh the performance costs. The WAL is 99.9% as effective with better performance.

If you have any questions, :mak and :Yoric are also knowledgeable about SQLite. They may even disagree with some of my suggestions!

It's worth stating that with SQLite, no configuration fits all. Thunderbird may have specific requirements around its global-messages-db database that may invalidate my standard assumptions and make my recommendations bad.
Depends on: 467688
These are all very excellent suggestions; I just want to quickly chime in to clarify some stuff and provide some context.

(In reply to Gregory Szorc [:gps] (not actively checking bugmail) from comment #0)
> (Remember that SQLite creates an implicit transaction for
> statements not explicitly executed after BEGIN TRANSACTION.)

The global database manually controls transactions when indexing in order to avoid destroying the system with fsyncs.  (This is also especially important when using FTS3.)  However, in event-driven indexing on a fast computer receiving messages via IDLE notifications, there is a good chance indexing will turn on and off again nearly instantly, in which case this protection isn't going to do much.


> Next, global-messages-db is not using the WAL (https://sqlite.org/wal.html).
> We've enabled the WAL on pretty much every major SQLite database in
> Firefox/Gecko. Unless Thunderbird has a compelling reason not to use the
> WAL, it should enable WAL mode.

(The global database pre-dates WAL mode.  This would be a really good thing for the aforementioned situation where batch processing is not occurring.  )


> We should probably issue PRAGMA TEMP_STORE=memory to ensure that temporary
> databases are created in memory, not on disk. You may not want to do this if
> Thunderbird is creating large temporary tables. I don't know Thunderbird
> well enough to know if it is.

Fulltext global database searches will result in temporary tables, although cache sizes were chosen so that they should not usually spill to disk.


> I don't see anywhere in Thunderbird's code base where vacuuming occurs. See
> https://sqlite.org/lang_vacuum.html. I manually vacuumed my
> global-messages-db database and the new database was over 100 MB smaller!

There are existing bugs on this (and ideally they would be addressed), but the short story is basically that:
- VACUUMing a potentially multiple-gigabyte is going to be a kick in the pants to the system and the user (although if SQLite got smarter about VACUUMs, this would not be the case)
- the global database is just an index and the best way to optimize it is to delete it and let it be rebuilt.  This will also rectify accumulated inconsistencies that VACUUMing would not resolve.
- The next generation of gloda was going to using sharding to separate SQLite files and transparently do VACCUMing in disk-friendly bite-size pieces in the background.


> In terms of data loss protection, switching
> from SYNCHRONOUS=FULL to JOURNAL_MODE=WAL, SYNCHRONOUS=NORMAL sacrifices
> very little (essentially immediate power loss scenarios).

In the Thunderbird 3.0 development stage we witnessed repeated corruption from macbooks whose batteries were running low which is why this got cranked up a bit.  Between likely advancements in OS X, mac hardware, and SQLite, this probably makes a lot of sense.
probably also setGrowthIncrement may help here. It's all stuff that came after gloda, fwiw, so it's expected that it didn't use it yet :)
Some of the bugs Andrew mentions are blocking bug 1023000. Any new bugs that flow out of gps' bug with the potential for significant benefit should be made to block bug 1023000.
Keywords: perf
See Also: → 1023000
> 1) Set custom cache size

I think we increased block size at one point, but not cache size. What do you recommend?

Do your other performance issues continue?
Flags: needinfo?(gps)
See Also: → 581606
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #4)
> > 1) Set custom cache size
> 
> I think we increased block size at one point, but not cache size. What do
> you recommend?

Gloda already tries to be clever about cache size (inspired by predecessors) and is page size aware in its allocation, see:
https://dxr.mozilla.org/comm-central/source/mailnews/db/gloda/modules/datastore.js#1269

Really the biggest wins for gloda would be the async opening bug you've referenced recently and possibly auditing the adaptive indexing rate logic gloda has which at least one user reported seems to throttle itself down in a ridiculous fashion on their machine.  And SQLite-wise, turning WAL on could be a win, but I think some light benchmarking including looking at the fsync count and I/O counts would be appropriate since the large amount of data thrown at FTS3 and the use of long-lived transactions could result in interesting complications that make WAL not a 100% guaranteed win.
I think asuth answered my needinfo.
Flags: needinfo?(gps)
Depends on: 1698305
Severity: normal → S3

With a new backend datastore coming, we're not going to pursue this.
Let's take it off the books.

Status: NEW → RESOLVED
Closed: 4 months ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.