Bug 1813330 (Open) - Opened 1 year ago, updated 7 months ago

storage-sync-v2.sqlite-wal extension data never cleaned up

Categories

(Firefox :: Sync, defect, P2)

People

(Reporter: jscher2000, Unassigned)

Details

Attachments

(1 file)

My storage-sync-v2.sqlite-wal file consumes 31MB and contains at least 481 copies/versions of my NoScript site permissions list (which is stored in storage-sync-v2.sqlite). Unlike other -wal files, this one does not appear to be cleaned up during shutdown.

For privacy reasons, and possibly performance reasons, it would be preferable if the extension data in the -wal file did not accumulate indefinitely.

A post on the NoScript support forums mentions a command-line workaround --

sqlite3 /path/to/storage-sync-v2.sqlite .tables

-- but this is not convenient for users.

Thank you for reporting this! I wonder if we aren't cleaning up all the statements properly when we close the connection. https://www.sqlite.org/wal.html#the_wal_file says:

Usually, the WAL file is deleted automatically when the last connection to the database closes. However, if the last process to have the database open exits without cleanly shutting down the database connection [...] then the WAL file might be retained on disk after all connections to the database have been closed.

mozStorage, the library that Firefox uses to access most SQLite databases, automatically finalizes all remaining prepared statements if closing the connection fails with SQLITE_BUSY. That code path is a failsafe—callers are supposed to finalize their statements explicitly, and debug builds will print warnings and crash to catch those cases in tests—but could help explain why this doesn't usually happen with other databases in Firefox.

But Rusqlite, the library we use for webext-storage, does something different. It flushes its prepared statement cache and then tries to close the underlying connection, but doesn't check for any outstanding statements on the connection with sqlite3_next_stmt, or finalize them, like mozStorage does. Flushing the prepared statement cache "should" finalize everything, though...
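
For illustration only, here is a minimal sketch of what "finalize anything still outstanding before closing" could look like on top of rusqlite, using its unsafe raw-handle escape hatch and the libsqlite3-sys bindings. The helper name is made up, and this is not what rusqlite or the webext-storage close path actually does; it just mirrors the mozStorage failsafe described above.

    use libsqlite3_sys as ffi;
    use rusqlite::Connection;

    /// Hypothetical helper: finalize any statements still registered on the
    /// connection before closing it, the way mozStorage's failsafe does.
    fn close_finalizing_leftovers(conn: Connection) -> rusqlite::Result<()> {
        // Drop rusqlite's cached prepared statements first, as its close path does.
        conn.flush_prepared_statement_cache();

        unsafe {
            let db = conn.handle();
            // sqlite3_next_stmt(db, NULL) returns the first prepared statement
            // still alive on this connection; walk the list and finalize each
            // one so that sqlite3_close() can't fail with SQLITE_BUSY.
            let mut stmt = ffi::sqlite3_next_stmt(db, std::ptr::null_mut());
            while !stmt.is_null() {
                let next = ffi::sqlite3_next_stmt(db, stmt);
                ffi::sqlite3_finalize(stmt);
                stmt = next;
            }
        }

        conn.close().map_err(|(_conn, err)| err)
    }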

The other thing that's different is that some of our other databases (Cache API, IndexedDB, localStorage, cookies, Places) set PRAGMA wal_autocheckpoint, and sometimes PRAGMA journal_size_limit, but webext-storage doesn't.

https://www.sqlite.org/pragma.html#pragma_journal_size_limit says:

Similarly, in WAL mode, the write-ahead log file is not truncated following a checkpoint. Instead, SQLite reuses the existing file for subsequent WAL entries since overwriting is faster than appending. [...] The default journal size limit is -1 (no limit). The SQLITE_DEFAULT_JOURNAL_SIZE_LIMIT preprocessor macro can be used to change the default journal size limit at compile-time.

I wonder if that's why old data accumulates in it indefinitely, like in your storage-sync-v2.sqlite-wal. Another hunch is that trying to checkpoint these large WAL files on shutdown bumps into the async shutdown timeout, and so they keep growing because they're never fully checkpointed...but it's just a hunch for now. 😊
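
One way to check that hunch would be to inspect the relevant pragmas on the webext-storage connection. A minimal rusqlite sketch (assuming direct access to the connection, which the real code mediates; the function is illustrative, not existing code):

    use rusqlite::Connection;

    /// Print the WAL-related settings of an already-open connection. With
    /// SQLite's defaults we'd expect journal_size_limit = -1 (no limit) and
    /// wal_autocheckpoint = 1000 pages.
    fn dump_wal_settings(conn: &Connection) -> rusqlite::Result<()> {
        let mode: String = conn.query_row("PRAGMA journal_mode", [], |r| r.get(0))?;
        let autocheckpoint: i64 = conn.query_row("PRAGMA wal_autocheckpoint", [], |r| r.get(0))?;
        let size_limit: i64 = conn.query_row("PRAGMA journal_size_limit", [], |r| r.get(0))?;
        let page_size: i64 = conn.query_row("PRAGMA page_size", [], |r| r.get(0))?;
        println!("journal_mode={mode} wal_autocheckpoint={autocheckpoint} journal_size_limit={size_limit} page_size={page_size}");
        Ok(())
    }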

Did you by chance notice the -wal (and -shm?) hanging around consistently after every shutdown, or are there times when they are cleaned up? And, could you please check if you see any shutdown hangs mentioning StorageSyncService: shutdown in about:crashes? Thanks!

Flags: needinfo?(jscher2000)

(In reply to Lina Butler [:lina] from comment #1)

Did you by chance notice the -wal (and -shm?) hanging around consistently after every shutdown, or are there times when they are cleaned up?

This issue was just brought to my attention, so I don't know whether it is intermittent. Judging by what I see across the many profiles I use occasionally for testing, it is very common to have -wal and -shm files left over after shutdown. Some of these were last modified in 2021 or early 2022, despite visiting the profile every few months.

And, could you please check if you see any shutdown hangs mentioning StorageSyncService: shutdown in about:crashes? Thanks!

I don't usually get shutdown hangs. Either Firefox crashes in spectacular fashion or it restarts normally.

Flags: needinfo?(jscher2000)

Thanks Lina! This seems easy to repro (one of my test profiles has a .sqlite file under 300KB but a .sqlite-wal of 6MB), and I suspect one of your pointers will be the culprit.

Severity: -- → S2
Priority: -- → P2

Well, this got complicated and I got semi-stuck.

  • I can reproduce this reliably. As a baseline, if I open the DB with my local sqlite utility, the -wal file is always removed, i.e. that utility closes the database "correctly" and no -wal file is left after this step.
  • I instrumented a local Firefox build, printing a message as the DB opens and printing whether our "close" function succeeded or not.
  • I use about:debugging, inspect any local addon, and in that console execute await browser.storage.sync.get("") - as expected I get back an empty object, but I do see the message I added when the database opens.
  • I exit Firefox, and see a message that the DB close was successful.
  • Nevertheless, I noticed the -wal and -shm files came back.

So I tried to reproduce this as best I could in a stand-alone Rust test case which uses the webext-storage component and does roughly what the above does: opens the DB, runs one simple query, and closes it. I cannot reproduce the -wal file being left behind (although I do observe it existing before the close is called). A key difference with that test case is that it uses the version of SQLite bundled with rusqlite, whereas in Gecko it uses the version already in the tree; that copy is built with custom flags, but I can't see any that might impact this.
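
For reference, the stand-alone attempt has roughly the following shape. This is a hedged reconstruction, not the actual webext-storage test: the path and the query are placeholders standing in for what the component does.

    use rusqlite::Connection;
    use std::path::Path;

    fn main() -> rusqlite::Result<()> {
        let db_path = "test.sqlite"; // placeholder path
        let wal_path = format!("{db_path}-wal");

        let conn = Connection::open(db_path)?;
        // webext-storage keeps its database in WAL mode.
        conn.query_row("PRAGMA journal_mode = WAL", [], |r| r.get::<_, String>(0))?;

        // One trivial query, standing in for browser.storage.sync.get("").
        let _n: i64 = conn.query_row("SELECT count(*) FROM sqlite_master", [], |r| r.get(0))?;
        println!("-wal exists while open: {}", Path::new(&wal_path).exists());

        conn.close().map_err(|(_c, e)| e)?;
        // With the SQLite bundled by rusqlite, the -wal is gone at this point.
        println!("-wal exists after close: {}", Path::new(&wal_path).exists());
        Ok(())
    }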

Regarding the other flags mentioned by Lina:

  • The docs for journal_size_limit do talk about the file persisting after a transaction or checkpoint; this forum post talks a little about it.
  • wal_autocheckpoint is subtly different - see the above forum post. However, it does say the default value is 1000. Given that the SQLite in Gecko defaults to a 32KiB page size, the 31MB WAL the OP reports seems to correspond to this (1000 pages × 32KiB ≈ 31MiB), and I expect this to be the maximum size anyone will report. Further, the SQLite WAL docs explicitly say "SQLite will automatically checkpoint ... when the last database connection on a database file closes." This forum post also clearly states the WAL should be removed after closing the DB (and, as above, that is what I can reproduce in a simple test).

So, putting all this together, if I explicitly execute PRAGMA wal_autocheckpoint = 1; then the issue no longer reproduces. However, what's happening here is that the -wal is being removed on each and every commit, which isn't really what we want as that will reduce the performance of the database. We don't mind the -wal growing as the database is in use and don't want to re-create and delete it on every single transaction - what we want is just for it to be removed when the DB is closed.
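
Concretely, the workaround amounts to setting one pragma when the connection is opened. A hedged rusqlite sketch (the function is illustrative, and the real code may apply it elsewhere), with the performance caveat above:

    use rusqlite::Connection;

    fn apply_workaround(conn: &Connection) -> rusqlite::Result<()> {
        // Force a checkpoint once the WAL exceeds a single page, i.e. on
        // virtually every commit; this keeps the WAL tiny but costs write
        // performance, which is the trade-off described above.
        conn.query_row("PRAGMA wal_autocheckpoint = 1", [], |r| r.get::<_, i64>(0))?;
        Ok(())
    }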

tl;dr - I can reliably reproduce this in Gecko, but cannot work out how to reproduce it stand-alone. I have a workaround which makes the reported behavior go away, but it will reduce the performance of the database. So I'm not sure how to move forward here.

Attached image screenshot of sizes

I also see both storage-sync-v2.sqlite-wal and synced-tab-db-wal growing a lot more than expected.
The slow shutdown path may actually also be due to these journals not being merged often enough, so all the merge work is left to shutdown.
In mozStorage we set DEFAULT_JOURNAL_SIZE_LIMIT and use wal_autocheckpoint.

journal_size_limit makes it so that when the journal reaches the limit AND IT'S EMPTY, SQLite tries to truncate it. That means that if you have a long-running transaction, the journal is never empty and the limit is ignored, so the journal can grow without bound. Thus the first thing I'd try is to set the size limit and check that there are no long-running transactions.

The autocheckpoint must also be adapted to the app and, more specifically, to the database: 1000 (with 32KiB pages) is quite large for a db that, at least in my case, is less than 1MB. It's true you don't want to checkpoint at every transaction, but you also want your data to be safely stored in the main file.
With such a small database, you basically never empty the journal and keep appending to it, hoping that shutdown will merge things. You risk the user moving around a db with a hot journal, which is an almost sure corruption.

Usually we set journal_size_limit to 3 times wal_autocheckpoint, and I think 1.5MiB and 512KiB respectively are reasonable here (unless you expect large transactions to happen often).
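
As a hedged sketch of what that suggestion could look like for webext-storage, with the values taken straight from the comment above (wal_autocheckpoint is in pages, here 32KiB each; journal_size_limit is in bytes) and nothing here being a landed patch:

    use rusqlite::Connection;

    fn apply_suggested_wal_limits(conn: &Connection) -> rusqlite::Result<()> {
        // ~512KiB of WAL before an automatic checkpoint: 16 pages * 32KiB.
        conn.query_row("PRAGMA wal_autocheckpoint = 16", [], |r| r.get::<_, i64>(0))?;
        // Truncate the WAL once it exceeds ~1.5MiB (3 * 512KiB) and is empty.
        conn.query_row("PRAGMA journal_size_limit = 1572864", [], |r| r.get::<_, i64>(0))?;
        Ok(())
    }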
