LSNG: Database corruption is not handled during data loading
Categories
(Core :: Storage: localStorage & sessionStorage, defect, P2)
Tracking
()
People
(Reporter: janv, Unassigned)
References
(Blocks 1 open bug)
Details
The error code NS_ERROR_FILE_CORRUPTED is currently checked only after opening a database. However, the same error can also be returned later, when the data-loading query runs.
I think we should just throw away a corrupted database and create a new one when this happens.
Note that PrepareDatastoreOp::VerifyDatabaseInformation should check the error code too.
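A minimal sketch of the proposed recovery, written against Python's sqlite3 module rather than mozStorage. Every name in it (open_and_load, is_corruption, the data table) is a hypothetical stand-in, not LSNG code. It treats a corruption error raised by the load query the same way as one raised at open time: the file is discarded and an empty database is created in its place.

```python
import os
import sqlite3

# Primary SQLite result codes that indicate an unusable database file.
SQLITE_CORRUPT = 11
SQLITE_NOTADB = 26


def is_corruption(exc):
    """Return True if a sqlite3 exception reports a corrupted database."""
    code = getattr(exc, "sqlite_errorcode", None)  # available on Python 3.11+
    if code is not None:
        # Mask off the extended bits to get the primary result code.
        return (code & 0xFF) in (SQLITE_CORRUPT, SQLITE_NOTADB)
    # Fallback for older Python versions: match the standard messages.
    message = str(exc).lower()
    return "malformed" in message or "not a database" in message


def open_and_load(path):
    """Open the database and load all items, recreating it once on corruption."""
    for attempt in range(2):
        conn = sqlite3.connect(path)
        try:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS data (key TEXT PRIMARY KEY, value TEXT)"
            )
            # Corruption may only surface here, while the load query runs.
            items = dict(conn.execute("SELECT key, value FROM data").fetchall())
            return conn, items
        except sqlite3.DatabaseError as exc:
            conn.close()
            if attempt == 1 or not is_corruption(exc):
                raise
            # Throw away the corrupted file (and any leftover rollback journal).
            for stale in (path, path + "-journal"):
                if os.path.exists(stale):
                    os.remove(stale)
    raise AssertionError("unreachable")
```

Errors that are not corruption (a locked database, for example) are re-raised untouched, so only a genuinely unusable file is deleted.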
Comment 1•5 years ago
Note that in mozStorage bug 1125157 we've discussed adding a mechanism for errors like corruption to be handled via a listener/observer mechanism so that we don't necessarily have to special-case corruption handling at every mozStorage-using call-site. We'll still need LSNG-specific logic, but it would be ideal to avoid making an entirely LSNG-specific solution to the problem.
Agreed that clearing the corrupted database is the only viable course of action and it should be tracked via telemetry. Precedent is that we wouldn't wipe the origin for this scenario, just the LS storage, and that continues to make sense.
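To illustrate the listener idea, here is a rough Python sketch. It does not reflect whatever design bug 1125157 settles on; ObservedConnection and add_corruption_listener are invented names, and the listener is where a telemetry probe or a storage-specific cleanup could hook in. The point is only that corruption detection lives in one wrapper instead of at every call site.

```python
import sqlite3

# Primary SQLite result codes: SQLITE_CORRUPT (11) and SQLITE_NOTADB (26).
CORRUPTION_CODES = (11, 26)


def is_corruption(exc):
    """Return True if a sqlite3 exception reports a corrupted database."""
    code = getattr(exc, "sqlite_errorcode", None)  # available on Python 3.11+
    if code is not None:
        return (code & 0xFF) in CORRUPTION_CODES
    message = str(exc).lower()
    return "malformed" in message or "not a database" in message


class ObservedConnection:
    """Hypothetical wrapper that reports corruption to registered listeners."""

    def __init__(self, path):
        self.path = path
        self._conn = sqlite3.connect(path)
        self._listeners = []

    def add_corruption_listener(self, callback):
        # callback(path, exception) is invoked once per corrupted statement.
        self._listeners.append(callback)

    def execute(self, sql, params=()):
        try:
            return self._conn.execute(sql, params).fetchall()
        except sqlite3.DatabaseError as exc:
            if is_corruption(exc):
                for callback in self._listeners:
                    callback(self.path, exc)
            raise  # callers still see the error; listeners only observe it

    def close(self):
        self._conn.close()
```

A consumer such as LS storage would register one listener that records the event and schedules a wipe of its own database only, leaving the rest of the origin alone.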
Comment 2•5 years ago (Reporter)
That sounds good.
Comment 3•5 years ago (Reporter)
I believe this now needs to be P1.
Comment 4•5 years ago (Reporter)
:drh, according to https://www.sqlite.org/lang_attach.html, transactions involving multiple attached databases are atomic. However, we seem to be seeing an odd form of database corruption, probably when Firefox crashes unexpectedly while a transaction involving multiple databases has not yet been committed. In our case, we have two SQLite databases and neither uses WAL. Have you heard of corruption that is more likely to happen when multiple databases are involved? Is there anything we should avoid doing in this setup?
The corrupted database has these pragmas:
PRAGMA synchronous = FULL;
PRAGMA page_size = 1024;
PRAGMA auto_vacuum = INCREMENTAL;
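For readers who want to reproduce the setup, a small Python sketch of one connection with a second database attached and a transaction spanning both. The file names, table names, and the open_pair/write_both helpers are assumptions made for illustration; only the three pragmas come from the comment above.

```python
import sqlite3


def open_pair(main_path, other_path):
    """Open main_path with the pragmas above and attach other_path."""
    # isolation_level=None disables implicit transactions; we issue BEGIN ourselves.
    conn = sqlite3.connect(main_path, isolation_level=None)
    # page_size and auto_vacuum must be set before the first table is created.
    conn.execute("PRAGMA page_size = 1024")
    conn.execute("PRAGMA auto_vacuum = INCREMENTAL")
    conn.execute("PRAGMA synchronous = FULL")
    conn.execute("ATTACH DATABASE ? AS other", (other_path,))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS main.data (key TEXT PRIMARY KEY, value TEXT)"
    )
    conn.execute("CREATE TABLE IF NOT EXISTS other.usage (bytes INTEGER)")
    return conn


def write_both(conn, key, value):
    """Update both databases in one transaction; roll back both on any error."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute("INSERT OR REPLACE INTO main.data VALUES (?, ?)", (key, value))
        conn.execute("INSERT INTO other.usage VALUES (?)", (len(value),))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

Per the ATTACH documentation linked above, the cross-file atomicity holds when the main database is not in-memory and the journal mode is not WAL, which matches the non-WAL configuration described here.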
Comment 5•5 years ago
We do not know of any corruption problems in SQLite, involving multiple databases or otherwise. There is nothing special you need to do to avoid corruption when using multiple databases. It should just work.
See also https://www.sqlite.org/howtocorrupt.html for a discussion of the various out-of-band ways that SQLite database files have gone corrupt in the past. I don't think any of these issues apply to FF, but it never hurts to review the list from time to time.
Comment 6•5 years ago (Reporter)
So I found a discussion that mentions database corruption like this: http://sqlite.1065341.n5.nabble.com/btreeInitPage-returns-error-code-11-td87095.html
Integrity check on the corrupted database gives me:
Page 4: btreeInitPage() returns error code 11
Page 5: btreeInitPage() returns error code 11
I looked at the hex dump of those pages and they are zeroed.
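The two checks described above can be approximated with a short Python script. This is a sketch, not the tooling actually used for this bug: the function names are made up, and the default page size of 1024 is taken from the pragmas quoted in comment 4.

```python
import sqlite3


def integrity_errors(path):
    """Return the messages from PRAGMA integrity_check, or [] if it reports ok."""
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute("PRAGMA integrity_check").fetchall()
    except sqlite3.DatabaseError as exc:
        # Severe damage can make the pragma itself fail; report that instead.
        return [str(exc)]
    finally:
        conn.close()
    messages = [row[0] for row in rows]
    return [] if messages == ["ok"] else messages


def zeroed_pages(path, page_size=1024):
    """Return the 1-based numbers of pages that consist only of zero bytes."""
    zero_page = bytes(page_size)
    found = []
    with open(path, "rb") as f:
        page_number = 1
        while True:
            page = f.read(page_size)
            if len(page) < page_size:
                break  # end of file (or a truncated trailing page)
            if page == zero_page:
                found.append(page_number)
            page_number += 1
    return found
```

Running both on a copy of the damaged file shows whether every page named by the integrity check is also one of the zero-filled pages.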
Comment 7•5 years ago
SQLite writes some content to disk, then invokes fsync() (or the Windows equivalent FlushFileBuffers()) and waits for the OS to guarantee that the content is safely on disk. The OS passes this task off to the SSD controller. But the SSD controller lies and says the information is saved, even though it is still in a volatile cache waiting to be written. Then SQLite does some other changes that depend on the first bits being saved, and the newer changes actually reach non-volatile storage first. Then the power goes out. When the machine comes back up, the first content written comes up as all zeros.
We don't know that this is what happened in your case, but it is a frequent hypothesis among people who study this kind of thing.
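The ordering assumption in that description can be shown with a few lines of Python. This is an illustration of the write, sync, write pattern only, with a made-up helper name; it is not how SQLite itself is implemented, and it cannot detect a device that acknowledges the sync before the data is durable.

```python
import os


def write_with_barrier(path, first, second):
    """Write `first`, force it to storage, then write `second` that depends on it."""
    with open(path, "wb") as f:
        f.write(first)
        f.flush()             # move Python's buffer into the OS
        os.fsync(f.fileno())  # ask the OS to push the data to the device
        # Everything below assumes `first` is durable. If the device lied about
        # the sync, `second` may reach stable storage before `first` does.
        f.write(second)
        f.flush()
        os.fsync(f.fileno())
```

If power is lost between the two syncs on such a device, the region holding the first write can come back as zeros while the later write survives, which would match the zeroed pages reported in comment 6.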