1590640 - LSNG: Database corruption is not handled during data loading

Reporter

Description

5 years ago

The error code NS_ERROR_FILE_CORRUPTED is currently checked only after opening a database. However, this error can also be returned later, when the query for data loading is run.

I think we should just throw away a corrupted database and create a new one when this happens.

Note that PrepareDatastoreOp::VerifyDatabaseInformation should check the error code too.
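The proposed handling can be sketched with Python's built-in `sqlite3` module instead of mozStorage. This is a minimal sketch under stated assumptions, not the LSNG implementation. It treats SQLite's `SQLITE_CORRUPT` and `SQLITE_NOTADB` result codes as the conditions that would surface as NS_ERROR_FILE_CORRUPTED. The `data` table schema, the `load_data`/`query_all` helpers and the `telemetry` counter are all hypothetical. The sketch shows that the corruption check has to wrap the data-loading query itself, not just the open.

```python
import os
import sqlite3
from typing import Dict

# Primary SQLite result codes (from sqlite3.h) that this sketch treats as corruption.
SQLITE_CORRUPT = 11
SQLITE_NOTADB = 26

# Hypothetical schema standing in for the real LSNG data table.
SCHEMA = "CREATE TABLE IF NOT EXISTS data (key TEXT PRIMARY KEY, value TEXT)"

# Hypothetical stand-in for a telemetry counter.
telemetry: Dict[str, int] = {"ls_corruption_resets": 0}


def is_corruption(exc: sqlite3.DatabaseError) -> bool:
    code = getattr(exc, "sqlite_errorcode", None)  # available since Python 3.11
    if code is not None:
        # Extended result codes keep the primary code in the low 8 bits.
        return (code & 0xFF) in (SQLITE_CORRUPT, SQLITE_NOTADB)
    # Older Pythons: CPython raises the base DatabaseError (not a subclass) for these codes.
    return type(exc) is sqlite3.DatabaseError


def query_all(path: str) -> Dict[str, str]:
    conn = sqlite3.connect(path)
    try:
        conn.execute(SCHEMA)
        # Damaged data pages may only be hit here, while the rows are being loaded.
        return dict(conn.execute("SELECT key, value FROM data").fetchall())
    finally:
        conn.close()  # must be closed before the file can be deleted


def load_data(path: str) -> Dict[str, str]:
    try:
        return query_all(path)
    except sqlite3.DatabaseError as exc:
        if not is_corruption(exc):
            raise  # unrelated errors are not a reason to discard data
    # Discard only this database (and its rollback journal) and start empty.
    telemetry["ls_corruption_resets"] += 1
    for suffix in ("", "-journal"):
        try:
            os.remove(path + suffix)
        except FileNotFoundError:
            pass
    return query_all(path)
```

Errors other than corruption are re-raised unchanged, so only a corrupted file leads to a reset.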

Andrew Sutherland [:asuth] (he/him)

Comment 1

5 years ago

Note that in mozStorage bug 1125157 we've discussed adding a mechanism for errors like corruption to be handled via a listener/observer mechanism, so that we don't necessarily have to special-case corruption handling at every mozStorage-using call site. We'll still need LSNG-specific logic, but ideally we would avoid an entirely LSNG-specific solution to the problem.

Agreed: clearing the corrupted database is the only viable course of action, and it should be tracked via telemetry. The precedent is that we wouldn't wipe the whole origin in this scenario, only the LS storage, and that continues to make sense.
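A listener/observer mechanism of the kind described above can be pictured roughly as follows. This is a hypothetical shape, not the API discussed in bug 1125157: `CorruptionRegistry`, `report_corruption` and `LocalStorageResetter` are invented names. Storage call sites only report corruption, and the LSNG-specific listener reacts to its own database path alone, reflecting the point that only the LS storage is cleared, not the whole origin.

```python
from typing import Callable, Dict, List

# Hypothetical listener signature: receives the path of the corrupted database.
CorruptionListener = Callable[[str], None]


class CorruptionRegistry:
    """Hypothetical central hook: storage call sites report corruption here
    instead of each one implementing its own recovery."""

    def __init__(self) -> None:
        self._listeners: List[CorruptionListener] = []

    def add_listener(self, listener: CorruptionListener) -> None:
        self._listeners.append(listener)

    def report_corruption(self, db_path: str) -> None:
        # Iterate over a copy so a listener may register others while being notified.
        for listener in list(self._listeners):
            listener(db_path)


class LocalStorageResetter:
    """Hypothetical LSNG-specific listener: reacts only to its own database,
    leaving the rest of the origin's storage untouched."""

    def __init__(self, ls_db_path: str, telemetry: Dict[str, int]) -> None:
        self.ls_db_path = ls_db_path
        self.telemetry = telemetry
        self.reset_paths: List[str] = []

    def __call__(self, db_path: str) -> None:
        if db_path != self.ls_db_path:
            return  # not LS storage: leave it alone
        # A real handler would delete and recreate the file; this sketch only records it.
        self.reset_paths.append(db_path)
        self.telemetry["ls_corruption_resets"] = (
            self.telemetry.get("ls_corruption_resets", 0) + 1
        )
```

In use, the LS code would register one `LocalStorageResetter` for its database with the registry, and any code that hits a corruption error would call `report_corruption(path)` without knowing who handles it.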

Comment 2

5 years ago

That sounds good.

Perry Jiang [:perry] [no longer employee, use ni?]

Updated

5 years ago

Priority: -- → P3

Jan Varga [:janv]

Reporter

Comment 3

5 years ago

I believe this now needs to be P1.

Assignee: nobody → jvarga

Status: NEW → ASSIGNED

Priority: P3 → P1

Jan Varga [:janv]

Reporter

Comment 4

5 years ago

:drh, according to https://www.sqlite.org/lang_attach.html, transactions involving multiple attached databases are atomic. However, we seem to be seeing a strange kind of database corruption, probably when Firefox crashes unexpectedly before a transaction involving multiple databases has been committed. In our case, we have two SQLite databases, and neither of them uses WAL. Have you heard of corruption that is more likely to happen when multiple databases are involved? Is there anything we should avoid doing in this setup?

The corrupted database has these pragmas:
PRAGMA synchronous = FULL;
PRAGMA page_size = 1024;
PRAGMA auto_vacuum = INCREMENTAL;

Flags: needinfo?(drh)
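The setup described in this comment (two non-WAL databases written in one transaction, with the pragmas listed above) can be reproduced roughly as follows. It is a sketch using Python's `sqlite3` module. `open_pair`, `write_both`, the `other` alias and the `items` tables are hypothetical names. The lang_attach page linked above qualifies its atomicity guarantee: it holds when the main database is not ":memory:" and the journal_mode is not WAL, which is why the sketch sets a rollback journal explicitly.

```python
import sqlite3


def open_pair(main_path: str, other_path: str) -> sqlite3.Connection:
    # isolation_level=None: the sketch issues BEGIN/COMMIT itself.
    conn = sqlite3.connect(main_path, isolation_level=None)
    conn.execute("ATTACH DATABASE ? AS other", (other_path,))
    for schema in ("main", "other"):
        # Settings reported for the corrupted database. page_size and
        # auto_vacuum only take effect before the first table is created.
        conn.execute(f"PRAGMA {schema}.page_size = 1024")
        conn.execute(f"PRAGMA {schema}.auto_vacuum = INCREMENTAL")
        conn.execute(f"PRAGMA {schema}.synchronous = FULL")
        # Rollback journal rather than WAL, as in the reported setup.
        conn.execute(f"PRAGMA {schema}.journal_mode = DELETE").fetchall()
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {schema}.items (k TEXT PRIMARY KEY, v TEXT)"
        )
    return conn


def write_both(conn: sqlite3.Connection, key: str, value: str) -> None:
    """Update both files in one transaction, which SQLite commits
    atomically across the attached databases in this configuration."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute("INSERT OR REPLACE INTO main.items VALUES (?, ?)", (key, value))
        conn.execute("INSERT OR REPLACE INTO other.items VALUES (?, ?)", (key, value))
        conn.execute("COMMIT")
    except BaseException:
        conn.execute("ROLLBACK")  # neither file keeps a partial update
        raise
```

Nothing in this code is expected to corrupt either file; it only shows the configuration in which the corruption was reported.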

D. Richard Hipp

Comment 5

5 years ago

We do not know of any corruption problems in SQLite, involving multiple databases or otherwise. There is nothing special you need to do to avoid corruption when using multiple databases. It should just work.

See also https://www.sqlite.org/howtocorrupt.html for a discussion of the various out-of-band ways that SQLite database files have gone corrupt in the past. I don't think any of these issues apply to FF, but it never hurts to review the list from time to time.

Flags: needinfo?(drh)

Jan Varga [:janv]

Reporter

Comment 6

5 years ago

So I found a discussion that mentions database corruption like this: http://sqlite.1065341.n5.nabble.com/btreeInitPage-returns-error-code-11-td87095.html

An integrity check on the corrupted database gives me:
Page 4: btreeInitPage() returns error code 11
Page 5: btreeInitPage() returns error code 11

I looked at the hex dump of those pages and they are zeroed.
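The hex-dump check described above can be approximated with a short scanner. It reads the page size from the 100-byte SQLite file header (a 2-byte big-endian value at offset 16, where 1 means 65536) and reports 1-based page numbers whose bytes are all zero. The function names are hypothetical, and this is a heuristic only. It does not consult the freelist or pointer-map pages, so an all-zero page is not proof of corruption by itself; the integrity check remains the authoritative test.

```python
from typing import List


def read_page_size(path: str) -> int:
    """Read the page size from the SQLite file header (offset 16, big-endian)."""
    with open(path, "rb") as f:
        header = f.read(100)
    if len(header) < 100 or not header.startswith(b"SQLite format 3\x00"):
        raise ValueError(f"{path} does not start with an SQLite header")
    size = int.from_bytes(header[16:18], "big")
    return 65536 if size == 1 else size  # the value 1 encodes 65536


def find_zeroed_pages(path: str) -> List[int]:
    """Return 1-based numbers of pages whose bytes are all zero."""
    page_size = read_page_size(path)
    zero_page = bytes(page_size)
    zeroed: List[int] = []
    with open(path, "rb") as f:
        page_no = 1
        while True:
            page = f.read(page_size)
            if not page:
                break
            if page == zero_page:  # a short trailing read never matches
                zeroed.append(page_no)
            page_no += 1
    return zeroed
```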

D. Richard Hipp

Comment 7

5 years ago

SQLite writes some content to disk, then invokes fsync() (or the Windows equivalent FlushFileBuffers()) and waits for the OS to guarantee that the content is safely on disk. The OS passes this task off to the SSD controller. But the SSD controller lies and says the information is saved, even though it is still in a volatile cache waiting to be written. Then SQLite does some other changes that depend on the first bits being saved, and the newer changes actually reach non-volatile storage first. Then the power goes out. When the machine comes back up, the first content written comes up as all zeros.

We don't know that this is what happened in your case, but it is a frequent hypothesis among people who study this kind of thing.
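The hypothesis above can be played out with a toy model. `LyingDisk` is invented for illustration and does not model any real hardware or driver API. It shows how a flush that is acknowledged but not made durable, followed by out-of-order persistence and a power loss, leaves an earlier page zero-filled while a later page that depends on it survives.

```python
from typing import Dict, List


class LyingDisk:
    """Toy model, not a real driver: fsync() is acknowledged at once, but writes
    stay in a volatile cache and reach the media in whatever order the
    "controller" (the caller of flush_page) chooses."""

    def __init__(self, num_pages: int, page_size: int = 1024) -> None:
        self.page_size = page_size
        self.media: List[bytes] = [bytes(page_size) for _ in range(num_pages)]
        self.cache: Dict[int, bytes] = {}  # 1-based page number -> pending content

    def write(self, page: int, data: bytes) -> None:
        if len(data) > self.page_size:
            raise ValueError("data larger than a page")
        self.cache[page] = data.ljust(self.page_size, b"\x00")

    def fsync(self) -> bool:
        # The lie: report success without making anything durable.
        return True

    def flush_page(self, page: int) -> None:
        # Persist one cached page, regardless of the order it was written in.
        self.media[page - 1] = self.cache.pop(page)

    def power_loss(self) -> None:
        # Anything still in the volatile cache is lost.
        self.cache.clear()

    def read(self, page: int) -> bytes:
        # Reads see pending cached data first, then what is on the media.
        return self.cache.get(page, self.media[page - 1])
```

For example, write page 4, call fsync(), write page 2 (which refers to page 4), call fsync() again, let the controller flush only page 2, then call power_loss(). Page 2 survives while page 4 reads back as zeros, which is the pattern reported in comment 6.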

Jan Varga [:janv]

Reporter

Updated

5 years ago

Priority: P1 → P2

Jan Varga [:janv]

Reporter

Updated

5 years ago

Assignee: jvarga → nobody

Status: ASSIGNED → NEW

Andrew Sutherland [:asuth] (he/him)

Updated

4 years ago

Blocks: 1704440

Andrew Sutherland [:asuth] (he/him)

Updated

4 years ago

Blocks: 1704432

Andrew Sutherland [:asuth] (he/him)

Updated

4 years ago

Updated

2 years ago

Severity: normal → S3

Lina Butler [:lina]

Updated

5 months ago

Categories: Core :: Storage: localStorage & sessionStorage, defect, P2

People: Reporter: janv, Unassigned

References: Blocks 1 open bug

Security: public
