1368719 - [DataLoss] BrowserDB.prepareSchema() creates new DB if existing DB file is locked

Stefan Arentz | :st3fan | ⏰ EST | he/him

Reporter

Description

•

7 years ago

Data loss when database recovery fails. We have a situation where the database is fully recoverable, but we give up early and wipe it.

Stephan Leroux [:sleroux]

Updated

•

7 years ago

tracking-fxios: ? → 8.0+

Priority: -- → P2

Whiteboard: [MobileCore]

Justin D'Arcangelo [:justindarc]

Assignee

Updated

•

7 years ago

Assignee: nobody → jdarcangelo

Status: NEW → ASSIGNED

Justin D'Arcangelo [:justindarc]

Assignee

Updated

•

7 years ago

Iteration: --- → 1.23

Justin D'Arcangelo [:justindarc]

Assignee

Updated

•

7 years ago

Priority: P2 → P1

Jenn Chaulk (:jchaulk)

Updated

•

7 years ago

Iteration: 1.23 → 1.24

Justin D'Arcangelo [:justindarc]

Assignee

Updated

•

7 years ago

Comment 2

•

7 years ago

http://www.sqlite.org/wal.html

-----
8. Sometimes Queries Return SQLITE_BUSY In WAL Mode

The second advantage of WAL-mode is that writers do not block readers and readers to do not block writers. This is mostly true. But there are some obscure cases where a query against a WAL-mode database can return SQLITE_BUSY, so applications should be prepared for that happenstance.

Cases where a query against a WAL-mode database can return SQLITE_BUSY include the following:

    If another database connection has the database mode open in exclusive locking mode then all queries against the database will return SQLITE_BUSY. Both Chrome and Firefox open their database files in exclusive locking mode, so attempts to read Chrome or Firefox databases while the applications are running will run into this problem, for example.

    When the last connection to a particular database is closing, that connection will acquire an exclusive lock for a short time while it cleans up the WAL and shared-memory files. If a second database tries to open and query the database while the first connection is still in the middle of its cleanup process, the second connection might get an SQLITE_BUSY error.

    If the last connection to a database crashed, then the first new connection to open the database will start a recovery process. An exclusive lock is held during recovery. So if a third database connection tries to jump in and query while the second connection is running recovery, the third connection will get an SQLITE_BUSY error. 
-----

AFAIK, effectively iOS enables a limited form of concurrency when applications are switching and extensions are in use.

This might explain issues I've seen with the use of the Send To extension within Firefox in conjunction with switching apps.

Jenn Chaulk (:jchaulk)

Updated

•

7 years ago

Iteration: 1.24 → 1.25

Richard Newman [:rnewman]

Comment 3

•

7 years ago

Bug 1377646 covers a situation whereby the Sync-based data restoration approach won't work, making this bug much more noticeable.

Comment 4

•

7 years ago

I just reproduced this (again).

- Firefox was open and browsing.
- Hit 'Share', picked the same Firefox's Send Tab extension.
- Send Tab window opens, then the screen flashes as it crashes.
- Try again. This time I get "You have no devices connected to Sync" -- the clients table is empty and the last sync timestamp is recent, so we don't download again. At this point we have a new blank empty browser.db.
- The main Firefox activity continues to work -- the database connections stay open to the moved browser.db file, so bookmarks and history are visible.
- Background then foreground Firefox. Now browser.db is empty from the perspective of the main app -- all history and bookmarks and remote tabs are lost.

Three good questions:

- Why is the extension trying to open a new DB connection when hosted in the same process? Naturally we expect to need to create a new Profile instance when we're running as an extension in a host app, but that should be unnecessary within Firefox.
- Why does the DB connection fail?
- Why can a failure in an extension cause DB recovery out from under the active app? Preventing this seems like a Really Good Idea. (Indeed, I'd propose that in an extension we don't upgrade or try to create DB tables -- if they're not right, tell the user to launch Firefox before they use the extension.)

Richard Newman [:rnewman]

Comment 5

•

7 years ago

> - Why is the extension trying to open a new DB connection when hosted in the
> same process? Naturally we expect to need to create a new Profile instance
> when we're running as an extension in a host app, but that should be
> unnecessary within Firefox.

This seems to be another regression-ish from the Swift 3 migration, as in Bug 1377646:

https://github.com/mozilla-mobile/firefox-ios/commit/dae882fbadd4e467f7a6e4f04007528931fffddc#diff-7a45ce0d70d55bdf297fcacd47e4e874L379

We used to only open a given database once, no matter how many different Profile instances were created. That was itself incorrect: it made Profile a "secret singleton", but it meant that an extension homed in the same app wouldn't reopen the database.

Stefan Arentz | :st3fan | ⏰ EST | he/him

Reporter

Comment 6

•

7 years ago

Moving this to 8.1 since there is nothing actionable here that could fit in the last week before release.

tracking-fxios: 8.0+ → 8.1+

Stefan Arentz | :st3fan | ⏰ EST | he/him

Reporter

Updated

•

7 years ago

Whiteboard: [MobileCore] → [MobileCore][DataLoss]

Jenn Chaulk (:jchaulk)

Updated

•

7 years ago

Iteration: 1.25 → 1.26

Richard Newman [:rnewman]

Comment 7

•

7 years ago

Rephrasing this bug for clarity.

We do some data recovery when we think we can't open a database -- that is, when the sequence of [init, open, prepare schema, check tables] fails -- by resetting Sync so that we download existing data into a new database.

In theory that is back to working after Bug 1377646. We'd like to not rely on it; that's this bug.

This bug is to more carefully handle the situation in which we can't perform one of those steps during DB opening. Typically this'll fail not because of real corruption, but because the disk is full, another process has an exclusive transaction, etc.

These errors are always possible, even when we attempt to ensure that only one connection is open at a time -- processes can run simultaneously, the disk can genuinely be full, etc.

N.B., we currently run our writes in exclusive transactions to try to detect conflicts at the start of a write, rather than at the end.

Summary: Data loss when database recovery fails → Data loss when database preparation fails

Stefan Arentz | :st3fan | ⏰ EST | he/him

Reporter

Comment 8

•

7 years ago

Making this a 9.0 bug but this could also move back to 8.2 if it becomes more concrete. This feels more like a meta / research bug currently.

tracking-fxios: 8.1+ → 9.0

Justin D'Arcangelo [:justindarc]

Assignee

Comment 9

•

7 years ago

This is related to Bug 1385408. The current implementation assumes that if a transaction cannot be acquired, that the DB must be bad so a new one is created. This is incorrect behavior if the DB file is simply in use by another process and results in complete data loss. We need to either look at the reason *why* a transaction cannot be acquired and conditionally decide whether or not to go into "recovery mode", or we need to do a better job at coordinating multiple processes requesting access to the DB.

Iteration: 1.26 → ---

Comment 10

•

7 years ago

(In reply to Justin D'Arcangelo [:justindarc] from comment #9)
> We need to either look at
> the reason *why* a transaction cannot be acquired and conditionally decide
> whether or not to go into "recovery mode", or we need to do a better job at
> coordinating multiple processes requesting access to the DB.

And see Comment 4!

Also note that we already detect when we have real corruption (look in SwiftData.swift). SQLITE_BUSY or a closed database seems to be the most likely causes, and those have good explanations.

Justin D'Arcangelo [:justindarc]

Assignee

Updated

•

7 years ago

Iteration: --- → 1.28

Justin D'Arcangelo [:justindarc]

Assignee

Updated

•

7 years ago

Depends on: 1388147

Justin D'Arcangelo [:justindarc]

Assignee

Comment 11

•

7 years ago

Attached file GitHub Pull Request — Details

Attachment #8900030 - Flags: review?(rnewman)

Justin D'Arcangelo [:justindarc]

Assignee

Updated

•

7 years ago

tracking-fxios: 9.0 → 9.1

Priority: P1 → P2

Richard Newman [:rnewman]

Comment 12

•

7 years ago

Comment on attachment 8900030 [details] [review]
GitHub Pull Request

Forgot to update Bugzilla.

Attachment #8900030 - Flags: review?(rnewman) → review-

Jenn Chaulk (:jchaulk)

Updated

•

7 years ago

Iteration: 1.28 → 1.29

Priority: P2 → P1

Jenn Chaulk (:jchaulk)

Updated

•

7 years ago

Iteration: 1.29 → ---

Priority: P1 → P2

Justin D'Arcangelo [:justindarc]

Assignee

Comment 13

•

7 years ago

This issue has since been fixed as part of Bug 1394185. Closing.

Status: ASSIGNED → RESOLVED

Closed: 7 years ago

Resolution: --- → DUPLICATE

Justin D'Arcangelo [:justindarc]

Assignee

Updated

•

7 years ago

tracking-fxios: 9.1 → 9.0