Closed Bug 1317093 Opened 8 years ago Closed 8 years ago

browser.db dataloss in Firefox for iOS 6 between b2 and b3

Categories

(Firefox for iOS :: Data Storage, defect, P1)

All
iOS
defect

Tracking

()

RESOLVED FIXED
Iteration:
1.9
Tracking Status
fxios-v6.0 --- fixed
fxios-v7.0 --- fixed
fxios 6.0+ ---

People

(Reporter: rnewman, Assigned: sleroux)

References

Details

(Keywords: dataloss, Whiteboard: [MobileAS])

iPad Pro, iOS 10.x. Recent install, fully synced. Firefox Beta had been used earlier today.

I tried to send a link from Tweetbot, and got the "You have no devices" screen. Weird. I tried again, and got a totally black screen. (Tweetbot was now wedged at this screen.)

I switched to Firefox. In the tab list were the two open tabs from earlier today.

I opened Settings. Sync reported fully synced, and tapping Sync Now completed instantly.

I opened a new tab and got the default top sites.

The history list and synced device list was similarly empty.

The Logins list was unchanged.

Killing Firefox and relaunching it did not recover browser.db.

Conclusion: browser.db was damaged or locked, or replaced. The expected Sync-driven data recovery did not run.

Potential causes:

- Race between extension and app.
  - The app has a forceClosed ref to the DB; the extension just opened it and tried reading or writing.
  - The app still has the connection open to finish a sync; the extension thus fails to open the DB.
- An error in our DB closing code, recently added to fix iOS 10 crashes.
- An error in our extension DB access code; perhaps we're not closing the DB correctly?
Sorry, I forgot which device this happened on!

This happened on my iPhone 6S, which has been running Beta for ages.
This just happened again, this time on my iPad, seemingly while I was using it. I was just testing Bug 1316349, and noticed that even pages I had *just* opened -- still had open! -- weren't matching in history. Everything's gone.

IIRC the only significant thing I did was use Send Tab from within Firefox.

Between this, Bug 1316349, and Bug 1317096, I don't think we can release v6.
Severity: normal → critical
tracking-fxios: --- → ?
Keywords: dataloss
Also of note: once the browser enters this state, browser visits aren't recorded, and history stays empty.

And I just noticed history start appearing again, so it does indeed appear to be syncing back down.
I just got this to repro by simply hitting home then launching Firefox again. Literally: see my top sites; hit Home; hit Firefox; see the 5 default top sites.

It's possible that a sync was happening in the background.
My guess is that Bug 1291304 or Bug 1305180 is flawed.
More thinking:

- Do we run the backgrounding code when we launch the share extension from our own process? If so, are we racing our shutdown writes with the extension's startup open? If not, we can still get SQLITE_BUSY from racing writes!

- When we resume, do we coordinate with a still-pending shutdown? The reopen should go into the same queue that's closing the DB, otherwise we can reopen a currently open DB that might at that moment be servicing a write…
I can trivially reproduce part of this bug by delaying profile.shutdown in syncOnDidEnterBackground, exactly as if it synced on exit, then immediately relaunching the app. The queued shutdown happens after the app relaunches, closing the active database connection. Logs show something like:

2016-11-13 19:39:25.112 [Debug] [Profile.swift:269] reopen() > Reopening profile.
[INFO] GCDWebServer started on port 6571 and reachable at http://localhost:6571/
2016-11-13 19:39:25.525 [Debug] [BrowserViewController.swift:484] loadQueuedTabs() > Loading queued tabs in the background.
2016-11-13 19:39:25.526 [Debug] [BrowserViewController.swift:498] dequeueQueuedTabs() > Queue. Count: 0.
2016-11-13 19:39:25.997 [Debug] [Profile.swift:282] shutdown() > Shutting down profile.
2016-11-13 19:39:25.997 [Debug] [SwiftData.swift:525] deinit > deinit: closing connection on thread <NSThread: 0x170261500>{number = 7, name = (null)}.
2016-11-13 19:39:25.998 [Debug] [SwiftData.swift:592] closeCustomConnection(immediately:) > Closing custom connection for /private/var/mobile/Containers/Shared/AppGroup/C68FF084-36F0-4473-817F-26E0D73AE3F1/profile.profile/browser.db on <NSThread: 0x170261500>{number = 7, name = (null)}.
2016-11-13 19:39:26.000 [Debug] [SwiftData.swift:612] closeCustomConnection(immediately:) > Closed /private/var/mobile/Containers/Shared/AppGroup/C68FF084-36F0-4473-817F-26E0D73AE3F1/profile.profile/browser.db.
2016-11-13 19:39:28.916 [Debug] [BrowserViewController.swift:473] viewDidLayoutSubviews() > BVC viewDidLayoutSubviews…
2016-11-13 19:39:28.922 [Debug] [BrowserViewController.swift:480] viewDidLayoutSubviews() > BVC done.
2016-11-13 19:39:28.924 [Warning] [SwiftData.swift:88] getSharedConnection() > >>> Database is closed for /private/var/mobile/Containers/Shared/AppGroup/C68FF084-36F0-4473-817F-26E0D73AE3F1/profile.profile/browser.db
2016-11-13 19:39:28.926 [Debug] [SQLiteBookmarksModel.swift:842] modelFactory > Using local+mirror bookmark factory.
2016-11-13 19:39:28.926 [Debug] [SQLiteBookmarksModel.swift:120] modelForRoot() > Getting model for root.
2016-11-13 19:39:28.928 [Warning] [SwiftData.swift:88] getSharedConnection() > >>> Database is closed for /private/var/mobile/Containers/Shared/AppGroup/C68FF084-36F0-4473-817F-26E0D73AE3F1/profile.profile/browser.db
2016-11-13 19:39:28.929 [Error] [BookmarksPanel.swift:175] onModelFailure > Error: failed to get data: Optional(Couldn't get children: Error Domain=mozilla Code=0 "Non-open connection; can't execute query." UserInfo={NSLocalizedDescription=Non-open connection; can't execute query.}.)


and the app behaves as if the DB is empty until you quit and reopen the app.

We need a more considered state machine here, where the 'should be shut down' and 'should be reopened' states are a property of the DB, processed sequentially by a looper on the queue.


Look at the above sequence. If instead of a plain query we instead had a call to createOrUpdate, it would fail and we'd move the database out of the way.

That might imply that we need to rethink that error handling, too.
We call createOrUpdate six times during startup, so there's lots of opportunity for error.

On my iPhone 6S, there's 1150ms between application:willFinishLaunchingWithOptions: and the createOrUpdate for SQLiteRemoteClientsAndTabs.

If you manage to background the app *any time* before all browser.db createOrUpdate calls occur, browser.db will be deleted.

Try it: add

        self.shutdown()

in the lazy clients getter, mimicking our on-background shutdown firing just before our on-startup setup flow.

If you're super fast at hitting the home button, you can probably cause data corruption just by hitting Home real fast on a cold start.


21:20:54.828 [Debug] [BrowserDB.swift:83] updateTable(_:table:) > Trying update BROWSER version 19
21:20:54.829 [Debug] [BrowserDB.swift:151] createOrUpdate > Table BROWSER already exists.
21:20:55.271 [Debug] [Profile.swift:282] shutdown() > Shutting down profile.
21:20:55.272 [Debug] [SwiftData.swift:525] deinit > deinit: closing connection on thread <NSThread: 0x17407d480>{number = 3, name = (null)}.
21:20:55.272 [Debug] [SwiftData.swift:592] closeCustomConnection(immediately:) > Closing custom connection for /private/var/mobile/Containers/Shared/AppGroup/5B91923C-BBAE-418B-A299-18A7CFFD89E3/profile.profile/browser.db on <NSThread: 0x17407d480>{number = 3, name = (null)}.
21:20:55.277 [Debug] [SwiftData.swift:612] closeCustomConnection(immediately:) > Closed /private/var/mobile/Containers/Shared/AppGroup/5B91923C-BBAE-418B-A299-18A7CFFD89E3/profile.profile/browser.db.
21:20:55.280 [Warning] [SwiftData.swift:88] getSharedConnection() > >>> Database is closed for /private/var/mobile/Containers/Shared/AppGroup/5B91923C-BBAE-418B-A299-18A7CFFD89E3/profile.profile/browser.db
21:20:55.288 [Warning] [SwiftData.swift:88] getSharedConnection() > >>> Database is closed for /private/var/mobile/Containers/Shared/AppGroup/5B91923C-BBAE-418B-A299-18A7CFFD89E3/profile.profile/browser.db
21:20:55.289 [Warning] [SwiftData.swift:163] transaction(synchronous:transactionClosure:) > BEGIN EXCLUSIVE failed.
21:20:55.289 [Debug] [BrowserDB.swift:180] createOrUpdate > Couldn't create or update ["clients", "tabs", "commands"].
21:20:55.290 [Debug] [BrowserDB.swift:181] createOrUpdate > Attempting to move browser.db to another location.
Layered recommendations:

- Don't try to recover from a locked DB if the error is that our DB connection is closed.
- Don't just shutdown on the queue: when we get to the front of the queue, look to see whether we've already been reopened. If so, don't close the DB!
- Possibly: don't shut down until we're done starting up.
Depends on: 1315755
Depends on: 1316324
> - Don't try to recover from a locked DB if the error is that our DB connection is closed.

Bug 1315755 has a fix.

> - Don't just shutdown on the queue: when we get to the front of the queue,
> look to see whether we've already been reopened. If so, don't close the DB!

Bug 1316324 will address that.
Steph has both of the dependencies, so he might as well get the credit for this one, too :)
Assignee: nobody → sleroux
Status: NEW → ASSIGNED
Re: Comment 8

Looking at BrowserDB.swift:createOrUpdate, if we do encounter an error with the transaction and we try to move/recreate the database, it will fail because we forceClose the connection on line 205. This will cause the new/moved database to be empty - hence the lack of history/bookmarks. 

I might be just rewording what you wrote above but just to help understand I'm going to write it down anyways.

Another interesting point is that createOrUpdate won't just fail because of corruption. The likely cause is the addition of the forceClose calls and the use of the FailedSQLiteConnection in place of ConcreteSQLiteConnection when the DB is closed. Here's a scenario

1. User launches the app
2. Backgrounds shortly after
3. profile.shutdown is called
4. createOrUpdate will fail
5. DB is moved because it thinks we're in a 'bad' state vs a 'closed' state
6. Since the connection is gone, we cant recreate the tables
7. empty DB!

As a side note, this goes back to our discussion about having the databases accessible from lazy properties. Since they are created on-demand, the time of which we create is not explicit. I feel like if we were to force create our DB handles on app foregrounded and close them on backgrounded we wouldn't be getting as many issues.
> I might be just rewording what you wrote above but just to help understand
> I'm going to write it down anyways.

Always worthwhile!


> Another interesting point is that createOrUpdate won't just fail because of
> corruption. The likely cause is the addition of the forceClose calls and the
> use of the FailedSQLiteConnection in place of ConcreteSQLiteConnection when
> the DB is closed.

Yes, exactly. That's why Bug 1315755 is a partial fix for this: we detect a closed connection explicitly, and don't go through the recovery flow.

If we didn't have the recovery flow at all, the only symptom here would be Bug 1316324: a transient closed database when it should be open. Relaunching the app would fix things.

(On the other hand, I think the recovery flow still has value for real database corruption.)


> As a side note, this goes back to our discussion about having the databases
> accessible from lazy properties. Since they are created on-demand, the time
> of which we create is not explicit. I feel like if we were to force create
> our DB handles on app foregrounded and close them on backgrounded we
> wouldn't be getting as many issues.

That's true -- it reduces some potential race windows -- but we would still have known bugs (e.g., preventing a late close from impacting a subsequent session). I don't think the laziness is intrinsically a bad thing: it just needs to be correct, and I think that part is. It's our larger lifecycle problems that cause this bug, not the laziness per se.
Both dependencies bugs are resolved so I'll mark this as resolved and we can retest in the next beta.
Status: ASSIGNED → RESOLVED
Iteration: --- → 1.9
Closed: 8 years ago
Resolution: --- → FIXED
Whiteboard: [MobileAS]
Priority: -- → P1
You need to log in before you can comment on or make changes to this bug.