Open Bug 1328123 Opened 7 years ago Updated 8 months ago

Full re-sync triggered; investigation and breakdown needed

Categories

(Firefox for iOS :: General, defect, P3)

All
iOS
defect

Tracking

()

Tracking Status
fxios + ---

People

(Reporter: rnewman, Unassigned)

References

(Depends on 1 open bug)

Details

(Whiteboard: [MobileCore][Sync])

I noticed my local Fennec was syncing longer than I expected and sucking battery.

At around the same time I enabled passcode and touch ID, but I believe the long syncs were already occurring.

My device was already populated with data and syncing:

```
[SyncStateMachine.swift:667] advance() > Cached meta/global fetched at 1479165390330, newer than server modified 1467128053210. Using cached meta/global.
…
[Profile.swift:1119] takeActionsOnEngineStateChanges > No collections need reset. Moving on.
…
[ClientsSynchronizer.swift:353] synchronizeLocalClients(_:withServer:info:) > No remote changes for clients. (Last fetched 1483303746260.)
```

My logins DB already was configured:

```
[BrowserDB.swift:473] tablesExist > 2 tables exist. Expected 2
```

but for some reason we decided to download _everything_, and the local DB was empty when the records were applied:

```
[LoginsSynchronizer.swift:169] synchronizeLocalLogins(_:withServer:info:) > Synchronizing passwords. Last fetched: 0.
[LoginsSynchronizer.swift:175] synchronizeLocalLogins(_:withServer:info:) > Records header hint: Optional(739)
[SQLiteLogins.swift:740] applyChangedLogin(_:local:mirror:) > Never seen remote record {941bd3e5-0541-134c-9b47-d8ec03693788}. Mirroring.
```

Bookmarks got to run next, but had no work to do:

```
2017-01-02 09:09:44.362 [Info] [Downloader.swift:45] init(collectionClient:basePrefs:collection:) > Downloader configured with prefs 'profile.sync.synchronizer.bookmarks..downloader.bookmarks..'.
2017-01-02 09:09:44.362 [Debug] [BookmarksDownloader.swift:119] go(_:greenLight:) > Downloading up to 100 records.
2017-01-02 09:09:44.363 [Debug] [Downloader.swift:127] go(_:limit:) > Modified: 1483272229100; last 1483272229100.
2017-01-02 09:09:44.363 [Debug] [Downloader.swift:129] go(_:limit:) > No more data to batch-download.
```

That download was on Sunday Jan 1 at noon local time, when I was at home.

Then history continued from apparently running earlier, given newer=0 and a continuation token:

```
2017-01-02 09:09:44.365 [Info] [Downloader.swift:45] init(collectionClient:basePrefs:collection:) > Downloader configured with prefs 'profile.sync.synchronizer.history..downloader.history..'.
2017-01-02 09:09:44.366 [Debug] [Downloader.swift:127] go(_:limit:) > Modified: 1483305044110; last 0.
2017-01-02 09:09:44.367 [Debug] [Downloader.swift:157] downloadNextBatchWithLimit(_:infoModified:) > Fetching newer=0, offset=Optional("1481690632950:94").
```

and downloaded 80,000 records, applying them one by one to a dirty DB:

```
2017-01-02 09:09:46.792 [Debug] [SQLiteHistory.swift:911] insertOrUpdatePlace(_:modified:) > Warning: history item WHYYzJNZRRh2 changed both locally and remotely. Comparing timestamps from different clocks!
…
2017-01-02 09:21:38.564 [Debug] [Downloader.swift:192] handleSuccess > Advancing baseTimestamp to 1483345007450 - 1
2017-01-02 09:21:38.565 [Debug] [Downloader.swift:204] handleSuccess > Advancing lastModified to Optional(1483345007450) ?? 1483345007450.
2017-01-02 09:21:38.572 [Debug] [BrowserDB.swift:306] checkpoint() > Checkpointing a BrowserDB.
```

Then the horror:

```
[HistorySynchronizer.swift:174] uploadOutgoingFromStorage(_:lastTimestamp:withServer:) > Uploading 84852 modified places.
```

These logs are congruent with logins.db having been discarded, and a full sync reset having occurred for both logins.db and browser.db.

My best course of action now is to uninstall and set up Sync again: I can't possibly wait for 80,000 records to upload, and then redownload on each of my other devices.


There are probably multiple bugs here:

- logins.db should not have been trashed. We should re-examine our recovery approach: can we recover without losing the contents of the file? SQLite itself should not lose data even in a crash.

- This re-download and re-apply shouldn't result in 80,000 records to upload. I'd need to leave my device at home on wifi for hours. Those history items aren't really changed.

- I'm puzzled why we were triggered to re-download bookmarks and history. Bookmarks that I created locally (which can't have synced) are still present in the DB, which implies that browser.db was not deleted and recreated. My hypothesis is that logins.db got trashed, NotificationDatabaseWasRecreated was sent, but the filename was not "logins.db", so handleRecreationOfDatabaseNamed reset Bookmarks and History, too.

- We should again reconsider whether we download all history. A user with a decent backlog but a poor network connection might never get fully in sync!
Priority: -- → P2
Whiteboard: [MobileCore]
Richard, can you find out if this happens on the v6.x branch and give us an idea if this is serious enough to block 6.0? We are on a very tight schedule. With 6.0 supposed to go in Review on Friday. So if this is happening on 6.0 we need to act fast.
There's no chance I can try to repro this on 6.x; I'm traveling internationally (in the lounge right now) and over-committed until at least Friday morning.

The only things that you might want to check immediately:

- Does adding a passcode or touch ID have any chance of leaving logins.db in a bad state, thus causing a sync reset (particularly a reset of more than just logins)? This should be easy enough to determine just by watching logs in Xcode.

- If logins.db is rendered unopenable, and is thus replaced, is the filename in the notification "logins.db" or something else? If it's something else, I would suggest blocking ship -- it's not a regression, but that would be serious. This should be easy enough to manually fake to test.

Someone could also pre-emptively refactor that code to not use filenames for the notification, and instead to use some enum passed in when BrowserDB is instantiated.

Everything else can realistically wait for a 6.x or 7.0, because the investigation will be more involved.
Assignee: nobody → sleroux
Status: NEW → ASSIGNED
I can't reproduce these results on v6.x. I have tried adding a passcode and enabled touch id and i see nothing weird happen. My sync is pretty stable. I don't see any messages about logins.db or any retries.
> - Does adding a passcode or touch ID have any chance of leaving logins.db in a bad state, thus causing a sync reset (particularly a reset of more than just logins)? This should be easy enough to determine just by watching logs in Xcode.

Not that I can see. The only thing I can see in the code is both the touch ID/passcode metadata and logins.db SQLCipher key are both stored in the same keychain but the keys don't collide so I don't see how one would affect the other.

> - If logins.db is rendered unopenable, and is thus replaced, is the filename in the notification "logins.db" or something else? If it's something else, I would suggest blocking ship -- it's not a regression, but that would be serious. This should be easy enough to manually fake to test.

I confirmed that if logins.db is unable to be created or opened, the NotificationDatabaseWasRecreated will pass the correct database filename to Profile.swift to only regenerate the logins.db and does not touch browser.db and vice-versa. I could preemptively refactor to enums but probably best to leave it along until we know what's going on just in case.

Could this be related to our springboard/background crash at all? I can't see the DB being in an inconsistent state if the OS won't allow disk access at all in that case but who knows.
So then it's back to the root facts:

- Somehow history and logins (and perhaps other engines?) were forced to re-sync from 0.
- The server and local database still had all of my history data.
- After downloading and applying all of the history records, Firefox decided that every record needed reupload.

I haven't been node-reassigned.

My #1 concern is the reupload of history. Everything else is an annoyance, but the reupload is a killer.

I'll file a separate bug for that.


(In reply to Stephan Leroux [:sleroux] from comment #4)

> Could this be related to our springboard/background crash at all? I can't
> see the DB being in an inconsistent state if the OS won't allow disk access
> at all in that case but who knows.

I'd be surprised: my profile wasn't missing data, which is why I hypothesized that we might be accidentally resetting timestamps too broadly.
Depends on: 1330045
Priority: P2 → P3
Assignee: sleroux → nobody
Status: ASSIGNED → NEW
We should add some Sentry logging to find out if this is happening frequently.
Severity: critical → normal
Flags: needinfo?(jdarcangelo)
Whiteboard: [MobileCore] → [MobileCore][Sync]
Severity: normal → S3

Clear a needinfo that is pending on an inactive user.

Inactive users most likely will not respond; if the missing information is essential and cannot be collected another way, the bug maybe should be closed as INCOMPLETE.

For more information, please visit BugBot documentation.

Flags: needinfo?(justindarc)
You need to log in before you can comment on or make changes to this bug.