Closed Bug 1350088 Opened 3 years ago Closed 3 years ago

Sync fails when syncing add-on data if any WebExtensions add-ons are enabled

Categories

(WebExtensions :: General, defect, P2)

54 Branch
defect

Tracking

(firefox54 fixed, firefox55 fixed)

RESOLVED FIXED
mozilla55
Tracking Status
firefox54 --- fixed
firefox55 --- fixed

People

(Reporter: bugs, Assigned: glasserc)

Details

(Whiteboard: triaged)

Attachments

(3 files, 1 obsolete file)

Attached file Sync Log
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:54.0) Gecko/20100101 Firefox/54.0
Build ID: 20170323004002

Steps to reproduce:

Used a sync account to sync.


Actual results:

Sync failed during WebExtensions add-on data sync. Reason: "request failed: Error: Server has been flushed."


Expected results:

Sync should've worked.
Thanks for the report! Are you running your own Sync server, or using the default FxA/Sync server?
Status: UNCONFIRMED → NEW
Component: Untriaged → Sync
Ever confirmed: true
Flags: needinfo?(bugs)
Using the default server.
Flags: needinfo?(bugs)
Ethan, it looks like the error is coming from http://searchfox.org/mozilla-central/rev/c48398abd9f0f074c69f2223260939e30e8f99a8/services/common/kinto-offline-client.js#1748. Any ideas what would cause that?
Flags: needinfo?(eglassercamp)
Hi blackwind, are you an add-on developer who has been using this API since before December or so? I'm wondering if you got bit by https://bugzilla.mozilla.org/show_bug.cgi?id=1323240. But if you only started using chrome.storage.sync recently, maybe this is an unexpected consequence of https://bugzilla.mozilla.org/show_bug.cgi?id=1333810 ? Both of those were one-off occurrences and I don't foresee any other such migrations coming up now that we've shipped chrome.storage.sync in Release, so if it's one of those, and especially if you don't have any important data in it, you should be able to blow away the chrome.storage.sync Sqlite database and everything will be OK. If you do have any important data, I might be able to come up with a way to solve the problem more gracefully.

If it's neither of those, I don't have any particular ideas. Which WebExtensions do you have installed (and, if you happen to know, which ones use chrome.storage.sync)?
Flags: needinfo?(eglassercamp) → needinfo?(bugs)
I'm an add-on developer, but not one who uses WebExtensions.

As the log shows, the error is being triggered by Redirector and Snap Links Plus. When I disabled these add-ons, sync completed successfully. I've since migrated back to their most recent non-WebExtensions versions, and sync still completes successfully. If I update again to the WebExtensions versions, sync fails.

Neither add-on appears to use chrome.storage.sync.

I just temporarily installed two other WebExtensions add-ons and was still able to sync. Perhaps this only occurs with add-ons that have been upgraded to WebExtensions?
Flags: needinfo?(bugs)
Ah, yeah, sorry, I forgot to read the log. I installed the addons too but haven't been able to reproduce the error. Although the addons don't use chrome.storage.sync directly, they install an onChanged listener, which triggers syncing for chrome.storage. But because the extensions don't use chrome.storage.sync, there shouldn't be any data on the server. The "server was flushed" case is triggered if you've synced successfully before, and the timestamp has changed, but there is no data available. I spoke with :leplatrem and the only thing he could think of is did you maybe log out of one FxA account and into another? If so, I don't think we support that and we don't have a clear approach in mind to solve it. Since there shouldn't be any data in storage.sync, you could just delete the sqlite database and restart your browser; that should also clear the weird state. But otherwise I'm really curious what happened and how to prevent it from happening again.
Flags: needinfo?(bugs)
> I spoke with :leplatrem and the only thing he could think of is did you maybe log out of one FxA account and into another?

I didn't, no.

> Since there shouldn't be any data in storage.sync, you could just delete the sqlite database and restart your browser; that should also clear the weird state.

This, indeed, solved it.

If there's any further insight I can provide, do let me know.
Flags: needinfo?(bugs)
Component: Sync → WebExtensions: General
Product: Firefox → Toolkit
Priority: -- → P2
Whiteboard: triaged
I have a strong suspicion that this is a duplicate of bug 1359879.
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1359879
OK, I just hit this myself. It does seem to be related to the other bug but it manifests differently for reasons that aren't clear to me. I think it should be reproducible by doing:

> http DELETE https://webextensions.settings.services.mozilla.com/v1/buckets/default/collections/storage-sync-crypto Authorization:"Bearer $OAUTH_BEARER_TOKEN"

(where $OUTH_BEARER_TOKEN comes from `fxa-client --bearer --scopes sync:addon_storage --oauth-server https://oauth.accounts.firefox.com/v1 --account-server https://api.accounts.firefox.com/v1 --auth fxa-user@example.tld --out env.sh`). I'm reopening this and I'll write a patch that recovers from this state the same way we would recover from the other one.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
OK, so here's what happened in my personal case. I have two laptops, a home one running Firefox 53 (Release) and a work one running Nightly. The work one seems to have successfully got keys from the server on April 28th. The home one seems to have successfully got keys from the server on April 30th.[*] I think the disparity is due to the fact that at the time, I was trying to track down bug 1359879, and doing some unusual things to do so -- I was using a custom build of Nightly on my work laptop but with a different profile. I was also resetting the local sync status on different profiles to force a reupload of the keys when they disappeared from the server. It's possible that the home laptop was the one I forced to reupload the keys most recently. During this time, though, because of https://github.com/Kinto/kinto/issues/1209, keyrings were being deleted from the server occasionally, and in particular on 2017-05-02. But neither profiles would try to upload or fetch keys until a new extension was added. This didn't happen until last night. Both profiles then started failing -- the 53 version because of the same bug (1359879), and the Nightly one because the server was saying it had no new keys, despite the fact that the 53 version one had uploaded new keys. (This is the "server was flushed" message, and it happens that the server actually was "flushed", because of the Kinto data loss bug.) I think I got lucky here -- the "surviving" keys could have been the homebrewed Nightly one and then the current last_modified on the server would have related to neither current profile, which would have been even more confusing.

The "server was flushed" message occurs because the client asks "What's changed since Thursday?" and the server says "Nothing's changed; the last update was Friday". This could happen "in the wild" because a user has two devices, one of which synced keys on Friday and one on Thursday, but with the database being wiped in between. However, normally the delay between syncing keys shouldn't be so big, because the first device that synced keys did so because a new extension was added, and as extensions are synced by Firefox Sync, the second device should install that new extension and want to sync keys very shortly thereafter. So normally I think we'd expect to see the other bug (with the "record was null" error), which happens when the user has two devices which both synced keys at the same time and then the database was wiped. But if, for example, one device was powered down (say, a "work" computer), then you can definitely get into this state.

I think this bug only happened because of the Kinto data loss bug, which we've since fixed, and certainly my local situation is idiosyncratic, but I see from :markh's diagnostics that some 20% of failed syncs come from this error. For this reason, I'd like to add recovery code to the client that behaves similarly to the code in bug 1359879.

[*] "Synced keys on April 30th" is kind of a simplification; really it's "synced keys so that the client saw a last_modified of April 30th", but that's kind of a mouthfull.
Status: REOPENED → RESOLVED
Closed: 3 years ago3 years ago
Resolution: --- → FIXED
Oops, didn't mean to resolve it.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee: nobody → eglassercamp
Comment on attachment 8869579 [details]
Bug 1350088: Recover when we hit "server was flushed",

https://reviewboard.mozilla.org/r/141162/#review146616
Attachment #8869579 - Flags: review?(kmaglione+bmo) → review+
Keywords: checkin-needed
Comment on attachment 8869579 [details]
Bug 1350088: Recover when we hit "server was flushed",

Approval Request Comment
[Feature/Bug causing the regression]: chrome.storage.sync and Kinto bug https://github.com/Kinto/kinto/issues/1209. this is closely related to bug 1359879; the same underlying cause can cause this error or that one depending on circumstances of the devices the user has.
[User impact if declined]: chrome.storage.sync will still fail for a relatively small population of users who were affected by kinto#1209 and subsequently added extensions. Firefox Sync will continue to work, however.
[Is this code covered by automated tests?]: Yes
[Has the fix been verified in Nightly?]: Not yet
[Needs manual test from QE? If yes, steps to reproduce]: No
[List of other uplifts needed for the feature/fix]: None
[Is the change risky?]: Not really
[Why is the change risky/not risky?]: This is a targeted fix that responds only to a specific error condition.
[String changes made/needed]: None
Attachment #8869579 - Flags: approval-mozilla-beta?
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/7cc62fcd5aba
Recover when we hit "server was flushed", r=kmag
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/7cc62fcd5aba
Status: REOPENED → RESOLVED
Closed: 3 years ago3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
The patch doesn't apply to beta:
     

grafting 401774:7cc62fcd5aba "Bug 1350088: Recover when we hit "server was flushed", r=kmag"
merging toolkit/components/extensions/ExtensionStorageSync.jsm
merging toolkit/components/extensions/test/xpcshell/test_ext_storage_sync.js
 warning: conflicts while merging toolkit/components/extensions/test/xpcshell/test_ext_storage_sync.js! (edit, then use 'hg resolve --mark')
Flags: needinfo?(eglassercamp)
(In reply to Sylvestre Ledru [:sylvestre] from comment #18)
> The patch doesn't apply to beta:
>      
> 
> grafting 401774:7cc62fcd5aba "Bug 1350088: Recover when we hit "server was
> flushed", r=kmag"
> merging toolkit/components/extensions/ExtensionStorageSync.jsm
> merging toolkit/components/extensions/test/xpcshell/test_ext_storage_sync.js
>  warning: conflicts while merging
> toolkit/components/extensions/test/xpcshell/test_ext_storage_sync.js! (edit,
> then use 'hg resolve --mark')

test needs rebasing around bug 1353542 (async/await conversion).
Comment on attachment 8869579 [details]
Bug 1350088: Recover when we hit "server was flushed",

work around a server-side bug in Sync, with added test, beta54+
Attachment #8869579 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
needs rebasing as mentioned in comment #19 like

grafting 420751:7cc62fcd5aba "Bug 1350088: Recover when we hit "server was flushed", r=kmag"
merging toolkit/components/extensions/ExtensionStorageSync.jsm
merging toolkit/components/extensions/test/xpcshell/test_ext_storage_sync.js
warning: conflicts while merging toolkit/components/extensions/test/xpcshell/test_ext_storage_sync.js! (edit, then use 'hg resolve --mark')
abort: unresolved conflicts, can't continue
Attached patch Same fix but for mozilla-beta (obsolete) — Splinter Review
Here is the rebased patch for mozilla-beta. Thanks :RyanVM for walking me through the process!
Flags: needinfo?(eglassercamp)
ARGH, please disregard, that patch is not correct.
This time I remembered to amend the commit first.
Attachment #8872703 - Attachment is obsolete: true
(In reply to Ethan Glasser-Camp (:glasserc) from comment #15)
> [Is this code covered by automated tests?]: Yes
> [Has the fix been verified in Nightly?]: Not yet
> [Needs manual test from QE? If yes, steps to reproduce]: No

Setting qe-verify- based on Ethan's assessment on manual testing needs and the fact that this fix has automated coverage.
Flags: qe-verify-
Product: Toolkit → WebExtensions
You need to log in before you can comment on or make changes to this bug.