Closed Bug 1343414 Opened 7 years ago Closed 7 years ago

Firefox cannot sync because HMACVerificationException

Categories

(Firefox for Android Graveyard :: Android Sync, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1345742

People

(Reporter: robberphex, Unassigned)

References

Details

Attachments

(2 files)

User Agent: Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:51.0) Gecko/20100101 Firefox/51.0
Build ID: 20170126153103

Steps to reproduce:

I clear firefox data. and configure sync, client "Sync Now", And syncing is end.




Actual results:

but my bookmark doesn't sync, actually.


Expected results:

I think the bookmark will synced.
Severity: normal → major
OS: Unspecified → Android
Priority: -- → P2
Hardware: Unspecified → ARM
Attachment #8842261 - Attachment mime type: text/x-log → text/plain
The HMACVerificationException might be a genuine separate issue, but that's not responsible for what you're seeing regarding the bookmarks. If you look a little more closely, you'll find these lines in your log:
> 03-01 10:14:17.204 14820 28624 I FxAccounts: firefox :: CountCheckingServer11RepositorySession :: Collection bookmarks is starting a first sync; checking counts.
> 03-01 10:14:17.569 14820 28624 I FxAccounts: firefox :: CountCheckingServer11RepositorySession :: First sync for bookmarks: 7186 items.
> 03-01 10:14:17.570 14820 28624 W FxAccounts: firefox :: CountCheckingServer11RepositorySession :: Too many items to sync safely. Skipping.

This is because the previous Sync implementation didn't cope too well with handling large quantities of bookmarks, with corruption becoming increasingly likely. I'm not 100 % sure, but I think bug 1291821 will fix this starting from Firefox 54 (see also bug 1335922 for a similar report).

As for the HMAC errors, those seem to be relating to history and I've recently seen them in my desktop sync logs as well (e.g. https://pastebin.mozilla.org/8980629), although I'm not sure with what impact on sync, since incremental history updates still seem to be working successfully.

Oh, and the priority setting you've set (P2) is for the developers to decide...
Status: UNCONFIRMED → NEW
Component: Firefox Accounts → Firefox Sync: Backend
Ever confirmed: true
Priority: P2 → --
Product: Firefox for Android → Cloud Services
Version: 51 Branch → unspecified
OS: Android → All
Hardware: ARM → All
Thanks for filing! From bug 1335922, comment 1, it sounds like 5k bookmarks is the maximum that Android can sync without incremental buffering. Bug 1291821 is in Nightly now, if you'd like to give it a try. Grisha, any ideas on the `HMACVerificationException`?

Jan, I'll go ahead and file a Desktop bug for your issue.
Severity: major → normal
Component: Firefox Sync: Backend → Android Sync
Product: Cloud Services → Android Background Services
Attached file firefox sync nightly
I tried in fennec-54.0a1.en-US.android-arm.apk(sha256:915af090334c24c4731a3189ace35c15095faf371951bdd51aa7f3ea53a41eae)

And, there is still cannot sync bookmarks.
(In reply to robberphex from comment #3)
> Created attachment 8842702 [details]
> firefox sync nightly
> 
> I tried in
> fennec-54.0a1.en-US.android-arm.apk(sha256:
> 915af090334c24c4731a3189ace35c15095faf371951bdd51aa7f3ea53a41eae)
> 
> And, there is still cannot sync bookmarks.

I don't know how you've managed it, but you've got data on the server that was encrypted with a different set of keys than your Firefox Account is currently using.  There's not much to be done here other than to reset your password, which will abandon all this data on the server.  If you do the reset on a device that has all of your data (bookmarks, etc) it should all get re-uploaded, and then when you re-connect your Android device (with the new password!) it should all get downloaded smoothly.

Sorry for this -- you've found an edge condition that shouldn't be possible.  Are you self-hosting Sync and/or Firefox Accounts?
Flags: needinfo?(robberphex)
Somehow, my phone is now getting HMACVerificationExceptions as well and I had only reset my account last week or so - and no, I'm not self-hosting anything.
(In reply to Nick Alexander :nalexander from comment #4)
> (In reply to robberphex from comment #3)
> > Created attachment 8842702 [details]
> > firefox sync nightly
> > 
> > I tried in
> > fennec-54.0a1.en-US.android-arm.apk(sha256:
> > 915af090334c24c4731a3189ace35c15095faf371951bdd51aa7f3ea53a41eae)
> > 
> > And, there is still cannot sync bookmarks.
> 
> I don't know how you've managed it, but you've got data on the server that
> was encrypted with a different set of keys than your Firefox Account is
> currently using.  There's not much to be done here other than to reset your
> password, which will abandon all this data on the server.  If you do the
> reset on a device that has all of your data (bookmarks, etc) it should all
> get re-uploaded, and then when you re-connect your Android device (with the
> new password!) it should all get downloaded smoothly.
> 
> Sorry for this -- you've found an edge condition that shouldn't be possible.
> Are you self-hosting Sync and/or Firefox Accounts?

Thank you, I have resented account!
Flags: needinfo?(robberphex)
(In reply to Jan Henning [:JanH] from comment #7)
> And another user who somehow ended up in this state:
> http://forums.mozillazine.org/viewtopic.php?p=14735823#p14735823
> https://pastebin.mozilla.org/8981121

Jan: you wrote:

> Unfortunately, the only fix is to reset your sync account
> (disconnect all devices, either delete and recreate the account
> or else reset the password and subsequently choose a different…

That's not actually true: if you know which collection is affected (history in that user's case) it's sufficient to uncheck History in Sync preferences, wait a moment, and check it again.
grisha: has anything changed in the Android Sync code that might confuse the collection-keys mechanism?

rfkelly: has anything changed in the token server X-Client-State code that might confuse the shared state mechanism?
Flags: needinfo?(rfkelly)
Flags: needinfo?(gkruglov)
I don't think anything relevant changed on Android side, but I'm not 100% confident. I'll double check...

Also, not sure if related (but it might manifest very similarly to the user), but there have been iOS sync failures filed recently - Bug 1344717.
Flags: needinfo?(gkruglov)
> rfkelly: has anything changed in the token server X-Client-State code that might confuse the shared state mechanism?

No, no changes in that code for a good long while now.
Flags: needinfo?(rfkelly)
I got in the same situation (HMACVerificationException) after installing 2 firefoxes side by side on Android to test bug 1342679.
> I got in the same situation (HMACVerificationException) after installing 2 firefoxes side by side on Android to test bug 1342679

Given we've switched off some stuff on the server to try to fix that tabs bug, :eoger it would be great if you can try this again and see if the HMAC error is still reproducible.
Flags: needinfo?(eoger)
It looks like the problem is fixed, although I'm not sure what's the mechanism behind Android's "get all open tabs to be synced", because some don't appear (but I noticed that bug before)
Flags: needinfo?(eoger)
I'd like to see if we can identify why this happened by looking at server logs.  If I understand correctly, the proposed mechanism here is that the clients somehow got confused and uploaded two incompatible sets of keys.  One way to verify this could be:

* Find the account that :eoger used in Comment 12 above, and the corresponding sync userid
* Grep syncstorage nginx logs from yesterday for that userid, looking for access to the "crypto" collection
* See if we can find multiple writes to that collection that look suspicious

If that's successful, it'll give us two things:

* The ability to determine which users may have been affected by this issue and uploaded corrupted data
* The ability to look at what happened *before* the duplicate crypto/keys write and see if we can fully understand how it was caused

:eoger, can you confirm the sync cluster endpoint you used in Comment 12, or if not, the FxA user id?  We can pass them to ops out-of-band if you don't want to put potential PII in this bug.

:nalexander, does the above seem likely to identify the cause of this issue, uploading multiple keys?

:bobm, does this sound doable from your perspective?
Flags: needinfo?(nalexander)
Flags: needinfo?(eoger)
Flags: needinfo?(bobm)
Transmitted the info to rfkelly
Flags: needinfo?(eoger)
> :nalexander, does the above seem likely to identify the cause of this issue,
> uploading multiple keys?

Yes, I think so.  The situation may even be triggered by a single client:
- client observes blank server
- client upload new random crypto/keys
- client completes sync
- same client observes non-blank server
- client observer incorrect info/collections for crypto/keys
- client thinks that new crypto/keys is needed
- client uploads second random crypto/keys
- client observes incorrect info/collections for a data collection
- client downloads records it already uploads with original crypto/keys
- badness ensues

rnewman, grisha: do you have any thoughts on this?
Flags: needinfo?(rnewman)
Flags: needinfo?(nalexander)
Flags: needinfo?(gkruglov)
To be clear, AFAICT the one bug I've managed to find should *not* have affected crypto/keys, since that's not in memcached.  The bug was essentially:

* For the "tabs", "clients" or "meta" collection, which are stored in memcached
* If a client updates BSOs that already exist, without creating any new ones
* Then we might fail to update the last-modified timestamp for that collection

I'm hoping a bit of forensics on the requests here will let us figure out a code path that could lead from that bug to the observed behaviour.  If not, there may be a second bug at play.
(In reply to Ryan Kelly [:rfkelly] from comment #18)
> To be clear, AFAICT the one bug I've managed to find should *not* have
> affected crypto/keys, since that's not in memcached.  The bug was
> essentially:

OK.

> * For the "tabs", "clients" or "meta" collection, which are stored in
> memcached
> * If a client updates BSOs that already exist, without creating any new ones
> * Then we might fail to update the last-modified timestamp for that
> collection

So, when I say "observes blank server" above, it might be that we'd see that based on the "meta" collection.  This code is very complex, but could it be that the "meta" key in info/collections is missing entirely?  Or would it always be present and just stale after a write?

At [1] we see if we have a blank server, which would be the case if the "meta" key is missing entirely (see [2]).  That might lead us to produce a meta/global and potentially crypto/keys, although it really looks like Android at least really does try to wipe the whole server at [3] (the request is at [4]), which should avoid this particular issue entirely.  (Unless there's a problem wiping storage that might interact with memcached as well -- I remember there being some issue with "DELETE path" vs "DELETE path/" vs "DELETE path/storage" or similar.)

I don't see a clear path to follow here, but I do think exploring what can go wrong when info/collections["meta"] lies will be the direction to take.  I think that "tabs" and "clients" are just along for the ride when it comes to crypto/keys.

[1] https://dxr.mozilla.org/mozilla-central/rev/58753259bfeb3b818eac7870871b0aae1f8de64a/mobile/android/services/src/main/java/org/mozilla/gecko/sync/stage/FetchMetaGlobalStage.java#62

[2] https://dxr.mozilla.org/mozilla-central/rev/58753259bfeb3b818eac7870871b0aae1f8de64a/mobile/android/services/src/main/java/org/mozilla/gecko/sync/InfoCollections.java#96

[3] https://dxr.mozilla.org/mozilla-central/rev/58753259bfeb3b818eac7870871b0aae1f8de64a/mobile/android/services/src/main/java/org/mozilla/gecko/sync/GlobalSession.java#752

[4] https://dxr.mozilla.org/mozilla-central/rev/58753259bfeb3b818eac7870871b0aae1f8de64a/mobile/android/services/src/main/java/org/mozilla/gecko/sync/GlobalSession.java#857
> could it be that the "meta" key in info/collections is missing entirely?
>  Or would it always be present and just stale after a write?

My working theory is that it might be present with a stale last-modified timestamp, but never actually missing after it was written to.
I don't see any indication in the desktop code that an stale timestamp on meta would cause different keys to be uploaded. The keys collection itself is used to decide that.

What could conceivably happen is:

- A client unchecks an engine -- say, history. This causes a collection to be wiped and a new meta/global to be uploaded.
- Another client syncs. It sees no changed meta/global, so it syncs partial history.

That's not bad -- a client is just sending data when it shouldn't.

We could also have:

- A client starts over, calling wipeServer and uploading a new meta/global and new keys.
- Another client syncs. It sees no changed meta/global. However, it sees changed keys, so it fetches those…

If crypto can be stale, this will royally screw things up.


I do also note this in the code:


    // Can't check against local modified: clock drift.
    if (info[CRYPTO_COLLECTION] < serverModified) {
      this._log.error("Consistency failure: info/collections crypto entry " +
                      "is stale after successful upload.");
      throw new Error("Symmetric key upload failed.");
    }


I doubt that code path is well-tested, but it shouldn't be hit in this case. But again, if crypto could be stale…
Flags: needinfo?(rnewman)
> - A client starts over, calling wipeServer and uploading a new meta/global and new keys.

Does this actually delete the old data from the server first, or does it upload meta/global over the top of the old record?
(In reply to Ryan Kelly [:rfkelly] from comment #22)
> > - A client starts over, calling wipeServer and uploading a new meta/global and new keys.
> 
> Does this actually delete the old data from the server first, or does it
> upload meta/global over the top of the old record?

On Android, it definitely wipes first, with the DELETE command I linked to in an earlier comment.

On Desktop, the relevant code is around https://dxr.mozilla.org/mozilla-central/rev/58753259bfeb3b818eac7870871b0aae1f8de64a/services/sync/modules/service.js#915 but I can't tell if there's a bad path that this bug would tickle.
I think we've found the real bug; rfkelly will link it up when he files it.
Flags: needinfo?(rfkelly)
Flags: needinfo?(gkruglov)
Flags: needinfo?(bobm)
Depends on: 1345742
Flags: needinfo?(rfkelly)
Closing this as a dupe, as I don't think we have a reason to believe there could have been another root cause here.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → DUPLICATE
Product: Android Background Services → Firefox for Android
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.