Closed Bug 1056523 Opened 10 years ago Closed 10 years ago

HMAC error on crypto/keys after password reset

Categories

(Firefox :: Sync, defect)

defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla34
Tracking Status
firefox31 --- wontfix
firefox32 --- wontfix
firefox33 + verified
firefox34 + verified

People

(Reporter: rfkelly, Assigned: ckarlof)

References

Details

Attachments

(1 file, 1 obsolete file)

This was first reported as part of Bug 1042109, I'm spinning it out into its own bug.  Steps to reproduce:

1) Set up a fresh firefox profile and sign in to sync, using an account whose password you're not particularly attached to.

2) Ensure everything is syncing correctly, then leave this instance of firefox running.

3) Using a second fresh firefox profile, go to set up sync but use the "forgot password?" flow to reset the password on the account from step (1).

4) Wait.  After about an hour, the firefox instance from (1) will eventually discover that its session token has been revoked.  The "Sync Now" menu item will change to "Reconnect to Sync".

5) Reconnect it to sync, using the new password from (3).  The login should succeed, but the subsequent attempt to sync will fail with an error bar saying "Wrong Recovery Key".  The logfile reveals it is a HMAC mismatch error while decrypting the key bundle.

6) Restart firefox; this seemed to fix the problem for me, with further syncs completing successfully.

I suspect the error here is something to do with Bug 1044536, but am filing this separately in case it's a red herring.
I can reproduce this will slightly different steps, and I think this is a slightly serious bug.

I've gotten my sync into a state where I can't connect, because I get "wrong recovery key" every time.

Thinking through it, it certainly seems like the guard you point out in Bug 1044536 is the problem: 

>     if (!this._syncKeyBundle) {
>      // We are given kA/kB as hex.
>      this._syncKeyBundle = deriveKeyBundle(Utils.hexToBytes(userData.kB));
>     }

When we are re-logging in here, we already have a syncKeyBundle, so we don't update it, which causes us to use the wrong key.

It seems that the guard was introduced in this refactoring:

http://hg.mozilla.org/mozilla-central/diff/2b1aeb3de102/services/sync/modules/browserid_identity.js
In bug 982848 we also stopped removing the syncKeyBundle during errors when fetching the token. This was done for complicated reasons, but I think we should have continued to remove it during authentication errors.
Attachment #8477722 - Flags: feedback?(rfkelly)
Attachment #8477722 - Flags: feedback?(mhammond)
I decided the easiest fix to this problem was to make sure this.resetCredentials is called whenever we initialize the identity module with the current identity.

However, this raises other questions about how we are handling forced re-auths. When we trigger a forced reauth, we essentially re-initialize the current identity module rather than destroy the current one that it is "login rejected state" and create a new one for the newly re-authed user (which is what happens when the user clicks "Disconnect" and then logs back in). Although my patch may be the quick and dirty fix, a better fix might be what we do something along the lines above.

Thoughts?

FWIW, this bug is currently screwing users in prod though, I believe.
An easy way to test this if you are building Firefox is to follow Ryan's STR, but in _ensureValidToken, change

>    if (this.hasValidToken()) {

to 

>    if (false && this.hasValidToken()) { 

This will cause the first browser to get into the reconnect to sync state immediately after the password reset on the second browser (if you trigger Sync Now on each browser).
Comment on attachment 8477722 [details] [diff] [review]
0001-Bug-1056523-Ensure-sync-credentials-are-reset-during.patch

Review of attachment 8477722 [details] [diff] [review]:
-----------------------------------------------------------------

Makes sense to me.  I'm a big fan of re-starting from as controlled a state as possible.

> -        // Drop the sync key bundle, but still expect to have one.

nit: I think there still needs to be some sort of key-bundle-related comment here, otherwise the rest of the comment is pretty obscure.
Maybe something along the lines of "We should expect to have a syncKeyBundle, despite not actually having one.  This ensures that the UI blah blah blah."
Attachment #8477722 - Flags: feedback?(rfkelly) → feedback+
Comment on attachment 8477722 [details] [diff] [review]
0001-Bug-1056523-Ensure-sync-credentials-are-reset-during.patch

Review of attachment 8477722 [details] [diff] [review]:
-----------------------------------------------------------------

Yeah, I think this looks good - but I'm worried about edge-cases we've struck before we might hit again, and the lack of any tests in this area makes this a real possibility.  So I guess I'd r+ this but would want it flagged for a concerted QA effort.  I opened bug 1057916 for better tests (but as I mention in that bug, it's non-trivial)
Attachment #8477722 - Flags: feedback?(mhammond) → feedback+
http://mxr.mozilla.org/mozilla-central/source/services/sync/modules/browserid_identity.js?force=1#407(In reply to Ryan Kelly [:rfkelly] from comment #7)
> > -        // Drop the sync key bundle, but still expect to have one.
> 
> nit: I think there still needs to be some sort of key-bundle-related comment
> here, otherwise the rest of the comment is pretty obscure.
> Maybe something along the lines of "We should expect to have a
> syncKeyBundle, despite not actually having one.  This ensures that the UI
> blah blah blah."

What this does is set this._shouldHaveSyncKeyBundle = true, but the currentAuthState probably doesn't need it set *for this code path* because currentAuthState will probably do the right thing with this._authFailureReason:

http://mxr.mozilla.org/mozilla-central/source/services/sync/modules/browserid_identity.js?force=1#407

It only relies on this._shouldHaveSyncKeyBundle if this._authFailureReason is missing.

It's an unsatisfying comment, but I can add it. The alternative would be to remove the line "this._shouldHaveSyncKeyBundle = true;", which is cringeworthy, but probably ok.

Which do you prefer Mark?
Flags: needinfo?(mhammond)
I don't think we can remove the set of _shouldHaveSyncKeyBundle, otherwise ensureLoggedIn() etc will do the wrong thing.  This code is quite messy - .whenReadyToAuthenticate, ._shouldHaveSyncKeyBundle and ._authFailureReason are all reflecting the same basic underlying state, which is bad.  But not bad enough we should take the risk of fixing it here.

So TBH, I think the patch as it stands is fine, but an improved comment might read something like:

// this._authFailureReason being non-null ensures we are in the correct currentAuthState, and this._shouldHaveSyncKeyBundle being true ensures everything that cares knows that there is no authentication dance still under way.

all other async function= true// This will arrange for us to be in the right 'currentAuthState'
// such that UI will show the right error.		// such that UI will show the right error.
Flags: needinfo?(mhammond)
(oops - please ignore the last 2 lines of that :)
Attachment #8477722 - Attachment is obsolete: true
Attachment #8479968 - Flags: review?(mhammond)
As far as a test plan, we need to test Ryan's original STR in comment 1 as well as other normal Sync flows. We'd also want to make sure other flows around change and reset password continue to work as expected.

Unfortunately, testing flows with change and reset password are somewhat time consuming because you have to wait for cached Sync authentication credentials to expire. I've created bug 1059391 to create a debugging pref to not cache sync auth credentials to make this process easier.
Attachment #8479968 - Flags: review?(mhammond) → review+
Whiteboard: [qa+]
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
We might want to consider uplifting this later after it's gotten some qa/testing in Nightly.
Assignee: nobody → ckarlof
Flags: qe-verify+
OS: Linux → All
Hardware: x86_64 → All
Whiteboard: [qa+]
Chris, could you fill an uplift request for 33 (beta)? thanks
Flags: needinfo?(ckarlof)
Comment on attachment 8479968 [details] [diff] [review]
0001-Bug-1056523-Ensure-sync-credentials-are-reset-during.patch

Approval Request Comment
[Feature/regressing bug #]: 
[User impact if declined]: Users could lock themselves our of their FxA Sync account after a password reset.
[Describe test coverage new/current, TBPL]: manual testing
[Risks and why]: low; this patch clears state during re-initialization of the module (after password reset) that should have been done on initial landing.
[String/UUID change made/needed]:
Attachment #8479968 - Flags: approval-mozilla-beta?
Flags: needinfo?(ckarlof)
Comment on attachment 8479968 [details] [diff] [review]
0001-Bug-1056523-Ensure-sync-credentials-are-reset-during.patch

Beta+
Attachment #8479968 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
I`m trying to reproduce the issue on old Nightly following steps from comment 0. I`m stuck at point 4). Is there a way to hurry up the process there? I`ve been waiting for almost two hours and I`m still connected to Sync (hitting sync now will start a sync). 
I saw bug 1059391 has a pref that could help but it was introduced just after this fix was pushed to m-c. Is there a workaround to speed this step? Or maybe some other steps?
Flags: needinfo?(ckarlof)
Damn, I guess I should have landed those in reverse order. #priorities

The only thing I can think of is to provide you of a build of a legacy nightly with bug 1059391 manually applied. Other thoughts :markh?
Flags: needinfo?(ckarlof) → needinfo?(mhammond)
I've no better idea other than to do as step 4 says, and "wait an hour" - however, I think Bogdan did that and still couldn't repro.  Chris, is there any reason you can think of as to why that's not failing as described in comment 0?
Flags: needinfo?(mhammond) → needinfo?(ckarlof)
A couple of details:

In step 3), make sure you log in to sync in the 2nd browser after reseting the password.

After waiting an hour, you may have to manually trigger a sync (e.g., Sync Now) to see the error.
Flags: needinfo?(ckarlof)
Finally I was able to reproduce the error using old Nightly (2014-08-25) on Windows 7 64bit and Mac OS X 10.9.5 (after ~ one hour it decided to show the error). Using latest Aurora and Firefox 33 beta 7 I am not able to see "Wrong Recovery Key" error. 
The console does show some js warnings https://pastebin.mozilla.org/6659680. Is this something we should worry about?
Flags: needinfo?(ckarlof)
> 1411997665542   Sync.Engine.Bookmarks      WARN  DATA LOSS: Both local and remote changes to record: toolbar

Famous last words, but those are probably unrelated to this bug. It's warning of a conflict between changes to a record made by two different clients, something that is not unlikely to come up during client disconnection triggered during your test.
Flags: needinfo?(ckarlof)
Ok then, thanks Chris.
Based on comment 27 and comment 28 I will go ahead and close this as verified fixed.
Status: RESOLVED → VERIFIED
Component: Firefox Sync: Backend → Sync
Product: Cloud Services → Firefox
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: