Closed Bug 1042109 Opened 10 years ago Closed 10 years ago

Sync fails, password reset doesn't help

Categories

(Firefox :: Sync, defect)

defect
Not set
normal

Tracking

()

VERIFIED DUPLICATE of bug 1056523

People

(Reporter: dustin, Unassigned)

Details

(Whiteboard: [qa+])

Attachments

(2 files)

Yesterday, I started getting gray bars.  Figuring they're usually transient, I ignore them until today, when I clicked "Preferences".  The prefs UI told me I was logged out (well, that's how I read "Please sign in to reconnect .."), so I clicked "Sign In".  The password I have in my password file for the account didn't work, which is odd because it's not like I change this password frequently.  So I clicked "Forgot Password", got a reset email, and reset it to the password in my password file.

An hour or so later, I'm still getting gray bars.

about:sync-log for the latest sync:

1406043502611	Sync.ErrorHandler	DEBUG	Flushing file log.
1406043502616	Sync.BrowserIDManager	ERROR	Failed to fetch a token for authentication: AuthenticationError([object Object])
1406043502618	Sync.BrowserIDManager	INFO	currentAuthState returning error.login.reason.account due to previous failure
1406043502619	Sync.Status	DEBUG	Status.login: error.login.reason.account => error.login.reason.account
1406043502619	Sync.Status	DEBUG	Status.service: error.login.failed => error.login.failed
1406043502624	Sync.BrowserIDManager	INFO	currentAuthState returning error.login.reason.account due to previous failure
1406043502624	Sync.Status	DEBUG	Status.login: error.login.reason.account => error.login.reason.account
1406043502624	Sync.Status	DEBUG	Status.service: error.login.failed => error.login.failed
1406043502633	Sync.BrowserIDManager	INFO	currentAuthState returning error.login.reason.account due to previous failure
1406043502633	Sync.Status	DEBUG	Status.login: error.login.reason.account => error.login.reason.account
1406043502633	Sync.Status	DEBUG	Status.service: error.login.failed => error.login.failed
1406043502643	Sync.BrowserIDManager	INFO	currentAuthState returning error.login.reason.account due to previous failure
1406043502643	Sync.Status	DEBUG	Status.login: error.login.reason.account => error.login.reason.account
1406043502643	Sync.Status	DEBUG	Status.service: error.login.failed => error.login.failed
1406043502685	Sync.BrowserIDManager	INFO	currentAuthState returning error.login.reason.account due to previous failure
1406043502685	Sync.Status	DEBUG	Status.login: error.login.reason.account => error.login.reason.account
1406043502685	Sync.Status	DEBUG	Status.service: error.login.failed => error.login.failed
1406043502715	Sync.BrowserIDManager	INFO	currentAuthState returning error.login.reason.account due to previous failure
1406043502715	Sync.Status	DEBUG	Status.login: error.login.reason.account => error.login.reason.account
1406043502715	Sync.Status	DEBUG	Status.service: error.login.failed => error.login.failed
1406043502744	Sync.BrowserIDManager	INFO	currentAuthState returning error.login.reason.account due to previous failure
1406043502744	Sync.Status	DEBUG	Status.login: error.login.reason.account => error.login.reason.account
1406043502744	Sync.Status	DEBUG	Status.service: error.login.failed => error.login.failed
1406043502773	Sync.BrowserIDManager	INFO	currentAuthState returning error.login.reason.account due to previous failure
1406043502773	Sync.Status	DEBUG	Status.login: error.login.reason.account => error.login.reason.account
1406043502773	Sync.Status	DEBUG	Status.service: error.login.failed => error.login.failed
1406043502774	Sync.BrowserIDManager	INFO	currentAuthState returning error.login.reason.account due to previous failure
1406043502774	Sync.Status	DEBUG	Status.login: error.login.reason.account => error.login.reason.account
1406043502774	Sync.Status	DEBUG	Status.service: error.login.failed => error.login.failed
1406043502775	Sync.ErrorHandler	DEBUG	Log cleanup threshold time: 1405179502775
1406043502775	Sync.ErrorHandler	DEBUG	No logs to clean up.
1406043503071	Sync.Resource	DEBUG	mesg: GET fail 401 https://sync-61-us-west-2.sync.services.mozilla.com/1.5/3520378/info/collections
1406043503071	Sync.Resource	DEBUG	GET fail 401 https://sync-61-us-west-2.sync.services.mozilla.com/1.5/3520378/info/collections
1406043503071	Sync.Service	WARN	401: login failed.
1406043503071	Sync.Service	DEBUG	Cluster value = null
1406043503072	Sync.Status	DEBUG	Status.login: error.login.reason.account => error.login.reason.account
1406043503072	Sync.Status	DEBUG	Status.service: error.login.failed => error.login.failed
1406043503072	Sync.SyncScheduler	DEBUG	Clearing sync triggers and the global score.

Clicking preferences, I see that I'm logged out again.  Logging in, the password works this time, and Firefox says syncing will begin again.

So, I have no idea
 1. What happened to my account to start all of this - do passwords expire?  account compromise?
 2. Why resetting the password didn't work to restart sync
 3. Whether sync will work correctly now
At worst, there's a bug somewhere here.  At best, this is a UX problem, as I'm pretty baffled.

I'm happy to provide any additional data I can.
I believe this is a known issue, but adding some extra eyes to look over the specific errors. I have also been getting yellow and gray bars with sync 1.5 accounts.
Whiteboard: [qa+]
The current policy is that when you change or reset your password, all your attached devices get disconnected and you have to re-enter your new password on all your devices (including the one you reset your password on). This is even true on the device after you choose your new password. You unfortunately have to re-enter it for Sync on the same device. We hope to improve that in the future.
That makes sense (but probably deserves some UX work).

So the known issue was that existing passwords were invalidated?
I believe the "known issue" that James is referring to is the UX paper cuts resulting from the policies about password reset and change that I described.
OK, so the password loss is a bit of a mystery, then.  I hadn't changed anything with my sync account for weeks before this occurred.
I just got around to re-authenticating my desktop (comment 0 was on my laptop), and syncing now immediately fails with "Wrong Recovery Key" and

1406325134935	Sync.ErrorHandler	DEBUG	Flushing file log.
1406325134936	Sync.Service	DEBUG	Exception: Aborting sync, remote setup failed JS Stack trace: sync@enginesync.js:85:7 < onNotify@service.js:1255:7 < WrappedNotify@util.js:143:1 < WrappedLock@util.js:98:9 < _lockedSync@service.js:1249:1 < sync/<@service.js:1240:7 < WrappedCatch@util.js:72:9 < sync@service.js:1228:5
1406325134937	Sync.Status	DEBUG	Status.service: success.status_ok => success.status_ok
1406325134943	Sync.Status	DEBUG	Status.service: success.status_ok => success.status_ok
1406325134946	Sync.Status	DEBUG	Status.service: success.status_ok => success.status_ok
1406325134949	Sync.Status	DEBUG	Status.service: success.status_ok => success.status_ok
1406325134952	Sync.Status	DEBUG	Status.service: success.status_ok => success.status_ok
1406325134956	Sync.Status	DEBUG	Status.service: success.status_ok => success.status_ok
1406325134959	Sync.Status	DEBUG	Status.service: success.status_ok => success.status_ok
1406325134962	Sync.Status	DEBUG	Status.service: success.status_ok => success.status_ok
1406325134963	Sync.Status	DEBUG	Status.service: success.status_ok => success.status_ok
1406325134963	Sync.ErrorHandler	DEBUG	Log cleanup threshold time: 1405461134963
1406325134972	Sync.ErrorHandler	DEBUG	No logs to clean up.
1406325139985	Sync.Status	DEBUG	Status.service: success.status_ok => success.status_ok
1406325156413	Sync.Tracker.History	DEBUG	Saving changed IDs to history
1406325169621	Sync.Status	DEBUG	Status.service: success.status_ok => success.status_ok
1406325169623	Sync.ErrorHandler	DEBUG	Beginning user-triggered sync.
1406325169624	Sync.Service	DEBUG	User-Agent: Firefox/30.0 FxSync/1.32.0.20140605174243.
1406325169624	Sync.Service	INFO	Starting sync at 2014-07-25 17:52:49
1406325169624	Sync.Service	DEBUG	In sync: should login.
1406325169625	Sync.Status	DEBUG	Status.service: success.status_ok => success.status_ok
1406325169625	Sync.Status	DEBUG	Status.service: success.status_ok => success.status_ok
1406325169625	Sync.Service	INFO	Logging in user djmitche@gmail.com
1406325169626	Sync.Service	DEBUG	Caching URLs under storage user base: https://sync-61-us-west-2.sync.services.mozilla.com/1.5/4219102/
1406325169626	Sync.BrowserIDManager	INFO	Fetching assertion and token from: https://token.services.mozilla.com/1.0/sync/1.5
1406325169627	FirefoxAccounts	DEBUG	already verified
1406325169628	Sync.BrowserIDManager	DEBUG	Getting an assertion
1406325169628	FirefoxAccounts	DEBUG	enter getAssertion()
1406325169628	FirefoxAccounts	DEBUG	getKeyPair: already have a keyPair
1406325169629	FirefoxAccounts	DEBUG	 getCertificate already had one
1406325169629	FirefoxAccounts	DEBUG	getAssertionFromCert
1406325169630	FirefoxAccounts	DEBUG	getAssertionFromCert returning signed: true
1406325169631	Sync.BrowserIDManager	DEBUG	Getting a token
1406325169763	Sync.BrowserIDManager	DEBUG	Successfully got a sync token
1406325169884	Sync.Resource	DEBUG	mesg: GET success 200 https://sync-61-us-west-2.sync.services.mozilla.com/1.5/4219102/info/collections
1406325169884	Sync.Resource	DEBUG	GET success 200 https://sync-61-us-west-2.sync.services.mozilla.com/1.5/4219102/info/collections
1406325169884	Sync.Service	DEBUG	Fetching global metadata record
1406325169885	Sync.BrowserIDManager	DEBUG	_ensureValidToken already has one
1406325170013	Sync.Resource	DEBUG	mesg: GET success 200 https://sync-61-us-west-2.sync.services.mozilla.com/1.5/4219102/storage/meta/global
1406325170013	Sync.Resource	DEBUG	GET success 200 https://sync-61-us-west-2.sync.services.mozilla.com/1.5/4219102/storage/meta/global
1406325170014	Sync.Service	DEBUG	Weave Version: 1.32.0 Local Storage: 5 Remote Storage: 5
1406325170014	Sync.Service	INFO	Sync key is up-to-date: no need to upgrade.
1406325170014	Sync.Service	DEBUG	Fetching and verifying -- or generating -- symmetric keys.
1406325170015	Sync.BrowserIDManager	DEBUG	_ensureValidToken already has one
1406325170138	Sync.Resource	DEBUG	mesg: GET success 200 https://sync-61-us-west-2.sync.services.mozilla.com/1.5/4219102/info/collections
1406325170138	Sync.Resource	DEBUG	GET success 200 https://sync-61-us-west-2.sync.services.mozilla.com/1.5/4219102/info/collections
1406325170138	Sync.Service	INFO	Testing info/collections: {"passwords":1406317284.25,"tabs":1406317284.83,"clients":1406317283,"crypto":1406044161.07,"forms":1406317283.17,"meta":1406044210.76,"prefs":1406317284.49,"bookmarks":1406044163.79,"addons":1406044210.62,"history":1406317283.72}
1406325170138	Sync.CollectionKeyManager	INFO	Testing for updateNeeded. Last modified: 0
1406325170139	Sync.Service	INFO	collection keys reports that a key update is needed.
1406325170139	Sync.BrowserIDManager	DEBUG	_ensureValidToken already has one
1406325170265	Sync.Resource	DEBUG	mesg: GET success 200 https://sync-61-us-west-2.sync.services.mozilla.com/1.5/4219102/storage/crypto/keys
1406325170265	Sync.Resource	DEBUG	GET success 200 https://sync-61-us-west-2.sync.services.mozilla.com/1.5/4219102/storage/crypto/keys
1406325170265	Sync.CollectionKeyManager	INFO	Updating collection keys...
1406325170266	Sync.CollectionKeyManager	WARN	Got exception "Record SHA256 HMAC mismatch: should be 3e235456236ed28a3b958991a4e85da172319fede435b70bef59afe8cf8a1731, is 4ba9d4d45ec55e84f1dbca5f3445a6d6a21e210a5ef63d03bf35d0250537b26a" decrypting storage keys with sync key.
1406325170266	Sync.CollectionKeyManager	INFO	Aborting updateContents. Rethrowing.
1406325170266	Sync.Service	WARN	Got exception "Record SHA256 HMAC mismatch: should be 3e235456236ed28a3b958991a4e85da172319fede435b70bef59afe8cf8a1731, is 4ba9d4d45ec55e84f1dbca5f3445a6d6a21e210a5ef63d03bf35d0250537b26a" fetching cryptoKeys.
1406325170266	Sync.Status	DEBUG	Status.login: error.login.reason.recoverykey => error.login.reason.recoverykey
1406325170266	Sync.Status	DEBUG	Status.service: success.status_ok => error.login.failed
1406325170266	Sync.Status	DEBUG	Status.sync: error.sync.reason.credentials_changed => error.sync.reason.credentials_changed
1406325170266	Sync.Status	DEBUG	Status.service: error.login.failed => error.sync.failed
1406325170266	Sync.Service	WARN	Failed to fetch symmetric keys. Failing remote setup.
1406325170266	Sync.Service	WARN	Remote setup failed.
1406325170266	Sync.SyncScheduler	DEBUG	Clearing sync triggers and the global score.

I've retried this a few times with the same result.  So, it looks like just re-setting the password wasn't enough..
Ouch, "Record SHA256 HMAC mismatch", this should not be happening with the new system.  Do you have any additional logs from earlier sync attempts, on either device?  It would be great it you can attach whatever logs you have available and we can try to piece together what went wrong.

One theory: this has something to do with resetting your password to the same value as you initially used.  This has given us trouble in the past and may have regressed. I can try to reproduce this behaviour and see if any clues show up.

It's also possible that some client-side race caused it to upload a "keys" record encrypted with an old master key, and now you're getting HMAC mismatch errors when using the correct master key.  Going through the reset password flow again may unblock you from this state (but unfortunately won't help us get to the bottom of what has gone wrong).

Also, /cc Danny in case he has additional perspective on the password-stoppped-working-for-no-apparent-reason problem that was the genesis of this bug.
Flags: needinfo?(dustin)
I did reset the password to what I thought was its previous value, although that previous value hadn't worked for logins.

I have a *lot* of logs - hopefully a zip of the logs dir is adequate!
Flags: needinfo?(dustin)
Awesome, thank you.  I will dig in.

From the laptop logs, the earliest auth-related error I can see is:

"""
TokenServerClientServerError({"now":"2014-07-22T00:33:09.338Z","message":"Authentication failed.","cause":"invalid-credentials","response_body":"{\"status\": \"invalid-credentials\", \"errors\": [{\"location\": \"body\", \"name\": \"\", \"description\": \"Unauthorized\"}]}","response_headers":{"content-type":"application/json; charset=UTF-8","date":"Tue, 22 Jul 2014 00:33:09 GMT","x-timestamp":"1405989189","content-length":"110","connection":"keep-alive"},"response_status":401})
"""

The "invalid-credentials" status here is interesting, as it means the submitted assertion was invalid in some way.  It was not an x-client-state or generation-number problem, since those have dedicated status strings.  Maybe it was using an expired cert?  Maybe we sent you a busted cert?  Unclear.

This would have caused the browser to put you in the "logged out" state, where it asks you to re-enter your password.

There is no log recording the part where you reset your password, I guess because it's not hooked up to the main sync engine and its logging machinery.  But we do get subsequent error logs where it's trying to use its now-invalid-due-to-password-change auth token to fetch a new certificate:

"""
1406043502476	FirefoxAccounts	DEBUG	(Response) /certificate/sign: code: 401 - Status text: Unauthorized
1406043502477	FirefoxAccounts	DEBUG	Clock offset vs https://api.accounts.firefox.com/v1: -477
1406043502477	FirefoxAccounts	DEBUG	Received 401 for /certificate/sign: retrying
1406043502606	FirefoxAccounts	DEBUG	(Response) /certificate/sign: code: 401 - Status text: Unauthorized
1406043502607	FirefoxAccounts	DEBUG	Clock offset vs https://api.accounts.firefox.com/v1: -607
1406043502608	FirefoxAccounts	ERROR	error POSTing /certificate/sign: {"code":401,"errno":110,"error":"Unauthorized","message":"Invalid authentication token in request signature","info":"https://github.com/mozilla/fxa-auth-server/blob/master/docs/api.md#response-format"}
1406043502608	FirefoxAccounts	ERROR	HAWK.signCertificate error: {"code":401,"errno":110,"error":"Unauthorized","message":"Invalid authentication token in request signature","info":"https://github.com/mozilla/fxa-auth-server/blob/master/docs/api.md#response-format"}
"""

Followed by a very unhelpful [object Object]:

"""
1406043502616	Sync.BrowserIDManager	ERROR	Failed to fetch a token for authentication: AuthenticationError([object Object])
"""

That's the last error log, so I guess that after this point the new password was successfully entered and sync went back to working ok.  Which is unfortunate because this is probably the part where it uploads the new-and-now-failing key bundle.
The desktop logs show a successful sync from before the trouble started on the laptop.

After the password-changing stuff happens on the laptop, it starts getting "invalid-client-state" errors:

"""
1406044802531	Sync.BrowserIDManager	ERROR	Authentication error in _fetchTokenForUser: AuthenticationError(TokenServerClientServerError({"now":"2014-07-22T16:00:02.530Z","message":"Authentication failed.","cause":"invalid-client-state","response_body":"{\"status\": \"invalid-client-state\", \"errors\": [{\"location\": \"body\", \"name\": \"\", \"description\": \"Unauthorized\"}]}","response_headers":{"content-type":"application/json; charset=UTF-8","date":"Tue, 22 Jul 2014 16:00:02 GMT","x-timestamp":"1406044802","content-length":"111","connection":"keep-alive"},"response_status":401})
"""

This is expected, and is supposed to tell the client that it needs to go refresh its encryption keys.  But it doesn't seem to do so - this first occurrence is followed by many dozens of instances of the same failure.

In error-1406054408914 we eventually see it discover that its auth token is invalid, when it goes to refresh its certificate at the auth server:

"""
1406091140165	FirefoxAccounts	ERROR	error POSTing /certificate/sign: {"code":401,"errno":110,"error":"Unauthorized","message":"Invalid authentication token in request signature","info":"https://github.com/mozilla/fxa-auth-server/blob/master/docs/api.md#response-format"}
1406091140165	FirefoxAccounts	ERROR	HAWK.signCertificate error: {"code":401,"errno":110,"error":"Unauthorized","message":"Invalid authentication token in request signature","info":"https://github.com/mozilla/fxa-auth-server/blob/master/docs/api.md#response-format"}
1406091140166	Sync.BrowserIDManager	ERROR	Authentication error in _fetchTokenForUser: AuthenticationError([object Object]
"""

This is again followed by a great many similar logs, perhaps from subsequent background sync attempts.  I'm a little surprised that it keeps trying to use the obviously-not-working auth token.

This lasts all the way through to error-1406325134935, where you re-enter your password, the client tries to fetch its sync encryption keys, and discovers the HMAC mismatch.  It stays in that state until the end of the available logs.
So, some things for us to dig in to:

* Try to find out why you were getting the "invalid-credentials" errors from the laptop, by looking in the server logs.  This may not be easy to correlate, but we can try.

* File a follow-up bug to log something better than "[object Object]" in the Sync.BrowserIDManager error logger.

* Come up with some hypotheses about why your password stopped working in the first place.

* File a follow-up bug to avoid lots of failing requests to /certificate/sign if we've been told that the auth token is invalid.

* Go looking for race conditions that might result in bad key reteival/upload and this HMAC error.  Ugh.

Chris, who is managing desktop sync client code at the moment?  IIRC :markh is on holidays for quite some time.
Flags: needinfo?(ckarlof)
> * Come up with some hypotheses about why your password stopped working in the first place.

Let's do the obvious one first I guess...Dustin, is it possible that this was a typo, either when entering the initial password at account creation, or when transferring it into your password file for safe-keeping?

Is your laptop still syncing happily despite the errors on the desktop machine?  Will signing-out and signing-in on your desktop unbreak things?  And if you're feeling adventurous: Will signing-out and then signing-in on your laptop put you in the same broken state as the desktop?  

Based on the logs, I also believe that doing another run at the password-reset flow has a good chance of digging your account out of this broken state.  If you do decide to try this, please go into about:config and enable the "services.sync.log.appender.file.logOnSuccess" option.  This might allow us to spot bad-key-generation shenanigans if they happen a second time around.
Flags: needinfo?(dustin)
> * File a follow-up bug to log something better than "[object Object]" in the Sync.BrowserIDManager error logger.

Filed Bug 1044528.
> * File a follow-up bug to avoid lots of failing requests to /certificate/sign if we've been told that the auth token is invalid.

Filed Bug 1044530.
> * Try to find out why you were getting the "invalid-credentials" errors from the laptop, by looking in the server logs.

Filed Bug 1044532 to see what we can find...
> * Go looking for race conditions that might result in bad key reteival/upload and this HMAC error.

And I took a wild stab at this in Bug 1044536, but will otherwise leave that to the experts...
Wow, much conclusions!

(In reply to Ryan Kelly [:rfkelly] from comment #14)
> Let's do the obvious one first I guess...Dustin, is it possible that this
> was a typo, either when entering the initial password at account creation,
> or when transferring it into your password file for safe-keeping?

It's certainly possible, and unfortunately that's as precise as I can be.  I did have the wrong account name (dustin@mozilla.com) in the password file (both browsers had cached djmitche@gmail.com), so *something* was wrong.

> Is your laptop still syncing happily despite the errors on the desktop
> machine?

Yes.

>           Will signing-out and signing-in on your desktop unbreak things? 

Yes, cool!

> And if you're feeling adventurous: Will signing-out and then signing-in on
> your laptop put you in the same broken state as the desktop?  

No, that sync'd successfully after sign-out/sign-in.

> Based on the logs, I also believe that doing another run at the
> password-reset flow has a good chance of digging your account out of this
> broken state.  If you do decide to try this, please go into about:config and
> enable the "services.sync.log.appender.file.logOnSuccess" option.  This
> might allow us to spot bad-key-generation shenanigans if they happen a
> second time around.

I didn't do this, since the account seems to be working properly now.  If it would provide useful data to you, though, I'm happy to try.
Flags: needinfo?(dustin)
OK, it sounds like the desktop machine was the one with the mis-matched key, which is by far the better of the two options.  I'll try to reproduce by following what you did above with two difference devices.

> I didn't do this, since the account seems to be working properly now.
> If it would provide useful data to you, though, I'm happy to try.

Thanks, but given that things are working now, trying this is probably of no further value.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
:rfkelly did you still want info/feedback from :ckarlof?
Nope.  I still plan to try to reproduce this though.
Flags: needinfo?(ckarlof)
FYI I managed to reproduce the HMAC errors using a two-device password reset flow similar to what seems to have happened here.  Filed separately as Bug 1056523.
Resolution: FIXED → DUPLICATE
Good enough.
Status: RESOLVED → VERIFIED
Component: Firefox Sync: Backend → Sync
Product: Cloud Services → Firefox
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: