Closed Bug 1189148 Opened 9 years ago Closed 8 years ago

Refresh FxA certificate when the tokenserver gives a 401

Categories

(Firefox :: Sync, defect, P2)

defect

Tracking

()

RESOLVED FIXED
Tracking Status
firefox49 --- fixed

People

(Reporter: rfkelly, Assigned: tcsc)

References

Details

Attachments

(1 file)

When the tokenserver responds with a "401 Unauthorized", we currently retreat straight to a state of "you must reconnect to sync".  It's possible that the auth error is due to some change in the FxA configuration, e.g. a key rotation as disucced in Bug 1186238.

Instead, we should treat a 401 from the tokenserver as a signal to try refreshing our FxA certificate, and only go all the way to "reconnect" if that operation fails.  So there's these layers of falling back from auth error:

  * 401 from storage node -> reauth with tokenserver
  * 401 from tokenserver -> refresh FxA certificate
  * 401 from FxA -> now you really do have to re-enter your password
Flags: firefox-backlog+
It would be helpful to understand how we react to the different states, too.
Priority: -- → P2
See Also: → 1256459
Next steps are the investigate which of the above steps we're doing and which we're aren't.
Assignee: nobody → tchiovoloni
Status: NEW → ASSIGNED
Comment on attachment 8748863 [details]
MozReview Request: Bug 1189148 - Attempt to refresh the FXA certificate when a 401 is received from the token server. r?markh

https://reviewboard.mozilla.org/r/50559/#review47413

tl;dr - doesn't do anything ;)

::: services/fxaccounts/FxAccounts.jsm:564
(Diff revision 1)
>  
>    /**
>     * returns a promise that fires with the assertion.  If there is no verified
>     * signed-in user, fires with null.
>     */
> -  getAssertion: function getAssertion(audience) {
> +  getAssertion: function getAssertion(audience, generateFreshCertificate=false) {

spaces around operators in default args too :) - here any most other places - however, I think we can probably avoid new default args.

::: services/sync/modules/browserid_identity.js:629
(Diff revision 1)
>          return token;
>        })
>        .catch(err => {
>          // TODO: unify these errors - we need to handle errors thrown by
>          // both tokenserverclient and hawkclient.
> -        // A tokenserver error thrown based on a bad response.
> +        // A tokenserver error thrown based on a bad response. The fist time

typo: "fist" - and I'm not sure what the comment is trying to say, but it looks here is where you meant to retry the token fetch.

I also think it might be clearer to just add a new public method on FxA called, say, invalidateCertificate() which would just set { cert: null } in the account state object - that would avoid all the new default params and extra checks. Then when handling the auth error we'd just call set this.identity._token to null, call fxa.invalidateCert() and do a single retry to promiseClusterURL - which should then re-enter getAssertion(), which should get a new certificate.

::: services/sync/tests/unit/test_browserid_identity.js:494
(Diff revision 1)
> +  browseridManager._tokenServerClient = mockTSC;
> +
> +  yield browseridManager.initializeWithCurrentIdentity();
> +  yield browseridManager.whenReadyToAuthenticate.promise;
> +
> +  do_check_true(haveRefreshedCert);

This test passes without any other changes from this patch O_o. It should be verifying that signCert was called twice, but the test passes and it's only called once (and best I can tell, the mocked tokenserver is only called once too).
Attachment #8748863 - Flags: review?(markh)
(In reply to Ryan Kelly [:rfkelly] from comment #0)
>   * 401 from tokenserver -> refresh FxA certificate

Ryan, if we still have a valid keypair can we reuse it, or should we discard that along with the cert?
Flags: needinfo?(rfkelly)
(In reply to Mark Hammond [:markh] from comment #4)
> Comment on attachment 8748863 [details]

Also, the test should be sure to cover the 2 cases of the refreshed cert succeeding (which should end up invisible to Sync), vs the refresh still failing with a 401 (which should end up looking like the current 401 case to Sync)
> if we still have a valid keypair can we reuse it,

I believe it should be fine to re-use, as there's nothing in the keypair that's tied to server state.
Flags: needinfo?(rfkelly)
Comment on attachment 8748863 [details]
MozReview Request: Bug 1189148 - Attempt to refresh the FXA certificate when a 401 is received from the token server. r?markh

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/50559/diff/1-2/
Attachment #8748863 - Flags: review?(markh)
> I also think it might be clearer to just add a new public method on FxA
> called, say, invalidateCertificate() which would just set { cert: null } in
> the account state object - that would avoid all the new default params and
> extra checks. Then when handling the auth error we'd just call set
> this.identity._token to null, call fxa.invalidateCert() and do a single
> retry to promiseClusterURL - which should then re-enter getAssertion(),
> which should get a new certificate.

I'm not sure exactly what you mean here. Who should be calling promiseClusterURL? That appears to be a local function inside BrowserIDClusterManager#_findCluster.  I've put up a patch that *might* be what you're saying, but I suspect not...


> This test passes without any other changes from this patch O_o. It should be
> verifying that signCert was called twice, but the test passes and it's only
> called once (and best I can tell, the mocked tokenserver is only called once
> too).

Ah, I had been under the impression that the token would be requested before the certificate, so on the first pass it would still exercise the code in question.

> Also, the test should be sure to cover the 2 cases of the refreshed cert succeeding 
> (which should end up invisible to Sync), vs the refresh still failing with a 401 
> (which should end up looking like the current 401 case to Sync)

I believe this is handled by `test_getTokenErrors` in that same file.
Comment on attachment 8748863 [details]
MozReview Request: Bug 1189148 - Attempt to refresh the FXA certificate when a 401 is received from the token server. r?markh

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/50559/diff/2-3/
(In reply to Thom Chiovoloni [:tcsc] from comment #9)
> I'm not sure exactly what you mean here. Who should be calling
> promiseClusterURL? That appears to be a local function inside
> BrowserIDClusterManager#_findCluster.  I've put up a patch that *might* be
> what you're saying, but I suspect not...

Yeah, I was thinking the retry would be in _findCluster() because the first time we fetch the token at initialize time we've just fetched the certificate so it's "guaranteed" to be good. However, that's not a great assumption as getAssertion() may actually return a previously generated assertion - I'm fairly confident it never will in the current world (ie, that this will be the first call to getAssertion()) but that's a bad assumption in general.

So yeah, I agree having that logic in _fetchTokenForUser is the correct place.

Regarding the retry I was thinking a patch like:

index 20865d4..f1b07ab 100644
--- a/services/sync/modules/browserid_identity.js
+++ b/services/sync/modules/browserid_identity.js
@@ -595,6 +595,17 @@ this.BrowserIDManager.prototype = {
       .then(() => maybeFetchKeys())
       .then(() => getAssertion())
       .then(assertion => getToken(tokenServerURI, assertion))
+      .catch(err => {
+        // If we get a 401 fetching the token it may be that our certificate
+        // needs to be regenerated.
+        if (!err.response || err.response.status !== 401) {
+          return Promise.reject(err);
+        }
+        log.warn("Token server returned 401, refreshing certificate and retrying token fetch");
+        return fxa.invalidateCertificate()
+          .then(() => getAssertion())
+          .then(assertion => getToken(tokenServerURI, assertion))
+      })
       .then(token => {
         // TODO: Make it be only 80% of the duration, so refresh the token
         // before it actually expires. This is to avoid sync storage errors

thus avoiding any recursion and the default parameters then required to prevent infinite recursion. To my mind that's a little cleaner and your test passes. What do you think? (And the fact that code has tokenServerURI as both a local *and* a parameter even when the local of the same name is in scope is confusing, so we should consider fixing that (IMO by just nuking it as a param)

> > Also, the test should be sure to cover the 2 cases of the refreshed cert succeeding 
> > (which should end up invisible to Sync), vs the refresh still failing with a 401 
> > (which should end up looking like the current 401 case to Sync)
> 
> I believe this is handled by `test_getTokenErrors` in that same file.

Yep, agreed, thanks!
Ah, yes, that is substantially simpler. And I agree about ditching the tokenServerURI argument.
Comment on attachment 8748863 [details]
MozReview Request: Bug 1189148 - Attempt to refresh the FXA certificate when a 401 is received from the token server. r?markh

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/50559/diff/3-4/
Attachment #8748863 - Flags: review?(markh) → review+
Comment on attachment 8748863 [details]
MozReview Request: Bug 1189148 - Attempt to refresh the FXA certificate when a 401 is received from the token server. r?markh

https://reviewboard.mozilla.org/r/50559/#review47927

Awesome, thanks!
Keywords: checkin-needed
Needs rebasing.
Keywords: checkin-needed
Flags: needinfo?(tchiovoloni)
Comment on attachment 8748863 [details]
MozReview Request: Bug 1189148 - Attempt to refresh the FXA certificate when a 401 is received from the token server. r?markh

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/50559/diff/4-5/
Comment on attachment 8748863 [details]
MozReview Request: Bug 1189148 - Attempt to refresh the FXA certificate when a 401 is received from the token server. r?markh

https://reviewboard.mozilla.org/r/50559/#review50468

This last revision seems to have stuff from unrelated bugs - bug 1268761 and bug 1266640?
Attachment #8748863 - Flags: review+
Comment on attachment 8748863 [details]
MozReview Request: Bug 1189148 - Attempt to refresh the FXA certificate when a 401 is received from the token server. r?markh

https://reviewboard.mozilla.org/r/50559/#review50472

I can't believe reviewboard sucks in this way :(
Attachment #8748863 - Flags: review+
Yeah, it's definitely a bit weird.
Flags: needinfo?(tchiovoloni)
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/fb7cb004b26a
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Component: Firefox Sync: Backend → Sync
Product: Cloud Services → Firefox
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: