Closed Bug 977502 Opened 10 years ago Closed 10 years ago

Sync will never retry if initial authentication fails.

Categories

(Firefox :: Sync, defect, P1)

defect

Tracking

()

VERIFIED FIXED
Firefox 30
Tracking Status
firefox29 + verified
firefox30 --- verified

People

(Reporter: markh, Assigned: markh)

References

Details

Attachments

(3 files, 4 obsolete files)

If FxA fails to authenticate, things don't work quite as expected.  Specifically, Sync goes into a CLIENT_NOT_CONFIGURED state which prevents Sync from ever automatically retrying.  

Note too that even if sync *did* retry, there is no mechanism by which browserid_identity can be asked to restart the auth dance again.

This is needed to fix bug 969528 - currently *every* problem causes Sync Prefs to enter a "must re-login" state, and every other part of the UI to enter a "please setup sync" state.
Currently most errors authenticating will cause Status.service to be CLIENT_NOT_CONFIGURED - If failure occurs before we have a token, we return LOGIN_FAILED_NO_USERNAME for currentAuthState, which causes CLIENT_NOT_CONFIGURED.

This patch arranges for currentAuthState to return either LOGIN_FAILED_LOGIN_REJECTED (if a "real" 401 auth error) or LOGIN_FAILED_NETWORK_ERROR for other problems.  The former code is treated as transient by Sync (ie, it should retry as normal) while the latter is considered fatal so the retry semantics differ appropriately.

While Sync will now retry, it has no way of forcing bid_identity to restart the login process, so this patch adds a new identity.login() method, which asks the identity to start a new login process - the legacy identity does nothing.  Sync calls this method in Service.login(), which is only called if we aren't already logged in.

A fly in this ointment is that even after .login() is called, we aren't yet .readyToAuthenticate - so that login will actually fail with Status.login = LOGIN_FAILED_NOT_READY;.  So there's a new observer in policies for 'weave:service:login:complete', which tells Sync to sync now if a sync is due.  browserid_identity sends this notification once things are done, so everything (should ;) work as expected.
Attachment #8382870 - Flags: feedback?(rnewman)
Attachment #8382870 - Flags: feedback?(ckarlof)
This patch adds somewhat "deep" testing for auth failures.  It arranges for fake HTTP responses to be delivered to both the token server and the auth server for hawk.  If arranges for a real 401 response and also a 200 non-json response - the former causes sync to treat the error as a "real" auth error that requires you to sign back in, while the latter is treated as "transient" (just like 500 and all other non-401 responses are).  While the fake responses make the mock setup a little more painful, it seemed the right thing to do rather so the full response-processing part of both those modules is exercised.
Attachment #8382873 - Flags: feedback?(warner-bugzilla)
Attachment #8382873 - Flags: feedback?(ckarlof)
(In reply to Mark Hammond [:markh] from comment #1)
> This patch arranges for currentAuthState to return either
> LOGIN_FAILED_LOGIN_REJECTED (if a "real" 401 auth error) or
> LOGIN_FAILED_NETWORK_ERROR for other problems.  The former code is treated
> as transient by Sync (ie, it should retry as normal) while the latter is
> considered fatal so the retry semantics differ appropriately.

Doh - got them back-to-front - LOGIN_FAILED_LOGIN_REJECTED is fatal, LOGIN_FAILED_NETWORK_ERROR is transient.
Now with moar manual testing :)

This patch also removes the special handling for LOGIN_FAILED_NOT_READY in policies.js - now we have an observer that tells us when a login has completed we recover via this.

Note that this does *not* cause a pending sync to happen immediately - eg, the flow is now something like:

* User says "sync now"
* Sync notices the identity isn't ready, so asks it to log in.
* Sync still fails due to IDENTITY_NOT_READY, and backs off.
* Finally we login, which policy.js sees via an observer.
* This causes checkSyncStatus() to be called, which arranges for sync to happen on its normal schedule.

Ideally, in the last point, Sync would start immediately.  The thorn in that ointment is that just because policy.js sees the login:complete notification doesn't mean it should sync now - it does in the scenario above, but not in the "normal" case.  Given this patch still allows sync to continue (where it didn't before) I think we can just fix the "but I said sync now" issue in a followup.
Attachment #8382870 - Attachment is obsolete: true
Attachment #8382870 - Flags: feedback?(rnewman)
Attachment #8382870 - Flags: feedback?(ckarlof)
Attachment #8383458 - Flags: feedback?(rnewman)
Attachment #8383458 - Flags: feedback?(ckarlof)
Comment on attachment 8382870 [details] [diff] [review]
0001-Bug-977502-better-management-of-login-failure-states.patch

Review of attachment 8382870 [details] [diff] [review]:
-----------------------------------------------------------------

This pains me a lot. I'm still digesting to see if there's a less painful solution than "fail all the time, add more state to record failure".

::: services/sync/modules/browserid_identity.js
@@ +118,5 @@
> +    if (!this.readyToAuthenticate) {
> +      return; // we are already logging in, so don't start a new one.
> +    }
> +    // Note we do not return a promise - this just kicks things off.
> +    this.initializeWithCurrentIdentity();

That also means that any errors in this promise chain won't be caught. That's not a new problem (it was previously called in an observer) but it's worth considering.

::: services/sync/modules/identity.js
@@ +96,5 @@
>    /**
> +   * Begin an explicit, async login process.
> +   */
> +  login: function() {
> +    // This identity manager has no explcit login process, so nothing to do.

explicit

::: services/sync/modules/service.js
@@ +943,5 @@
>        if (passphrase) {
>          this.identity.syncKey = passphrase;
>        }
>  
> +      this.identity.login();

I don't think this is the right term for this. What you're talking about is "kick off an asynchronous initialization process for the current credentials we have".

That this is async seems fundamentally at odds with the very next line -- _checkSetup.
Attachment #8382870 - Attachment is obsolete: false
Attached patch Now with moar blocking! (obsolete) — Splinter Review
(In reply to Richard Newman [:rnewman] from comment #5)
> Comment on attachment 8382870 [details] [diff] [review]
> 0001-Bug-977502-better-management-of-login-failure-states.patch
> 
> This pains me a lot. I'm still digesting to see if there's a less painful
> solution than "fail all the time, add more state to record failure".

Indeed.  Stepping back a little, I realized a better solution is to just have Service.login() block waiting for the login.  Service.sync() is the only thing that calls .login(), so this actually seems sane.  .verifyLogin is called more often and in ways that blocking would be bad - but blocking during a .sync() call sounds reasonable given what it does, and given it is protected via the locks etc, things should work out fine.

Assuming I'm not missing something there, this makes things *alot* saner.  For one, it means the sync will never fail due to Status.login being LOGIN_FAILED_NOT_READY (so this patch still removes that code).  It also means that sync() will automagically do the right thing WRT retries etc, which simplifies things alot and saves us from the "fail all the time, add more state to record failure" state you correctly identity as a problem.  We also don't need new observers etc.

> ::: services/sync/modules/browserid_identity.js
> @@ +118,5 @@
> > +    if (!this.readyToAuthenticate) {
> > +      return; // we are already logging in, so don't start a new one.
> > +    }
> > +    // Note we do not return a promise - this just kicks things off.
> > +    this.initializeWithCurrentIdentity();
> 
> That also means that any errors in this promise chain won't be caught.
> That's not a new problem (it was previously called in an observer) but it's
> worth considering.

This new patch has the same "problem" - however, note that initializeWithCurrentIdentity() itself has trailing rejection handlers that log the failure, so it's really not necessary to have extra ones.

> I don't think this is the right term for this. What you're talking about is
> "kick off an asynchronous initialization process for the current credentials
> we have".
> 
> That this is async seems fundamentally at odds with the very next line --
> _checkSetup.

Indeed - so now it's .ensureLogin() that returns a promise which is resolved when a login process is complete, starting a new one if there isn't already one in progress and we are in an error state.  service.js does the spinningly-thang on the promise. Seems much simpler - I hope I'm not missing something.
Attachment #8382870 - Attachment is obsolete: true
Attachment #8383458 - Attachment is obsolete: true
Attachment #8383458 - Flags: feedback?(rnewman)
Attachment #8383458 - Flags: feedback?(ckarlof)
Attachment #8384409 - Flags: feedback?(rnewman)
Attachment #8384409 - Flags: feedback?(ckarlof)
Comment on attachment 8384409 [details] [diff] [review]
Now with moar blocking!

Review of attachment 8384409 [details] [diff] [review]:
-----------------------------------------------------------------

This looks sane. I would love a test.

::: services/sync/modules/browserid_identity.js
@@ +23,5 @@
>  Cu.import("resource://gre/modules/FxAccounts.jsm");
>  
>  // Lazy imports to prevent unnecessary load on startup.
> +XPCOMUtils.defineLazyModuleGetter(this, "Weave",
> +                                  "resource://services-sync/main.js");

Recursive imports, ho! :P

@@ +130,5 @@
> +    if (!this._shouldHaveSyncKeyBundle) {
> +      // We are already in the process of logging in.
> +      return this.whenReadyToAuthenticate.promise;
> +    }
> +    // If we are already happy then there is nothing more to do.

Nit: newlines before comments. (Throughout.)

@@ +134,5 @@
> +    // If we are already happy then there is nothing more to do.
> +    if (Weave.Status.login == STATUS_OK) {
> +      return Promise.resolve();
> +    }
> +    // Similarly, if we have a previous failure that implies an explicit login

Careful with terminology here. Sync's "login" ain't the same as "login" as you intend it -- you mean "re-entering credentials", right?

@@ +150,1 @@
>    initializeWithCurrentIdentity: function(isInitialSync=false) {

Worth leaving a comment for this function that it out-of-band 'returns' this.whenReadyToAuthenticate.

::: services/sync/modules/policies.js
@@ -165,5 @@
> -          this._loginNotReadyCounter++;
> -          this._log.debug("Couldn't log in: identity not ready.");
> -          this._log.trace("Scheduling a sync at IDENTITY_NOT_READY_RETRY_INTERVAL * " +
> -                          this._loginNotReadyCounter);
> -          this.scheduleAtInterval(IDENTITY_NOT_READY_RETRY_INTERVAL *

Do we still need IDENTITY_NOT_READY_RETRY_INTERVAL?

::: services/sync/modules/service.js
@@ +951,5 @@
> +      // Ask the identity manager to explicitly login now.
> +      let cb = Async.makeSpinningCallback();
> +      this.identity.ensureLoggedIn().then(
> +        () => cb(),
> +        err => cb(err, null)

.then(cb, cb)?

@@ +953,5 @@
> +      this.identity.ensureLoggedIn().then(
> +        () => cb(),
> +        err => cb(err, null)
> +      );
> +      // just let any errors bubble up - they've more context than we do!

s/just/Just, newline before.
Attachment #8384409 - Flags: feedback?(rnewman) → feedback+
Comment on attachment 8384409 [details] [diff] [review]
Now with moar blocking!

Review of attachment 8384409 [details] [diff] [review]:
-----------------------------------------------------------------

Overall, I like the direction here, but I need to think about this a bit more.

::: services/sync/modules/browserid_identity.js
@@ +142,5 @@
> +      return Promise.reject();
> +    }
> +    // So - we've a previous auth problem and aren't currently attempting to
> +    // login - so fire that off.
> +    this.initializeWithCurrentIdentity();

I'm confused why complete re-initialization should happen here.

@@ +526,5 @@
> +        // Drop the sync key bundle, but still expect to have one.
> +        // This will arrange for us to be in the right 'currentAuthState'
> +        // such that UI will show the right error.
> +        this._shouldHaveSyncKeyBundle = true;
> +        this._syncKeyBundle = null;

Do you think we should change this to this.resetSyncKey()? We would then have to set this._shouldHaveSyncKeyBundle = true afterwards, because that function sets it to false.

@@ +528,5 @@
> +        // such that UI will show the right error.
> +        this._shouldHaveSyncKeyBundle = true;
> +        this._syncKeyBundle = null;
> +        Weave.Status.login = this._authFailureReason;
> +        Services.obs.notifyObservers(null, "weave:service:login:error", null);

Tracing through the code, I feel like we might need a Weave.Service.logout() here or in the handler for weave:service:login:error (need to be careful because logout() will fire "weave:service:logout:finish", which is handled in browserid_identity.js. However, we are likely abusing that notification. Please see my comment in Bug 967015.) I think we'll need a Service.logout() because that is what causes Service._shouldLogin() to return false, which will then cause Service.login() to fail and prevent the sync from happening.

::: services/sync/modules/policies.js
@@ -166,5 @@
> -          this._log.debug("Couldn't log in: identity not ready.");
> -          this._log.trace("Scheduling a sync at IDENTITY_NOT_READY_RETRY_INTERVAL * " +
> -                          this._loginNotReadyCounter);
> -          this.scheduleAtInterval(IDENTITY_NOT_READY_RETRY_INTERVAL *
> -                                  this._loginNotReadyCounter);

Was this stuff added by us?
Attachment #8384409 - Flags: feedback+ → feedback?(rnewman)
Comment on attachment 8384409 [details] [diff] [review]
Now with moar blocking!

Review of attachment 8384409 [details] [diff] [review]:
-----------------------------------------------------------------

sorry for unplussing your feedback, rnewman
Attachment #8384409 - Flags: feedback?(rnewman) → feedback+
Comment on attachment 8382873 [details] [diff] [review]
0002-Bug-977502-add-tests-with-mock-http-responses-simula.patch

Review of attachment 8382873 [details] [diff] [review]:
-----------------------------------------------------------------

nice improvements
Attachment #8382873 - Flags: feedback?(ckarlof) → feedback+
This addresses the feedback and discussion so far.  There is a test for test_ensureLoggedIn() - it would be better if there was one covering a full sync, but that has proven difficult in the short term.
Attachment #8384409 - Attachment is obsolete: true
Attachment #8384409 - Flags: feedback?(ckarlof)
Attachment #8386031 - Flags: review?(rnewman)
Attachment #8386031 - Flags: review?(ckarlof)
The hawk and token-server tests.
Attachment #8382873 - Attachment is obsolete: true
Attachment #8382873 - Flags: feedback?(warner-bugzilla)
Attachment #8386032 - Flags: review?(ckarlof)
Whiteboard: [qa+]
Comment on attachment 8386031 [details] [diff] [review]
0001-Bug-977502-part-1-better-management-of-login-failure.patch

Review of attachment 8386031 [details] [diff] [review]:
-----------------------------------------------------------------

I'm liking it. Since the try build was a bust, I still need to do some manual testing though.

::: services/sync/modules/browserid_identity.js
@@ +534,5 @@
> +        // Drop the sync key bundle, but still expect to have one.
> +        // This will arrange for us to be in the right 'currentAuthState'
> +        // such that UI will show the right error.
> +        this._shouldHaveSyncKeyBundle = true;
> +        this._syncKeyBundle = null;

I'm still not sure if we should necessarily clear the syncKeyBundle for all errors, or just auth errors. I'm not sure if we need to do Service.logout() here for auth errors. Also for LOGIN_FAILED_LOGIN_REJECTED, it would be nice to arrange for sync to be de-scheduled somehow, because there will be no auto-recovery here.

::: services/sync/modules/policies.js
@@ -166,5 @@
> -          this._log.debug("Couldn't log in: identity not ready.");
> -          this._log.trace("Scheduling a sync at IDENTITY_NOT_READY_RETRY_INTERVAL * " +
> -                          this._loginNotReadyCounter);
> -          this.scheduleAtInterval(IDENTITY_NOT_READY_RETRY_INTERVAL *
> -                                  this._loginNotReadyCounter);

I think these are part of legacy sync. Why are we removing them?
Attachment #8386031 - Flags: review?(ckarlof) → review+
Comment on attachment 8386032 [details] [diff] [review]
0002-Bug-977502-part-2-add-tests-with-mock-http-responses.patch

Review of attachment 8386032 [details] [diff] [review]:
-----------------------------------------------------------------

Nice improvements.
Attachment #8386032 - Flags: review?(ckarlof) → review+
Comment on attachment 8386031 [details] [diff] [review]
0001-Bug-977502-part-1-better-management-of-login-failure.patch

Review of attachment 8386031 [details] [diff] [review]:
-----------------------------------------------------------------

::: services/sync/modules/browserid_identity.js
@@ +144,5 @@
> +      return Promise.reject();
> +    }
> +
> +    // So - we've a previous auth problem and aren't currently attempting to
> +    // login - so fire that off.

Nit: "log in"

::: services/sync/tests/unit/test_browserid_identity.js
@@ +235,5 @@
> +  // arrange for no logged in user.
> +  let fxa = browseridManager._fxaService
> +  let signedInUser = fxa.internal.currentAccountState.signedInUser;
> +  fxa.internal.currentAccountState.signedInUser = null;
> +  Status.login = LOGIN_FAILED_NO_USERNAME;

You should ensure that shouldHaveSyncKeyBundle is false here, otherwise you're not testing much…
Attachment #8386031 - Flags: review?(rnewman) → review+
Part 2 is not going to apply on aurora, but this one will.  I'll request approval once they get to m-c
Attachment #8387382 - Flags: review+
Attachment #8387382 - Attachment description: Rebased against aurora → Part 2 - Rebased against aurora
https://hg.mozilla.org/mozilla-central/rev/6e4f7fc7e1f3
https://hg.mozilla.org/mozilla-central/rev/00825a70f56f
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → Firefox 30
Comment on attachment 8386031 [details] [diff] [review]
0001-Bug-977502-part-1-better-management-of-login-failure.patch

Review of attachment 8386031 [details] [diff] [review]:
-----------------------------------------------------------------

::: services/sync/modules/browserid_identity.js
@@ -479,5 @@
> -          // such that UI will show the right error.
> -          this._shouldHaveSyncKeyBundle = true;
> -          this._syncKeyBundle = null;
> -          Weave.Status.login = this.currentAuthState;
> -          Services.obs.notifyObservers(null, "weave:service:login:error", null);

I think the removal of this line might be a regression. There are at least two significant listeners of this:

1) Sync UI, which use it as a signal to update the "logged in" state
2) The SyncScheduler, which when Status.login enters an "unrecoverable state", like LOGIN_FAILED_LOGIN_REJECTED, then it stops scheduling syncs, which makes sense.
> 2) The SyncScheduler, which when Status.login enters an "unrecoverable state", like LOGIN_FAILED_LOGIN_REJECTED, then it stops scheduling syncs, which makes sense.

http://hg.mozilla.org/mozilla-central/diff/35a1943f351c/services/sync/modules/policies.js#l1.128
So, maybe not ready for Resolved/Fixed status?
Comment on attachment 8386031 [details] [diff] [review]
0001-Bug-977502-part-1-better-management-of-login-failure.patch

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Fxa Sync
User impact if declined: Incorrect error state handling.
Testing completed (on m-c, etc.): Landed on m-c
Risk to taking this patch (and alternatives if risky): Low
String or IDL/UUID changes made by this patch: None

Note to sheriffs or whoever lands these: These patches may conflict unless they are landed in the order of: Bug 977502, Bug 967015 and Bug 969528
Attachment #8386031 - Flags: approval-mozilla-aurora?
Comment on attachment 8387382 [details] [diff] [review]
Part 2 - Rebased against aurora

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Fxa Sync
User impact if declined: Incorrect error state handling.
Testing completed (on m-c, etc.): Landed on m-c
Risk to taking this patch (and alternatives if risky): Low
String or IDL/UUID changes made by this patch: None

Note to sheriffs or whoever lands these: These patches may conflict unless they are landed in the order of: Bug 977502, Bug 967015 and Bug 969528
Attachment #8387382 - Flags: approval-mozilla-aurora?
Attachment #8387382 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 8386031 [details] [diff] [review]
0001-Bug-977502-part-1-better-management-of-login-failure.patch

Sorry, I missed this patch.
Attachment #8386031 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Blocks: 968439
Keywords: verifyme
Status: RESOLVED → VERIFIED
QA Whiteboard: [qa-]
Keywords: verifyme
Whiteboard: [qa+]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: