Closed Bug 972070 Opened 6 years ago Closed 6 years ago

FxAccounts.jsm doesn't drop all state when a new user logs in

Categories

(Firefox :: Sync, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla30
Tracking Status
firefox29 + fixed
firefox30 + fixed

People

(Reporter: rnewman, Assigned: markh)

References

Details

Attachments

(4 files, 3 obsolete files)

Just disconnected and signed in as a different account. Clicked through the data-mixing warning.

Client proceeded to:

* Sync to the wrong place, getting a 401 and then retrying to the a different place. Server URL not cleared?

* Sync to *a different place than the Android device on the account*. Desktop uid = 17300, Android uid = 17291.
OS: Mac OS X → All
Hardware: x86 → All
I've compared the logged client state value on the phone to one I computed in the Browser Console on desktop, and they're the same.

Hypothesis: is desktop retrying and failing to attach the client state on the retry?
Client state in both cases is 6780aa2bbaed7eee1adfc0d2268db663.
User records on tokenserver:

         uid: 17300
     service: 1
       email: 98b0ca10a626435c9e7afdca16e751a5@api.accounts.firefox.com
        node: https://sync-2-us-east-1.sync.services.mozilla.com
  generation: 1391634310249
client_state: 6780aa2bbaed7eee1adfc0d2268db663
  created_at: 1392247898992
 replaced_at: NULL

         uid: 17291
     service: 1
       email: aa378fcfc2fd43bda32631179463a06c@api.accounts.firefox.com
        node: https://sync-1-us-east-1.sync.services.mozilla.com
  generation: 1392246222455
client_state: 6780aa2bbaed7eee1adfc0d2268db663
  created_at: 1392246381440
 replaced_at: NULL
Hypothesis: desktop is caching a cert somewhere from the previous login, so even though it's using the new password it's not appearing to be the new user.
Disconnecting Sync, restarting the browser, and setting up again with the same credentials sees us hit the wrong server, then 401, then ends up on the same server as Android.

So something about our server URL is being persisted across restarts, but the invalid credential is not.

Here's the log.
I asked Bob to look in the auth DB to see how many splits we have.

I'm defining a split as the existence of multiple auth assignments with different ids but the same X-Client-State -- that is, a client that appears to be a different client, but has the same randomly-generated kB.

We're currently running at a 0.6% error rate, which is pretty bad considering that this should be zero, because it utterly breaks the user experience.
(In reply to Richard Newman [:rnewman] from comment #0)
> Just disconnected and signed in as a different account. Clicked through the
> data-mixing warning.
> 
> Client proceeded to:
> 
> * Sync to the wrong place, getting a 401 and then retrying to the a
> different place. Server URL not cleared?

Yep, I can verify this:

https://pastebin.mozilla.org/4281957

clusterURL isn't being cleared on logout and it's not being set after token server fetch if we already have a clusterURL. Both of these things should probably be addressed.

I'm not sure if clientState isn't being set properly on retry, but it looks like it: http://mxr.mozilla.org/mozilla-central/source/services/sync/modules/browserid_identity.js#419
(In reply to Chris Karlof [:ckarlof] from comment #7)

> Yep, I can verify this:

Thanks for doing the legwork!

> clusterURL isn't being cleared on logout and it's not being set after token
> server fetch if we already have a clusterURL. Both of these things should
> probably be addressed.

We should be setting cluster URL every time we fetch a token. It can differ each time. The client *should* respond correctly when it sees an empty server, but if it doesn't that's a separate bug.
(In reply to Richard Newman [:rnewman] from comment #6)
> I asked Bob to look in the auth DB to see how many splits we have.
> 
> I'm defining a split as the existence of multiple auth assignments with
> different ids but the same X-Client-State -- that is, a client that appears
> to be a different client, but has the same randomly-generated kB.
> 
> We're currently running at a 0.6% error rate, which is pretty bad
> considering that this should be zero, because it utterly breaks the user
> experience.

This function could be improved: http://mxr.mozilla.org/mozilla-central/source/services/sync/modules/browserid_identity.js#411

The X-Client-State header is calculated before the whenVerified promise resolves, which could be significantly later. So maybe a different user could log in after the X-Client-State header is calculated.

On the flip side, if a different user subsequently logs in, then I think all these promises get rejected by FxAccounts.jsm to address such issues. 

warner and I have discussed using a capability object implementation to represent the logged in state passed to Sync, but it didn't make it into Fx29. I think such an approach would make it easier to reason about these things (as opposed to the ambient authority model now, which could potentially change identities in the middle of a series of operations).
(In reply to Richard Newman [:rnewman] from comment #8)
> We should be setting cluster URL every time we fetch a token. It can differ
> each time. 

I agree, it's a bug. Such things happen under the circumstances. :)

> The client *should* respond correctly when it sees an empty
> server, but if it doesn't that's a separate bug.

I'm pretty sure that's not being addressed either, but I also wonder if that is expected to happen under our current model (syncing happily and TS changes your storage URL). 

I also expect we'll find more. 

Thanks for helping to diagnose Richard.
(In reply to Chris Karlof [:ckarlof] from comment #10)

> I'm pretty sure that's not being addressed either, but I also wonder if that
> is expected to happen under our current model (syncing happily and TS
> changes your storage URL).

Coincidentally, we tested this scenario today as part of walking through <https://etherpad.mozilla.org/fxa-sync-backoff-test-plan> for Android.

I expect node reassignments to be less frequent in the Sync 1.5 world, but as far as I know we haven't moved to 100%-reliable distributed storage, so reassignments'll happen eventually as some AMI locks up.

The Sync 1.1 codebase sees node reassignments routinely via 401s during operation (node reassignment is how we do capacity leveling, upgrades, hardware replacements, etc.) so I would expect things to Just Work®.

Bob can correct me if I'm wrong! :D
(In reply to Chris Karlof [:ckarlof] from comment #9)

> warner and I have discussed using a capability object implementation to
> represent the logged in state passed to Sync, but it didn't make it into
> Fx29. I think such an approach would make it easier to reason about these
> things (as opposed to the ambient authority model now, which could
> potentially change identities in the middle of a series of operations).

Did you see the way we do this on Android, with an explicit login state machine?

https://github.com/mozilla-services/android-sync/blob/develop/src/main/java/org/mozilla/gecko/fxa/login/StateFactory.java
On IRC, rnewman said:

10:27 AM <rnewman> markh: there are probably two issues
10:27 AM <rnewman> one is we're starting off pointing at the wrong server, which presumably means we think we already have a token
10:28 AM <rnewman> the other is that the server has two records for us
10:30 AM <rnewman> it's the token server that has the doubled rows
10:30 AM <rnewman> so at some point we requested a token with the new kB and the old assertion, or the right assertion and the old kB
10:30 AM <rnewman> presumably the former

The first issue is, I believe, fairly well understood - our handling of the clusterURL is somewhat broken, meaning that in some cases we might end up using the URL for a different user.  Also, IIUC, this case is handled in the code - attempting to use the wrong URL will give a 401 and cause us to find the correct node.

IIUC, that's bad, but *should not* cause the token server to have duplicate rows for us, and does not explain how we used the (valid) assertion from one user along with the kB from another.  Looking through the code, I can't see how this could be happening - but some diagnosis with rfkelly on IRC has convinced me that it is indeed true.

Most comments above in this bug are about first issue - using the wrong cluster URL - but the bigger and as-yet unexplained problem is the duplicate token server rows with data conflated from 2 distinct users.

Or am I missing something?
(In reply to Mark Hammond [:markh] from comment #13)

> Most comments above in this bug are about first issue - using the wrong
> cluster URL - but the bigger and as-yet unexplained problem is the duplicate
> token server rows with data conflated from 2 distinct users.

I think I've finally tracked this down - look at |this.cert| usage in FxAccounts.jsm - it's not invalidated as a different user signs in, so I saw the same cert being used for 2 different users.  It *is* cleared when .getKeyPair() resolves, which explains why it's not 100% reproducible.  Same issue seems to exist with this.keyPair, but I didn't actually look for issues around that.
Attachment #8376671 - Flags: feedback?(jparsons)
Wow, nice sleuthing, Mark.  This bug seems like one more justification for re-structuring this module to deal with user objects that contain these properties.  The multiple-user cases are very tricky now.

In the present situation, perhaps it would be enough to delete this.cert and this.keyPair in the abortExistingFlow method, which is called on setSignedInUser and signOut?
(In reply to Jed Parsons (use needinfo, please) [:jedp, :jparsons] from comment #15)
> In the present situation, perhaps it would be enough to delete this.cert and
> this.keyPair in the abortExistingFlow method, which is called on
> setSignedInUser and signOut?

Thinking some more, I think we might also need similar checks to what browserid_identity does - check that when the various promises resolve the same user is still logged in.  Otherwise we could potentially end up with a key fetch (or other promise-based activity) starting, then abortExistingFlow() called, then the earlier promise resolving.  This could still leave us with info for the wrong user stored.

Another alternative is as you say - move to something like a user object.
Attached patch t.patch (obsolete) — Splinter Review
(In reply to Mark Hammond [:markh] from comment #16)

> Thinking some more, I think we might also need similar checks to what
> browserid_identity does - check that when the various promises resolve the
> same user is still logged in.  Otherwise we could potentially end up with a
> key fetch (or other promise-based activity) starting, then
> abortExistingFlow() called, then the earlier promise resolving.

Just to expand on this a little, consider the following existing code:

      if (!this.whenKeysReadyPromise) {
        this.whenKeysReadyPromise = Promise.defer();
        this.fetchAndUnwrapKeys(data.keyFetchToken).then(data => {
          if (this.whenKeysReadyPromise) {
            this.whenKeysReadyPromise.resolve(data);
          }
        });
      }

And note that fetchAndUnwrapKeys() ends up making a network request.

In a worst-case scenario, our attempted protection for .whenKeysReadyPromise() could fail.  Imagine if *during* the fetchAndUnwrapKeys() network call, a we signed the current user out and signed another user in.  Then the initial fetchAndUnwrapKeys() network call returns, and we find that .whenKeysReadyPromise() isn't null - but it's *not* the promise for the user we think it is.

IOW, our protection for .whenKeysReadyPromise does handle the user logging out during the call - it just doesn't handle a different user logging back in and re-establishing a new promise during the call.

This patch is still somewhat messy, but attempts to avoid this problem by using a "generationalPromise" - basically a wrapper around a promise that remembers the generation number it started with, and will not resolve the promise if the generation count has changed.  Note that I'm also yet to adjust the whenVerifiedPromise, but I believe it has the same basic issue - checkEmailStatus() is a network request, so a user could log out then log back in before it completes.

I tried to think up a scheme that uses some kind of "user object", but couldn't think of a scheme that would work - apart from isolation of user data, it's important that old outstanding promises are rejected - if they are resolved with the old user's data, we could still end up with problems.  Also note that with this patch, browserid_identity could be a little more relaxed about constantly checking if the same user is still logged in, as no promises will resolve if that's not true.

Am I mad?  Missing something else obvious?  Any other ideas for a cleaner implementation?  Flagging Jed again, but all feedback solicited!)
Attachment #8376671 - Attachment is obsolete: true
Attachment #8376788 - Flags: feedback?(jparsons)
This patch addresses my concerns in comment 17 - but it's far more intrusive.

It breaks all state in fxAccounts into an AccountState object.  A new state object is created whenever a user signs in our out, and the old state is discarded, making it (somewhat) impossible for fxAccounts to accidentally hold on to state that it shouldn't (like .cert etc).  This also obviates the need for the generation counter.  The idea behind the generationalPromise is also encapsulated in the AccountState object - it has resolve() and reject() methods that throw a special error if the state is no longer current.

I think the patch is fairly complete - maybe pollEmailStatus() and the timer management should also move into the accountState, but that's probably not important - it's just a pending cleanup rather than a potential foot-gun.  However, given it's far more intrusive it's also riskier (even though the tests all pass) - hence my casting a wide net for feedback.
Attachment #8376788 - Attachment is obsolete: true
Attachment #8376788 - Flags: feedback?(jparsons)
Attachment #8376967 - Flags: feedback?(ttaubert)
Attachment #8376967 - Flags: feedback?(jparsons)
Attachment #8376967 - Flags: feedback?(ferjmoreno)
(In reply to Mark Hammond [:markh] from comment #17)
> In a worst-case scenario, our attempted protection for
> .whenKeysReadyPromise() could fail.  Imagine if *during* the
> fetchAndUnwrapKeys() network call, a we signed the current user out and
> signed another user in.  Then the initial fetchAndUnwrapKeys() network call
> returns, and we find that .whenKeysReadyPromise() isn't null - but it's
> *not* the promise for the user we think it is.

Indeed, we discovered that in bug 969892 and filed bug 971173 for it.
Comment on attachment 8376967 [details] [diff] [review]
Use a transient AccountState object

Review of attachment 8376967 [details] [diff] [review]:
-----------------------------------------------------------------

I love splitting this file into smaller modules that are easier to test and take care of fewer things, we should however be cautious about not complicating the module dependency graph even further. By passing the FxAInternal instance to the state object we make testing (and reading) this a lot harder. We should really design those separate modules to have an internal and external API and to have clear dependencies.

Additionally, I don't think that "AccountState" is a good name for that kind of module. The naming suggests that we have an account that can be in multiple states which isn't quite right. Wouldn't a FxAccountsUser module be better? That could have get/setAccountData() methods, expose signIn() and signOut(), own the storage, and keep track of keys and such without knowing *how* to fetch keys or *how* to verify a user.

::: services/fxaccounts/FxAccounts.jsm
@@ +56,5 @@
> +//   );
> +// }
> +// If the state has changed between the function being called and the promise
> +// being resolved, the .resolve() call will actually be rejected.
> +AccountState = function(fxaInternal) {

This should have its own module.

@@ +67,5 @@
> +  signedInUser: null,
> +  whenVerifiedPromise: null,
> +  whenKeysReadyPromise: null,
> +
> +  get isCurrent() this.fxaInternal.currentAccountState === this,

Hmm, we shouldn't do that.

@@ +111,5 @@
> +    );
> +  },
> +
> +  setUserAccountData: function(accountData) {
> +    return this.fxaInternal.signedInUserStorage.get().then(record => {

If this module is queried for user account data why doesn't it own the storage?

@@ +168,5 @@
> +    });
> +    return d.promise.then(result => this.resolve(result));
> +  },
> +
> +  resolve: function(result) {

resolve() and reject() shouldn't be exposed. I don't even think it's a good idea to have them, if the current user signs out we should just reject and have new deferreds.

@@ +240,2 @@
>    this.currentTimer = null;
> +  this.currentAccountState = new AccountState(this);

Having a circular dependency here sounds bad to me.

@@ +427,5 @@
>        clearTimeout(this.currentTimer);
>        this.currentTimer = 0;
>      }
> +    this.currentAccountState.abort();
> +    this.currentAccountState = new AccountState(this);

Especially for testing I think it would be much easier to always keep the same AccountState instance. abort() should take care of everything that is necessary to clean the object.

::: services/sync/tests/unit/test_browserid_identity.js
@@ +45,3 @@
>      fxAccountsClient: new MockFxAccountsClient()
>    });
> +  fxa.internal.currentAccountState.getCertificate = function(data, keyPair, mustBeValidUntil) {

Shouldn't we just pass a mock instance of AccountState to the FxAccounts constructor?
Attachment #8376967 - Flags: feedback?(ttaubert)
(In reply to Tim Taubert [:ttaubert] from comment #20)

> Additionally, I don't think that "AccountState" is a good name for that kind
> of module. The naming suggests that we have an account that can be in
> multiple states which isn't quite right. Wouldn't a FxAccountsUser module be
> better? That could have get/setAccountData() methods, expose signIn() and
> signOut(), own the storage, and keep track of keys and such without knowing
> *how* to fetch keys or *how* to verify a user.

I don't think that makes sense - this object isn't to be exposed and is an implementation detail along the lines of the generationCount.  I agree that AccountState isn't a great name, but I think FxAccountsUser is even more misleading.  It's an object that captures the state of the FxAccounts object such that long running promises don't corrupt the state of the FxAccounts object.  StateSnapshot maybe?

While some kind of "user" object might have made sense initially, I'm explicitly not trying to change the public interface of this module given it's already used by FxOS.

> ::: services/fxaccounts/FxAccounts.jsm
> @@ +56,5 @@
> > +//   );
> > +// }
> > +// If the state has changed between the function being called and the promise
> > +// being resolved, the .resolve() call will actually be rejected.
> > +AccountState = function(fxaInternal) {
> 
> This should have its own module.

I'm not sure I agree, given it will never be imported by anyone else and doesn't expose anything used by anyone else.  I'd be fine with giving it some name that includes "internal", but all in all I think it would be a loss of encapsulation to do this.

> @@ +67,5 @@
> > +  signedInUser: null,
> > +  whenVerifiedPromise: null,
> > +  whenKeysReadyPromise: null,
> > +
> > +  get isCurrent() this.fxaInternal.currentAccountState === this,
> 
> Hmm, we shouldn't do that.

Do what?

> @@ +111,5 @@
> > +    );
> > +  },
> > +
> > +  setUserAccountData: function(accountData) {
> > +    return this.fxaInternal.signedInUserStorage.get().then(record => {
> 
> If this module is queried for user account data why doesn't it own the
> storage?

Because it doesn't own the concept of the "current" user.  It is a snapshot of state, not an object that exposes the concept of a user.  The state and storage are owned by FxAccounts.

> @@ +168,5 @@
> > +    });
> > +    return d.promise.then(result => this.resolve(result));
> > +  },
> > +
> > +  resolve: function(result) {
> 
> resolve() and reject() shouldn't be exposed. I don't even think it's a good
> idea to have them, if the current user signs out we should just reject and
> have new deferreds.

I think that maybe you are missing the point of this object.  Stuff is exposed only to the internal implementation of FxAccounts and nothing else will ever use these objects.  These .resolve and .reject methods are the safety net for FxAccounts - if FxAccounts starts an operation while the state is current but it completes while it is not, the promise is rejected.

Also, we *do* have "new deferreds" for the 2 deferreds actually used, but all other deferreds are implicit - eg, the getCertificate call.

> @@ +240,2 @@
> >    this.currentTimer = null;
> > +  this.currentAccountState = new AccountState(this);
> 
> Having a circular dependency here sounds bad to me.

It's not ideal, but I see no alternative.

> Especially for testing I think it would be much easier to always keep the
> same AccountState instance. abort() should take care of everything that is
> necessary to clean the object.

That can't work - the issue is the pending promise-based calls.  This also makes me think you are maybe missing the point here.  Consider the getCertificate() call - how could .abort() cancel the pending promise-based calls inside that function?

The entire point of this object is that it can stay alive *while it's not current*.  If it completes when it's no longer current, the .reject and .resolve methods do the right thing.  Trying to reuse the same object completely defeats the purpose, and would lead us to the issue which got us here in the first place.

Maybe reading comment 17 again would help?

> ::: services/sync/tests/unit/test_browserid_identity.js
> @@ +45,3 @@
> >      fxAccountsClient: new MockFxAccountsClient()
> >    });
> > +  fxa.internal.currentAccountState.getCertificate = function(data, keyPair, mustBeValidUntil) {
> 
> Shouldn't we just pass a mock instance of AccountState to the FxAccounts
> constructor?

I agree the test setup code is a little messy.  We could pass an AccountState object to the constructor, but obviously that would only live until we discard it and create a new one.  The mock could possibly have a constructor function, but the way the current code is setup, the initial AccountState object is created before the mock has been monkey-patched in.  I decided to leave the test setup slightly messy in the expectation that when we get back to working on tests here, the requirement we have at the time can dictate how to best set this up.  IOW, the test setup code is *already* somewhat poor and messy and isn't going to meet our requirements, so I see no point in putting lots of effort into pretending it does :)
Narrowing the scope of this bug to just the FxAccounts.jsm state issue.  I cloned this bug to bug 973749 for the similar-but-different issues in browserid_identity.
Summary: Setting up Sync for the second time on a device doesn't clear enough prefs → FxAccounts.jsm doesn't drop all state when a new user logs in
Comment on attachment 8376967 [details] [diff] [review]
Use a transient AccountState object

I didn't look into every single line of the patch, but this approach looks good to me overall. Thanks Mark.
Attachment #8376967 - Flags: feedback?(ferjmoreno) → feedback+
Comment on attachment 8376967 [details] [diff] [review]
Use a transient AccountState object

Review of attachment 8376967 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for all your work on this problem, Mark.  This approach looks good to me.

::: browser/components/customizableui/test/browser_946320_tabs_from_other_computers.js
@@ -100,5 @@
> -      accountData: user
> -    },
> -    getCertificate: function(data, keyPair, mustBeValidUntil) {
> -      this.cert = {
> -        validUntil: Date.now() + CERT_LIFETIME,

Like FxAccounts's getCertificate(), I think this should use the internal now() methods so that we consistently correct for clock skew.

@@ +110,5 @@
> +    accountData: user
> +  }
> +  authService._fxaService.internal.currentAccountState.getCertificate = function(data, keyPair, mustBeValidUntil) {
> +    this.cert = {
> +      validUntil: Date.now() + CERT_LIFETIME,

Ditto for compensating for clock skew

::: services/fxaccounts/FxAccounts.jsm
@@ +113,5 @@
> +
> +  setUserAccountData: function(accountData) {
> +    return this.fxaInternal.signedInUserStorage.get().then(record => {
> +      if (!this.isCurrent) {
> +        return this.reject("Another user has signed in");

Elsewhere, e.g., in abort(), the argument to reject() is a new Error object, not a string.  Should we consistently use Error objects?
Attachment #8376967 - Flags: feedback?(jparsons) → feedback+
> browser/components/customizableui/test/
> browser_946320_tabs_from_other_computers.js
> @@ -100,5 @@
> > -      accountData: user
> > -    },
> > -    getCertificate: function(data, keyPair, mustBeValidUntil) {
> > -      this.cert = {
> > -        validUntil: Date.now() + CERT_LIFETIME,
> 
> Like FxAccounts's getCertificate(), I think this should use the internal
> now() methods so that we consistently correct for clock skew.

Whoops - sorry - wrong part of the diff :)  But the comment still applies elsewhere.
Comment on attachment 8376967 [details] [diff] [review]
Use a transient AccountState object

Review of attachment 8376967 [details] [diff] [review]:
-----------------------------------------------------------------

Mark, I think this goes a long ways to make the "logged in state" more atomic and encapsulated with the state of the user's account. FWIW, I believe the Android guys made a state machine to represent the various states the user's account goes through. I've attached a PDF of the state machine warner and nick worked out. It may be a little out of date, but I think it's largely accurate. I see Nick preserved the state names in the Android codebase: http://mxr.mozilla.org/mozilla-central/source/mobile/android/base/fxa/login/StateFactory.java :)
Blocks: 971173
Assignee: nobody → mhammond
Comment on attachment 8376967 [details] [diff] [review]
Use a transient AccountState object

Warner, I should have requested feedback from you earlier - my apologies.  If fxAccounts is going to get a near-term interface overhaul, it might make more sense to just fix the observed problem (that .cert isn't dropped)
Attachment #8376967 - Flags: feedback?(warner-bugzilla)
Priority: -- → P1
(In reply to Mark Hammond [:markh] from comment #28)

> Warner, I should have requested feedback from you earlier - my apologies.

No worries, this is moving in the same direction I was thinking, and that
overhaul isn't looking near-term anymore (too much FxA/OAuth stuff to
design). Your approach is best.

I agree that "State" is a funny name/concept, but think this can eventually
mutate into the one-object-per-account design we were thinking of. I'm in
total agreement with replacing the object completely after a logout, and
immediately rejecting all pending externally-visible promises when that
happens (and guaranteeing that no promises will ever successfully fire after
that point).

For extra credit, one might attempt to kill off the requests that are
in-flight (generally impossible), or prevent further processing from
happening when they resolve (i.e. each handler callback checks an "I'm still
alive" flag before doing anything else, including sending secondary
requests). But as long as the results aren't visible to the callers, it
doesn't really matter.
Attachment #8376967 - Flags: feedback?(warner-bugzilla) → feedback+
(In reply to Brian Warner [:warner :bwarner] from comment #29)
> (In reply to Mark Hammond [:markh] from comment #28)

> I'm in total agreement with ...
> immediately rejecting all pending externally-visible promises when that
> happens (and guaranteeing that no promises will ever successfully fire after
> that point).

This raises a good point Chris and I were discussing yesterday, and I'd like your thoughts.

One thing we've been trying to do is have Sync handle the "logout" notification, and in response begin the process of cleaning up sync (and maybe even let a currently executing sync complete).  In a wider FxA world, Sync may not get advance warning or ability to veto/delay a logout.  However, this cleanup process itself requires the credentials.

One way of doing this would be for Sync to hold a copy of all the credentials and throw them away once it has finished doing its thing in the logout observer, and to pass these objects back into FxAccounts so FxAccounts doesn't try and use the (no longer existing) "current" user - ie, that Sync may need to still use those credentials *after* the point that user is no longer current.

However, "guaranteeing that no promises will ever successfully fire after that point" would defeat that - we would probably require those promises to successfully fire.

Any thoughts on that use-case?
Let's ignore comment 30 for now - we can address that in a followup.

Thanks everyone for the feedback!

(In reply to Jed Parsons (use needinfo, please) [:jedp, :jparsons] from comment #24)
> Comment on attachment 8376967 [details] [diff] [review]

> Like FxAccounts's getCertificate(), I think this should use the internal
> now() methods so that we consistently correct for clock skew.
...
> Ditto for compensating for clock skew

Done.

> Elsewhere, e.g., in abort(), the argument to reject() is a new Error object,
> not a string.  Should we consistently use Error objects?

Yes - done.

(In reply to Brian Warner [:warner :bwarner] from comment #29)
> (In reply to Mark Hammond [:markh] from comment #28)
> 
> I agree that "State" is a funny name/concept, but think this can eventually
> mutate into the one-object-per-account design we were thinking of.

Yeah - I can't come up with a better name.  AccountSnapshot is also misleading in its own way, and AccountStateSnapshot seems too long and isn't objectively better IMO :)  So I've left this.

> For extra credit, one might attempt to kill off the requests that are
> in-flight (generally impossible), or prevent further processing from
> happening when they resolve (i.e. each handler callback checks an "I'm still
> alive" flag before doing anything else, including sending secondary
> requests). But as long as the results aren't visible to the callers, it
> doesn't really matter.

Yeah, I felt that preventing further processing inside the callbacks just added clutter to the code and is going to be invisible to the callers as the promises reject at the end anyway.  Given this should be rare in practice, I decided to leave this alone too.  So no extra credit for me (but that's fine - I'm generally good with just a "pass" ;)

Jed, requesting review from you, but feel free to delegate if you think someone else is more appropriate.
Attachment #8376967 - Attachment is obsolete: true
Attachment #8383489 - Flags: review?(jparsons)
Comment on attachment 8383489 [details] [diff] [review]
0001-Bug-972070-improve-FxAccounts.jsm-state-management.-.patch

Review of attachment 8383489 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for all your work on this, Mark.  This is a huge improvement over what we had, and cleans up a number of other little infelicities beside.

This looks good to me.  Only two tweaks.

::: services/fxaccounts/FxAccounts.jsm
@@ +83,5 @@
> +      this.whenKeysReadyPromise.reject(
> +        new Error("Verification aborted; Another user signing in"));
> +      this.whenKeysReadyPromise = null;
> +    }
> +    this.fxaInternal = null;

Shouldn't this also set cert, keyPair, and signedInUser to null?

@@ +92,5 @@
> +    if (this.signedInUser) {
> +      return this.resolve(this.signedInUser.accountData);
> +    }
> +
> +    return this.fxaInternal.signedInUserStorage.get().then(

I'm trying to figure out if these methods all need to reject if isCurrent() is false.  I think they're ok as they are, given how the AccountState is used in FxAccountsInternal.  If in some pathological case we had a long-lived handle to a stale AccountState, these methods would still fail with 'fxaInternal.signedInUserStorage is not defined', so they still wouldn't return an old state.

@@ +182,5 @@
> +  reject: function(error) {
> +    // It could be argued that we should just let it reject with the original
> +    // error - but this runs the risk of the error being (eg) a 401, which
> +    // might cause the consumer to attempt some remediation and cause other
> +    // problems.

I agree with the reasoning.  But I think it might still be helpful to log the original error.

@@ +237,5 @@
>    // conceivable that while we are waiting to verify one identity, a caller
>    // could start verification on a second, different identity, we need to be
>    // able to abort all work on the first sign-in process.  The currentTimer and
> +  // currentAccountState are used for this purpose.
> +  // (XXX - should the timer be directly on the currentAccountState?)

It feels to me that as long as the polling operations remain in fxAccountsInternal, it's ok for the timer to stay here, too.

@@ -285,5 @@
>        log.debug("Polling aborted; Another user signing in");
>        clearTimeout(this.currentTimer);
>        this.currentTimer = 0;
>      }
> -    this.generationCount++;

Nice that your patch kills this bird, too.

@@ -384,5 @@
> -      // Before writing any data, ensure that a new flow hasn't been
> -      // started behind our backs.
> -      if (this.generationCount !== myGenerationCount) {
> -        return null;
> -      }

Yay.  Again, nice to see this go.

@@ +577,3 @@
>        .then(data => {
>          if (data && !this.isUserEmailVerified(data)) {
> +          this.pollEmailStatus(currentState, data.sessionToken, "start");

This makes so much more sense

@@ +590,5 @@
>      // Login is truly complete once keys have been fetched, so once getKeys()
>      // obtains and stores kA and kB, it will fire the onverified observer
>      // notification.
>      return this.whenVerified(data)
> +      .then(() => this.getKeys());

Good catch

@@ +613,5 @@
>      log.debug("Notifying observers of " + topic);
>      Services.obs.notifyObservers(null, topic, null);
>    },
>  
> +  // XXX - pollEmailStatus should maybe be on the AccountState object?

I think it would be more at home there than here (and if it moved, the timer, as you note above, would go with it).  But given the goals of the present patch, I don't feel that it's necessary right now.

I'm holding out hope that either we can continue moving this module, longer-term, towards the user-object model, or we can replace polling with something less like polling, or both :)

::: services/fxaccounts/tests/xpcshell/test_accounts.js
@@ +523,5 @@
>          // passed to the client to make the hawk call
>          do_check_eq(result, "alice's session token");
>  
>          // Timer was not restarted
> +        do_check_eq(fxa.internal.currentAccountState, aliceState);

According to mdn, do_check_eq checks for equality using '=='.  Is there some way we can test for strict object equality?  (It would be grody, but even injecting a property into aliceState at line 511 and testing for its existence on currentAccountState here would probably work.)

Same would go for the do_check_neq on line 510 above.

r=me with this fix and the nulling of all variables in AccountState.abort().

Thanks, Mark!
Attachment #8383489 - Flags: review?(jparsons) → review+
I think this refactor to use an internal state object for each logged in user is a significant step towards giving this object to dependent services, e.g., Sync. This will enable Sync to do its cleanup with the server after a ONLOGOUT notification.
Thanks Jed!

(In reply to Jed Parsons (use needinfo, please) [:jedp, :jparsons] from comment #32)
> Comment on attachment 8383489 [details] [diff] [review]
> Shouldn't this also set cert, keyPair, and signedInUser to null?

Done.

> I agree with the reasoning.  But I think it might still be helpful to log
> the original error.

Done too.

So I managed to mis-read this comment, and pushed:

https://hg.mozilla.org/integration/fx-team/rev/7378692291b7

And then I saw the other comments I missed :(

> According to mdn, do_check_eq checks for equality using '=='.  Is there some
> way we can test for strict object equality?  (It would be grody, but even
> injecting a property into aliceState at line 511 and testing for its
> existence on currentAccountState here would probably work.)
> 
> Same would go for the do_check_neq on line 510 above.

Pushed a followup:

https://hg.mozilla.org/integration/fx-team/rev/e041e26bc0f8

(In reply to Chris Karlof [:ckarlof] from comment #33)
> I think this refactor to use an internal state object for each logged in
> user is a significant step towards giving this object to dependent services,
> e.g., Sync. This will enable Sync to do its cleanup with the server after a
> ONLOGOUT notification.

Sadly it doesn't get us all the way there, but it is much closer.
Status: NEW → ASSIGNED
(In reply to Jed Parsons (use needinfo, please) [:jedp, :jparsons] from comment #32)

> According to mdn, do_check_eq checks for equality using '=='.  Is there some
> way we can test for strict object equality?  (It would be grody, but even
> injecting a property into aliceState at line 511 and testing for its
> existence on currentAccountState here would probably work.)

The modern approach:

http://mxr.mozilla.org/mozilla-central/source/testing/modules/Assert.jsm

includes Assert.strictEqual.
https://hg.mozilla.org/mozilla-central/rev/7378692291b7
https://hg.mozilla.org/mozilla-central/rev/e041e26bc0f8
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla30
Comment on attachment 8383489 [details] [diff] [review]
0001-Bug-972070-improve-FxAccounts.jsm-state-management.-.patch

[Approval Request Comment]
Bug caused by (feature/regressing bug #): FxA Sync
User impact if declined: Their user information might end up, basically, corrupted on the server!
Testing completed (on m-c, etc.): m-c
Risk to taking this patch (and alternatives if risky): Low
String or IDL/UUID changes made by this patch: None
Attachment #8383489 - Flags: approval-mozilla-aurora?
Attachment #8383489 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
This and bug 945449 don't apply on Aurora. Are we missing another patch that should be uplifted first?
Looks like the conflicts are bug 963835 and bug 963835. I don't think we want either on Aurora, so this just needs to be rebased there?
Flags: needinfo?(mhammond)
This is against aurora.  Git managed to cherry-pick it without any conflicts, so I'm not sure why hg got upset.

Also not sure if I need to re-request approval?
https://hg.mozilla.org/releases/mozilla-aurora/rev/12450a1263bd

(note the patch I previously uploaded didn't include the followup that landed on central, but the push to aurora did)
Flags: needinfo?(mhammond)
Depends on: 1053948
Component: Firefox Sync: Backend → Sync
Product: Cloud Services → Firefox
You need to log in before you can comment on or make changes to this bug.