Open Bug 1422679 Opened 7 years ago Updated 3 years ago

Disconnected device keeps syncing and never resets sync when no push is received

Categories

(Firefox :: Sync, enhancement, P3)

enhancement

Tracking

()

Tracking Status
firefox59 --- affected

People

(Reporter: markh, Unassigned)

References

Details

When disconnecting a device in my test profiles, I'm not getting push notifications for some reason (which would be a different bug). Without the push notification, a disconnected desktop device: * If it is still running, it continues to Sync (and presumably continues until the token server token expires). * Whether running or not, it just enters a "please reconnect" state once it notices the session token is bad. Contrast this with the experience when push does work - the device is completely kicked out of sync. (It's not immediately clear what we should do about this)
> When disconnecting a device in my test profiles Disconnecting through device manager on accounts.firefox.com? Our options for speeding this up include: * Decreasing tokenserver token lifetimes and/or FxA certificate lifetimes * Adding some backchannel notification to tokenserver, so that it can lock the device out server-side
Flags: needinfo?(markh)
(In reply to Ryan Kelly [:rfkelly] from comment #1) > > When disconnecting a device in my test profiles > > Disconnecting through device manager on accounts.firefox.com? Yes. > Our options for speeding this up include: Thanks - the other part of this is that even when we fail to get a new token we still haven't hard-reset like we do on a push. To do that, we probably also need a specific response which indicates our session token has been explicitly revoked (or we treat all invalid session tokens as being revoked and hard-reset sync, but I bet there are many sharp edge-cases there)
Flags: needinfo?(markh)
See Also: → 1423015
(In reply to Mark Hammond [:markh] from comment #2) > (or we treat all invalid session tokens as being revoked > and hard-reset sync, but I bet there are many sharp edge-cases there) We can't do that, as IIUC, we rely on that state for normal password changes or resets. So ISTM that in an ideal world, the 401 for an invalid session token should ideally indicate if it was explicitly revoked, and if so, desktop would disconnect sync completely.
Yeah, the difficulty here is that we currently just delete the sessionToken, so there's no way for us to tell you *why* it got deleted. We can change this but it'll be a bit of a refactor on the server side, and I wouldn't mind doing a bit of other cleanup while we're in there. How urgent is this to fix from the FxA side of things?
Flags: needinfo?(markh)
(In reply to Ryan Kelly [:rfkelly] from comment #4) > Yeah, the difficulty here is that we currently just delete the sessionToken, > so there's no way for us to tell you *why* it got deleted. That makes sense. > We can change > this but it'll be a bit of a refactor on the server side, and I wouldn't > mind doing a bit of other cleanup while we're in there. How urgent is this > to fix from the FxA side of things? Hard to tell without understanding why the push notification isn't working - ie, the more reliable that is, the less of a priority this is :)
Flags: needinfo?(markh)
Fair point :-) Hopefully Bug 1423015 is as straightforward a it appears and we'll have a server-side fix out shortly.
This isn't ideal but we expect push to work in most cases and is alot of work to fix otherwise.
Priority: -- → P3
> * Adding some backchannel notification to tokenserver, so that it can lock the device out server-side From the discussion in 1457851, I think I suddenly understand how to action this bug. In FxA we already have this notion of a "generation number", a monotonically increasing integer that's used to lock out clients after certain security events. We currently increment the generation number when there's a password change or reset, but that's an implementation detail. I think we should: * Update FxA to increment the generation number on any event where a client should lose access, such as disconnecting a device * Push generation number updates out to tokenserver via the existing SQS backchannel, like we already do for password reset This would mean that, when a device is disconnected, tokenserver will quickly learn about the new generation number and will reject access attempts by clients whose certificates contain that or lower generation number. Devices that still have an active session token should use it to fetch a new certificate with the updated generation number, while the disconnected device will not be able to do so and will hence be quickly locked out of further syncs. The only trick here from the client side would be to ensure that, on a 401 "invalid-generation" error from the tokenserver, they first try to get an updated identity certificate from FxA rather than bailing all the way out to prompting the user to login.
> The only trick here from the client side would be to ensure that, on a 401 "invalid-generation" error from > the tokenserver, they first try to get an updated identity certificate from FxA rather than bailing all the > way out to prompting the user to login. Mark, any chance you know the client behaviour here off the top of your head?
Flags: needinfo?(markh)
(In reply to Ryan Kelly [:rfkelly] from comment #9) > > * Adding some backchannel notification to tokenserver, so that it can lock the device out server-side > > From the discussion in 1457851, I think I suddenly understand how to action > this bug. In FxA we already have this notion of a "generation number", a > monotonically increasing integer that's used to lock out clients after > certain security events. ... > * Update FxA to increment the generation number on any event where a client > should lose access, such as disconnecting a device > * Push generation number updates out to tokenserver via the existing SQS > backchannel, like we already do for password reset > > This would mean that, when a device is disconnected, tokenserver will > quickly learn about the new generation number and will reject access > attempts by clients whose certificates contain that or lower generation > number. Right - as we discussed in IRC, in the current world, the token server will still hand out tokens to a client where the password has been changed but which hasn't yet attempted to update the certificate. So the duration of the validity of the *certificate* is more relevant than the duration of the validity of the storage node token. Ryan's proposal will mean that the token server will start refusing to hand out tokens after the password change, making the duration of the validity of the storage node token relevant. This means that after a password change, devices may continue to sync for *up to* ~50 minutes (ie, 80% of the default token duration of 1 hour) instead of up to ~12 hours (the default duration for a certificate.) This is obviously an improvement, but I'm not sure it actually goes far enough in the worst case. (FWIW, desktop never persists the storage token to disk, but does hold it in memory and happily reuses valid tokens across syncs) > The only trick here from the client side would be to ensure that, on a 401 > "invalid-generation" error from the tokenserver, they first try to get an > updated identity certificate from FxA rather than bailing all the way out to > prompting the user to login. This code is messy, but I'm pretty sure it works as you desire - on a 401 from the token server we throw away the certificate - it is the error fetching the new certificate that will force us back to the "please re-login" state - and IIUC, in your scenario, that cert fetch will succeed, so we should be golden.
Flags: needinfo?(markh)
> devices may continue to sync for *up to* ~50 minutes (ie, 80% of the default token duration of 1 hour) I wonder how many times we re-use a tokenserver token in practice, vs generate a new one. It could be interesting to add telemetry around this ("synced using a newly-fetched token" vs "synced using an existing token") in order to know whether the current token lifetime is working out well in practice.
(In reply to Ryan Kelly [:rfkelly] from comment #12) > > devices may continue to sync for *up to* ~50 minutes (ie, 80% of the default token duration of 1 hour) > > I wonder how many times we re-use a tokenserver token in practice, vs > generate a new one. It could be interesting to add telemetry around this > ("synced using a newly-fetched token" vs "synced using an existing token") > in order to know whether the current token lifetime is working out well in > practice. By default we sync every 10 minutes - or more often if data in the profile changes "enough" (where "enough" depends on the data - eg, each bookmark change will sync immediately, whereas not every new history visit will force a sync, but we need ~300 history changes before that forces a sync). We could add telemetry, but I see no reason to doubt that for a browser session that lasts ~ 1 hour, we probably sync around 5 times reusing the same token. It wouldn't be too difficult to force a new token every sync, and obviously another mitigation would be to drop the default duration on the token server.
Indeed. Mostly I just don't recall how we settled on the value of "1 hour" for token lifetime in production.
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.