Closed Bug 1250531 Opened 8 years ago Closed 8 years ago

[Sync] Duplicate devices when logout/login or disconnect/connect to sync account

Categories

(Firefox :: Sync, defect, P2)

47 Branch
defect

Tracking

()

RESOLVED FIXED
Firefox 48
Tracking Status
firefox47 --- wontfix
firefox48 --- fixed

People

(Reporter: aflorinescu, Assigned: lina)

References

Details

(Whiteboard: [sync-data-integrity])

Attachments

(2 files, 1 obsolete file)

STR:


1. Sign in to FF using two profiles and the same account.
2. On FF with profile 1 open few tabs, sync.
3. On FF with profile 2 sync and open Sync tabs sidebar.
4. In FF profile 1 change device name and re-sync FF profile 1 and profile 2.
5. In FF profile 1 disconnect and reconnect the sync account. (the initial tabs from step2 are still open and unchanged on FF profile 1)
6. Do a sync on FF profile 2 and refresh the Sync Tab Sidebar.


AR:
On the Sync Tab sidebar there are now two devices with the same name. 
Also, in this scenario(take in account step.5) the new duplicate device will show no tabs, although there are tabs open while the initial device will show the correct tabs.

Reproducible on:
Windows10, Windows7, Linux, Mac

Note:
On mobile devices sync, the steps are reduced, just disconnecting and reconnecting the sync account will duplicate the device.(see 1209160 and 670474)
Blocks: 1239084
Flags: firefox-backlog+
This is a side-effect of how Sync manages devices. The new FxA device work will improve this - we probably need a meta-bug to track them.
I suspect this may be because when the user logs out, it first kills the FxA creds, which may prevent the client from cleaning itself up on the Sync server. The next step here is to investigate whether there's a straightforward solution to make this better in the near term or we should just fix it more broadly with the device management work.
Priority: -- → P2
Assigning to myself for investigation.
Assignee: nobody → kcambridge
Status: NEW → ASSIGNED
Blocks: 1254391
Blocks: 1257587
The plot thickens.

Here's the error I see on the first profile when I sign out... https://pastebin.mozilla.org/8864146 It looks like the delete succeeds, so the FxA error might be a red herring. (I set "services.sync.log.appender.dump" to "Trace" manually in services/sync/services-sync.js; otherwise, Sync resets it when I sign out).

And here's what I see on the second profile when I sync, after signing out of the first: https://pastebin.mozilla.org/8864148 ("ZhsXhMex3LNo" is the new client).
So I think we worked out that the problem is that the now-deleted client remains in memory on the other devices, and a mitigation of that would be to perform a full sync of the clients engine each sync (whereas now we do a full sync only on startup). I expect that to be reasonable for the clients engine.

I *think* the same issue will probably exist for the tabs engine too - we will still have an in-memory copy of the tabs from the now-deleted device. We probably don't want to fully sync this engine each sync, but we can probably mitigate it by ignoring any im-memory tabs from a device that the clients engine doesn't know about.

A bit of an aside here - there are 3 cases I know of that will cause the deletion of the remote device to fail:
1) The one Chris mentions - if we sign out after the token becomes invalid, so the delete fails. This is probably rare as tokens have a long life.
2) If you disconnect a device before the first Sync you also will not have a valid token.
3) Disconnecting when offline.

(1) and (2) are the same basic problem - although in (1) we've already done a login dance, just need a new token. For (2) we have never done the login dance. For (3) we are screwed, but that's probably ok
This is an awfully hacky fix until we can integrate the device manager. I haven't fixed up the Sync xpcshell tests yet, but is the general approach sensible?
Attachment #8732447 - Flags: review?(markh) → review+
Comment on attachment 8732447 [details]
MozReview Request: Bug 1250531 - Only show existing remote clients in the Synced Tabs UI. r?markh

https://reviewboard.mozilla.org/r/41189/#review37805

This LGTM, but please see if you can add a simple test for this new branch (ie, ensure the list is filtered when the new function returns false.

::: services/sync/modules/SyncedTabs.jsm:124
(Diff revision 1)
>      let seenURLs = new Set();
>      let parentIndex = 0;
>      let ntabs = 0;
>  
>      for (let [guid, client] in Iterator(engine.getAllClients())) {
> +      if (!Weave.Service.clientsEngine.remoteClientExists(client.id)) {

let's add a small comment here indicating why we have this check.
Comment on attachment 8732446 [details]
MozReview Request: Bug 1250531 - Only show existing remote clients in the Synced Tabs UI. r=markh

https://reviewboard.mozilla.org/r/41187/#review37803

It looks like a simpler solution that doesn't require changing the semantics of the clients collection would be to have the clients engine override _processIncoming, and have it set this.lastSync = 0 before calling the base implementation?

::: services/sync/modules/engines/clients.js:155
(Diff revision 1)
>      if (this._store._remoteClients[id])
>        return this._store._remoteClients[id].type == DEVICE_TYPE_MOBILE;
>      return false;
>    },
>  
> +  _wipeCachedClients() {

I don't think we need this - we should startup with no remote clients and nothing flagged as changed (and the only item the clients engine will flag as changed is this client)
Attachment #8732446 - Flags: review?(markh)
So...there's some additional complexity to consider.

* I added `_wipeCachedClients` because, even though we're pulling down the complete list from the server, deleted clients will still remain in `_remoteClients`. It's similar to the problem with bookmarks that we talked about at today's standup, where we need to compare the local database with the full collection from the server to detect corruption. In this case, we're downloading the full collection, but we still need to compare it against ClientStore to detect deletes.

Unfortunately, `_wipeCachedClients` doesn't go far enough. If we have a command that we want to send to a deleted client, we'll re-upload the record with commands for a client that will never see them.

* For the multi-device case: since we're syncing the entire collection, `count.applied` will always be > 0 for the clients collection. That causes the scheduler to think we always have incoming items (https://dxr.mozilla.org/mozilla-central/rev/6202ade0e6d688ffb67932398e56cfc6fa04ceb3/services/sync/modules/policies.js#230-232), and set the sync interval to immediateInterval.

I can work around this by either changing the condition to `numItems && data != this.service.clientsEngine.name`, or have `_reconcile` apply the record and return false. Icky.

Am I overcomplicating this? :-)
Flags: needinfo?(markh)
(In reply to Kit Cambridge [:kitcambridge] from comment #11)
> So...there's some additional complexity to consider.
> 
> * I added `_wipeCachedClients` because, even though we're pulling down the
> complete list from the server, deleted clients will still remain in
> `_remoteClients`. It's similar to the problem with bookmarks that we talked
> about at today's standup, where we need to compare the local database with
> the full collection from the server to detect corruption. In this case,
> we're downloading the full collection, but we still need to compare it
> against ClientStore to detect deletes.

It looks to me like we don't actually *have* a local database in this case :/ ISTM that if we attempt to send a command to a remote client, but we restart before we've successfully synced, we lose the command (we will persist the fact the client ID has changed and needs to be updated, but IIUC we've lost the commands themselves as the locally-modified client record hasn't been persisted).

So after pulling down the entire list of remote clients, I think it's safe to discard any missing clients completely (eg, from both _remoteClients can from the list of changed clients in the tracker). But it does seem that we will want to reconcile locally-changed clients (ie, if a client is dirty, we need to ensure that after we've pulled down the new list, we re-add any commands the old version had queued and ensure it remains in the modified list, as we still need to upload that record). It *looks* like a change to the update() method would help us do the right thing (ie, instead of blindly doing the |this._remoteClients[record.id] = record.cleartext;| we could merge queued commands we have back into the record and ensure it is marked as modified and thus should be synced back out.

> 
> Unfortunately, `_wipeCachedClients` doesn't go far enough. If we have a
> command that we want to send to a deleted client, we'll re-upload the record
> with commands for a client that will never see them.

Yeah, we should never do that, but I think the above should avoid it - ie, we just discard local records that don't exist from the server (including the fact we think they have changed).

> 
> * For the multi-device case: since we're syncing the entire collection,
> `count.applied` will always be > 0 for the clients collection. That causes
> the scheduler to think we always have incoming items
> (https://dxr.mozilla.org/mozilla-central/rev/
> 6202ade0e6d688ffb67932398e56cfc6fa04ceb3/services/sync/modules/policies.
> js#230-232), and set the sync interval to immediateInterval.
> 
> I can work around this by either changing the condition to `numItems && data
> != this.service.clientsEngine.name`, or have `_reconcile` apply the record
> and return false. Icky.

Or maybe have _reconcile be smart and actually compare the local and remote versions of the record - so even though we are always fetching remote records, we only apply ones that changed - which in most cases should be none.

> Am I overcomplicating this? :-)

No - I think I'm over simplifying it ;) I'm sure there's devil in the detail that I missed.
Flags: needinfo?(markh)
It also sounds like there's an existing latent bug here? If an incoming record has commands for a device, and we also have commands for the same device, ISTM that our commands will be lost as we've overridden our client record with the incoming one? Eg, consider 2 desktops trying to send a tab to a 3rd device that hasn't synced for some time...
Blocks: 1250866
Review commit: https://reviewboard.mozilla.org/r/43607/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/43607/
Attachment #8732446 - Attachment description: MozReview Request: Bug 1250531 - Unconditionally sync the clients collection. r?markh → MozReview Request: Bug 1250531 - Only show existing remote clients in the Synced Tabs UI. r?markh
Attachment #8736956 - Flags: review?(markh)
Attachment #8732446 - Flags: review?(markh)
Comment on attachment 8732446 [details]
MozReview Request: Bug 1250531 - Only show existing remote clients in the Synced Tabs UI. r=markh

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/41187/diff/1-2/
Attachment #8732447 - Attachment is obsolete: true
Getting closer! https://dxr.mozilla.org/mozilla-central/rev/494289c72ba3997183e7b5beaca3e0447ecaf96d/services/sync/tests/unit/test_syncscheduler.js#784-837 still fails when run with the other tests, though not when I comment out the ones before it.

I wonder if it has anything to do with `"bar"` not being a valid record; i.e., if I need to write a "real" cleartext like `JSON.stringify({ id: ..., name: ...}` into the store. Though I am wiping the store at the end of each test, and it passes when run independently...so something else is afoot.
Comment on attachment 8736956 [details]
MozReview Request: Bug 1250531 - Unconditionally sync the clients collection. r=markh

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/43607/diff/1-2/
Comment on attachment 8732446 [details]
MozReview Request: Bug 1250531 - Only show existing remote clients in the Synced Tabs UI. r=markh

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/41187/diff/2-3/
Comment on attachment 8736956 [details]
MozReview Request: Bug 1250531 - Unconditionally sync the clients collection. r=markh

https://reviewboard.mozilla.org/r/43607/#review40211

This is looking great - just a few trivial tweaks to go!

::: services/sync/modules/engines/clients.js:167
(Diff revision 1)
> +    try {
> +      SyncEngine.prototype._processIncoming.call(this);
> +      // Since clients are synced unconditionally, any records in the local store
> +      // that don't exist on the server must be for disconnected clients. Remove
> +      // them, so that we don't upload records with commands for clients that will
> +      // never see them.

This comment could also mention that we simply don't want stale devices as their list of tabs is confusing.

::: services/sync/modules/engines/clients.js:176
(Diff revision 1)
> +        this._removeRemoteClient(staleID);
> +      }
> +    } finally {
> +      this._incomingClients = null;
> +    }
> +  }

missing a trailing comma here.

::: services/sync/modules/engines/clients.js:450
(Diff revision 1)
> +  },
> +
> +  _removeRemoteClient(id) {
> +    if (this._store._remoteClients[id]) {
> +      delete this._store._remoteClients[id];
> +      this._tracker.removeChangedID(id);

I'd be inclined to hoist this out of the if - ie, check if the id is in changedIDs even if it's *not* in _remoteClients.

::: services/sync/modules/engines/clients.js:473
(Diff revision 1)
>        this.engine.localCommands = record.commands;
> -    else
> +    else {
> +      let currentRecord = this._remoteClients[record.id];
> +      if (currentRecord && currentRecord.commands) {
> +        // Merge commands.
> +        for (let action of currentRecord.commands) {

ISTM that if we hit this case the ID should *also* already be in changedIDs (ie, it means we have new commands for the client so should upload them later.

I wonder if it is worth adding a check for this, and if it fails log an error message (ie, a kind of non-fatal assertion)?

IOW, I think it would imply a logic problem if the ID is not in changedIDs and we should know about it somehow.

Also, if it's not too much work, I think we should ensure test_clients tests this case (if I'm reading it correctly I think it doesn't)
Attachment #8736956 - Flags: review?(markh)
Comment on attachment 8732446 [details]
MozReview Request: Bug 1250531 - Only show existing remote clients in the Synced Tabs UI. r=markh

https://reviewboard.mozilla.org/r/41187/#review40221

Awesome, this looks great with the comma moved to the first patch!

::: services/sync/modules/engines/clients.js:180
(Diff revision 3)
>          this._removeRemoteClient(staleID);
>        }
>      } finally {
>        this._incomingClients = null;
>      }
> -  }
> +  },

this should be in the other patch :)
Attachment #8732446 - Flags: review?(markh) → review+
Comment on attachment 8736956 [details]
MozReview Request: Bug 1250531 - Unconditionally sync the clients collection. r=markh

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/43607/diff/2-3/
Attachment #8732446 - Attachment description: MozReview Request: Bug 1250531 - Only show existing remote clients in the Synced Tabs UI. r?markh → MozReview Request: Bug 1250531 - Only show existing remote clients in the Synced Tabs UI. r=markh
Attachment #8736956 - Flags: review?(markh)
Comment on attachment 8732446 [details]
MozReview Request: Bug 1250531 - Only show existing remote clients in the Synced Tabs UI. r=markh

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/41187/diff/3-4/
https://reviewboard.mozilla.org/r/43607/#review40211

> ISTM that if we hit this case the ID should *also* already be in changedIDs (ie, it means we have new commands for the client so should upload them later.
> 
> I wonder if it is worth adding a check for this, and if it fails log an error message (ie, a kind of non-fatal assertion)?
> 
> IOW, I think it would imply a logic problem if the ID is not in changedIDs and we should know about it somehow.
> 
> Also, if it's not too much work, I think we should ensure test_clients tests this case (if I'm reading it correctly I think it doesn't)

Sure. Looks like it's too late to check the tracker here, since we clear it out in `_syncStartup`, but we can check the engine's `_modified` dict instead. Seems reasonable?
Comment on attachment 8736956 [details]
MozReview Request: Bug 1250531 - Unconditionally sync the clients collection. r=markh

https://reviewboard.mozilla.org/r/43607/#review40261

Excellent, thanks - LGTM with the log tweak

::: services/sync/modules/engines/clients.js:477
(Diff revisions 2 - 3)
>          for (let action of currentRecord.commands) {
> -          if (!hasDupeCommand(record.cleartext.commands, action)) {
> -            record.cleartext.commands.push(action);
> +          if (hasDupeCommand(record.cleartext.commands, action)) {
> +            // We already know about this command.
> +            continue;
> +          }
> +          if (!this.engine._modified[record.id]) {

I don't think this needs to be in the loop - ie, just as part of the "if" above where we know we've got *some* commands - that alone should be enough for it to be modified (and IIUC, the code as written will log one message per command and ignore every one of them, which isn't really necessary). I also think there's no need to take any action (ie, your continue) for this case - just log it and continue as normal.
Attachment #8736956 - Flags: review?(markh) → review+
Comment on attachment 8736956 [details]
MozReview Request: Bug 1250531 - Unconditionally sync the clients collection. r=markh

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/43607/diff/3-4/
Attachment #8736956 - Attachment description: MozReview Request: Bug 1250531 - Unconditionally sync the clients collection. r?markh → MozReview Request: Bug 1250531 - Unconditionally sync the clients collection. r=markh
Comment on attachment 8732446 [details]
MozReview Request: Bug 1250531 - Only show existing remote clients in the Synced Tabs UI. r=markh

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/41187/diff/4-5/
https://reviewboard.mozilla.org/r/43607/#review40261

> I don't think this needs to be in the loop - ie, just as part of the "if" above where we know we've got *some* commands - that alone should be enough for it to be modified (and IIUC, the code as written will log one message per command and ignore every one of them, which isn't really necessary). I also think there's no need to take any action (ie, your continue) for this case - just log it and continue as normal.

Removed the logging entirely per our discussion on IRC, since it's possible for other clients in a mesh to pull down commands for clients that they haven't changed. For example: mobile sends a URL to desktop, then laptop (third device) syncs.
https://hg.mozilla.org/mozilla-central/rev/c7b00e87d644
https://hg.mozilla.org/mozilla-central/rev/1d4afa71e193
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 48
Blocks: 1264498
I've tested on Windows 7 64bit and Mac OS X 10.9.5 using latest Nightly 48.0a1 (buildID: 20160413030239) and I have the following results: 
- Desktop to Desktop - the problem isn't reproducible anymore: there is no duplicate devices.  
- Mobile to Desktop (meaning that you disconnect and reconnect on Mobile and then, on Desktop, you see the dupe) - the problem is still reproducible. There are the following bugs for mobile side related to this issue: bug 670474, bug 1224278, bug 1224278.
Similar issue to bug 1250866, comment 14.
See Also: → 1264498
I've tested on latest Aurora 47.0a2 (buildID: 20160418004027) and the issue on "Desktop to Desktop" side is still reproducible here. 

I consider it would be a good idea to uplift this to 47. What do you think?
Flags: needinfo?(rkothari)
Flags: needinfo?(kcambridge)
(In reply to Camelia Badau, QA [:cbadau] from comment #34)
> I've tested on latest Aurora 47.0a2 (buildID: 20160418004027) and the issue
> on "Desktop to Desktop" side is still reproducible here. 
> 
> I consider it would be a good idea to uplift this to 47. What do you think?

I don't really consider that necessary; this isn't a trivial patch and users don't disconnect and reconnect their accounts with regularity. Taking it to beta with only a few weeks to bake on -central (which is effectively what uplifting to 47 would do) seems a greater risk than the reward.
I agree with Mark. This is a change to how we sync clients, and introduced a regression early on (bug 1262021). I'd feel a lot more comfortable if this rode the trains.
Flags: needinfo?(kcambridge)
(In reply to Camelia Badau, QA [:cbadau] from comment #34)
> I've tested on latest Aurora 47.0a2 (buildID: 20160418004027) and the issue
> on "Desktop to Desktop" side is still reproducible here. 
> 
> I consider it would be a good idea to uplift this to 47. What do you think?

Based on the replies from Mark and Kit, we will let this one ride the trains. Thanks for posing the question though.
Flags: needinfo?(rkothari)
Thank you for your answers!
Whiteboard: [sync-data-integrity]
See Also: → 1238871
You need to log in before you can comment on or make changes to this bug.