Bug 1230011 (Open) - Opened 9 years ago, Updated 2 years ago

Restoring bookmarks while Sync is configured will cause bad things to happen.

Categories: Firefox :: Sync, defect, P3

People: Reporter: markh; Assignee: Unassigned

Attachments: 1 file

<markh> rnewman: I'm wondering what bad things might happen if wipeServer() fails to work - eg, bookmarks.js observes "bookmarks-restore-success" and calls wipeServer() - if we are offline when that happens, how bad would things get?
2:08 PM <markh> I could imagine duplicates
2:13 PM <•rnewman> terrible, terrible things
2:14 PM <•rnewman> we will mash the server contents into the new local bookmark set
2:14 PM <•rnewman> then upload all of them
2:14 PM <•rnewman> you probably restored because your bookmarks were fucked
2:14 PM <•rnewman> so now they'll be doubly fucked
2:21 PM <markh> heh - yeah

We should fix that. A trivial option might be to persist the fact that we tried to do a wipeServer, but that runs the risk of other devices having synced before we were actually able to complete the wipe, causing different bad things to happen.
Flags: firefox-backlog+
Careful, though.

wipeServer is flawed. You might need to do more work.

What it does is issue a DELETE to the server, discarding all stored records. Then it resets the local client, which will cause a full upload of records.

It doesn't specify X-If-Unmodified-Since.

It doesn't cause other clients to do a full download -- that would require changing meta/global. Service.freshStart does this, but not per-collection wipes.

Nor does it cause other clients to reupload their own records! They'll presumably redownload anything the wiping client uploads, and reupload after conflicts, but that could be worse -- creating orphans or who knows what else. Races are possible here.


If we queue, I'm particularly concerned by the lack of XIUS. What you *don't* want to happen is that a user restores their bookmarks from a backup while on the plane, puts that laptop to sleep, then opens it a day later and trashes the server.
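
For illustration only, a guarded per-collection wipe might look roughly like the sketch below (Sync 1.5 semantics; I'm assuming the usual Resource setHeader/delete helpers and Service.storageURL, so treat the names as a sketch, not the patch). A stale, queued wipe would then fail with a 412 instead of clobbering newer server data:

    // Sketch: wipe one collection, but refuse if the server has changed
    // since the timestamp we last saw for it.
    function guardedWipeCollection(service, collection, lastSeenModified) {
      let resource = service.resource(service.storageURL + collection);
      // X-If-Unmodified-Since: the server rejects the DELETE with 412 if the
      // collection was modified after lastSeenModified.
      resource.setHeader("X-If-Unmodified-Since", lastSeenModified);
      let response = resource.delete();
      if (response.status == 412) {
        throw new Error("Collection changed since we last saw it; not wiping " + collection);
      }
      if (response.status != 200 && response.status != 404) {
        throw response;
      }
    }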

Thinking outside the box, you might consider a different approach.

Similar to Bug 1162778/Bug 1225224, introduce a synchronous dialog. If you try to restore bookmarks, warn the user that it'll reset their bookmarks on the server and merge bookmarks on other devices, then issue the wipe, sync bookmarks, and then upload a changed meta/global.

You could also introduce a Reset Sync style step here: when you restore the backup, do you want to wipe other devices, merge, or what?
> Similar to Bug 1162778/Bug 1225224, introduce a synchronous dialog. If you
> try to restore bookmarks, warn the user that it'll reset their bookmarks on
> the server and merge bookmarks on other devices, then issue the wipe, sync
> bookmarks, and then upload a changed meta/global.

Sorry, I was unclear. The key point here is that you do the server wipe before you do the restore.
Depends on: 1241699
Filed bug 1241699 to collect telemetry on this event.
Priority: -- → P3
Bookmarks will be auto-restored if places.sqlite doesn't exist or is corrupt at startup. Further, if a restore happens while online but before Sync has initialized, Sync will also do the wrong thing. Note that all these restore operations assign a new GUID to the newly imported items.

Interestingly, on a "simple" profile that had never synced with a second device, I deleted places.sqlite and synced. Thom's fancy validator (obviously) reports that every record on the server is missing from the client, and every record on the client is missing from the server, and Sync didn't notice or care :) However, if I manually flipped the lastSync time for bookmarks and re-synced, Sync did largely the right thing - it found all duplicates and re-wrote their GUIDs, after which the validator found zero problems.

In this auto-recover scenario, this is exactly what we want to happen - local IDs are switched to the server copy, and records that exist on the server but for which a local dupe can't be found are added. The wipeServer behaviour discussed above would be the *wrong* thing to do (I think :)

But yeah, if the user *chose* to restore bookmarks, it's reasonable to assume they decided something like "I want my old bookmarks to be gone, these are my new ones" - so wiping the server and populating it with the new GUIDs makes sense. Hopefully second devices do the "right thing" (whatever that actually is :)

So it sounds like we could have 2 bugs here:

* We should be told about an auto-restore somehow and treat that as though the engine has never synced. That might be as simple as having Places write a services.sync.something pref when it does this, and the engine treating this pref as lastSync=0 (sketched just after this list).

* Keep this bug for the user-invoked restore, and possibly do something smarter than what already exists.
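
A minimal sketch of that first bullet, with an entirely made-up pref name and hook points (nothing here is existing API beyond Services.prefs and the engine's resetLastSync):

    Components.utils.import("resource://gre/modules/Services.jsm");

    // Hypothetical pref; Places would set it whenever it auto-restores.
    const RESTORED_PREF = "services.sync.bookmarks.autoRestored";

    // In the Places auto-restore path (hypothetical hook):
    Services.prefs.setBoolPref(RESTORED_PREF, true);

    // In the bookmarks engine, before syncing (hypothetical hook):
    if (Services.prefs.prefHasUserValue(RESTORED_PREF)) {
      this.resetLastSync();   // behave as though lastSync == 0: full reconcile/dedupe
      Services.prefs.clearUserPref(RESTORED_PREF);
    }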

Note also that Kit's current "tracker" work is likely to help solve this too, but the details of that are sketchy and it almost certainly isn't going to hit 49. Handling "auto-restore" sounds easy and *could* make 49, which is worthwhile IMO.

Richard, what do you think?
Flags: needinfo?(rnewman)
Summary: Restoring bookmarks while Sync is configured but currently offline will cause bad things to happen. → Restoring bookmarks while Sync is configured will cause bad things to happen.
IIRC, Places should now be storing and restoring GUIDs in backups. If it doesn't, I'd argue that it should. That would resolve some of the work you note.

Marco?

I agree that on auto-recovery we should do a fresh sync to make sure we didn't accidentally roll back by a week and 'miss' a bunch of changes that we already saw. That's the approach we take on iOS, too.

And if you choose to restore bookmarks, we probably want to ask you what you want to do, because it's hard to tell:

* Stop syncing bookmarks and use this set.
  * ... or disconnect from your Firefox Account.
* Merge these restored bookmarks into your synced bookmarks.
* Replace your bookmarks on other devices with the bookmarks you're about to restore.

Picking any one particular avenue might cause unrecoverable data loss on other devices, which would be sad.
Flags: needinfo?(rnewman) → needinfo?(mak77)
(In reply to Richard Newman [:rnewman] from comment #5)
> IIRC, Places should now be storing and restoring GUIDs in backups. If it
> doesn't, I'd argue that it should. That would resolve some of the work you
> note.
> 
> Marco?

I just found bug 967204 and it looks like it has traction, so I'll drop the needinfo for mak. Best I can tell, though, it doesn't actually change the amount of work here - Sync seemed to recover fairly well even with the different GUIDs. It will do an even better job with the GUIDs restored, but I don't think that needs to block this.

> I agree that on auto-recovery we should do a fresh sync to make sure we
> didn't accidentally roll back by a week and 'miss' a bunch of changes that
> we already saw. That's the approach we take on iOS, too.

Great.

> And if you choose to restore bookmarks, we probably want to ask you what you
> want to do, because it's hard to tell:

Yes, agreed, the "user choosing to restore" is a UX nightmare, but auto-restore seems easy from this POV (ie, no UX required, no guessing about what action to take). I opened bug 1274896 for this.
Flags: needinfo?(mak77)
(In reply to Richard Newman [:rnewman] from comment #1)
> Similar to Bug 1162778/Bug 1225224, introduce a synchronous dialog. If you
> try to restore bookmarks, warn the user that it'll reset their bookmarks on
> the server and merge bookmarks on other devices, then issue the wipe, sync
> bookmarks, and then upload a changed meta/global.

This is a good idea. As you point out in comment 5, users might want to restore their bookmarks for different reasons, and guessing at their intent is fraught. Restore is also tucked away in the organizer UI, and we don't expect it to be used frequently, so an additional dialog doesn't seem onerous here.
Depends on: 1258127
Priority: P3 → --
We probably won't get to this in the next few iterations, but it is important.
Flags: needinfo?(kit)
Priority: -- → P3
Flags: needinfo?(kit)
Priority: P3 → P2
Assignee: nobody → tchiovoloni
Status: NEW → ASSIGNED
Comment on attachment 8844098 [details]
Bug 1230011 - Remember when a wipeServer call fails and retry the wipe on later syncs.

https://reviewboard.mozilla.org/r/117630/#review119484

Sadly I think this might get a bit messy and need a fair bit more work:

* We probably need the same basic mechanism for resetClient now that the bookmarks tracker tries to do SQL - I can imagine a slow machine being shut down during that resetClient, meaning (a) the local bookmarks aren't in the right state and (b) we didn't even get to the wipeServer call, so we won't record that we need to do a wipe.
* It seems there's a risk that we will actually be syncing when the restore happens, which would be bad, so we should consider the incoming and outgoing loops both aborting as soon as we flag we need either the reset or the wipe. engines.js already maintains an "aborting" variable we can probably leverage for this.

Best I can tell, bookmarks is the only engine that might try and do this kind of reset mid-sync, so it might be possible to store 2 flags per engine - one for needsReset, one for needsWipe (or one flag with multiple meanings, whatever), and have the bookmarks observer just set these flags and bump the score for the engine. The loop in engines.js where "aborting" is tracked could look at these flags, and we could rely on the next sync to do the actual wipe and reset (and once bug 1335891 lands, we should find the next sync happens immediately)
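
Roughly what I have in mind for the flags (all names made up; the real patch would hang this off the existing Svc.Prefs/engine plumbing):

    // Hypothetical: persisted per-engine flags so they survive a restart.
    // The bookmarks observer just sets the flag and bumps the score.
    function flagEngineForWipe(engineName) {
      Svc.Prefs.set("engine." + engineName + ".needsWipe", true);
    }

    // At the start of the per-engine loop (and wherever "aborting" is checked):
    if (Svc.Prefs.get("engine." + engine.name + ".needsWipe", false)) {
      this.service.resetClient([engine.name]);
      this.service.wipeServer([engine.name]);   // if this throws, we retry next sync
      Svc.Prefs.reset("engine." + engine.name + ".needsWipe");
    }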

I also think we need to reproduce bug 1343365 - it seems unlikely to me that when Kanchan did those steps we failed to wipe the server due to a simple network error, so it's not clear to me how validation errors were reported (and unfortunately, the logs she pasted don't seem to include that restore etc) - the entire point of this bug is to ensure that scenarios like that are handled somewhat correctly (and don't cause garbage to be written to the server), so we need to understand what is happening in that scenario and that this bug solves it.

Finally, and painfully, I'm fairly sure that if places.sqlite is corrupt, it will be replaced and an observer sent before the sync modules have loaded, which I mentioned briefly in comment 4. We need to somehow handle that scenario too.

::: services/sync/modules/service.js:1201
(Diff revision 1)
>      // them...
>      this.generateNewSymmetricKeys();
>    },
>  
> +  get pendingServerWipe() {
> +    return Svc.Prefs.get("pendingServerWipe", "[]");

doesn't this need a JSON.parse? I think we need tests for this.
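
Presumably the fix is just adding the parse, something like (untested):

      get pendingServerWipe() {
        return JSON.parse(Svc.Prefs.get("pendingServerWipe", "[]"));
      },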

::: services/sync/modules/stages/enginesync.js:94
(Diff revision 1)
>      if (!(this.service._remoteSetup(info))) {
>        this.onComplete(new Error("Aborting sync, remote setup failed"));
>        return;
>      }
>  
> +    let pendingWipe = this.service.pendingServerWipe;

what happens if we fail here too? I guess that will mean we do this sync, but the following sync will then do a wipe - but the client isn't going to be reset in that case.

I think we probably want to abort syncing in this case if it fails, to ensure the client state hasn't progressed before we actually succeed.
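
i.e., something along these lines (sketch only; I'm reusing the onComplete pattern from a few lines up and hand-waving where the pending list gets cleared):

    let pendingWipe = this.service.pendingServerWipe;
    if (pendingWipe.length) {
      try {
        this.service.wipeServer(pendingWipe);
      } catch (ex) {
        // Don't let this sync proceed and advance client state before the
        // wipe actually succeeds; it will be retried on the next sync.
        this.onComplete(new Error("Aborting sync, pending server wipe failed"));
        return;
      }
    }
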
Attachment #8844098 - Flags: review?(markh)
Comment on attachment 8844098 [details]
Bug 1230011 - Remember when a wipeServer call fails and retry the wipe on later syncs.

This was meant to be published yesterday, but I forgot to hit enter on the "publish review request? [Yn]" button. It includes tests and fixes the missing parse, but doesn't address the other concerns you mentioned, so I've cleared your review flag.
Attachment #8844098 - Flags: review?(markh)
(In reply to Mark Hammond [:markh] from comment #11)

> * It seems there's a risk that we will actually be syncing when the restore
> happens, which would be bad, so we should consider the incoming and outgoing
> loops both aborting as soon as we flag we need either the reset or the wipe.
> engines.js already maintains an "aborting" variable we can probably leverage
> for this.

You might consider instead finishing any existing sync, and using flags to control restore/reset within the scope of a sync.

I think you can also directly check whether a sync is ongoing during bookmark import -- perhaps grey out those ops if a sync is ongoing? (And take the sync lock when importing!)
(In reply to Richard Newman [:rnewman] from comment #14)
> (In reply to Mark Hammond [:markh] from comment #11)
> 
> > * It seems there's a risk that we will actually be syncing when the restore
> > happens, which would be bad, so we should consider the incoming and outgoing
> > loops both aborting as soon as we flag we need either the reset or the wipe.
> > engines.js already maintains an "aborting" variable we can probably leverage
> > for this.
> 
> You might consider instead finishing any existing sync,

Yeah - I was initially thinking "there's no real point if we are about to wipe", but I guess in the worst case, we have an incoming batch which might leave local strangeness behind if we aborted midway.

> and use flags to
> control restore/reset within the scope of a sync.

Can you elaborate on this?

> I think you can also directly check whether a sync is ongoing during
> bookmark import -- perhaps grey out those ops if a sync is ongoing?

I don't think we can rely on the menu being gray, as the user then needs to choose a file, by which time a sync might have started again.

> (And
> take the sync lock when importing!)

Although that still implies handling failure to lock (eg, sync just started anyway), which seems tricky to handle well without additional UI, so we'll need to think this through a little (and maybe additional UI is fine here - eg, a simple window saying something like "waiting for sync to complete [cancel]" or similar)
See Also: → 1343365
From triage: assigning to Mark on request.
Assignee: tchiovoloni → markh
(In reply to Richard Newman [:rnewman] from comment #1)
> Careful, though.

Thanks for the detail - some clarifications below:

> What it does is issue a DELETE to the server, discarding all stored records.
> Then it resets the local client, which will cause a full upload of records.

Best I can tell, wipeServer doesn't actually reset the local client, which seems to be why bookmarks.js, handling the restore event, does:

>        this.engine.service.resetClient([this.name]);
>        this.engine.service.wipeServer([this.name]);
>        this.engine.service.clientsEngine.sendCommand("wipeEngine", [this.name],
>                                                      null, { reason: "bookmark-restore" });

and the wipeEngine command will wind up doing the remote resetClient. So ISTM that the process for bookmark restore generally does the right thing (which obviously doesn't always happen, hence this bug)

> It doesn't cause other clients to do a full download -- that would require
> changing meta/global. Service.freshStart does this, but not per-collection
> wipes.
>
> Nor does it cause other clients to reupload their own records! They'll
> presumably redownload anything the wiping client uploads, and reupload after
> conflicts, but that could be worse -- creating orphans or who knows what
> else. Races are possible here.

As above, I think the remote command should arrange that. Would another alternative be to write new keys for that collection? (hrm - although that doesn't cause the local wipe that the wipeEngine command causes, which we want here)

> If we queue, I'm particularly concerned by the lack of XIUS. What you
> *don't* want to happen is that a user restores their bookmarks from a backup
> while on the plane, puts that laptop to sleep, then opens it a day later and
> trashes the server.

hrmph - that's a good point. A counter-concern is that we sync, then some other device syncs, then we see the bookmark restore - we'd still want to wipe in that case IMO. It does however seem somewhat of an edge-case, and the chaos caused by skipping the wipe there is better than an obviously wrong wipe. Currently we *do* wipe in that case, and adding XIUS would mean we don't.

We could possibly mitigate this by (a) storing the lastModified from our last sync as the XIUS to use, (b) immediately trying to get a current lastModified from the server and storing that, ignoring errors, and (c) having the new code use whatever we ended up with.
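
Roughly (sketch only; where the timestamp gets persisted and how it reaches the queued wipe are hand-waved):

    // (a) Start with the collection's last-modified as we saw it on our last sync.
    let xius = this.lastModified;   // hypothetical: whatever we persisted back then
    try {
      // (b) Best-effort refresh from info/collections; ignore network errors.
      let info = this.service.resource(this.service.infoURL).get();
      xius = JSON.parse(info.body)[this.name] || xius;
    } catch (ex) {
      // Offline etc. - keep the stale value.
    }
    // The queued wipe would then send X-If-Unmodified-Since: xius with its DELETE.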

> Thinking outside the box, you might consider a different approach.
> 
> Similar to Bug 1162778/Bug 1225224, introduce a synchronous dialog. If you
> try to restore bookmarks, warn the user that it'll reset their bookmarks on
> the server and merge bookmarks on other devices, then issue the wipe, sync
> bookmarks, and then upload a changed meta/global.
> 
> You could also introduce a Reset Sync style step here: when you restore the
> backup, do you want to wipe other devices, merge, or what?

I agree that's probably where we should end up, but that's unlikely to happen any time soon (and doesn't actually solve the underlying complexity here - they could still be offline when the dialog is accepted).

We have an implementation that handles the happy case of no network/shutdown errors - it's not ideal, but it's probably safe to assume that the majority of times a user restores bookmarks we take that happy path. We're also fairly sure things go bad if that happy path isn't taken.

Thus, I'm leaning towards this bug trying to ensure that happy path gets hit with the queued approach, and I agree XIUS should be done. This would mean we've hardened the network/shutdown case at the cost of introducing a new failure case when XIUS fails. I think that's the correct trade-off (network/shutdown seems likely; other devices shouldn't be writing bookmarks at the same instant my device is doing a restore).

Thoughts?
Flags: needinfo?(rnewman)
(In reply to Mark Hammond [:markh] from comment #17)

> Best I can tell, wipeServer doesn't actually reset the local client, which
> seems why bookmarks.js handling the restore event does:

Depends which wipeServer you're talking about! :D

Engine.wipeServer does reset the client:

engines.js:
1846  wipeServer() {
1847    let response = this.service.resource(this.engineURL).delete();
1848    if (response.status != 200 && response.status != 404) {
1849      throw response;
1850    }
1851    this._resetClient();


Service.wipeServer does not.

(Service.wipeRemote does, and also wipes other clients.)

I think for bookmark restore the engine pokes the service directly, so you're right.


> >        this.engine.service.resetClient([this.name]);
> >        this.engine.service.wipeServer([this.name]);
> >        this.engine.service.clientsEngine.sendCommand("wipeEngine", [this.name],
> >                                                      null, { reason: "bookmark-restore" });
> 
> and the wipeEngine command will wind up doing the remote resetClient. So
> ISTM that the process for bookmark restore generally does the right thing
> (which obviously doesn't always happen, hence this bug)

Assuming that the remote client has a client record on the server, and we've synced in this browser session, and that the remote client completes the wipeEngine/wipeClient flow, which involves successfully downloading and decrypting a record (!).

If the other device hasn't recently synced, it won't have a client record to send a command to. sendCommand doesn't check for new remote clients, nor does it buffer for clients it doesn't know about.

And if we haven't synced yet, we'll have no _remoteClients (fetched when syncing clients!), and we won't send any commands at all.

There are lots of places that a command-based flow can fall down.


> As above, I think the remote command should arrange the above. Would another
> alternative be to write new keys for that collection? (hrm - although that
> doesn't cause the local wipe that the wipeEngine command causes, which we
> want here)

As noted at the end of Comment 2, wipeServer + flipping syncID might be what you want -- a full merge of all clients against a blank server.

One doesn't necessarily want a remote wipe as a result of a restore. If the user does want to replace others, we should probably fetch the full client list, or buffer the command for a little while. We don't necessarily want to buffer forever, because if you connect a device a month later, you might be really surprised if its bookmarks disappear before your eyes…
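
To make the "blank server, everyone re-merges" option concrete, roughly (the meta/global re-upload step is hand-waved; the real entry point may well differ):

    // Sketch: empty the collection, then change the engine's syncID so every
    // other client resets its Sync state for bookmarks (keeping the bookmarks
    // themselves) and does a full merge against the now-empty server.
    service.wipeServer(["bookmarks"]);                // DELETE .../storage/bookmarks
    let engine = service.engineManager.get("bookmarks");
    engine.syncID = Utils.makeGUID();                 // new syncID, persisted locally
    // Hypothetical step: push the new syncID into meta/global so other clients
    // notice it on their next sync.
    service.uploadMetaGlobal();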


> Thus, I'm leaning towards this bug trying to ensure that happy-path gets hit
> with the queued approach and I agree XUIS should be done. This would mean
> we've hardened the network/shutdown case at the cost of introducing a new
> failure case when XUIS fails. I think that's the correct trade-off
> (network/shutdown seems likely, other devices shouldn't be writing bookmarks
> at the same instant my device is doing a restore).
> 
> Thoughts?

Hardening is a good thing, I think. It's also worth syncing down the clients collection before trying to send a command to all clients…
Flags: needinfo?(rnewman)
Priority: P2 → P3
Assignee: markh → nobody
Status: ASSIGNED → NEW
Severity: normal → S3