Closed Bug 1303679 Opened 3 years ago Closed 3 years ago

Sync warns "Failed to reconcile incoming record places: Error: no item found for the given itemId"

Categories

(Firefox :: Sync, defect, P1)

defect

Tracking

()

RESOLVED FIXED
Firefox 53
Tracking Status
firefox53 --- fixed

People

(Reporter: markh, Assigned: lina)

References

(Blocks 1 open bug)

Details

Attachments

(1 file, 1 obsolete file)

On current tip, Sync is attempting to reconcile the places root when the server (erroneously) has that record on the server.

While we are more careful about *outgoing* records not under our roots, it seems that maybe we should also make additional checks when processing *incoming* records - eg, if an incoming record matches an existing GUID that's not under one of our roots, we should probably skip it completely.

Full entry I saw:

> 1474268054737	Sync.Engine.Bookmarks	WARN	Failed to reconcile incoming record places: Error: no item found for the given itemId (resource://gre/modules/PlacesUtils.jsm:2469:15) JS Stack trace: GuidHelper.getItemGuid</guid<@PlacesUtils.jsm:2469:15 < TaskImpl_run@Task.jsm:319:40 < Handler.prototype.process@Promise-backend.js:937:23 < this.PromiseWalker.walkerLoop@Promise-backend.js:816:7 < Promise*this.PromiseWalker.scheduleWalkerLoop@Promise-backend.js:747:11 < this.PromiseWalker.schedulePromise@Promise-backend.js:779:7 < this.PromiseWalker.completePromise@Promise-backend.js:714:7 < TaskImpl_run@Task.jsm:324:13 < waitForSyncCallback@async.js:98:7 < makeSpinningCallback/callback.wait@async.js:152:27 < promiseSpinningly@async.js:218:12 < GUIDForId@bookmarks.js:918:12 < createRecord@bookmarks.js:886:23 < SyncEngine.prototype._createRecord@engines.js:884:18 < _createRecord@bookmarks.js:409:18 < SyncEngine.prototype._reconcile@engines.js:1436:23 < SyncEngine.prototype._processIncoming/newitems.recordHandler@engines.js:1119:23 < set recordHandler/this._onProgress@record.js:643:9 < Channel_onDataAvail@resource.js:544:7 < waitForSyncCallback@async.js:98:7 < Res__request@resource.js:386:14 < Res_get@resource.js:415:12 < SyncEngine.prototype._processIncoming@engines.js:1152:18 < BookmarksEngine.prototype._processIncoming@bookmarks.js:389:7 < SyncEngine.prototype._sync@engines.js:1608:7 < WrappedNotify@util.js:166:21 < Engine.prototype.sync@engines.js:689:5 < _syncEngine@enginesync.js:213:7 < sync@enginesync.js:163:15 < onNotify@service.js:1327:7 < WrappedNotify@util.js:166:21 < WrappedLock@util.js:121:16 < _lockedSync@service.js:1317:12 < sync/<@service.js:1309:14 < WrappedCatch@util.js:95:16 < sync@service.js:1297:5
This is definitely fixable. It's trickier for the case where an incoming item is an orphan two or more levels deep.

One possibility is to handle orphans like Richard suggested on IRC: keep them in memory, then apply them at the end of the sync. If we know the ancestry then, we apply it if it's under one of our change roots; if not (in case of a partial write), we reparent under unfiled and set the anno.

Though it's not clear to me what to do if we eventually get the parent, and discover we should ignore the orphan. I guess we can delete the orphan locally, since it shouldn't have been uploaded or applied in the first place?
We know the Places root itself should never make it to the server, so I think it's safe in this special case to:

1. Issue an HTTP DELETE to the server right there and then.
2. Drop the record on the floor.

Note that we shouldn't issue the DELETE if we're in the middle of a download batch, of course!



More generally, if we get a record with a GUID that matches a non-tracked local record, it might:

* Be erroneously uploaded.
* Be a move of that record to a different folder.
* Be a tracked change for that record after its parent was moved to a tracked root.

Let's say you start with Folder A under a root that's not synced at all (let's say readinglist).

Perhaps some of those records are defaults, with a known GUID.

You make some changes to it -- you add some items, rename them, etc.

Sync will (and should) ignore all of those records.

Then on desktop you move that folder to Toolbar. It's no longer under a root that we don't track, so we should recursively track Folder A and all of its children, and upload those records.

Another client should process those changes. If it already had a Folder A under readinglist, it needs to move it!

Another example: on Client A I move a synced bookmark from Toolbar into a root that's not synced (again, let's say readinglist). On Client B I rename that bookmark. What happens?

Possibilities:

* Moving out of the synced set deletes it from the server. That won't affect Client B, but it will affect new clients. Or we do nothing at all. Regardless, Client A and Client B have diverged Toolbars. Client B's change uploads a record to the server, so A either moves the record back to Toolbar (oops) or it doesn't (oops).
* Moving it *deletes it* non-locally. That's data loss, but at least it's consistent!

What happens if I then say "oops" and drag that bookmark back out of readinglist? Will I lose the change I made on Client B?



These situations illustrate why we historically avoided syncing only some roots, or syncing them to different collections -- it's hard to have sane semantics if items can appear and disappear from the synced set on different devices!
To hop back to making useful suggestions: the only sane way to have syncable and non-syncable parts of the bookmark tree is to _never allow moves between those sets_.

So left-pane queries cannot be synced, ever. The places root itself cannot be synced. And bookmarks, most importantly, can never be moved out of the five synced sub-roots -- a user should never be able to put a folder or a bookmark in the left pane parent.

If we can ensure that, _and we can guarantee that all clients should agree on which trees are ignored_, then we can declare that no record from those trees should make it to the server, and thus DELETE them and drop them on the floor.
Depends on: 1303825
Priority: -- → P2
Assignee: nobody → kcambridge
Priority: P2 → P1
Assignee: kcambridge → nobody
Flags: needinfo?(markh)
(In reply to Richard Newman [:rnewman] from comment #2)

> Let's say you start with Folder A under a root that's not synced at all
> (let's say readinglist).

Desktop doesn't have a "readinglist" root. IIUC Android will not Sync any such record either. Is there an example of the above we might actually see in practice, or is this just a theoretical concern to deal with a possible future?

> Another example: on Client A I move a synced bookmark from Toolbar into a
> root that's not synced (again, let's say readinglist). On Client B I rename
> that bookmark. What happens?

Again, I don't see how this can happen on desktop, as the desktop UI doesn't allow it.

(In reply to Richard Newman [:rnewman] from comment #3)

> If we can ensure that, _and we can guarantee that all clients should agree
> on which trees are ignored_, then we can declare that no record from those
> trees should make it to the server, and thus DELETE them and drop them on
> the floor.

Right - but what scenarios exist today where that is not the case? And what harm would happen if desktop simply ignored any incoming bookmarks it can detect are outside of those roots? It seems far better than not ignoring them - which happens today and causes bad things to happen with those left-pane items (which we know categorically *are* on the server for every single user for which Sync was configured before they opened the bookmarks UI, and we also know categorically causes bad things to happen when we see them as incoming)

IOW, it seems the above is trading off a theoretical concern we've never seen in practice, at the cost of fixing an actual problem we regularly see in practice.

> These situations illustrate why we historically avoided syncing only some roots, or
> syncing them to different collections -- it's hard to have sane semantics if items
> can appear and disappear from the synced set on different devices!

Right - and we can all see how well that worked out :)

Why can't we immediately just have desktop ignore any items we know for sure aren't in those roots? That would obviously pick up the root itself, and any left-pane items for which the path to the root remains. This will cause an objectively better outcome for those items while not causing additional problems for items where that path can't be determined. I'm happy to skip deleting the records if you believe other platforms will suffer (but I'm interested in non-theoretical scenarios where this is the case)
Flags: needinfo?(markh) → needinfo?(rnewman)
(In reply to Mark Hammond [:markh] from comment #4)

> Desktop doesn't have a "readinglist" root. IIUC Android will not Sync any
> such record either. Is there an example of the above we might actually see
> in practice, or is this just a theoretical concern to deal with a possible
> future?

Desktop sure does have a Reading List root. My places.sqlite today:

sqlite> select id, title, guid from moz_bookmarks where parent = 1;
id          title       guid
----------  ----------  ------------
4941                    uTqj0oRQ_Lck
5065                    iHjZXah7uyyG
4743        Reading Li  readinglist
2           Bookmarks   menu________
3           Bookmarks   toolbar_____
4           Tags        tags________
5           Other Book  unfiled_____
474         mobile      UjAHxFOGEqU8
4364                    BxfRgGiNeITG

(Once code ships that touches a database, or syncs, it never goes away…)

But I mostly intended it as a hypothetical. The set of roots has changed at least twice (mobile, reading list), and there are other structures that are roots if you squint.


> > Another example: on Client A I move a synced bookmark from Toolbar into a
> > root that's not synced (again, let's say readinglist). On Client B I rename
> > that bookmark. What happens?
> 
> Again, I don't see how this can happen on desktop, as the desktop UI doesn't
> allow it.

There are lots and lots of things that apparently desktop's UI doesn't allow, but that we see in bug reports from users…

Even without add-ons, there are always bugs and unanticipated uses of UI elements.


> > If we can ensure that, _and we can guarantee that all clients should agree
> > on which trees are ignored_, then we can declare that no record from those
> > trees should make it to the server, and thus DELETE them and drop them on
> > the floor.
> 
> Right - but what scenarios exist today where that is not the case?

I was just stating the conditional, not making a firm statement about whether the left side is known to be true or false…

If we never allow in/out moves, and we never track changes in stuff we've declared non-syncable, then we can ignore bad incoming records (and DELETE them).

The thing I'm most doubtful about is whether we can totally prevent in/out moves, or whether we need vestigial code in place to track out-moves as deletions and in-moves as creations to avoid future corruption.

I would not be surprised to see a Firefox feature ship that adds a root, allows moves, and has no involvement from anyone who knows about Sync… 

> And what
> harm would happen if desktop simply ignored any incoming bookmarks it can
> detect are outside of those roots? It seems far better than not ignoring
> them - which happens today and causes bad things to happen with those
> left-pane items (which we know categorically *are* on the server for every
> single user for which Sync was configured before they opened the bookmarks
> UI, and we also know categorically causes bad things to happen when we see
> them as incoming)

I think we should refuse to apply them, certainly. I think we should also delete (some of the) records from the server, because they shouldn't be there.

The risk/harm/cost is that it makes future change of the synced set more difficult. (Bug 762118 et al.)


> Why can't we immediately just have desktop ignore any items we know for sure
> aren't in those roots?

Once you ship a client that doesn't _preserve_ all the records that might be uploaded -- either passively not being able to reupload them after a node reassignment, or actively cleaning them up because it thinks they shouldn't exist -- then you end up having to be aware of that behavior until you ship an incompatible storage format version.

Doing so is essentially a bet that the set of supported roots is fixed for Storage Format 5 (and we should write docs accordingly).

So I see nothing wrong with actively deleting the left pane items, and the Places root, because we're 99.9999% sure we'll never want to sync those. Anything else, drop on the floor (minimize future impact if we're wrong), and hope the set of roots never changes.
Flags: needinfo?(rnewman)
Oh, and if you have a solid way to recognize known-bad records -- the places root, the left pane items, and anything else -- please file an iOS bug to have us discard those and drop them on the floor when we see them!
Assignee: nobody → kcambridge
Comment on attachment 8806209 [details]
Bug 1303679 (WIP) - Periodically clean up items that shouldn't be on the server.

https://reviewboard.mozilla.org/r/89696/#review89462

::: services/sync/modules/engines/bookmarks.js:548
(Diff revision 1)
>          delete this._store._childrenToOrder;
>        }
>      }
>    },
>  
> +  _cleanupServer() {

I haven't thought too much about this, but there seems a (hopefully small) risk that this might cause "damage" WRT the validator - particularly with the deletion of non-syncable roots making other (also non-syncable, but possibly not known on this client) items orphans.

So at a minimum, I think we probably want the fact that we did this recorded in the sync ping.

To reduce the risk, I wonder if in the early days we should also tie this to the validator - ie, in an ideal world, ISTM we might run a validation, perform this repair, then re-run the validator. We might even be able to examine the initial validation results to check we aren't creating orphans.

The upside if this approach is that we could analyse the telemetry for us mkaing things worse. The downside is that validation is expensive, so ultimately we would want the ability to do this without running the validator - but it might still make sense in the short term to allow us to get some confidence in the approach.

Also: just to step back a little, would it make sense to have the "repair" aspects of this patch in a different bug, and in the short term check incoming records against the set returned by fetchUnsyncedItemSyncIds() and simply ignore them? I understand that will not cover all cases (as this client may not be aware an incoming item (other than roots) should be unsyncable), but it might still be worthwhile.

As I said, I haven't really thought about this too much, but it seems worth having a discussion on.

Kit, Thom, WDYT?
(In reply to Mark Hammond [:markh] from comment #8)
> I haven't thought too much about this, but there seems a (hopefully small)
> risk that this might cause "damage" WRT the validator - particularly with
> the deletion of non-syncable roots making other (also non-syncable, but
> possibly not known on this client) items orphans.

It's definitely possible. Clients can only delete items they knows about. If the server has non-syncable root A with children B and C, and client 1 only knows about A and B, then C will be orphaned. It's an orphan that shouldn't have been synced, anyway, but the "fix" corrupted the tree on the server. Oops.

> Also: just to step back a little, would it make sense to have the "repair"
> aspects of this patch in a different bug, and in the short term check
> incoming records against the set returned by fetchUnsyncedItemSyncIds() and
> simply ignore them?

Makes sense. I agree it's tricky to ignore as we're applying, because we might not realize that we should've ignored an entire subtree until we've applied its descendants. For example, the left pane root (IIRC) has 3 direct children, a folder child, and 3-4 grandchildren. We might see the grandchildren first, move them to unfiled, then apply the folder and reparent the grandchildren, and then apply the root.

I wonder if we could do consistency checks afterward. Fetch all non-syncable items, compare them to the items we applied during the last sync, and locally delete the intersection of the two. I haven't thought about safety issues with that, though. It's possible for an interrupted sync to throw a wrench in that.

I also agree that roots are a good starting point, and that this is absolutely worth discussing more.
(In reply to Kit Cambridge [:kitcambridge] from comment #9)
> It's an orphan that shouldn't
> have been synced, anyway, but the "fix" corrupted the tree on the server.
> Oops.

And, since we reparent orphans to unfiled, it'll be very visible corruption if you connect another client. In particular, grandchildren of deleted items (like the "Bookmarks Toolbar", "Bookmarks Menu", "Other Bookmarks" queries, and tags) will make a mess.
(Pushing up my changes for completeness).
FWIW, our friend "readinglist" shows up in the logs for bug 1313985.
Re deleting: two bits of nuance. 

If you delete, delete whole subtrees: by removing just the root, you eliminate the ability of a smart client to ignore a bad subtree!

When you delete, you can either send deleted=true, which might damage older clients, or HTTP DELETE, which leaves junk in buffers. The latter is safest.
(In reply to Richard Newman [:rnewman] from comment #14)
> If you delete, delete whole subtrees: by removing just the root, you
> eliminate the ability of a smart client to ignore a bad subtree!
> 
> When you delete, you can either send deleted=true, which might damage older
> clients, or HTTP DELETE, which leaves junk in buffers. The latter is safest.

Totally agreed on both points.

A complication is, "what if we don't know the whole subtree?" In that case, deleting items that shouldn't be on the server might make things worse. A hypothetical future client that buffered everything before applying could determine (assuming it had a complete subtree) whether to ignore an item. If not, it could resort to heuristics.

OTOH, if the subtree contained items we didn't know about, they would become orphans. If the left pane queries made it to the server, we end up with circular-looking queries in unfiled on other devices (bug 1297234). If any tags made it up, we end up with weird bookmarks that have duplicate URLs and no titles. If the reading list made it up, I guess we'd end up with those entries in unfiled, too. Definitely need heuristics here.
See Also: → 1309255
(In reply to Kit Cambridge [:kitcambridge] from comment #15)

> A complication is, "what if we don't know the whole subtree?" In that case,
> deleting items that shouldn't be on the server might make things worse.

Yes, a client must know the whole state of the server and use XIUS. That is, the client is deleting the subtree that's on the server, not whatever similar subtree it has locally.

(Obviously, deleting the shouldn't-be-there places root is an exception to all this.)
Attachment #8806209 - Attachment is obsolete: true
Comment on attachment 8809182 [details]
Bug 1303679 - Remove Places roots, reading list items, and pinned sites from the Sync server.

https://reviewboard.mozilla.org/r/91802/#review92948

Looks great, sorry for the delay.
Attachment #8809182 - Flags: review?(markh) → review+
Pushed by kcambridge@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/1927e94b7a17
Remove Places roots, reading list items, and pinned sites from the Sync server. r=markh
https://hg.mozilla.org/mozilla-central/rev/1927e94b7a17
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 53
Blocks: 1323613
You need to log in before you can comment on or make changes to this bug.