Track bookmark sync changes in Places

RESOLVED FIXED in Firefox 53

Status

()

Firefox
Sync
P1
normal
RESOLVED FIXED
a year ago
a month ago

People

(Reporter: rnewman, Assigned: kitcambridge)

Tracking

(Blocks: 5 bugs)

unspecified
Firefox 53
Points:
---
Dependency tree / graph
Bug Flags:
firefox-backlog +

Firefox Tracking Flags

(firefox53 fixed)

Details

(Whiteboard: [sync-tracker])

MozReview Requests

()

Submitter Diff Changes Open Issues Last Updated
Loading...
Error loading review requests:

Attachments

(8 attachments, 6 obsolete attachments)

58 bytes, text/x-review-board-request
mak
: review+
markh
: review+
rnewman
: review+
Details | Review
58 bytes, text/x-review-board-request
mak
: review+
rnewman
: review+
Details | Review
58 bytes, text/x-review-board-request
markh
: review+
Details | Review
58 bytes, text/x-review-board-request
mak
: review+
rnewman
: review+
Details | Review
58 bytes, text/x-review-board-request
markh
: review+
Details | Review
58 bytes, text/x-review-board-request
tcsc
: review+
Details | Review
58 bytes, text/x-review-board-request
rnewman
: review+
markh
: review+
Details | Review
58 bytes, text/x-review-board-request
mak
: review+
rnewman
: review+
Details | Review
(Reporter)

Description

a year ago
Shower thoughts. This is similar to Bug 1235269.

We persist changed IDs to a JSON file after one second, assuming the event loop isn't busy.

* Open Firefox.
* Bookmark a page with the star. It'll add a record to Unfiled Bookmarks, modifying two IDs (the new bookmark and the parent).
* A sync will immediately occur because of the score bump. Now the server has both records.
* Relaunch Firefox so that the next sync takes a while.

Either wait for Sync to init, or don't, then:

* Delete that bookmark by hitting the star again, then immediately quit the browser.


Results:

* If you didn't wait for Sync to init, we miss the change altogether, because we're not listening for changes. That's a known issue, but I don't have the bug number to hand.

* If you did wait, then we saw the change, but because of the long FxA, token, i/c, m/g, c/k dance on a first sync, our score-triggered bookmark sync didn't get a chance to run. The deletion and the changed parent _did not make it to the server_.

* Because you quit within one second, the tracked IDs also didn't make it to disk. The only thing that _did_ make it to disk was the deletion from the Places database!

The server now has one model of our bookmarks, Places has another, and Sync didn't really see either.


If you subsequently bookmark another Unfiled Bookmark, the new server record for the folder will no longer contain the locally-deleted child… but the server still has the record for that child, orphaned forever.


This is one particular manifestation of a fundamental problem with Weave/Sync: that change tracking is divorced from storage itself. There are probably other ways beyond the three we know of (ignoreAll, quit-too-quick, Sync not inited) in which we fail to correctly track all changes for subsequent syncing.

There are two approaches to fixing this:

1. Try to make change tracking better. We could avoid ignoring by threading identifiers around. We could persist to disk immediately. This feels like whack-a-mole.
2. Have data sources take responsibility for tracking changes.

The latter is what we do on Android and iOS; it's impossible for those platforms to miss changes. Probably the only reason we don't do it on desktop is because Weave was built as an add-on layer on top of Firefox.

On desktop it'd look something like this:

* Add a SYNC_STATUS column to whatever we sync.
* New items start in NEW. Changed items are marked as CHANGED. After a sync we mark items as SYNCED.
* When Sync makes changes to Places, we need some way to identify ourselves.
* Deletions are marked with a sentinel.
* The entire BookmarkTracker concept goes away, with the exception of score bumping.
(In reply to Richard Newman [:rnewman] from comment #0)
> 2. Have data sources take responsibility for tracking changes.
> 
> The latter is what we do on Android and iOS; it's impossible for those
> platforms to miss changes. Probably the only reason we don't do it on
> desktop is because Weave was built as an add-on layer on top of Firefox.
> 
> On desktop it'd look something like this:
> 
> * Add a SYNC_STATUS column to whatever we sync.
> * New items start in NEW. Changed items are marked as CHANGED. After a sync
> we mark items as SYNCED.
> * When Sync makes changes to Places, we need some way to identify ourselves.
> * Deletions are marked with a sentinel.
> * The entire BookmarkTracker concept goes away, with the exception of score
> bumping.

This certainly sounds like the correct approach and we should probably bite the bullet. There's obviously a fair bit of devil in the detail though.

Random thoughts:

* At first glance it seems annotations could almost solve this - however, we will want these states managed by places itself, and annotations seem like far more work than a new field.

* It also looks like triggers might be able to help us here - but it seems like we'd need triggers against multiple tables, so things like annotation changes flags the item as modified. We'd probably want a new method to set the field to "SYNCED", so the trigger would need to exclude that state (ie, IIUC, with a |WHEN new.syncState != "SYNCED"| or similar.

* We'd need a new method for explicitly adjusting this new field to SYNCED, and presumably we'd need this wrapped in a transaction (so we either update everything or nothing when applying the changes and this new field), but the transaction support in places is quite opaque to me.

* I'm not yet clear on bookmark moves - we *might* be able to get away without touching the new parent node, but ISTM we'd almost certainly need to touch the old parent.

* deletion of records sounds like its own mini-category. Assuming there's reluctance to just flag items as deleted, it might be possible to create a new table to store deleted IDs just for Sync. It seems Sync could clear this table after processing it.

Mak, what thoughts do you have here? Do you think we could create a new field on moz_bookmarks and have triggers update it automagically? How do you feel about tracking deletions? Are there additional complications or alternatives you see?
Flags: needinfo?(mak77)
(Reporter)

Comment 2

a year ago
> * It also looks like triggers might be able to help us here…

I tend to have a visceral distrust of triggers for this kind of stuff, but OTOH I trust mak to have a more useful opinion than mine!

On the plus side, status fields don't change that much:

* Default to NEW on insert.
* Flip the status to CHANGED (unless NEW) on every modification, including the parents as appropriate.
* Have Sync flip it to SYNCED.

Logic for the second is quite easy:

https://github.com/mozilla/firefox-ios/blob/824401559aebbb1a414f12f14315bec8a54117de/Storage/SQL/SQLiteLogins.swift#L502-L502


> * We'd need a new method for explicitly adjusting this new field to SYNCED,
> and presumably we'd need this wrapped in a transaction (so we either update
> everything or nothing when applying the changes and this new field), but the
> transaction support in places is quite opaque to me.

On iOS we do this by collecting all of the success results from each upload batch, and then marking them as uploaded in one query with a pile of IN (?, ?, ?) statements. We do the same thing for clearing deletions when they've been uploaded.

If anything fails, it doesn't matter; we'll just reupload. On desktop you can probably update each batch as you go.

((
Don't look at the iOS bookmarks code for this, because we do full three-way merges. Desktop will be closer to history or logins:

https://github.com/mozilla/firefox-ios/blob/9d5ebb32c7538b0652f345c4b39dc4e0b924ec60/Storage/Logins.swift#L598-L598
))

> * I'm not yet clear on bookmark moves - we *might* be able to get away
> without touching the new parent node, but ISTM we'd almost certainly need to
> touch the old parent.

Currently we track both parents, I think, and I don't see how we could avoid doing so even if we have Places do the tracking. We need to upload records for all three -- both parents plus child -- on every move.


> * deletion of records sounds like its own mini-category. Assuming there's
> reluctance to just flag items as deleted, it might be possible to create a
> new table to store deleted IDs just for Sync.

We actually do this for logins (and other stuff, maybe?) on Android:

http://hg.mozilla.org/mozilla-central/file/default/mobile/android/base/java/org/mozilla/gecko/db/BrowserDatabaseHelper.java#l368

It's a little bit more of a pain in some respects than iOS's deleted flag, but it is at least known to work.


I'm happy to write up a more concrete proposal for this if it helps.
(In reply to Richard Newman [:rnewman] from comment #2)

That's helpful, thanks! One late-night clarification:

> > * I'm not yet clear on bookmark moves - we *might* be able to get away
> > without touching the new parent node, but ISTM we'd almost certainly need to
> > touch the old parent.
> 
> Currently we track both parents, I think, and I don't see how we could avoid
> doing so even if we have Places do the tracking. We need to upload records
> for all three -- both parents plus child -- on every move.

Yeah - I guess I was just suggesting that there may not be a need to set CHANGED on the locally-current parent as Sync could imply that. IOW, a premature optimization ;)
(Reporter)

Comment 4

a year ago
(In reply to Mark Hammond [:markh] from comment #3)

> Yeah - I guess I was just suggesting that there may not be a need to set
> CHANGED on the locally-current parent as Sync could imply that. IOW, a
> premature optimization ;)

Without wanting to rathole too much: I don't think desktop has enough context to be able to determine _what_ changed from a simple change flag.

That is: given a record marked as CHANGED, how can we know whether you renamed it (just upload that record) or moved it (upload it and its current parent)?

If we wanted to use more fine-grained change tracking, we could do so -- VALUE_CHANGED, REPOSITIONED, MOVED, etc. -- and have those imply various behaviors like reuploading the new parent. But it's probably simpler to just be explicit about flipping the right flags.
(Reporter)

Updated

a year ago
Blocks: 1257961
We discussed this in the past, many times.
Surely it's much better to move to a world where Sync just ask to a local provider "please tell me what changed from this timestamp on", and gets data handed. It's better for Sync consistency, it's better for the component, that can handle internals "closer to the metal" and thus be more efficient, and it's better cause shares responsibility through devs. So no doubts on that.

Regarding how to implement that, I suspect what you want it not really a complicate flag that will keep needing updates at every new action, and at long could create backwards compat issues. You want something more like a journal log, where every change is annotated (and eventually changes could merged when it makes sense, for example multiple changes to the same bookmark in a row). That would also solve the problem of tracking removals. Considered most of the changes Sync currently tracks are tracked through some kind of notification from an observer interface, one should just teach the notifier to also annotate the changes to a local journal.
But if you just go through a "give me what changed" API, it's not even responsibility of Sync to handle that. Though as a middle step, sync could already build such a journal today, it should just decouple receiving notifications from handling them.
Flags: needinfo?(mak77)
(In reply to Marco Bonardo [::mak] from comment #5)
> We discussed this in the past, many times.
> Surely it's much better to move to a world where Sync just ask to a local
> provider "please tell me what changed from this timestamp on", and gets data
> handed. It's better for Sync consistency, it's better for the component,
> that can handle internals "closer to the metal" and thus be more efficient,
> and it's better cause shares responsibility through devs. So no doubts on
> that.

I think a simple flag is better than a timestamp - clocks skew and just generally suck - a flag meets our purposes and is unambigious.

> Regarding how to implement that, I suspect what you want it not really a
> complicate flag that will keep needing updates at every new action, and at
> long could create backwards compat issues.

I don't see why this is the case - it's a flag that should be set whenever a row in bookmarks changes - that's not complicated. Duplicating and persisting the change in a different store *is* complicated. I'm not sure what backwards compatibility you are talking about - maybe an addon writing the database directly? We'd already have issues with such addons as we wouldn't get the notification (and even then we'd just be in the same case we are now if the change happened during a sync.) Or maybe you mean something else?

> You want something more like a
> journal log, where every change is annotated (and eventually changes could
> merged when it makes sense, for example multiple changes to the same
> bookmark in a row). That would also solve the problem of tracking removals.

This is *exactly* what we have now - the changes are captured and persisted to disk. As Richard says above, this has a number of problems, including missing notifications before we've installed the observers and being unable to finally persist then as the last change happened just as we were shutting down (or because we crashed).

Having a flag directly in the bookmarks row avoids these problems and is very efficient (we are already writing to that row anyway)

> Considered most of the changes Sync currently tracks are tracked through
> some kind of notification from an observer interface, one should just teach
> the notifier to also annotate the changes to a local journal.

As above, that's what we do, and that's what is letting us down. Also, note the word "most" above - I guess you were talking about the fact that after some threshold, places will avoid sending notifications per item. That's not OK for our purposes.

> But if you just go through a "give me what changed" API, it's not even
> responsibility of Sync to handle that. Though as a middle step, sync could
> already build such a journal today, it should just decouple receiving
> notifications from handling them.

Again, this is exactly what Sync has, and it doesn't work.

Richard and I are quite convinced the tracker approach is never going to be reliable. We could make it better, but never good. It would be helpful if you could enumerate the issues you see with adding the new field like we propose and an ideal way of having places track deletions for Sync.
Flags: needinfo?(mak77)
(Reporter)

Comment 7

a year ago
> I think a simple flag is better than a timestamp - clocks skew and just
> generally suck - a flag meets our purposes and is unambigious.

Seconding this. Timestamp-based lookup is what we do on Android, based on the initial repository design we created for desktop:

https://wiki.mozilla.org/CloudServices/Sync/FxSync/StoreRedesign

With the benefit of hindsight, that was a bad idea.

(Repository doesn't imply timestamp lookup, of course.)

I wrote some of this up here:

https://160.twinql.com/syncing-and-storage-on-three-platforms/


> Richard and I are quite convinced the tracker approach is never going to be
> reliable.

Yup. Characteristics of a good change monitor for a sync system:

* Changes are intrinisically part of the data source. When it's moved or invalidated, so is the change status.
  * If not: backup/restore/Refresh Firefox/DB corruption/etc. will introduce inconsistency or require complicated plumbing.

* Change tracking happens transactionally as the data changes.
  * If not: changes will inevitably be missed.

* Every change is tracked.


The observer-based ('tracker') approach fails all three of those.
(Reporter)

Comment 8

a year ago
(In reply to Marco Bonardo [::mak] from comment #5)
> You want something more like a
> journal log, where every change is annotated (and eventually changes could
> merged when it makes sense, for example multiple changes to the same
> bookmark in a row).

This is one possible way to build a replication/syncing system. See, for example, delta sync/queue sync from PICL:

https://wiki.mozilla.org/Identity/CryptoIdeas/05-Queue-Sync

or Kafka/Samza:

http://www.confluent.io/blog/turning-the-database-inside-out-with-apache-samza/


I've spent a very, very long time thinking about how deltas and whole-record systems like Sync (and the data source that the frontend needs!) can interact.

A reasonable compromise we used on iOS was to do a coalesced three-way merge (that is: immediately collapse all local changes and all remote changes, and keep the last shared parent). If you ask my reviewers, I'm sure they'd say that any additional complexity would have been too much.

Switching to delta-based tracking would be a natural fit for a delta-based syncing system, but until desktop at the very least keeps shared parents and buffers remote records, it's premature.

Note also that delta-based tracking implies a very different role for Places itself: rather than being the canonical store with direct writes, and consumers seeing changes after the fact, it would likely have to shift to a more Redux/React-like action/reducer approach.


But either way, change tracking or transaction logging, there's no good way to build a reliable syncing system without strong involvement in the storage layer.
I'm taking this bug (but we obviously need to work out exactly what it means first). I'm working on a few experiments as POC.
Assignee: nobody → markh
(In reply to Mark Hammond [:markh] from comment #6)
> I think a simple flag is better than a timestamp - clocks skew and just
> generally suck - a flag meets our purposes and is unambigious.

It depends where you take the timestamps from, if it's like the LASTMOD time of the last modified bookmark, I don't see how clock skews could change anything.
That said, in the log case you'd not need a timestamp at all, cause you'd just consume data. Anything that has not been consumed is data you want to sync.

> This is *exactly* what we have now - the changes are captured and persisted
> to disk.

Afaict, it's not. What we have now is Sync tracking notifications and reacting to them. That is fragile cause Sync is trying to track another component.

What I suggest is the component itself tracking its own notifications, even before they are sent. The component cannot lose notifications and knows how to efficiently store data.
sync would just ask the component to hand it any uhandled changes and could tell the component once a change has been handled (so it can be removed).
So, before sending a notification the component (let's suppose Places) can store all the sync needed data into a custom store (json, db, whatever), that could also be separated from the main db. That'd be very good in case of corruptions requiring to replace the whole db. But that's an edge case, so for now let's ignore "where" to store the log.

> Having a flag directly in the bookmarks row avoids these problems and is
> very efficient (we are already writing to that row anyway)

There can be a concurrency problem though. What happens if Places sets CHANGED on a row that is being synced?
if Sync sets SYNCED as soon as it fetches the row it is lying, cause a crash would throw that op away. If it sets it after the operation, it may lose further changes made in the meanwhile to the same row.
It sounds very complex to ensure consistency of this system, compared to an append-only system.

> > Considered most of the changes Sync currently tracks are tracked through
> > some kind of notification from an observer interface, one should just teach
> > the notifier to also annotate the changes to a local journal.
> 
> As above, that's what we do, and that's what is letting us down.

Yes, the problem is not what we do, but who is doing it.

> I guess you were talking about the fact that after
> some threshold, places will avoid sending notifications per item. That's not
> OK for our purposes.

no, I said "most" only cause I don't know what sync does for other components (not Places).
Any change should be notified.

Btw, what I'm suggesting would be feasible also on top of current Sync, without a total rewrite, we should just move tracking from Sync to each component and provide 2 apis: give_me_all_changes() and applied_change(change_id)
Flags: needinfo?(mak77)
(In reply to Richard Newman [:rnewman] from comment #7)
> * Changes are intrinisically part of the data source. When it's moved or
> invalidated, so is the change status.
>   * If not: backup/restore/Refresh Firefox/DB corruption/etc. will introduce
> inconsistency or require complicated plumbing.

This is not a problem of an observer-based approach. it's a bug of the specific implementation.
I don't see how a column flag is superior to an observer-based approach here. if we replace/restore the db, we lose all the flags, at a minimum even if we can restore some of them, they are unreliable, so in both cases you know we must reconcile everything.

> * Change tracking happens transactionally as the data changes.
>   * If not: changes will inevitably be missed.

again, I'm not sure where the observer-based approach fails, if implemented properly, so with changes tracked just before being notified.

> * Every change is tracked.

again, if it doesn't happen, it's an implementation bug and it can be fixed.

Afaict, the only system that would satisfy your requirements 100% would be a trigger approach (intrinsically transaction safe) that appends to a persistent store outside of the main db (corruption handling) we are tracking.

The only thing the current system is failing in, is the disconnection with the observed component.
About the concurrency with a single flag, we'd need something like SYNCING and SYNCED to be sure to not lose in-the-meanwhile changes. Thus, write SYNCED only if the previous value is SYNCING.
But this is removing the "we are already writing to that row anyway" advantage. you'd need to write to the row 2 times more than what we do today, once when sync is starting, once when done.
So perf is not an advantage of that approach. Plus there's still the DELETE problem, that will still require to retain some sort of log. So in the end it requires implementing both a flag and a log, vs just a log.

Note, I figure Richard is far more expert than me on the argument. It's likely in my quick thinking i am missing a lot of the points that you discussed and analyzed along these years, and I'm maybe making obvious mistakes. I'm mostly trying to post counter-arguments to refine the approach selection.
(In reply to Marco Bonardo [::mak] from comment #10)

> That said, in the log case you'd not need a timestamp at all, cause you'd
> just consume data. Anything that has not been consumed is data you want to
> sync.

Yeah, that's true, but it also seems quite a bit heavier.

> > This is *exactly* what we have now - the changes are captured and persisted
> > to disk.
> 
> Afaict, it's not. What we have now is Sync tracking notifications and
> reacting to them. That is fragile cause Sync is trying to track another
> component.
> 
> What I suggest is the component itself tracking its own notifications, even
> before they are sent. The component cannot lose notifications and knows how
> to efficiently store data.

OK, I misunderstood what you were suggesting, and I agree that if the appending was done in the same transaction it should be reliable. Just potentially expensive, and presumably will need to know to do nothing when Sync's not configured (and presumably to clean up after itself if sync later becomes unconfigured).

> There can be a concurrency problem though. What happens if Places sets
> CHANGED on a row that is being synced?
> if Sync sets SYNCED as soon as it fetches the row it is lying, cause a crash
> would throw that op away. If it sets it after the operation, it may lose
> further changes made in the meanwhile to the same row.
> It sounds very complex to ensure consistency of this system, compared to an
> append-only system.

Yeah, this is true - Sync doesn't work in a way that it can reasonably perform the read/reconcile/write in a single transaction.

In IRC, rnewman suggested an alternative using a counter:

* a syncChangeCount (or whatever) column that's incremented once per change.
* When Sync reads the record, it reads this initial value, then when re-writing it decrements the counter by whatever the value was when it started.
* Any record with a non-zero value needs re-syncing.

This will have a bit of a problem if an incoming record has a change to (say) the title, and the title was changed during a Sync - we'll probably take the server's version. However, our conflict resolution is already not great and I don't think this would leave us any worse off than what we have now (ie, one of the changes must lose) - but it *will* prevent us missing a change to a non-conflicting field. It is also likely to require a new field to handle "new" (ie, a flag to indicate whether this field has ever been synced or not in the past)

(In reply to Marco Bonardo [::mak] from comment #11)

> here. if we replace/restore the db, we lose all the flags, at a minimum even
> if we can restore some of them, they are unreliable, so in both cases you
> know we must reconcile everything.

That's tricky, and probably depends on *why* the DB was restored - in some scenarios the user's expectation will be that this restored copy becomes the canonical version (ie, the server should be updated to match this profile) but others will expect that the server version is canonical (ie, the recently restored data should be changed to match the "good" copy on the server), whereas the majority of people will probably just expect magic :)

But while this is tricky, I believe that the "change log" in this scenario (either with it being restored from the backup, or unchanged even after the restore) is now useless. I suspect we will need a way to explicitly handle a state of "I've no idea what the sync-state of this record is" and the onus will be on Sync to use heuristics to determine an appropriate state and update the record accordingly. That's roughly what will need to happen after a change like this is made anyway (ie, we'll need some way to reconcile existing records with the server copy to catch changes (or entire bookmarks) we failed to Sync in the past due to our tracker limitations.)

(In reply to Marco Bonardo [::mak] from comment #12)
> About the concurrency with a single flag, we'd need something like SYNCING
> and SYNCED to be sure to not lose in-the-meanwhile changes. Thus, write
> SYNCED only if the previous value is SYNCING.
> But this is removing the "we are already writing to that row anyway"
> advantage. you'd need to write to the row 2 times more than what we do
> today, once when sync is starting, once when done.

I believe the "counter" proposal would achieve the same result without the additional overhead - do you agree?

> So perf is not an advantage of that approach. Plus there's still the DELETE
> problem, that will still require to retain some sort of log. So in the end
> it requires implementing both a flag and a log, vs just a log.

That's true - although in the case of a delete it's less of a log and more of a "set", which each deletion requiring one append and (probably) no additional meta-data, and because it's a one-hit change (ie, deletions can't be resurrected) there's no problem "restarting" the syncing of deleted items multiple times (eg, in the case of interrupted uploads or premature shutdown)

> Note, I figure Richard is far more expert than me on the argument. It's
> likely in my quick thinking i am missing a lot of the points that you
> discussed and analyzed along these years, and I'm maybe making obvious
> mistakes. I'm mostly trying to post counter-arguments to refine the approach
> selection.

That's valuable, thanks. We all want a robust solution and doesn't cause performance to degrade noticeably. Writing a new row for every change to a bookmark seems much heavier in terms of IO (both time and space) and more complicated to implement.

Both Richard and I respect your experience with Places, so it's important that we are all comfortable with the solution we come up with, and almost as importantly, are confident we can implement it both reliably and without causing performance degradations inherent to the approach (ie, that we don't find we've regressed performance in a way that is impossible to improve with the chosen approach). My gut feel is one or 2 additional columns on moz_bookmarks that are only updated when the row is already being updated and doesn't care whether Sync is configured or not is more likely to succeed here.
Do we have an idea of the approach taken by other browsers and the eventual downsides they hit?
(Reporter)

Comment 15

a year ago
Here's how Chrome does sync:

https://docs.google.com/document/d/1N5xlLMB_O9oQtOIL9LclM6xLNdK23_WmboP0sdM-N20/edit#heading=h.4b0a0ca1ea0f

Note that they mention having an index of all changed IDs available inside the bookmark service itself.


Chromium moved away from observers to an in-service model:

http://www.chromium.org/developers/design-documents/sync/syncable-service-api

"The change processor is most likely listening to notifications from the service and using the notification's details or querying the service directly to find out what changed.  Make the service "produce" SyncChanges: i.e., when the change processor receives a notification from the service, it should only have to query the service to get one or more SyncChanges to process.  This most likely involves moving the rest of the logic from the change processor to the service itself."


Chrome's on-disk storage mechanism for bookmarks is a flat JSON file. Sync metadata lives in a sqlite database in which they track ID mappings and different versions seen. Chrome has a much more complete syncing concept than we do: data sources opt-in to a thorough Sync-driven lifecycle, so the syncing bookmark service owns both bookmark storage and its sync metadata in that DB. That's a downside if one chooses to look at it that way :)

It's similar in spirit to the in-sync/out-of-sync stuff in Queue Sync.


I have no idea how Safari does its syncing; from what I've heard, not well.
Created attachment 8735735 [details] [diff] [review]
0001-use-triggers-to-populate-the-new-sync-fields.patch

This is the basis for a strategy like comment 13, using triggers to manage the counter and (new) deleted table. Mak, what do you think of this approach?
Attachment #8735735 - Flags: feedback?(mak77)
(In reply to Mark Hammond [:markh] from comment #16)
> Created attachment 8735735 [details] [diff] [review]

I meant to mention: there's a trigger that updates moz_bookmarks.syncChangeCounter whenever there's a change to moz_keywords, so Sync sees the bookmark as changed when a keyword is assigned. This obviously is a performance cost (moz_bookmarks wasn't updated before). I suspect the same basic issue will arise with tags and annotations. An alternative would probably be syncStatus and syncChangeCounter fields on these other tables, and Sync crafting an appropriate query to form the set of changed bookmarks at Sync time.

At this stage though I'm ignoring that and starting to look into what changes to Sync itself are necessary in this world - that will probably influence the places-specific changes in a number of (other?) ways (eg, Sync currently ignores favicon changes and certain annotations, which already seems a challenge and/or a network hit)
(In reply to Mark Hammond [:markh] from comment #17)
> (In reply to Mark Hammond [:markh] from comment #16)
> > Created attachment 8735735 [details] [diff] [review]

Why both a counter and a status? I was assuming the counter idea was to decrease it and have status == synced when the counter is 0.
The other downside here, is that you don't know what changed, so you must compare 2 records to figure what to update. Dunno if that matters for sync or in any case we update the whole record.

> I meant to mention: there's a trigger that updates
> moz_bookmarks.syncChangeCounter whenever there's a change to moz_keywords

This is a problem that we didn't consider before.
Keywords are no more bound to bookmarks, they were though. Today the only relation between bookmarks and keywords is the url.
The cost of the update may not be a problem in this case, keywords are rarely used. But here we are hardcoding in the store a relation that won't exist forever. In future I'd like to completely separate the keywords UI from the bookmarks UI, you may add a keyword to an url without the need to create a bookmark at all. How we manage that, when a keyword won't necessarily have a bookmark?

The problem we didn't analyze yet is, in general, how to manage this kind of changes, where something that currently is bound to a certain item, decouples from it.
It can happen in any provider, not just Places, something could be bound to a certain entity, but then be moved from it, for perf or semantical reasons. And if we don't allow such flexibility, we may make the component pay a technical debt forever.
An observer approach was somehow bypassing this problem, cause you just deal with notifications, they create a decoupling between the store and the view.

Look like we need something in the middle to decouple the store from Sync. If Sync can directly query the component store, an evolving component loses flexibility.

Let's use another example...

> I suspect the same basic issue will arise with tags and annotations.

We plan (will surely happen) to move tags out of the bookmarks table, to their own table. For now only bookmarks support tags, and I suspect this won't change, but we can't know for sure, technically there's nothing disallowing that, it's just our decision.
Relying on a system that assumes tags are in the bookmarks table, in future may create problems. At that point when a tag is added/removed, we'd only update moz_places, not moz_bookmarks (tags are again bound to urls, not bookmarks, 2 bookmarks pointing to the same url have the same tags). So we would need to pay a perf price to update every single bookmark for the tagged url, and if we decide we won't need a bookmark anymore to tag something, we're broken.

To sum up, with a store-bound approach, we need a layer in the middle to provide something that is  view-bound. Currently that something is notifications (they work regardless of the underlying store), in the new approach it may be disallowing Sync from directly querying the component store, so the component can move the field tracking changes where it's more appropriate and still hand back a compatible representation.

To compare with other approaches we looked into, an append-log approach would not have this kind of problem, cause it would log more general messages like "the keyword X was added for the url Y", "the tag W was removed for the url Z" and the physical store would not matter at all.

(In reply to Richard Newman [:rnewman] from comment #15)
> Note that they mention having an index of all changed IDs available inside
> the bookmark service itself.

that is sort of what we discussed many times, components should track themselves what needs to be synced.

> This most likely involves moving the rest of the logic from the
> change processor to the service itself."

Makes a lot of sense, again it was discussed multiple times.

> 
> Chrome's on-disk storage mechanism for bookmarks is a flat JSON file. Sync
> metadata lives in a sqlite database in which they track ID mappings and
> different versions seen.

It's using a separate store, as discussed above. Not sure it makes sense to keep around "different versions" though, if you can just store an object describing the "diff".

> data sources opt-in to a thorough Sync-driven lifecycle, so the
> syncing bookmark service owns both bookmark storage and its sync metadata in
> that DB. That's a downside if one chooses to look at it that way :)

Which downside? To me it looks like almost-how-things-should-have-been-done.
(In reply to Marco Bonardo [::mak] from comment #18)
> (In reply to Mark Hammond [:markh] from comment #17)
> 
> Why both a counter and a status? I was assuming the counter idea was to
> decrease it and have status == synced when the counter is 0.

status also records if the bookmark has ever been synced (and if not, deletions need not write the GUID to the new moz_bookmarks_deleted table). It also records when the state is "unknown" and thus Sync needs to do something painful to attempt reconciliation (eg, first run after landing, or after an auto-bookmark-restore due to a corrupt places DB, etc)

> The other downside here, is that you don't know what changed, so you must
> compare 2 records to figure what to update. Dunno if that matters for sync
> or in any case we update the whole record.

yep - see above for Richards comments on why this doesn't really help us.

I think it's worth stepping back here a little. Given the constraints of the current Sync system and our (lack of) ability to rewrite the world to make a perfect system, Richard and I have settled on a requirement that's fairly easy to describe at a high level:

* Places should track whenever a physical (eg, a field directly on a bookmark) or a logical (eg, something that changes Sync's representation of a bookmark, such as a tag or annotation) change is made to a bookmark.
* Sync will tell Places when it has applied the local changes on a per-bookmark basis, so Places can treat the bookmark as no longer having a sync-related change (while obviously keeping concurrency in mind, hence the "counter" in this proposal)
* Sync does *not* need fine-grained detail about what changed, just that *something* did. Sync will grab the affected bookmark, reconcile the change from the current state of the bookmark, and tell Places it has done so.

I believe the patch I put up is a step in this direction. I believe we could *also* build a system with a "change log" approach, but that seems far more complex, both in terms of implementation, and in reconciling the log with the above requirements.

For example, you mention recording "the keyword X was added for the url Y" - that's fine, but it's useless to Sync in that form. Places would still need to convert that to "bookmark with id=xxx changed". Once Sync has uploaded the changed bookmark, Places would then need to arrange for "bookmark xxx" to no longer be marked as changed, but any other bookmarks affected by that change to still be (if a single change record affected multiple bookmarks, Sync can't promise it will reconcile all affected bookmarks atomically)

IOW, I believe the "change log" approach would fail if a single log entry could affect more than one logical bookmark (unless we increased the complexity even further and *still* ended up needing to attach book-keeping information directly to the bookmarks record, even for "logical" changes that didn't directly touch the bookmarks table)

> This is a problem that we didn't consider before.
> Keywords are no more bound to bookmarks, they were though. Today the only
> relation between bookmarks and keywords is the url.
> The cost of the update may not be a problem in this case, keywords are
> rarely used. But here we are hardcoding in the store a relation that won't
> exist forever.

Sync stores keywords, tags etc with the bookmark - there's no separate store for them like there is in places. If the keyword or tag for a bookmark changes, we need to update Sync's copy of the bookmark. I don't think we are "hardcoding in the store a relation", but instead the Places implementation is meeting the requirement that a logical change to a bookmark flags the bookmark as changed. If Places wants to change how tags or keywords are stored, that's fine, so long as it meets the requirement that a logical change to a bookmark flags the bookmark as changed.

Note also that how that change is flagged and queried is also an implementation detail that should remain internal to Places. Alternatives do include your idea of a "change log", or even a compromise that keeps the flags in a different table keyed by the bookmarks GUID. I'm open to any such ideas, but I'm leaning towards the smallest possible implementation that meets these identified requirements. Meeting additional unspecified requirements by introducing additional complexity is an unnecessary cost.

> It can happen in any provider, not just Places, something could be bound to
> a certain entity, but then be moved from it, for perf or semantical reasons.
> And if we don't allow such flexibility, we may make the component pay a
> technical debt forever.

Let's avoid hyperbole and stick to this specific problem - Sync and Places need to be more tightly bound to make Sync reliable, and it need to happen soon given Sync has alot of strategic importance. The problems we are fixing here are bad enough that iOS refuses to enable bi-directional Sync.

The reality is that future changes to Places *must* take Sync into account. That's the way it should have been long ago. I believe a more formal relationship between Sync and Places will actually *reduce* technical debt over the longer term.

To reiterate:
* Sync just wants to flag to say "a bookmark has logically changed".
* Places is free to implement that flag any way it likes.
* IMO a change-log approach is more expensive and its other theoretical advantages are not relevant to anything we've identified (ie, not to Sync, and not to anything else we are planning)

I'm obviously happy to be proven wrong, but I think that needs to be demonstrated with code that meets the requirements above rather than an abstract thought experiment. In the meantime I'll continue to work on the Sync side of this world and I'd be very happy to see your change-log plan put up as a straw-man patch. As above, Sync's requirements are quite simple, so there's no reason an alternative implementation in places can't be swapped in - if it meets the simple requirements of "tell me what bookmarks have changed" and "no longer treat this bookmark as changed", the Sync side of the world shouldn't change just because the implementation of those requirements has.
(In reply to Mark Hammond [:markh] from comment #19)
> status also records if the bookmark has ever been synced (and if not,
> deletions need not write the GUID to the new moz_bookmarks_deleted table).

does this really matter? If we tell sync a bookmark has been removed, and it can't find it, won't it just ignore it? Is it expensive?

> It also records when the state is "unknown" and thus Sync needs to do
> something painful to attempt reconciliation (eg, first run after landing, or
> after an auto-bookmark-restore due to a corrupt places DB, etc)

I supposed the default counter value for newly created bookmarks was 1 and that was enough to tell that bookmark was "dirty". I just fear the added complexity of having 2 fields.
Off-hand it sounds more like an engine status flag, rather than being for each bookmark. Most cases you point out (excluding deleting a non-synced entry) are cases replacing ALL the bookmarks. It would be pointless to mark that on every bookmark at that point.
Eventually, if we really need an unknown state, is it possible to use negative counter values for unknown and only increase positive values in the trigger? We do something similar for frecency, where a negative value means "unknown".
Reducing complexity should be a win. Plus, in general, it's a bad idea in a schema to have a column that has the same value for all entries but a few, it's inefficient.

> If Places wants to change how tags or keywords are
> stored, that's fine, so long as it meets the requirement that a logical
> change to a bookmark flags the bookmark as changed.

It's not matter of how they are stored, but what they are. We used to tell keywords and tags were a bookmark property cause they were in the bookmarks store. But that's untrue, they are there just cause someone did the schema wrong.

> The reality is that future changes to Places *must* take Sync into account.
> That's the way it should have been long ago.

I agree, but we can't build a system where we have no idea how to react if things change. Some of the changes may have a high user benefit, and we would be blocked on those. We can look for a solution to the problem now, since we already know what is likely to change, we have some advantage.

So let's see what we may need to change for keywords and tags.
The latter may be a non-issue if we keep tags a property of bookmarks. But would be nice to think what we could do if that'd change and tags would become another way of bookkeeping urls. Would we need a new tags engine? Would that solve the problem?

Keywords are a more immediate problem, cause we already started changing them. In the future it's likely keywords will only be used for searches and managed through "add keyword for this search" and "preferences / Search" pane. We may even change them to be stored as custom search engines.
It may still be possible to add a keyword as an alias to an url, through the Places API, but we could decide to just not sync those.
How would we attack that, would we need a new sync engine only for those if we want to keep syncing them? should we move them to the searches sync engine (provided there's one). May we just stop syncing them? Does mobile support something like keywords?
(In reply to Marco Bonardo [::mak] from comment #20)
> (In reply to Mark Hammond [:markh] from comment #19)
> > status also records if the bookmark has ever been synced (and if not,
> > deletions need not write the GUID to the new moz_bookmarks_deleted table).
> 
> does this really matter? If we tell sync a bookmark has been removed, and it
> can't find it, won't it just ignore it? Is it expensive?

This is part of the tracker - Sync needs to know an item was deleted so it can tell the server it deleted. Sync doesn't do a full reconcile of the server tree against the local tree on every sync.

> > It also records when the state is "unknown" and thus Sync needs to do
> > something painful to attempt reconciliation (eg, first run after landing, or
> > after an auto-bookmark-restore due to a corrupt places DB, etc)
> 
> I supposed the default counter value for newly created bookmarks was 1 and
> that was enough to tell that bookmark was "dirty". I just fear the added
> complexity of having 2 fields.

A bookmark that hasn't been synced can still have a non-zero counter. Think a user that doesn't have sync configured - new bookmarks will forever be in a "new" state and have an ever increasing change counter.

We could try and make the semantics of the changeCounter become more complex (eg, treat -1 as special and never increment or something) but that sounds like more complexity than 2 well-defined fields.

> Off-hand it sounds more like an engine status flag, rather than being for
> each bookmark. Most cases you point out (excluding deleting a non-synced
> entry) are cases replacing ALL the bookmarks. It would be pointless to mark
> that on every bookmark at that point.

I don't quite understand this - it doesn't seem expensive for new rows created due to a backup to make sure that state field is "unknown" - by definition replacing all bookmarks will be touching all the rows for those new bookmarks.

> Eventually, if we really need an unknown state, is it possible to use
> negative counter values for unknown and only increase positive values in the
> trigger? We do something similar for frecency, where a negative value means
> "unknown".
> Reducing complexity should be a win.

That's probably possible, but I think we have different concept of complexity :) One technique gives us "syncChangeCounter is a counter that's increments on a change" whereas the other is "syncChangeCounter is a counter sometimes, but othertimes it's a flag" - the latter is more complex IMO. 2 fields with well defined semantics seems less complex than trying to use a single field to track 2 different things.

> It's not matter of how they are stored, but what they are. We used to tell
> keywords and tags were a bookmark property cause they were in the bookmarks
> store. But that's untrue, they are there just cause someone did the schema
> wrong.

That may be true, but we are faced with the fact that Sync stores these in bookmark records. You could well argue that someone did *that* schema wrong too, but that's what we have.

> > The reality is that future changes to Places *must* take Sync into account.
> > That's the way it should have been long ago.
> 
> I agree, but we can't build a system where we have no idea how to react if
> things change. Some of the changes may have a high user benefit, and we
> would be blocked on those. We can look for a solution to the problem now,
> since we already know what is likely to change, we have some advantage.

I'm not sure what you are getting at here.

> So let's see what we may need to change for keywords and tags.
> The latter may be a non-issue if we keep tags a property of bookmarks. But
> would be nice to think what we could do if that'd change and tags would
> become another way of bookkeeping urls. Would we need a new tags engine?
> Would that solve the problem?

That probably would, but it's out of scope for what we are trying to do here - we aren't going to block this on having every sync client implementation implement and deploy this before we try and solve the problem of clients writing an invalid bookmarks tree.

IOW:
step 1: make clients not write invalid trees to the server.
step 2: improve sync so that some very old and possibly invalid assumptions about the relationship between bookmarks, tags, keywords etc better reflects a desired model.

We can do step2 later, but need not block step1 on doing that. The fact we write invalid trees has nothing to do with the tags/keywords model.

> Keywords are a more immediate problem, cause we already started changing
> them. In the future it's likely keywords will only be used for searches and
> managed through "add keyword for this search" and "preferences / Search"
> pane. We may even change them to be stored as custom search engines.
> It may still be possible to add a keyword as an alias to an url, through the
> Places API, but we could decide to just not sync those.
> How would we attack that, would we need a new sync engine only for those if
> we want to keep syncing them? should we move them to the searches sync
> engine (provided there's one).

That all sounds like great fodder for the future, but isn't really relevant to the fact that as of today, Sync will write invalid bookmark trees for reasons unrelated to keyworks, tags or annotations. Let's fix the invalid tree on the server first.

> May we just stop syncing them? Does mobile
> support something like keywords?

Desktop does, and I'd personally be very upset if 2 my desktop machines connected to Sync didn't also carry keywords with them. Also, not Syncing keywords does absolutely nothing to help solve the problem we are facing, so I don't think it is relevant. Turning this bug into "let's re-write sync" isn't going to get us anywhere.
(In reply to Mark Hammond [:markh] from comment #21)
> (In reply to Marco Bonardo [::mak] from comment #20)
> > (In reply to Mark Hammond [:markh] from comment #19)
> > > status also records if the bookmark has ever been synced (and if not,
> > > deletions need not write the GUID to the new moz_bookmarks_deleted table).
> > 
> > does this really matter? If we tell sync a bookmark has been removed, and it
> > can't find it, won't it just ignore it? Is it expensive?
> 
> This is part of the tracker - Sync needs to know an item was deleted so it
> can tell the server it deleted. Sync doesn't do a full reconcile of the
> server tree against the local tree on every sync.

Yep, my question is whether it's expensive to tell the server "this thing went away" when "this thing" is not on the server.

> > Eventually, if we really need an unknown state, is it possible to use
> > negative counter values for unknown and only increase positive values in the
> > trigger? We do something similar for frecency, where a negative value means
> > "unknown".
> > Reducing complexity should be a win.
> 
> That's probably possible, but I think we have different concept of
> complexity :) One technique gives us "syncChangeCounter is a counter that's
> increments on a change" whereas the other is "syncChangeCounter is a counter
> sometimes, but othertimes it's a flag" - the latter is more complex IMO. 2
> fields with well defined semantics seems less complex than trying to use a
> single field to track 2 different things.

It's still a counter that can also take the -1 value, let's not make it more complex than it is.
In an embedded db 1 field is generally better than 2 or 3 or 4.
How much it complicate things depends, we may just need to add an if to triggers.
Moreover, we could always move from 1 field to 2 if we find things are not good enough. But we couldn't move back from 2 fields to 1, cause Sqlite doesn't support removing fields.

So, if the added complexity is just a couple ifs, and we get the same benefits, I'd be all for 1 field.
If we need 2 fields cause the flag is more complex than a bool (or will very shortly be), then let's go for 2 fields.

> I'm not sure what you are getting at here.

I just want to be sure whatever we do here won't make it harder to implement planned changes.
If we know it will be easy to detangle tags and keywords from bookmarks when we'll have to, you have open doors!!
(In reply to Marco Bonardo [::mak] from comment #22)
> Yep, my question is whether it's expensive to tell the server "this thing
> went away" when "this thing" is not on the server.

Sync uses a logical delete for bookmarks, so the process for deleting an item from the server is (basically) creating a record on the server with {deleted: true} as the payload. It would also be possible to query the guid and only upload a deleted version if it did exist (which is slightly more expensive, but not particularly)

I'm not sure how this helps though (ie, I don't see where you are heading with this). The idea behind the "moz_bookmarms_deleted" table is that places would write to it as items were deleted, and Sync would clear that table once it Synced the deletion. I doubt we would want to write to this deleted table when Sync isn't configured, as then that table would never actually be cleared - hence the "new" state. OTOH though, if we always wrote to the new deleted table and didn't care that it might never be purged, it would make fixing bug 1177516 easier!
 
> It's still a counter that can also take the -1 value, let's not make it more
> complex than it is.
> In an embedded db 1 field is generally better than 2 or 3 or 4.
> How much it complicate things depends, we may just need to add an if to
> triggers.
> Moreover, we could always move from 1 field to 2 if we find things are not
> good enough. But we couldn't move back from 2 fields to 1, cause Sqlite
> doesn't support removing fields.

Yeah, fair enough - that's a detail I'm happy to defer to you on - I'm using 2 fields for my POC and we can sort that out later.

It does remind me though that the triggers don't seem like they will work for us :( The fundamental problem is that whenever Sync is told that an item has changed, it unconditionally writes the current representation to the server. Other devices then see the change, pull down the server version and reconcile it with what they have locally. IOW, it's really not possible for Sync on this device to say "oh, I was told guid xxxx changed, but I can detect that the change was immaterial so I'll ignore it". If this new system flags bookmarks as changed which were not flagged before, we run the risk of a "sync loop" - ie, devices continually uploading the same bookmark even though nothing of consequence has changed.

This means that it is important that this new tracking system has identical semantics to the existing tracker system. Sadly though, this doesn't match what *actually* changes. For example, when a bookmark is moved within the same folder only the folder itself must be marked as changed - the children must not (even though the moz_bookmark rows for those children were changed). This seems impossible (or maybe just very difficult) to capture using triggers, so I've a WIP that manages these changes manually (ie, almost every "UPDATE moz_bookmarks ..." statement needs touching).

I'll put up my WIP for this soon, but do you foresee any objections, or see any other alternatives to this?

> I just want to be sure whatever we do here won't make it harder to implement
> planned changes.
> If we know it will be easy to detangle tags and keywords from bookmarks when
> we'll have to, you have open doors!!

Understood, and I'm certainly happy to work out the finer details of this patch so everyone is kept happy :)

Thanks!
(Reporter)

Comment 24

a year ago
> Yep, my question is whether it's expensive to tell the server…

It's never the case with Sync that you're telling the server anything, you're always telling other clients. The server is just a whiteboard.

> It would also be possible to query the guid
> and only upload a deleted version if it did exist (which is slightly more
> expensive, but not particularly)

Not in every case -- just because a record isn't on this server at this moment doesn't mean that it's not on another client.

> I doubt we would want to write to this deleted
> table when Sync isn't configured

Yup, and that's why I distinguish between NEW and CHANGED. Clients with no Sync never have records that aren't NEW.
(In reply to Mark Hammond [:markh] from comment #23)
> For example, when a bookmark is moved within the
> same folder only the folder itself must be marked as changed - the children
> must not (even though the moz_bookmark rows for those children were
> changed). This seems impossible (or maybe just very difficult) to capture
> using triggers

Well, in an update trigger you can know what changed, for example
CREATE TEMP TRIGGER test
AFTER UPDATE OF moz_bookmarks
WHEN NEW.position <> OLD.position

it depends on the number of special cases.
But I see, it's hard to tell if *only* position changed.
Mark, is the feedback request still needed, or are you working on a new version? I think I read some comments above you were working on a second POC...
Flags: needinfo?(markh)
Comment on attachment 8735735 [details] [diff] [review]
0001-use-triggers-to-populate-the-new-sync-fields.patch

Thanks Mak - at this stage the plan is for Kit to take over this bug, taking the same basic strategy but probably avoiding the trigger approach for exactly the reason you mention. So if you have no general objections to that we'll get back to you once the patches are more developed.
Flags: needinfo?(markh)
Attachment #8735735 - Flags: feedback?(mak77)
Assignee: markh → kcambridge
Priority: -- → P1

Updated

a year ago
Depends on: 1269902

Updated

a year ago
Blocks: 1269902
No longer depends on: 1269902

Updated

a year ago
No longer blocks: 1257961
Blocks: 1269904
No longer blocks: 1269902
Status: NEW → ASSIGNED
Flags: firefox-backlog+
Created attachment 8751132 [details]
MozReview Request: Bug 1258127 - Improve bookmarks engine test logging and remove a deprecated call. r=kitcambridge

Review commit: https://reviewboard.mozilla.org/r/51875/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/51875/
Created attachment 8751133 [details]
MozReview Request: Bug 1258127 - Improve existing bookmarks tracker tests. r=markh

The intention of this patch is 2-fold:

* Capture more existing semantics of the tracker that aren't currently tested.
  The intention is that this patch doesn't touch the existing tracker or
  bookmarks engine implementation at all.

* Make structural changes such that later patches that want to ensure the
  same semantics exist using SQL queries are more obvious and limited
  only to things directly related to the new tracker - for example, this patch
  uses tasks/promises even though they aren't necessary here, but will become
  necessary in later patches.

Review commit: https://reviewboard.mozilla.org/r/51877/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/51877/
Created attachment 8751134 [details]
Bug 1258127 - Add `PlacesSyncUtils` methods for pulling bookmark changes from Places.

Review commit: https://reviewboard.mozilla.org/r/51879/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/51879/
Attachment #8735735 - Attachment is obsolete: true
Here's a start that imports Mark's work from https://github.com/mhammond/gecko-dev/tree/experiment/places-sync-state. The tracker test passes, but this still needs lots of work:

(1) Restoring bookmarks doesn't work at all. This is where the distinction between the "new" and "unknown" sync statuses will come in.
(2) This needs test coverage for Places schema upgrades. I think the correct approach is to mark all existing bookmarks as "unknown" and perform a reconciliation before changing them to "new", but we'll want to iron out (1) first.
(3) The queries in Bookmarks.jsm still need to be updated. I focused on the "nsINav*" services at first; porting the queries should be a matter of typing.
(4) The sync status column isn't updated after the bookmark is first synced. For the first cut, I focused on getting the existing tracker tests to pass; the bookmarks engine sync tests still fail spectacularly.
(5) We aren't handling the changes that Sync makes as it's syncing. I think the easiest approach here is to add a "SyncUtils.jsm" module to Places that exposes some calls for Sync. These queries would duplicate those in Bookmarks.jsm (and nsNav*Service.cpp), but wouldn't update the change counter. This can be tricky because we might *want* to update the counter for bookmarks that we've reconciled, but not ones that we're applying directly. Alternatively, we could add a flag to the existing methods in Bookmarks.jsm.

I'm going to focus on bumping test coverage for Bookmarks.jsm next, and then looking at getting the engine working.
Comment on attachment 8751134 [details]
Bug 1258127 - Add `PlacesSyncUtils` methods for pulling bookmark changes from Places.

Mak, I'd love your feedback on the general approach. Some of the queries for updating the change counter are horrific, because they need to ignore changes that Sync doesn't care about.

Sorry for the size of this patch; I'll split it up by different types (bookmarks, tags, keywords) before requesting review.
Attachment #8751134 - Flags: feedback?(mak77)
Comment on attachment 8751132 [details]
MozReview Request: Bug 1258127 - Improve bookmarks engine test logging and remove a deprecated call. r=kitcambridge

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/51875/diff/1-2/
Attachment #8751134 - Flags: feedback?(mak77)
Comment on attachment 8751133 [details]
MozReview Request: Bug 1258127 - Improve existing bookmarks tracker tests. r=markh

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/51877/diff/1-2/
Comment on attachment 8751134 [details]
Bug 1258127 - Add `PlacesSyncUtils` methods for pulling bookmark changes from Places.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/51879/diff/1-2/
https://reviewboard.mozilla.org/r/51879/#review49025

Looking good, but still lots of work ahead of us :) I think we need to think through the change counter semantics for "non-bookmark bookmark rows" as I mention below as that looks like it will impact both the complexity and performance implications of these changes (although to be fair, I don't think we are anywhere near as sensitive to perf on writing as we are on reading)

I'm sure mak will have some strong opinions on much of this :)

::: toolkit/components/places/Bookmarks.jsm:784
(Diff revision 1)
>      let tuples = new Map();
>      if (info.hasOwnProperty("lastModified"))
>        tuples.set("lastModified", { value: toPRTime(info.lastModified) });
>      if (info.hasOwnProperty("title"))
>        tuples.set("title", { value: info.title });
> +    if (info.hasOwnProperty("syncStatus"))

This is probably a question for later, but if we end up with sync-specific routines for Sync to do updates, I expect we will probably not allow this field here.

::: toolkit/components/places/Database.h:270
(Diff revision 1)
>    nsresult MigrateV26Up();
>    nsresult MigrateV27Up();
>    nsresult MigrateV28Up();
>    nsresult MigrateV30Up();
>    nsresult MigrateV31Up();
> -  nsresult MigrateV32Up();
> +  nsresult MigrateV33Up();

I'm surprised to see this *replace* V32Up? (We chatted in irc, and I think you said you'll address that in the next patch)

::: toolkit/components/places/Database.cpp:1666
(Diff revision 1)
>  
>    return NS_OK;
>  }
>  
>  nsresult
> -Database::MigrateV32Up() {
> +Database::MigrateV33Up() {

This doesn't look correct - if a nightly user already upgraded to 32, I'd expect the creation of the moz_migrate_v32_temp table would fail when they attempt to upgrade to 33. I'd expect v32 remains and a new v33 is created with only these changes?

::: toolkit/components/places/PlacesUtils.jsm:1790
(Diff revision 1)
>            let val = aRow.getResultByName(prop);
>            if (val !== null)
>              item[prop] = val;
>          }
>        };
> -      copyProps("guid", "title", "index", "dateAdded", "lastModified");
> +      copyProps("guid", "title", "index", "dateAdded", "lastModified", "syncChangeCounter");

I think we should make the changeCounter opt-in via options to prevent callers "accidentally" using it by, eg, persisting every field we return or similar.

::: toolkit/components/places/nsAnnotationService.cpp:2047
(Diff revision 1)
> +            "SELECT item_id from moz_items_annos WHERE "
> +              "expiration = :expire_session AND "
> +              "anno_attribute_id IN ("
> +                "SELECT id FROM moz_anno_attributes WHERE name IN ("
> +                  // These should match the annotations in `IsSyncedAnnotation`.
> +                  "'bookmarkProperties/description', "

It might be nice to work out a sane way to avoid that duplication

::: toolkit/components/places/nsNavBookmarks.cpp:259
(Diff revision 1)
>                                int32_t aDelta)
>  {
>    NS_ASSERTION(aStartIndex >= 0 && aEndIndex <= INT32_MAX &&
>                 aStartIndex <= aEndIndex, "Bad indices");
>  
> +  mozStorageTransaction transaction(mDB->MainConn(), false);

It looks to me like all existing callers already have a transaction in progress, so this probablu isn't necessary. I wonder if we could assert that?

::: toolkit/components/places/nsNavBookmarks.cpp:442
(Diff revision 1)
>      NS_ENSURE_SUCCESS(rv, rv);
>      rv = lastInsertIdStmt->GetUTF8String(1, _guid);
>      NS_ENSURE_SUCCESS(rv, rv);
>    }
>  
> +  rv = TrackSyncChange(*_itemId, 1);

can't we just add the change counter when creating the record, thus avoiding the extra statement?

::: toolkit/components/places/nsNavBookmarks.cpp:1563
(Diff revision 1)
> + * Unconditionally increments the sync change counter for |aItemId|, even if
> + * the item is otherwise untracked by Sync. This is used to update the counter
> + * for the parent of an ignored item.
> + */
> +nsresult
> +nsNavBookmarks::AddSyncChange(int64_t aItemId, int32_t aDelta)

it seems like all callers to this specify a delta of 1 - they are never going to specify 2, so the only other reasonable value would be zero - in which case they could just omit the call entirely.

Also, many callers of this are already updating the same row, so I'm not sure the cost of the extra UPDATE is worth using this helper in those cases (and in some cases, we are adding a new transaction to account for this which we could continue to avoid if done "inline" - but that's probably tied up with the comment below about whether it's OK to increment tag records or not)

::: toolkit/components/places/nsNavBookmarks.cpp:1584
(Diff revision 1)
> +
> +  return statement->Execute();
> +}
> +
> +nsresult
> +nsNavBookmarks::TrackSyncTagChange(nsIURI* aURI, int32_t aDelta)

This seems fairly complex and I wonder if we could just ignore them at sync tim?. We do have a requirement that a bookmark that we would otherwise sync doesn't get a change counter when it shouldn't - however, I don't see a problem in bookmark records that we never sync (such as tags) getting a change counter and sync just ignoring them when it performs the query (and probably resetting them as normal after a sync)

It might also avoid having places hold the "policy" about what annos to ignore.

(Same comment below - and in TraceSyncChange's case, if we can ignore this complexity we can avoid the extra statement completely, as the caller is often already updating the same row)

::: toolkit/components/places/nsPlacesTriggers.h:270
(Diff revision 1)
> +// This trigger will insert a row into moz_bookmarks_deleted when a bookmark
> +// is deleted.
> +#define CREATE_BOOKMARKS_DELETED_AFTERDELETE_TRIGGER NS_LITERAL_CSTRING( \
> +  "CREATE TEMP TRIGGER moz_bookmarks_insert_deleted " \
> +  "AFTER DELETE ON moz_bookmarks FOR EACH ROW " \
> +  /* XXX - not clear what to do with SYNC_STATUS_UNKNOWN?? */ \

I expect we probably do want to add them to deleted as well, under the assumption that STATUS_UNKNOWN will be temporary until Sync has reconciled - in which case recording the deletion might help that process.
https://reviewboard.mozilla.org/r/51879/#review49025

> I expect we probably do want to add them to deleted as well, under the assumption that STATUS_UNKNOWN will be temporary until Sync has reconciled - in which case recording the deletion might help that process.

It just occurred to me there's a case where we could miss deletions if the first sync is interrupted between uploading the item and changing its status to `NORMAL`. In that case, we won't write a tombstone, but the deleted boomark will still make its way to other devices. I'm not inclined to fix that in this patch, since partial uploads violate all sorts of guarantees. But it's something to keep in mind until we have batched uploads.
Created attachment 8752393 [details]
Bug 1258127 - Update the Places schema to track bookmark sync changes.

Review commit: https://reviewboard.mozilla.org/r/52587/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/52587/
Attachment #8751132 - Attachment description: MozReview Request: Bug 1258127 - Improve bookmarks engine test logging and remove a deprecated call. → MozReview Request: Bug 1258127 - Improve bookmarks engine test logging and remove a deprecated call. r=kitcambridge
Attachment #8751133 - Attachment description: MozReview Request: Bug 1258127 - Improve existing bookmarks tracker tests. → MozReview Request: Bug 1258127 - Improve existing bookmarks tracker tests. r?markh
Attachment #8751134 - Attachment description: MozReview Request: Bug 1258127 - Track sync changes for bookmarks and annotations. → MozReview Request: Bug 1258127 - Bookmarks and annotations service updates to support tracking Sync changes. f?mak
Attachment #8751132 - Flags: review?(kcambridge)
Attachment #8751133 - Flags: review?(markh)
Created attachment 8752394 [details]
Bug 1258127 - Update the bookmarks engine to pull changes from Places.

Review commit: https://reviewboard.mozilla.org/r/52589/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/52589/
Comment on attachment 8751132 [details]
MozReview Request: Bug 1258127 - Improve bookmarks engine test logging and remove a deprecated call. r=kitcambridge

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/51875/diff/2-3/
Comment on attachment 8751133 [details]
MozReview Request: Bug 1258127 - Improve existing bookmarks tracker tests. r=markh

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/51877/diff/2-3/
Comment on attachment 8751134 [details]
Bug 1258127 - Add `PlacesSyncUtils` methods for pulling bookmark changes from Places.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/51879/diff/2-3/
Comment on attachment 8751132 [details]
MozReview Request: Bug 1258127 - Improve bookmarks engine test logging and remove a deprecated call. r=kitcambridge

https://reviewboard.mozilla.org/r/51875/#review49543
Attachment #8751132 - Flags: review?(kcambridge) → review+
The reworked test_async tracker tests don't work yet, and I haven't incorporated Mark's suggestions to collapse the transactions for inserts, but I wanted to split up the commits.

Also, I think we can land the tests separately, since they don't depend on the rework.
Attachment #8752393 - Flags: feedback?(mak77)
Attachment #8751134 - Flags: feedback?(mak77)
Comment on attachment 8752394 [details]
Bug 1258127 - Update the bookmarks engine to pull changes from Places.

Mark, I split the engine changes out into a separate commit.
Attachment #8752394 - Flags: feedback?(markh)
Depends on: 1274108
https://reviewboard.mozilla.org/r/51879/#review50604

What can I say... This is not scary, it is ***SCARY***!

The main problem is that it is adding a ton of I/O to every operation. for few ops is not a big deal, but sometimes we have to remove, move or add thousands of bookmarks, and moving from 2000 queries to 6000 queries won't help. I'd suggest to make a test that does that, insert thousands bookmarks, move all of them, then remove all of them. Count queries before/after and try to take the additional overload inside an acceptable threshold (30% more queries?)
The second problem is that it seems to increase the counter more than once in a lot of cases, that shows how this is complex to get right.
The third problem is that I don't see us doing this until we have remove the old bookmarks API (bug 519514) and the old transactions (bug 979280) cause here we'd have to do the work twice, and we'll have twice the risk and bugs. And unfortunately I don't see that happening unless you can convince a manager that is worth for our future not having multiple APIs doing the same thing and thus giving back any kind of resource to Places development.

::: toolkit/components/places/Bookmarks.jsm:103
(Diff revision 3)
> +   * Sync status constants.
> +   * These should stay consistent with nsINavBookmarksService.idl
> +   */
> +  SYNC_STATUS_UNKNOWN: 0,
> +  SYNC_STATUS_NEW: 1,
> +  SYNC_STATUS_NORMAL: 2,

Looks like it may be worth to expose these from a common code point, where they can be used by js and cpp.

I'd suggest to design an idl that can be implemented by consumers, something like nsISyncParticipant (or specific for Places, nsIPlacesSyncParticipant if it's hard to make it generic) and move these there.

I'd like to reach a point where Sync communicates with Places through an API, rather than executing raw queries. Indeed I'd like Sync to make NO SQL QUERIES at all, and collect all the needed APIs in that interface.

I don't know if that can be made through a common API for each consumer, or it will require a module. But these consts should probably be along with that code.

::: toolkit/components/places/Bookmarks.jsm:818
(Diff revision 3)
>                 AND position BETWEEN :lowIndex AND :highIndex
>              `, { sign: sign, newParentId: newParent._id,
>                   lowIndex: Math.min(item.index, newIndex),
>                   highIndex: Math.max(item.index, newIndex) });
> +          // and mark the parent (but not children) as having a sync change.
> +          yield db.executeCached(

any way to integrate this into the query for setAncestorsLastModified? I'd prefer to avoid an additional query if possible...

::: toolkit/components/places/Bookmarks.jsm:1397
(Diff revision 3)
>        return new URL(v.spec);
>      return v;
> -  }
> +  },
> +  // TODO: We don't want to expose the change counter to consumers of this
> +  // module. This is a temporary hack for the bookmark tracker tests until we
> +  // can add a separate "PlacesSyncUtils" module for Sync to use.

And how would yo prevent consumers from abusing your module? This sounds like a non-goal, since I don't see how you'd do it.
I also think this comment is in the wrong place, nobody is going to look at validators, better move it where we accept it in input.

::: toolkit/components/places/Bookmarks.jsm:1562
(Diff revision 3)
> +
> +  // Only increment the parent's change counter.
> +  yield db.executeCached(
> +    `UPDATE moz_bookmarks
> +     SET syncChangeCounter = syncChangeCounter + 1
> +     WHERE guid = :guid

Aren't we incrementing the counter twice since we do both here and in the callers?
Couldn't this be merged in the query iself, maybe using a CASE WHEN construct?

::: toolkit/components/places/nsAnnotationService.cpp:2014
(Diff revision 3)
> +
> +  nsCOMPtr<mozIStorageStatement> statement =
> +    mDB->GetStatement("UPDATE moz_bookmarks "
> +      "SET syncChangeCounter = syncChangeCounter + 1 "
> +      "WHERE id = :item_id"
> +  );

Is there a reason to not just reuse the annotations observer in nsNavBookmarks.cpp?

::: toolkit/components/places/nsAnnotationService.cpp:2038
(Diff revision 3)
>  
>    if (strcmp(aTopic, TOPIC_PLACES_SHUTDOWN) == 0) {
>      // Remove all session annotations, if any.
>      if (mHasSessionAnnotations) {
> +      // Increment the sync change counter for tracked annotations.
> +      nsCOMPtr<mozIStorageAsyncStatement> syncChangeCounterStmt =

Doesn't look like sync is handling ANY session annotation, so why is this needed?

Moreover these will likely move to a memory hash in future.

::: toolkit/components/places/nsNavBookmarks.cpp:282
(Diff revision 3)
>    NS_ENSURE_SUCCESS(rv, rv);
>  
> +  // And we need to mark the folder itself (but not the children), as having
> +  // a sync change.
> +  rv = AddSyncChange(aFolderId, 1);
> +  NS_ENSURE_SUCCESS(rv, rv);

adjustIndices could be invoked thousands of time, indeed it's invoked basically everywhere we add or remove a bookmark.
This simple call you make here, means executing thousands of new queries. it's A LOT of I/O.

::: toolkit/components/places/nsNavBookmarks.cpp:1579
(Diff revision 3)
> +  if (aItemId == mTagsRoot) {
> +    return NS_OK;
> +  }
> +
> +  nsCOMPtr<mozIStorageStatement> statement = mDB->GetStatement(
> +   "UPDATE moz_bookmarks SET "

looks like in some cases this could be merged with another update that is already happening (lastModified or dateAdded or such)

::: toolkit/components/places/nsNavBookmarks.cpp:1601
(Diff revision 3)
> +nsNavBookmarks::TrackSyncTagChange(nsIURI* aURI, int32_t aDelta)
> +{
> +  NS_ENSURE_ARG(aURI);
> +
> +  nsCOMPtr<mozIStorageStatement> statement = mDB->GetStatement(
> +   "WITH RECURSIVE tags(tag_id) AS ("

all of this makes me think Sync is just tracking tags the wrong way, as if they were bookmarks rather than property of another bookmark or url. This is very dangerous for any future development.

::: toolkit/components/places/nsNavBookmarks.cpp:1618
(Diff revision 3)
> +          "item_id = id AND "
> +          "anno_attribute_id = "
> +           "(SELECT id FROM moz_anno_attributes "
> +           "WHERE name = 'places/excludeFromBackup')"
> +        ") "
> +      // Ignore bookmarks that are excluded from backups.

I don't expect any tag to have that annotation... I'm not sure I understand what you are trying to do.

::: toolkit/components/places/nsNavBookmarks.cpp:1670
(Diff revision 3)
> +      "THEN 0 "
> +      "ELSE :delta "
> +      "END"
> +     ") WHERE id = :item_id AND "
> +     "id NOT IN tags"
> +  );

This is a recursive query, very expensive, and you are executing it in a lot of points.

It may be a cceptable for everyday tagging, but when we are removing all bookmarks to restore a backup, this will completely kill us.

::: toolkit/components/places/nsNavBookmarks.cpp:2505
(Diff revision 3)
>    if (keyword.IsEmpty() && oldKeywords.Length() == 0) {
>      return NS_OK;
>    }
>  
>    if (keyword.IsEmpty()) {
> +    mozStorageTransaction removeTxn(mDB->MainConn(), false);

This API is deprecated, and it's not really tracking keywords created from the UI since those are using the new API in PlacesUtils.
https://reviewboard.mozilla.org/r/52587/#review50616

::: toolkit/components/places/nsPlacesTriggers.h:266
(Diff revision 1)
>  )
>  
> +// Triggers for the sync fields.
> +
> +// This trigger will insert a row into moz_bookmarks_deleted when a bookmark
> +// is deleted.

doesn't look like it's doing that, it's only inserting it if the bookmark was just inserted. And I'm not sure I understand what happens in the other cases.
https://reviewboard.mozilla.org/r/52589/#review50618

::: services/sync/modules/engines/bookmarks.js:449
(Diff revision 1)
> +          // we only reduce the change counter for them.
> +          if (change.deleted && counter > 0) {
> +            yield db.executeCached(`
> +            UPDATE moz_bookmarks
> +            SET syncChangeCounter = syncChangeCounter - :counter
> +            WHERE guid = :guid

Since the counter is directly managed in Places, any queries on it should be in Places. Either a module, a service or whatever.
But please don't raw query places.sqlite if you can make some kind of API.

::: services/sync/modules/engines/bookmarks.js:1408
(Diff revision 1)
> +      cb();
> +    }
> +  },
> +
> +  // TODO: Needs more test coverage to make sure we build the map correctly.
> +  promiseChangedIDs() {

ditto, should be in Places, not here.
Attachment #8751134 - Flags: feedback?(mak77)
Attachment #8752393 - Flags: feedback?(mak77)
please don't take this badly, but I honestly think this is the worst time in Places history for this kind of change.
We have duplicate APIs for history, bookmarks, keywords... Tags are in an horrible shape. Performance is still horrible. And all the work to resolve these situations (caused by previous decisions to invest in sync => async conversion) has been interrupted for management decisions.
As it is, these changes are thrice as big as they should be, and thrice as scary.
If I should plan this, I'd first try to complete removal of the old bookmarks API, the old keywords API, the old transactions (all projects abandoned at about 60-70% advancement). Then this could be more manageable cause we'd not have tens of APIs to change a single bookmark, most changes would be batched in a single API call. As it is, it's just frustrating for me.
https://reviewboard.mozilla.org/r/51879/#review50604

Heh, imagine how I felt writing this. ;-)

Sync already does a fair bit of (synchronous) querying to determine whether it should ignore a change. But it only does that for users who have configured Sync, whereas this patch will cause us to take the hit in all cases. Are there existing tests that measure query performance, or flags that I need to set to count those queries?

I'm not too concerned about increasing the counter more than once. The plan is to read the counter before a sync, and decrement it by that value afterward. It doesn't really matter what the value is.

I agree that this duplicates a lot of work...and I still need to fix up Bookmarks.jsm. But I'll respond to your last comment separately.

> Looks like it may be worth to expose these from a common code point, where they can be used by js and cpp.
> 
> I'd suggest to design an idl that can be implemented by consumers, something like nsISyncParticipant (or specific for Places, nsIPlacesSyncParticipant if it's hard to make it generic) and move these there.
> 
> I'd like to reach a point where Sync communicates with Places through an API, rather than executing raw queries. Indeed I'd like Sync to make NO SQL QUERIES at all, and collect all the needed APIs in that interface.
> 
> I don't know if that can be made through a common API for each consumer, or it will require a module. But these consts should probably be along with that code.

I agree, especially about removing all queries from Sync and having it use an API. We could take care of this in bug 1274108.

> Aren't we incrementing the counter twice since we do both here and in the callers?
> Couldn't this be merged in the query iself, maybe using a CASE WHEN construct?

Probably. I didn't do much with this file; all the Sync tests that use the async API currently fail. :-P

> Is there a reason to not just reuse the annotations observer in nsNavBookmarks.cpp?

That could work, sure! This keeps them part of the same transaction, but it doesn't make sense to duplicate this query between the two.

> Doesn't look like sync is handling ANY session annotation, so why is this needed?
> 
> Moreover these will likely move to a memory hash in future.

Great, that makes this simpler. :-) I wasn't sure if it was possible to set an annotation like "bookmarkProperties/description" to expire at the end of a session. If it's not possible, or we don't need to worry about that, this entire bit of code can be removed.

> adjustIndices could be invoked thousands of time, indeed it's invoked basically everywhere we add or remove a bookmark.
> This simple call you make here, means executing thousands of new queries. it's A LOT of I/O.

It probably makes sense to do this in the callers, then. We only need to do this for the parent, though, not any of the children. I don't know if that helps.

> all of this makes me think Sync is just tracking tags the wrong way, as if they were bookmarks rather than property of another bookmark or url. This is very dangerous for any future development.

Isn't that how Places stores tags? Sync only tracks the tagged bookmark, but Places uses a folder for each tag, and a child tag "instance" with an `fk` that points to the bookmark ID in `moz_places`. I only want Sync to track the bookmark with the `fk`, not the tag instance.

> I don't expect any tag to have that annotation... I'm not sure I understand what you are trying to do.

Hmm, I'm trying to filter out tagged items, not tags, that also have `places/excludeFromBackup` set. Is that possible?

> This is a recursive query, very expensive, and you are executing it in a lot of points.
> 
> It may be a cceptable for everyday tagging, but when we are removing all bookmarks to restore a backup, this will completely kill us.

Yeah. Restoring from backup and importing need to be handled specially because they completely change local state. It's no longer possible for Sync to guarantee whether the counter or sync state is accurate, so it needs to reconcile *everything* with what's on the server. We could avoid this query in that case, because we're going to reconcile, anyway.

> This API is deprecated, and it's not really tracking keywords created from the UI since those are using the new API in PlacesUtils.

Is there a chance add-ons might be using this API? The goal is to replicate the semantics ("bug-for-bug compatibility") of the current tracker as much as we can, so we don't want this patch to change what we sync. Sync currently reacts to the observer notification fired by this method, so I think we'd want this as long as the API exists.
(In reply to Marco Bonardo [::mak] from comment #50)
> please don't take this badly, but I honestly think this is the worst time in
> Places history for this kind of change.
> We have duplicate APIs for history, bookmarks, keywords... Tags are in an
> horrible shape. Performance is still horrible. And all the work to resolve
> these situations (caused by previous decisions to invest in sync => async
> conversion) has been interrupted for management decisions.

I agree with you. :-) We're underinvesting in Places, especially considering all the upcoming projects that plan to use it. Your frustration is totally understandable.

> If I should plan this, I'd first try to complete removal of the old
> bookmarks API, the old keywords API, the old transactions (all projects
> abandoned at about 60-70% advancement). Then this could be more manageable
> cause we'd not have tens of APIs to change a single bookmark, most changes
> would be batched in a single API call. As it is, it's just frustrating for
> me.

That would be ideal. I'm reading through the bugs you linked me to. Bug 519514 is almost 7 years old; bug 979280 is two years old. I could mark those bugs as blocking this one, but I don't think that would do much given the current state of things. It's a big chunk of work that needs a lot of attention and time.

If there's a less invasive way to fix bookmarks tracking, I'd strongly favor that, even if it's not as robust. But I think a realistic plan for improving bookmark syncing can't depend on these bugs. Otherwise, it'll never get done.
Thanks for the feedback Mak, that's very valuable. 

(In reply to Marco Bonardo [::mak] from comment #50)
> please don't take this badly, but I honestly think this is the worst time in
> Places history for this kind of change.

"The best time to plant a tree was 20 years ago. The second best time is now."  IOW, as you mention above, I don't believe there is any realistic hope for a well-resourced places team to be formed any time soon and simply waiting isn't going to make it happen.

Note that bookmark corruption is *always* the most complained about feature of Sync on SUMO, for both desktop and android. The corruption is seen regularly in the wild to the point where iOS is refusing to enable bi-directional Sync due to invalid bookmarks trees on the server - almost all of them due to missing parents etc, and almost certainly due to the limitations in the tracker - it is trivial to reproduce this.

So it's simply not an option that we wait another year or 2 before tackling this problem. I agree that sucks, but I don't think you could realistically advocate to management that we do nothing here until this mythical future arrives.

That said though, your specific concerns here are completely valid. While this work must have some impact on the amount of IO, we need to both optimize this as much as possible and measure the impacts in ways that you suggest. Kit is going to start exploring how to reliably measure this and hopefully we can come to some agreement on what is and isn't acceptable here. Regarding the duplicated code and/or the additional code complexity, it's true that we also need to optimize that, but some of that is just the nature of the beast - places is already complicated and has various different ways of achieving the same thing - we don't want to make that worse, but we also can't realistically improve that for the Sync work, nor can we block these Sync improvements on waiting for that to happen.

So thanks for the feedback and we will address the specific concerns you have and welcome any and all ideas to make the tracker reliable in the best possible way, but we don't feel there's any alternative other than continue down this path - so TIA for your future feedback too :)
Comment on attachment 8751133 [details]
MozReview Request: Bug 1258127 - Improve existing bookmarks tracker tests. r=markh

https://reviewboard.mozilla.org/r/51877/#review51138

This looks great, thanks!

::: services/sync/tests/unit/test_bookmark_tracker.js:116
(Diff revision 3)
> -    Svc.Obs.notify("weave:engine:stop-tracking");
>    }
> +});
> +
> +add_task(function* test_onItemAdded() {
> +  _("Verify we've got an empty tracker to work with.");

I know my original version of the patch did this, but let's just kill this |_("Verify we've got an empty tracker to work with.");| entry from each test (or at least replace it with a brief summary of what each test is actually doing)

::: services/sync/tests/unit/test_bookmark_tracker.js:237
(Diff revision 3)
> +    _("Clean up.");
> +    yield cleanup();
> +  }
> +});
> +
> +add_task(function* test_onItemChanged() {

It looks like we have a few similar tests below that are checking changes to the bookmark itself via the same sync APIs, so rename this test function by appending "_tracked_annos" or similar just to make it clear it's not about the item but about the annotations.

::: services/sync/tests/unit/test_bookmark_tracker.js:691
(Diff revision 3)
> +    _("Clean up.");
> +    yield cleanup();
> +  }
> +});
> +
> +add_task(function* test_onPageAnnoChanged() {

We should add a comment here indicating that because this is not an anno we track, we should not track the items. It would also make sense to me to move this up to the other annotation tests above.
Attachment #8751133 - Flags: review?(markh) → review+
Created attachment 8756109 [details]
Bug 1258127 - Add basic test for intensive queries.

Review commit: https://reviewboard.mozilla.org/r/54952/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/54952/
Attachment #8751133 - Attachment description: MozReview Request: Bug 1258127 - Improve existing bookmarks tracker tests. r?markh → MozReview Request: Bug 1258127 - Improve existing bookmarks tracker tests. r=markh
Attachment #8752394 - Flags: feedback?(markh)
Created attachment 8756110 [details]
Bug 1258127 - Explain some sync change counter update queries.

Review commit: https://reviewboard.mozilla.org/r/54954/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/54954/
Comment on attachment 8751132 [details]
MozReview Request: Bug 1258127 - Improve bookmarks engine test logging and remove a deprecated call. r=kitcambridge

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/51875/diff/3-4/
Comment on attachment 8751133 [details]
MozReview Request: Bug 1258127 - Improve existing bookmarks tracker tests. r=markh

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/51877/diff/3-4/
Comment on attachment 8752393 [details]
Bug 1258127 - Update the Places schema to track bookmark sync changes.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/52587/diff/1-2/
Comment on attachment 8751134 [details]
Bug 1258127 - Add `PlacesSyncUtils` methods for pulling bookmark changes from Places.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/51879/diff/3-4/
Comment on attachment 8752394 [details]
Bug 1258127 - Update the bookmarks engine to pull changes from Places.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/52589/diff/1-2/
The additional queries and transactions caused the time to insert 100k bookmarks to jump from 12 seconds to 23 seconds (!), on average, so that's no good.

Based on mak's suggestions, I folded the change counter updates into the existing statements, and the time went back down to 12-13 seconds. I also added some basic "EXPLAIN QUERY PLAN" logging for queries that update the counter.

The "places/excludeFromBackup" check adds 2 subqueries: one to look up the anno ID ("SEARCH TABLE moz_anno_attributes USING COVERING INDEX sqlite_autoindex_moz_anno_attributes_1 (name=?)"), and a second one to check if any bookmarks match it ("SEARCH TABLE moz_items_annos USING COVERING INDEX moz_items_annos_itemattributeindex (item_id=? AND anno_attribute_id=?)").

The keyword update query is more involved:

> 0 0 0 EXECUTE LIST SUBQUERY 0
> 2 0 0 SCAN TABLE moz_bookmarks USING COVERING INDEX moz_bookmarks_itemlastmodifiedindex
> 2 0 0 EXECUTE LIST SUBQUERY 3
> 3 0 0 SEARCH TABLE moz_items_annos USING COVERING INDEX moz_items_annos_itemattributeindex (item_id=? AND anno_attribute_id=?)
> 3 0 0 EXECUTE LIST SUBQUERY 4
> 3 0 0 EXECUTE SCALAR SUBQUERY 4
> 4 0 0 SEARCH TABLE moz_anno_attributes USING COVERING INDEX sqlite_autoindex_moz_anno_attributes_1 (name=?)
> 1 0 0 COMPOUND SUBQUERIES 2 AND 3 USING TEMP B-TREE (EXCEPT)
> 0 0 0 SCAN SUBQUERY 1
> 0 0 0 EXECUTE LIST SUBQUERY 5

...But I'm not sure if this will be hit frequently enough to where building the b-tree will be an issue.
Depends on: 1277983

Updated

a year ago
Whiteboard: [sync-data-integrity]
Whiteboard: [sync-data-integrity] → [data-integrity]
Depends on: 1285408
Comment on attachment 8756109 [details]
Bug 1258127 - Add basic test for intensive queries.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/54952/diff/1-2/
Attachment #8756109 - Attachment description: MozReview Request: Bug 1258127 - Add basic test for intensive queries. → Bug 1258127 - Add basic test for intensive queries.
Attachment #8752393 - Attachment description: MozReview Request: Bug 1258127 - Places schema updates to support tracking Sync changes. f?mak → Bug 1258127 - Places schema updates to support tracking Sync changes.
Attachment #8751134 - Attachment description: MozReview Request: Bug 1258127 - Bookmarks and annotations service updates to support tracking Sync changes. f?mak → Bug 1258127 - Bookmarks and annotations service updates to support tracking Sync changes.
Attachment #8752394 - Attachment description: MozReview Request: Bug 1258127 - Sync engine updates to support tracking changes in Places. f?markh → Bug 1258127 - Sync engine updates to support tracking changes in Places.
Attachment #8756110 - Attachment description: MozReview Request: Bug 1258127 - Explain some sync change counter update queries. → Bug 1258127 - Explain some sync change counter update queries.
Comment on attachment 8752393 [details]
Bug 1258127 - Update the Places schema to track bookmark sync changes.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/52587/diff/2-3/
Comment on attachment 8751134 [details]
Bug 1258127 - Add `PlacesSyncUtils` methods for pulling bookmark changes from Places.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/51879/diff/4-5/
Comment on attachment 8752394 [details]
Bug 1258127 - Update the bookmarks engine to pull changes from Places.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/52589/diff/2-3/
Comment on attachment 8756110 [details]
Bug 1258127 - Explain some sync change counter update queries.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/54954/diff/1-2/
Attachment #8751132 - Attachment is obsolete: true
Attachment #8751133 - Attachment is obsolete: true
Rebased, and getting the tests closer to passing. Not requesting f? just yet, but drive-by reviews welcome. test_{nested}_batch_tracking fails because of the scoring changes, as does test_onItemExcluded. Next step is to move the "excludeFromBackup" anno handling into getChangedIDs, fix up scoring, and figure out reconciling bookmarks with SYNC_STATUS_UNKNOWN.
Comment on attachment 8756109 [details]
Bug 1258127 - Add basic test for intensive queries.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/54952/diff/2-3/
Comment on attachment 8752393 [details]
Bug 1258127 - Update the Places schema to track bookmark sync changes.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/52587/diff/3-4/
Comment on attachment 8751134 [details]
Bug 1258127 - Add `PlacesSyncUtils` methods for pulling bookmark changes from Places.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/51879/diff/5-6/
Comment on attachment 8752394 [details]
Bug 1258127 - Update the bookmarks engine to pull changes from Places.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/52589/diff/3-4/
Comment on attachment 8756110 [details]
Bug 1258127 - Explain some sync change counter update queries.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/54954/diff/2-3/
Comment on attachment 8756109 [details]
Bug 1258127 - Add basic test for intensive queries.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/54952/diff/3-4/
Comment on attachment 8752393 [details]
Bug 1258127 - Update the Places schema to track bookmark sync changes.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/52587/diff/4-5/
Comment on attachment 8751134 [details]
Bug 1258127 - Add `PlacesSyncUtils` methods for pulling bookmark changes from Places.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/51879/diff/6-7/
Comment on attachment 8752394 [details]
Bug 1258127 - Update the bookmarks engine to pull changes from Places.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/52589/diff/4-5/
Comment on attachment 8756110 [details]
Bug 1258127 - Explain some sync change counter update queries.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/54954/diff/3-4/
The procedure for filtering out bogus "excludeFromBackup" annos was a bit hairy to write (there's a test that adds a very confused tree, which this handles), but should do the job. All the tracker tests pass now.

Also, it occurs to me that using a trigger to insert into `moz_bookmarks_deleted` has the same problem as the insert and update calls: if Sync receives a tombstone (a record with a `.deleted` property) from another client, and deletes the record...it'll create a row in the table for its own deletion. It's not clear to me yet if that's going to be a problem. It'll definitely cause extra syncs, but, since deletions for nonexistent items won't record a change, it's not as bad as inserts or updates.

On to reconciliation.
A naive approach for recovering from database corruption:

1. Places detects corruption and restores a previous version of the DB, or the user manually restores a backup. A transaction sets "syncStatus = UNKNOWN" and "syncChangeCounter = 0" for all bookmarks.

2. Before a sync, we query Places for all bookmarks with status UNKNOWN, and fetch them from the server. IIRC, we don't have a batch "get me the records with these IDs" API, so I think Sync would need to fetch everything from t = 0 instead of t = last sync. As an optimization, maybe we could only fetch bookmarks in the interval [oldest UNKNOWN, newest UNKNOWN], but there's the issue of clock skew pointed out earlier.

3. Use the de-duping logic that we have now. I don't think we need to do anything special for this; `recordHandler` should take care of processing the records.

4. After applying each record, reset its SYNC_STATUS back to NORMAL and bump its change counter. Other clients will receive the reconciled record the next time they sync. (And typically do nothing, since their dupe logic will kick in).

At one time, we talked about having UNKNOWN be per-database, instead of per-bookmark. I think it has to be per-bookmark because we apply incoming records directly, instead of staging them as on iOS. I'd expect reconciliation to take a while, and it's very likely the user will close the browser or interrupt the sync. In that case, we want it to pick up reconciling (mostly) where it left off.

Richard, there's likely a lot that I'm missing, and this is by no means robust. I'm trying to keep this simple for a first cut, since the tracker rewrite is shaping up to be a large chunk of work as it is. But, if you have a chance, I'd love your insights.
Flags: needinfo?(rnewman)
Oh, unlike recovery or restore, I think import can set "syncStatus = NEW" for imported bookmarks. The old bookmarks are still valid, so we don't need to reconcile everything.

I'll mull all this over and start sketching out an implementation tomorrow.
(In reply to Kit Cambridge [:kitcambridge] (Unavailable from 2016-07-18 to 2016-08-02) from comment #80)
> 2. Before a sync, we query Places for all bookmarks with status UNKNOWN, and
> fetch them from the server. IIRC, we don't have a batch "get me the records
> with these IDs" API

I think we do: setting .ids to an array and .full to true. I expect we will still need *some* batching as eventually the URL would get extremely long.

> 3. Use the de-duping logic that we have now. I don't think we need to do
> anything special for this; `recordHandler` should take care of processing
> the records.

I believe that's correct. Further, I think we should consider having the normal case be far less aggressive with de-duping (although probably not as part of this bug)

> 4. After applying each record, reset its SYNC_STATUS back to NORMAL and bump
> its change counter. Other clients will receive the reconciled record the
> next time they sync. (And typically do nothing, since their dupe logic will
> kick in).

I'm not sure why their dupe logic matters here if this client has done the de-duping correctly? ie, I'd expect the other clients to see either an updated record (as we de-duped and possibly updated an attribute or 2) or to see a brand new record. If the other clients de-dupe, ISTM that the other client will by definition now have a different state than the client that initially handled the UNKNOWN status records (And thus initially de-duped)?

> At one time, we talked about having UNKNOWN be per-database, instead of
> per-bookmark. I think it has to be per-bookmark because we apply incoming
> records directly, instead of staging them as on iOS. I'd expect
> reconciliation to take a while, and it's very likely the user will close the
> browser or interrupt the sync. In that case, we want it to pick up
> reconciling (mostly) where it left off.

That sounds correct to me.

> Richard, there's likely a lot that I'm missing, and this is by no means
> robust. I'm trying to keep this simple for a first cut, since the tracker
> rewrite is shaping up to be a large chunk of work as it is. But, if you have
> a chance, I'd love your insights.

Yes, that would be great. However, note that some, if not all, of that could possibly be in a different bug. IMO it's still an excellent outcome if all this bug delivers is a tracker that doesn't miss changes but otherwise has all the existing flaws in terms of handling bookmark restore and corruption.
https://reviewboard.mozilla.org/r/52587/#review60492

This LGTM

::: toolkit/components/places/nsPlacesTables.h:116
(Diff revision 5)
> +       this to SYNC_STATUS_NEW = 1 for new bookmarks, and SYNC_STATUS_NORMAL = 2
> +       after the first sync. Restoring bookmarks should set the status to
> +       SYNC_STATUS_UNKNOWN, allowing Sync to reconcile the restored bookmarks
> +       with those already on the server.
> +
> +       The entire SYNC_STATUS_UNKNOWN vs SYNC_STATUS_NEW

IIUC, our thinking is converging on these status flags being the right thing, so this comment should probably be removed.

::: toolkit/components/places/nsPlacesTables.h:124
(Diff revision 5)
> +       if we can make this reliable, Sync might choose to not be quite as
> +       agressive in de-duping, so we can avoid existing bugs relating to
> +       multiple bookmarks with the same name)
> +    */ \
> +    ", syncStatus INTEGER NOT NULL DEFAULT 0" \
> +    /* Callers should set the sync change counter when inserting bookmarks. */ \

I think you should change this to say "set the sync change counter to 1" to make it clearer.

::: toolkit/components/places/nsPlacesTriggers.h:272
(Diff revision 5)
> +// This trigger will insert a row into moz_bookmarks_deleted when a bookmark
> +// is deleted.
> +#define CREATE_BOOKMARKS_DELETED_AFTERDELETE_TRIGGER NS_LITERAL_CSTRING( \
> +  "CREATE TEMP TRIGGER moz_bookmarks_insert_deleted " \
> +  "AFTER DELETE ON moz_bookmarks FOR EACH ROW " \
> +  /* 0 == SYNC_STATUS_UNKNOWN, 2 == SYNC_STATUS_NORMAL */ \

it probably can't hurt to add a comment indicating why _NEW doesn't need the new row (obvious to us, but not necessarily to others)
https://reviewboard.mozilla.org/r/51879/#review60500

This is looking good to me. It's a shame there's so much complexity around tags - it might be worth exploring ways to avoid that, but nothing springs to mind :( Hopefully mak can take a look t this soonish.

::: toolkit/components/places/Bookmarks.jsm:1187
(Diff revision 7)
>        yield db.executeCached(
>          `DELETE FROM moz_bookmarks WHERE guid = :guid`, { guid: item.guid });
>  
> -      // Fix indices in the parent.
> +      // Fix indices in the parent and mark it as having a sync change.
>        yield db.executeCached(
> -        `UPDATE moz_bookmarks SET position = position - 1 WHERE
> +        `UPDATE moz_bookmarks

this change in formatting looks unnecessary (and there's no need for you to be on blame for this line ;)

::: toolkit/components/places/PlacesSyncUtils.jsm:98
(Diff revision 7)
> +
> +    let db = yield PlacesUtils.promiseDBConnection();
> +    yield db.executeCached(`
> +      SELECT id, guid, parent, lastModified, syncChangeCounter
> +      FROM moz_bookmarks
> +      WHERE syncChangeCounter > 0

do we need to exclude UNKNOWN status here?

::: toolkit/components/places/PlacesSyncUtils.jsm:128
(Diff revision 7)
> +        // We still need to check all ancestors in case we have an unexcluded
> +        // child with an excluded ancestor (U > U > E > U > E). To avoid
> +        // uploading a partial tree, we'll need to include the previously
> +        // excluded ancestor. This shouldn't happen with a valid tree: if an
> +        // ancestor is excluded, all its descendants should be excluded, too.
> +        // Unfortunately, this isn't true for the mobile root, which is marked

I wonder if we can repair the mobile root as part of this patch and ignore addons doing damage?

I also thought there were cases in the "left pane items" where only the parent had the annotation but children weren't guaranteed to - in which case the children *should* be excluded - but this case would seem to directly contradict the mobile root current reality? 

Maybe special casing the mobile root ID would be another alternative?

::: toolkit/components/places/PlacesSyncUtils.jsm:131
(Diff revision 7)
> +        // excluded ancestor. This shouldn't happen with a valid tree: if an
> +        // ancestor is excluded, all its descendants should be excluded, too.
> +        // Unfortunately, this isn't true for the mobile root, which is marked
> +        // as excluded even though its children are not (bug 647605,
> +        // bug 1274496). It's also possible for add-ons to set this annotation.
> +        itemExcluded = itemExcluded && (item.id in excludedIds ? excludedIds[item.id] : (

this seems quite complex - it's difficult to read and  excludedIds[item.id] looks like it will be set in 2 cases.

I haven't thought about this too much and haven't got to the tests yet, but hopefully this has excellent test coverage ;)

::: toolkit/components/places/PlacesUtils.jsm:1688
(Diff revision 7)
>            let val = aRow.getResultByName(prop);
>            if (val !== null)
>              item[prop] = val;
>          }
>        };
> -      copyProps("guid", "title", "index", "dateAdded", "lastModified");
> +      copyProps("guid", "title", "index", "dateAdded", "lastModified", "syncChangeCounter");

I wonder if we should require |includeSyncChangeCounter| or similar in options, just to help keeping that value hidden from casual callers?

::: toolkit/components/places/nsAnnotationService.cpp:1984
(Diff revision 7)
> +            "SELECT item_id from moz_items_annos WHERE "
> +              "expiration = :expire_session AND "
> +              "anno_attribute_id IN ("
> +                "SELECT id FROM moz_anno_attributes WHERE name IN ("
> +                  // These should match the annotations in `IsSyncedAnnotation`.
> +                  "'bookmarkProperties/description', "

I don't really understand session annotation expiry, but I'd be surprised if any of thise annos have that expiry, and if they don't it seems unlikely we need this?

::: toolkit/components/places/nsNavBookmarks.cpp:2335
(Diff revision 7)
>    NS_ENSURE_SUCCESS(rv, rv);
>    NS_ENSURE_TRUE(aNewIndex < folderCount, NS_ERROR_INVALID_ARG);
>    // Check the parent's guid is the expected one.
>    MOZ_ASSERT(bookmark.parentGuid == folderGuid);
>  
> +  mozStorageTransaction transaction(mDB->MainConn(), false);

it looks like some callers of this already have a transaction in place, and that sqlite will get upset with nested transactions (although it looks like maybe mozStorageTransaction will ignore the error in that case?)
https://reviewboard.mozilla.org/r/52589/#review60520

This is looking good from a quick glance, but (as I'm quite sure you know) there's a number of TODOs that will need to be addressed before landing.

::: services/sync/modules/engines/bookmarks.js:442
(Diff revision 5)
> +  getChangedIDs() {
> +    return Async.promiseSpinningly(this._tracker.promiseChangedIDs());
> +  },
> +
> +  getAllIDs() {
> +    // TODO: Do we need all modified timestamps to be "0" for the first sync?

I'm surprised this is calling promiseChangedIDs
(In reply to Mark Hammond [:markh] from comment #82)
> I think we do: setting .ids to an array and .full to true. I expect we will
> still need *some* batching as eventually the URL would get extremely long.

Oh, excellent! Sure enough, I see those explained in https://docs.services.mozilla.com/storage/apis-1.5.html.

> I'm not sure why their dupe logic matters here if this client has done the
> de-duping correctly? ie, I'd expect the other clients to see either an
> updated record (as we de-duped and possibly updated an attribute or 2) or to
> see a brand new record. If the other clients de-dupe, ISTM that the other
> client will by definition now have a different state than the client that
> initially handled the UNKNOWN status records (And thus initially de-duped)?

I think the case I was worried about here is the one Richard pointed out in https://github.com/mozilla/fxa-auth-server/issues/1316#issuecomment-231064542. In other words, reconciliation could produce a merged bookmark that no other client would see until the next "real" change to that bookmark on this client. We don't know the delta between what we restored and what's on the server, so I think we'll want to bump the change counter and upload all bookmarks that we reconcile.

However, it's more likely I'm not thinking this through correctly. :-)

> Yes, that would be great. However, note that some, if not all, of that could
> possibly be in a different bug. IMO it's still an excellent outcome if all
> this bug delivers is a tracker that doesn't miss changes but otherwise has
> all the existing flaws in terms of handling bookmark restore and corruption.

Sounds good! I think we currently wipe the server in the event of one client restoring from a backup, so there's room for improvement. I'll do some cleanup first, and we can talk more about this part at our 1:1.
https://reviewboard.mozilla.org/r/51879/#review60500

Indeed, I think this needs some cleanup before it's ready for f?/r? I wanted to get the tracker tests working first.

> this change in formatting looks unnecessary (and there's no need for you to be on blame for this line ;)

I think I added a change counter update here, then removed it because it was wrong. I'll go through the unified diff and remove other spurious changes like this.

> do we need to exclude UNKNOWN status here?

With the approach in comment 82, yes; I'd like a separate accessor for unknown bookmarks.

> I wonder if we can repair the mobile root as part of this patch and ignore addons doing damage?
> 
> I also thought there were cases in the "left pane items" where only the parent had the annotation but children weren't guaranteed to - in which case the children *should* be excluded - but this case would seem to directly contradict the mobile root current reality? 
> 
> Maybe special casing the mobile root ID would be another alternative?

For the organizer, I think it *is* guaranteed that children are excluded if the parents are. Mobile is the only exception that we know about. I definitely think it's worthwhile to make this part of the repair process. The logic is hairy because we need to make sure we don't upload an incomplete tree, if some of the ancestors have bogus annos. Thankfully, we don't need to duplicate it anywhere else, so that's a win. Thanks for pushing to move this into Sync!

Unrelatedly: I know "unexcluded" isn't a word, but I feel like it works better in this context than "included". WDYT?

> this seems quite complex - it's difficult to read and  excludedIds[item.id] looks like it will be set in 2 cases.
> 
> I haven't thought about this too much and haven't got to the tests yet, but hopefully this has excellent test coverage ;)

This could definitely be unrolled, and the comment paragraph split up.

> I wonder if we should require |includeSyncChangeCounter| or similar in options, just to help keeping that value hidden from casual callers?

Good point, though, now that we have `getChanges`, I don't think we need to expose this at all.

> I don't really understand session annotation expiry, but I'd be surprised if any of thise annos have that expiry, and if they don't it seems unlikely we need this?

Oops, yes; these need to go. I took your advice elsewhere and removed Sync-specific tracking of annotations.
https://reviewboard.mozilla.org/r/51879/#review60500

> it looks like some callers of this already have a transaction in place, and that sqlite will get upset with nested transactions (although it looks like maybe mozStorageTransaction will ignore the error in that case?)

Quite so. mozStorageTransaction ignores the failure if a transaction is already in progress, so this is fine, but Sqlite.jsm doesn't (bug 1079180).
Comment on attachment 8756109 [details]
Bug 1258127 - Add basic test for intensive queries.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/54952/diff/4-5/
Comment on attachment 8752393 [details]
Bug 1258127 - Update the Places schema to track bookmark sync changes.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/52587/diff/5-6/
Comment on attachment 8751134 [details]
Bug 1258127 - Add `PlacesSyncUtils` methods for pulling bookmark changes from Places.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/51879/diff/7-8/
Comment on attachment 8752394 [details]
Bug 1258127 - Update the bookmarks engine to pull changes from Places.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/52589/diff/5-6/
Comment on attachment 8756110 [details]
Bug 1258127 - Explain some sync change counter update queries.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/54954/diff/4-5/
The tests pass! \o/ We need new tests; I don't think our existing coverage is nearly up to snuff. But it's a nice milestone. As with the other bugs, I'll clean this up tonight and tomorrow (addressing Mark's feedback and removing spurious changes), then request feedback.
Comment on attachment 8756109 [details]
Bug 1258127 - Add basic test for intensive queries.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/54952/diff/5-6/
Comment on attachment 8752393 [details]
Bug 1258127 - Update the Places schema to track bookmark sync changes.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/52587/diff/6-7/
Comment on attachment 8751134 [details]
Bug 1258127 - Add `PlacesSyncUtils` methods for pulling bookmark changes from Places.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/51879/diff/8-9/
Comment on attachment 8752394 [details]
Bug 1258127 - Update the bookmarks engine to pull changes from Places.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/52589/diff/6-7/
Comment on attachment 8756110 [details]
Bug 1258127 - Explain some sync change counter update queries.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/54954/diff/5-6/
Comment on attachment 8756109 [details]
Bug 1258127 - Add basic test for intensive queries.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/54952/diff/6-7/
Comment on attachment 8752393 [details]
Bug 1258127 - Update the Places schema to track bookmark sync changes.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/52587/diff/7-8/
Comment on attachment 8751134 [details]
Bug 1258127 - Add `PlacesSyncUtils` methods for pulling bookmark changes from Places.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/51879/diff/9-10/
Comment on attachment 8752394 [details]
Bug 1258127 - Update the bookmarks engine to pull changes from Places.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/52589/diff/7-8/
Comment on attachment 8756110 [details]
Bug 1258127 - Explain some sync change counter update queries.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/54954/diff/6-7/
(Reporter)

Comment 105

a year ago
(In reply to Kit Cambridge [:kitcambridge] (Unavailable from 2016-07-18 to 2016-08-02) from comment #80)
> A naive approach for recovering from database corruption:

This sounds right to me. Mark covered the rest :)

(In reply to Mark Hammond [:markh] from comment #82)

> I believe that's correct. Further, I think we should consider having the
> normal case be far less aggressive with de-duping (although probably not as
> part of this bug)

Concur.

FTR, iOS dedupes in precisely one situation: leaves of a completely matching structure will be deduped if there's no GUID match. That is, if you have on the server:

   menu________  "Bookmarks Menu"
     foobarbaznoo "Foo"
       child1     "ABC"
       child2     "DEF"

and on the client:

   menu________  "Bookmarks Menu"
     foobarbaznoo "Foo"
       localguid1  "DEF"
       localguid2  "GHI"
  
You'll end up with a folder named "Foo" in the menu with children "ABC", "DEF", "GHI". It deliberately doesn't do partial structure merging like guidMap does.

I think deduping does still need to exist on desktop, but not be as aggressive. The difficulty is that non-aggressive duping requires top-down tree walking.

 
> IMO it's still an excellent outcome if all
> this bug delivers is a tracker that doesn't miss changes but otherwise has
> all the existing flaws in terms of handling bookmark restore and corruption.

Hell yes.
Flags: needinfo?(rnewman)

Updated

a year ago
Blocks: 1235269
Blocks: 1228827
Blocks: 1122834

Updated

a year ago
Whiteboard: [data-integrity] → [data-integrity][tracker]

Updated

a year ago
Whiteboard: [data-integrity][tracker] → [tracker]

Updated

a year ago
Whiteboard: [tracker] → [sync-tracker]

Updated

a year ago
Duplicate of this bug: 1235269
While this problem exists for all our trackers, this bug has morphed into just fixing bookmarks.
Summary: Sync tracker persistence can cause inconsistency → Sync bookmark tracker persistence can cause inconsistency
Comment on attachment 8756109 [details]
Bug 1258127 - Add basic test for intensive queries.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/54952/diff/7-8/
Comment on attachment 8752393 [details]
Bug 1258127 - Update the Places schema to track bookmark sync changes.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/52587/diff/8-9/
Comment on attachment 8751134 [details]
Bug 1258127 - Add `PlacesSyncUtils` methods for pulling bookmark changes from Places.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/51879/diff/10-11/
Comment on attachment 8752394 [details]
Bug 1258127 - Update the bookmarks engine to pull changes from Places.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/52589/diff/8-9/
Comment on attachment 8756110 [details]
Bug 1258127 - Explain some sync change counter update queries.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/54954/diff/7-8/
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Attachment #8756109 - Attachment is obsolete: true
Attachment #8756110 - Attachment is obsolete: true
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Blocks: 1295520
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
(Assignee)

Comment 128

a year ago
mozreview-review
Comment on attachment 8751134 [details]
Bug 1258127 - Add `PlacesSyncUtils` methods for pulling bookmark changes from Places.

https://reviewboard.mozilla.org/r/51879/#review69710

OK, I think this is ready for a first look. The Sync unit tests pass, but that doesn't mean much; I discovered some issues with `fetch{All, New}Items` that would've shown up quickly with a real sync. I'll keep adding tests for the new PlacesSyncUtils methods, which is much easier to unit test than mocking out the Sync machinery.

::: toolkit/components/places/Bookmarks.jsm:382
(Diff revision 16)
>          // If the item was moved, notify onItemMoved.
> -        if (item.parentGuid != updatedItem.parentGuid ||
> -            item.index != updatedItem.index) {
> +        let newParent = item.parentGuid != updatedItem.parentGuid;
> +        if (newParent) {
> +          // Remove the Sync orphan annotation, so that Sync doesn't try to
> +          // reparent the item once it sees the original parent.
> +          PlacesUtils.annotations.removeItemAnnotation(updatedItem._id,

This only removes the orphan anno if the bookmark is reparented. Alternatively, we could remove the anno for any modification where source != SOURCE_SYNC. Arguably, if the user modifies an orphan, having Sync reparent it later could be confusing.

::: toolkit/components/places/Bookmarks.jsm:909
(Diff revision 16)
> -          yield setAncestorsLastModified(db, item.parentGuid, info.lastModified);
> +          yield setAncestorsLastModified(db, item.parentGuid, info.lastModified,
> +                                         ancestorsSyncChangeDelta);
> +        }
> +        yield setAncestorsLastModified(db, newParent.guid, info.lastModified,
> +                                       ancestorsSyncChangeDelta);
> +      } else if (itemNeedsSyncChange) {

Needs a comment: this is an `else if` and not an `if` because we don't want to give the item a sync change if we're moving it within the same parent.

::: toolkit/components/places/PlacesSyncUtils.jsm:189
(Diff revision 16)
> +                 b.syncChangeCounter, b.syncStatus
> +          FROM moz_bookmarks b
> +          JOIN moz_bookmarks p ON p.id = b.parent
> +          WHERE b.syncChangeCounter > 0
> +          UNION ALL
> +          SELECT d.id, d.guid, IFNULL(p.guid, dp.guid), :now, -1, :syncNormal

`IFNULL` in case the parent was deleted and has a tombstone.

::: toolkit/components/places/PlacesSyncUtils.jsm:191
(Diff revision 16)
> +          JOIN moz_bookmarks p ON p.id = b.parent
> +          WHERE b.syncChangeCounter > 0
> +          UNION ALL
> +          SELECT d.id, d.guid, IFNULL(p.guid, dp.guid), :now, -1, :syncNormal
> +          FROM moz_bookmarks_deleted d
> +          LEFT JOIN moz_bookmarks p ON p.id = d.parent

The trigger might be a better place for this. We could add a `parentGuid` column to `moz_bookmarks_deleted` and store the former parent GUID instead of just the ID.

::: toolkit/components/places/PlacesSyncUtils.jsm:200
(Diff revision 16)
> +          row => {
> +            let lastModified = PlacesUtils.toDate(row.getResultByName(
> +              "lastModified"));
> +            let guid = row.getResultByName("guid");
> +            items[guid] = {
> +              key: guid,

We need the GUID instead of the ID here because autoincrementing IDs can be reused after deletion, so they're unsuitable as a map key.

Also, this is called `key` instead of `guid` because it's mutable: Sync uses this to translate Places GUIDs to "special GUIDs." Bug 1295521 should clean this up.

::: toolkit/components/places/PlacesSyncUtils.jsm:219
(Diff revision 16)
> +    );
> +  },
> +
> +  /**
> +   * Fetches change records for all items, excluding tags and items excluded
> +   * from backups.

This also doesn't return tombstones, unlike `fetchNewChanges`, but probably should.

::: toolkit/components/places/PlacesSyncUtils.jsm:1178
(Diff revision 16)
> +      if (!itemExcluded) {
> +        // Once we see an unexcluded item, all its parents should be
> +        // unexcluded, even if they have the annotation. We need to set
> +        // this for all ancestors in case we have an unexcluded descendant
> +        // with an excluded ancestor (U > U > E > U). We also unconditionally
> +        // include deleted items.

We don't actually include deleted items unconditionally yet. One approach is to check if it's a tombstone (`item.counter < 0`), and always include it if yes. That does mean we'll upload tombstone records for excluded items.

Another approach is to set `syncStatus = "NEW"` for items with the exclude anno, so that we don't write tombstones for them in the first place.

::: toolkit/components/places/PlacesSyncUtils.jsm:1193
(Diff revision 16)
> +    }
> +  }
> +  return excludedGuids;
> +}
> +
> +function isItemIdExcluded(id) {

I broke this out into its own function, thinking to prevent `remove` from removing excluded items. On further reflection, I don't think that's necessary, so this can be folded back in to `determineExcludedGuids`.

::: toolkit/components/places/nsNavBookmarks.cpp:1603
(Diff revision 16)
> +
> +  nsCOMPtr<mozIStorageStatement> statement = mDB->GetStatement(
> +   "UPDATE moz_bookmarks SET "
> +    "syncChangeCounter = syncChangeCounter + :delta "
> +   "WHERE fk = (SELECT id FROM moz_places WHERE url = :page_url) AND "
> +   "id != :tag_id"

This could probably use the same query as `addKeywordSyncChange`, and then we wouldn't need to pass aTagId.
Attachment #8751134 - Flags: feedback?(markh)
Attachment #8752394 - Flags: feedback?(markh)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
(Assignee)

Comment 134

a year ago
mozreview-review
Comment on attachment 8751134 [details]
Bug 1258127 - Add `PlacesSyncUtils` methods for pulling bookmark changes from Places.

https://reviewboard.mozilla.org/r/51879/#review70068

::: toolkit/components/places/nsNavBookmarks.cpp:1669
(Diff revision 18)
> +
> +  return statement->Execute();
> +}
> +
> +nsresult
> +nsNavBookmarks::InsertTombstone(const BookmarkData& aBookmark)

So, I removed the tombstone trigger as well, because it had the same issue as the update trigger. We don't want Sync to track its own deletions, and the trigger doesn't have access to the state that we have in the remove functions.

::: toolkit/components/places/nsNavBookmarks.cpp:1694
(Diff revision 18)
> +
> +  return stmt->Execute();
> +}
> +
> +nsresult
> +nsNavBookmarks::InsertTombstones(const nsTArray<TombstoneData>& aTombstones)

This is tedious (the same operation in Bookmarks.jsm's `removeFoldersContents` is 10 lines, total), but it lets us write tombstones for all records in a single statement.
Attachment #8751134 - Flags: feedback?(markh)
Attachment #8752394 - Flags: feedback?(markh)

Comment 135

a year ago
mozreview-review
Comment on attachment 8751134 [details]
Bug 1258127 - Add `PlacesSyncUtils` methods for pulling bookmark changes from Places.

https://reviewboard.mozilla.org/r/51879/#review70038

This is looking great from my POV. It's large (but we always knew it would be) and not particularly complex in terms of understanding what is going on - apart from some of the tags work, but I don't see how to improve that. The logic duplicated between c++ and js is also a shame, but that should eventually die as the old deprecated APIs are deleted.

I think :mak should have a look when you think it is ready.

::: toolkit/components/places/Bookmarks.jsm:97
(Diff revision 17)
>    TYPE_FOLDER: 2,
>    TYPE_SEPARATOR: 3,
>  
>    /**
> +   * Sync status constants.
> +   * These should stay consistent with nsINavBookmarksService.idl

This comment seems wrong as they are already defined in terms of that interface.

::: toolkit/components/places/Bookmarks.jsm:1596
(Diff revision 17)
> +    let syncedItems = itemsRemoved.filter(needsTombstone);
> +    if (syncedItems.length) {
> +      yield db.executeCached(`
> +        INSERT INTO moz_bookmarks_deleted (guid, parentGuid)
> +        ${syncedItems.map(item => `VALUES (
> +          ${JSON.stringify(item.guid)},

JSON.stringify here seems a little odd as we kow we have strings, but I've no idea if that's sane or not - I'm sure mak will have an opinion.

::: toolkit/components/places/Bookmarks.jsm:1661
(Diff revision 17)
> +
> +function determineSyncChangeDelta(item) {
> +  return item.source == Bookmarks.SOURCES.SYNC ? 0 : 1;
> +}
> +
> +function determineSyncStatus(item) {

IIUC, this function is used only for the status for "new" items? If so, it might make sense to change the name to reflect that (eg, if it was ever used on an update for an item that already has, say, .NORMAL, it would end up being replaced with .NEW)

::: toolkit/components/places/PlacesSyncUtils.jsm:175
(Diff revision 17)
> +   *
> +   * @param onItem [optional]
> +   *        A transformation function invoked for each change record. See
> +   *        `itemsToChangeRecords` for details.
> +   * @return {Promise} resolved once all items have been fetched.
> +   * @resolves to an object containing records for changed bookmarks.

I think it's worthwhile expanding this slightly (ie, something like "an object with sync record objects keyed by the items GUID" or similar). Alternatively, and although rnewman seems to hate them, a Set might be another option, as it's a little more self-descriptive ("a set of Sync records, keyed by GUID")

(I'm not too bothered by this, but when reading the description I was wondering if you actually meant an array)

::: toolkit/components/places/PlacesSyncUtils.jsm:1172
(Diff revision 17)
> +  let excludedGuids = {};
> +  for (let guid in items) {
> +    let itemExcluded = true;
> +    for (let item = items[guid]; item; item = items[item.parentGuid]) {
> +      if (!itemExcluded) {
> +        // Once we see an unexcluded item, all its parents should be

This seems a little dangerous, and I assume is mainly for the incorrect attribution on the mobile root?

If that's the case, I wonder if we should include a short-term fix for the mobile root - have a "schema upgrade" remove the anno and just remove that anno from Sync's creation of the bookmark?

::: toolkit/components/places/PlacesSyncUtils.jsm:1185
(Diff revision 17)
> +      if (item.key in excludedGuids) {
> +        // We already walked this item and its ancestors, so we can move on
> +        // to the next item.
> +        break;
> +      }
> +      // TODO(kitcambridge): `item.id` no longer exists.

I don't understand this todo

Comment 136

a year ago
mozreview-review
Comment on attachment 8752394 [details]
Bug 1258127 - Update the bookmarks engine to pull changes from Places.

https://reviewboard.mozilla.org/r/52589/#review70086

This looks great. test_bookmark_tracker.js still needs some cleanup, but I don't see anything that worries me here.

::: services/sync/modules/engines.js:860
(Diff revision 16)
>      Svc.Prefs.set(this.name + ".lastSyncLocal", value.toString());
>    },
>  
>    /*
> -   * Returns a mapping of IDs -> changed timestamp. Engine implementations
> -   * can override this method to bypass the tracker for certain or all
> +   * Returns an initial change set for this sync. The change set is a mapping of
> +   * IDs -> opaque change record. By default, the record is the last change in

I don't understand what "By default, the record is the last change in seconds" is tryng to say

::: services/sync/tests/unit/test_bookmark_tracker.js:179
(Diff revision 16)
>        folder, Utils.makeURI("http://getfirefox.com"),
>        PlacesUtils.bookmarks.DEFAULT_INDEX, "Get Firefox!");
>    }
>  
>    try {
> -    _("Create bookmark. Won't show because we haven't started tracking yet");
> +    // _("Create bookmark. Won't show because we haven't started tracking yet");

We should delete these commented lines that we don't intend uncommenting (although there's probably some value in the score checks, even if they need to change a little)

::: services/sync/tests/unit/test_bookmark_tracker.js
(Diff revision 16)
>              PlacesUtils.bookmarks.bookmarksMenuFolder,
>              "Test Folder", PlacesUtils.bookmarks.DEFAULT_INDEX);
>            // We should be tracking the new folder and its parent (and need to jump
>            // through blocking hoops...)
>            Async.promiseSpinningly(Task.spawn(verifyTrackedCount(2)));
> -          // But not have bumped the score.

this comment still seems relevant
Attachment #8751134 - Flags: feedback?(markh) → feedback+
Attachment #8752394 - Flags: feedback?(markh) → feedback+
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
(Assignee)

Comment 140

a year ago
mozreview-review-reply
Comment on attachment 8751134 [details]
Bug 1258127 - Add `PlacesSyncUtils` methods for pulling bookmark changes from Places.

https://reviewboard.mozilla.org/r/51879/#review70038

Thanks for the initial pass, Mark! I noted some issues above, and I'd like to bump the test coverage up for `fetchAllChanges` and `fetchNewChanges`, so that I can be more confident it's behaving as we expect.

> JSON.stringify here seems a little odd as we kow we have strings, but I've no idea if that's sane or not - I'm sure mak will have an opinion.

I saw this used in some of the other queries. Seems like it's used to quote and escape the string, to avoid hand-rolling a SQL escape function.

> IIUC, this function is used only for the status for "new" items? If so, it might make sense to change the name to reflect that (eg, if it was ever used on an update for an item that already has, say, .NORMAL, it would end up being replaced with .NEW)

Renamed to `determineInitialSyncStatus`, and changed it to accept a source like the C++ version.

> I think it's worthwhile expanding this slightly (ie, something like "an object with sync record objects keyed by the items GUID" or similar). Alternatively, and although rnewman seems to hate them, a Set might be another option, as it's a little more self-descriptive ("a set of Sync records, keyed by GUID")
> 
> (I'm not too bothered by this, but when reading the description I was wondering if you actually meant an array)

I renamed this to a "change map", but maybe "change object" would be better. Earlier, I was using "change set" in the VCS sense, but I agree it's too easily mixed up with JS data structures. I describe the format of the structure a bit more in `filterChangeRecords`; could mention that here.

> This seems a little dangerous, and I assume is mainly for the incorrect attribution on the mobile root?
> 
> If that's the case, I wonder if we should include a short-term fix for the mobile root - have a "schema upgrade" remove the anno and just remove that anno from Sync's creation of the bookmark?

We talked about this today. I think the right course of action is:

* Exclude all descendants of an excluded folder. (We don't want to overinclude, and we'll still end up with a consistent tree).
* Add a temporary workaround specifically for the mobile root, ignoring its exclude anno.
* Make mobile a proper root in bug 647605, and remove the workaround.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
I wasn't happy with the complexity of the tag handling, so I removed all special cases for tags. We'll now bump the change counter like we do for any other bookmark, and filter out tags at query time. We still check and ignore tags when writing tombstones, but that's a lot more manageable.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Mark pointed out yesterday that it's unclear why we consider the "excludeFromBackup" anno in the first place. The only items that should have that anno are the left pane queries, and those are under a "PlacesRoot" folder (http://searchfox.org/mozilla-central/rev/b38dbd1378cea4ae83bbc8a834cdccd02bbc5347/browser/components/places/PlacesUIUtils.jsm#1327-1328) that's ignored entirely in `getAllIDs` (http://searchfox.org/mozilla-central/rev/bb22cc4067c3832b943507497389b0b13d6f3a2b/services/sync/modules/engines/bookmarks.js#854-857,862).

Further, `getAllIDs` *doesn't* currently filter out excluded bookmarks; even if a bookmark under the menu, toolbar, or unfiled had that anno, it would still be uploaded on the first sync.

With that in mind, I think we can remove the tree walking in `determineExcludedGuids` and excluded item handling completely. `pullAllChanges` and `pullNewChanges` can use a recursive query to restrict tracked bookmarks to the roots we care about.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Depends on: 1274496
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
For those following along: the most recent version removes the hideous query to reconstruct the ancestry for tombstones, and just stores the root instead. I think it's still possible to keep the common case (Sync not set up; all bookmarks are "NEW") speedy. This should also mop up the edge cases from bug 1274496.

I need to add more tests to make sure we're not tracking stuff that we shouldn't, but filtering items out at sync time is so much easier, mentally and code-wise, than at change time.
Sunday long run thoughts: UNKNOWN bookmarks are going to be interesting. Broadly, UNKNOWN means, "I don't know anything about the sync state of this item; pull it down from the server and reconcile."

Realistically, you'll only end up with UNKNOWN bookmarks after upgrading, if you restore from backup, or if Firefox restores automatically in case places.sqlite is corrupt. In all three cases, every bookmark will have that state, so reconciling means fetching the entire tree from the server. The difference is that all change counters will be 0 after upgrading, and > 0 after restoring.

* We need to make sure we don't reupload everything after upgrading. This means any currently missed changes won't be synced until the next time the user updates that bookmark...but that's better than setting our servers on fire.

* We currently wipe the server and reupload everything if you restore (or import!) bookmarks. That's not good, but this patch isn't the right place to fix that. We just need to make sure we don't make this worse. Fixing import can be done today, with the current tracker; fixing restore is going to take more thought.

* We mark UNKNOWN items as NORMAL after a successful sync. I think that makes sense: it means we won't reconcile them, but also means that we don't need to decide on a reconciliation and repair strategy right now. Again, the important part is to not make the existing behavior worse, no matter how wrong it is now.
(In reply to Kit Cambridge [:kitcambridge] from comment #162)
> Sunday long run thoughts: UNKNOWN bookmarks are going to be interesting.
> Broadly, UNKNOWN means, "I don't know anything about the sync state of this
> item; pull it down from the server and reconcile."

Yep.

> Realistically, you'll only end up with UNKNOWN bookmarks after upgrading, if
> you restore from backup, or if Firefox restores automatically in case
> places.sqlite is corrupt. In all three cases, every bookmark will have that
> state, so reconciling means fetching the entire tree from the server. The
> difference is that all change counters will be 0 after upgrading, and > 0
> after restoring.

While that does seem to imply that a single flag would be better than tracking it per bookmark, a complication is that we don't sync transactionally. IOW, it seems likely that a full reconciliation may take multiple attempts if Sync is interrupted. It may be that we change the bookmarks engine to be slightly smarter when dealing with UNKNOWN records rather than just relying on the existing timestamp based full-sync a resetClient gives us.

> * We need to make sure we don't reupload everything after upgrading. This
> means any currently missed changes won't be synced until the next time the
> user updates that bookmark...but that's better than setting our servers on
> fire.

Yeah, the upgrade case really is an edge-case. It might even be worth considering that we set all items to NORMAL as part of the upgrade (or maybe have Sync itself do that when it notices the upgrade, or something). That doesn't repair anything we already missed, but that's not really the scope of this bug.

> * We currently wipe the server and reupload everything if you restore (or
> import!) bookmarks. That's not good, but this patch isn't the right place to
> fix that. We just need to make sure we don't make this worse. Fixing import
> can be done today, with the current tracker; fixing restore is going to take
> more thought.

Agreed.

> * We mark UNKNOWN items as NORMAL after a successful sync. I think that
> makes sense: it means we won't reconcile them, but also means that we don't
> need to decide on a reconciliation and repair strategy right now. Again, the
> important part is to not make the existing behavior worse, no matter how
> wrong it is now.

Agreed again.
(Reporter)

Comment 164

a year ago
(In reply to Mark Hammond [:markh] from comment #163)

> > * We need to make sure we don't reupload everything after upgrading. This
> > means any currently missed changes won't be synced until the next time the
> > user updates that bookmark...but that's better than setting our servers on
> > fire.
> 
> Yeah, the upgrade case really is an edge-case. It might even be worth
> considering that we set all items to NORMAL as part of the upgrade (or maybe
> have Sync itself do that when it notices the upgrade, or something). That
> doesn't repair anything we already missed, but that's not really the scope
> of this bug.

I'm more concerned about losing local changes than about reuploading -- that is, if the behavior after upgrade is to mark everything as UNKNOWN, and on the next sync we download and apply server records with the same GUIDs as local UNKNOWN records, we will (sometimes?) lose any local changes since the last sync.

This might only affect clients whose clocks are wrong -- the server record should be older. But testing with some known styles of corruption (missed uploads, mainly) would be interesting.


> > * We currently wipe the server and reupload everything if you restore (or
> > import!) bookmarks. That's not good, but this patch isn't the right place to
> > fix that. We just need to make sure we don't make this worse. Fixing import
> > can be done today, with the current tracker; fixing restore is going to take
> > more thought.
> 
> Agreed.

Bug 1295520/Bug 1199077, right?
(In reply to Richard Newman [:rnewman] from comment #164)
> I'm more concerned about losing local changes than about reuploading -- that
> is, if the behavior after upgrade is to mark everything as UNKNOWN, and on
> the next sync we download and apply server records with the same GUIDs as
> local UNKNOWN records, we will (sometimes?) lose any local changes since the
> last sync.

That's a good point. Since the change counter will be 0 for all records, we won't see them as locally modified, and, as you say, we'll always take the server's version if our clock is lagging. We could try migrating from the JSON file to minimize local losses, but that's fraught...and, considering we already miss changes, incomplete.

We don't use the sync status when reconciling or determining what changed (but maybe we should!), so the effect would be the same as if we set it to NORMAL.

> Bug 1295520/Bug 1199077, right?

Yep.
(Reporter)

Comment 166

a year ago
(In reply to Kit Cambridge [:kitcambridge] from comment #165)

> That's a good point. Since the change counter will be 0 for all records, we
> won't see them as locally modified, and, as you say, we'll always take the
> server's version if our clock is lagging.

Worse than that: if there are no indicated local changes, and that implies there's no conflict to resolve, then we'll always take the server's.

Updated

11 months ago
Blocks: 1301622
Depends on: 1303825
Component: Firefox Sync: Backend → Sync
Depends on: 1302901
No longer depends on: 1303825
Product: Cloud Services → Firefox
Summary: Sync bookmark tracker persistence can cause inconsistency → Track bookmark sync changes in Places
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
I'm going to rebase this atop bug 1295521, so that it makes parts 2 and 3 a bit easier to follow.
Depends on: 1295521
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment on attachment 8752393 [details]
Bug 1258127 - Update the Places schema to track bookmark sync changes.

All right, let's do this! Mak, I think this has crystallized enough to where it's ready for your feedback. Let me know if there's anything I can do to make this easier to review. I tried to split this up into chunks, but it's still a big change.

This part makes schema changes to track bookmark insertions, updates, and deletions in Places. There are three new columns:

* syncChangeCounter - Bumped whenever the bookmark is changed and Firefox should sync. The value of the counter isn't really important; most changes will bump it by more than 1. At the start of a sync, we pull all changed bookmarks and store their counts in memory. At the end, we subtract our in-memory counter from the change counters in the DB. If the bookmark was updated or deleted during the sync, the counter will still be > 0, and we'll upload the bookmark again on the next sync.

* syncStatus - Tracks the bookmark's sync status, based on the source. "NEW" means the item was created locally, and hasn't been synced yet; "NORMAL" means the item was synced at least once, and "UNKNOWN" means the item was restored from backup and needs to be reconciled with the server. We also use this to decide if we need to write a tombstone when a bookmark is deleted.

* syncable - Indicates whether the bookmark should be synced. I made this a separate field instead of overloading "syncStatus" because "syncable" is inherited from parent to child, and having "syncStatus" mean "inherited if parent is ignored, determined from the source if not" is surprising. This is true for bookmarks in the toolbar, menu, unfiled, and mobile roots; false for the others. We can get the same result using a recursive query, but I made it a separate field so that we can fetch an item's syncability in `nsNavBookmarksService.moveItem`, `Bookmarks.update`, `PlacesSyncUtils.bookmarks.pull{All, New}Changes`, `PlacesSyncUtils.bookmarks.reset`, and `PlacesUtils.promiseBookmarksTree`, instead of repeating the query everywhere.

There's also a new `moz_bookmarks_deleted` table that stores tombstones for "NORMAL" and "UNKNOWN" bookmarks. These are deleted at the end of each sync, so the table should be fairly small.
Attachment #8752393 - Flags: feedback?(mak77)
Comment on attachment 8794426 [details]
Bug 1258127 - Update `Bookmarks.jsm` (JS) to track sync changes.

This and the next patch are the big ones. The idea is that every Places method bumps the "syncChangeCounter". Sync then pulls these changed bookmarks from Places, uploads them to the server, and decrements the change counter at the end.
Attachment #8794426 - Flags: feedback?(mak77)
Comment on attachment 8794427 [details]
Bug 1258127 - Update `nsNavBookmarksService` (C++) to track sync changes.

The C++ changes. Same functionality as JS, just more crufty. I tried to keep the same function names and comments, so it's a bit easier to see how they match up. The duplication is really unfortunate.
Attachment #8794427 - Flags: feedback?(mak77)
Comment on attachment 8751134 [details]
Bug 1258127 - Add `PlacesSyncUtils` methods for pulling bookmark changes from Places.

This part adds methods for fetching and updating those change counters.

* `PlacesSyncUtils.bookmarks.pullAllChanges` and `pullNewChanges` return change records for syncable bookmarks. (I'm using "pull" in the VCS sense; it's not a "fetch" because it also updates the sync status of "NEW" bookmarks to "NORMAL"). These records are passed to the bookmarks engine, which uses them to determine what to upload. Once they're uploaded (or reconciled), the engine sets the object's `synced` property to true.

* `PlacesSyncUtils.bookmarks.pushChanges` is called at the end of the sync, and pushes the sync engine's state back to Places. This decrements the change counter, deletes tombstones, and updates the sync status of "UNKNOWN" bookmarks to "NORMAL". We change "NEW" to "NORMAL" before the sync so that we can accurately write tombstones if there's a partial sync, and "UNKNOWN" to "NORMAL" after so that reconciliation can happen.

* `wipe` and `reset` are used internally by the sync engine. `reset` is called if you disable bookmark syncing: we don't erase anything locally, but we reset the sync status and change counter so that we can reupload everything if you change your mind.
Attachment #8751134 - Flags: feedback?(markh)
Attachment #8751134 - Flags: feedback?(mak77)
Comment on attachment 8752394 [details]
Bug 1258127 - Update the bookmarks engine to pull changes from Places.

Finally, getting the bookmarks engine to use the whole thing. :-)

*mic drop*
Attachment #8752394 - Flags: feedback?(markh)

Comment 186

11 months ago
mozreview-review
Comment on attachment 8751134 [details]
Bug 1258127 - Add `PlacesSyncUtils` methods for pulling bookmark changes from Places.

https://reviewboard.mozilla.org/r/51879/#review79638

This looks good to me

Comment 187

11 months ago
mozreview-review
Comment on attachment 8752394 [details]
Bug 1258127 - Update the bookmarks engine to pull changes from Places.

https://reviewboard.mozilla.org/r/52589/#review79642

LGTM

::: services/sync/tests/unit/test_bookmark_tracker.js:78
(Diff revision 28)
> +  return PlacesTestUtils.setBookmarkSyncFields(...guids.map(
> +    guid => ({ guid, syncStatus: "NORMAL" })
> +  ));
> +}
> +
> +// A temp debugging helper;

might as well remove the "temp" here (ie, there's probably no good reason to remove this function from the test - we'll be happy once we see failures)

Updated

11 months ago
Attachment #8751134 - Flags: feedback?(markh) → feedback+

Updated

11 months ago
Attachment #8752394 - Flags: feedback?(markh) → feedback+

Comment 188

11 months ago
(In reply to Kit Cambridge [:kitcambridge] from comment #181)
> * syncable - Indicates whether the bookmark should be synced. I made this a
> separate field instead of overloading "syncStatus" because "syncable" is
> inherited from parent to child, and having "syncStatus" mean "inherited if
> parent is ignored, determined from the source if not" is surprising. This is
> true for bookmarks in the toolbar, menu, unfiled, and mobile roots; false
> for the others. We can get the same result using a recursive query, but I
> made it a separate field so that we can fetch an item's syncability in
> `nsNavBookmarksService.moveItem`, `Bookmarks.update`,
> `PlacesSyncUtils.bookmarks.pull{All, New}Changes`,
> `PlacesSyncUtils.bookmarks.reset`, and `PlacesUtils.promiseBookmarksTree`,
> instead of repeating the query everywhere.

This still strikes me as sub-optimal and increases the cost of many non-sync related operations. ISTM a cache, much like the GuidHelper caches, could do this without too much pain and without significant extra work at sync time. One concern I have is these fields getting out-of-date somehow (eg, addons, etc) and with no clear road to recovery. While that can't prevent a "move", I wonder if there are hypothetical scenarios where, say, some addon temporarily moves items to a root under its control before moving them back. I could even imagine a future improvement to Sync that decided to store new incoming bookmarks under a temporary root for application after sync completes.

That said though, I'm sure mak will have his own opinions, so you should prefer his over mine :)
Sorry I'm context-switching too much today to sanely go through this, I'll look at these patches on Monday.
Do we have a list of what is non-syncable?

For tags it should be "easy" to determine that, since they can only have 2 levels, we can always check the grandparent. Also, for them I'd prefer working towards bug 424160 to remove them from moz_bookmarks. While that bug appears hard, it's not really that much, we had a volunteer working on it (bug 1039200) with good results. So it's something that could be done even without a lot of Places knowledge, maybe even by an intern.

The left pane folder is something that could probably be converted to a "virtual" folder. By this I mean we could have something like RESULTS_AS_TAG_QUERY, but it being RESULTS_AS_LEFTPANE_FOLDER that returns a fake result. Or we could hardcode the Library left pane contents, it's a long time we want to move the library to an in-content UI, and the current tree is not the best visual for that. That may allow us to have no real left pane folder. Even in this case it can only have 2 levels: left pane root -> all bookmarks. So again we can check the parent and the grandparent.

What's left? Add-ons can clearly create any kind of non-syncable stuff and their own roots... even with your approach they can do whatever they want with low level access to the db and you can't avoid that, so you'll still need to check syncable is valid, even if you have it.

Maybe we should just disallow adding anything to the root and have an internal method to add to it?
This would break:
- the left pane folder creation code. This is already "broken" in the sense its creation is incompatible with the new async bookmarking API.
- old sync (the one without a real root) could not create its root. this means we need a downgrade path (a version with real mobile root) before making the strict roots rule.
- Probably some add-ons, but honestly those should create their own db and ATTACH to places.sqlite rather than creating fake roots. I know that some add-ons do this and increase the size of places.sqlite to a level where it also kills our history and perf, so I'd be happy to disallow it. Add-ons have a way out using tags or attaching their own db. Someone could still try to use db access to do its own writes, but those will clearly be bad players at that point (we could remove any invalid root at random times).

If we'd take this strict rule and make the left pane root "virtual", the only non-syncable thing would be tags, that should not be so hard to figure (now we have a skipTags option in nsINavBookmarkObserver, so looks like we may have that info in most cases you care about).

The downside is that either we delay this work until we can do all of that, or we take a small perf hit to do the appropriate checks now and start working on the long term solution (stricter root rules) to remove those checks asap.
TL;DR we should probably take a hit to do an on-the-fly syncable check and make a plan to move to "nobody can write to the Places root apart from Places itself".
Please don't take my questions as a stop-energy or a dislike sign. I'm mostly trying to understand. It would have been great to build a google doc with the new design, an example of a sync procedure, and reasons for which the various decisions have been taken (or reasons against other possibilities). In the lack of that, I need to ask many questions :)

(In reply to Kit Cambridge [:kitcambridge] from comment #181)
> At the start of a sync, we pull all changed
> bookmarks and store their counts in memory.

Is there any point where you could need to pull ALL the bookmarks in memory? I'm asking cause we have evidence of users with 10k bookmarks or even more. What do we store in memory, guid and counter? Any other data?
If we should store a lot of data for each bookmark, any possibilities to reduce that?

If counter would monotonically increase, couldn't we set a point in time by storing the counter value at that point and checking all bookmarks having a bigger counter without making more complex differences? I mean, at the beginning of a sync we check the previously stored counter and the currently higher counter. Sync everything in the middle. Any bookmark change monotonically increases this global counter and stores it with the bookmark. The only thing to update at the end of the sync is the currently synced value, not every bookmark.
Is the reason to update every single bookmark to better support interrupted Syncs?

> * syncable - Indicates whether the bookmark should be synced. I made this a
> separate field instead of overloading "syncStatus"

I agree with the will to avoid adding too many statuses to syncStatus. It sounds like a good idea to move syncable elsewhere, especially based on the fact we could effectively remove any non-syncable contents and I think it would be a good idea, as explained before.
I just don't like much adding a third column, that in future could also be pointless. I agree with Mark we should investigate alternatives.

> There's also a new `moz_bookmarks_deleted` table that stores tombstones for
> "NORMAL" and "UNKNOWN" bookmarks. These are deleted at the end of each sync,
> so the table should be fairly small.

Could you please explain the functionality of this table and how it will be used?
You said something about using syncStatus to decide whether we need a tombstone, is that just that we basically don't create tombstones for NEW entries?
I'm also interested in the management of contents here in the Sync-disabled case.
an alternative idea regarding syncable. We know we can remove it if we wish, but we also know that requires work.
Adding another column that in the future may become orphan is not great, but we have one (from a long time) unused column in moz_bookmarks we may reuse, that is folder_type. I'd have nothing against using this temporarily and in the future just set it to NULL again once everything is syncable. the name is not great, I admit, but it's already there and we can alias it.

I think in the end the decision boils down to how much is more expensive (perf) to use an alternative to a column. Can you do that kind of analysis? If we write syncable only when we are already doing a write and we read it when we already do a read, it's basically free and we should go for it (and at the same time start planning to make everything syncable)
another question, is there anything of this we could easily do through triggers? those have the nice feature that we unlikely break them touching code.
We already have insert/update/remove triggers. I guess you already evaluated those, but I don't see the reasoning about avoiding them here.

Comment 195

10 months ago
mozreview-review
Comment on attachment 8794427 [details]
Bug 1258127 - Update `nsNavBookmarksService` (C++) to track sync changes.

https://reviewboard.mozilla.org/r/80902/#review83824

::: toolkit/components/places/nsNavBookmarks.cpp:30
(Diff revision 1)
>  const int32_t nsNavBookmarks::kGetChildrenIndex_Guid = 18;
>  const int32_t nsNavBookmarks::kGetChildrenIndex_Position = 19;
>  const int32_t nsNavBookmarks::kGetChildrenIndex_Type = 20;
>  const int32_t nsNavBookmarks::kGetChildrenIndex_PlaceID = 21;
> +const int32_t nsNavBookmarks::kGetChildrenIndex_SyncStatus = 22;
> +const int32_t nsNavBookmarks::kGetChildrenIndex_Syncable = 22;

something fishy here (same index)

::: toolkit/components/places/nsNavBookmarks.cpp:119
(Diff revision 1)
>  };
>  
> +
> +// Removes the Sync orphan annotation from an item.
> +nsresult
> +RemoveSyncParentAnno(int64_t aItemId, uint16_t aSource)

ditto for a better name

::: toolkit/components/places/nsNavBookmarks.cpp:1348
(Diff revision 1)
>        // to shift up.
>        --newIndex;
>      }
>    }
>  
> +  if (!MatchesSyncable(aNewParent, bookmark.syncable)) {

same question as in the js case
This also adds a query :(

Comment 196

10 months ago
mozreview-review
Comment on attachment 8752393 [details]
Bug 1258127 - Update the Places schema to track bookmark sync changes.

https://reviewboard.mozilla.org/r/52587/#review83786

::: toolkit/components/places/Database.cpp:1873
(Diff revision 23)
>  
>  nsresult
>  Database::MigrateV35Up() {
>    MOZ_ASSERT(NS_IsMainThread());
>  
> -  nsresult rv = CreateMobileRoot();
> +  // Add sync status and change counter tracking columns for bookmarks.

I assume now you want v36 (maybe v37 since bug 1283320 wants to add context_id) (FYI I think I will ask the Sync team feedback on that before it lands, cause it's still a bit unclear to me if the previously pointed out Sync complains about that addition are still valid or not)

The fact it may be in the same version doesn't matter, v35 is gone, you must assume dbs may be at it.

Comment 197

10 months ago
mozreview-review
Comment on attachment 8794426 [details]
Bug 1258127 - Update `Bookmarks.jsm` (JS) to track sync changes.

https://reviewboard.mozilla.org/r/80900/#review83808

::: browser/components/places/tests/unit/head_bookmarks.js:82
(Diff revision 1)
>   */
>  function rebuildSmartBookmarks() {
>    let consoleListener = {
>      observe(aMsg) {
> +      if (aMsg.message.startsWith("[JavaScript Warning:")) {
> +        // Ignore spurious strict warnings (bug 1300416).

nit: in Places we usually use this form:
// TODO (BUG NNN): description.
Well, at least please add a TODO so it's easy to seach on it.

::: toolkit/components/places/Bookmarks.jsm:98
(Diff revision 1)
>    TYPE_SEPARATOR: 3,
>  
>    /**
> +   * Sync status constants, stored for each item.
> +   */
> +  SYNC_STATUSES: {

I'd prefer this singular as SYNC_STATUS.UNKNOWN, for example. I know that SOURCES is plural, but it probably doesn't matter much.
I'd probably would like if we'd also change SOURCES to SOURCE, but that should also be uplifted to 51.

::: toolkit/components/places/Bookmarks.jsm:344
(Diff revision 1)
> +        let itemMoved = item.parentGuid != updatedItem.parentGuid ||
> +                        item.index != updatedItem.index;
> +        if (itemMoved) {
> +          // Remove the Sync orphan annotation, so that Sync doesn't try to
> +          // reparent the item once it sees the original parent.
> +          removeSyncParentAnno(updatedItem._id, updatedItem.source);

may we name the function in a way that better explains what it's doing?
removeSyncParentAnno doesn't tell me anything, it's named after its implementation details.
Something like avoidSyncReparenting or such.

::: toolkit/components/places/Bookmarks.jsm:850
(Diff revision 1)
>        if (newParent) {
> +        if (item._syncable != newParent._syncable) {
> +          // Moving an item from a synced folder to an unsynced folder causes
> +          // other devices to miss changes; moving from unsynced to synced
> +          // orphans an item's descendants and uploads an inconsistent tree.
> +          throw new Error("Cannot move item between synced and unsynced parents");

this seems to add a failure to bookmarks API that depends on Sync doing thae right thing...

May this start denying updateBookmarks operations if the db somehow corrupts? What kind of problems will this cause that is worth avoiding the operation at all costs? any alternative ways to handle the problem we could evaluate?

::: toolkit/components/places/Bookmarks.jsm:910
(Diff revision 1)
>        }
>  
> +      if (isTagging) {
> +        // If we're updating a tag entry, bump the sync change counter for
> +        // bookmarks with the tagged URL.
> +        yield addSyncChangesForBookmarksWithURL(db, item.url, syncChangeDelta);

This will also bump the counter for tags, but I assume you don't care cause those are not syncable.

::: toolkit/components/places/Bookmarks.jsm:1675
(Diff revision 1)
> +  }
> +  return Bookmarks.SYNC_STATUSES.NEW;
> +}
> +
> +// Indicates whether a removed item needs a tombstone. Items excluded from
> +// syncing and "NEW" items that haven't been synced yet don't need tombstones.

I think the javadoc about tombstones usage and lifetime could be larger and more detailed

::: toolkit/components/places/Bookmarks.jsm:1738
(Diff revision 1)
> +    UPDATE moz_bookmarks
> +      SET syncChangeCounter = syncChangeCounter + :syncChangeDelta
> +    WHERE id IN (
> +      SELECT b.id
> +      FROM moz_bookmarks b
> +      JOIN moz_bookmarks p ON p.id = b.parent

what's the point of the parent join? and at that point, why can't we just UPDATE ... WHERE fk = (SELECT id FROM moz_places...)

::: toolkit/components/places/Bookmarks.jsm:1755
(Diff revision 1)
> +  }
> +  return db.execute(`
> +    UPDATE moz_bookmarks SET
> +      syncChangeCounter = syncChangeCounter + :syncChangeDelta
> +    WHERE fk IN (
> +      SELECT h.id

do you actually need to join with moz_places? you can already have the list of fk (that is the same as h.id)

::: toolkit/components/places/PlacesUtils.jsm:3874
(Diff revision 1)
>        throw new Error("Invalid value");
>      return v;
>    };
>  }
> +
> +var addKeywordSyncChange = Task.async(function* (db, url, source) {

maybe you could make a more generic util to mark syncchange to bookmarks with a given url in PlacesSyncUtils and reuse it both here and in Bookmarks.jsm?

::: toolkit/components/places/tests/unit/test_sync_fields.js:2
(Diff revision 1)
> +
> +// Tracks a set of bookmark ids and their syncChangeCounter field and

please move the test to bookmarks/ tests folder
last 2 questions:
1. Please analyze the queries querying on syncable/syncStatus and so on and try to figure if any of those may need an index. indices are not cheap so we take them for good reasons. Also in sqlite a compound index on (a, b) can be used both for queries querying a, b or also only querying a.
2. How well is all of this tested by existing Sync tests?
Thanks, Mak! I'm writing up a design document for future reference, but I'll answer your questions in the meantime.

(In reply to Marco Bonardo [::mak] from comment #190)
> Do we have a list of what is non-syncable?

I think it's easier to list what is syncable: menu, toolbar, unfiled, mobile, and their descendants. Everything else is non-syncable: tags (until we remove them from `moz_bookmarks`), the left pane folder, roots and items from add-ons, and old roots that we no longer use (Richard mentions "readinglist" in bug 1303679, comment 5).

> If we'd take this strict rule and make the left pane root "virtual", the
> only non-syncable thing would be tags, that should not be so hard to figure
> (now we have a skipTags option in nsINavBookmarkObserver, so looks like we
> may have that info in most cases you care about).

I agree. If we can prevent the creation of new roots (and migrate or delete existing ones), make the left pane virtual, and move tags out of `moz_bookmarks`, we wouldn't need to worry about syncable vs. non-syncable.

I'm a bit wary of having this patch depend on that work...partly because the virtual left pane seems complex, and partly because I'm not sure it would reduce the complexity all that much. It would let us drop the "syncable" field and the check in `moveItem`, but we'd still end up with a large patch to bump the counter in each method.

(In reply to Marco Bonardo [::mak] from comment #193)
> an alternative idea regarding syncable. We know we can remove it if we wish,
> but we also know that requires work.
> Adding another column that in the future may become orphan is not great, but
> we have one (from a long time) unused column in moz_bookmarks we may reuse,
> that is folder_type. I'd have nothing against using this temporarily and in
> the future just set it to NULL again once everything is syncable. the name
> is not great, I admit, but it's already there and we can alias it.

Sounds reasonable, if we already have a column we can reuse.

> I think in the end the decision boils down to how much is more expensive
> (perf) to use an alternative to a column. Can you do that kind of analysis?
> If we write syncable only when we are already doing a write and we read it
> when we already do a read, it's basically free and we should go for it (and
> at the same time start planning to make everything syncable)

"syncable" is basically `WITH RECURSIVE syncableItems(did) AS (SELECT id FROM moz_bookmarks WHERE guid IN ('menu________', 'toolbar_____', 'unfiled_____', 'mobile______') UNION ALL SELECT id FROM moz_bookmarks JOIN syncableItems ON parent = did)`.

Running that query every time we bumped the counter was expensive (comment 62). Another version bumped the counter and wrote tombstones for syncable and non-syncable items, and filtered out the non-syncable ones in `pullAllChanges` and `pullNewChanges`. That made it hard to filter items out of `moz_bookmarks_deleted`; I had to join that table to `moz_bookmarks` to see if a tombstone was a descendant of a syncable root. A later version only ran that query for deletes, but it complicated the queries in `removeFoldersContents`, `RemoveFolderChildren`, and `promiseBookmarksTree`. It also didn't prevent moves between syncable and non-syncable roots, but I think I'm convinced that's not a problem in practice.

I'll investigate further and post some numbers comparing these approaches.
(In reply to Marco Bonardo [::mak] from comment #194)
> another question, is there anything of this we could easily do through
> triggers? those have the nice feature that we unlikely break them touching
> code.
> We already have insert/update/remove triggers. I guess you already evaluated
> those, but I don't see the reasoning about avoiding them here.

Triggers would have definitely made these simpler. We decided against them for a couple of reasons:

* Sync and Places represent bookmarks differently, so a single "change" in the UI might bump the counter multiple times, and for multiple bookmarks. Examples: repositioning an item within a container only tracks its parent, not the child. Changing the tag (until bug 1039200 lands) syncs tagged bookmarks, not the tag folder or entry. Keywords bump the counter for bookmarks with the URL.

* We'd need to make the trigger aware of the change source, because that impacts the value of the counter and status. For local and imported bookmarks, we want syncStatus = "NEW" and syncChangeCounter = 1. For remote bookmarks added by Sync, we want syncStatus = "NORMAL" and syncChangeCounter = 0. For restored bookmarks, we want syncStatus = "UNKNOWN" and syncChangeCounter = 1.

* Related to the second point, some changes aren't relevant to Sync at all. For these, we're worried about sync loops, where the trigger fires for a change that shouldn't be synced, and then clients keep reuploading the same record forever.
(In reply to Marco Bonardo [::mak] from comment #192)
> Could you please explain the functionality of this table and how it will be
> used?

Sure. Whenever we delete a "NEW" or "UNKNOWN" bookmark with syncable = 1, we insert a row for it into this table. When Sync runs, it uploads tombstones for these items to the server. At the end of the sync, we delete successfully uploaded tombstones from `moz_bookmarks_deleted`.

The new `PlacesSyncUtils.bookmarks.pushChanges` method (https://reviewboard.mozilla.org/r/51879/diff/30) handles deleting successfully uploaded tombstones. The table shouldn't have more than a handful of rows between syncs, unless you delete a large folder.

> You said something about using syncStatus to decide whether we need a
> tombstone, is that just that we basically don't create tombstones for NEW
> entries?

Exactly. We don't create tombstones for "NEW" and non-syncable entries.

> I'm also interested in the management of contents here in the Sync-disabled
> case.

If the user never signs in to Sync, all bookmarks will stay in the "NEW" state, so the table will be empty. If Sync is enabled and then disabled later, `PlacesSyncUtils.bookmarks.reset` (also https://reviewboard.mozilla.org/r/51879/diff/30; I wish MozReview could link to lines) drops the contents and changes the state on all bookmarks back to "NEW".
(Reporter)

Comment 202

10 months ago
(In reply to Kit Cambridge [:kitcambridge] from comment #201)

> … If the user never signs in to Sync, all bookmarks will stay in the "NEW"
> state, so the table will be empty…

I'm really pleased by how close this aligns, at least conceptually, with iOS's solution. And also pleased by the clear explication, Kit! :thumbsup:
(In reply to Kit Cambridge [:kitcambridge] from comment #199)
> I'm a bit wary of having this patch depend on that work...partly because the
> virtual left pane seems complex, and partly because I'm not sure it would
> reduce the complexity all that much. It would let us drop the "syncable"
> field and the check in `moveItem`, but we'd still end up with a large patch
> to bump the counter in each method.

Yes, it would only remove the need to mark things as syncable, but that's still 1/3 of all the needed tracking and could prevent future problems.
I think it's correct to not gate this work on that, but I also think we should plan and start working towards that. Reaching a point where there is nothing in moz_bookmarks that is not syncable sounds like a good goal to me, and it "only" requires to fix the left pane folder and make the APIs stricter (parentGuid cannot be placesRoot). we'll clearly still need to check parent and grandParent for tags, until they are moved.

> "syncable" is basically `WITH RECURSIVE syncableItems(did) AS (SELECT id
> FROM moz_bookmarks WHERE guid IN ('menu________', 'toolbar_____',
> 'unfiled_____', 'mobile______') UNION ALL SELECT id FROM moz_bookmarks JOIN
> syncableItems ON parent = did)`.

But for all the cases we care, we could only check parent and grandParent, and in many cases we already have those, isn't it?
Sure, add-ons could use deeper nesting, but do we care about that? they can break us and sync in many other ways already.

I'm mostly trying to understand if we could replace syncable with a check on parent and grandparent. IF we have to run a recursive query we already lost (and it's better to have the column).
(In reply to Kit Cambridge [:kitcambridge] from comment #200)
> Triggers would have definitely made these simpler. We decided against them
> for a couple of reasons:

OK, I see.

(In reply to Kit Cambridge [:kitcambridge] from comment #201)
> If the user never signs in to Sync, all bookmarks will stay in the "NEW"
> state, so the table will be empty. If Sync is enabled and then disabled
> later, `PlacesSyncUtils.bookmarks.reset` (also
> https://reviewboard.mozilla.org/r/51879/diff/30; I wish MozReview could link
> to lines) drops the contents and changes the state on all bookmarks back to
> "NEW".

OK, could you please document this somewhere in the code, maybe above the table definition.
(In reply to Marco Bonardo [::mak] from comment #203)
> I'm mostly trying to understand if we could replace syncable with a check on
> parent and grandparent. IF we have to run a recursive query we already lost
> (and it's better to have the column).

I see. For tags, definitely. For the left pane, I think we might need to go up three levels because it has an "All Bookmarks" subfolder with 3 queries. For add-ons...I'll trust your judgement. On the one hand, it's really easy for Sync to spread corruption caused by a confused add-on to other devices. On the other, you're right that they can break us in many ways already, so maybe this doesn't buy us anything.
oh well, let's keep this syncable thing by reusing folder_type and start up a new bug to figure how to simplify or remove it.
Attachment #8751134 - Flags: feedback?(mak77) → feedback+
Attachment #8752393 - Flags: feedback?(mak77) → feedback+
Attachment #8794426 - Flags: feedback?(mak77) → feedback+
Attachment #8794427 - Flags: feedback?(mak77) → feedback+
Blocks: 1309930
I'll post new versions tomorrow that incorporate your feedback, but I wanted to reply to this...

(In reply to Marco Bonardo [::mak] from comment #197)
> ::: toolkit/components/places/Bookmarks.jsm:850
> (Diff revision 1)
> >        if (newParent) {
> > +        if (item._syncable != newParent._syncable) {
> > +          // Moving an item from a synced folder to an unsynced folder causes
> > +          // other devices to miss changes; moving from unsynced to synced
> > +          // orphans an item's descendants and uploads an inconsistent tree.
> > +          throw new Error("Cannot move item between synced and unsynced parents");
> 
> this seems to add a failure to bookmarks API that depends on Sync doing thae
> right thing...
> 
> May this start denying updateBookmarks operations if the db somehow
> corrupts? What kind of problems will this cause that is worth avoiding the
> operation at all costs? any alternative ways to handle the problem we could
> evaluate?

One idea is to relax this check and allow moves between syncable and non-syncable parents if Sync is disabled. It's still weird, but we don't have to worry about it, because we're not syncing the tree. (If a tree falls and no one is around to hear it, does it make a sound?) In that case, we inherit the syncability of the new parent, so moving from non-syncable to syncable makes the item syncable.

Once Sync is set up, though, we prevent moves like this, to avoid confusing Sync. OTOH, it might be surprising if an add-on depends on being able to do this: "How come it works with Sync enabled, but breaks when it's disabled?" So maybe we enforce it consistently and break either way. But you bring up a good point about corruption, so maybe we do nothing, and add some telemetry to see if this actually happens in practice.
(Reporter)

Comment 208

10 months ago
(In reply to Kit Cambridge [:kitcambridge] from comment #207)

> One idea is to relax this check and allow moves between syncable and
> non-syncable parents if Sync is disabled. It's still weird, but we don't
> have to worry about it, because we're not syncing the tree.

I recommend against that:

1. Users sometimes turn Sync off and on, because we don't offer much control over its impact on battery or network usage, and because it's sometimes necessary to fix things.
2. We don't have automated tests outside of Sync that run with Sync turned on. That means we'd have a whole significant state-management path that is never tested.
3. Problems arise if a cross-root move _has ever_ occurred, if data intersects between clients.
(Assignee)

Comment 209

10 months ago
mozreview-review-reply
Comment on attachment 8794426 [details]
Bug 1258127 - Update `Bookmarks.jsm` (JS) to track sync changes.

https://reviewboard.mozilla.org/r/80900/#review83808

> may we name the function in a way that better explains what it's doing?
> removeSyncParentAnno doesn't tell me anything, it's named after its implementation details.
> Something like avoidSyncReparenting or such.

Changed to `preventSyncReparenting` here and in C++.

> this seems to add a failure to bookmarks API that depends on Sync doing thae right thing...
> 
> May this start denying updateBookmarks operations if the db somehow corrupts? What kind of problems will this cause that is worth avoiding the operation at all costs? any alternative ways to handle the problem we could evaluate?

I'm considering enforcing this only if Sync is enabled, but Richard points out in comment 208 that has other issues. In theory, this shouldn't be possible with the existing set of roots, because we don't move items in or out of the left pane root or tags root. But I think Mark mentioned add-ons might create their own roots, and expect this to work. Maybe it's fine to ignore that case, if we're talking about preventing creation of new roots.

I'm still thinking about this one. Moves like this will confuse Sync quite a bit (see the comment above this line), but that confusion can already happen today. Some telemetry might be useful.

> This will also bump the counter for tags, but I assume you don't care cause those are not syncable.

Right. I think this was before I added `syncable`; now that we have that, we can ignore tags. Though, as you say, it doesn't really matter; we'll filter them out at query time.

> I think the javadoc about tombstones usage and lifetime could be larger and more detailed

Expanded here and in the schema patch.

> what's the point of the parent join? and at that point, why can't we just UPDATE ... WHERE fk = (SELECT id FROM moz_places...)

Replaced this query with `addSyncChangesForBookmarksWithURL`, because we've removed the bookmarks at this point...so the query wouldn't return anything.

> do you actually need to join with moz_places? you can already have the list of fk (that is the same as h.id)

You're right; I only need the `fk`.

> maybe you could make a more generic util to mark syncchange to bookmarks with a given url in PlacesSyncUtils and reuse it both here and in Bookmarks.jsm?

Good idea, added `PlacesSyncUtils.bookmarks.addSyncChangesForBookmarksWithURL`.
(Assignee)

Comment 210

10 months ago
mozreview-review-reply
Comment on attachment 8794427 [details]
Bug 1258127 - Update `nsNavBookmarksService` (C++) to track sync changes.

https://reviewboard.mozilla.org/r/80902/#review83824

> something fishy here (same index)

Fixed and added a test in part 4, because this wasn't exercised in `removeFolderChildren`.

> same question as in the js case
> This also adds a query :(

I can add this to `FetchFolderInfo`, but that'll still add a third subquery.

Another idea is to rewrite the fetch query as something like `SELECT count(*), NULL, NULL, NULL FROM moz_bookmarks WHERE parent = :parent UNION ALL SELECT NULL, guid, id, syncable FROM moz_bookmarks WHERE id = :parent`, so that it only searches `moz_bookmarks` once. I'll do some analysis, assuming we want to keep it.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
(In reply to Richard Newman [:rnewman] from comment #208)
> 1. Users sometimes turn Sync off and on, because we don't offer much control
> over its impact on battery or network usage, and because it's sometimes
> necessary to fix things.

I think part of the problem, as you've mentioned before, is that none of our clients agree on the set of syncable items. If they did, they could drop incoming records belonging to non-syncable trees.

Allowing moves between syncable and non-syncable trees would make this validation harder. If we get a remote record that belongs to a non-syncable tree locally, is that because another client changed its syncability (and we should take it), or was it a buggy client (and we should drop it)?

> 2. We don't have automated tests outside of Sync that run with Sync turned
> on. That means we'd have a whole significant state-management path that is
> never tested.

This patch checks if `services.sync.username` is set and the bookmarks engine is enabled before bumping the counter and writing tombstones. The checks are centralized in 4 functions (`determineSyncChangeDelta`, `determineInitialSyncStatus`, `needsTombstone`, and `preventSyncReparenting`), but there's already a difference in how we behave with Sync enabled and disabled. (This avoids extra queries, but another reason for this difference is bookmark restore: if Sync is enabled, we want to mark restored bookmarks as "UNKNOWN", and write tombstones for them. If not, we should mark them as "NEW", to avoid accumulating tombstones we'll never delete).

> 3. Problems arise if a cross-root move _has ever_ occurred, if data
> intersects between clients.

That's a good point, and I think it can be part of our repair strategy, too.
(In reply to Marco Bonardo [::mak] from comment #198)
> last 2 questions:
> 1. Please analyze the queries querying on syncable/syncStatus and so on and
> try to figure if any of those may need an index. indices are not cheap so we
> take them for good reasons. Also in sqlite a compound index on (a, b) can be
> used both for queries querying a, b or also only querying a.

I don't think we need indices on any of the new columns. I tried with 5k, 10k, 20k, and 40k bookmarks, with a mix of zero and non-zero change counter values: only first half > 0, only evens > 0, regions of 1k bookmarks > 0, 5-10 bookmarks > 0 in a total set of 10k or 40k bookmarks, and combinations of these.

The worst-case performance I saw was 40k bookmarks for `pullNewChanges`, with a mix of large and small ranges (dozens to hundreds of bookmarks), for 23375k total bookmarks where syncChangeCounter > 0. Even then, it took 5sec to fetch the changes without an index, and 4sec with.

For a more realistic set (14 where syncChangeCounter > 0, out of 40k total), the index didn't make a difference: 0.3s to fetch with an index; 0.4s without. The query times were a fraction of the time it took to stuff the DB with 40k bookmarks (55sec).

`pullNewChanges` is the only time we really use the change counter. `pullAllChanges` does a full table scan anyway, because it fetches change records for all items, and is only called when the user signs in to Sync for the first time.

> 2. How well is all of this tested by existing Sync tests?

test_bookmark_tracker.js tests that we increment the change counter in the ways we expect, and test_sync_utils.js exercises {pullAll, pullNew, Push}Changes. That said, I don't have too much confidence in the Sync test suite. This will definitely need a manual QA pass.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
(Assignee)

Comment 224

10 months ago
mozreview-review
Comment on attachment 8794426 [details]
Bug 1258127 - Update `Bookmarks.jsm` (JS) to track sync changes.

https://reviewboard.mozilla.org/r/80900/#review86772

::: toolkit/components/places/PlacesSyncUtils.jsm:351
(Diff revision 3)
> +  /**
> +   * Returns true if the user is signed in to Sync and the bookmarks engine is
> +   * enabled. This check avoids additional queries to bump the change counter
> +   * and write tombstones if Sync is disabled.
> +   */
> +  isSyncEnabled() {

This could be a problem if Refresh Firefox doesn't actually restore this pref (bug 1311091). Then again, it doesn't restore the last sync time for each engine, either, so we'll always call `pullAllChanges` on the next sync.

Refresh Firefox should probably call `reset()`, though, to clear the change counter and drop old tombstones.
(Assignee)

Comment 225

10 months ago
mozreview-review
Comment on attachment 8802750 [details]
Bug 1258127 - Move bookmark de-duping logic into `PlacesSyncUtils.bookmarks.dedupe`.

https://reviewboard.mozilla.org/r/87042/#review86796

::: toolkit/components/places/PlacesSyncUtils.jsm:1508
(Diff revision 1)
> +      INSERT INTO moz_bookmarks_deleted (guid, dateRemoved, syncStatus)
> +      VALUES (:localGuid, :modified, :localSyncStatus)`,
> +      { localGuid, modified, localSyncStatus });
> +  });
> +
> +  // Pull change records for the parents and the old local GUID.

I'm not really a fan of this approach.

Before, the engine didn't add to `_modified` during the sync. It was an opaque structure that we populated at the start of the sync, and added back to the tracker at the end.

With bug 1276823 and bug 1299978, that's no longer true: we add to the changeset during the download. We can probably wait to pull the changeset before uploading; the only reason it's needed during the download is dupe and delete detection. But that means reworking `_reconcile`, which seems far out of scope for this patch.

This also doesn't need to be solved now. We can add placeholder entries to `_modified` that are marked as `synced: true`, so that `pushChanges` ignores them. I think that behavior will be the same as now for interrupted syncs.

Mark, do you have any suggestions?

Comment 226

10 months ago
mozreview-review
Comment on attachment 8751134 [details]
Bug 1258127 - Add `PlacesSyncUtils` methods for pulling bookmark changes from Places.

https://reviewboard.mozilla.org/r/51879/#review87706

from a Places code point of view, this looks ok. I'm delegating to Mark regarding the Sync logic.

::: toolkit/components/places/PlacesSyncUtils.jsm:1480
(Diff revision 32)
> +    for (let chunk of chunkArray(unsyncedGuids, chunkLength)) {
> +      yield db.execute(`
> +        UPDATE moz_bookmarks
> +        SET syncStatus = ?
> +        WHERE guid IN (${new Array(chunk.length).fill("?").join()})`,
> +        [PlacesUtils.bookmarks.SYNC_STATUS.NORMAL, ...chunk]);

why do you need to bind the guids here? these are just guids and you control them, you can just build a static list of them (not sure whether that would still require chunking, regardless you'd save the binding cost).

::: toolkit/components/places/PlacesSyncUtils.jsm:1494
(Diff revision 32)
> +    return Promise.resolve();
> +  }
> +  for (let chunk of chunkArray(guids, SQLITE_MAX_VARIABLE_NUMBER)) {
> +    yield db.execute(`
> +      DELETE FROM moz_bookmarks_deleted
> +      WHERE guid IN (${new Array(chunk.length).fill("?").join()})`,

ditto for avoiding the binding
Attachment #8751134 - Flags: review?(mak77) → review+

Comment 227

10 months ago
mozreview-review
Comment on attachment 8752393 [details]
Bug 1258127 - Update the Places schema to track bookmark sync changes.

https://reviewboard.mozilla.org/r/52587/#review87728

r=me with the query performance verified on a large (but realistic) dataset.

::: toolkit/components/places/Database.cpp:2007
(Diff revision 25)
> +      "SELECT b.id FROM moz_bookmarks b "
> +      "JOIN syncedItems s ON b.parent = s.id "
> +    ") "
> +    "UPDATE moz_bookmarks SET folder_type /* syncable */ = '1' "
> +    "WHERE id IN syncedItems"
> +  ));

do you have a benchmark of this on a large dataset (something like 10000 bookmarks).
Whatever we do here, is a direct price the user pays on startup. We usually try to executeAsync expensive tasks, but in this case this should probably stay synchronous for data safety reasons.
Attachment #8752393 - Flags: review?(mak77) → review+

Comment 228

10 months ago
mozreview-review
Comment on attachment 8794426 [details]
Bug 1258127 - Update `Bookmarks.jsm` (JS) to track sync changes.

https://reviewboard.mozilla.org/r/80900/#review87734

::: toolkit/components/places/PlacesSyncUtils.jsm:33
(Diff revision 3)
>   * tags, keywords, synced annotations, and missing parents.
>   */
>  var PlacesSyncUtils = {};
>  
> -const { SOURCE_SYNC } = Ci.nsINavBookmarksService;
> +const { SOURCE_SYNC, SOURCE_IMPORT_REPLACE, SYNC_STATUS_UNKNOWN,
> +        SYNC_STATUS_NEW, SYNC_STATUS_NORMAL } = Ci.nsINavBookmarksService;

Is there a reason to use these instead of the ones in Bookmarks.jsm?
In general we should limit touching the old idl in new code.
Attachment #8794426 - Flags: review?(mak77) → review+

Comment 229

10 months ago
mozreview-review
Comment on attachment 8794426 [details]
Bug 1258127 - Update `Bookmarks.jsm` (JS) to track sync changes.

https://reviewboard.mozilla.org/r/80900/#review87744

::: toolkit/components/places/Bookmarks.jsm:1749
(Diff revision 3)
> +    return;
> +  }
> +  let annos = PlacesUtils.annotations;
> +  let orphanAnno = PlacesSyncUtils.bookmarks.SYNC_PARENT_ANNO;
> +  if (annos.itemHasAnnotation(itemId, orphanAnno)) {
> +    annos.removeItemAnnotation(itemId, orphanAnno, source);

I just noticed this in the cpp file, and went back to check this.
the problem here is that this API runs a synchronous query, while most of Bookmarks.jsm is async.

This comment is valid for both patches, how often may this happen in reality?
I'm trying to figure out if we need an async fast-path for this case.

Comment 230

10 months ago
mozreview-review
Comment on attachment 8794427 [details]
Bug 1258127 - Update `nsNavBookmarksService` (C++) to track sync changes.

https://reviewboard.mozilla.org/r/80902/#review87736

::: toolkit/components/places/nsNavBookmarks.h:305
(Diff revision 3)
> +
> +  nsresult SetItemTitleInternal(BookmarkData& aBookmark,
> +                                const nsACString& aTitle,
> +                                int64_t aSyncChangeDelta);
> +
> +  bool MatchesSyncable(int64_t aItemId, int32_t aSyncable);

any of these methods that don't need to access this, should be moved into the anonymous namespace at the top of nsNavBookmarks.cpp, rather than being methods of this class.
if it just needs the connection, it can also be passed as an argument. If the method requires more hooks than just the connection, then it's fine to add a method.

::: toolkit/components/places/nsNavBookmarks.h:332
(Diff revision 3)
>    }
>  
> +  static bool IsSyncEnabled();
> +
> +  // Returns the sync change counter increment for a change source constant.
> +  static inline int64_t DetermineSyncChangeDelta(uint16_t aSource) {

ditto for static methods, unless they should be accessed from an outside consumer of nsNavBookmarks...
I think most of the functions you added could just be plain functions in the anonymous namespace.

::: toolkit/components/places/nsNavBookmarks.cpp:203
(Diff revision 3)
> +  bool isOrphan;
> +  // Check if the anno exists, so that we don't fire spurious
> +  // `OnItemAnnotationSet` or `OnItemChanged` notifications.
> +  nsresult rv = annosvc->ItemHasAnnotation(aItemId,
> +                                           NS_LITERAL_CSTRING(SYNC_PARENT_ANNO),
> +                                           &isOrphan);

Same question as for the bookmarks.jsm case, with the difference here we can't add any async fast-path...
Attachment #8794427 - Flags: review?(mak77)
I think in general it'd be a good thing if Mark would review even the patches I already r+d, 4 eyes are better than 2.

What's the deadline for this? I'm a little bit scared of it landing in 52, rather than waiting the beginning of the 53 cycle.
Thanks so much for the reviews, Mak! I'll address your feedback and ask Mark to take a look. Richard, if you have cycles, would you mind doing a pass as well? I'll flag you once the patches are updated.

We're thinking 53, too. It's risky to land in 52 this late in the cycle.
(Reporter)

Comment 233

10 months ago
(In reply to Kit Cambridge [:kitcambridge] from comment #232)
> Richard, if you have cycles, would you mind doing a pass as well?

Sure; just let me know when.

Comment 234

10 months ago
mozreview-review
Comment on attachment 8802750 [details]
Bug 1258127 - Move bookmark de-duping logic into `PlacesSyncUtils.bookmarks.dedupe`.

https://reviewboard.mozilla.org/r/87042/#review87280

::: toolkit/components/places/PlacesSyncUtils.jsm:330
(Diff revision 1)
> -   */
> -  changeGuid: Task.async(function* (oldGuid, newGuid) {
> -    PlacesUtils.BOOKMARK_VALIDATORS.guid(oldGuid);
> -    PlacesUtils.BOOKMARK_VALIDATORS.guid(newGuid);
> -
> -    let itemId = yield PlacesUtils.promiseItemId(oldGuid);
> +   *        The remote ID that should replace the local ID.
> +   * @param remoteParentSyncId
> +   *        The remote record's parent ID.
> +   * @return {Promise} resolved once the ID has been changed.
> +   * @resolves to an object containing new change records for the old item,
> +   *           the local parent, and the remote parent. The bookmarks engine

maybe add "possibly" to this comment?

::: toolkit/components/places/PlacesSyncUtils.jsm:1487
(Diff revision 1)
> +      WHERE guid = :localParentGuid`,
> +      { localParentGuid });
> +
> +    // And we also add the parent as reflected in the incoming record as the
> +    // de-dupe process might have used an existing item in a different folder.
> +    // But only if the parent exists, otherwise we will upload a deleted item

This comment seems stale re the upload of a deleted item?

::: toolkit/components/places/tests/unit/test_sync_utils.js
(Diff revision 1)
>    }
>  
>    yield PlacesUtils.bookmarks.eraseEverything();
>  });
>  
> -add_task(function* test_changeGuid_invalid() {

Would it be particularly difficult to get a few tests for the new PSU method?
Attachment #8802750 - Flags: review?(markh) → review+

Comment 235

10 months ago
mozreview-review
Comment on attachment 8751134 [details]
Bug 1258127 - Add `PlacesSyncUtils` methods for pulling bookmark changes from Places.

https://reviewboard.mozilla.org/r/51879/#review88220

Awesome work Kit!

::: toolkit/components/places/PlacesSyncUtils.jsm:172
(Diff revision 32)
>    }),
>  
>    /**
> +   * Returns a changeset containing local bookmark changes since the last sync.
> +   * Updates the sync status of all "NEW" bookmarks to "NORMAL" as a side
> +   * effect.

I think it might be worth appending a brief description of why the side-effect exists - as written it sounds like an "accidental" side-effect rather than an explicit action we take. Somehthing like "... so sync recovers correctly if it is interrupted before the sync completes" or similar.

::: toolkit/components/places/PlacesSyncUtils.jsm:256
(Diff revision 32)
> +                syncStatus: PlacesUtils.bookmarks.SYNC_STATUS.NORMAL });
> +          }
> +
> +          // Remove tombstones.
> +          yield removeTombstones(db, syncedTombstoneGuids);
> +        }

it might be nice to log a summary of what was done here (eg, how many skipped, updated and removed)

::: toolkit/components/places/PlacesSyncUtils.jsm:305
(Diff revision 32)
> +      "BookmarkSyncUtils: reset", function (db) {
> +        return db.executeTransaction(function* () {
> +          // Reset change counters and statuses for all bookmarks.
> +          yield db.executeCached(`
> +            UPDATE moz_bookmarks
> +            SET syncChangeCounter = 1,

I'm surprised the change counter isn't zero?

::: toolkit/components/places/PlacesSyncUtils.jsm:1153
(Diff revision 32)
> -    WHERE parent = :tagsFolder AND title = :tag LIMIT 1`,
> -    { tagsFolder: PlacesUtils.bookmarks.tagsFolder, tag });
> +    SELECT id
> +    FROM moz_bookmarks
> +    WHERE type = :type AND
> +          parent = :tagsFolderId AND
> +          title = :tag
> +    LIMIT 1`,

Even though this existed before, I wonder if LIMIT 2 and a log.warn/error if we get 2 back might make sense?

::: toolkit/components/places/PlacesSyncUtils.jsm:1499
(Diff revision 32)
> +      WHERE guid IN (${new Array(chunk.length).fill("?").join()})`,
> +      chunk);
> +  }
> +});
> +
> +function chunkArray(array, chunkLength) {

this could almost be a generator?
Attachment #8751134 - Flags: review?(markh) → review+

Comment 236

10 months ago
mozreview-review
Comment on attachment 8752394 [details]
Bug 1258127 - Update the bookmarks engine to pull changes from Places.

https://reviewboard.mozilla.org/r/52589/#review88284

\o/
Attachment #8752394 - Flags: review?(markh) → review+
(Assignee)

Comment 237

10 months ago
mozreview-review-reply
Comment on attachment 8752393 [details]
Bug 1258127 - Update the Places schema to track bookmark sync changes.

https://reviewboard.mozilla.org/r/52587/#review87728

> do you have a benchmark of this on a large dataset (something like 10000 bookmarks).
> Whatever we do here, is a direct price the user pays on startup. We usually try to executeAsync expensive tasks, but in this case this should probably stay synchronous for data safety reasons.

I tested the recursive query with a tree 3 levels deep and 7 folders per level, for all four syncable roots and a custom non-syncable root (14016 bookmarks total). It took ~7ms on my dev machine (MBP with an SSD). With my own places.sqlite file (2806 bookmarks), it took ~3ms. I know Windows has much slower I/O; might push that to Try and see how well it does. It takes much longer to fill the DB than to run the query.
(Assignee)

Comment 238

10 months ago
mozreview-review-reply
Comment on attachment 8794426 [details]
Bug 1258127 - Update `Bookmarks.jsm` (JS) to track sync changes.

https://reviewboard.mozilla.org/r/80900/#review87744

> I just noticed this in the cpp file, and went back to check this.
> the problem here is that this API runs a synchronous query, while most of Bookmarks.jsm is async.
> 
> This comment is valid for both patches, how often may this happen in reality?
> I'm trying to figure out if we need an async fast-path for this case.

I tried this out with `reorder`, which also needs to remove that anno. Reordering an entire folder containing 5k bookmarks blocked the main thread for 6 seconds, so that's a no-go. We'll definitely want a fast path that can remove the anno for multiple items. That query takes ~2ms on my laptop. One snag is there's no `nsIAnnotationService::getObservers` method, so we can't notify `onItemAnnotationRemoved`. Should we implement that separately, since this patch isn't landing until 53?

We can also avoid removing the anno for "NEW" and "UNKNOWN" bookmarks, because Sync only sets it on "NORMAL" ones. If the user never sets up Sync, we don't need to run it at all.
(In reply to Kit Cambridge [:kitcambridge] from comment #237)
> I tested the recursive query with a tree 3 levels deep and 7 folders per
> level, for all four syncable roots and a custom non-syncable root (14016
> bookmarks total). It took ~7ms on my dev machine (MBP with an SSD). With my
> own places.sqlite file (2806 bookmarks), it took ~3ms.

it sounds good enough, thanks for checking.

(In reply to Kit Cambridge [:kitcambridge] from comment #238)
> I tried this out with `reorder`, which also needs to remove that anno.
> Reordering an entire folder containing 5k bookmarks blocked the main thread
> for 6 seconds, so that's a no-go. We'll definitely want a fast path that can
> remove the anno for multiple items.

yep, and be async for Bookmarks.jsm

> That query takes ~2ms on my laptop. One
> snag is there's no `nsIAnnotationService::getObservers` method, so we can't
> notify `onItemAnnotationRemoved`. Should we implement that separately, since
> this patch isn't landing until 53?

Or we can add an API to the annotations service?

> We can also avoid removing the anno for "NEW" and "UNKNOWN" bookmarks,
> because Sync only sets it on "NORMAL" ones. If the user never sets up Sync,
> we don't need to run it at all.

any main-tread I/O we can avoid is a win.
Blocks: 1313890
(Assignee)

Comment 240

10 months ago
mozreview-review-reply
Comment on attachment 8794427 [details]
Bug 1258127 - Update `nsNavBookmarksService` (C++) to track sync changes.

https://reviewboard.mozilla.org/r/80902/#review87736

> Same question as for the bookmarks.jsm case, with the difference here we can't add any async fast-path...

One idea I had was only removing the anno for bookmarks with syncStatus = NORMAL. Since only Sync should set this anno, we can theoretically skip `PreventSyncReparenting` for other bookmarks. There are two issues with this, but it seems like the way to go:

1) Existing bookmarks with that anno, before the `syncStatus` column was added. There shouldn't be many of these, and it's easy to fix; our migration code can set syncStatus = NORMAL for them.

2) Restore, where every bookmark is set to SYNC_STATUS_UNKNOWN. Restored bookmarks might have this anno, but I don't think we care too much. We'll need to reconcile with the server, anyway; the worst side effect is the user moves a restored bookmark to another folder, and Sync moves it back. User grumbles and moves the bookmark again, Sync tracks the change and reuploads, everything's OK. We don't even do this kind of reconciliation yet, and letting vaporware drive this decision is fraught.

We should also probably remove the SYNC_PARENT_ANNO from all items in `PlacesSyncUtils.bookmarks.reset`. It doesn't make sense to have them hang around once Sync is disconnected.

Another idea is, we make this an async statement. The possible race here is: a user sees that Sync moved a bookmark into unfiled (because it doesn't have the parent yet), and moves it out of unfiled. Sync then downloads the parent before our async statement runs, and moves the bookmark back to the parent it just downloaded. If the user picks a different parent than what's on the server, we'll undo their change, and pointlessly increment the change counter for that orphan. Probably bad.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)

Comment 248

10 months ago
mozreview-review
Comment on attachment 8794427 [details]
Bug 1258127 - Update `nsNavBookmarksService` (C++) to track sync changes.

https://reviewboard.mozilla.org/r/80902/#review88898

::: toolkit/components/places/nsNavBookmarks.cpp:177
(Diff revision 4)
> +  NS_ENSURE_SUCCESS(rv, rv);
> +  if (isOrphan) {
> +    rv = annosvc->RemoveItemAnnotation(aBookmark.id,
> +                                       NS_LITERAL_CSTRING(SYNC_PARENT_ANNO),
> +                                       aSource);
> +    NS_ENSURE_SUCCESS(rv, rv);

I didn't get if we're going to land this and then work on improving the annotations path perf, or if a new part is coming for this.
Attachment #8794427 - Flags: review?(mak77) → review+

Comment 249

10 months ago
mozreview-review
Comment on attachment 8805911 [details]
Bug 1258127 - Move bookmark deletion logic into `PlacesSyncUtils.bookmarks.remove`.

https://reviewboard.mozilla.org/r/89518/#review88918

Looks good to me as long as you make sure the comments aren't stale.

Also make sure this runs test_bookmark_conflicts in TPS without erroring.

::: services/sync/modules/engines/bookmarks.js:475
(Diff revision 1)
> -        this._modified.set(modifiedSyncID, { timestamp: now, deleted: false });
> -      }
>      }
>    },
>  
>    // We avoid reviving folders since reviving them properly would require

Is this still true? The check for the kind seems to be removed so this comment should be updated (or moved, if It's just elsewhere now)

Edit: Hmm... Looks like the folder logic happens in PSU. Probably just mention that the logic occurs in that file (Or something, as it is it looks like it's just a stale comment).
Attachment #8805911 - Flags: review?(tchiovoloni) → review+
Depends on: 1314084
(Assignee)

Comment 250

10 months ago
mozreview-review-reply
Comment on attachment 8794427 [details]
Bug 1258127 - Update `nsNavBookmarksService` (C++) to track sync changes.

https://reviewboard.mozilla.org/r/80902/#review88898

> I didn't get if we're going to land this and then work on improving the annotations path perf, or if a new part is coming for this.

I went with the first option in comment 240: add an async fast path for JS in part 2, and only remove the anno for bookmarks with syncStatus = NORMAL. There's no penalty if Sync isn't set up, since unsynced bookmarks will always stay in NEW (or UNKNOWN, if they're restored). If Sync is set up, the current tracker already calls `removeItemAnnotation` on every move (http://searchfox.org/mozilla-central/rev/445097654c1cc1098ee0171a29c439afe363a588/services/sync/modules/engines/bookmarks.js#1340-1342), so this should be an improvement.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
(Reporter)

Comment 265

10 months ago
mozreview-review
Comment on attachment 8752393 [details]
Bug 1258127 - Update the Places schema to track bookmark sync changes.

https://reviewboard.mozilla.org/r/52587/#review89298

::: toolkit/components/places/Database.cpp:98
(Diff revision 28)
>  #define MOBILE_ROOT_GUID "mobile______"
>  #define MOBILE_ROOT_ANNO "mobile/bookmarksRoot"
>  
>  // We use a fixed title for the mobile root to avoid marking the database as
>  // corrupt if we can't look up the localized title in the string bundle. Sync
>  // sets the title to the localized version when it creates the left pane query.

Sidenote: it seems to me that both of these things are now untrue. Firefox includes the string for the mobile root (it's no longer part of the Firefox Sync add-on), and Sync no longer creates the left pane query after Bug 1302901, right?

Can you file a follow-up to clean this up?

::: toolkit/components/places/Database.cpp:1952
(Diff revision 28)
> +    "SELECT syncStatus FROM moz_bookmarks"
> +  ), getter_AddRefs(syncStatusStmt));
> +  if (NS_FAILED(rv)) {
> +    // We may already have bookmarks on the server that we need to reconcile,
> +    // so we set the sync status to SYNC_STATUS_UNKNOWN = 0 for existing
> +    // bookmarks. After reconciliation, we'll update the status to

Leaving a note partly for myself as I move through this review.

It would seem that once we do this, we have two choices:

- Sync from scratch, starting everything at UNKNOWN. That might introduce problems.
- Use the tracker to initialize the database: the first time we sync after this migration (or using UNKNOWN as a sentinel), apply the existing tracker contents to turn some into CHANGED and others into NORMAL.

::: toolkit/components/places/Database.cpp:1984
(Diff revision 28)
> +
> +  // Sync annotates orphaned items with the special `SYNC_PARENT_ANNO`. This
> +  // indicates an item's parent folder doesn't exist locally, which is possible
> +  // because Sync doesn't buffer incoming records before applying them. Only
> +  // items that have been synced will have this anno, so we set their status
> +  // to `NORMAL`.

Does this anno persist across restarts, even when we've finished up parenting at the end of a sync?

Is this how Weave expected this to work -- that incoming records move into Unsorted but are annotated until their parent arrives?

::: toolkit/components/places/nsINavBookmarksService.idl:326
(Diff revision 28)
>  
>    /**
> +   * Sync status flags.
> +   */
> +  // "UNKNOWN" means sync tracking information was lost because the item
> +  // was restored from a backup, and needs to be reconciled with the server.

Worth commenting here about the initial value and why…

::: toolkit/components/places/nsPlacesTables.h:108
(Diff revision 28)
>      ", position INTEGER" \
>      ", title LONGVARCHAR" \
>      ", keyword_id INTEGER" \
> +    /* Schema version 37 repurposes this field to mean "syncable", indicating
> +       whether a bookmark should be synced ("1") or not (NULL). Unlike the sync
> +       status, syncability is inherited from parent to child: if a folder is

Does that mean the child's 'syncable' is ignored? Or must be kept up to date with the parent? Or is automatically kept up to date? Document!

::: toolkit/components/places/nsPlacesTables.h:147
(Diff revision 28)
> +// all bookmarks will stay in SYNC_STATUS_NEW.
> +#define CREATE_MOZ_BOOKMARKS_DELETED NS_LITERAL_CSTRING( \
> +  "CREATE TABLE moz_bookmarks_deleted (" \
> +    "  guid TEXT PRIMARY KEY" \
> +    ", dateRemoved INTEGER NOT NULL DEFAULT 0" \
> +    ", syncStatus INTEGER NOT NULL DEFAULT 0" \

Why do we need `syncStatus` here? How do we use this differently if this is `UNKNOWN`?

I would imagine that `UNKNOWN` => a full sync, in which case you know whether an item should have been `NEW` or `NORMAL` by the time you come to upload deletions…

::: toolkit/components/places/tests/migration/test_current_from_v35.js:1
(Diff revision 28)
> +const syncableRoots = ["menu________", "toolbar_____", "unfiled_____",

PD block:

```
/* Any copyright is dedicated to the Public Domain.
   http://creativecommons.org/publicdomain/zero/1.0/ */
```
Attachment #8752393 - Flags: review?(rnewman) → review+
(Reporter)

Comment 266

10 months ago
mozreview-review
Comment on attachment 8794427 [details]
Bug 1258127 - Update `nsNavBookmarksService` (C++) to track sync changes.

https://reviewboard.mozilla.org/r/80902/#review89342

I gave only a cursory look at C++ coding style, and I wasn't diving as deep as I expect mak did.

::: toolkit/components/places/nsNavBookmarks.h:39
(Diff revision 6)
>      int32_t position;
>      int64_t placeId;
>      int64_t parentId;
>      int64_t grandParentId;
>      int32_t type;
> +    int32_t syncStatus;

`uint8_t`? Or is this some byte-alignment optimization?

::: toolkit/components/places/nsNavBookmarks.h:66
(Diff revision 6)
>    };
>  
> +  struct TombstoneData {
> +    nsCString guid;
> +    PRTime dateRemoved;
> +    int32_t syncStatus;

Same.

::: toolkit/components/places/nsNavBookmarks.h:280
(Diff revision 6)
>                             int64_t* _parentId);
>  
>    nsresult GetLastChildId(int64_t aFolder, int64_t* aItemId);
>  
> +  nsresult AddSyncChangesForBookmarksWithURL(const nsACString& aURL,
> +                                             int64_t aSyncChangeDelta);

The change counter is an `int32_t`, right? Why do these take a 64-bit int?

::: toolkit/components/places/nsNavBookmarks.cpp:149
(Diff revision 6)
> +    return nsINavBookmarksService::SYNC_STATUS_UNKNOWN;
> +  }
> +  return nsINavBookmarksService::SYNC_STATUS_NEW;
> +}
> +
> +// Indicates whether an item has been uploaded to the Sync server.

has ever

::: toolkit/components/places/nsNavBookmarks.cpp:491
(Diff revision 6)
>    }
>    NS_ENSURE_SUCCESS(rv, rv);
>  
>    // Could use IsEmpty because our callers check for GUID validity,
>    // but it doesn't hurt.
> -  if (_guid.Length() == 12) {
> +  bool hasExistingGuid = _guid.Length() == 12;

This makes it really hard to work around Bug 1313026…

::: toolkit/components/places/nsNavBookmarks.cpp:1453
(Diff revision 6)
> -  rv = SetItemDateInternal(LAST_MODIFIED, bookmark.parentId, now);
> +  rv = SetItemDateInternal(LAST_MODIFIED, syncChangeDelta,
> +                           bookmark.parentId, now);
> +  NS_ENSURE_SUCCESS(rv, rv);
> +  if (sameParent) {
> +    // If we're moving within the same container, only the parent needs a sync
> +    // change. Update the item's last modified date without bumping its counter.

Follow-up: mark as changed all Separator siblings with an index greater than either the source or destination indices. They'll need new 'pos' values. (Fixes Bug 1228827.)

::: toolkit/components/places/nsNavBookmarks.cpp:1467
(Diff revision 6)
> +    NS_ENSURE_SUCCESS(rv, rv);
> +  }
> +
> +  bool isChangingTagFolder = bookmark.parentId == mTagsRoot;
> +  if (isChangingTagFolder) {
> +    // Moving a tag folder out of the tags root untags all its bookmarks. This

Wow, you've been thorough!

::: toolkit/components/places/nsNavBookmarks.cpp:1817
(Diff revision 6)
> +
> +  nsCOMPtr<mozIStorageStatement> statement = mDB->GetStatement(
> +   "UPDATE moz_bookmarks SET "
> +    "syncChangeCounter = syncChangeCounter + :delta "
> +   "WHERE folder_type /* syncable */ = '1' AND "
> +         "type = :type AND "

I'd be inclined to put `fk` and `type` first.

::: toolkit/components/places/nsNavBookmarks.cpp:2280
(Diff revision 6)
>  
> +bool
> +nsNavBookmarks::MatchesSyncable(int64_t aItemId, int32_t aSyncable)
> +{
> +  nsCOMPtr<mozIStorageStatement> stmt = mDB->GetStatement(
> +    "SELECT 1 FROM moz_bookmarks WHERE id = :item_id AND folder_type = '1'");

`/* syncable */`

::: toolkit/components/places/nsNavBookmarks.cpp:2882
(Diff revision 6)
> +      // keyword.
> +      nsCOMPtr<mozIStorageStatement> stmt = mDB->GetStatement(
> +        NS_LITERAL_CSTRING(
> +        "UPDATE moz_bookmarks SET "
> +         "syncChangeCounter = syncChangeCounter + :delta "
> +        "WHERE id IN (") + changedIds + NS_LITERAL_CSTRING(")")

This will fail if `bookmarks.Length()` > `SQLITE_MAX_VARIABLE_NUMBER`, which is usually 999.

You will need to do this iteratively within a transaction.

::: toolkit/components/places/nsNavBookmarks.cpp:2999
(Diff revision 6)
> +
> +  if (syncChangeDelta && !bookmarks.IsEmpty()) {
> +    // Build a query to update all bookmarks in a single statement.
> +    nsAutoCString changedIds;
> +    changedIds.AppendInt(bookmarks[0].id);
> +    for (uint32_t i = 1; i < bookmarks.Length(); ++i) {

Same.
Attachment #8794427 - Flags: review?(rnewman) → review+
(Assignee)

Comment 267

10 months ago
mozreview-review-reply
Comment on attachment 8752393 [details]
Bug 1258127 - Update the Places schema to track bookmark sync changes.

https://reviewboard.mozilla.org/r/52587/#review89298

> Sidenote: it seems to me that both of these things are now untrue. Firefox includes the string for the mobile root (it's no longer part of the Firefox Sync add-on), and Sync no longer creates the left pane query after Bug 1302901, right?
> 
> Can you file a follow-up to clean this up?

This is still true, unfortunately. Firefox includes the string, and creates the root, but Sync manages the query. We'll eventually fix it in bug 647605.

> Leaving a note partly for myself as I move through this review.
> 
> It would seem that once we do this, we have two choices:
> 
> - Sync from scratch, starting everything at UNKNOWN. That might introduce problems.
> - Use the tracker to initialize the database: the first time we sync after this migration (or using UNKNOWN as a sentinel), apply the existing tracker contents to turn some into CHANGED and others into NORMAL.

Yes, we're going to need a migration strategy. We currently start everything at `UNKNOWN`, but that means we won't upload deletions for items that are already syncing. At the same time, we don't want to set our server on fire with full reconciles for everyone after upgrading. And I suspect we can't use the `services.sync.username` pref, because we might actually have records locally that aren't on the server.

> Does this anno persist across restarts, even when we've finished up parenting at the end of a sync?
> 
> Is this how Weave expected this to work -- that incoming records move into Unsorted but are annotated until their parent arrives?

The anno does persist across restarts (and gets backed up and restored!) I think Weave intentionally expected this to work, if a partial download was interrupted by shutdown.

> Does that mean the child's 'syncable' is ignored? Or must be kept up to date with the parent? Or is automatically kept up to date? Document!

We don't update a child's syncability after it's created. It's inherited from a root, and moving between syncable and non-syncable roots isn't allowed. There are comments in `Bookmarks.update` and `nsNavBookmarksService::MoveItem`; I should point to those (or reproduce the explanation) here.

> Why do we need `syncStatus` here? How do we use this differently if this is `UNKNOWN`?
> 
> I would imagine that `UNKNOWN` => a full sync, in which case you know whether an item should have been `NEW` or `NORMAL` by the time you come to upload deletions…

Good catch! We no longer store tombstones for `UNKNOWN` bookmarks, and, as you note, we already know what to do in case of a full sync. This entire column can be removed.
(Reporter)

Comment 268

10 months ago
mozreview-review
Comment on attachment 8794426 [details]
Bug 1258127 - Update `Bookmarks.jsm` (JS) to track sync changes.

https://reviewboard.mozilla.org/r/80900/#review89394

::: toolkit/components/places/Bookmarks.jsm:1681
(Diff revision 6)
>  }
> +
> +// Indicates whether an item has been uploaded to the Sync server. This
> +// determines whether we write a tombstone if the item is deleted, or remove
> +// its orphan anno if the item is moved.
> +function isItemSyncing(item) {

Wrong tense. `shouldItemSync`, `doesItemSync`, `hasItemSynced`?

Also, you're checking for `NORMAL`, which means you won't store a tombstone for an `UNKNOWN` — so perhaps invert the condition and call this `isLocalOnlyItem`?

::: toolkit/components/places/Bookmarks.jsm:1716
(Diff revision 6)
> +  if (!syncedItems.length) {
> +    return Promise.resolve();
> +  }
> +  let dateRemoved = PlacesUtils.toPRTime(Date.now());
> +  let valuesTable = syncedItems.map(item => `(
> +    ${JSON.stringify(item.guid)},

This still makes me sad :/

::: toolkit/components/places/Bookmarks.jsm:1719
(Diff revision 6)
> +  let dateRemoved = PlacesUtils.toPRTime(Date.now());
> +  let valuesTable = syncedItems.map(item => `(
> +    ${JSON.stringify(item.guid)},
> +    ${dateRemoved},
> +    ${item._syncStatus}
> +  )`).join();

These should be joined with a comma.

::: toolkit/components/places/Bookmarks.jsm:1752
(Diff revision 6)
> +  return db.execute(`
> +    UPDATE moz_bookmarks SET
> +      syncChangeCounter = syncChangeCounter + :syncChangeDelta
> +    WHERE folder_type /* syncable */ = '1' AND
> +          type = :type AND
> +          fk = (SELECT fk FROM moz_bookmarks WHERE parent = :parent)

Same comment as in the C++ version.

::: toolkit/components/places/tests/PlacesTestUtils.jsm:177
(Diff revision 6)
> +   */
> +  resetSyncedBookmarks() {
> +    return PlacesUtils.withConnectionWrapper(
> +      "PlacesTestUtils: resetSyncedBookmarks", db => {
> +        return db.executeTransaction(function* () {
> +          yield db.executeCached(

Use `execute` instead of `executeCached` — you'll only call this once.

::: toolkit/components/places/tests/PlacesTestUtils.jsm:179
(Diff revision 6)
> +    return PlacesUtils.withConnectionWrapper(
> +      "PlacesTestUtils: resetSyncedBookmarks", db => {
> +        return db.executeTransaction(function* () {
> +          yield db.executeCached(
> +            `UPDATE moz_bookmarks
> +             SET syncChangeCounter = 0`);

I could have sworn your docs/notes/an earlier patch said this would be reset to 1, not 0.

::: toolkit/components/places/tests/PlacesTestUtils.jsm:235
(Diff revision 6)
> -});
> +    });
> +  },
> +
> +  fetchBookmarkSyncFields: Task.async(function* (...aGuids) {
> +    let db = yield PlacesUtils.promiseDBConnection();
> +    let rows = yield db.executeCached(`

I'm pretty sure you don't want to use `executeCached` for any statement that includes a computed list of GUIDs.
Attachment #8794426 - Flags: review?(rnewman) → review+
(Reporter)

Comment 269

10 months ago
mozreview-review
Comment on attachment 8751134 [details]
Bug 1258127 - Add `PlacesSyncUtils` methods for pulling bookmark changes from Places.

https://reviewboard.mozilla.org/r/51879/#review89666

::: toolkit/components/places/PlacesSyncUtils.jsm:149
(Diff revision 35)
> +   *      for an explanation of why we update the sync status.
> +   */
> +  pullNewChanges() {
> +    return PlacesUtils.withConnectionWrapper(
> +      "BookmarkSyncUtils: pullNewChanges", function (db) {
> +        return pullChanges(db, { pullAll: false });

'bare' function name (not foo.pullChanges) confused me for a minute!

::: toolkit/components/places/PlacesSyncUtils.jsm:191
(Diff revision 35)
> +        let counts = { skipped: 0, updated: 0, tombstones: 0 };
> +
> +        yield db.executeTransaction(function* () {
> +          let syncedTombstoneGuids = [];
> +
> +          for (let syncId in changeRecords) {

Some of this logic doesn't need to be within the transaction.

Consider pulling the validation/transform/etc. code to outside the transaction, keeping the critical section small and avoiding the need to roll back if validation fails.

This will also make it explicit that you intend nothing to be committed if validation fails for any `changeRecord` -- currently `validateChangeRecord` will throw, so the transaction will be rolled back, but you might as well validate and throw before you do a bunch of writing in a transaction!

::: toolkit/components/places/PlacesSyncUtils.jsm:218
(Diff revision 35)
> +            // Reduce the change counter and update the sync status for
> +            // reconciled and uploaded items. If the bookmark was updated
> +            // during the sync, its change counter will still be > 0 for the
> +            // next sync.
> +            counts.updated++;
> +            yield db.executeCached(`

Indeed, you'll note that you can build your counts entirely outside the transaction, collecting `[guid, delta]` pairs, then applying those in one go. If the transaction commits, your counts will be correct.

::: toolkit/components/places/PlacesSyncUtils.jsm:288
(Diff revision 35)
> +      "BookmarkSyncUtils: reset", function (db) {
> +        return db.executeTransaction(function* () {
> +          // Reset change counters and statuses for all bookmarks.
> +          yield db.executeCached(`
> +            UPDATE moz_bookmarks
> +            SET syncChangeCounter = 0,

Same question as on another commit.

::: toolkit/components/places/PlacesSyncUtils.jsm:1356
(Diff revision 35)
> +function addRowToChangeRecords(row, changeRecords) {
> +  // Timestamps are stored in microseconds. If we're pulling all items,
> +  // we treat them as modified long ago instead of using their
> +  // real modification or deletion time.
> +  let syncId = BookmarkSyncUtils.guidToSyncId(row.getResultByName("guid"));
> +  let modified = row.getResultByName("modified") / 1000 / 1000;

`1000000`, just on the off chance this isn't optimized away by the compiler.

::: toolkit/components/places/PlacesSyncUtils.jsm:1397
(Diff revision 35)
> +    FROM moz_bookmarks_deleted`,
> +    { syncChangeCounter: options.pullAll ? 0 : 1,
> +      modified: options.pullAll ? 0 : null },
> +    row => addRowToChangeRecords(row, changeRecords));
> +
> +  yield markNewBookmarksAsSyncing(db, changeRecords);

You're conflating _requesting the list_ with updating the bookkeeping as if the records have been uploaded. Are you sure you've landed on the right side of the fence?

Imagine that this runs on a flaky network connection. We apply downloaded records, call `pullChanges`, then instantly fail the upload.

We want to make sure that:

- Records we don't successfully upload will be uploaded on the next sync.
- Incoming changes to records that we didn't upload are reconciled.
- Records we uploaded are eventually marked as `NORMAL` so that we upload deletion tombstones.

Presumably we handle reconciling incoming records against `NEW` local changes, particularly if they're identical.

Will we upload these newly-`NORMAL` records if the sync is interrupted right after `pullAllRecords` is called?
(Assignee)

Comment 270

10 months ago
mozreview-review-reply
Comment on attachment 8751134 [details]
Bug 1258127 - Add `PlacesSyncUtils` methods for pulling bookmark changes from Places.

https://reviewboard.mozilla.org/r/51879/#review89666

> Same question as on another commit.

I'm trying to remember why I set it to 1 before. I think that was to match the behavior of a "fresh" (not migrated) database: if you inserted all your bookmarks without setting up Sync, everything would have a change counter of > 0. It doesn't make much of a difference for Sync: if you reconnect, it'll call `pullAllChanges` on the first sync, anyway. But, if everything has a change counter > 0, then `pullNewChanges` would return the complete set, and we could get rid of `pullAllChanges`.

I'm not sure yet what the correct behavior should be for a migrated database. Setting the counter to 1 would cause us to reupload everything on upgrade, which seems bad. We can read the existing tracker JSON file, but that only holds changes since the last sync.

> You're conflating _requesting the list_ with updating the bookkeeping as if the records have been uploaded. Are you sure you've landed on the right side of the fence?
> 
> Imagine that this runs on a flaky network connection. We apply downloaded records, call `pullChanges`, then instantly fail the upload.
> 
> We want to make sure that:
> 
> - Records we don't successfully upload will be uploaded on the next sync.
> - Incoming changes to records that we didn't upload are reconciled.
> - Records we uploaded are eventually marked as `NORMAL` so that we upload deletion tombstones.
> 
> Presumably we handle reconciling incoming records against `NEW` local changes, particularly if they're identical.
> 
> Will we upload these newly-`NORMAL` records if the sync is interrupted right after `pullAllRecords` is called?

We talked about this on IRC, but I think this is right. The comment above `markNewBookmarksAsSyncing` mentions we do this in case of an interrupted sync (specifically, a partial upload where we've already uploaded a `NEW` record to the server). We only write tombstones for `NORMAL` bookmarks, so, if the bookmark is deleted before the next sync, we won't upload the deletion to the server. With this approach, we might upload a tombstone for an item that other clients haven't seen yet, but that's fine; they'll ignore it.

We only use the sync status for two things: deciding whether to write tombstones, and removing the "sync/parent" orphan anno after moving. It might be worth splitting up `NORMAL` into `PENDING` and `SYNCED`, though the gains aren't as clear to me (maybe future-proofing? And we could avoid the orphan anno check for `PENDING` bookmarks).
(Reporter)

Comment 271

10 months ago
(In reply to Kit Cambridge [:kitcambridge] from comment #270)

> It doesn't make much of a difference for Sync: if you reconnect, it'll
> call `pullAllChanges` on the first sync, anyway. But, if everything has a
> change counter > 0, then `pullNewChanges` would return the complete set, and
> we could get rid of `pullAllChanges`.

If they're all zero, and we do a full from-0 sync without pullAllChanges, we will only ever download records. None of the local ones will appear to be new, so they won't be uploaded.

If they're all 1, and we do a full from-0 sync without pullAllChanges, we will download server records, reconcile and un-track any records that don't need to be uploaded, and upload the rest.

So I guess this comes down to: how confident are you that your first-sync handling is correct, and will call pullAllChanges each time?

The current behavior is to grab all IDs and dump them into the tracker. They don't ever get removed from the tracker until they're synced. It seems the change counter is the closest analogue to that we have…
We currently have three sync statuses: UNKNOWN, NEW, and NORMAL. I think they need some more thought. Some considerations, in no particular order:

* Migrating existing Sync users. Unfortunately, our JSON tracker gets cleared before each sync, so we don't know which records have been uploaded to the server before this point. We're going to need a full reconciliation eventually. In the meantime, we *do* want to write tombstones for these bookmarks. When a migrated item is changed, we'll bump its change counter, sync, and change its status to indicate it's synced.

* Migrating bookmarks for users that haven't set up Sync. This is the majority of our user base. We *don't* want to write tombstones for any of these bookmarks; otherwise, we'd endlessly accumulate rows in `moz_bookmarks_deleted`. (In any case, it probably makes sense to have a maintenance function that evicts old tombstones).

* Restoring. This needs a full reconcile, but we don't want to write a tombstone if a restored bookmark is deleted. There's a risk that we'll reintroduce deleted bookmarks when we reconcile with the server, but this is something the user can fix. Tombstones aren't backed up, anyway, and trying to reconcile local data with the server without knowing where they diverged is already fraught. Just like the migrated state, we'll change the status back to synced after uploading.

* Interrupted uploads. Currently, we mark new bookmarks as synced *before* uploading them. This is so that we can write tombstones if the bookmark made it to the server, the sync was interrupted, and the new bookmark was deleted before the next sync. For clarity, it could make sense to have a separate status. The bookmark's change counter is still > 0, so it'll be synced regardless.

This leads, roughly, to the following statuses:

* NEW: New bookmark that hasn't been synced yet. Doesn't need tombstones, not reconciled.
* MIGRATED_AND_SYNC_DISABLED: Migrated from an older profile, and user hasn't set up Sync. Same as "NEW".
* MIGRATED_AND_SYNC_ENABLED: Migrated from an older profile, and Sync is enabled. Needs tombstones, needs to be reconciled.
* RESTORED_AND_SYNC_DISABLED: Restored from backup (manually or automatically). Same as "NEW".
* RESTORED_AND_SYNC_ENABLED: Restored, and Sync is enabled. Doesn't need tombstones, needs to be reconciled.
* UPLOADING: Pulled from the database and queued for uploading, but not on the server yet. Needs tombstones, doesn't need to be reconciled.
* SYNCED: The bookmark is on the server. Same as "UPLOADING": needs tombstones, doesn't need to be reconciled.

Our constraints:

* IIUC, there's no reliable way to tell if Sync is enabled. http://searchfox.org/mozilla-central/rev/f5c9e9a249637c9abd88754c8963ecb3838475cb/services/sync/Weave.js#125-128 checks `services.sync.username`, but Mark mentioned we really shouldn't use this. This pref also isn't migrated in Firefox Refresh (bug 1311091)...even though Sync somehow continues to work.

* We can't do a full reconcile for everyone immediately after upgrading. I asked Bob M. about this: new servers might be OK, but older servers will struggle with the load, because we aren't pruning records on them by TTL. We need to stagger these, and Sync needs to be usable in the meantime.

* What happens if the same profile is used with Nightly, Dev Edition, Beta, and Release? What happens if they set up Sync on Nightly, then downgrade to Release? How about if they set up Sync on Release, upgrade to Nightly, then downgrade again?
(Reporter)

Comment 273

10 months ago
(In reply to Kit Cambridge [:kitcambridge] from comment #272)

> * Migrating existing Sync users. Unfortunately, our JSON tracker gets
> cleared before each sync, so we don't know which records have been uploaded
> to the server before this point. We're going to need a full reconciliation
> eventually.

Why not assume they all have, apart from those mentioned in the tracker JSON file? That's what we do now…


> * What happens if the same profile is used with Nightly, Dev Edition, Beta,
> and Release? What happens if they set up Sync on Nightly, then downgrade to
> Release? How about if they set up Sync on Release, upgrade to Nightly, then
> downgrade again?

I don't think we can support database downgrades in this case -- the downgrade would have to ship in an earlier release, perhaps even two versions earlier. And even if we were to allow the DB to be downgraded, we'd need the DB migration to read and write a tracker file, or we'd have to maintain two trackers.

If this change gets backed out, the DB change itself can't be undone.

Updated

9 months ago
Duplicate of this bug: 1236209
(Reporter)

Comment 275

9 months ago
mozreview-review
Comment on attachment 8751134 [details]
Bug 1258127 - Add `PlacesSyncUtils` methods for pulling bookmark changes from Places.

https://reviewboard.mozilla.org/r/51879/#review91050

Setting this to an r- to let RB know that there's stuff to fix!
Attachment #8751134 - Flags: review?(rnewman) → review-
(Assignee)

Comment 276

9 months ago
mozreview-review-reply
Comment on attachment 8794426 [details]
Bug 1258127 - Update `Bookmarks.jsm` (JS) to track sync changes.

https://reviewboard.mozilla.org/r/80900/#review89394

> Wrong tense. `shouldItemSync`, `doesItemSync`, `hasItemSynced`?
> 
> Also, you're checking for `NORMAL`, which means you won't store a tombstone for an `UNKNOWN` — so perhaps invert the condition and call this `isLocalOnlyItem`?

Good call. Renamed to `needsTombstone`, since that's what it really means.

> These should be joined with a comma.

`join()` is equivalent to `join(",")` in JS, but I've added commas here and in `PlacesSyncUtils` for clarity.

> I could have sworn your docs/notes/an earlier patch said this would be reset to 1, not 0.

This is test code. We reset it to 0 as a quick way to mark bookmarks as synced in tests, without going through `pullChanges` or `pushChanges`. I cleaned this up a bit in the latest version.
(Assignee)

Comment 277

9 months ago
mozreview-review-reply
Comment on attachment 8794427 [details]
Bug 1258127 - Update `nsNavBookmarksService` (C++) to track sync changes.

https://reviewboard.mozilla.org/r/80902/#review89342

> `uint8_t`? Or is this some byte-alignment optimization?

AFAICT, mozstorage only defines `Statement::GetInt32` and `Statement::GetInt64`, and I wanted to avoid `static_cast`s.

> The change counter is an `int32_t`, right? Why do these take a 64-bit int?

It's an `int64_t`, to match how we store other integers. Should I make it 32 bits? I can imagine it overflowing eventually, if you never set up Sync, but you'd have to make a lot of changes to your bookmarks over a long period of time.

> This will fail if `bookmarks.Length()` > `SQLITE_MAX_VARIABLE_NUMBER`, which is usually 999.
> 
> You will need to do this iteratively within a transaction.

This doesn't use binding params; I don't think `SQLITE_MAX_VARIABLE_NUMBER` applies here? But it's definitely needed in `InsertTombstones`. Thanks for reminding me! :-)
(Assignee)

Comment 278

9 months ago
mozreview-review-reply
Comment on attachment 8751134 [details]
Bug 1258127 - Add `PlacesSyncUtils` methods for pulling bookmark changes from Places.

https://reviewboard.mozilla.org/r/51879/#review89666

> Some of this logic doesn't need to be within the transaction.
> 
> Consider pulling the validation/transform/etc. code to outside the transaction, keeping the critical section small and avoiding the need to roll back if validation fails.
> 
> This will also make it explicit that you intend nothing to be committed if validation fails for any `changeRecord` -- currently `validateChangeRecord` will throw, so the transaction will be rolled back, but you might as well validate and throw before you do a bunch of writing in a transaction!

Done!

> I'm trying to remember why I set it to 1 before. I think that was to match the behavior of a "fresh" (not migrated) database: if you inserted all your bookmarks without setting up Sync, everything would have a change counter of > 0. It doesn't make much of a difference for Sync: if you reconnect, it'll call `pullAllChanges` on the first sync, anyway. But, if everything has a change counter > 0, then `pullNewChanges` would return the complete set, and we could get rid of `pullAllChanges`.
> 
> I'm not sure yet what the correct behavior should be for a migrated database. Setting the counter to 1 would cause us to reupload everything on upgrade, which seems bad. We can read the existing tracker JSON file, but that only holds changes since the last sync.

Following up: I changed this to 1, and merged `pullAllChanges` and `pullNewChanges` into a single `pullChanges`.

> We talked about this on IRC, but I think this is right. The comment above `markNewBookmarksAsSyncing` mentions we do this in case of an interrupted sync (specifically, a partial upload where we've already uploaded a `NEW` record to the server). We only write tombstones for `NORMAL` bookmarks, so, if the bookmark is deleted before the next sync, we won't upload the deletion to the server. With this approach, we might upload a tombstone for an item that other clients haven't seen yet, but that's fine; they'll ignore it.
> 
> We only use the sync status for two things: deciding whether to write tombstones, and removing the "sync/parent" orphan anno after moving. It might be worth splitting up `NORMAL` into `PENDING` and `SYNCED`, though the gains aren't as clear to me (maybe future-proofing? And we could avoid the orphan anno check for `PENDING` bookmarks).

Thinking more about this, I'd rather not introduce a new `PENDING` state. But I could be persuaded if you feel it would make more sense. We only use the status to determine whether to write a tombstone now, and expect to use it for reconciliation later. `PENDING` doesn't seem to give us an advantage over `NEW`.
Created attachment 8808852 [details]
Bug 1258127 - Add migration logic for old synced bookmarks.

Writing an integration test for this is going to be tricky, but I'll
explain the basic idea here, and add some tests later.

For a non-Sync user upgrading to Firefox 53, we take the following
steps:

1. During Places database migration, we set syncStatus = UNKNOWN and
   syncChangeCounter = 1 for all bookmarks. We also set a Boolean pref
   indicating that Sync should migrate the existing tracker contents.
2. Once Firefox starts up, we wait for 10 seconds before initializing
   Sync. After that, we explicitly force initialization if the tracker
   migration pref is set. Since we load the entire service, this incurs
   a one-time memory increase for the first restart after upgrading,
   even if Sync is disabled.
3. During initialization, we check to see if Sync is disabled or not
   configured. If so, we notify engines to drop migration state via
   the `migration:reset` notification. This also takes care of case
   (c): if the user isn't signed in to Sync now, but decides to try it
   after upgrading, we *don't* want our migration logic to kick in.
4. The bookmarks tracker receives the notification, and clears the
   migration pref. All bookmarks remain in syncStatus = UNKNOWN and
   syncChangeCounter = 1.

For an existing Sync user upgrading to 53, steps 1-2 are the same.
Then:

3. For existing users, we see that Sync is configured, so we fire the
   `migration:start` notification.
4. The bookmarks tracker receives the notification, and checks to see
   if the engine is enabled. If not, go to step 8.
5. Otherwise, bookmark syncing is enabled, so we read the contents of
   the old tracker.
6. We set syncStatus = NORMAL and syncChangeCounter = 0 for all
   syncable bookmarks. As the comment in `migrateTrackerContents`
   points out, this is lossy: we can miss changes between startup and
   the first post-migration sync. We'll also lose changes if the user
   upgrades to 53, makes a change, quits (before the change is synced),
   then downgrades to 52. We could make it so that we only reset the
   status and counter for UNKNOWN bookmarks, but that's not right,
   either, because NEW and NORMAL bookmarks might be modified and
   synced in 52. The existing tracker is already lossy, so this
   behavior isn't much worse. But it is unfortunate.
7. For all existing items in the old tracker, either write tombstones
   or bump the change counter, depending on whether the bookmark
   exists. There's a slight chance an existing tombstone might be for
   a non-syncable item, but the old tracker doesn't have enough
   information to determine that. We assume it's syncable and write
   one.
8. Clear the migration pref. At this point, all migrated bookmarks are
   in syncStatus = NORMAL and syncChangeCounter = 0, if the bookmarks
   engine is enabled.

This approach should work for downgrades, too. Downgrading decrements
the schema version, so we'll run the Places migration code and set the
pref again on the next upgrade.

Finally, we also fire `migration:reset` if the user signs out of Sync,
or signs in for the first time, just in case. We don't want our
migration code running here.

I tried to make this minimally invasive, and work with the existing
Sync modules. Another approach is to add a separate migrator module,
that doesn't depend on Sync initializing. But it does mean we need to
gate Sync initialization on that module, which seems more complicated.

I imagine we'll be able to re-use this approach for other migrations,
if we decide to track sync changes for history, passwords, and add-ons
in Places, Password Manager, and Add-on Manager.

Review commit: https://reviewboard.mozilla.org/r/91586/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/91586/
Attachment #8808852 - Flags: review?(rnewman)
Attachment #8808852 - Flags: review?(markh)
Attachment #8808853 - Flags: review?(rnewman)
Attachment #8808853 - Flags: review?(mak77)
Attachment #8751134 - Flags: review- → review?(rnewman)
Created attachment 8808853 [details]
Bug 1258127 - Update `nsNavBookmarksService` (C++) to track sync changes.

Currently, we prevent moves between syncable and non-syncable subtrees.
This will fail if we already have a non-syncable subtree, like the
left pane root, on the Sync server. If Sync downloads the descendants
before the root, it'll insert them into "unfiled", which is syncable,
then fail to move them back into the non-syncable parent.

We use a trigger to fail syncability updates for non-orphans. This is a
temporary measure until we have two-phase application for downloaded
bookmarks.

Review commit: https://reviewboard.mozilla.org/r/91588/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/91588/
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
(Assignee)

Comment 288

9 months ago
mozreview-review
Comment on attachment 8808853 [details]
Bug 1258127 - Update `nsNavBookmarksService` (C++) to track sync changes.

https://reviewboard.mozilla.org/r/91588/#review91460

::: toolkit/components/places/Bookmarks.jsm:931
(Diff revision 1)
>           WHERE guid = :guid
>          `, Object.assign({ guid: info.guid },
>                           [...tuples.entries()].reduce((p, c) => { p[c[0]] = c[1].value; return p; }, {})));
> +
> +      if (newParent) {
> +        // Remove the Sync orphan annotation from reparented items. We don't

I was overly optimistic in comment 240 about restricting this anno to "NORMAL" bookmarks: it's possible for a restored bookmark to have it, too. Because we don't have atomic uploads on the server yet, don't buffer before applying, and don't enforce order for folders, we're more likely to end up with orphaned items.

Two-phase application should make this mostly go away, so I'm advocating we land this with the additional query, and work to optimize or remove it later.
(Assignee)

Comment 289

9 months ago
mozreview-review
Comment on attachment 8808853 [details]
Bug 1258127 - Update `nsNavBookmarksService` (C++) to track sync changes.

https://reviewboard.mozilla.org/r/91588/#review91638

::: toolkit/components/places/nsPlacesTriggers.h:255
(Diff revision 1)
> +         before realizing the entire folder shouldn't exist on the server.
> +         Keeping orphaned non-syncable bookmarks in the "unfiled" root is
> +         poor UX, so we allow the move and update the item's syncability
> +         to match its new parent.
> +      */ \
> +      "NOT EXISTS(SELECT 1 FROM moz_items_annos a " \

This isn't complete: if we move an orphaned (syncable) folder to a non-syncable parent, that folder's descendants also need to become non-syncable. Otherwise, we'll track and upload all changes made to them, when we shouldn't have applied them in the first place. This is getting even messier than before.

At this point, I'm considering dropping this commit and removing the "syncable" idea entirely. Mark was skeptical we'd need it at all; I should've listened instead of plowing ahead. :-/

Moving an item from a syncable to a non-syncable root (as Richard suggests in bug 1303679, comment 2) will confuse Sync, it's true. It might also come back to haunt us if the set of synced roots changes in the future...though ISTM we'd like to move toward making everything syncable, and getting rid of these virtual bookmarks.

This is also an edge case that's already possible today. Landing the tracker patch without this commit won't make things any worse.

Landing this commit, however, has the potential to make things worse. Separating syncable and non-syncable bookmarks enforces server consistency for new users, at the cost of *guaranteeing* local corruption for existing users. Any orphaned left pane queries on the server will mess up unfiled bookmarks, as in bug 1297234.

I'd like to get some telemetry around orphans: how many synced bookmarks are orphaned, and how many syncs it takes to resolve them (On the same sync? Next sync? Never?). That seems useful to have before making a change like this.

Mak, Mark, Richard: how do you feel about dropping this commit and the `folder_type /* syncable */` changes?
Attachment #8808853 - Flags: review-
Attachment #8808853 - Flags: review?(rnewman)
Attachment #8808853 - Flags: review?(mak77)
Changing r? to ni? for https://bugzilla.mozilla.org/show_bug.cgi?id=1258127#c289.
Flags: needinfo?(rnewman)
Flags: needinfo?(mak77)
I'm in a rough position to make a call here, cause you are the Sync experts. As I previously stated, from my point of view it would be better to drop the syncable idea and fix bug 1309930. But it's a long call. We don't know whether we'll have resources to pursuit that completely, without someone assigned to work on that.
The low hanging fruit there is virtualization of the left pane folder, and at that point we could easily remove any non-syncable root (but tags) and prevent their creation from the API. That would leave us only with the tags problem, but it should be trivial to detect tags and their children since it's at a maximum 2 levels deep and most call points have the grandParent info.
Flags: needinfo?(mak77)
(Reporter)

Comment 292

9 months ago
I'm fine with simplifying as much as you can.
Flags: needinfo?(rnewman)
(Reporter)

Comment 293

9 months ago
mozreview-review
Comment on attachment 8808852 [details]
Bug 1258127 - Add migration logic for old synced bookmarks.

https://reviewboard.mozilla.org/r/91586/#review92476

::: services/sync/Weave.js:155
(Diff revision 1)
>        this.timer.initWithCallback({
>          notify: function() {
> +          if (this.needsMigration()) {
> +            // If we need to migrate engine state, unconditionally load the
> +            // service. This incurs a one-time penalty on the first restart
> +            // after migrating.

s/migrating/upgrading.

::: services/sync/modules/engines/bookmarks.js:927
(Diff revision 1)
> +          modified: timestamp * 1000,
> +        });
> +      }
> +      yield PlacesSyncUtils.bookmarks.migrateOldTrackerEntries(entries);
> +    }
> +    Services.prefs.clearUserPref(PREF_SYNC_MIGRATE_TRACKER_CONTENTS);

And delete the tracker file.

::: toolkit/components/places/PlacesSyncUtils.jsm:114
(Diff revision 1)
>        BookmarkSyncUtils.guidToSyncId(guid)
>      );
>    }),
>  
>    /**
> +   * Migrates an array of `{ syncId, modified }` tuples from the old JSON-based

These are GUIDs, not IDs, right?

::: toolkit/components/places/PlacesSyncUtils.jsm:142
(Diff revision 1)
> +          // server once we decide on a repair strategy.
> +          yield db.executeCached(`
> +            UPDATE moz_bookmarks SET
> +              syncStatus = :syncStatus,
> +              syncChangeCounter = 0
> +            WHERE folder_type /* syncable */ = '1'`,

Do you actually need this `WHERE`?

::: toolkit/components/places/PlacesSyncUtils.jsm:146
(Diff revision 1)
> +              syncChangeCounter = 0
> +            WHERE folder_type /* syncable */ = '1'`,
> +            { syncStatus: PlacesUtils.bookmarks.SYNC_STATUS.NORMAL });
> +
> +          for (let { syncId, modified } of entries) {
> +            yield migrateOldTrackerEntry(db, syncId, modified);

For the unlucky user who has hundreds of bookmarks tracked -- e.g., they set up Sync then upgraded Firefox in the same session -- this N reads + 2N+1 writes approach is gonna be horrible.

Consider an approach that doesn't kill perf: e.g., bulk-insert the tracker contents into a temporary table, do one `INSERT OR REPLACE`, then drop the temporary table. If nothing else that would eliminate N `SELECT`s down on line 1767.
Attachment #8808852 - Flags: review?(rnewman)