Closed Bug 1305563 Opened 8 years ago Closed 6 years ago

Implement structured bookmark application

Categories

(Firefox :: Sync, defect, P1)

defect

Tracking

()

RESOLVED FIXED
Firefox 60
Tracking Status
firefox60 --- fixed

People

(Reporter: lina, Assigned: lina)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [data-integrity][Sync Q3 OKR])

Attachments

(2 files, 10 obsolete files)

59 bytes, text/x-review-board-request
tcsc
: review+
markh
: review+
Details
59 bytes, text/x-review-board-request
markh
: review+
rnewman
: review+
tcsc
: review+
mak
: review+
Details
TL;DR: This is the second part of atomic I/O: "atomic downloads". 3-way merging of synced bookmarks is impractical on Desktop, and we shouldn't even try to make it work with Places. Two-phase application, where we download then apply, makes sense and gives us some consistency wins.

Richard and I had a really good chat about this; capturing the conversation in a bug so that we can discuss and prioritize.

iOS does proper three-way merges, tracking bookmarks in three tables: "local", "mirror", and "buffer" (the other three tables encode the folder structure). A complete bookmarks tree is produced by overlaying these tables. Updates need to be atomic, and durable in case of an interrupted sync. On iOS, without Sync set up, all bookmarks are inserted into "local" (mirror and buffer are empty). On the first sync, remote bookmarks are downloaded and inserted into the buffer, merged (2-way) with local, and inserted into mirror.

After the first sync, local changes are stored in local, and the UI overlays these onto mirror to show the full tree. On the next sync, iOS walks the mirror, diffs between local and buffer (taking one, or resolving conflicts if neededd), and builds a set of "completion ops" for changes to the buffer (delete a record because it's been merged), mirror (local or remote deletion, insert merged record, copy from buffer to mirror, and copy from local to mirror), and local (insertion or deletion).

Next, iOS uploads changes. If it's interrupted at this point, buffer, local, and mirror are unchanged; on the next sync, it'll redownload its changes. Then, it clears the buffer, and updates mirror and local. Interruption at this point is OK: local changes either remain in local for the next sync, or were already moved into mirror. Redownloading our own changes is harmless. The completion ops are applied transactionally, so iOS can transition from a valid pre-sync state to a valid post-sync state.

So, that's it for iOS. (Richard, please correct me if my transcription is wrong! I think I missed some of the details). On Desktop, things are messier.

A bookmark sync can touch four Places tables (`moz_bookmarks`, `moz_places`, `moz_items_annos`, and `moz_keywords`, with foreign key references pointing to one another), and the folder structure is encoded in `moz_bookmarks`. Because we apply records as we download them, there's no way to recover from an inconsistent server state: we don't know if a tree is valid until the end of the sync, when we've already mutated our tree. We can only do a two-way merge, because we don't know the shared parent, and, depending on when a sync completes, both local and remote records can change multiple times between syncs.

We can bolt complicated hacks on top of Places: duplicate inserts, updates, and deletes to a Sync-specific table, or copy between Places and the table at the start of a sync, and having our shared parent be the last synced tree. But duplicating bookmark data across two tables and trying to reconcile between them is a world of pain. If these get out of sync (pun intended, and likely if a sync is interrupted), we're back to Sync snarfing your bookmarks. (Even "last synced tree" is vague in this world: is it the local tree, pre-sync? Pre-reconciliation? Post-reconciliation?).

We can get partway there with two-phase application: buffer downloaded records in a separate table, then apply them. Grisha is working on adding this to Android in bug 814801. We still can't do three-way merges, but we can avoid partial reads, and add some validation to check for inconsistent trees before applying. We can also show those bookmarks in a read-only UI until we've finished the merge.
Priority: -- → P3
Yup, good summary of the conversation.

My 2¢:

I don't think it's feasible to bolt correct bookmark syncing on top of Places. It's too much work, and the platform simply doesn't give us the guarantees we would need to do it.

(Probably the only way that would work it is to build a complete shadow bookmark system, and make Places a kind of legacy input to it -- that is, treat Places as a thing that we sync our real synchronizer to, and then sync up to the server. We considered doing this for PiCL, but obviously didn't have time to execute.)

The best we can do in the short term is remove the obvious opportunities for error, just as we've been doing elsewhere.

You can think of this bug as "atomic downloads", a sibling to the atomic uploads work. It provides an opportunity to look over the cliff before the synchronizer starts to jump.

In the long term -- when we no longer directly expose Places to add-ons, and the API surface is much smaller -- we can consider a wholesale replacement of Places by something that meets today's needs, and get rid of a huge amount of scary, hacky code in Sync. Until then, this'll have to do.


See also my notes from a year ago, when I was knee-deep in solving bookmark sync:

https://gist.github.com/rnewman/f95720050df3d5515351
https://gist.github.com/rnewman/82c17460161047db38d5
Depends on: 1294599
Some idle thoughts from the train:

* We'll need to add a couple of tables to Places. The schema won't quite match `moz_bookmarks`, because Places stores tags as bookmarks, keywords and URLs in a separate table, and descriptions as annos. We shouldn't try to replicate that.

* Once we download everything, we can reconcile and attempt some repair: do a two-way merge, mark records that shouldn't be on the server for deletion, reparent and hide orphans.

* If it's easy, we can extend the awesomebar queries to include entries for incoming bookmarks. (It might be as simple as adding rows for incoming URLs to `moz_places`). It would be hard to do the same for the tree view, because it assumes a complete subtree as you expand containers, and we won't have one at that point.

Awesomebar queries are really the only reason this needs to be in places.sqlite at all. If we don't need that, we can use a different SQLite DB or experiment with another store.
(In reply to Kit Cambridge [:kitcambridge] from comment #2)
> Awesomebar queries are really the only reason this needs to be in
> places.sqlite at all. If we don't need that, we can use a different SQLite
> DB or experiment with another store.

I have the same thing written down in my own notes for Android two-phase work.

It seems to me that while it might be relatively simple to get the incoming data mixed together with existing data, and get it to show up in the right places ("fresh" awesomebar results as you sync!), once it's on the screen (on Android at least) it's not as simple to determine that said data is immutable and shouldn't be subject to deletion, etc.

Incoming records might be modifications/deletions of what we currently have. Mixing remote+local data might be tricky without performing reconciliation first.

While this could likely be worked around, an additional fact that we'll be fetching sync data oldest-first will probably not make for a good user experience when all of the frecency queries will suddenly start recomputing and producing new data as sync continues its thing.

It is a nice property of sync (from user's POV) that data starts to show up very quickly. I don't think it's an important enough property to necessitate preserving it going forward. It seems to mainly matter during very infrequent events (such first sync), and I'd probably take overall simplicity of code and robustness of the process over reducing UX latency by a few seconds (admittedly this could be measured in minutes, as well).
(In reply to Kit Cambridge [:kitcambridge] from comment #2)

> * We'll need to add a couple of tables to Places. The schema won't quite
> match `moz_bookmarks`, because Places stores tags as bookmarks, keywords and
> URLs in a separate table, and descriptions as annos. We shouldn't try to
> replicate that.

You can see iOS's schema here:

https://github.com/mozilla-mobile/firefox-ios/blob/master/Storage/SQL/BrowserTable.swift#L246

Note that it's called with extra server columns (hasDupe, server_modified).

> * If it's easy, we can extend the awesomebar queries to include entries for
> incoming bookmarks. (It might be as simple as adding rows for incoming URLs
> to `moz_places`). It would be hard to do the same for the tree view, because
> it assumes a complete subtree as you expand containers, and we won't have
> one at that point.

See ViewBookmarksBufferOnMirror in that file. (We also have separate structure, which is quite complicated.)


> Awesomebar queries are really the only reason this needs to be in
> places.sqlite at all. If we don't need that, we can use a different SQLite
> DB or experiment with another store.

On nice thing about putting it in places.sqlite is that the data is blown away when the bookmarks store is. One bad thing is that users back up, move, and recover places.sqlite, so you need to transactionally store some kind of indication that helps you choose to throw away the buffer.



(In reply to :Grisha Kruglov from comment #3)

> It seems to me that while it might be relatively simple to get the incoming
> data mixed together with existing data, and get it to show up in the right
> places ("fresh" awesomebar results as you sync!), once it's on the screen
> (on Android at least) it's not as simple to determine that said data is
> immutable and shouldn't be subject to deletion, etc.

On iOS we do this by exposing a direction and some metadata through the model class.

https://github.com/mozilla-mobile/firefox-ios/blob/master/Storage/SQL/SQLiteBookmarksModel.swift#L243

I will agree that it took a little careful thought to make this work correctly, but a lot of that extra work is because we transpose parts of the tree in our mobile UI, and we don't do that on desktop.


> Incoming records might be modifications/deletions of what we currently have.
> Mixing remote+local data might be tricky without performing reconciliation
> first.

You likely need local change tracking to make this happy.
I spent some time yesterday evening thinking about a schema for two-phase application on Desktop, and made a sketch with some scattered thoughts here: https://gist.github.com/kitcambridge/43e3d5058231951c68fbe76ef1bf783d

It's more complicated than the iOS schema for URLs, keywords, and tags, to make it match how Places stores these. Feedback welcome!

(In reply to :Grisha Kruglov from comment #3)
> It is a nice property of sync (from user's POV) that data starts to show up
> very quickly. I don't think it's an important enough property to necessitate
> preserving it going forward. It seems to mainly matter during very
> infrequent events (such first sync), and I'd probably take overall
> simplicity of code and robustness of the process over reducing UX latency by
> a few seconds (admittedly this could be measured in minutes, as well).

I completely agree.
(In reply to Kit Cambridge [:kitcambridge] from comment #5)
> I spent some time yesterday evening thinking about a schema for two-phase
> application on Desktop, and made a sketch with some scattered thoughts here:
> https://gist.github.com/kitcambridge/43e3d5058231951c68fbe76ef1bf783d

Some more thoughts:

* Validation failures shouldn't prevent us from syncing. We should fix up, report, and do the best we can with what we have. That's why the schema doesn't enforce any constraints: the server data simply doesn't provide enough guarantees.

* We'll want to think about carrying through data we don't understand, so that we don't lose fields sent to us by a new client. We're facing this with the origin device (bug 1302797) and creation timestamp (bug 676563). Maybe just storing these as JSON blobs in a separate column is sufficient? I'm assuming iOS does this for tags and keywords.
(In reply to Kit Cambridge [:kitcambridge] from comment #6)

> * Validation failures shouldn't prevent us from syncing. We should fix up,
> report, and do the best we can with what we have. That's why the schema
> doesn't enforce any constraints: the server data simply doesn't provide
> enough guarantees.

I took this approach on Android, and the "fix up" part had unexpected (but entirely predictable in hindsight) consequences.

The trouble is that we didn't (and still don't!) have a good way to distinguish between a complete-but-inconsistent server state and an incomplete server state.

That is, between a server that's fully synced with other clients, and fully synced down, but where one of the other clients made a mistake, versus a server state that another client is part-way done updating, or we're part-way done downloading (which should never occur, if our batching code is bug-free…).

We might assert that if all clients have atomic upload, and all clients have batching download, that finishing a download means we can do what we like.

I'm not sure that's true: not only could we have bugs, but in order to be confident in doing so we really ought to upload the new state before we apply it locally (which makes download-merge-upload an atomic operation).

I think that, if you're planning to correct inconsistent trees, then you absolutely need some way of knowing if you expect to have everything and don't. A partial read or a partial write, combined with fix-up, will introduce large-scale data loss.

I agree that _once the client believes things are stable_ the only way to proceed is for some client to fix things up.


> * We'll want to think about carrying through data we don't understand, so
> that we don't lose fields sent to us by a new client. We're facing this with
> the origin device (bug 1302797) and creation timestamp (bug 676563). Maybe
> just storing these as JSON blobs in a separate column is sufficient? I'm
> assuming iOS does this for tags and keywords.

Because the bookmark record format[1] is stable, iOS stores all of its fields and nothing else. Extending the record format is possible, if you don't mind other clients losing that data. We can best-effort add columns to every device's local format. I'm not sure it's worth having a grab-bag column.

[1] <http://docs.services.mozilla.com/sync/objectformats.html>
Depends on: 1313890
Depends on: 1323333
No longer depends on: 1313890
For folks following along: the latest patch has de-duping, using the GUID map semantics expressed in SQL. It's quite tedious, and unclear if the behavior is correct, or even desired...but I think I understand it now. As with most of Sync, the devil is in the details; this is likely far from complete.

This might be a good way to transition to a proper tree merge, or we could decide to throw out or rework the semantics entirely. It certainly needs more tests, which can land ahead of this patch as part of bug 1323333.
Whiteboard: [data-integrity]
Priority: P3 → P2
I'm reading through the iOS merger now, and exploring how much we can reuse on Desktop to do two-way merges between the buffer and Places. Conceptually, the "mirror" is everything with changeCounter = 0, and "local" is changeCounter > 1. The Places tree is complete, like the mirror, but there's no shared parent.
Assignee: nobody → kit
Status: NEW → ASSIGNED
Priority: P2 → P1
(In reply to Kit Cambridge (he/him) [:kitcambridge] (UTC-7) from comment #11)
> I'm reading through the iOS merger now, and exploring how much we can reuse
> on Desktop to do two-way merges between the buffer and Places. Conceptually,
> the "mirror" is everything with changeCounter = 0, and "local" is
> changeCounter > 1. The Places tree is complete, like the mirror, but there's
> no shared parent.

That seems like a good way to look at it.

Obviously the mirror on desktop is "in shadow" -- you can't see the shared parent for overridden records, nor can you see the original value of deleted records -- but that only affects reconciling.
See Also: → 1388463
Whiteboard: [data-integrity] → [data-integrity][Sync Q3 OKR]
Comment on attachment 8822815 [details]
Bug 1305563 - Add a bookmark buffer and two-way merger for synced bookmarks.

Here's a first cut at porting the iOS merger to Desktop, along with tests for complex orphaning and moving. I also added comments to explain how the different parts work, but I think those could be improved, too.

Conceptually, I think I'm happy with the different layers, and how they're separated, but I'd love feedback on that. Once this crystallizes more, we'll want Mak to review, too, since it's tightly integrated with Places. (Especially `BookmarkStorage` and `BookmarkObserverNotifications`).

Things we still need:

* Applying keywords and tags. I had a strategy earlier (apply URLs, keywords, and tags separately, then merge bookmarks), but I no longer think that's a good idea. It's simpler to have `{insert, update}PlacesItem` handle them. Representing tags as bookmarks adds a lot of complexity now; bug 1039200 should make this easier.

* Content-based deduping. `findLocalDupeGuid` is a stub.

* Reparenting remote orphans to unfiled, and moving orphans out of unfiled once we see the parent. There's a vague plan sketched out in the "TODO" comments, but nothing complete.

* Tests for all of the above. :-P

* Wiring the whole thing up to the bookmarks engine and making sure TPS passes.

Richard, I suspect you'll have many thoughts, so flagging you in advance. :-) Mark and Thom, I'd love for you to take a look and note architectural issues, questions, and anything else you see fit. Have at it!
Attachment #8822815 - Flags: feedback?(tchiovoloni)
Attachment #8822815 - Flags: feedback?(rnewman)
Attachment #8822815 - Flags: feedback?(markh)
Comment on attachment 8822815 [details]
Bug 1305563 - Add a bookmark buffer and two-way merger for synced bookmarks.

https://reviewboard.mozilla.org/r/101606/#review172872

This is pretty tremendous work. I have a few concerns, and a larger number of nits, but it looks great so far.

::: services/sync/modules/bookmark_buffer.js:141
(Diff revision 3)
> +        await db.executeTransaction(() =>
> +          migrateSchema(db, currentSchemaVersion, schemaVersion)
> +        );
> +      }
> +    });
> +    return new this(db, options);

nit: Can we be explicit about the type we're newing? This seems needlessly terse, especially since this isn't a type we intend to subclass (and it's unclear that subclassing it would make any sense).

::: services/sync/modules/bookmark_buffer.js:245
(Diff revision 3)
> +    feedURL TEXT,
> +    dateAdded INTEGER NOT NULL DEFAULT 0
> +  )`);
> +
> +  // A partial write might cause an item to appear in multiple parents. To work
> +  // around this, we treat an item's `parentid` as the canonical location.

It seems like we might want to use it to determine parent if the parent referenced by parentid doesn't exist, but that also might be too much of an edge case to worry about in this bug.

Edit: Doing this might be fraught with peril.

::: services/sync/modules/bookmark_buffer.js:301
(Diff revision 3)
> +  }
> +
> +  let guid = ensureValidGuid(record.id);
> +  let title = formatRecordTitle(record.title);
> +
> +  await db.executeCached(`

We probably should be inserting dateAdded here (and below) on the record if one exists. (It's possible you're handling this elsewhere but I can't find it if so.)

Then, when actually updating the records (in the transaction), we would use the logic from PlacesSyncUtils.bookmarks.ratchetTimestampBackwards (Using `dateAdded` and the server last modified -- which I think can be recomputed from `remoteAge`). If it ends up being different than whatever we stored as dateAdded, it needs to be reuploaded (weakly).

::: services/sync/modules/bookmark_buffer.js:418
(Diff revision 3)
> +  await db.executeCached(`
> +    INSERT OR IGNORE INTO urls(url) VALUES(:url)`,
> +    { url: url.href });
> +}
> +
> +async function storeRecordLocation(db, record) {

A comment explaining the difference between storeChildLocation/storeRecordLocation wouldn't go amiss...

::: services/sync/modules/bookmark_buffer.js:435
(Diff revision 3)
> +                   WHERE guid = :guid AND
> +                         parentGuid = :parentGuid), -1))`,
> +    { guid, parentGuid });
> +}
> +
> +async function storeChildLocation(db, parentGuid, childGuid, position) {



::: services/sync/modules/bookmark_buffer.js:441
(Diff revision 3)
> +  await db.executeCached(`
> +    INSERT OR IGNORE INTO location(guid, parentGuid)
> +    VALUES(:childGuid, :parentGuid)`,
> +    { childGuid, parentGuid });
> +
> +  await db.executeCached(`

The logic here and in storeRecordLocation is kind of subtle, and I'm not sure it's right. I think the cases are as follows:

Consider the following case:

1. ParentA with guid guidA has children `[guid0]`. It's inserted, we hit storeChildLocation, and add location(guid0, guidA, 0)

2. ParentB with guid guidB has children `[guid1, guid0, guid2]`. It's inserted, we hit storeChildLocation for guid0, hit the INSERT OR IGNORE, and ignore it because guid0 already exists. The UPDATE does nothing, since the where clause matches nothing.

3. Child0 with guid guid0 and parentid guidB is inserted. We replace the guid0 row from location, but aren't able to provide the correct position information, since we already ignored it.

We'll end up with no position information for child0, when the correct position information absolutely existed.

It's possible you fix this up later? (I haven't seen how you handle position == -1) IMOt if you do worth probably calling that out in a comment here.

::: services/sync/modules/bookmark_buffer.js:501
(Diff revision 3)
> +    let merger = new BookmarkMerger(storage, localTree, remoteTree,
> +                                    mergedTree);
> +    await merger.merge();
> +    if (BookmarkBufferLog.level <= Log.Level.Trace) {
> +      let newTree = mergedTree.toBookmarkTree();
> +      BookmarkBufferLog.trace("Built new merged tree: ${newTree}",

If I'm reading this right, this is logging the whole bookmark's tree? That's... an awful lot to log, even for trace logging...

::: services/sync/modules/bookmark_buffer.js:689
(Diff revision 3)
> +      value: syncChangeDelta,
> +      fragment: "syncChangeCounter = syncChangeCounter + :syncChangeDelta",
> +    });
> +  }
> +
> +  // Record observer notifications before updating Places, so that we capture

nit: Could you word this comment so it's clearer that we're not actually dispatching any notifications here? E.g. something like "Record (but don't dispatch) observer notifications..."

::: services/sync/modules/bookmark_buffer.js:837
(Diff revision 3)
> + */
> +class BookmarkNode {
> +  constructor(guid, age, kind, syncChangeCounter, syncStatus) {
> +    this.guid = guid;
> +    this.kind = kind;
> +    this.age = Math.max(age, 0);

Note that if we do try to reconstruct server modified from age, we explicitly decided to allow dateAdded in the future, so this max would probably need to happen elsewhere

::: services/sync/modules/bookmark_buffer.js:1309
(Diff revision 3)
> +
> +      case "new":
> +        // If we're taking a new merge state, the synthesized node doesn't
> +        // match what's currently on the server, so bump its change counter to
> +        // ensure it's reuploaded.
> +        return 1;

Uh, hm. It's completely possible I'm reading it wrong, but... I'm not completely certain that uploading a fixed view of e.g. bookmark children is likely to give us good results without also causing a number of problems. (possibly unless we're bug-free, which is, err, a tall ask).

I'm completely willing to be convinced though, but we need to be confident that we won't get into sync fights with older versions of desktop, android, etc. if we do this.

::: services/sync/modules/bookmark_buffer.js:1897
(Diff revision 3)
> +    if (!this.remoteTree.isDeleted(localNode.guid)) {
> +      return false;
> +    }
> +
> +    BookmarkBufferLog.trace("${localNode} deleted remotely", { localNode });
> +    this.mergedTree.deleteLocally.add(localNode.guid);

I think we only want to do this conditional on the local node not having been changed since the last sync.  That's at least consistent with the current handling, where we "revive" it (unless that's changed?).

::: services/sync/modules/bookmark_buffer.js:2191
(Diff revision 3)
> +          THEN :livemarkKind
> +          ELSE :folderKind
> +          END
> +        )
> +        WHEN :separatorType THEN :separatorKind
> +        ELSE :virtualItemKind

Hrm, ideally we'd have some way of reporting what kind of thing we *did* see in this case.

I guess we can worry about this if we tend to see "local tree not fully rooted" in the errors from telemetry.

::: services/sync/modules/bookmark_buffer.js:2886
(Diff revision 3)
> +  }
> +
> +  notifyObserver(observer, notification, args = []) {
> +    try {
> +      observer[notification](...args);
> +    } catch (ex) {}

Should we log here?
As i said in https://bugzilla.mozilla.org/show_bug.cgi?id=1388463#c2, I would like to note our future plans for tags and keywords:
1. keywords will likely disappear, the plan when a user adds a keyword for an input field, is to create a custom search engine and let the search service manage that. Considered the tiny amount of users with keywords and the easiness of creating them, I wonder if it's worth handling any kind of syncing for them.
2. tags will stay bound to bookmarks, only in the schema tags are associated with uris, because it's much simpler. The API will consider them part of a bookmark's properties. Indeed the plan is to extend Bookmarks.jsm API with tags and kill the separate tagging API.
> 1. keywords will likely disappear, the plan when a user adds a keyword for
> an input field, is to create a custom search engine and let the search
> service manage that. Considered the tiny amount of users with keywords and
> the easiness of creating them, I wonder if it's worth handling any kind of
> syncing for them.

This isn't quite as simple as just dropping keywords: the Sync object formats are already defined, other clients store (and even use) keywords, and we don't sync search engines.

We would need a plan for how these changes would roll out across platforms. If you're an Android user who uses search shortcuts, and we drop those from desktop's records, then things will quietly stop working for you. I expect there are a fair number of Mozillians with a 'bug' keyword to make pasting "Bug 1305563" work.

We'd need to decide whether to continue to support and store keywords on the other platforms, and whether to spend the significant amount of work to tackle Bug 444284.
Comment on attachment 8822815 [details]
Bug 1305563 - Add a bookmark buffer and two-way merger for synced bookmarks.

https://reviewboard.mozilla.org/r/101606/#review173686

::: services/sync/modules/bookmark_buffer.js:19
(Diff revision 3)
> + *
> + * - `BookmarkBuffer` ties everything together. `open` sets up the database and
> + *   returns a new buffer ready for use. `store` validates and stages downloaded
> + *   records. `apply` fetches the local and remote `BookmarkTree`s, merges the
> + *   two trees into a single `MergedBookmarkTree`, and updates Places to reflect
> + *   the merged tree. Merging and application happen in a single transaction.

Although you touch on it below, it's worth noting here in brief that it's the update of Places that marks records for upload, and the upload happens afterwards. That's in contrast to iOS, where we compute the merged bookmark tree, upload it to the server, and if that succeeds we update the local tables (buffer, mirror, local) to match.

::: services/sync/modules/bookmark_buffer.js:35
(Diff revision 3)
> + *   either keep the complete `local` state, take the complete `remote` state
> + *   without conflicts, or apply a `new` state with conflicts and flag the item
> + *   for reupload. Unlike iOS, we don't make this contingent on successful
> + *   upload because we don't keep bookmarks in an outgoing buffer yet (bug
> + *   1359200), and because there are fewer benefits to doing this without a
> + *   mirror. These functions also translate between how Sync and Places

It might be worth expanding on this, because the two approaches have different assumptions.

iOS does it that way to try to allow the sync to be interruptible and replayable: if we upload our changes we will download and try to merge them on the next sync. We assume that `merge(local, merge(local, remote)) ~= merge(local, remote)`, and more generally `merge(local', merge(local, remote)') ~= merge(local', remote')`.

By writing to Places to provoke upload we leave a window in which a synthetic merged tree — that is, one that isn't the same as the one the user made, nor one that was ever present on the server — exists on disk, indistinguishable from a user-created one.

This approach relies on a different assumption: that if we're interrupted, come back some hours or days later, and download more changes, that the result of the _next_ merge will be acceptable. That is, that `merge(merge(local, remote), remote) ~= merge(local, remote)`, and more generally `merge(merge(local, remote)', remote') ~= merge(local', remote')`.

In essence this inverts iOS's choice: iOS puts data on the server and expects it to win over any local confusion, and desktop will keep data local and hope that any remote changes don't introduce confusion.

::: services/sync/modules/bookmark_buffer.js:41
(Diff revision 3)
> + *   represent bookmarks, rewrite tag queries, and set special annotations
> + *   for descriptions, smart bookmarks, and livemarks.
> + *
> + * - `BookmarkObserverNotifications` records all insertion and updates, and
> + *   fires observer notifications after the merge completes and the Places
> + *   schema is consistent again. Places uses these notifications to update the

Is it possible for the application of the merge output to collide with local changes? Presumably it is if we don't lock the DB while doing the merge. If that happens, do we detect the changes, or overwrite them, or?

::: services/sync/modules/bookmark_buffer.js:84
(Diff revision 3)
> +
> +XPCOMUtils.defineLazyGetter(this, "BookmarkBufferLog", () =>
> +  Log.repository.getLogger("BookmarkBuffer")
> +);
> +
> +const KIND_VIRTUAL_FOLDER = -1;

Consider `-3`, which is the negation of `KIND_FOLDER`.

::: services/sync/modules/bookmark_buffer.js:93
(Diff revision 3)
> +const KIND_QUERY = 2;
> +const KIND_FOLDER = 3;
> +const KIND_LIVEMARK = 4;
> +const KIND_SEPARATOR = 5;
> +
> +// Keep in sync with PlacesUtils.

Add a comment to `PlacesUtils.jsm` to match.

::: services/sync/modules/bookmark_buffer.js:105
(Diff revision 3)
> + * A buffer temporarily stores incoming Sync bookmark records in a separate
> + * SQLite database.
> + *
> + * The buffer schema is a hybrid of how Sync and Places represent bookmarks.
> + * The `content` table contains item attributes (title, kind, description,
> + * URL, etc.), while the `location` table stores parent-child relationships and

Why 'location' instead of 'structure'?

::: services/sync/modules/bookmark_buffer.js:114
(Diff revision 3)
> + * for easier joining.
> + *
> + * There's no guarantee that the remote state is consistent. We might be missing
> + * parents or children, or a bookmark and its parent might disagree about where
> + * it belongs. We do what we can to make the remote state into a mergeable tree.
> + * If the parent and child disagree, we assume the child is correct. We ignore

That assumption is the reverse of iOS. Can you justify your choice?

We _almost_ removed `parentid` from the record in the past, because it's redundant — a change to `parentid` _requires_ rewriting both parent records in order to specify a position.

iOS only keeps it around for informative validation: if there's a mismatch it means a client didn't upload a record when it should have, or there were racing writes.

::: services/sync/modules/bookmark_buffer.js:115
(Diff revision 3)
> + *
> + * There's no guarantee that the remote state is consistent. We might be missing
> + * parents or children, or a bookmark and its parent might disagree about where
> + * it belongs. We do what we can to make the remote state into a mergeable tree.
> + * If the parent and child disagree, we assume the child is correct. We ignore
> + * missing children, and reparent bookmarks without parents to "unfiled". When

If you mean reparenting bookmarks 'physically' in the buffer: is there value to doing so? This seems like something we'd avoid for two reasons:

1. Much less work if we don't. Most likely all records _will_ have parents if we wait just a little while.
2. We can probably do a better/clearer/simpler job of reparenting if we do it during the merge.

::: services/sync/modules/bookmark_buffer.js:121
(Diff revision 3)
> + * we eventually see the missing items, either during a subsequent sync or as
> + * part of repair, we can fix up the local tree.
> + *
> + * We keep the buffer around for as long as it takes to download all records.
> + * If a sync is interrupted, we resume downloading based on the timestamp of the
> + * most recent record. If the server changes in the meantime, we discard the

There should only be three reasons to discard the buffer:

1. Node reassignment.
2. Engine disablement (local or remote).
3. Sign out.

Throwing away the buffer in the case of a remote change is needless expense: each time there's a single write we'd download aaaalmost everything, then walk right back to the start and do it again, and again…

::: services/sync/modules/bookmark_buffer.js:184
(Diff revision 3)
> +            BookmarkBufferLog.trace(`store: Storing tombstone for ` +
> +                                    `${record.id} in buffer`);
> +            return storeTombstone(db, remoteAge, record);
> +          }
> +      }
> +      throw new TypeError("Can't store unknown Sync record type: " +

This is tricky. If we reject, we (I imagine) totally block syncing until another client fixes the record or we upgrade Firefox to handle it. But we also can't just put things into the DB if they're malformed.

::: services/sync/modules/bookmark_buffer.js:223
(Diff revision 3)
> +async function migrateSchema(db, currentSchemaVersion, maxSchemaVersion) {
> +  let databaseInitialized = currentSchemaVersion > 0;
> +  if (!databaseInitialized) {
> +    await createBufferTables(db);
> +  }
> +  // TODO: Add future migration logic here.

N.B., you check the version prior to calling this function, but you should also do it here, inside the transaction boundary.

::: services/sync/modules/bookmark_buffer.js:259
(Diff revision 3)
> +  await db.execute(`CREATE TABLE urls(
> +    id INTEGER PRIMARY KEY,
> +    url TEXT UNIQUE NOT NULL
> +  )`);
> +
> +  await db.execute(`CREATE TABLE keywords(

I'm not convinced that `urls` and `keywords` and `tags` add value. Can we simplify?

::: services/sync/modules/bookmark_buffer.js:283
(Diff revision 3)
> +}
> +
> +async function storeBookmark(db, remoteAge, record) {
> +  let url = formatRecordURL(record.bmkUri);
> +  await db.executeCached(`
> +    INSERT OR IGNORE INTO urls(url) VALUES(:url)`,

For this to be fast you rely on the unique index on `url`, but that also makes each insert more expensive.

::: services/sync/modules/bookmark_buffer.js:291
(Diff revision 3)
> +  let rawKeyword = record.keyword;
> +  if (rawKeyword) {
> +    // A keyword should only be associated with one URL, but it's possible for the
> +    // server to contain incomplete data due to a partial write. During application,
> +    // we'll need to pick one, and re-upload the others.
> +    let keyword = rawKeyword.trim().toLowerCase();

Validate your inputs and either swallow malformed fields or raise a descriptive error.

```
> let rawKeyword = 99;
> rawKeyword.trim();
TypeError: rawKeyword.trim is not a function
```

Note that empty strings are probably invalid keywords, but will bypass this block (`""` is false).

::: services/sync/modules/bookmark_buffer.js:365
(Diff revision 3)
> +
> +  await storeRecordLocation(db, record);
> +
> +  if (record.children) {
> +    for (let position = 0; position < record.children.length; position++) {
> +      await storeChildLocation(db, guid, record.children[position], position);

The access patterns in this function are a bit worrying.

Firstly, you're interleaving writes to `content` and `location`, so you're introducing fragmentation.

Secondly, you're writing every child in a separate `INSERT`, and you're doing it _twice_ — once when you process the child, and once for each item in `children` when you process the parent.

For a bookmark tree like mine — 133 folders, 2673 records — that's 2673 inserts to `content` and 5346 inserts to `location`. And those location inserts have a subquery…

Consider:
1. Allowing `children` to be canonical. It is on other platforms, and it's no less trustworthy than `parentid`. That gets rid of the subquery and halves the inserts.
2. Buffering folders to insert them all together. You get to insert 333 folders in one query if you're motivated.

::: services/sync/modules/bookmark_buffer.js:396
(Diff revision 3)
> +async function storeSeparator(db, remoteAge, record) {
> +  let guid = ensureValidGuid(record.id);
> +
> +  await db.executeCached(`
> +    INSERT INTO content(guid, age, kind)
> +    VALUES (:guid, :remoteAge, :kind)`,

Separators use `pos` to avoid collision — two separators in the same folder will have different `pos` attributes. In theory.

::: services/sync/modules/bookmark_buffer.js:422
(Diff revision 3)
> +
> +async function storeRecordLocation(db, record) {
> +  let guid = ensureValidGuid(record.id);
> +  let parentGuid = ensureValidGuid(record.parentid);
> +
> +  // Treat the item's `parentid` as canonical, preserving the position if we

See earlier comment.

::: services/sync/modules/bookmark_buffer.js:425
(Diff revision 3)
> +  let parentGuid = ensureValidGuid(record.parentid);
> +
> +  // Treat the item's `parentid` as canonical, preserving the position if we
> +  // already processed the item's parent. It's possible for an item's `id` to
> +  // appear in the `children` of a different folder, or in multiple folders. We
> +  // only use the `children` array to determine positions, not parents.

I think this is a pretty big chunk of unnecessary complexity.

::: services/sync/modules/bookmark_buffer.js:2273
(Diff revision 3)
> +   *   positions > -1.
> +   *
> +   * We represent roots and missing parents as virtual folders, and missing
> +   * children of known folders as virtual items. During merging, we use the
> +   * local tree to supply the missing content. If the node doesn't exist
> +   * locally, either, the item is likely an orphan created by a partial upload.

Or we've lost data locally. If a user replaces places.sqlite, or it's corrupted and removed, then this will go badly wrong.
Comment on attachment 8822815 [details]
Bug 1305563 - Add a bookmark buffer and two-way merger for synced bookmarks.

https://reviewboard.mozilla.org/r/101606/#review173710

Well, that's a lot of code! :D

My main worries:

- How we detect and avoid doing unnecessary work. We need to make sure that we don't try to apply unmodified records to the store. We probably don't, but I worry.
- Expense of inserting into the buffer.
- That I haven't fully digested it all :D

::: services/sync/modules/bookmark_buffer.js:476
(Diff revision 3)
> +async function apply(db) {
> +  let storage = new BookmarkStorage(db);
> +  let observersToNotify = new BookmarkObserverNotifications(storage);
> +  let mergedTree = new MergedBookmarkTree();
> +
> +  // 1. Build a likely disjointed tree from the buffer contents.

Comment that we don't need to do this inside the transaction because only Sync writes to the buffer, and we're inside the sync lock by the time we get here.

::: services/sync/modules/bookmark_buffer.js:481
(Diff revision 3)
> +  // 1. Build a likely disjointed tree from the buffer contents.
> +  let remoteTree = await storage.fetchRemoteTree();
> +  BookmarkBufferLog.trace("Populated remote tree from buffer: ${remoteTree}",
> +                          { remoteTree });
> +
> +  await db.executeTransaction(async () => {

I don't see where you call `.apply`, so perhaps I'm missing it, but it would be worthwhile to check whether the buffer is empty and bail out before getting here.

::: services/sync/modules/bookmark_buffer.js:487
(Diff revision 3)
> +    // TODO(kitcambridge): Periodically yield to check if we're shutting down,
> +    // and throw to roll back the transaction. We can try merging again after
> +    // the browser restarts.
> +
> +    // 2. Build a fully rooted local tree from what's currently in Places.
> +    let localTree = await storage.fetchLocalTree();

I'm curious how expensive this is and what the memory footprint is versus `guidMap`. I expect a small improvement, but measurement beats wild guesses!

::: services/sync/modules/bookmark_buffer.js:591
(Diff revision 3)
> +  let localNode = mergedNode.localNode;
> +  if (!localNode) {
> +    // Should never happen. Use `insertPlacesItem` to insert new items.
> +    throw new TypeError(`Can't update item ${mergedNode} without local node`);
> +  }
> +

Why are we not checking the merge status here?

I would expect that as we walk the tree we will encounter nodes that are marked as `.local` and known to not need changing -- after all, identifying which nodes need changing is a handy output of the merge process. So why do we need to look at every field?

::: services/sync/modules/bookmark_buffer.js:720
(Diff revision 3)
> +function determineRemoteAge(currentRemoteTime, record) {
> +  let remoteAge = currentRemoteTime - record.modified || 0;
> +  return Math.max(remoteAge, 0);
> +}
> +
> +// Validates and truncates a record's URL.

s/URL/title

::: services/sync/modules/bookmark_buffer.js:734
(Diff revision 3)
> +function formatRecordURL(rawURL) {
> +  if (typeof rawURL != "string") {
> +    throw new TypeError(`Expected URL string in record`);
> +  }
> +  if (rawURL.length > DB_URL_LENGTH_MAX) {
> +    throw new TypeError(`Record URL too long`);

Is it better to throw or to truncate? Clearly you picked throwing, but if we throw, can we ever make progress?

::: services/sync/modules/bookmark_buffer.js:1567
(Diff revision 3)
> +      // to local.
> +      return BookmarkMergeState.local;
> +    }
> +
> +    if (!localNode.isChanged()) {
> +      // The node exists locally and remotely, and has remote valute state,

s/valute/value
Comment on attachment 8822815 [details]
Bug 1305563 - Add a bookmark buffer and two-way merger for synced bookmarks.

https://reviewboard.mozilla.org/r/101606/#review173686

Thank you for the always insightful comments! Responding inline.

> It might be worth expanding on this, because the two approaches have different assumptions.
> 
> iOS does it that way to try to allow the sync to be interruptible and replayable: if we upload our changes we will download and try to merge them on the next sync. We assume that `merge(local, merge(local, remote)) ~= merge(local, remote)`, and more generally `merge(local', merge(local, remote)') ~= merge(local', remote')`.
> 
> By writing to Places to provoke upload we leave a window in which a synthetic merged tree — that is, one that isn't the same as the one the user made, nor one that was ever present on the server — exists on disk, indistinguishable from a user-created one.
> 
> This approach relies on a different assumption: that if we're interrupted, come back some hours or days later, and download more changes, that the result of the _next_ merge will be acceptable. That is, that `merge(merge(local, remote), remote) ~= merge(local, remote)`, and more generally `merge(merge(local, remote)', remote') ~= merge(local', remote')`.
> 
> In essence this inverts iOS's choice: iOS puts data on the server and expects it to win over any local confusion, and desktop will keep data local and hope that any remote changes don't introduce confusion.

This is a great summary, and I agree it's worth documenting explicitly. I think Desktop implicitly assumes this inversion today; in fact, it must, because it writes directly to Places using public methods that enforce schema consistency. In that sense, the synthetic tree exists on disk today, except it's even more likely to diverge if the sync is interrupted, since there's no staging before application.

I elaborated on this below, but I don't think we can currently trust the server data enough to win over local confusion. My goal is to do no worse than Desktop does now, and work incrementally toward the iOS choice.

> Is it possible for the application of the merge output to collide with local changes? Presumably it is if we don't lock the DB while doing the merge. If that happens, do we detect the changes, or overwrite them, or?

We do lock Places when merging and applying, so local changes shouldn't collide. (We'll want to consider error handling in the UI, though...if you try to add a bookmark during that transaction, `Sqlite.jsm` will reject with an error that a transaction is already in progress). We don't lock when replaying notifications; indeed, as the comment suggests, we can't...some of those notification observers begin their own transactions and execute statements directly on the main thread. However, that's already the case today, so I don't think we need to worry about it much.

Alternatively, we can have the buffer connection attach to Places instead. This way, UI code that uses the main Places connection via `PlacesUtils.withConnectionWrapper` will wait for the transaction to finish, instead of trying to start a nested transaction on the same connection. We'd need to expose the URL hashing function to JS (maybe via `nsINavHistoryService::HashURL`, to match `MakeGuid`), since it's only registered for the main Places connection. But, apart from that, it should be possible to flip the attachment.

> Consider `-3`, which is the negation of `KIND_FOLDER`.

I'd rather keep this as "-1", if that's OK with you. There's no corresponding "KIND_ITEM" to negate, since Places knows the types of all rows in `moz_bookmarks`, and having -1 and -3 looks like an off-by-one typo.

> Add a comment to `PlacesUtils.jsm` to match.

These might be exposed on `nsINavHistoryService` in bug 1375896 eventually.

> Why 'location' instead of 'structure'?

I chose "content" and "location" instead of "value" and "structure" because I thought using the same terms for "vaguely similar but not really" concepts would be confusing...especially for folks familiar with the iOS implementation. OTOH, maybe inventing new terms makes it worse. WDYT?

> That assumption is the reverse of iOS. Can you justify your choice?
> 
> We _almost_ removed `parentid` from the record in the past, because it's redundant — a change to `parentid` _requires_ rewriting both parent records in order to specify a position.
> 
> iOS only keeps it around for informative validation: if there's a mismatch it means a client didn't upload a record when it should have, or there were racing writes.

We talked about this, and orphans, on IRC yesterday. I'm capturing my understanding of our discussion for folks following along.

I chose to trust `parentid` over the parent's `children` because that's what Desktop does now. Currently, we always use `parentid` to set the parent, and `children` to reorder. Once we have atomic uploads, I think we can assume the parent is correct. (However, that won't fix up inconsistent records that are *already* on the server, or local trees that have already diverged...for that, we'll need to wait until repair runs, or a node reassignment).

Until then, anything goes: a partial upload might leave a bookmark in multiple folders, no folders at all, or with a `parentid` that doesn't match any folder. In practice, this is mostly OK: `PlacesUtils.bookmarks.reorder` ignores ordered GUIDs that don't exist in the folder, and moves GUIDs that aren't mentioned in the order to the end of the folder. Assuming the folder isn't changed elsewhere, the partial uploader will finish uploading the folders on the next sync, and we'll fix up the order to match.

Of course, if we also change the folder locally, or another device syncs and changes the folder in the meantime, we'll reupload and clobber the partial uploader. Also, if the partial uploader uploaded the folders, but not the record, then we won't move it until the next sync. Those are issues that exist in Desktop today.

In theory, now that we have durable tracking, we can leave those orphans in the buffer until the next sync. This simplifies things, but also means we can't make any progress until the next sync. Consider also the case where we then add a bookmark to one of those incomplete folders. If the partial uploader hasn't synced yet, it's not safe for us to upload the folder, because we'll clobber the partial uploader! We end up in the situation that iOS is in now.

So...where does that leave us?

I think our strategy should be to make as much progress as we can, even if it means diverging locally. We won't explicitly flag incomplete folders for reupload, unless they're already flagged...in which case we'll reupload and clobber. We'll move orphans to "unfiled", and items that don't appear in their parent folder's `children` to the end of that folder. That replicates what Desktop does now, and gives us a starting point for improvements.

> If you mean reparenting bookmarks 'physically' in the buffer: is there value to doing so? This seems like something we'd avoid for two reasons:
> 
> 1. Much less work if we don't. Most likely all records _will_ have parents if we wait just a little while.
> 2. We can probably do a better/clearer/simpler job of reparenting if we do it during the merge.

I mean reparenting those orphaned bookmarks after we merge. Any virtual folders that remain after our merger runs must be orphans, or we would've walked them...so we move their children to `unfiled`, annotate them with the parent GUID, and hope we see the missing parent on the next sync. We also need to teach our merger to look for local orphans, and walk them if the remote tree contains the parent. This is all hand-wavey ATM, though.

> There should only be three reasons to discard the buffer:
> 
> 1. Node reassignment.
> 2. Engine disablement (local or remote).
> 3. Sign out.
> 
> Throwing away the buffer in the case of a remote change is needless expense: each time there's a single write we'd download aaaalmost everything, then walk right back to the start and do it again, and again…

The case I'm worried about is if one or more items that we already buffered change the next time we sync. I *think* that should be OK, we can `INSERT OR REPLACE` them. My intuition is that partial writes will make this replacement tricky, but maybe it won't be so bad once I sketch out some code.

> This is tricky. If we reject, we (I imagine) totally block syncing until another client fixes the record or we upgrade Firefox to handle it. But we also can't just put things into the DB if they're malformed.

It's certainly tricky, and valuable to note for the future, but I think it's OK for now. Realistically, we aren't going to be adding new bookmark types soon. The existing `PlacesSyncUtils` methods will throw and halt syncing when given unknown kinds, so addressing that is not a goal of this work.

> I'm not convinced that `urls` and `keywords` and `tags` add value. Can we simplify?

Definitely. They're artifacts of my initial plan for applying them, which I've since abandoned. See comment 16. :-)

> For this to be fast you rely on the unique index on `url`, but that also makes each insert more expensive.

It does. I expect this will only come up on the first sync, or after a large import. I'm willing to wager that most people (citation needed) won't make sweeping changes to their bookmarks between syncs. :-P

> Validate your inputs and either swallow malformed fields or raise a descriptive error.
> 
> ```
> > let rawKeyword = 99;
> > rawKeyword.trim();
> TypeError: rawKeyword.trim is not a function
> ```
> 
> Note that empty strings are probably invalid keywords, but will bypass this block (`""` is false).

This is on my list. :-) See the comment in `ensureValidRecord`.

> The access patterns in this function are a bit worrying.
> 
> Firstly, you're interleaving writes to `content` and `location`, so you're introducing fragmentation.
> 
> Secondly, you're writing every child in a separate `INSERT`, and you're doing it _twice_ — once when you process the child, and once for each item in `children` when you process the parent.
> 
> For a bookmark tree like mine — 133 folders, 2673 records — that's 2673 inserts to `content` and 5346 inserts to `location`. And those location inserts have a subquery…
> 
> Consider:
> 1. Allowing `children` to be canonical. It is on other platforms, and it's no less trustworthy than `parentid`. That gets rid of the subquery and halves the inserts.
> 2. Buffering folders to insert them all together. You get to insert 333 folders in one query if you're motivated.

IIUC, as with `urls`, this is most likely to crop up during the first sync. We don't fetch the complete tree from the server every time we sync; unless you change every item in every one of those folders, we won't sync it again. Our existing `PlacesSyncUtils.bookmarks.insert` and `update` methods currently do a lot more work, so making this efficient is a lower priority. However, as you suggest here and above, making `children` canonical would work around this.

> Separators use `pos` to avoid collision — two separators in the same folder will have different `pos` attributes. In theory.

I don't think we need `pos`, since the parent's `children` should have the same information? If the parent and child disagree, the `pos` is wrong, anyway.

> Or we've lost data locally. If a user replaces places.sqlite, or it's corrupted and removed, then this will go badly wrong.

Indeed. :-( Once we have more confidence in the server data, I think we can switch it to being canonical. Removing or replacing places.sqlite will confuse Sync now; as long as we don't make it worse, we're OK.
Summary: Two-phase bookmark application on Desktop → Implement structured bookmark application
Comment on attachment 8822815 [details]
Bug 1305563 - Add a bookmark buffer and two-way merger for synced bookmarks.

https://reviewboard.mozilla.org/r/101606/#review174748

This is very well written and commented - awesome job. However, it's still a lot to get my head around. It also looks like there's a chunk of work to do, but hopefully the final integration will not be too painful - it would also be perfect if we could arrange to keep the old code around in some form and pref'd off so disabling the code is possible should we strike problems - but that's not a huge deal if it isn't really viable. However, we do need to think alot about telemetry - possibly reporting events when we throw a BookmarksConsistencyError (or whatever it was called ;) would get us a long way there.

Nice work Kit.

::: services/sync/modules/bookmark_buffer.js:21
(Diff revision 3)
> + *   returns a new buffer ready for use. `store` validates and stages downloaded
> + *   records. `apply` fetches the local and remote `BookmarkTree`s, merges the
> + *   two trees into a single `MergedBookmarkTree`, and updates Places to reflect
> + *   the merged tree. Merging and application happen in a single transaction.
> + *
> + * - A `BookmarkTree` is a complete or partial tree with many `BookmarkNode`s.

"partial" tree could be read as meaning it's not a well-formed tree (eg, intermediate nodes might be missing, which I doubt is the case) - I wonder if adding "well formed" makes sense, or just rewording to make it clearer you mean "complete/partial" in terms of the real bookmarks tree?

::: services/sync/modules/bookmark_buffer.js:81
(Diff revision 3)
> +                                  "resource://gre/modules/PlacesUtils.jsm");
> +XPCOMUtils.defineLazyModuleGetter(this, "Sqlite",
> +                                  "resource://gre/modules/Sqlite.jsm");
> +
> +XPCOMUtils.defineLazyGetter(this, "BookmarkBufferLog", () =>
> +  Log.repository.getLogger("BookmarkBuffer")

I wonder if a name like "Sync.Engine.Bookmarks.Buffer" or similar makes sense, which IIUC, should then use the same logging level the engine itself uses, which seems desirable.

::: services/sync/modules/bookmark_buffer.js:114
(Diff revision 3)
> + * for easier joining.
> + *
> + * There's no guarantee that the remote state is consistent. We might be missing
> + * parents or children, or a bookmark and its parent might disagree about where
> + * it belongs. We do what we can to make the remote state into a mergeable tree.
> + * If the parent and child disagree, we assume the child is correct. We ignore

> I think our strategy should be to make as much progress as we can, even if it means diverging locally.

Agreed. But ISTM that we could probably still prefer the parent which appeals only as it is closer to iOS, so might mean the 2 implementations can converge in some edge-cases, and may even allow us to nuke parentid as mentioned by Richard.

::: services/sync/modules/bookmark_buffer.js:121
(Diff revision 3)
> + * we eventually see the missing items, either during a subsequent sync or as
> + * part of repair, we can fix up the local tree.
> + *
> + * We keep the buffer around for as long as it takes to download all records.
> + * If a sync is interrupted, we resume downloading based on the timestamp of the
> + * most recent record. If the server changes in the meantime, we discard the

I think it would be fine to do certain improvements in a followup, and related, we should start to think soon about what telemetry to record in this patch so we can get an objective idea of how well it is performing.

::: services/sync/modules/bookmark_buffer.js:135
(Diff revision 3)
> +  // Sets up the database connection and upgrades to the newest schema version.
> +  static async open(schemaVersion, options) {
> +    let db = await Sqlite.openConnection(options);
> +    await db.executeBeforeShutdown("BookmarkBuffer: open", async function(db) {
> +      let currentSchemaVersion = await db.getSchemaVersion();
> +      if (currentSchemaVersion != schemaVersion) {

I don't have a strong opinion here, but ISTM we already seems to consider discarding the buffer as somewhat normal, so it might make sense to just discard it when we find a schema mismatch in the interests of future simplicity?

::: services/sync/modules/bookmark_buffer.js:184
(Diff revision 3)
> +            BookmarkBufferLog.trace(`store: Storing tombstone for ` +
> +                                    `${record.id} in buffer`);
> +            return storeTombstone(db, remoteAge, record);
> +          }
> +      }
> +      throw new TypeError("Can't store unknown Sync record type: " +

This might be worth capturing in telemetry?

::: services/sync/modules/bookmark_buffer.js:223
(Diff revision 3)
> +async function migrateSchema(db, currentSchemaVersion, maxSchemaVersion) {
> +  let databaseInitialized = currentSchemaVersion > 0;
> +  if (!databaseInitialized) {
> +    await createBufferTables(db);
> +  }
> +  // TODO: Add future migration logic here.

or, as above, throw it away completely :)

::: services/sync/modules/bookmark_buffer.js:232
(Diff revision 3)
> +async function createBufferTables(db) {
> +  await db.execute(`CREATE TABLE content(
> +    id INTEGER PRIMARY KEY,
> +    kind INTEGER NOT NULL DEFAULT 0,
> +    guid TEXT UNIQUE NOT NULL,
> +    age INTEGER NOT NULL DEFAULT 0,

this might become clearer later, but "age" seems strange given it's only accurate for an instant and will end up wrong if we end up keeping some of these records across syncs.

::: services/sync/modules/bookmark_buffer.js:298
(Diff revision 3)
> +      INSERT INTO keywords(keyword, urlId)
> +      VALUES(:keyword, (SELECT id FROM urls WHERE url = :url))`,
> +      { keyword, url: url.href });
> +  }
> +
> +  let guid = ensureValidGuid(record.id);

there's evidence of invalid guids already out there and places itself already has reasonable validation in place, so I wonder if we should do very "light" validation here and just rely on places throwing? ie, we don't want to end up in a future where we end up stricter than places for some of these things.

::: services/sync/modules/bookmark_buffer.js:492
(Diff revision 3)
> +    let localTree = await storage.fetchLocalTree();
> +    if (!localTree.isFullyRooted()) {
> +      // Should never happen. The local tree must be complete.
> +      throw new BookmarkConsistencyError("Local tree not fully rooted");
> +    }
> +    BookmarkBufferLog.trace("Populated local tree from Places: ${localTree}",

This is equiv to, but marginally less efficient than `BookmarkBufferLog.trace("Populated local tree from Places", localTree);`

::: services/sync/modules/bookmark_buffer.js:1309
(Diff revision 3)
> +
> +      case "new":
> +        // If we're taking a new merge state, the synthesized node doesn't
> +        // match what's currently on the server, so bump its change counter to
> +        // ensure it's reuploaded.
> +        return 1;

Thom makes a good point, although it will be hard to know whether we are uploading due to what was previously "corruption" on the server (which should be fine) vs desktop and mobile having subtly different ideas about what the record should look like.

Further, IIUC, this should be rare, so staying on my telemetry focus, maybe this might be a good place to record the fact it happened?

::: services/sync/modules/bookmark_buffer.js:1548
(Diff revision 3)
> +   * @param   {BookmarkNode} remoteNode
> +   *          The remote bookmark node.
> +   * @returns {BookmarkMergeState}
> +   *          The two-way merge state.
> +   */
> +  async resolveTwoWayValueConflict(mergedGuid, localNode, remoteNode) {

More logging in this function might help us in the future when trying to diagnose things.

::: services/sync/modules/bookmark_buffer.js:2071
(Diff revision 3)
> +    }
> +    if (this.url.protocol != "place:") {
> +      return null;
> +    }
> +    let params = new URLSearchParams(this.url.pathname);
> +    let type = +params.get("type");

this "+" seems odd?
(In reply to Richard Newman [:rnewman] from comment #19)
> If you're an Android user who uses search shortcuts, and we drop those from
> desktop's records, then things will quietly stop working for you. I expect
> there are a fair number of Mozillians with a 'bug' keyword to make pasting
> "Bug 1305563" work.

True, but how much effort requires to the user to recreate once such keyword on his 2 or 3 devices, and how many times it's going to happen? In the worst case, the user has to create the keyword once per device, and it will just work "forever" from that point on.
While I agree a totally transparent handling is nicer, we should also evaluate the relative implementation cost compared to the user cost.
Attachment #8822815 - Flags: feedback?(rnewman) → feedback+
Comment on attachment 8822815 [details]
Bug 1305563 - Add a bookmark buffer and two-way merger for synced bookmarks.

Forgot to mark as f+ after the mozreview, err, review.
Attachment #8822815 - Flags: feedback?(tchiovoloni) → feedback+
Comment on attachment 8822815 [details]
Bug 1305563 - Add a bookmark buffer and two-way merger for synced bookmarks.

https://reviewboard.mozilla.org/r/101606/#review173710

> Why are we not checking the merge status here?
> 
> I would expect that as we walk the tree we will encounter nodes that are marked as `.local` and known to not need changing -- after all, identifying which nodes need changing is a handy output of the merge process. So why do we need to look at every field?

As this is currently written, I don't think we can. Consider `test_new_orphan_with_local_parent` as an example: we don't have a remote record for the parent, but we do have the `parentid` locally, so we move the orphans to the end of the parent folder. On the next sync, we see just the parent, so we insert virtual nodes for the children into the remote tree. We always take the local state when merging a virtual remote node, because virtual nodes don't have values. So we end up taking the remote parent, and the local child.

That's OK, except Places stores the position on the child, not the parent, so we'll inadvertently corrupt sibling positions if we skip updating "local" nodes. "New" isn't right here, either, because we didn't synthesize a new state; we're just taking the remote structure state and local value state.

We could bring back the split between `valueState` and `structureState`, to match iOS.
Depends on: 1393904
Attachment #8900083 - Attachment is obsolete: true
Attachment #8900084 - Attachment is obsolete: true
Attachment #8822815 - Attachment is obsolete: true
Oh, MozReview. :-( I didn't address all the issues on the merger yet; pushing with a single updated changeset obsoleted the earlier attachments, and discarded them on MozReview for good measure. Fortunately, all the comments are mirrored on Bugzilla, so I'll just respond to them here as I work through the list.

Getting closer to a working `BufferedBookmarksEngine`. These tests unsurprisingly fail, because I've gutted `BufferedBookmarksStore`. :-) But they do test some interesting things (changing a folder to a livemark, which can happen if we sync before the livemark anno is set on the folder: https://searchfox.org/mozilla-central/rev/89e125b817c5d493babbc58ea526be970bd3748e/toolkit/components/places/nsLivemarkService.js#196-204,222; handling invalid bookmarks, reducing the number of observer notifications we fire for items we didn't actually change) that I think we should also test in the buffer.

> 0:03.49 LOG: Thread-6 INFO services/sync/tests/unit/test_bookmark_livemarks.js failed or timed out, will retry.
> 0:03.49 LOG: Thread-5 INFO services/sync/tests/unit/test_bookmark_invalid.js failed or timed out, will retry.
> 0:04.79 LOG: Thread-3 INFO services/sync/tests/unit/test_bookmark_duping.js failed or timed out, will retry.
> 0:09.88 LOG: Thread-13 INFO services/sync/tests/unit/test_bookmark_smart_bookmarks.js failed or timed out, will retry.
> 0:11.18 LOG: Thread-7 INFO services/sync/tests/unit/test_bookmark_order.js failed or timed out, will retry.
> 0:11.56 LOG: Thread-14 INFO services/sync/tests/unit/test_bookmark_store.js failed or timed out, will retry.

We also need to make sure that we're correctly bumping the change counter in all cases when we resolve conflicts. The `services/sync/tests/unit/test_engine_changes_during_sync.js` failure suggests that's not the case yet.
Depends on: 1394525
Attachment #8901393 - Attachment is obsolete: true
(In reply to Richard Newman [:rnewman] from comment #21)
> Consider:
> 1. Allowing `children` to be canonical. It is on other platforms, and it's
> no less trustworthy than `parentid`. That gets rid of the subquery and
> halves the inserts.
> 2. Buffering folders to insert them all together. You get to insert 333
> folders in one query if you're motivated.

These are good suggestions! For (1), I went further and ignored the `parentid` entirely, even as a hint. It keeps things simpler to temporarily move orphans to "unfiled", since the `parentid` doesn't tell us the item's position, or even if it's still accurate. This is a change from what we do now, but I feel more comfortable making it considering we're likely to receive the missing parent on the next sync, or once we have atomic uploads. For users with shallow bookmark trees, we'll download folders first, so we might avoid orphans entirely. Emitting telemetry would be good here.

I decided not to do (2). Today, `PlacesSyncUtils.bookmarks.update` runs almost a dozen queries per call (https://searchfox.org/mozilla-central/rev/2aa0806c598ec433e431728f5ddd3a6847c1a417/toolkit/components/places/Bookmarks.jsm#568-579,584-586,593,605,1189,1207-1213,1219-1228,1230-1231,1233-1234,1258-1259,1264-1265,1273,1276-1281). Except for the first sync, I don't expect most syncs to insert many folders, and the number of queries we'll run to update Places easily dwarfs the queries needed to buffer records. The public API surpasses both.

> Separators use `pos` to avoid collision — two separators in the same folder
> will have different `pos` attributes. In theory.

If we're treating `children` as canonical, I think we can ax `pos`, too.

> I think this is a pretty big chunk of unnecessary complexity.

I think you're completely correct. ;-) Axed with prejudice!
(In reply to Richard Newman [:rnewman] from comment #22)
> My main worries:
> 
> - How we detect and avoid doing unnecessary work. We need to make sure that
> we don't try to apply unmodified records to the store. We probably don't,
> but I worry.

I'm concerned about this, too, and I suspect we are currently doing unnecessary work. I'll work on improving our test coverage for this next week, and also make sure we aren't flagging unchanged records for reupload.

Thom also points out in comment 17 that we need to handle weak uploads, too. (And a number of other suggestions that I haven't fixed yet).

> - Expense of inserting into the buffer.

I think I'm concerned about the expense of inserting into Places. I need to profile this, of course, but I suspect it's going to be better than what we currently do. Using the `PlacesSyncUtils` APIs, a first sync can run thousands of separate transactions, 1 per bookmark, with a half-dozen to a dozen queries for `insert` (https://searchfox.org/mozilla-central/rev/2aa0806c598ec433e431728f5ddd3a6847c1a417/toolkit/components/places/Bookmarks.jsm#205,215,1340,1344-1348,1356-1367,1370,1374-1376,1382-1383,1386-1387) and `update`. The bar isn't particularly high now. :-)

> I'm curious how expensive this is and what the memory footprint is versus
> `guidMap`. I expect a small improvement, but measurement beats wild guesses!

Indeed. Adding to the tasks for next week.

> Is it better to throw or to truncate? Clearly you picked throwing, but if we
> throw, can we ever make progress?

Mark brings up a good point in comment 24, Since we're no longer bound by the public API, I think it might be better to truncate if there are existing invalid URLs. We also need to think about what to do for invalid GUIDs (delete and reupload?), since we have records like that in the wild.
(In reply to Mark Hammond [:markh] from comment #24)
> I don't have a strong opinion here, but ISTM we already seems to consider
> discarding the buffer as somewhat normal, so it might make sense to just
> discard it when we find a schema mismatch in the interests of future
> simplicity?

The buffer is permanent now, and mirrors the server completely...so migration makes a bit more sense if we change the schema and don't want to do a full resync. Switching channels with the same profile probably shouldn't do a full sync when there's a schema change, and I can imagine wanting migrations for incremental changes like device origin. (Side note: we can build on the buffer to roundtrip additional metadata, like origin, without having to teach Places about them).

As Richard suggests in comment 21, I think we can trash the buffer and do a full sync if there's database corruption, if you sign out or disable the bookmarks engine, if you're node reassigned, or if Sync forgets you're signed in (bug 1295122). For other cases, that might be extreme.

> this might become clearer later, but "age" seems strange given it's only
> accurate for an instant and will end up wrong if we end up keeping some of
> these records across syncs.

Good point! I decided to store the age (`AsyncResource.serverTime - record.modified`) instead of `record.modified` directly in case the server time changed between syncs. I thought the age would be a more accurate snapshot for reconciliation, but that seems like an edge case, and also inaccurate in a different way. (Plus, the local time can change, too; we can't do anything about that).

We're going to need the complete last modified time, anyway, so that we can determine the last sync time: `SELECT MAX(modified) FROM value`. So I'll go ahead and change `age` to `modified`. Thanks!

> there's evidence of invalid guids already out there and places itself
> already has reasonable validation in place, so I wonder if we should do very
> "light" validation here and just rely on places throwing? ie, we don't want
> to end up in a future where we end up stricter than places for some of these
> things.

Agreed! I'm also thinking we can ignore some invalid properties, like bad keywords or tags, and truncate URLs instead of throwing. Since we update Places through SQL now, there's no throwing...in theory, we could store whatever we want for the GUID, too. Though we'll corrupt the GUID cache (https://searchfox.org/mozilla-central/rev/2aa0806c598ec433e431728f5ddd3a6847c1a417/toolkit/components/places/PlacesUtils.jsm#2516-2517), and won't be able to get the item back out again with `fetch` (https://searchfox.org/mozilla-central/rev/2aa0806c598ec433e431728f5ddd3a6847c1a417/toolkit/components/places/Bookmarks.jsm#889-890,918-919).
I spent a lot of time testing and thinking about how to handle partial trees, and concluded that trying to make "virtual folders" and "virtual items" work is fraught.

To recap: a value-only change, like renaming or changing the URL of a bookmark, only uploads a record for the bookmark. This means we need to somehow link that bookmark to our local tree, so that our merger will see it. The approach I used initially was to add a virtual folder for the parent using the `parentid`; another approach is to query the local tree for the parent. However, using the local parent isn't always right. If the item moved locally, we'll supply the wrong remote state, and our "remote tree" won't match the server's.

We also need to handle the case where a `parentid` is not remotely modified, but is mentioned in a remotely modified folder's children. Currently, those are represented as virtual items; we need to detect them and change the child to a virtual folder, with the non-virtual child. `test_value_changes` exercises this: we changed the title of a folder and one of its grandchildren, but not the child folder that contains the grandchild. All need to be linked to the local tree.

Another tricky case: what if we delete a folder and its children locally, either directly or through a "new" folder for which we don't write a tombstone? Storing the `parentGuid` for tombstones lets us at least reconstruct the tree up to the topmost deleted synced folder, but then tombstones will need to become part of the structure.

So, relying on the local tree to supply state for virtual folders is complicated, and I'm not confident that my approach will be correct for all cases. It also occurs to me that we're doing a lot of busy work to reconstruct server state that we knew at one time, but threw away after merging because the buffer is temporary.

The new patch I uploaded solves this by turning the buffer into a persistent mirror of the server's records. In iOS terms, the Desktop buffer becomes "buffer overlaid onto mirror", like Places is "local overlaid onto mirror". Since we don't have a separate mirror, we won't be able to use it for three-way merges. However, we will be able to do two-way merges between two complete trees, without complicating our logic with "virtual roots". In the future, we can build on this to add a proper mirror.

This also simplifies traversing and reasoning about "value" and "structure". Everything should exist in both. If it's in `value` but not `structure`, it's an orphan that needs to move to `unfiled` until we see its parent folder. If it's in `structure` but not `value`, it's a missing child that we'll eventually apply and move to the right place when the partial uploader returns. (Though we can still have cycles if you manage to move folders around and interrupt the sync at an unfortunate time. I don't think we'll trigger infinite recursion given the way we replace records, but we will ignore that subtree. Since atomic uploads are the future, I'm not sure if it's worth investing more time in this now).

After uploading changed records, we should update the buffer to reflect our local tree. I haven't made the patch do this yet.
(In reply to Kit Cambridge (he/him) [:kitcambridge] (UTC-7) from comment #47)
> After uploading changed records, we should update the buffer to reflect our
> local tree. I haven't made the patch do this yet.

A super hacky way to do this without atomic uploads or an outgoing buffer is to call `buf.store(outgoingRecord, { wasChanged: false })` for every successfully uploaded outgoing record. (With atomic uploads, the way to do this is to store all uploaded records in a transaction once the batch commits).

We also need to test that it's OK to replay records that we already uploaded, in case the upload was interrupted. And, in general, test successive merges.
Comment on attachment 8900968 [details]
Bug 1305563 - Add a `BufferedBookmarksEngine` that can be toggled with a pref.

Mark, here's how I'm thinking we could pref this off. Does the general idea seem sensible?
Attachment #8900968 - Flags: feedback?(markh)
Comment on attachment 8901394 [details]
Bug 1305563 - Add a bookmark buffer and two-way merger for synced bookmarks.

https://reviewboard.mozilla.org/r/172854/#review180766

::: services/sync/modules/bookmark_buffer.js:587
(Diff revision 2)
> +      await observersToNotify.noteItemRemoved(guid);
> +      await storage.delete(guid);
> +    }
> +
> +    // Flag all merged items as unchanged, excluding orphans.
> +    // TODO(kitcambridge): Make this contingent on successful upload. This might

I think this is currently correct, actually. Here, we set `wasChanged = 0` for all remote records that we merged. We then pull and upload records for everything in Places with `syncChangeCounter > 1`. After we've uploaded everything, we write those records back to the buffer.

The "write uploaded records (also with `wasChanged = 0`) back to the buffer" step can be in a transaction, but we don't need to do it *during* upload.

If upload is interrupted, and batch uploads are enabled, the server won't change. (Unless another device syncs before next time, of course). If batch uploads are disabled, we'll download our own partially uploaded changes, along with changes other devices might have made, into the buffer on the next sync. We'll then merge again, and reupload everything.

To summarize:

* We can have `apply` return an in-memory array of all outgoing records: weakly tracked, plus everything with `syncChangeCounter > 1`. This would incidentally fix bug 1359200. We don't need a separate outgoing buffer database.
* In `_uploadOutgoing`, we'll pull records from this array instead of going through the store.
* After upload completes, in `_syncFinish`, we'll write the records we uploaded back to the buffer, so that it matches the server.
* If `_uploadOutgoing` is interrupted, that's fine. We won't update the buffer at all, so we'll see our partially uploaded changes on the next sync. Make sure our merger does the right thing here. Atomic uploads will make this problem go away eventually.

Bonus future optimization for a follow-up bug: since we keep all *downloaded* records in memory, too, we can insert them into the buffer in a single transaction, without calling `buffer.store` repeatedly in `_recordHandler`.
Comment on attachment 8901394 [details]
Bug 1305563 - Add a bookmark buffer and two-way merger for synced bookmarks.

https://reviewboard.mozilla.org/r/172854/#review180766

> I think this is currently correct, actually. Here, we set `wasChanged = 0` for all remote records that we merged. We then pull and upload records for everything in Places with `syncChangeCounter > 1`. After we've uploaded everything, we write those records back to the buffer.
> 
> The "write uploaded records (also with `wasChanged = 0`) back to the buffer" step can be in a transaction, but we don't need to do it *during* upload.
> 
> If upload is interrupted, and batch uploads are enabled, the server won't change. (Unless another device syncs before next time, of course). If batch uploads are disabled, we'll download our own partially uploaded changes, along with changes other devices might have made, into the buffer on the next sync. We'll then merge again, and reupload everything.
> 
> To summarize:
> 
> * We can have `apply` return an in-memory array of all outgoing records: weakly tracked, plus everything with `syncChangeCounter > 1`. This would incidentally fix bug 1359200. We don't need a separate outgoing buffer database.
> * In `_uploadOutgoing`, we'll pull records from this array instead of going through the store.
> * After upload completes, in `_syncFinish`, we'll write the records we uploaded back to the buffer, so that it matches the server.
> * If `_uploadOutgoing` is interrupted, that's fine. We won't update the buffer at all, so we'll see our partially uploaded changes on the next sync. Make sure our merger does the right thing here. Atomic uploads will make this problem go away eventually.
> 
> Bonus future optimization for a follow-up bug: since we keep all *downloaded* records in memory, too, we can insert them into the buffer in a single transaction, without calling `buffer.store` repeatedly in `_recordHandler`.

Elaborating on `apply` returning outgoing records:

1) Thanks to the merged tree, we know what we need to upload: records for "new" merged nodes, and "local" nodes with `isChanged()`.
2) We already walk down the merged tree to update Places, so this would be a good time to fetch locally changed records, too. For each locally changed record, we query Places for the value state, and create a Sync record with the contents. The merged tree also gives us the folder structure, so we only need to query the database for the title and description of a `BookmarkFolder`.
Comment on attachment 8900968 [details]
Bug 1305563 - Add a `BufferedBookmarksEngine` that can be toggled with a pref.

https://reviewboard.mozilla.org/r/172422/#review180932

This looks fine to me!

::: services/sync/services-sync.js:28
(Diff revision 3)
>  // The sync "create account" process typically *will* offer these engines, so
>  // they may be flipped to enabled at that time.
>  pref("services.sync.engine.addons", true);
>  pref("services.sync.engine.addresses", false);
>  pref("services.sync.engine.bookmarks", true);
> +pref("services.sync.engine.bookmarks.buffer", true);

I think we should default this to false :)

::: services/sync/tests/unit/test_bookmark_engine.js:40
(Diff revision 3)
>  }
>  
>  add_task(async function test_delete_invalid_roots_from_server() {
>    _("Ensure that we delete the Places and Reading List roots from the server.");
>  
> -  let engine  = new BookmarksEngine(Service);
> +  let engine  = new BufferedBookmarksEngine(Service);

do you think there is scope to add a helper that automagically runs some of these tests twice, once with each engine? I don't think we want to lose test coverage while the old engine remains the default, but also want to ensure the new engine has as much coverage as possible in the meantime.
I tested merging a flat tree with 10k unfiled bookmarks, all reordered remotely, and came away with some observations:

* The local tree takes ~39ms to build. Nothing to optimize here.

* The remote tree takes ~43ms to build, if we build a pseudo-tree first and recurse in JS. Using a recursive SQL query with joins in the base and recursive clauses is *very* slow, on the order of 10 seconds (!) for a buffer with ~5k items. `EXPLAIN QUERY PLAN` shows three full table scans: one for `value`, one for the recursive `descendants` table, and one for the recursive clause following `UNION ALL`.

* Merging takes ~3.8s. There's not much work for the merger to do in this test, since all remote items exist locally, there are no structure conflicts to resolve, and no new items to dedupe. We'll have to see how this performs with duping (worst case: *every* remote GUID is different, and we need to check every local item), or more complex trees.

* Updating Places takes 1m25s. The most expensive parts here are the 10k reads in `fetchRemoteContent`, followed by 10k writes, followed by another 10k reads in `noteItemChanged`. `fetchRemoteContent` is avoidable, if we can `INSERT OR REPLACE` from `value` directly into `moz_bookmarks`. This is slightly complicated, because we also need to insert new URLs into `moz_places`, update descriptions in `moz_items_annos`, and rewrite tag queries. However, queries are the exception to the rule, and we can update them separately. We can also avoid the `noteItemChanged` reads if we bloat our nodes a bit, and include the observer info when we fetch the local tree. Bug 1392533 would also help, as would storing URL hashes in the buffer to avoid the `fetchLocalPlaceId` call.

* Reading and preparing records for upload takes 21s. It might be worth breaking this out into a separate stage. One advantage of doing this in the transaction is that we can rely on the merged tree to not change underneath us. We can also read these in batches via `IN`, instead of one-by-one as we currently do.

* Tags and keywords are expensive, but also uncommon (*citation needed*). Bug 1039200 should make tag storage more efficient, and we can probably use SQL to replace keywords instead of reading then writing.

* `apply` doesn't need to be wrapped in `executeBeforeShutdown`. I *think* that should be sufficient to prevent merging from blocking shutdown.
(In reply to Kit Cambridge (he/him) [:kitcambridge] (UTC-7) from comment #65)
> I tested merging a flat tree with 10k unfiled bookmarks, all reordered
> remotely…

I'd be very interested in you porting this test to iOS.
(In reply to Kit Cambridge (he/him) [:kitcambridge] (UTC-7) from comment #65)
> We'll have to see how this performs
> with duping (worst case: *every* remote GUID is different, and we need to
> check every local item), or more complex trees.

A quick fix for this is to create `BookmarkContent` objects in advance, for all `NEW` (and `UNKNOWN`) Places rows not in the buffer, and for all buffer rows not in Places. The queries to fetch these are simple: `SELECT ... FROM moz_bookmarks b LEFT JOIN value v ON v.guid = b.guid WHERE b.syncStatus <> NORMAL AND v.guid IS NULL` for local candidates; `SELECT ... FROM value v LEFT JOIN moz_bookmarks b ON b.guid = v.guid WHERE b.guid IS NULL` for remote contents. Group the local candidates by their parent GUIDs, and stash the remote contents in a map of GUID to content.

In the worst case (Places has 10k bookmarks, the buffer has 10k bookmarks, and there's no overlap between the two), we'll fetch and create 20k `BookmarkContent` objects. However, I expect fully non-overlapping trees to be rare (citation needed; this would be a good place to record event telemetry), and even 20k objects isn't much compared to all the other work the browser does.

Doing I/O up front means the merger would work completely in memory. We could even move it to a `ChromeWorker` later, if we find that walking large trees is janky. (We'd still pay the cost of structured clone serialization and deserialization, though).
(In reply to Kit Cambridge (he/him) [:kitcambridge] (UTC-7) from comment #65)
> * Updating Places takes 1m25s. The most expensive parts here are the 10k
> reads in `fetchRemoteContent`, followed by 10k writes, followed by another
> 10k reads in `noteItemChanged`. `fetchRemoteContent` is avoidable, if we can
> `INSERT OR REPLACE` from `value` directly into `moz_bookmarks`.

I almost have this working, and it's not too annoying to use `REPLACE INTO moz_bookmarks(...) SELECT ...` compared to `{insert, update}PlacesItem`. I still need to fix the observer notifications to work with this new approach, though.
(In reply to Kit Cambridge (he/him) [:kitcambridge] (UTC-7) from comment #67)
> Updating Places takes 1m25s.

With the new approach, it takes ~4 seconds to apply a tree with 15k bookmarks. I'd call that a win!

* 0.7s to populate the temp `mergeStates` table.
* (No tag query rewriting or populating `moz_places`, since we're only reordering bookmarks).
* 1s to read and record old info to pass to `onItemChanged`.
* 1s to read changed bookmark IDs to pass on `onItemAnnotation{Set, Removed}`.
* 0.3s to update value and structure state, and set new annos (15k new descriptions, 7.5k "load in sidebar").
* (No tag or keyword changes).

Notifying all observers takes ~1 minute, maybe because some do a lot of work? (https://searchfox.org/mozilla-central/rev/f6dc0e40b51a37c34e1683865395e72e7fca592c/toolkit/components/places/nsNavBookmarks.cpp#3320-3347 in particular looks suspect...a statement executed on the main thread just to bump the modified time). But we're outside the transaction boundary at that point, anyway, and updating 15k descriptions at a time is not going to be a common scenario.

Other interesting times:

* It takes 2m44s to fill Places with 15k bookmarks using `insertTree`.
* Filling the buffer DB with 15k new records takes 37s.
* Building the two trees and merging takes 4.2s, consistent with the 3.8s for 10k bookmarks in comment 67.
OK, I think this is ready for review!

First, some context. Sync currently applies downloaded bookmarks directly to Places, without staging and using the `PlacesSyncUtils` APIs. This has several disadvantages. Notably, Sync can't detect incomplete or inconsistent server state, and might realize halfway through that another device didn't finish uploading its changed bookmarks. At that point, we can't revert, because we already updated Places with the changes we saw so far. It's also easy for the local tree to diverge from the server if the sync is interrupted, like when a user closes their laptop lid, restarts Firefox, switches Wi-Fi networks mid-sync, and so on. Also, during large syncs, users might temporarily see bookmarks in the wrong order, and try to help by moving them back. (Using the `PlacesSyncUtils` APIs, every incoming bookmark starts a transaction with a half-dozen reads and writes, so the delay is often visible). Unfortunately, this is likely to confuse Sync further, and make the problem worse.

There's a design doc in https://docs.google.com/document/d/1mMkQR4aYXftPZ9B3kojoSIlLwyeRNbb5OQ6duWoKOJE/edit that explains the problem in more detail, though the patch in this bug is effectively a "full mirror of the server." Everything past "Buffer schema design" in that doc is out of date.

All these issues manifest as a class of related bugs: random duplicates and moves, incorrectly merged folders, scrambled child order, and bookmarks from one device that never show up on the other. Debugging these issues is difficult or impossible because the way we currently sync bookmarks is non-deterministic. We don't know the complete local or server tree, and can only guess at what happened through scant logs.

This patch helps solve these problems in two ways.

First, we buffer all incoming bookmarks into a separate SQLite database. The schema (see `BufferSchemaMigrations` in `bookmark_buffer.js`) is similar to Places: `value` corresponds to `moz_bookmarks`, `urls` => `moz_places`, `annoAttributes` => `moz_anno_attributes`, `annos` => `moz_items_annos`. Unlike Places, we use a separate `structure` table to record parent-child relationships and position info, because we might see parents before children. The terms "value" and "structure" come from iOS.

Second, once we've received all bookmarks, we do a structured merge instead of updating Places directly. We begin by building a local tree from Places, and a remote tree from the buffer contents. We then recursively walk the two trees to produce a merged tree, noting deletions, structure conflicts (for example, adding a bookmark to a folder on one device that's deleted on another), and value conflicts (changing the title or URL of the same bookmark on two different devices). We then take the merged tree, apply it back to Places, and upload records for new and resolved items back to the server. Finally, once we've uploaded the new records, we write them back to the buffer, so that it matches the server again. This means the buffer is a complete mirror of the server the last time we synced.

Since we have the complete tree, we `ATTACH` to Places and `INSERT OR REPLACE` the changed records in a single transaction. This is fast; much faster than calling `PlacesSyncUtils.bookmarks.insert` or `update` for every bookmark. This also means we don't have to worry about the user changing bookmarks during the sync (bug 1359200), or trying to fix the order of downloaded bookmarks before the sync completes.

So, the patches:

* Part 1 exposes `hashURL` on `nsINavHistoryService`, so that we can use it from JS. Since the buffer attaches to Places, we can't use the SQL `hash()` function, which only exists for the main Places connection. I did this for two reasons. 1) Avoid `executeBeforeShutdown` in `apply`, so that merging doesn't block shutdown. If we're interrupted, that's OK; we'll try merging again on the next sync. In contrast, `withConnectionWrapper` always adds a shutdown blocker, and `promiseDBConnection` is read-only. 2) Ensure other Places code doesn't stomp on our transaction. I think Places works around SQLite's lack of support for nested transactions by trying to start one, then plowing ahead anyway if one is already in progress (https://searchfox.org/mozilla-central/rev/f6dc0e40b51a37c34e1683865395e72e7fca592c/storage/mozStorageHelper.h#64, https://searchfox.org/mozilla-central/rev/f6dc0e40b51a37c34e1683865395e72e7fca592c/toolkit/modules/Sqlite.jsm#574-578). AIUI, using a separate connection means any code that uses the main Places connection will wait on the attached connection to finish instead.

* Part 2 adds `mozIAsyncLivemarks.invalidateCachedLivemarks`. Since we merge and update in a single transaction, we can't use `mozIAsyncLivemarks.addLivemark` or `mozIAsyncLivemarks.removeLivemark`: those start their own transactions. To make merging easier, the buffer sets the livemark feed and site URL annos in the transaction, then calls `invalidateCachedLivemarks` at the end to clear the livemark cache. The next `mozIAsyncLivemarks` call should repopulate the cache, this time with the new livemarks.

* Part 3 makes some Sync test changes to support the new buffer, so that we can be more confident we're not introducing different behavior.

* Part 4 is the actual buffer and merger. A lot of what I've written above is documented in `bookmark_buffer.js`, too. The important points in this file are `BookmarkBuffer`, `BufferSchemaMigrations`, the `store*` methods that stage incoming bookmarks, `fetchRemoteTree`, `fetchLocalTree`, `updateItems` (mass-inserts changes from the buffer into Places), `dropRemotelyDeletedItems` (applies synced tombstones), `BookmarkMerger` (the structured merger), and `BookmarkObserverRecorder` (records `nsINavBookmarkObserver` notifications for everything we changed during the merge, then fires them once the transaction commits).

* Part 5 is a new bookmark engine, pref'd off, that uses the buffer. This part will need some more work: there are some failing tests that use the store directly, and we'll want to use the last sync time from the buffer (via `buf.getLastSync()`) instead of relying on the `lastSync` pref.

For review:

* Mark and Thom, flagging you on all the Sync bits for general review. Broad feedback, specific nits, and everything in between welcome!

* Mak, please take a look at parts 1, 2, and 4. In particular, I'd love your thoughts on storage. Does the schema seem reasonable? Do the queries make sense? (Especially in `BookmarkBuffer#apply` and `updateItems`). I did some testing with large trees and a mix of different types of remote changes (comment 65, comment 74). Are there SQLite gotchas that we should consider? I'll probably need to enable WAL for the buffer DB; are there other pragmas I should set?

* Richard, please review `BookmarkMerger` and tests in part 4...and make any other comments as you see fit, of course.

This is a big chunk of work, so please ask me to comment on anything that's unclear.

Thank you!
Oh, if it helps, I can also split up part 4 into "buffer", "merger", and "Places" parts, so that they can be reviewed individually.
Some random thoughts that leap to mind:

- State is spread between prefs (download timestamps), an in-memory DB, and places.sqlite. We download and buffer into the in-memory DB, we track download timestamps into prefs to make sure we don't redownload, we commit our final changes into Places, and we track local changes there, too.

What are the properties of the system if it's interrupted during each of its phases?

- It's possible, though less likely than before, for the user or an add-on to modify Places before we flush the merged tree, right? If so, what happens? Presumably we detect changes during our write because we have an exclusive transaction (though I'd like to see a test for this), but what about changes during the merge, between loading the tree and writing the result?
Also: db.beginTransaction creates a deferred transaction by default. If you're immediately going to write, consider using beginTransactionAs and specifying IMMEDIATE or EXCLUSIVE. As we're expecting to write, and we don't want any writers to slip in and out before we do, EXCLUSIVE is probably appropriate, though it could cause other uses of Places (e.g., history writes) to fail.
(In reply to Richard Newman [:rnewman] from comment #82)
> Some random thoughts that leap to mind:
> 
> - State is spread between prefs (download timestamps), an in-memory DB, and
> places.sqlite.

Yes. :-( (Though the buffer DB isn't in-memory anymore; we keep it around in the profile directory). I think we can avoid tracking the last sync time in prefs, and use the buffer DB and Places as the sources of truth. So:

1. If we're interrupted during download, we'll query the buffer for the highest server `modified` time on the next sync, and pick up where we left off. All records that we downloaded before interruption will have `wasChanged = 1` set, so our merger will recognize that they were changed remotely. We can even remove `executeBeforeShutdown` from `store`; at worst, we'll have to redownload one extra record on the next sync. Unlike starring bookmarks in the UI, the changes here are already on the server. It's fine if we have to redownload some that we already saw but didn't commit on the next sync, and we can avoid blocking shutdown.

2. If we're interrupted in `apply`, the Places update transaction won't commit. On the next sync, we'll again fetch new records from the server (likely nothing), then merge. We build the local tree, fetch new local and remote contents, merge, update Places, reduce the change counter for reconciled bookmarks , and set `wasChanged = 0` on the applied buffer records all within the same transaction...so, if upload is interrupted, we won't re-merge and reapply.

3. If we're interrupted during upload, we already have the merged tree locally. On the next sync, we'll fetch new changes, merge, and upload again. We might also have written some records back to the buffer, depending on whether atomic uploads are enabled. If we have, we won't see our own changes on the next sync. If we uploaded, but didn't update the buffer, we will download and reapply our own changes later. Comment 50 elaborates.

TL;DR: I should remove the use of the `lastSync` pref and rely on the timestamps in the buffer to decide where to start downloading.

> - It's possible, though less likely than before, for the user or an add-on
> to modify Places before we flush the merged tree, right?

This should not be possible, no. We start the transaction from the attached connection, and before reading the tree from Places. If I'm understanding SQLite's connection model correctly, that means the main connection will wait on our transaction to finish before allowing writes. (Starring, organizing in the Library, etc).

That's why I'm attaching the buffer connection to Places, instead of attaching the main Places connection to the buffer. The main connection *does* allow concurrent modifications; it'll see a transaction is in progress, and plow ahead, anyway. Does that sound accurate to you?
(In reply to Richard Newman [:rnewman] from comment #83)
> Also: db.beginTransaction creates a deferred transaction by default. If
> you're immediately going to write, consider using beginTransactionAs and
> specifying IMMEDIATE or EXCLUSIVE.

We'll do two reads first: once to build the local tree, and once to fetch the content of "NEW" local bookmarks for deduping. I think you're right, we want an EXCLUSIVE transaction here. Callers like `Bookmarks.insert` read before writing (https://searchfox.org/mozilla-central/rev/f6dc0e40b51a37c34e1683865395e72e7fca592c/toolkit/components/places/Bookmarks.jsm#206,216), so it's possible the merger will begin its DEFERRED transaction, then `insert` will call `fetchBookmark`, then the merger writes to Places and commits, then `insert` runs its DEFERRED transaction, and, oops, now it's inserting based on stale info.

There's another possible race here, that the exclusive transaction won't solve...but I think we can let it slide for now. Since `insert` reads outside of its transaction (https://searchfox.org/mozilla-central/rev/f6dc0e40b51a37c34e1683865395e72e7fca592c/toolkit/components/places/Bookmarks.jsm#206,1339), it might call `fetchBookmark` first, before the merger's exclusive transaction begins. We'll still end up with a consistent DB, since `insert` moves siblings out of the way...but, if you happen to click the star at the exact time we're merging, and add the bookmark to a remotely deleted folder, we might create a local orphan. I think that's 1) unlikely, and 2) fixable by clicking the star again, or eventually cleaning up via Places maintenance. We can also move the `fetchBookmark` call into the transaction.

> though it could cause other uses of Places (e.g., history
> writes) to fail.

Will writes from the main connection actually fail, or will SQLite just wait to execute them until the transaction on the buffer connection is done?
(In reply to Kit Cambridge (he/him) [:kitcambridge] (UTC-7) from comment #84)
> If I'm understanding
> SQLite's connection model correctly, that means the main connection will
> wait on our transaction to finish before allowing writes. (Starring,
> organizing in the Library, etc).

SQLite connections don't block. (They can have error handlers that cause them to wait and retry, but it's not blocking per se.)

There are a few different kinds of locks in SQLite.

https://www.sqlite.org/lockingv3.html#reserved_lock

If you just evaluate BEGIN TRANSACTION DEFERRED, the main connection will not be stopped from reading or writing at all -- in fact, _you_ will be blocked if they open a transaction first! Writes can begin and end without conflict between your BEGIN TRANSACTION and your first write. As soon as you begin reading, you get a read lock. As soon as you try to write, it's upgraded to a write lock. That step can fail.

If you use BEGIN TRANSACTION IMMEDIATE, you immediately get a lock. The main connection will be able to read, but will get an error immediately if it tries to write.

If you use BEGIN TRANSACTION EXCLUSIVE, the main connection will immediately get an error if it tries to read or write.

None of these are all that great for what you're trying to do: you can't just pause other connections while you work. You can exclude other writers, or you can let them succeed.
(In reply to Kit Cambridge (he/him) [:kitcambridge] (UTC-7) from comment #84)
> TL;DR: I should remove the use of the `lastSync` pref and rely on the
> timestamps in the buffer to decide where to start downloading.

Using the most recent server modified time for `lastSync` is interesting. When we set `lastSync` during download (https://searchfox.org/mozilla-central/rev/f54c1723befe6bcc7229f005217d5c681128fcad/services/sync/modules/engines.js#1289-1291,1334-1336), we use the *collection* last modified time, which isn't necessarily the same as the modified time of the *newest record* in that collection.

When we POST, our test server compares the X-I-U-S timestamp to the collection timestamp, and rejects with a 412 because the most recent record timestamp is slightly older than the collection timestamp. Per https://searchfox.org/mozilla-central/rev/f54c1723befe6bcc7229f005217d5c681128fcad/services/sync/tests/unit/head_http_server.js#1046-1048, it looks like checking against the collection timestamp for a POST is intentional.

I'm not sure if it's possible for a collection timestamp to be newer than its most recent record, or if this is an issue with our mock server. Richard, how does iOS determine the X-I-U-S timestamp?
Flags: needinfo?(rnewman)
(In reply to Richard Newman [:rnewman] from comment #86)
> If you just evaluate BEGIN TRANSACTION DEFERRED, the main connection will
> not be stopped from reading or writing at all -- in fact, _you_ will be
> blocked if they open a transaction first!

That's OK, and even desirable to block merging until the main Places connection finishes writing. Reading https://www.sqlite.org/wal.html#sometimes_queries_return_sqlite_busy_in_wal_mode, it looks like SQLite will return `SQLITE_BUSY` if the database is locked, which mozStorage appears to handle in https://searchfox.org/mozilla-central/rev/f6dc0e40b51a37c34e1683865395e72e7fca592c/storage/mozStorageAsyncStatementExecution.cpp#250-258. I'm guessing the main connection will retry if the buffer connection locks `places.sqlite` while it's merging.

> None of these are all that great for what you're trying to do: you can't
> just pause other connections while you work. You can exclude other writers,
> or you can let them succeed.

Yes. :-( Letting other writers succeed means the merged tree might change underneath us, so I think we'll have to exclude them.

(In reply to Kit Cambridge (he/him) [:kitcambridge] (UTC-7) from comment #85)
> There's another possible race here, that the exclusive transaction won't
> solve...but I think we can let it slide for now. Since `insert` reads
> outside of its transaction
> (https://searchfox.org/mozilla-central/rev/
> f6dc0e40b51a37c34e1683865395e72e7fca592c/toolkit/components/places/Bookmarks.
> jsm#206,1339), it might call `fetchBookmark` first, before the merger's
> exclusive transaction begins. We'll still end up with a consistent DB, since
> `insert` moves siblings out of the way...but, if you happen to click the
> star at the exact time we're merging, and add the bookmark to a remotely
> deleted folder, we might create a local orphan. I think that's 1) unlikely,
> and 2) fixable by clicking the star again, or eventually cleaning up via
> Places maintenance. We can also move the `fetchBookmark` call into the
> transaction.

FWIW, I think this can already happen today. If you call `PlacesUtils.bookmarks.insert` twice in succession, without `await`-ing the first insert, the second insert might read stale data before starting its transaction. The transactions are serialized, but not the `fetchBookmark` calls.
(In reply to Kit Cambridge (he/him) [:kitcambridge] (UTC-7) from comment #94)
> When we POST, our test server compares the X-I-U-S timestamp to the
> collection timestamp, and rejects with a 412 because the most recent record
> timestamp is slightly older than the collection timestamp. Per
> https://searchfox.org/mozilla-central/rev/
> f54c1723befe6bcc7229f005217d5c681128fcad/services/sync/tests/unit/
> head_http_server.js#1046-1048, it looks like checking against the collection
> timestamp for a POST is intentional.

The docs seem to also say that is true - http://mozilla-services-docs.readthedocs.io/en/latest/storage/apis-2.0.html

PUT: 412 Precondition Failed: the last-modified version number of the item is greater than the value in the X-If-Unmodified-Since-Version header.
POST: 412 Precondition Failed: the last-modified version number of the collection is greater than the value in the X-If-Unmodified-Since-Version header.

But in general (and as you imply below), it's generally not a good idea to take the test server's word for what the semantics should be ;)

> I'm not sure if it's possible for a collection timestamp to be newer than
> its most recent record, or if this is an issue with our mock server.

I recall when working on XIUS for the test server that a (real) DELETE would cause this - which shouldn't be a problem in practice for bookmarks, but it still seems reasonable that there might be cases where the collection timestamp might be greater than the most recent item in the collection.
> it still seems reasonable that there might be cases where the collection
> timestamp might be greater than the most recent item in the collection.

Yes, exactly. Both the server and the client pay attention to a collection timestamp that isn't necessarily the same as max(record_timestamps).

The client won't necessarily handle DELETE as you might expect, of course (Sync record existence is actually ternary!), but the situation is possible.


(In reply to Kit Cambridge (he/him) [:kitcambridge] (UTC-7) from comment #94)

> I'm not sure if it's possible for a collection timestamp to be newer than
> its most recent record, or if this is an issue with our mock server.
> Richard, how does iOS determine the X-I-U-S timestamp?

We track the collection timestamp, for three reasons:

- To know if we need to fetch:

https://github.com/mozilla-mobile/firefox-ios/blob/master/Sync/Synchronizers/Downloader.swift#L128

- To fetch new items:

https://github.com/mozilla-mobile/firefox-ios/blob/master/Sync/Synchronizers/Downloader.swift#L146

- Chained into the uploader to make sure that our write immediately follows our last read.

https://github.com/mozilla-mobile/firefox-ios/blob/master/Sync/Synchronizers/Bookmarks/BookmarksSynchronizer.swift#L179

It would be ideal for that timestamp to be stored transactionally in the DB, rather than in NSUserDefaults, but if we used the DB it wouldn't just be max(modified).
Flags: needinfo?(rnewman)
Cool, thank you both for clarifying. We'll DELETE records that shouldn't be on the server at all, instead of uploading tombstones (https://searchfox.org/mozilla-central/rev/f54c1723befe6bcc7229f005217d5c681128fcad/services/sync/modules/engines/bookmarks.js#605-610), so the distinction matters for bookmarks.

Given that iOS uses `NSUserDefaults` now, I don't think there's much point in Desktop storing the collection modified time in the buffer, either. Gecko prefs aren't a durable store, but the worst that will happen is we'll download records we've already seen...which is also the case today, if we're interrupted before we can fast-forward the `lastSync` time.

There's still value in using `max(modified)`: if a large first sync is interrupted, we don't want to redownload records we already have. But given that this patch is big enough as it is, and we're landing pref'd off, I'd be inclined to do this in a follow-up.

However, we *are* going to want to reset `lastSync` on upgrade, or if the buffer database was freshly created, so that we can fill it with the complete server tree.
(In reply to Kit Cambridge (he/him) [:kitcambridge] (UTC-7) from comment #98)

> There's still value in using `max(modified)`: if a large first sync is
> interrupted, we don't want to redownload records we already have.

Yes; on iOS we refer to this as a "high watermark", and we use it in the downloader if our batch token is invalidated.

(We store it in prefs, alongside the batch token, as we encounter incoming records -- we don't pull it out of the DB.)
(In reply to Kit Cambridge (he/him) [:kitcambridge] (UTC-7) from comment #98)
> Given that iOS uses `NSUserDefaults` now, I don't think there's much point
> in Desktop storing the collection modified time in the buffer, either. Gecko
> prefs aren't a durable store, but the worst that will happen is we'll
> download records we've already seen...which is also the case today, if we're
> interrupted before we can fast-forward the `lastSync` time.

I changed my mind, and decided to store the collection last modified time in the buffer. Nice call, Richard. :-)

This makes it much easier to toggle between the non-buffered and buffered engines. If we use prefs as the canonical store, we need to make sure we remove the buffer when `services.sync.engine.bookmarks.buffer` flips from true to false, or we'll miss all records between the time the engine was disabled and re-enabled. Plus, this gives us the high watermark for free.
Comment on attachment 8900968 [details]
Bug 1305563 - Add a `BufferedBookmarksEngine` that can be toggled with a pref.

https://reviewboard.mozilla.org/r/172422/#review189594

This is well written - nice job. My only concern is the lastSync tracking which I think I need to understand better before giving r+.

::: services/sync/modules/engines.js:1651
(Diff revision 11)
>          }
>  
> -        await this._onRecordsWritten(successful, failed);
> +        await this._onRecordsWritten(successful, failed, modified);
> +
> +        // Advance lastSync since we've finished the batch.
> +        if (modified > this.lastSync) {

I don't quite understand lastSync in this patch - there don't seem to be consumers of the new async getLasySync methods, and continuing to use the timestamp here seems wrong?

::: services/sync/modules/engines.js:1712
(Diff revision 11)
>      if (counts.sent || counts.failed) {
>        Observers.notify("weave:engine:sync:uploaded", counts, this.name);
>      }
>    },
>  
> -  async _onRecordsWritten(succeeded, failed) {
> +  async _onRecordsWritten(succeeded, failed, modified) {

nit: the name "modified" in these signatures seems a little misleading - a casual reader would probably expect it to be a set of modified items or something similar to the other params when it's actuall the server timestamp. ie, "timestamp" might be a better name?

::: services/sync/modules/engines/bookmarks.js:636
(Diff revision 11)
> -    if (this._modified.isTombstone(id)) {
> -      // If we already know a changed item is a tombstone, just create the
> -      // record without dipping into Places.
> +    let record = await super._createRecord(id);
> +    if (record.deleted) {
> +      return record;
> -      return this._createTombstone(id);
>      }
>      // Create the record as usual, but mark it as having dupes if necessary.

nit: the first part of this comment is now out of date (the record is already created)

::: services/sync/modules/engines/bookmarks.js:699
(Diff revision 11)
>    },
> +};
>  
> -  getValidator() {
> -    return new BookmarkValidator();
> +/**
> + * The buffered bookmarks engine uses a different store that stages downloaded
> + * bookmarks in a buffer, instead of writing to Places. The buffer handles

I wonder if this comment should note that the "buffer" is actually a database? Maybe "persistent buffer" or similar?

::: services/sync/modules/engines/bookmarks.js:1049
(Diff revision 11)
> + */
> +function BufferedBookmarksStore() {
> +  BaseBookmarksStore.apply(this, arguments);
> +  let bufferPath = OS.Path.join(OS.Constants.Path.profileDir,
> +                                "sync-bookmarks.sqlite");
> +  this.promiseBuffer = BookmarkBuffer.open(BUFFER_SCHEMA_VERSION, {

I haven't got to that function yet (so this might not be an issue), but I think we need to handle failure to open this database, particularly due to corruption and ensure we reset everything necessary (which might not be anything seeing we read lastSync from there). A followup is fine.

::: services/sync/modules/engines/bookmarks.js:1088
(Diff revision 11)
> -    return index;
> +BufferedBookmarksStore.prototype = {
> +  __proto__: BaseBookmarksStore.prototype,
> +
> +  async getLastSync() {
> +    let buffer = await this.promiseBuffer;
> +    return (await buffer.getLastSync()) / 1000;

almost every other place where we convert a timestamp in this way uses floor() - I admit to not quite understanding why, so therefore don't understand if we can avoid it here.
Attachment #8900968 - Flags: review?(markh)
Comment on attachment 8908848 [details]
Bug 1305563 - Prepare existing bookmark engine tests for new buffered engine.

https://reviewboard.mozilla.org/r/180460/#review190892

an easy one :)

::: services/sync/tests/unit/test_bookmark_duping.js:47
(Diff revision 5)
>    await Service.startOver();
>    await promiseStartOver;
>    await promiseStopServer(server);
>    await bms.eraseEverything();
> +  await engine.resetClient();
>  }

why not await for engine.finalize() here and avoid touching all those other lines?
Attachment #8908848 - Flags: review?(markh) → review+
Comment on attachment 8901394 [details]
Bug 1305563 - Add a bookmark buffer and two-way merger for synced bookmarks.

https://reviewboard.mozilla.org/r/172854/#review190906

Very impressive work Kit!

::: services/sync/modules/bookmark_buffer.js:109
(Diff revision 12)
> + * bookmarks with missing parents to "unfiled". For parent-child disagreements,
> + * we treat the last parent as canonical, and ignore the child's `parentid`
> + * entirely. When we eventually see the missing items, either during a
> + * subsequent sync or as part of repair, we'll fill in the buffer gaps and fix
> + * up the local tree.
> + *

maybe worth a comment here indicating that we never fix the remote tree? (or maybe better above in the "here's no guarantee that the remote state is consistent." para?

::: services/sync/modules/bookmark_buffer.js:111
(Diff revision 12)
> + * entirely. When we eventually see the missing items, either during a
> + * subsequent sync or as part of repair, we'll fill in the buffer gaps and fix
> + * up the local tree.
> + *
> + * If a sync is interrupted, we resume downloading based on the timestamp of the
> + * most recent record. Newer records always replace existing buffered records.

Worth defining "timestamp" and "newer" here, to clarify it's the server's lastModified timestamp rather than the local clock?

::: services/sync/modules/bookmark_buffer.js:161
(Diff revision 12)
> +  static async open(options) {
> +    let db = null;
> +    try {
> +      db = await Sqlite.openConnection(options);
> +    } catch (ex) {
> +      BookmarkBufferLog.warn("Error opening buffer database; removing and " +

so yeah, ignore my comment in the previous patch :) Is places itself this aggressive (ie, reset on *any* failure)?

I think we should also report a telemetry event here though (although probably only when the exception isn't "file not found")

::: services/sync/modules/bookmark_buffer.js:175
(Diff revision 12)
> +      // without foreign keys, the `REFERENCES` clauses won't parse.
> +      await db.execute(`PRAGMA foreign_keys = ON`);
> +      await db.execute(`PRAGMA temp_store = MEMORY`);
> +      await migrateBufferSchema(db);
> +    } catch (ex) {
> +      await db.close();

maybe another telemetry event, and I wonder if not recreating here is actually the right thing?
Attachment #8901394 - Flags: review?(markh) → review+
Comment on attachment 8900968 [details]
Bug 1305563 - Add a `BufferedBookmarksEngine` that can be toggled with a pref.

https://reviewboard.mozilla.org/r/172422/#review190914

lastSync changes in the latest patch look great!
Attachment #8900968 - Flags: review+
Sorry this is taking more time than I'd like, but we're still busy with work for 57 that has priority over any other thing :( _AND_ this patch is scary. Extremely scary (race conditions and poorly specced Places behaviors).
Is the system enabled when we land this, or can it still land disabled and be toggled with a pref?

Today I did a first quick check of all the comments and some documents, to first understand the thing. Tomorrow I'll hopefully starting dropping questions and comments about single parts.

One thing I'd like to ask, if possible, would be to keep all the writes to places.sqlite into toolkit/components/places. Guaranteeing integrity is a complicate task, and one of the rules we tried to retain in years, is that any code that writes to the database should be in the same place.
It doesn't prevent much by itself, but at least we always know where to look when making breaking changes an clarifies the peer requirements on reviews.
(In reply to Marco Bonardo [::mak] from comment #111)
> Sorry this is taking more time than I'd like, but we're still busy with work
> for 57 that has priority over any other thing :( _AND_ this patch is scary.
> Extremely scary (race conditions and poorly specced Places behaviors).

There's no rush; I expected that reviewing this will take some time. Please focus on higher-priority work now, and have a look at this when you have cycles. Thanks in advance for your time and guidance, and let me know what I can do to make your life easier.

> Is the system enabled when we land this, or can it still land disabled and
> be toggled with a pref?

The latter. Disabled by default, toggled with a pref.

> One thing I'd like to ask, if possible, would be to keep all the writes to
> places.sqlite into toolkit/components/places. Guaranteeing integrity is a
> complicate task, and one of the rules we tried to retain in years, is that
> any code that writes to the database should be in the same place.

Sure. `bookmark_buffer.js` doesn't have a hard dependency on anything in `services/sync`, and we already pass things like the server time and Sync record factory functions as arguments. I can move the JSM into `toolkit/component/places`.
Comment on attachment 8901394 [details]
Bug 1305563 - Add a bookmark buffer and two-way merger for synced bookmarks.

https://reviewboard.mozilla.org/r/172854/#review190906

> so yeah, ignore my comment in the previous patch :) Is places itself this aggressive (ie, reset on *any* failure)?
> 
> I think we should also report a telemetry event here though (although probably only when the exception isn't "file not found")

Added a telemetry event. Places isn't nearly as aggressive. :-) It'll back up and replace the DB only if the error is `NS_ERROR_FILE_CORRUPTED`: https://searchfox.org/mozilla-central/rev/e62604a97286f49993eb29c493672c17c7398da9/toolkit/components/places/Database.cpp#574-579

I think we can do the same, since it doesn't make sense to delete the buffer for transient errors, or permission errors over which we have no control (https://searchfox.org/mozilla-central/rev/e62604a97286f49993eb29c493672c17c7398da9/storage/mozStoragePrivateHelpers.cpp#45,48,50,52,54,56,59,61,63). We'll need to change Sqlite.jsm to expose the `nsresult` on the exception; it currently only includes it in the message string (https://searchfox.org/mozilla-central/rev/e62604a97286f49993eb29c493672c17c7398da9/toolkit/modules/Sqlite.jsm#948). That might be a good follow-up bug.

I'm not sure if it's worth backing up the file before we replace it. It could be useful for later analysis, but, we'll repopulate it from the server, anyway...

> maybe another telemetry event, and I wonder if not recreating here is actually the right thing?

In this case, I think it's worth failing noisily, and letting our Sync ping capture the failure as an engine error. This might indicate bugs in our schema migration logic, which we should fix (or unsupported SQLite pragmas, which we can't do much about).
Comment on attachment 8900968 [details]
Bug 1305563 - Add a `BufferedBookmarksEngine` that can be toggled with a pref.

https://reviewboard.mozilla.org/r/172422/#review189594

> almost every other place where we convert a timestamp in this way uses floor() - I admit to not quite understanding why, so therefore don't understand if we can avoid it here.

I don't think we should floor here, since the timestamp from `getCollectionLastSync` is the server timestamp that we got from `X-Weave-Timestamp`...just in milliseconds instead of seconds. Flooring might cause a 412.

But I think there's another issue here: we store and pass dates around in several different units. `dateAdded` is milliseconds in the record, and microseconds in the buffer (to make merging easier) and Places. `record.modified` and `X-Weave-Timestamp` are seconds, but we store them in the buffer as milliseconds because that's what JS dates use.

Let's at least change the buffer API to take and return seconds everywhere. Internally, it can use and store milliseconds, or whatever format makes the most sense. I'll add some comments to the schema about the units, too.
Comment on attachment 8901394 [details]
Bug 1305563 - Add a bookmark buffer and two-way merger for synced bookmarks.

https://reviewboard.mozilla.org/r/172854/#review192000

::: services/sync/modules/bookmark_buffer.js:21
(Diff revision 13)
> + * - `BookmarkBuffer` sets up the database, upserts records changed since the
> + *   last sync, attaches to Places, and applies the changed records. During
> + *   application, we fetch the local and remote `BookmarkTree`s, merge the two
> + *   trees, and update Places to match. Merging and application happen in a
> + *   single transaction, so applying the merged tree won't collide with local
> + *   changes.

It's worth documenting throughout this block comment what the visible behavior is for other consumers of Places.

If the user hits the bookmark star during a merge or apply, what happens? Does it silently fail? Is the write buffered somehow? Does it succeed but never get flushed to the DB?

If the user has a page open in their browser, and that page arrives as a new bookmark, does the star change state? What if it's currently bookmarked, and a remote deletion arrives?

::: services/sync/modules/bookmark_buffer.js:36
(Diff revision 13)
> + * - `BookmarkObserverRecorder` records all changes made to Places during the
> + *   merge, then dispatches `nsINavBookmarkObserver` notifications once the
> + *   transaction commits and the database is consistent again. Places uses these
> + *   notifications to update the UI and internal caches. Some observers, like
> + *   `nsNavBookmarks::OnItemAnnotationSet`, start new transactions and query
> + *   Places, so we can't notify as we're applying changes.

Nor does it make sense to do so until the transaction commits…

::: services/sync/modules/bookmark_buffer.js:43
(Diff revision 13)
> + * - After application, we flag all applied remote items as unchanged, create
> + *   Sync records for the locally new and updated items in Places, and upload
> + *   the records to the server.
> + *
> + * - Once upload succeeds, we update the buffer with the records we just
> + *   uploaded.

If upload fails, we'll retry the upload. If the upload succeeds, but updating the buffer doesn't, we'll update the buffer from the contents of the server, find no changes, and stop.

::: services/sync/modules/bookmark_buffer.js:77
(Diff revision 13)
> +// Keep in sync with `PlacesUtils`. These can be removed once they're exposed on
> +// `PlacesUtils.history` (bug 1375896).
> +const DB_URL_LENGTH_MAX = 65536;
> +const DB_TITLE_LENGTH_MAX = 4096;
> +
> +const SQLITE_MAX_VARIABLE_NUMBER = 999;

Can you call `sqlite3_limit(…)` instead? The number might be higher or lower at runtime!

::: services/sync/modules/bookmark_buffer.js:83
(Diff revision 13)
> +
> +// The current buffer database schema version. Bump for migrations, then add
> +// migration code to `migrateBufferSchema`.
> +const BUFFER_SCHEMA_VERSION = 1;
> +
> +var BookmarkConsistencyError = this.BookmarkConsistencyError =

You win at syntax bingo!

::: services/sync/modules/bookmark_buffer.js:108
(Diff revision 13)
> + *
> + * This means we need a strategy to handle missing parents and children. We
> + * treat the `children` of the last parent we see as canonical, and ignore the
> + * child's `parentid` entirely. We also ignore missing children, and temporarily
> + * reparent bookmarks with missing parents to "unfiled". When we eventually see
> + * the missing items, either during a later sync or as part of repair, we'll

One of the flaws in both the current desktop Sync engine and Android's bookmark engine is this:

- The engine sees a partial state on the server, either due to a client bug or a non-atomic upload.
- Desktop applies the state and Places fixes it up. Android applies the state, fixes it up, and marks it for upload.
- Immediately (in Android's case) or later (in desktop's case), the fixed-up state is uploaded to the server. That uploaded state wins over the state on the device that didn't finish the upload.

The end result: if you have an Android device, connecting it to your old Sync account causes your desktop bookmarks to move around and get dumped in Unfiled Bookmarks.

I don't see how you can wait around for the complete state without doing what iOS does: stopping and not uploading anything until the remote state is known to be complete.

Can you discuss this here?

::: services/sync/modules/bookmark_buffer.js:117
(Diff revision 13)
> + * last modified time, or the server last modified time of the most recent
> + * record if newer. New incoming records always replace existing records in the
> + * buffer.
> + *
> + * We discard the buffer when the user is node reassigned, disables the
> + * bookmarks engine, or signs out.

All of these situations correspond to a reset -- timestamps go to zero.

::: services/sync/modules/bookmark_buffer.js:188
(Diff revision 13)
> +      throw ex;
> +    }
> +    return new BookmarkBuffer(db, options);
> +  }
> +
> +  async getCollectionLastSync() {

I would change this to 'NextSince' or something like that. It's not the last sync time!

::: services/sync/modules/bookmark_buffer.js:200
(Diff revision 13)
> +      )`,
> +      { type: BookmarkBuffer.COLLECTION.MODIFIED });
> +    return rows[0].getResultByName("lastSync");
> +  }
> +
> +  async setCollectionLastModified(lastModified) {

I would defensively null check here, 'cos SQLite won't do it for you.

::: services/sync/modules/bookmark_buffer.js:216
(Diff revision 13)
> +   * @param {PlacesItem[]} records
> +   *        An array of Sync records to store in the buffer.
> +   * @param {Boolean} [options.wasChanged]
> +   *        Indicates if the records were changed remotely since the last sync,
> +   *        and should be merged into the local tree. Defaults to `true`, since
> +   *        other devices won't upload unchanged records. We set this option to

Sure they will! The repair process can request records that haven't changed.

::: services/sync/modules/bookmark_buffer.js:286
(Diff revision 13)
> +   *          A changeset containing locally changed and reconciled records to
> +   *          upload to the server, and to store in the buffer once upload
> +   *          succeeds.
> +   */
> +  async apply({ currentLocalTime = Date.now(),
> +                currentRemoteTime = AsyncResource.serverTime * 1000 } = {}) {

`serverTimeSeconds`

::: services/sync/modules/bookmark_buffer.js:297
(Diff revision 13)
> +    // We intentionally don't use `executeBeforeShutdown` in this function,
> +    // since merging can take a while for large trees, and we don't want to
> +    // block shutdown.
> +    await this.db.execute(`CREATE TEMP TABLE mergeStates(
> +      localGuid TEXT,
> +      mergedGuid TEXT,

Should either of these be `NOT NULL`?

Should either be `UNIQUE`?

::: services/sync/modules/bookmark_buffer.js:301
(Diff revision 13)
> +      localGuid TEXT,
> +      mergedGuid TEXT,
> +      parentGuid TEXT NOT NULL,
> +      position INTEGER NOT NULL,
> +      valueState TEXT NOT NULL,
> +      structureState TEXT NOT NULL,

Please don't put the string `"local"` into the database thousands of times! Use a small integer.

::: services/sync/modules/bookmark_buffer.js:381
(Diff revision 13)
> +            SQLITE_MAX_VARIABLE_NUMBER)) {
> +
> +            await this.db.execute(`
> +              UPDATE value SET
> +                wasChanged = 0
> +              WHERE guid IN (${new Array(chunk.length).fill("?").join(",")})`,

You have no index on GUID, and you don't track the primary key, so these queries are not cheap. Consider your indexing strategy.

::: services/sync/modules/bookmark_buffer.js:474
(Diff revision 13)
> +
> +    let keyword = typeof record.keyword == "string" ?
> +                  record.keyword.trim().toLowerCase() : null;
> +
> +    await this.db.executeCached(`
> +      REPLACE INTO value(guid, modified, kind, wasChanged, dateAdded,

The name `value` for this table is a bit confusing in this context. Maybe `values` or `bookmarks` or `records` or `contents` — certainly something plural — instead?

::: services/sync/modules/bookmark_buffer.js:532
(Diff revision 13)
> +
> +    let title = formatRecordTitle(record.title);
> +
> +    let url = formatRecordURL(record.bmkUri);
> +    if (!url) {
> +      BookmarkBufferLog.trace("Ignoring query ${guid} with invalid URL ${url}",

This is tricky, though.

We know that we handle URIs that aren't valid according to some platforms' parsing:

https://github.com/mozilla-mobile/firefox-ios/blob/7dfb4ccb7ceec3a4da674d241381cdb25c5c7f20/Shared/Extensions/StringExtensions.swift#L70

And if you skip a record here, you will screw up structure.

::: services/sync/modules/bookmark_buffer.js:679
(Diff revision 13)
> +      await this.storeRemoteAnno(guid, siteURLAnno, siteURL.href);
> +    }
> +  }
> +
> +  async storeRemoteSeparator(record, { wasChanged }) {
> +    let guid = formatGuid(record.id);

I don't see `formatGuid` in this patch, but again, be careful about doing too much validation.
Comment on attachment 8901394 [details]
Bug 1305563 - Add a bookmark buffer and two-way merger for synced bookmarks.

https://reviewboard.mozilla.org/r/172854/#review192000

Thanks for the first set of comments!

> It's worth documenting throughout this block comment what the visible behavior is for other consumers of Places.
> 
> If the user hits the bookmark star during a merge or apply, what happens? Does it silently fail? Is the write buffered somehow? Does it succeed but never get flushed to the DB?
> 
> If the user has a page open in their browser, and that page arrives as a new bookmark, does the star change state? What if it's currently bookmarked, and a remote deletion arrives?

It's good that you called this out, because I'm not fully sure what will happen! :-) I'm hoping Mak can help clarify the intended behavior. If I'm reading https://searchfox.org/mozilla-central/rev/8efd128b48cdae3a6c6862795ce618aa1ca1c0b4/storage/mozStorageAsyncStatementExecution.cpp#251-258 right, I think other Places consumers will keep retrying once we start our (maybe exclusive? this patch still uses deferred) transaction. It's also possible that we'll fail the write, in which case we need to figure out how the UI will surface that.

If the star UI listens for observer notifications (`onItemAdded` or `onItemChanged`), then it will update itself when processing the bookmark. IOW, the behavior should be identical to what it is now...

> If upload fails, we'll retry the upload. If the upload succeeds, but updating the buffer doesn't, we'll update the buffer from the contents of the server, find no changes, and stop.

In the former case, if upload fails, I think we'll just try everything again on the next sync. In the latter case, we'll download our own changes from the server on the next sync, and merge again. See the comment above `_onRecordsWritten` in the next commit of this series; I've tried to describe what happens when each stage is interrupted.

> Can you call `sqlite3_limit(…)` instead? The number might be higher or lower at runtime!

Yes. Not in this patch, though, as that would require adding an API to mozStorage and wrapping it in Sqlite.jsm. :-) Ideally, mozStorage would call `sqlite3_limit` and chunk internally, as it already does for arrays of separate binding params, and callers wouldn't need to do this at all. Bug 483318.

> You win at syntax bingo!

It's a lovely quirk of our artisanal module system that I have to declare it as a global for `EXPORTED_SYMBOLS`, and also set it on `this` so it's accessible via the backstage pass...

> One of the flaws in both the current desktop Sync engine and Android's bookmark engine is this:
> 
> - The engine sees a partial state on the server, either due to a client bug or a non-atomic upload.
> - Desktop applies the state and Places fixes it up. Android applies the state, fixes it up, and marks it for upload.
> - Immediately (in Android's case) or later (in desktop's case), the fixed-up state is uploaded to the server. That uploaded state wins over the state on the device that didn't finish the upload.
> 
> The end result: if you have an Android device, connecting it to your old Sync account causes your desktop bookmarks to move around and get dumped in Unfiled Bookmarks.
> 
> I don't see how you can wait around for the complete state without doing what iOS does: stopping and not uploading anything until the remote state is known to be complete.
> 
> Can you discuss this here?

I've opted to keep the existing Desktop behavior. If there's partial state on the server, and we don't have the parent, we'll move to `unfiled`, without flagging `unfiled` for reupload. (Now, if `unfiled` is already changed locally, then we *will* reupload, and clobber the partial uploader...which is also what Desktop does today).

Unlike Android, we'll never reupload incomplete remotely changed folders. The only time we'll do this is when resolving structural conflicts, caused by adding to or changing items on one side that are deleted on the other.

If there's partial state to where a bookmark exists in two folders, we'll take the "last" (in whatever order they're downloaded) parent we see. When the partial uploader returns, we'll store and merge the missing items. See `test_move_into_orphaned`, `test_new_orphan_with_local_parent`, `test_new_orphan_without_local_parent`, `test_missing_children`, `test_partial_cycle`, `test_complete_cycle` for some scenarios.

As I mentioned in comment 50, I think it's important to understand how our implementation handles this, but not to spend more time beyond that. Our immediate goal is "no worse than the current Desktop sync," and atomic uploads will address a lot of these sticking points.

> All of these situations correspond to a reset -- timestamps go to zero.

Yes. In our case, we throw away the buffer contents, too.

> I would change this to 'NextSince' or something like that. It's not the last sync time!

You mentioned iOS calls this the "high watermark." That might be a good to use here, along with a more detailed explanation in the comment of what that means.

> Sure they will! The repair process can request records that haven't changed.

If the repair process requests records that aren't on the server, or to fix structural issues, then the records are still materially changed in some way...even if we have the complete state locally. But that's confusing, so I'll remove the "since". We also set `wasChanged: false` for records that *we* upload and write back to the buffer. Maybe `needsMerge` is a better name than `wasChanged`.

> `serverTimeSeconds`

These are not currently seconds, but I think they should be (comment 119).

> Should either of these be `NOT NULL`?
> 
> Should either be `UNIQUE`?

Yes, they should be `NOT NULL`, and we can use a compound `PRIMARY KEY(mergedGuid, localGuid)`. In practice, this is already enforced at the application level when we build the merged tree.

> You have no index on GUID, and you don't track the primary key, so these queries are not cheap. Consider your indexing strategy.

I don't explicitly index on GUID, but SQLite always creates a covering index, I'm assuming because the GUID is `UNIQUE`. This is also one of our faster queries; even before the rewrite to make Places updates more efficient, this wasn't a bottleneck. See comment 74 for the details.

> The name `value` for this table is a bit confusing in this context. Maybe `values` or `bookmarks` or `records` or `contents` — certainly something plural — instead?

I can't use `values` because it's a SQL keyword. :-( I had "items" before; while generic, that matches the `itemId` foreign key columns in the other tables.

> This is tricky, though.
> 
> We know that we handle URIs that aren't valid according to some platforms' parsing:
> 
> https://github.com/mozilla-mobile/firefox-ios/blob/7dfb4ccb7ceec3a4da674d241381cdb25c5c7f20/Shared/Extensions/StringExtensions.swift#L70
> 
> And if you skip a record here, you will screw up structure.

We use the `URL` class to validate URLs, which only ensures the URL is syntactically valid according to the URL Standard (https://url.spec.whatwg.org/). It doesn't mean that Desktop can actually load the URL (e.g., an unrecognized `about:` URL is still valid; it'll just fail to load when you open it).

I'm OK with treating records with invalid URLs as missing children, because we can't store invalid URLs in Places. Doing so will cause other issues, and we won't be able to get them back out. We can fix up invalid URLs in a follow-up, maybe as a maintenance task.

> I don't see `formatGuid` in this patch, but again, be careful about doing too much validation.

See line 2027 of this file. I don't think MozReview lets me link to lines. :-( Same story as URLs, though: trying to store and then retrieve items with invalid GUIDs is going to cause a number of issues. It's better to have other clients fix them up and reupload records with valid GUIDs. Bug 1380606.
(In reply to Kit Cambridge (he/him) [:kitcambridge] (UTC-7) from comment #121)
> It's good that you called this out, because I'm not fully sure what will
> happen! :-) I'm hoping Mak can help clarify the intended behavior. If I'm
> reading
> https://searchfox.org/mozilla-central/rev/
> 8efd128b48cdae3a6c6862795ce618aa1ca1c0b4/storage/
> mozStorageAsyncStatementExecution.cpp#251-258 right, I think other Places
> consumers will keep retrying once we start our (maybe exclusive? this patch
> still uses deferred) transaction.

The async consumers will wait. The sync consumers will fail after blocking the main-thread for 100ms (http://searchfox.org/mozilla-central/rev/8efd128b48cdae3a6c6862795ce618aa1ca1c0b4/toolkit/components/places/Database.cpp#94). That means this has a strong dependency on Bug 519514. The good thing is that async transactions are on their way to Firefox 58, and we should be able to remove most of the synchronous API in 60. But there are still some consumers to fix in the tree (the left pane folder, tags, drag&drop). Hopefully we can find the time to do that.

> It's also possible that we'll fail the
> write, in which case we need to figure out how the UI will surface that.

The UI will likely do nothing until the next operation, since it's likely the API will throw.

> If the star UI listens for observer notifications (`onItemAdded` or
> `onItemChanged`), then it will update itself when processing the bookmark.
> IOW, the behavior should be identical to what it is now...

The star UI is async, should not have problems.

> I don't explicitly index on GUID, but SQLite always creates a covering
> index, I'm assuming because the GUID is `UNIQUE`.

Unique is indeed an index, no need to add an index to a unique field.

> I can't use `values` because it's a SQL keyword. :-( I had "items" before;
> while generic, that matches the `itemId` foreign key columns in the other
> tables.

you can probably if you quote it.

Another thing that I was thinking the other day, it's true this insertion is much faster than insertTree, because it bypasses everything. One problem is that it also bypasses all the input checks, that makes a bit scary what it may end up writing. We do very carefully input checks, because Places doesn't have a very strong error recovery on read (basically no error detection at all) and relies mostly on proper writes. A good part of the review will have to check all the coherency is respected.
(In reply to Kit Cambridge (he/him) [:kitcambridge] (UTC-7) from comment #121)

> Unlike Android, we'll never reupload incomplete remotely changed folders.
> The only time we'll do this is when resolving structural conflicts, caused
> by adding to or changing items on one side that are deleted on the other.

Are you sure?

What happens if we get into this state, with an incomplete folder, and the sync is done: we're waiting for another client to finish its work.

What if I then locally add a bookmark to this incomplete remotely changed folder?

Presumably that will -- just like ol' desktop -- cause the local state of the folder to be uploaded to the server?


> As I mentioned in comment 50, I think it's important to understand how our
> implementation handles this, but not to spend more time beyond that.

Mostly I want to make sure these are documented; it's easy to assume that this work fixes all the bugs, and means that bookmark 'corruption' can't ever occur, but strictly speaking that's not the case.


> I don't explicitly index on GUID, but SQLite always creates a covering
> index, I'm assuming because the GUID is `UNIQUE`.

Ah, I missed that this is UNIQUE. Yes, in that case it'll be indexed. v.g.!
(In reply to Richard Newman [:rnewman] from comment #123)
> Presumably that will -- just like ol' desktop -- cause the local state of
> the folder to be uploaded to the server?

Yes. If `unfiled`, or any other folder that's been partially uploaded, is also changed locally, then we'll reupload and clobber the partial uploader. This is what Desktop does today. (A partial remote *deletion* will also synthesize a new structure state and reupload). An incomplete remote change by itself isn't sufficient for us to reupload; we must also have made local changes.

> Mostly I want to make sure these are documented; it's easy to assume that
> this work fixes all the bugs, and means that bookmark 'corruption' can't
> ever occur, but strictly speaking that's not the case.

This makes perfect sense, and I'm probably less optimistic that this work will fix all the bugs...we'll likely introduce new ones! :-) Desktop deliberately chooses to diverge from the remote tree, and opts to (at least temporarily) introduce corruption if it means making progress. I'm hoping the code as written will make this easy to undo once we can guarantee some consistency of the server data, but we have some ways to go before then.
Comment on attachment 8901394 [details]
Bug 1305563 - Add a bookmark buffer and two-way merger for synced bookmarks.

https://reviewboard.mozilla.org/r/172854/#review188840

Most of the issues I have are nits. I very much share rnewman's concern about reuploading incomplete data (as android does), but OTOH we already do that and it's a strict improvement over the status quo. Moving this to be more conservative in the future (should that suddenly become a realistic option) is doable as well.

As I mentioned in IRC, this fails TPS, but it seems like you have that under control.

I also wonder a bit if some of the tests would be better as TPS tests, given how so many of them simulate syncs with partial writes and the like (Right now TPS would get angry if you did that, so probably not, but theres a good argument that it should allow you to do things like this).

::: services/sync/modules/bookmark_buffer.js:8
(Diff revision 7)
> + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> +
> +"use strict";
> +
> +/**
> + * This file implements a bookmark buffer and two-way merger. The buffer mirrors

I almost feel like the comments here are getting to the size where they should be external documentation, but keeping here probably makes it more likely to stay up to date.

::: services/sync/modules/bookmark_buffer.js:83
(Diff revision 7)
> +  Log.repository.getLogger("Sync.Engine.Bookmarks.Buffer")
> +);
> +
> +// Keep in sync with `PlacesUtils`. These can be removed once they're exposed on
> +// `PlacesUtils.history` (bug 1375896).
> +const DB_URL_LENGTH_MAX = 65536;

We have this in constants.js (here: http://searchfox.org/mozilla-central/source/services/sync/modules/constants.js#87) too. Unless there's a reason not to, I think you should use that one, but add a to it's comment that it should be in sync with PlacesUtils also.

::: services/sync/modules/bookmark_buffer.js:93
(Diff revision 7)
> +function isLivemarkAnno(name) {
> +  return name == PlacesUtils.LMANNO_FEEDURI ||
> +         name == PlacesUtils.LMANNO_SITEURI;
> +}
> +
> +var BookmarkConsistencyError = this.BookmarkConsistencyError =

Heh, `class` syntax really doesn't play nicely with our modules and linting

::: services/sync/modules/bookmark_buffer.js:174
(Diff revision 7)
> +    let db = await Sqlite.openConnection(options);
> +    try {
> +      // Required to enforce `ON DELETE` actions. The SQLite bundled with
> +      // Firefox is compiled with foreign key support. If Firefox was built
> +      // with `MOZ_SYSTEM_SQLITE`, and the system SQLite was compiled
> +      // without foreign keys, the `REFERENCES` clauses won't parse.

So, I'm guessing this means this merger won't with the system sqlite in that case? Is this something common enough for us to care about (I'm guessing no, but it *is* a compilation option...)?

::: services/sync/modules/bookmark_buffer.js:323
(Diff revision 12)
> +        BookmarkBufferLog.info("Building remote tree from buffer");
> +        let remoteTree = await this.fetchRemoteTree(currentRemoteTime);
> +        BookmarkBufferLog.trace("Built remote tree from buffer", remoteTree);
> +
> +        let guidsToWeakUpload = await this.db.executeTransaction(async () => {
> +          BookmarkBufferLog.info("Building local tree from Places");

nit: these should probably be trace, or at least debug logs probably. (Just due to verbiosity).

::: services/sync/modules/bookmark_buffer.js:385
(Diff revision 13)
> +                wasChanged = 0
> +              WHERE guid IN (${new Array(chunk.length).fill("?").join(",")})`,
> +              chunk);
> +          }
> +
> +          // Flag remotely changed records with an older local creation date for weak

I almost think we don't need the weakupload code around anymore, since everything implements dateAdded now. But we probably do for a while longer.

::: services/sync/modules/bookmark_buffer.js:395
(Diff revision 13)
> +          let newDateAddedRows = await this.db.execute(`
> +            SELECT b.guid
> +            FROM moz_bookmarks b
> +            JOIN mergeStates r ON r.mergedGuid = b.guid
> +            JOIN value v ON v.guid = r.mergedGuid
> +            WHERE r.valueState = :valueState AND

Is there a reason not to use 'remote' in the sql? It's a hardcoded string either way...

::: services/sync/modules/bookmark_buffer.js:474
(Diff revision 13)
> +
> +    let keyword = typeof record.keyword == "string" ?
> +                  record.keyword.trim().toLowerCase() : null;
> +
> +    await this.db.executeCached(`
> +      REPLACE INTO value(guid, modified, kind, wasChanged, dateAdded,

Agree that it's confusing. Maybe less here than below when you're selecting from it.

::: services/sync/modules/bookmark_buffer.js:1133
(Diff revision 13)
> +   * @param {BookmarkObserverRecorder} observersToNotify
> +   *        Records Places observer notifications for added, changed, and moved
> +   *        items.
> +   */
> +  async updateLocalItems(observersToNotify) {
> +    BookmarkBufferLog.debug("Recording notifications for new items");

As with the other comment, I think these are probably better off as trace logs.

::: services/sync/modules/bookmark_buffer.js:1648
(Diff revision 13)
> +      { tagFolderGuid, tagsGuid: PlacesUtils.bookmarks.tagsGuid, tag,
> +        type: PlacesUtils.bookmarks.TYPE_FOLDER,
> +        dateAdded: PlacesUtils.toPRTime(new Date()) });
> +
> +    let idRows = await this.db.executeCached(`
> +      SELECT id FROM moz_bookmarks WHERE guid = :tagFolderGuid`,

I've used DB libs before that allowed you do do something like `let idRows = await this.db.executeCached("INSERT .... VALUES(....); SELECT id FROM blah WHERE blah")` (e.g. both insert and a select in one execute call, and have it return the value of the select) but have no idea if ours does. Doesn't really matter either way though!

::: services/sync/modules/bookmark_buffer.js:1651
(Diff revision 13)
> +
> +    let idRows = await this.db.executeCached(`
> +      SELECT id FROM moz_bookmarks WHERE guid = :tagFolderGuid`,
> +      { tagFolderGuid });
> +    if (!idRows.length) {
> +      throw new TypeError("Can't fetch ID for new tag folder");

Is this possible? Maybe add a comment saying it shouldn't happen. If it can happen, is aborting everything the right call?

::: services/sync/modules/bookmark_buffer.js:1817
(Diff revision 13)
> +          break;
> +        }
> +
> +        case PlacesUtils.bookmarks.TYPE_SEPARATOR: {
> +          record = recordFactory(BookmarkBuffer.KIND.SEPARATOR, syncId);
> +          record.pos = row.getResultByName("position");

Should this be .position? (Sorry, it's not 100% clear to me the distinction between them)

::: services/sync/modules/bookmark_buffer.js:2045
(Diff revision 13)
> +function determineModified(record) {
> +  return Math.max(record.modified * 1000, 0) || 0;
> +}
> +
> +function formatRecordURL(rawURL) {
> +  if (typeof rawURL != "string" || rawURL.length > DB_URL_LENGTH_MAX) {

I think places truncates overlong urls, doesn't it?

::: services/sync/modules/bookmark_buffer.js:2051
(Diff revision 13)
> +    return null;
> +  }
> +  let url = null;
> +  try {
> +    url = new URL(rawURL);
> +  } catch (ex) {}

We probably want to log here. (Actually I guess we do in the caller, but there doesn't seem to be anything I can do to make mozreview delete this comment -- feel free to ignore)

::: services/sync/modules/bookmark_buffer.js:2111
(Diff revision 13)
> + *   the structure state. We use this state to resolve conflicts caused by
> + *   moving local items out of a remotely deleted folder, or remote items out of
> + *   a locally deleted folder. Applying a "new" merged node bumps its change
> + *   counter, so that the merged structure is reuploaded to the server.
> + */
> +class BookmarkMergeState {

The joys of emulating enums/tagged unions in JavaScript.

::: services/sync/modules/bookmark_buffer.js:2251
(Diff revision 13)
> + * A complete, rooted tree with tombstones.
> + */
> +class BookmarkTree {
> +  constructor(root) {
> +    this.byGuid = new Map();
> +    this.infosByNode = new WeakMap();

Why a weakmap? Shouldn't everything already be referenced strongly (by this class, even).

::: services/sync/modules/bookmark_buffer.js:2252
(Diff revision 13)
> + */
> +class BookmarkTree {
> +  constructor(root) {
> +    this.byGuid = new Map();
> +    this.infosByNode = new WeakMap();
> +    // this.parentNodes = new WeakMap();

nit: Delete the commented out code?

::: services/sync/tests/unit/test_bookmark_buffer.js:90
(Diff revision 13)
> +      return new Promise(resolve => server.stop(resolve));
> +    },
> +  };
> +}
> +
> +function shuffle(array) {

Good ol' Fisher-Yates

::: toolkit/components/places/PlacesSyncUtils.jsm:40
(Diff revision 13)
> +   *         If the array has less than chunkLength elements, yields all of them
> +   */
> +  * chunkArray(array, chunkLength) {
> +    let startIndex = 0;
> +    while (startIndex < array.length) {
> +      yield array.slice(startIndex, startIndex += chunkLength);

nit: The code was already like this, but could you move startIndex += chunkLength into its own line?
Attachment #8901394 - Flags: review?(tchiovoloni) → review+
Comment on attachment 8900968 [details]
Bug 1305563 - Add a `BufferedBookmarksEngine` that can be toggled with a pref.

https://reviewboard.mozilla.org/r/172422/#review192854

I don't know if I love the fact that they're entirely separate engines, but I think it doesn't matter much in the long run.

::: services/sync/modules/engines/bookmarks.js:22
(Diff revision 14)
>  Cu.import("resource://services-sync/constants.js");
>  Cu.import("resource://services-sync/engines.js");
>  Cu.import("resource://services-sync/record.js");
>  Cu.import("resource://services-sync/util.js");
>  
> -XPCOMUtils.defineLazyModuleGetter(this, "BookmarkValidator",
> +XPCOMUtils.defineLazyModuleGetters(this, {

TIL `XPCOMUtils.defineLazyModuleGetters` exists.

::: services/sync/modules/service.js:62
(Diff revision 14)
>      result.CreditCards = {
>        module: "resource://formautofill/FormAutofillSync.jsm",
>        symbol: "CreditCardsEngine",
>      };
>    }
> +  if (Svc.Prefs.get("engine.bookmarks.buffer", false)) {

It's unfortunate we decide this so early, but I guess in practice it doesn't matter.

::: services/sync/tests/unit/test_bookmark_engine.js:45
(Diff revision 14)
>    initTestLogging("Trace");
> -  generateNewKeys(Service.collectionKeys);
> +  await generateNewKeys(Service.collectionKeys);
>  });
>  
> -add_task(async function test_delete_invalid_roots_from_server() {
> +function add_bookmark_test(task) {
> +  add_task(async function() {

I think we should log when we start buffered versus regular engine to make it a bit clearer if tests fail if its only in one case or another.
Attachment #8900968 - Flags: review?(tchiovoloni) → review+
Comment on attachment 8901394 [details]
Bug 1305563 - Add a bookmark buffer and two-way merger for synced bookmarks.

https://reviewboard.mozilla.org/r/172854/#review192878

I'm not even going to pretend that I've read all of your test code.

With this much code, the best we can do is review, fix, test, land it, and watch for problems.

::: services/sync/modules/bookmark_buffer.js:399
(Diff revision 13)
> +            JOIN value v ON v.guid = r.mergedGuid
> +            WHERE r.valueState = :valueState AND
> +                  b.dateAdded < v.dateAdded`,
> +            { valueState: "remote" });
> +          for (let row of newDateAddedRows) {
> +            let guid = row.getResultByName("guid");

There's probably a decent speedup from changing this to a numeric index rather than a name lookup.

::: services/sync/modules/bookmark_buffer.js:427
(Diff revision 13)
> +   * Discards the buffer contents. This is called when the user is node
> +   * reassigned, disables the bookmarks engine, or signs out.
> +   */
> +  async reset() {
> +    await this.db.executeBeforeShutdown(
> +      "BookmarkBuffer: reset",

I'm gently concerned that we could manage to crash before this happens. I wonder if it's possible to link the contents of the buffer to the state of the server -- e.g., by storing the `syncID` of the server + the bookmarks collection here, and wiping at the start of the sync if they don't match?

::: services/sync/modules/bookmark_buffer.js:901
(Diff revision 13)
> +      )
> +      SELECT b.id, b.guid, p.guid AS parentGuid, (CASE b.type
> +        /* Map Places item types to Sync record kinds. */
> +        WHEN :bookmarkType THEN (
> +          CASE WHEN EXISTS(
> +            /* Queries are bookmarks with a smart bookmark annotation. */

Elsewhere we check in advance that the URI starts with "place:", before we bother to check the annos table. Consider doing so here?

::: services/sync/modules/bookmark_buffer.js:1103
(Diff revision 13)
> +      INSERT OR IGNORE INTO moz_places(url, url_hash, rev_host, hidden,
> +                                       frecency,
> +                                       guid)
> +      SELECT u.url, u.hash, u.revHost, 0,
> +             (CASE SUBSTR(u.url, 1, 6) WHEN 'place:' THEN 0 ELSE -1 END),
> +             IFNULL((SELECT guid FROM moz_places

I'm a bit confused here.

You're doing an `INSERT OR IGNORE` with a subquery that checks for the URL itself!

If this subquery returns a GUID, the `INSERT` will fail, right?

Can we simplify?

::: services/sync/modules/bookmark_buffer.js:2464
(Diff revision 13)
> + *
> + * - "Mirror" is approximately all Places rows with `syncChangeCounter = 0` and
> + *   buffer rows with `wasChanged = false`. This is approximate because Desktop
> + *   doesn't store the shared parent for items changed on either side.
> + * - "Local" is all Places rows with `syncChangeCounter > 0` and buffer rows
> + *   with `wasChanged = false`.

I'm not sure I follow. Why is the buffer relevant here?

::: services/sync/modules/bookmark_buffer.js:3244
(Diff revision 13)
> +    }
> +    return newLocalNode;
> +  }
> +}
> +
> +function contentsMatch(localNode, localContent, remoteNode, remoteContent) {

It's worth commenting what this is for, and why it's not equality.

::: services/sync/modules/bookmark_buffer.js:3283
(Diff revision 13)
> + * is tracked in bug 1392533 and bug 1340498.
> + *
> + * Annotation observers don't require the extra context, so they're cheap to
> + * record and fire.
> + */
> +class BookmarkObserverRecorder {

Do we ever have a situation where there are no observers for some of these? If so, can we avoid doing the work of recording each change?

::: services/sync/modules/bookmark_buffer.js:3468
(Diff revision 13)
> +  }
> +
> +  notifyAnnoObservers() {
> +    let observers = PlacesUtils.annotations.getObservers();
> +    for (let observer of observers) {
> +      for (let info of this.changedAnnoInfos) {

Gosh, there has to be a better way… do we have a bug on file for bulk notifications?

::: services/sync/modules/bookmark_buffer.js:3514
(Diff revision 13)
> +    this.record = record;
> +    this.synced = false;
> +  }
> +}
> +
> +// In conclusion, this is why bookmark syncing is hard.

Let's get tattoos!
Attachment #8901394 - Flags: review?(rnewman) → review+
Blocks: 1407731
Comment on attachment 8901394 [details]
Bug 1305563 - Add a bookmark buffer and two-way merger for synced bookmarks.

https://reviewboard.mozilla.org/r/172854/#review188840

Thanks for the review, Thom, you have a good eye for catching nits and oversights! I fixed the TPS `test_sync` failure in the latest revision, though I haven't done a full run yet. I'll do that this evening, when I need a break from using my computer. :-)

I also agree that having TPS simulate partial writes would be more effective than my unit tests, which assume "if we were to do a partial write, here's one way it might look." OTOH, now that we've rolled out atomic uploads to everyone, maybe we should push toward being more strict, and treating partial writes as a case where we might not do the right thing, just not the completely wrong thing.

> We have this in constants.js (here: http://searchfox.org/mozilla-central/source/services/sync/modules/constants.js#87) too. Unless there's a reason not to, I think you should use that one, but add a to it's comment that it should be in sync with PlacesUtils also.

Mak requested I move the buffer module into `toolkit/components/places` in comment 111, and it's unusual for code in `toolkit/` to depend on `services/`. I think we'll have to duplicate for now. :-/

> So, I'm guessing this means this merger won't with the system sqlite in that case? Is this something common enough for us to care about (I'm guessing no, but it *is* a compilation option...)?

I don't think it is. I'm debating keeping this comment at all, since it's also true for other SQLite features (e.g., building with `SQLITE_OMIT_ALTERTABLE` will break Places completely, as will `SQLITE_OMIT_COMPOUND_SELECT`, `SQLITE_OMIT_CTE`, `SQLITE_OMIT_TRIGGER`, and `SQLITE_OMIT_WAL`). My point was more "if SQLite doesn't support foreign key constraints, we'll fail instead of silently doing the wrong thing," but that's generally implied.

> nit: these should probably be trace, or at least debug logs probably. (Just due to verbiosity).

Good call. I made them `info` for benchmarking, but this is just going to add noise for every sync. Changed to `debug`.

> I almost think we don't need the weakupload code around anymore, since everything implements dateAdded now. But we probably do for a while longer.

Filed bug 1407731 to remove this eventually.

> Is there a reason not to use 'remote' in the sql? It's a hardcoded string either way...

At Richard's suggestion, I changed this to an integer.

> I've used DB libs before that allowed you do do something like `let idRows = await this.db.executeCached("INSERT .... VALUES(....); SELECT id FROM blah WHERE blah")` (e.g. both insert and a select in one execute call, and have it return the value of the select) but have no idea if ours does. Doesn't really matter either way though!

The docs for https://sqlite.org/capi3ref.html#sqlite3_prepare say "These routines only compile the first statement in zSql, so *pzTail is left pointing to what remains uncompiled," suggesting that won't work here. :-( Though https://sqlite.org/capi3ref.html#sqlite3_exec *does* support multiple statements, if we're not binding any parameters.

> Is this possible? Maybe add a comment saying it shouldn't happen. If it can happen, is aborting everything the right call?

Added. I think aborting is the right call: if we can't find a row we just inserted, something is seriously wrong and we should investigate.

> Should this be .position? (Sorry, it's not 100% clear to me the distinction between them)

In this case, we're setting the property on the `BookmarkSeparator` record itself, which calls it `pos` (https://searchfox.org/mozilla-central/rev/1a4a26905f923458679a59a4be1e455ebc53c333/services/sync/modules/engines/bookmarks.js#276).

Technically, we no longer need it, since we always use the info from `structure` instead of relying on `pos` for deduping. But we still include it for older Desktops, Android, and iOS.

> I think places truncates overlong urls, doesn't it?

IIUC, Places throws for long URLs: https://searchfox.org/mozilla-central/rev/1a4a26905f923458679a59a4be1e455ebc53c333/toolkit/components/places/PlacesUtils.jsm#235-237 Maybe we should truncate, though, instead of silently ignoring the record? Or maybe truncating existing long URLs, and flagging affected items for reupload, is a good idea for a Places maintenance task? We need to double-check URL length limits on other devices, too.

> The joys of emulating enums/tagged unions in JavaScript.

Yup. What I wouldn't give for a thoughtful type system...

> Why a weakmap? Shouldn't everything already be referenced strongly (by this class, even).

I used a `WeakMap` because I didn't want `infosByNode` holding on to parents and levels for nodes that don't exist, in case a node is removed. But that's a theoretical concern, because we never remove nodes from a tree. I can change to a vanilla `Map` if it's clearer.
Comment on attachment 8901394 [details]
Bug 1305563 - Add a bookmark buffer and two-way merger for synced bookmarks.

https://reviewboard.mozilla.org/r/172854/#review192878

> There's probably a decent speedup from changing this to a numeric index rather than a name lookup.

As in `getResultByName` vs. `getResultByIndex`? It's true, there's an extra hashtable lookup in `getResultByName` (https://searchfox.org/mozilla-central/rev/1a4a26905f923458679a59a4be1e455ebc53c333/storage/mozStorageRow.cpp#92,101-102), but `getResultByIndex` seems uncommon in Places JS code, so I'd prefer to keep `getResultByName`.

> I'm gently concerned that we could manage to crash before this happens. I wonder if it's possible to link the contents of the buffer to the state of the server -- e.g., by storing the `syncID` of the server + the bookmarks collection here, and wiping at the start of the sync if they don't match?

We could store the sync ID in the `collectionMeta` table, as we already do for last modified. We check the sync ID at the start of the sync (https://searchfox.org/mozilla-central/rev/1a4a26905f923458679a59a4be1e455ebc53c333/services/sync/modules/service.js#1013,1015-1017), before we sync the engine (https://searchfox.org/mozilla-central/rev/31606bbabc50b08895d843b9f5f3da938ccdfbbf/services/sync/modules/engines.js#1017,1019-1021), and reset the local state if there's a mismatch, which also clears out the buffer in `_resetClient`.

Are you mainly concerned that we're using prefs as a separate, non-durable store, or is there another place we should check the sync ID that we don't do now?

> Elsewhere we check in advance that the URI starts with "place:", before we bother to check the annos table. Consider doing so here?

I'd prefer to avoid that if we can. It's another subquery to fetch the URL from `moz_places`, and the only other time we check for `place:` explicitly in SQL is when we calculate the frecency.

> I'm a bit confused here.
> 
> You're doing an `INSERT OR IGNORE` with a subquery that checks for the URL itself!
> 
> If this subquery returns a GUID, the `INSERT` will fail, right?
> 
> Can we simplify?

I wrote the query this way because, while we expect URLs to be unique (we debug assert this in https://searchfox.org/mozilla-central/rev/31606bbabc50b08895d843b9f5f3da938ccdfbbf/toolkit/components/places/Database.cpp#2619-2625), we don't have a unique index on them (bug 889561). Inserting a Place with the same `url_hash` and `url`, but a different `guid`, won't actually trigger a constraint failure.

The behavior I want is: if the URL already exists in `moz_places`, then use its GUID (so that the `INSERT` fails, and we'll ignore that row); otherwise, use the GUID (the Place GUID, that is, not the bookmark GUID), URL, hash, etc. from our `urls` table. But I can rewrite this as a `LEFT JOIN` on `moz_places` instead of a subquery.

> I'm not sure I follow. Why is the buffer relevant here?

On a second read, this entire section doesn't make sense. Reworded!

> Do we ever have a situation where there are no observers for some of these? If so, can we avoid doing the work of recording each change?

It's unlikely. In a fresh profile, there are 3 bookmarks observers (I'm guessing the history service and views), and 1 annotation observer (the bookmarks service). In my real profile, with a dozen or so folders, the toolbar visible, and extensions installed, I see 26 (!) bookmark observers.

> Gosh, there has to be a better way… do we have a bug on file for bulk notifications?

We do! Bug 1340498.

> Let's get tattoos!

Yass. :skull_and_crossbones:
Blocks: 1407857
Comment on attachment 8901392 [details]
Bug 1305563 - Expose `PlacesUtils.history.hashURL`.

https://reviewboard.mozilla.org/r/172850/#review193824


C/C++ static analysis found 1 defect in this patch.

You can run this analysis locally with: `./mach static-analysis check path/to/file.cpp`

If you see a problem in this automated review, please report it here: http://bit.ly/2y9N9Vx


::: toolkit/components/places/SQLFunctions.cpp:1016
(Diff revision 7)
> -                                           MAX_CHARS_TO_HASH);
>      RefPtr<nsVariant> result = new nsVariant();
> -    if (mode.IsEmpty()) {
> -      // URI-like strings (having a prefix before a colon), are handled specially,
> -      // as a 48 bit hash, where first 16 bits are the prefix hash, while the
> -      // other 32 are the string hash.
> +    uint64_t hash;
> +    rv = HashURL(str, mode, &hash);
> +    NS_ENSURE_SUCCESS(rv, rv);
> +    rv = result->SetAsInt64(hash);

Warning: Value stored to 'rv' is never read [clang-tidy: clang-analyzer-deadcode.DeadStores]
(In reply to Kit Cambridge (he/him) [:kitcambridge] (UTC-7) from comment #129)
> `getResultByIndex` seems uncommon in Places JS code, so I'd prefer
> to keep `getResultByName`.

Sure thing. Might be interesting to measure at some point, given that these are relatively tight loops, which isn't normally the case in Places.

Android tries to get the best of both worlds: `int foo = cursor.getColumnIndex("foo"); String val = cursor.getStringAtIndex(foo);`.

> Are you mainly concerned that we're using prefs as a separate, non-durable
> store, or is there another place we should check the sync ID that we don't
> do now?

My concern here is that we might end up in a situation where we expect the buffer to be clean, but it isn't (or otherwise get a mismatch).

Notably when we download a meta/global, we set the syncID pref _before_ we call `resetClient`, so it's possible we save the new syncID, trigger the reset, crash, and end up with a dirty buffer.

We can either try to always get this right -- save the pref after resetting, make sure we always await every promise, etc. -- or we can switch to a model where the state lives in storage. Up to you; just give it some thought.

> I'd prefer to avoid that if we can. It's another subquery to fetch the URL
> from `moz_places`, and the only other time we check for `place:` explicitly
> in SQL is when we calculate the frecency.

Fair enough!

> … we don't have a unique index on them (bug 889561)

Ah, that explains it all.
Blocks: 1408092
Blocks: 1408551
The thing that worries me more is the raw insertion into Places tables, we already had broken APIs writing broken data and we know it's painful. How critical is it to have that in the first incarnation of this approach? Could we have a safer but slower insertTree code path and a switch to use the raw insertion?
One side, insertTree will shortly (next days) have an option to fixup invalid input (broken guid, broken lastModified/dateAdded) that we could use. It could also be worth profiling it to speed it up, and make the perf difference a bit smaller.

Second thing, can some parts of this be moved to separate bugs and start landing? Multiple patches are nice, but multiple bugs that can land before the scarier part are even better. The keywords, livemarks and hash changes look isolated enough to be able to land first. Not sure if some of the Sync parts could start landing as well.
Let's try to eat this elephant one piece at a time.
(In reply to Marco Bonardo [::mak] from comment #152)
> The thing that worries me more is the raw insertion into Places tables, we
> already had broken APIs writing broken data and we know it's painful.

That makes sense. It would be unfortunate if we had to add more maintenance tasks and workarounds for incoherent data. I think we can mitigate this in a few ways:

1. We'll land the buffer code pref'd off, and ask staff who use Sync to manually flip the pref and try the new engine. Last year, we got some useful About Sync reports after the tracker work landed. Many of those were from folks who had been using Sync for a while, and had interesting patterns of corruption on the server. This would give us a chance to ensure that the buffer code handles these cases correctly, without corrupting Places.

2. One of our objectives for this quarter is to move validation and repair into a system add-on. The validator already has some checks to detect Places inconsistencies. We could extend those into a more comprehensive check (similar to `PlacesDBUtils`) that records additional Sync ping telemetry. We could use this add-on to flip the pref gradually for users, then monitor telemetry for coherence issues. If we see upticks in local orphans, invalid GUIDs and timestamps, two bookmarks in the same position, and holes in the local tree, we'd stop the rollout and investigate.

3. We should have the new engine create a JSON backup before it does any merging. Most syncs are no-ops, so we won't create a backup every time we sync, but this will make it easier to recover if something goes wrong. The system add-on could even try to do an automatic restore if the coherence check fails.

> How
> critical is it to have that in the first incarnation of this approach? Could
> we have a safer but slower insertTree code path and a switch to use the raw
> insertion?

That's a good point, using `insertTree` occurred to me at first. Unfortunately, I think we'd need something more like `insertOrUpdateTree` to make this work. For context, the merger builds complete local and remote trees, including new items on both sides, GUID changes, moves, value changes, and existing items that haven't changed. However, `insertTree` expects new items; it won't update existing ones.

The easiest way to use `insertTree` with this patch is probably to treat every bookmark sync as a restore. Fetch all the value info we need when we build the local tree, merge the trees as usual, clear everything in Places using `eraseEverything`, then use `insertTree` to repopulate Places. Most changes between syncs should be fairly small, so clearing and restoring your entire tree just for a few changes seems excessive. Also, if things go wrong during the insert, it's more likely that we'll suddenly erase all your bookmarks...though making a backup before merging changes would at least make it possible to disable the engine and start again.

Alternatively, we can call the individual `insert` and `update` methods as we do now. I think we'd still want to expose special versions of those APIs that can be called from within existing transactions. Otherwise, we risk the tree changing out from under us if the user modifies their bookmarks during the sync. (That's also why I'm attaching to Places from the buffer connection, instead of using the Places connection and attaching to the buffer: it's not safe for https://searchfox.org/mozilla-central/rev/dd47bee6468de7e1221b4d006342ad6b9813d0e5/toolkit/modules/Sqlite.jsm#569-578 or https://searchfox.org/mozilla-central/rev/dd47bee6468de7e1221b4d006342ad6b9813d0e5/storage/mozStorageHelper.h#89-91 to ignore the "transaction already in progress" error and commit changes as part of our merge transaction).

> Second thing, can some parts of this be moved to separate bugs and start
> landing? Multiple patches are nice, but multiple bugs that can land before
> the scarier part are even better. The keywords, livemarks and hash changes
> look isolated enough to be able to land first.

Definitely. It's not as clear why we're exposing `hashURL` or `invalidateCachedLivemarks` without the Sync part, but I can split the first four patches out into separate bugs, and land them earlier. The "prepare existing tests for the new engine" patch can land immediately.

> Not sure if some of the Sync
> parts could start landing as well.

That's harder. I could split `SyncBookmarkBuffer.jsm` up into separate stages (store in buffer, merge, apply to Places), but, since it's all new code, I'm not sure how much that would help. It makes sense to keep the last two patches attached to this bug. Sadly, those are also the scariest parts. :-(

> Let's try to eat this elephant one piece at a time.

I want to put this quote in a comment somewhere.
(In reply to Kit Cambridge (he/him) [:kitcambridge] (UTC-7) from comment #153)
> The easiest way to use `insertTree` with this patch is probably to treat
> every bookmark sync as a restore. Fetch all the value info we need when we
> build the local tree, merge the trees as usual, clear everything in Places
> using `eraseEverything`, then use `insertTree` to repopulate Places.

It doesn't sound cheap. Indeed it's probably a bad idea. Thank you for explaining.
I like your idea of dogfooding the code for a while.

> > Not sure if some of the Sync
> > parts could start landing as well.
> 
> That's harder.

I see, let's start landing what is sufficiently insulated from the rest, so we reduce the code load in this review.
Depends on: 1412139
Depends on: 1412141
Depends on: 1412142
Depends on: 1412143
Attachment #8901392 - Flags: review?(mak77)
Attachment #8906193 - Flags: review?(mak77)
Attachment #8917620 - Flags: review?(mak77)
Blocks: 1414435
Depends on: 1417101
No longer depends on: 1417101
See Also: → 1417101
Attachment #8901392 - Attachment is obsolete: true
Attachment #8917620 - Attachment is obsolete: true
Attachment #8906193 - Attachment is obsolete: true
Attachment #8908848 - Attachment is obsolete: true
Comment on attachment 8901394 [details]
Bug 1305563 - Add a bookmark buffer and two-way merger for synced bookmarks.

https://reviewboard.mozilla.org/r/172854/#review206424

Just rebasing, nothing new to see here.

::: toolkit/components/places/SyncBookmarkBuffer.jsm:177
(Diff revision 17)
> +      options.recordTelemetryEvent("buffer", "open", "error",
> +                                   { why: "corrupt" });
> +      await OS.File.remove(options.path);
> +      db = await Sqlite.openConnection(options);
> +    }
> +    try {

I'm not sure if we want to set `PRAGMA journal_mode = wal`, too. Mak, do you have advice here?

Except for first syncs, I expect we'll read more frequently than write, and we're not concerned about readers and writers blocking one another: only Sync accesses the buffer, and Sync is sequential. If we do use the WAL, should we also change the page size, or keep the default?

::: toolkit/components/places/SyncBookmarkBuffer.jsm:1658
(Diff revision 17)
> +    }
> +    let tagFolderId = idRows[0].getResultByName("id");
> +    return { id: tagFolderId, guid: tagFolderGuid, created: true };
> +  }
> +
> +  async fetchLocalChangeRecords(recordFactory, deletedRecordFactory) {

Technically, `fetchLocalChangeRecords` doesn't need to live in this file anymore, now that I moved it out of the transaction. I might just extend `PlacesSyncUtils.bookmarks.pullChanges` to return full records instead of IDs, and then we can simplify this module a bit.
(In reply to Kit Cambridge (he/him) [:kitcambridge] from comment #159)
> I'm not sure if we want to set `PRAGMA journal_mode = wal`, too. Mak, do you
> have advice here?

WAL helps if you have a lot of writes that you can't do in chunks, and/or you need concurrent readers (from a different connection).
I think you do chunked additions, and you have a single connection, so I don't think it would help you anyway. Is it the time taken for your readers to execute critical?
WAL is also a bad pick if you have transactions that last a long time. In those cases it may generate huge journals that are quite expensive to merge later (I had a couple cases in Places where not chunking the changes caused a journal as large as 3 times the original db).
The second better pick is TRUNCATE, that is a little bit faster on Linux than the other journals.

> If we do use the WAL,
> should we also change the page size, or keep the default?

There's no strict relation with the page size, at a maximum you should change the journal size limit, autocheckpoint and other wal related things. But I don't see strict reasons to go wal here.
Blocks: 1410877
Comment on attachment 8901394 [details]
Bug 1305563 - Add a bookmark buffer and two-way merger for synced bookmarks.

https://reviewboard.mozilla.org/r/172854/#review208924

I initially thought the buffer db was going to be a memory DB.
What happens if the user replaces places.sqlite with a different one, but leaves the buffer db behind? For example, he may be restoring a places.sqlite from a backup, or moving it across profiles. While it's all edge cases, we should be sure it won't be extremely painful to handle.
Off-hand looks like this db has no direct relation with places.sqlite and thus it should be fine, but I want to ask regardless what you think.

There are other 2 blocking problems, that could also partially explain why this is is so much faster:
1. There's no frecency calculation for the added/removed bookmarks. The normal APIs do recalculate frecency every time a bookmark is added or removed. At least frecency should be set to -1 so it will be recalculated at the next idle, but that also means a synced bookmark will be pushed down in the location bar results.
2. Since most of the operations seem to happen on you connection (to which you attach places.sqlite), all the Places triggers on moz_places and moz_bookmarks won't run. This means: the moz_hosts table will be out of sync, the foreign_count column in moz_places will be out of sync. These are coherency problems.

::: services/sync/tests/unit/test_bookmark_buffer.js:3
(Diff revision 18)
> +Cu.import("resource://gre/modules/osfile.jsm");
> +Cu.import("resource://gre/modules/PlacesUtils.jsm");
> +Cu.import("resource://gre/modules/PlacesSyncUtils.jsm");

doesn't head_helpers include some of these already?

Fwiw, this test is very very long, maybe it could be splitted into 2 or 3 parts and utils be shared through head?

::: toolkit/components/places/PlacesSyncUtils.jsm:1038
(Diff revision 18)
>  
>    /**
>     * Returns `undefined` if no sensible timestamp could be found.
>     * Otherwise, returns the earliest sensible timestamp between `existingMillis`
>     * and `serverMillis`.
>     */

Looks like this javadoc needs an update since we don't return undefined anymore

::: toolkit/components/places/SyncBookmarkBuffer.jsm:48
(Diff revision 18)
> + */
> +
> +this.EXPORTED_SYMBOLS = [
> +  "BookmarkConsistencyError",
> +  "BookmarkBuffer",
> +];

due to the size of the module, I hope we'll always defineLazyGetter it.. though, by exposing 2 objects that becomes a little bit less handy.

Maybe we could expose BookmarkConsistencyError somewhere else, maybe as  PlacesSyncUtils.BookmarkConsistencyError?

I'd also prefer if the exposed object is named the same as the jsm, per usual convention.

::: toolkit/components/places/SyncBookmarkBuffer.jsm:72
(Diff revision 18)
> +  Log.repository.getLogger("Sync.Engine.Bookmarks.Buffer")
> +);
> +
> +// Keep in sync with `PlacesUtils`. These can be removed once they're exposed on
> +// `PlacesUtils.history` (bug 1375896).
> +const DB_URL_LENGTH_MAX = 65536;

I want to clarify that we actually have 2 limits. One is this broad limit, that is used when you bookmark a page for example, bookmarks pretty much have no limits on uris. On the other side pages added through history are limited to MaxUrlLength (2k IIRC).

::: toolkit/components/places/SyncBookmarkBuffer.jsm:73
(Diff revision 18)
> +);
> +
> +// Keep in sync with `PlacesUtils`. These can be removed once they're exposed on
> +// `PlacesUtils.history` (bug 1375896).
> +const DB_URL_LENGTH_MAX = 65536;
> +const DB_TITLE_LENGTH_MAX = 4096;

This is another limit that doesn't make much sense and we should reduce it by a lot, likely it shouldn't be larger than description.
Please change bug 1375896 to be a bit more generic and expose all of these limits more broadly, so we can handle them in just one place.

::: toolkit/components/places/SyncBookmarkBuffer.jsm:74
(Diff revision 18)
> +
> +// Keep in sync with `PlacesUtils`. These can be removed once they're exposed on
> +// `PlacesUtils.history` (bug 1375896).
> +const DB_URL_LENGTH_MAX = 65536;
> +const DB_TITLE_LENGTH_MAX = 4096;
> +const DB_DESCRIPTION_LENGTH_MAX = 1024;

We are reducing this to 256 in bug 1420571
It's likely old entries won't be touched, new ones will be cut.

::: toolkit/components/places/SyncBookmarkBuffer.jsm:161
(Diff revision 18)
> +   *          value: String?, extra: Object?)`, used to emit telemetry events.
> +   * @param   {AsyncShutdown.Barrier} [options.finalizeAt]
> +   *          A shutdown phase, barrier, or barrier client that should
> +   *          automatically finalize the buffer when triggered. Exposed for
> +   *          testing.
> +   * @returns {BookmarkBuffer}

nit: @return (please check other instances)

::: toolkit/components/places/SyncBookmarkBuffer.jsm:202
(Diff revision 18)
> +   *          The high water mark time, in seconds.
> +   */
> +  async getCollectionHighWaterMark() {
> +    let rows = await this.db.executeCached(`
> +      SELECT IFNULL(MAX(serverModified), 0) AS highWaterMark FROM (
> +        SELECT serverModified FROM items

could this subquery pick the max already?

::: toolkit/components/places/SyncBookmarkBuffer.jsm:318
(Diff revision 18)
> +   *          upload to the server, and to store in the buffer once upload
> +   *          succeeds.
> +   */
> +  async apply({ localTimeSeconds = Date.now() / 1000,
> +                remoteTimeSeconds = 0 } = {}) {
> +    let placesDB = PlacesUtils.history.QueryInterface(Ci.nsPIPlacesDatabase);

don't need to QI, PlacesUtils does it already

::: toolkit/components/places/SyncBookmarkBuffer.jsm:332
(Diff revision 18)
> +      parentGuid TEXT NOT NULL,
> +      position INTEGER NOT NULL,
> +      valueState INTEGER NOT NULL,
> +      structureState INTEGER NOT NULL,
> +      PRIMARY KEY(localGuid, mergedGuid)
> +    )`);

ditto on WITHOUT ROWID

::: toolkit/components/places/SyncBookmarkBuffer.jsm:896
(Diff revision 18)
> +      )
> +      SELECT b.id, b.guid, p.guid AS parentGuid, (CASE b.type
> +        /* Map Places item types to Sync record kinds. */
> +        WHEN :bookmarkType THEN (
> +          CASE WHEN EXISTS(
> +            /* Queries are bookmarks with a smart bookmark annotation. */

this is not completely true, queries are just urls with a place: scheme, some queries are "smart" and have the anno.

::: toolkit/components/places/SyncBookmarkBuffer.jsm:1004
(Diff revision 18)
> +   * @param {MergedBookmarkNode} mergedRoot
> +   *        The root of the merged bookmark tree.
> +   * @param {BookmarkObserverRecorder} observersToNotify
> +   *        Records Places observer notifications for all local changes.
> +   */
> +  async updatePlaces(mergedRoot, observersToNotify) {

May we rename this to a unique name, to avoid confusion when searching code?

::: toolkit/components/places/SyncBookmarkBuffer.jsm:1106
(Diff revision 18)
> +      LEFT JOIN moz_places h ON h.url_hash = u.hash AND
> +                                h.url = u.url
> +      JOIN mergeStates r ON r.mergedGuid = v.guid
> +      WHERE r.valueState = :valueState`,
> +      { valueState: BookmarkMergeState.TYPE.REMOTE });
> +  }

After bug 1414892, when we insert into moz_places we don't update moz_hosts anymore, we accumulate into moz_updatehostsinsert_temp, then a DELETE FROM moz_updatehostsinsert_temp takes care of updating moz_hosts in the same transaction.

Though, if you are on your connection, the original trigger won't fire at all since it's only on the Places connection.

::: toolkit/components/places/SyncBookmarkBuffer.jsm:1208
(Diff revision 18)
> +
> +    // Update the value state, using `-1` as placeholders for new remote items.
> +    // We'll update their structure later.
> +    BookmarkBufferLog.debug("Updating value states for local bookmarks");
> +    await this.db.execute(`
> +      REPLACE INTO moz_bookmarks(id, guid, parent, position, type, fk, title,

replace doesn't always fire triggers (provided they fire at all). It may not be a problem until we don't change the url of the bookmark...

::: toolkit/components/places/SyncBookmarkBuffer.jsm:1360
(Diff revision 18)
> +    BookmarkBufferLog.debug("Removing all existing tag entries");
> +    for (let chunk of PlacesSyncUtils.chunkArray(tagEntryIdsToDelete,
> +      SQLITE_MAX_VARIABLE_NUMBER)) {
> +
> +      await this.db.execute(`
> +        DELETE FROM moz_bookmarks WHERE id IN (${

Again, if this uses your connection, this won't fire our trigger that reduces foreign_count in moz_places

::: toolkit/components/places/SyncBookmarkBuffer.jsm:1480
(Diff revision 18)
> +        WHERE fk IN (${new Array(chunk.length).fill("?").join(",")})`,
> +        chunk);
> +
> +      // Remove existing keywords from remotely changed URLs.
> +      await this.db.execute(`
> +        DELETE FROM moz_keywords

this also doesn't fire triggers

::: toolkit/components/places/SyncBookmarkBuffer.jsm:1934
(Diff revision 18)
> +  // Sync bookmarks collection metadata. Currently stores the collection
> +  // last modified time.
> +  await db.executeCached(`CREATE TABLE collectionMeta(
> +    type INTEGER UNIQUE NOT NULL,
> +    value TEXT NOT NULL
> +  )`);

I'd prefer to see an explicitly primary key, in the worst case it will just be an id integer column that aliases rowid. In any case not having a PK will still create one to store rowid.
Though, it's not completely clear what "type" is supposed to be, if it is expected to be unique (and this is just a store of metadata, each with it's own code), it could well be the primary key.

Also, why executeCached?

::: toolkit/components/places/SyncBookmarkBuffer.jsm:1948
(Diff revision 18)
> +    kind INTEGER NOT NULL DEFAULT -1,
> +    /* The creation date, in microseconds. */
> +    dateAdded INTEGER NOT NULL DEFAULT 0,
> +    title TEXT,
> +    urlId INTEGER REFERENCES urls(id)
> +                  ON DELETE SET NULL,

Could you please document why we retain references with a null urlId?
It sounds dangerous at first glance, indeed updateLocalItems seems to set fk to NULL if urlId is null, but then we could have a bookmark with a null fk, that is unexpected.

::: toolkit/components/places/SyncBookmarkBuffer.jsm:1949
(Diff revision 18)
> +    /* The creation date, in microseconds. */
> +    dateAdded INTEGER NOT NULL DEFAULT 0,
> +    title TEXT,
> +    urlId INTEGER REFERENCES urls(id)
> +                  ON DELETE SET NULL,
> +    keyword TEXT,

what about post data? usually a keyword is made of 2 infos, the keyword itself and a (possibly null) post data string.
Potentially, there would also be a charset (currently stored as a page anno)

::: toolkit/components/places/SyncBookmarkBuffer.jsm:1950
(Diff revision 18)
> +    dateAdded INTEGER NOT NULL DEFAULT 0,
> +    title TEXT,
> +    urlId INTEGER REFERENCES urls(id)
> +                  ON DELETE SET NULL,
> +    keyword TEXT,
> +    tagFolderName TEXT

how does this support our (likely short term) future state where tags won't be nor folders nor bookmarks, but just a separate entity with a relation with bookmarks?

::: toolkit/components/places/SyncBookmarkBuffer.jsm:1980
(Diff revision 18)
> +    itemId INTEGER NOT NULL REFERENCES items(id)
> +                            ON DELETE CASCADE,
> +    attributeId INTEGER NOT NULL REFERENCES annoAttributes(id)
> +                                 ON DELETE CASCADE,
> +    value TEXT NOT NULL,
> +    PRIMARY KEY(itemId, attributeId)

You may want to use WITHOUT ROWID since you have a non integer primary key.

::: toolkit/components/places/SyncBookmarkBuffer.jsm:1984
(Diff revision 18)
> +    value TEXT NOT NULL,
> +    PRIMARY KEY(itemId, attributeId)
> +  )`);
> +
> +  await db.execute(`CREATE TABLE tags(
> +    itemId INTEGER NOT NULL REFERENCES items(id)

same question as above, we should start with a plan for future tags storage.

::: toolkit/components/places/SyncBookmarkBuffer.jsm:2001
(Diff revision 18)
> +  // Set up the syncable roots.
> +  await createBufferRoots(db);
> +
> +  // Set up synced anno attributes. We store annos and anno attributes in
> +  // separate tables, instead of columns in `items`, to make merging easier.
> +  const syncableAnnos = [{

If we will just store a subset of annos, we could maybe completely remove the attributes table from here and just repeat the name and type for each anno in the annos table.
Most of these annos are unlikely to be present for each bookmark, description, smart and livemarks annos will be short lived, it's all deprecated stuff.
Attachment #8901394 - Flags: review?(mak77)
Comment on attachment 8901394 [details]
Bug 1305563 - Add a bookmark buffer and two-way merger for synced bookmarks.

https://reviewboard.mozilla.org/r/172854/#review209350
Attachment #8901394 - Flags: review-
Attached patch clone-temps.patch (obsolete) — Splinter Review
Mak and I chatted a bit more about the temp trigger problem on IRC yesterday, and I did some hacking on the train today.

For folks following along, the reason I'm using a separate buffer connection that attaches to Places, instead of using `PlacesUtils.withConnectionWrapper` and attaching to the buffer, is to work around our nested transaction handling.

SQLite transactions can't nest, so executing `BEGIN` on a connection with an active transaction will fail. However, the Places connection has JS (via `Sqlite.jsm`) and native (via `PlacesUtils.history.DBConnection`) callers, each unaware that the other might have started a transaction.

Sqlite.jsm handles this specially for wrapped connections: if it sees a transaction is already in progress, it proceeds anyway, and executes statements as part of the active transaction (https://searchfox.org/mozilla-central/rev/7f45cb7cc0229398fc99849bdc150cb6462d6966/toolkit/modules/Sqlite.jsm#567,569-578). On the C++ side, `mozStorageTransaction` does the same (https://searchfox.org/mozilla-central/rev/7f45cb7cc0229398fc99849bdc150cb6462d6966/storage/mozStorageHelper.h#89-91).

Unfortunately, I think this will confuse the buffer. The buffer starts an exclusive transaction, builds a bookmark tree from what's in Places, merges it with the synced tree, and writes changes back to Places based on the in-memory merged tree. It assumes that the tree won't change underneath it as it's merging...which isn't true if we let other callers (`Bookmarks.jsm` and the synchronous `PlacesUtils.bookmarks` API) plow ahead and make changes as part of our exclusive transaction.

Using a separate connection means other writes will fail with `SQLITE_BUSY` until we're done with our transaction. IIUC, per comment 122, mozStorage will automatically wait and retry async statements. So, it doesn't matter whether we attach Places to the buffer, or the buffer to Places; but we do need to update Places from a separate connection.

It looks like mozStorage already has a way to clone connections. `clone` and `asyncClone` copy over attached databases, pragmas, and SQL functions, but not temp entities. I did some digging, and discovered that we can get at the statements to create temp tables and triggers (and views, I think) through the `sqlite_temp_master` table.

In this patch, I extended `clone` to do that. It seems to work, but I'm not sure if there are gotchas with this approach. (We're duplicating temp tables in memory, would that be a problem? What about other statements that access the temp table on the main connection, outside of the triggers? Are there any?)

But, assuming this is OK, I think I can change the buffer patch to do something like this:

    let db = await Sqlite.cloneStorageConnection({
      connection: PlacesUtils.history.DBConnection,
      readOnly: false,
    });
    await db.execute(`ATTACH '/path/to/buf.db' AS buf`);

What do you think, Mak? (And thank you very much for reviewing! :-) I'll reply more in depth to your other comments tomorrow).
Attachment #8933543 - Flags: feedback?(mak77)
(In reply to Kit Cambridge (he/him) [:kitcambridge] from comment #165)
> In this patch, I extended `clone` to do that. It seems to work, but I'm not
> sure if there are gotchas with this approach. (We're duplicating temp tables
> in memory, would that be a problem? What about other statements that access
> the temp table on the main connection, outside of the triggers? Are there
> any?)

Temp entities exist on a connection basis, they are part of a connection.
I'm not sure I understand your last question, a statement that uses the temp entities must run on a connection where those entities exist.
Clone() this far has mostly been used to create read-only clones, because the WAL journal (but in general embedded db systems) works better with a single writer and many readers, so that all the writes are serialized and don't block each other. That may also explain why why we didn't bother to clone temp entities so far: usually readers don't really need them.

It could make sense to clone temp entities just for non read-only clones.
I currently can't think of a reason to not do so, but please split that change to a separate Storage bug and make it a dependency, so that other Storage peers can notice it and eventually comment about it, I may be missing something that they'd notice.
Attachment #8933543 - Flags: feedback?(mak77) → feedback+
(In reply to Marco Bonardo [::mak] from comment #166)
> Temp entities exist on a connection basis, they are part of a connection.
> I'm not sure I understand your last question, a statement that uses the temp
> entities must run on a connection where those entities exist.

Right, I meant "do we have any code, outside of the triggers, that selects from the temp table and updates a non-temp table on the main connection?"

I'm worried we might still end up with coherence issues if we, say, have a timer that reads from a temp table and does something with the results. (IIUC, you're proposing something like this in bug 1420597, but for history, not bookmarks). As you said, temp entities exist per connection, so that timer won't see any rows added to the cloned connection's temp tables.

I'll go ahead and split this out into its own bug; just wanted to get your quick feedback before going further down that path. Thanks!
(In reply to Kit Cambridge (he/him) [:kitcambridge] from comment #167)
> I'm worried we might still end up with coherence issues if we, say, have a
> timer that reads from a temp table and does something with the results.
> (IIUC, you're proposing something like this in bug 1420597, but for history,
> not bookmarks). As you said, temp entities exist per connection, so that
> timer won't see any rows added to the cloned connection's temp tables.

Ah yes, cloned triggers will just work, but you'll still have to do the same operations on temp tables that the original connection does, because you have your own instance of those temp tables. For example you'll have to delete from moz_UPDATEHOSTSINSERT_TEMP and moz_UPDATEHOSTSDELETE_TEMP after you execute a query that could modify those temp tables.
Things like bug 1420597 may be a bit more complex, but I don't yet have a clear idea on how that may end up, so it's too early to despair. In general the triggers should be written in a way to be able to handle missing entries in the destination table (the where will just return nothing) and there should be no problem.
Depends on: 1422383
(In reply to Marco Bonardo [::mak] from comment #163)
> What happens if the user replaces places.sqlite with a different one, but
> leaves the buffer db behind? For example, he may be restoring a
> places.sqlite from a backup, or moving it across profiles. While it's all
> edge cases, we should be sure it won't be extremely painful to handle.
> Off-hand looks like this db has no direct relation with places.sqlite and
> thus it should be fine, but I want to ask regardless what you think.

Right, the buffer DB has no direct relation to Places, so this is OK. Indeed, the first time you sign in to Sync on a profile with existing bookmarks, it's almost certain that Places and the buffer won't match. The buffer is really a mirror of the server, and merging reconciles the mirror with Places. Maybe `SyncedBookmarksMirror` would be a better name for it.

The longer answer, and thinking out loud some more...

If Firefox detects corruption and restores `places.sqlite` from a backup, that's fine. Restoring will set `syncChangeCounter = 1` and `syncStatus = UNKNOWN` for all bookmarks, the merger will run on the next sync, reupload the restored bookmarks, reset change counters, and set `syncStatus = NORMAL`. Restoring from a backup in the Library wipes Places, the buffer, and the Sync server, so we'll start fresh on the next sync.

It would be good to add a buffer test that merges after a restore. One test might have a buffer that's close or identical to what was restored; another might be completely different, to check that the merger still builds a coherent tree.

If `sync-bookmarks.sqlite` is somehow corrupted, that's fine, too. On the next sync, we'll redownload everything from the server, and do a full merge. If Places is mostly up-to-date with the server, we won't need to reupload anything. This is also the case if you refresh Firefox, which will keep `places.sqlite` and `signedInUser.json`, but discard `sync-bookmarks.sqlite`.

Moving `places.sqlite` between profiles that aren't signed in to Sync, or from a signed-in profile to a not-signed-in profile, is OK. For not-signed-in to not-signed-in, there are no sync status flags to worry about. For signed-in to not-signed-in, if you eventually sign in to Sync on the second profile, we'll reset all local flags before the first sync.

Things get interesting if you move `places.sqlite` from a not-signed-in profile, or signed in to one Firefox Account, to a profile signed in to a *different* account. In that case, the `syncChangeCounter` and `syncStatus` flags will be off, and Sync will be very confused. You'll need to sign out and sign back in on the second profile. (This is already the case today, unless you also copy over all your prefs).

Moving `sync-bookmarks.sqlite` between different accounts will certainly confuse Sync, because the buffer won't match the server at all. However, our "sync ID mismatch" logic should kick in at that point, wipe the buffer, reset the sync flags in Places, and treat it as a first sync. We can improve this by storing `services.sync.bookmarks.syncID` in the buffer's `collectionMeta` table, instead of in prefs.

We could consider storing the `syncID` in Places, too. If it doesn't match what's in the buffer, or the buffer doesn't match what's on the server, delete the buffer, reset all bookmarks in Places to `syncChangeCounter = 1` and `syncStatus = NEW`, and continue as a first sync.

I'm really getting into the weeds here, and I think there's some follow-up work we can do to make this more robust.

> 1. There's no frecency calculation for the added/removed bookmarks. The
> normal APIs do recalculate frecency every time a bookmark is added or
> removed. At least frecency should be set to -1 so it will be recalculated at
> the next idle, but that also means a synced bookmark will be pushed down in
> the location bar results.

Thank you for calling this out, I forgot about frecencies. :-( I think something like this should work:

    UPDATE moz_places SET
      frecency = CALCULATE_FRECENCY(id)
    WHERE id IN (
      SELECT h.id FROM moz_places h
      JOIN buf.urls u ON u.hash = h.url_hash AND
                         u.url = h.url
      JOIN items v ON v.urlId = u.id
      JOIN mergeStates r ON r.mergedGuid = v.guid
      WHERE r.valueState = REMOTE
    )

And fire an `onManyFrecenciesChanged` notification.
Attachment #8933543 - Attachment is obsolete: true
(In reply to Kit Cambridge (he/him) [:kitcambridge] from comment #169)
> We could consider storing the `syncID` in Places, too.

From some time I think we should add a simple key/value metadata table to places.sqlite that allows to store this kind of data.
Comment on attachment 8901394 [details]
Bug 1305563 - Add a bookmark buffer and two-way merger for synced bookmarks.

https://reviewboard.mozilla.org/r/172854/#review208924

> doesn't head_helpers include some of these already?
> 
> Fwiw, this test is very very long, maybe it could be splitted into 2 or 3 parts and utils be shared through head?

I'll split this up in the next round, especially because I'd also like to add tests for import and restore. Leaving the issue open until then.

> due to the size of the module, I hope we'll always defineLazyGetter it.. though, by exposing 2 objects that becomes a little bit less handy.
> 
> Maybe we could expose BookmarkConsistencyError somewhere else, maybe as  PlacesSyncUtils.BookmarkConsistencyError?
> 
> I'd also prefer if the exposed object is named the same as the jsm, per usual convention.

I renamed the class to `SyncedBookmarksMirror`, per comment 169, to make it clearer that it's persistent. Also exposed the error as `SyncedBookmarksMirror.ConsistencyError`. I didn't rename the file yet, because I don't think Git has an `hg mv` equivalent, and I don't want to mess up the interdiff. Leaving the issue open so I remember to do that before landing.

> This is another limit that doesn't make much sense and we should reduce it by a lot, likely it shouldn't be larger than description.
> Please change bug 1375896 to be a bit more generic and expose all of these limits more broadly, so we can handle them in just one place.

I changed the description limit to 256. Keeping title the same for now, pending discussion in bug 1375896, but leaving the issue open. Maybe we can figure out suitable limits next week in Austin.

> could this subquery pick the max already?

I think I'd need a subquery either way, but I can clean this up a bit. Maybe something like `SELECT MAX(IFNULL((SELECT MAX(serverModified) FROM items), 0), IFNULL((SELECT CAST(value AS INTEGER) FROM collectionMeta WHERE type = :type), 0))`. I want the newer of the most recent record, or the collection modified time, where either or both can be NULL.

> After bug 1414892, when we insert into moz_places we don't update moz_hosts anymore, we accumulate into moz_updatehostsinsert_temp, then a DELETE FROM moz_updatehostsinsert_temp takes care of updating moz_hosts in the same transaction.
> 
> Though, if you are on your connection, the original trigger won't fire at all since it's only on the Places connection.

Bug 1422383 and bug 1423820 should help here, and I added `DELETE FROM moz_updatehostsinsert_temp` to fire the trigger. I think I should also update the tests to check `moz_hosts` coherency.

> replace doesn't always fire triggers (provided they fire at all). It may not be a problem until we don't change the url of the bookmark...

I'd like your feedback about my new approach here, to avoid `REPLACE` and clean things up in general.

I created two views, one for the items and one for the structure, and several `INSTEAD OF DELETE` triggers that update Places. On the one hand, this seems a bit hacky, since I'm not deleting anything; I'm just exploiting the fact that the trigger fires for each row. On the other, this lets SQLite fetch the affected rows for us, instead of our code selecting everything from the view, then executing separate `INSERT`, `UPDATE`, and `DELETE` statements. The views also avoid repeating joins on `mergeStates` to find local bookmarks that need updating, which I've goofed more than once. I left tags alone, since the way they're implemented makes it tricky to reimplement in the trigger, and we'll be migrating them soon, anyway.

I also switched to using temp tables and triggers to store info for observer notifications. This feels cleaner, and I'm especially happy about removing the complicated queries to find changed annos and keywords. These might also be easier to remove entirely once we redesign observers. It does require more temp tables and recursive triggers.

> I'd prefer to see an explicitly primary key, in the worst case it will just be an id integer column that aliases rowid. In any case not having a PK will still create one to store rowid.
> Though, it's not completely clear what "type" is supposed to be, if it is expected to be unique (and this is just a store of metadata, each with it's own code), it could well be the primary key.
> 
> Also, why executeCached?

This "collectionMeta" table is essentially the same key/value metadata table that you mentioned you'd like to add to places.sqlite. Renamed the table to "meta" and the "type" column to "key", and also made it the primary key. Dropped `executeCached` here and everywhere else it didn't make sense.

> Could you please document why we retain references with a null urlId?
> It sounds dangerous at first glance, indeed updateLocalItems seems to set fk to NULL if urlId is null, but then we could have a bookmark with a null fk, that is unexpected.

Everything except bookmarks and queries should have a NULL `urlId`. I added a `CHECK` constraint to validate that.

> what about post data? usually a keyword is made of 2 infos, the keyword itself and a (possibly null) post data string.
> Potentially, there would also be a charset (currently stored as a page anno)

We don't currently sync the POST data or charset, so bookmarks will lose them if they're round-tripped through Sync, yes. Ed made a patch to support POST data in bug 1345417, but it turned out to be gnarly. That's why I'm clearing incoming keywords for records in the mirror, and for the old URL and new URL.

> how does this support our (likely short term) future state where tags won't be nor folders nor bookmarks, but just a separate entity with a relation with bookmarks?

I think moving tags to a separate table will simplify this substantially. Since we'll include the tag directly in the "query:" URL, we won't need to rewrite tag queries anymore, and we could move tag updates into our `INSTEAD OF DELETE ON newRemoteItems` trigger. `rewriteRemoteTagQueries`, `updateLocalTags`, and `getOrCreateLocalTagFolder` could go away entirely. However, we'll still want to carry the tag folder name around, so that old clients can apply tag queries. I'll comment more on bug 1421664.

> If we will just store a subset of annos, we could maybe completely remove the attributes table from here and just repeat the name and type for each anno in the annos table.
> Most of these annos are unlikely to be present for each bookmark, description, smart and livemarks annos will be short lived, it's all deprecated stuff.

So...I originally separated the attributes into a `mirror.annoAttributes` table because we only sync a subset of annos, and we need to distinguish between "synced anno doesn't exist in `mirror.annos` because it's not set for an item" and "synced anno doesn't exist in `mirror.annos` because it was removed for an item". The contents of `mirror.annoAttributes` are fixed, and we keep things simple during application by deleting all synced annos from `moz_items_annos`, then repopulating from `mirror.annos` for each item. 5 annos was just annoying enough to repeat enough time.

However, using triggers and one view for updates makes this somewhat easier, so I went ahead and dropped both `mirror.annoAttributes` and `mirror.annos`, flattened the annos into columns on `mirror.items`, and extended the triggers trigger (built up with string fragments, yeah!) to update `moz_items_annos`. In theory, this will be easier to clean up or remove if we decide not to sync those annos anymore.
Since I already did a first pass that identified some of the main issues with the patch (now resolved), and since I won't be able to look into this for at least a couple weeks, I suggest to crash land it as rs=mak and I'll do a post landing formal review in January.

This is based on a few premises:
- the patch cannot cause regressions
- the feature is disabled by default
- we have a good understanding of what it does and how it's doing it
- my main "concerns" are limited to a single jsm module, thus well contained

There are a few advantages into doing this:
- QA may begin working on a stable build containing the feature
- Sync team can begin working on follow-up work without having to share queues of patches that could bitrot easily
See Also: → 1426554
Blocks: 1426556
Attachment #8901394 - Attachment is obsolete: true
I think this has changed enough over the past few weeks to where I wouldn't mind a second look, even if it's just to make sure everything still makes sense. I'll hold off on landing this until I'm back from holiday in mid-January, so there's no hurry. The merger code is the same, but the way we apply the merged tree to Places is different. I also renamed "buffer" to "mirror", to reflect that it's persistent and matches the complete server tree.

The biggest change is that we now use `INSTEAD OF` triggers on two views (`newRemoteItems` and `newRemoteStructure`) to figure out what to update, and separate triggers on `moz_bookmarks` to record the changes in temp tables. The old approach used `REPLACE` statements to update Places, and separate queries to fetch item info for observers. `REPLACE` doesn't fire triggers, as Mak pointed out in comment 163, and I caught some corner cases where the observer queries missed some changes. I talked about this new approach with Mak at the All Hands; he said it's fine, as long as we don't read from the views, since they can't use indexes.

I'm really just using (or abusing) the views and triggers here to execute a group of statements for every row, without reading from the database first (`DELETE FROM newRemoteItems` fires the `INSTEAD OF DELETE` triggers for every row in the view, even though we're not actually deleting anything. I could just as easily define an `INSTEAD OF UPDATE` trigger, and update a column for every row; the effect's the same).

The triggers are equivalent to `SELECT * FROM newRemoteItems`, looping over the rows in JS, and executing the statements in the trigger body between `BEGIN` and `END`. That adds overhead, especially for large trees, since we need to execute each statement on the storage thread, wrap each result row in a `mozStorageRow` object, and pass the results back to the main thread. Using the trigger, we still read before writing, but everything stays inside SQLite.

`localTags`, `tagLocalPlace`, and `untagLocalPlace` are annoyingly complex, because tags are currently stored as bookmarks. I tried to move as much of this complexity as I could into the `localTags` view, and this could all go away once we move tags to a separate table in bug 1293445.

Removing items is also somewhat complex, because we need to pass the removed items' info to the observers. Instead of deleting them directly, we buffer their info into `itemsToRemove` first, then use an `AFTER UPDATE OF wasRemoved` trigger to delete them from `moz_bookmarks`. There's a comment above the "itemsToRemove" table definition that explains a bit more.

The second big change is that there's no `recordFactory` or `deletedRecordFactory` anymore, and `fetchLocalChangeRecords` just builds the record cleartext directly. It also stages outgoing changed items in another temp table, instead of fetching and walking *every* item in the local tree. This has the nice side effect of fixing bug 1407857.

Thanks again for all your patience with this, folks. Happy holidays! :-)
Mak, I'm on PTO next week, as is Mark. Since I've requested re-review from everyone else, I'll just add this to your review queue in case you have cycles. Thanks!
Comment on attachment 8938967 [details]
Bug 1305563 - Add a bookmark mirror and two-way merger for synced bookmarks.

https://reviewboard.mozilla.org/r/209406/#review219302

And this is why reviewing bookmarks syncing is hard...
Honestly, the patch is hardly reviewable line by line, and the topic is so complex that likely only a couple persons (kit and rnewman) can follow all of it 100%.
This makes me a little bit nervous for a couple reasons:
1. only a very limited number of people will be able to understand and keep this code working
2. since this bypasses any kind of API, it will increase Places refactoring costs, indeed we'll likely need your feedback more often
I hope the benefit is very well worth it!

By the look of it, it's very well done. It's not without risks though, so better taking it sooner and dedicating much more testing before it can be enabled for everyone.

::: toolkit/components/places/SyncedBookmarksMirror.jsm:1151
(Diff revision 3)
> +      LEFT JOIN moz_places h ON h.url_hash = u.hash AND
> +                                h.url = u.url
> +      JOIN mergeStates r ON r.mergedGuid = v.guid
> +      WHERE r.valueState = :valueState`,
> +      { valueState: BookmarkMergeState.TYPE.REMOTE });
> +    await this.db.execute(`DELETE FROM moz_updatehostsinsert_temp`);

warning: bug 1239708 may be renaming this.

::: toolkit/components/places/SyncedBookmarksMirror.jsm:1627
(Diff revision 3)
> +      await initializeMirrorDatabase(db);
> +    }
> +    // Downgrading from a newer profile to an older profile rolls back the
> +    // schema version, but leaves all new rows in place. We'll run the
> +    // migration logic again on the next upgrade.
> +    await db.execute(`PRAGMA mirror.user_version = ${MIRROR_SCHEMA_VERSION}`);

Sqlite.jsm has getSchemaVersion and setSchemaVersion... they could probably gain an optional schemaName = "main" argument.

::: toolkit/components/places/SyncedBookmarksMirror.jsm:2079
(Diff revision 3)
> +            parent <> OLD.newParentId;
> +
> +      UPDATE moz_bookmarks SET
> +        position = OLD.newPosition
> +      WHERE id = OLD.localId AND
> +            position <> OLD.newPosition;

how much can we be sure this won't create holes in position?

::: toolkit/components/places/SyncedBookmarksMirror.jsm:2363
(Diff revision 3)
> +      inflateTree(tree, pseudoTree, node.guid);
> +    }
> +  }
> +}
> +
> +class BookmarkContent {

some javadoc above this?

::: toolkit/components/places/tests/sync/head_sync.js:124
(Diff revision 3)
> +  });
> +  return buf;
> +}
> +
> +// A debugging helper that dumps the contents of an array of rows.
> +function dumpRows(rows) {

This seems to be only used by dumpMirrorTable. You could expose a dumpTable from PlacesTestUtils taking a connection and a table name. Long term that would also replace dump_table from head_common.js
Attachment #8938967 - Flags: review?(mak77) → review+
Depends on: 1431149
Blocks: 1432681
Comment on attachment 8938967 [details]
Bug 1305563 - Add a bookmark mirror and two-way merger for synced bookmarks.

https://reviewboard.mozilla.org/r/209406/#review219302

Thanks for the review, Mak! I share your concern that this increases the refactoring cost for Places, and I'm hoping that we can spread more knowledge out in follow-up bugs. This was not a chunk of work undertaken lightly: we investigated (bug 1323333, bug 747699, bug 1293163, bug 1392794), and landed (bug 1299978, bug 1276823), different ways to work around the lack of a tree merge, but there's a point where the hacks become less feasible.

For the common case, we work just fine today, but when we get it wrong, we really get it wrong. I'm optimistic that this patch will address some of those thornier consistency issues.

> how much can we be sure this won't create holes in position?

The `mergeStates` table should include all items that we're keeping from the local and remote trees, which we assert in `merger.subsumes(localTree)` and `merger.subsumes(remoteTree)`. Any items that aren't in `mergeStates` should be flagged for deletion, either in `deleteLocally` or `deleteRemotely`. Tags don't appear in `mergeStates`, but the `localTags` view and its `INSTEAD OF {INSERT, DELETE}` triggers take care of that. The tests also check that the positions are correct when they call `assertLocalTree`, sometimes checking the complete tree from the root to ensure we didn't corrupt positions elsewhere.
Comment on attachment 8938967 [details]
Bug 1305563 - Add a bookmark mirror and two-way merger for synced bookmarks.

https://reviewboard.mozilla.org/r/209406/#review220720

::: toolkit/components/places/SyncedBookmarksMirror.jsm:1451
(Diff revision 4)
> +              // Desktops and Android use `parentid` as the canonical parent.
> +              // iOS is stricter and requires both `children` and `parentid` to
> +              // match.
> +              parentid: parentRecordId,
> +              // Older Desktops use `hasDupe` and `parentName` for deduping.
> +              hasDupe: false,

We should consider setting `hasDupe: true` unconditionally here, so that older Desktops don't try to apply their deduping logic. (I think iOS round-trips `hasDupe` for existing bookmarks). This suggests we might not need bug 1408551.

This does mean, though, that connecting a pre-mirror version to your account with existing dupes, like default bookmarks, will reupload them.
Comment on attachment 8938967 [details]
Bug 1305563 - Add a bookmark mirror and two-way merger for synced bookmarks.

https://reviewboard.mozilla.org/r/209406/#review219608

This patch is very complicated, but that's the nature of the beast. I hope you have no plans on leaving any time soon ;) That said though, the quality of this patch and the test coverage is is excellent - very nice work Kit!

::: toolkit/components/places/SyncedBookmarksMirror.jsm:307
(Diff revision 3)
> +   */
> +  async apply({ localTimeSeconds = Date.now() / 1000,
> +                remoteTimeSeconds = 0 } = {}) {
> +    // We intentionally don't use `executeBeforeShutdown` in this function,
> +    // since merging can take a while for large trees, and we don't want to
> +    // block shutdown.

trailing this comment with " - but we will recover next time" or similar might put readers of this more at ease :)

::: toolkit/components/places/tests/sync/test_bookmark_corruption.js:778
(Diff revision 3)
> +  await PlacesUtils.bookmarks.eraseEverything();
> +  await PlacesSyncUtils.bookmarks.reset();
> +});
> +
> +add_task(async function test_tombstone_as_child() {
> +  // TODO(kitcambridge): Add a folder that mentions a tombstone in its

should we have a bug on file for these? (I'm guessing we will end up with a bug to enable this new magic by default, so now seems like a reasonable time to open it and to capture dependences - having them captured in this bug doesn't seem ideal)
Attachment #8938967 - Flags: review?(markh) → review+
Blocks: 1433177
Comment on attachment 8938967 [details]
Bug 1305563 - Add a bookmark mirror and two-way merger for synced bookmarks.

https://reviewboard.mozilla.org/r/209406/#review221424

::: toolkit/components/places/SyncedBookmarksMirror.jsm:9
(Diff revision 4)
> +
> +"use strict";
> +
> +/**
> + * This file implements a mirror and two-way merger for synced bookmarks. The
> + * mirror matches the complete tree stored on the Sync server, and stages new

It's probably worth mentioning that the mirror _maintains_ a copy of the complete tree as stored on the Sync server. It's not transient.

::: toolkit/components/places/SyncedBookmarksMirror.jsm:43
(Diff revision 4)
> + * - After application, we flag all applied incoming items as unchanged, create
> + *   Sync records for the locally new and updated items in Places, and upload
> + *   the records to the server.
> + *
> + * - Once upload succeeds, we update the mirror with the uploaded records, so
> + *   that the mirror matches the server again.

It would be nice to see a couple of lines explaining the state we're in if a failure occurs at any of these points,  how retrying will get us to a consistent state, and how we're reliant on atomic uploads for doing that correctly.

::: toolkit/components/places/SyncedBookmarksMirror.jsm:74
(Diff revision 4)
> +// 1375896).
> +const DB_URL_LENGTH_MAX = 65536;
> +const DB_TITLE_LENGTH_MAX = 4096;
> +const DB_DESCRIPTION_LENGTH_MAX = 256;
> +
> +const SQLITE_MAX_VARIABLE_NUMBER = 999;

I would prefer to see this fetched from SQLite each time we're about to do some work:

`sqlite3_limit(db, SQLITE_LIMIT_VARIABLE_NUMBER)`

— it means this code doesn't need to change if we run against a system SQLite, or if we build our SQLite with different parameters.

::: toolkit/components/places/SyncedBookmarksMirror.jsm:81
(Diff revision 4)
> +// The current mirror database schema version. Bump for migrations, then add
> +// migration code to `migrateMirrorSchema`.
> +const MIRROR_SCHEMA_VERSION = 1;
> +
> +/**
> + * A mirror stores a local copy of the remote bookmark tree in a separate

'local' is redundant and potentially misleading in this sentence.

::: toolkit/components/places/SyncedBookmarksMirror.jsm:110
(Diff revision 4)
> + * records. We also want to avoid infinite sync loops, where two clients never
> + * converge, and each tries to make the other consistent by reuploading the same
> + * records. However, because we opt to continue syncing even if the remote tree
> + * is incomplete, we can still clobber partial uploaders if an inconsistent
> + * remote item was also changed locally. Once the server supports atomic
> + * uploads, we can revisit this decision.

Link to bug.

::: toolkit/components/places/SyncedBookmarksMirror.jsm:118
(Diff revision 4)
> + * last modified time, or the server last modified time of the most recent
> + * record if newer. New incoming records always replace existing records in the
> + * mirror.
> + *
> + * We delete the mirror database when the user is node reassigned, disables the
> + * bookmarks engine, or signs out.

Or if the `syncID` changes, no? Do you really mean "on local wipe"?

::: toolkit/components/places/SyncedBookmarksMirror.jsm:126
(Diff revision 4)
> +  constructor(db, { recordTelemetryEvent, finalizeAt =
> +                      AsyncShutdown.profileBeforeChange } = {}) {
> +    this.db = db;
> +    this.recordTelemetryEvent = recordTelemetryEvent;
> +
> +    // Automatically finalize the mirror on shutdown.

What happens on crash?

::: toolkit/components/places/SyncedBookmarksMirror.jsm:135
(Diff revision 4)
> +                               this.finalizeBound);
> +  }
> +
> +  /**
> +   * Sets up the mirror database connection and upgrades the mirror to the
> +   * newest schema version.

Recreates the mirror if it's corrupt. Throws on failure.

::: toolkit/components/places/SyncedBookmarksMirror.jsm:188
(Diff revision 4)
> +
> +  /**
> +   * Returns the newer of the bookmarks collection last modified time, or the
> +   * server modified time of the newest record. The bookmarks engine uses this
> +   * timestamp as the "high water mark" for all downloaded records. Each sync
> +   * fetches and stores all records newer than this time.

Strictly speaking it's not 'newer than', it's 'at least as new as' — we must resume _at_ the high-water mark to avoid missing records. This might need a little finesse to avoid fetching redundant records if you're using `serverModified`.

::: toolkit/components/places/SyncedBookmarksMirror.jsm:243
(Diff revision 4)
> +      "SyncedBookmarksMirror: store",
> +      db => db.executeTransaction(async () => {
> +        for (let record of records) {
> +          switch (record.type) {
> +            case "bookmark":
> +              MirrorLog.trace("Storing bookmark in mirror", record.cleartext);

This approach computes `record.cleartext` even if trace logging is disabled. Be kind to the GC: check the log level _outside_ each call to `.trace`.

::: toolkit/components/places/SyncedBookmarksMirror.jsm:325
(Diff revision 4)
> +
> +    // It's safe to build the remote tree outside the transaction because
> +    // `RemoteBookmarkStore.fetchTree` doesn't join to Places, only Sync
> +    // writes to the mirror, and we're holding the Sync lock at this point.
> +    MirrorLog.debug("Building remote tree from mirror");
> +    let remoteTree = await this.fetchRemoteTree(remoteTimeSeconds);

Does this record telemetry? Should it?

::: toolkit/components/places/SyncedBookmarksMirror.jsm:411
(Diff revision 4)
> +    }, this.db.TRANSACTION_EXCLUSIVE);
> +
> +    let changeRecords;
> +    try {
> +      MirrorLog.debug("Fetching records for local items to upload");
> +      changeRecords = await this.fetchLocalChangeRecords();

Lemme see if I understand this, and present a failure case.

Precursor:

1. We download a bunch of records from the server. They're now in the mirror.
2. We merge them, and on line 395 we populate `itemsToUpload`.

Here on line 411 we turn those items into in-memory change records. We drop the contents of `itemsToUpload`, notify observers, and then return the change records.

Failure case A:

* Around line 408 we crash. On restarting, we download any new changes from the server, merge again, and then when we hit line 395 we will blindly attempt to insert into `itemsToUpload`, even if a previously merged record has now been remotely deleted. `itemsToUpload` will contain the mixed results of a previous merge and a subsequent merge: `to_upload(local + remote) + to_upload(local' + remote')`. That doesn't seem right to me.

Failure case B:

* After this function returns, we fail to upload. But we've already updated Places and committed the transaction, and right here we've deleted the only persistent record of the items we need to upload. Will a subsequent sync recover from this?

Can you elaborate?
Comment on attachment 8938967 [details]
Bug 1305563 - Add a bookmark mirror and two-way merger for synced bookmarks.

https://reviewboard.mozilla.org/r/209406/#review221416

As markh said, this is very complicated (probably no moreso than necessary), but well written and has good tests. I think most of us are a little uncomfortable landing something as complex as this, but I think it is fine to land prefed off for now, to allow it to get thorough testing from QA (and us -- after this lands we probably should all enable it as dogfooding) before enabling it by default.

::: toolkit/components/places/SyncedBookmarksMirror.jsm:436
(Diff revisions 1 - 4)
> -             h.url, v.guid
> -      FROM itemsToRemove v
> +             h.url, v.guid, v.isUntagging
> +      FROM itemsRemoved v
>        LEFT JOIN moz_places h ON h.id = v.placeId
>        /* Sort deleted children before parents, to ensure that we update caches
>           in the correct order (bug 1297941). */
> -      ORDER BY v.level DESC`);
> +      ORDER BY v.level DESC, v.parentId, v.position`);

Can you to the comment and say that the rest of the ordering (e.g. `parentId` and `position`) is so that notifications are well ordered for tests?

I'd also almost argue for pulling it out of the sql since it applies to the queries below too, and it's more visible if it's syntax highlighed like a comment.

::: toolkit/components/places/SyncedBookmarksMirror.jsm:1173
(Diff revisions 1 - 4)
>      for (let chunk of PlacesSyncUtils.chunkArray(guidsToDelete,
>        SQLITE_MAX_VARIABLE_NUMBER)) {
>  
> +      let levelsValues = chunk.map(({ level }) => `(?, ${level})`).join(",");
> +      let guids = chunk.map(({ guid }) => guid);
> +      let guidsParams = new Array(guids.length).fill("?").join(",");

Nit: move guidsParams to just before the first query (initially it looked like you were going to use this inside the same query as `levelsValues`, query also, which would have meant you need to chunk by `MAX_VARIABLE_NUMBER/2` instead)
Attachment #8938967 - Flags: review?(tchiovoloni) → review+
Comment on attachment 8938967 [details]
Bug 1305563 - Add a bookmark mirror and two-way merger for synced bookmarks.

https://reviewboard.mozilla.org/r/209406/#review221424

> It would be nice to see a couple of lines explaining the state we're in if a failure occurs at any of these points,  how retrying will get us to a consistent state, and how we're reliant on atomic uploads for doing that correctly.

There's a more thorough explanation above `BufferedBookmarksEngine#_onRecordsWritten` in part 2 of this patch, I'll point to it here and add some details.

> What happens on crash?

The shutdown blocker doesn't run, we rely on the OS to close the database file, and, if we have unfinished writes, SQLite will recover when we restart and reopen the connection.

> Strictly speaking it's not 'newer than', it's 'at least as new as' — we must resume _at_ the high-water mark to avoid missing records. This might need a little finesse to avoid fetching redundant records if you're using `serverModified`.

Good catch! I think the `MAX` does this correctly, but the comment is wrong. This does use the record `serverModified` time (or the collection modified timestamp, if it's newer; we just might not have fast-forward it because the sync was interrupted while we were fetching records).

> This approach computes `record.cleartext` even if trace logging is disabled. Be kind to the GC: check the log level _outside_ each call to `.trace`.

`record.cleartext` is a static property: https://searchfox.org/mozilla-central/rev/062e1cf551f5bf3f0af33671b818f75a55ac497b/services/sync/modules/record.js#112 We use the `deferGetSet` magic for properties of the cleartext, but not the cleartext itself.

> Does this record telemetry? Should it?

Telemetry for what? We emit event telemetry as we're merging, and rely on our validator to report problems for dashboards, as we do today. We could extend `fetchRemoteOrphans` to do more of this validation eventually, though.

> Lemme see if I understand this, and present a failure case.
> 
> Precursor:
> 
> 1. We download a bunch of records from the server. They're now in the mirror.
> 2. We merge them, and on line 395 we populate `itemsToUpload`.
> 
> Here on line 411 we turn those items into in-memory change records. We drop the contents of `itemsToUpload`, notify observers, and then return the change records.
> 
> Failure case A:
> 
> * Around line 408 we crash. On restarting, we download any new changes from the server, merge again, and then when we hit line 395 we will blindly attempt to insert into `itemsToUpload`, even if a previously merged record has now been remotely deleted. `itemsToUpload` will contain the mixed results of a previous merge and a subsequent merge: `to_upload(local + remote) + to_upload(local' + remote')`. That doesn't seem right to me.
> 
> Failure case B:
> 
> * After this function returns, we fail to upload. But we've already updated Places and committed the transaction, and right here we've deleted the only persistent record of the items we need to upload. Will a subsequent sync recover from this?
> 
> Can you elaborate?

Sure thing! Expanding on my comments from IRC...

`itemsToUpload` is a `TEMP TABLE`, and we use `PRAGMA temp_store = MEMORY` (Places sets it in https://searchfox.org/mozilla-central/rev/062e1cf551f5bf3f0af33671b818f75a55ac497b/toolkit/components/places/Database.cpp#1008-1010, and we inherit it by way of https://searchfox.org/mozilla-central/rev/062e1cf551f5bf3f0af33671b818f75a55ac497b/storage/mozStorageConnection.cpp#1575 when we clone the connection), so, if we crash, its contents will vanish.

On restart, we'll recreate the table in `initializeTempMirrorEntities`, so it won't have any items. (Also, we're doing all this in the merge transaction; if anything throws, we'll roll back, so items we added to `itemsToUpload` will be dropped, too).

For case A, `itemsToUpload` will never contain the results of the previous merge. A crash or `finalize` call means we'll drop the table, and a successful merge that commits the transaction will drop the contents thanks to `DELETE FROM itemsToUpload`, which we also execute as part of the transaction.

For case B, we'll recover correctly. `itemsToUpload` is not the persistent record of the items we need to upload; the `syncChangeCounter` in Places is. (Conceptually, I think of `itemsToUpload` as a view of the changed bookmarks in Places, that's friendlier for Sync to query). On the subsequent sync, we'll download new items, merge them, and write everything with `syncChangeCounter >= 1` into `itemsToUpload`, which includes everything we tried to upload during the last sync.
Comment on attachment 8938967 [details]
Bug 1305563 - Add a bookmark mirror and two-way merger for synced bookmarks.

https://reviewboard.mozilla.org/r/209406/#review221416

> Can you to the comment and say that the rest of the ordering (e.g. `parentId` and `position`) is so that notifications are well ordered for tests?
> 
> I'd also almost argue for pulling it out of the sql since it applies to the queries below too, and it's more visible if it's syntax highlighed like a comment.

Good call. I went with a compromise and factored the comment out of the query, here and above other queries in this function.
Comment on attachment 8938967 [details]
Bug 1305563 - Add a bookmark mirror and two-way merger for synced bookmarks.

https://reviewboard.mozilla.org/r/209406/#review221424

> Good catch! I think the `MAX` does this correctly, but the comment is wrong. This does use the record `serverModified` time (or the collection modified timestamp, if it's newer; we just might not have fast-forward it because the sync was interrupted while we were fetching records).

I'm wrong, you caught a subtle bug here, too: if the collection last modified time is newer, we want to use it, since we know we're caught up when we set it. If we have a *record* that's newer than the collection modified time, we want to use `MAX(serverModified) - 1000`, so that we don't miss records that we didn't see yet, but that have the same `serverModified` time as the most recent one we downloaded. (I'm picking 1 second because the server uses seconds, and the Python docs say "not all systems provide time with a better precision than 1 second").

This means we'll end up downloading those records again (maybe an entire batch, since all records committed in a batch have the same modified time...I think), but that's better than missing records.
Comment on attachment 8938967 [details]
Bug 1305563 - Add a bookmark mirror and two-way merger for synced bookmarks.

https://reviewboard.mozilla.org/r/209406/#review221424

> I would prefer to see this fetched from SQLite each time we're about to do some work:
> 
> `sqlite3_limit(db, SQLITE_LIMIT_VARIABLE_NUMBER)`
> 
> — it means this code doesn't need to change if we run against a system SQLite, or if we build our SQLite with different parameters.

Sorry for butting in, it's worth noting places already uses 999 as the value in several locations: https://searchfox.org/mozilla-central/search?q=SQLITE_MAX_VARIABLE_NUMBER&path=,so  doesn't exactly seem like this is the right bug to do it (e.g. if the system sqlite's limit is \<999, we're broken in a few places whether or not this patch is applied).

I think that getting the value at runtime would be the responsibility of places anyway (it's a c function, so we can't exactly call it from sync without undue pain).
Comment on attachment 8938967 [details]
Bug 1305563 - Add a bookmark mirror and two-way merger for synced bookmarks.

https://reviewboard.mozilla.org/r/209406/#review221424

> It's probably worth mentioning that the mirror _maintains_ a copy of the complete tree as stored on the Sync server. It's not transient.

I'll plagiarize you here. :-) "A mirror maintains a copy of the complete tree as stored on the Sync server. It is persistent."

> Sorry for butting in, it's worth noting places already uses 999 as the value in several locations: https://searchfox.org/mozilla-central/search?q=SQLITE_MAX_VARIABLE_NUMBER&path=,so  doesn't exactly seem like this is the right bug to do it (e.g. if the system sqlite's limit is \<999, we're broken in a few places whether or not this patch is applied).
> 
> I think that getting the value at runtime would be the responsibility of places anyway (it's a c function, so we can't exactly call it from sync without undue pain).

I think Thom said everything I wanted to. :-)

In principle, this is doable: we'd need to expose `sqlite3_limit` via XPIDL on `mozIStorageConnection`, and also add an API to Sqlite.jsm. In practice, the cat's out of the bag, a hard-coded limit is how we currently chunk in Places (where we chunk at all...in other parts, we use `JSON.stringify` to escape SQL), and we should unify and fix all the callers in a follow-up bug.

I think an even better fix is to add a way to bind arrays to mozStorage and Sqlite.jsm. Bug 483318.

> Or if the `syncID` changes, no? Do you really mean "on local wipe"?

Exactly. We call the method that does this `resetClient`, so I'll use the "client reset" language here.
Comment on attachment 8938967 [details]
Bug 1305563 - Add a bookmark mirror and two-way merger for synced bookmarks.

https://reviewboard.mozilla.org/r/209406/#review221494

::: toolkit/components/places/SyncedBookmarksMirror.jsm:74
(Diff revision 4)
> +// 1375896).
> +const DB_URL_LENGTH_MAX = 65536;
> +const DB_TITLE_LENGTH_MAX = 4096;
> +const DB_DESCRIPTION_LENGTH_MAX = 256;
> +
> +const SQLITE_MAX_VARIABLE_NUMBER = 999;

That's fair. File a follow-up to make this general?

::: toolkit/components/places/SyncedBookmarksMirror.jsm:188
(Diff revision 4)
> +
> +  /**
> +   * Returns the newer of the bookmarks collection last modified time, or the
> +   * server modified time of the newest record. The bookmarks engine uses this
> +   * timestamp as the "high water mark" for all downloaded records. Each sync
> +   * fetches and stores all records newer than this time.

Yep, we'll sometimes redownload if we use a record timestamp, because we might not have seen all the records with that timestamp…

::: toolkit/components/places/SyncedBookmarksMirror.jsm:243
(Diff revision 4)
> +      "SyncedBookmarksMirror: store",
> +      db => db.executeTransaction(async () => {
> +        for (let record of records) {
> +          switch (record.type) {
> +            case "bookmark":
> +              MirrorLog.trace("Storing bookmark in mirror", record.cleartext);

You're right! I was thinking of `cleartextJSONString`.

::: toolkit/components/places/SyncedBookmarksMirror.jsm:325
(Diff revision 4)
> +
> +    // It's safe to build the remote tree outside the transaction because
> +    // `RemoteBookmarkStore.fetchTree` doesn't join to Places, only Sync
> +    // writes to the mirror, and we're holding the Sync lock at this point.
> +    MirrorLog.debug("Building remote tree from mirror");
> +    let remoteTree = await this.fetchRemoteTree(remoteTimeSeconds);

I mean old-skool timing of building the tree from the mirror. It would be interesting to know how much CPU time we burn on this, particularly correlated with number of records.

::: toolkit/components/places/SyncedBookmarksMirror.jsm:411
(Diff revision 4)
> +    }, this.db.TRANSACTION_EXCLUSIVE);
> +
> +    let changeRecords;
> +    try {
> +      MirrorLog.debug("Fetching records for local items to upload");
> +      changeRecords = await this.fetchLocalChangeRecords();

That clarifies a lot! Thanks! Hard to get a whole picture of this, even with your excellent comments.

::: toolkit/components/places/SyncedBookmarksMirror.jsm:619
(Diff revision 4)
> +      }
> +    );
> +  }
> +
> +  async storeRemoteBookmark(record, { needsMerge }) {
> +    let guid = validateGuid(record.id);

I assume that this validation is fairly weak, right? I'm thinking specifically of those misbehaving partner clients…

Or did you reject handling that case?

::: toolkit/components/places/SyncedBookmarksMirror.jsm:907
(Diff revision 4)
> +    // First, build a flat mapping of parents to children. The `LEFT JOIN`
> +    // includes items orphaned by an interrupted upload on another device.
> +    // We keep the orphans in "unfiled" until the other device returns and
> +    // uploads the missing parent.
> +    let itemRows = await this.db.execute(`
> +      SELECT v.guid, IFNULL(s.parentGuid, :unfiledGuid) AS parentGuid,

Elsewhere in this file you use string substitution to put the built-in GUIDs into the query. Perhaps do so here, too.

::: toolkit/components/places/SyncedBookmarksMirror.jsm:1105
(Diff revision 4)
> +  /**
> +   * Builds a temporary table with the merge states of all nodes in the merged
> +   * tree, rewrites tag queries, and updates Places to match the merged tree.
> +   *
> +   * Conceptually, we examine the merge state of each item, and either keep the
> +   * complete local state, takes the complete remote state, or apply a new

s/takes/take

::: toolkit/components/places/SyncedBookmarksMirror.jsm:1353
(Diff revision 4)
> +        siteURLAnno: PlacesUtils.LMANNO_SITEURI,
> +        valueState: BookmarkMergeState.TYPE.REMOTE });
> +
> +    // Record tag folder names for tag queries. Parsing query URLs one by one
> +    // is inefficient, but queries aren't common today, and we can remove this
> +    // logic entirely once bug 1293445 lands.

Could you file follow-ups for these "we can do X when bug Y lands" things, tracking the dependencies?

::: toolkit/components/places/SyncedBookmarksMirror.jsm:1451
(Diff revision 4)
> +              // Desktops and Android use `parentid` as the canonical parent.
> +              // iOS is stricter and requires both `children` and `parentid` to
> +              // match.
> +              parentid: parentRecordId,
> +              // Older Desktops use `hasDupe` and `parentName` for deduping.
> +              hasDupe: false,

I think this is a good idea and worth investigating for a few minutes.

::: toolkit/components/places/SyncedBookmarksMirror.jsm:1597
(Diff revision 4)
> +}
> +
> +this.SyncedBookmarksMirror = SyncedBookmarksMirror;
> +
> +/** Synced item kinds. Each corresponds to a Sync record type. */
> +SyncedBookmarksMirror.KIND = {

It's like you're writing Rust!

::: toolkit/components/places/SyncedBookmarksMirror.jsm:1617
(Diff revision 4)
> + * tree is inconsistent.
> + */
> +SyncedBookmarksMirror.ConsistencyError =
> +  class ConsistencyError extends Error {};
> +
> +// Indicates whether the mirror should be replaced because the database file

Are you concerned about recovery from a locked database file? (E.g., a zombie Firefox or an orphan lockfile)

::: toolkit/components/places/SyncedBookmarksMirror.jsm:2371
(Diff revision 4)
> +function validateKeyword(rawKeyword) {
> +  if (typeof rawKeyword != "string") {
> +    return null;
> +  }
> +  let keyword = rawKeyword.trim();
> +  return keyword ? keyword.toLowerCase() : null;

Perhaps add a comment to make this explicit:

```
// Drop empty keywords.
```

::: toolkit/components/places/SyncedBookmarksMirror.jsm:2530
(Diff revision 4)
> +   */
> +  static fromLocalRow(row, localTimeSeconds) {
> +    let guid = row.getResultByName("guid");
> +
> +    // Note that this doesn't account for local clock skew. `localModified`
> +    // is in *microseconds*.

So `lastModified` in `meta` is in milliseconds, and `lastModified` in Places is in microseconds. It's so scary! :D

::: toolkit/components/places/SyncedBookmarksMirror.jsm:3809
(Diff revision 4)
> +    this.cleartext = cleartext;
> +    this.synced = false;
> +  }
> +}
> +
> +// In conclusion, this is why bookmark syncing is hard.

We should get tattoos.

::: toolkit/components/places/tests/sync/head_sync.js:1
(Diff revision 4)
> +const { utils: Cu, interfaces: Ci, classes: Cc, results: Cr } = Components;

Check other files, see if this needs a license block/disclaimer.
Attachment #8938967 - Flags: review?(rnewman) → review+
Comment on attachment 8938967 [details]
Bug 1305563 - Add a bookmark mirror and two-way merger for synced bookmarks.

https://reviewboard.mozilla.org/r/209406/#review221494

> I mean old-skool timing of building the tree from the mirror. It would be interesting to know how much CPU time we burn on this, particularly correlated with number of records.

Oh, I see, that would be useful. It'd be nice to use `performance.now()` for this, but it's not exposed to chrome JS. I asked about it in #content, and, in the meantime, used old-fashioned `Date.now()` to measure and emit event telemetry for times and counts.

> That clarifies a lot! Thanks! Hard to get a whole picture of this, even with your excellent comments.

I expanded the comment above `stageItemsToUpload`. Ta!

> I assume that this validation is fairly weak, right? I'm thinking specifically of those misbehaving partner clients…
> 
> Or did you reject handling that case?

It's weak, yes. We'll end up ignoring GUIDs that Places doesn't think are valid from misbehaving clients. (Today, we'll throw and prevent you from syncing, so it's somewhat better). Places validates these elsewhere, too, and we decided the consequences of letting a bad GUID or URL (as in bug 1425764) sneak into the database were worse than diverging from the server.

> Elsewhere in this file you use string substitution to put the built-in GUIDs into the query. Perhaps do so here, too.

Hmm, I do use the bound version in more places, just string substitution in the triggers because those don't support parameters. Including them as strings makes these queries a bit more unwieldy, so I'd prefer to leave them as params, but I could be convinced otherwise. Are you concerned about consistency, accidental SQL injections, other things?

> Could you file follow-ups for these "we can do X when bug Y lands" things, tracking the dependencies?

Bug 1433368.

> Are you concerned about recovery from a locked database file? (E.g., a zombie Firefox or an orphan lockfile)

Nice catch! Not here, but we do want the engine to recover from temporary failures like this. I changed `BufferedBookmarksStore#ensureOpenMirror` in part 2 to clear out its state on error, so that it can retry on the next sync.

> So `lastModified` in `meta` is in milliseconds, and `lastModified` in Places is in microseconds. It's so scary! :D

Yes. The Sync server thinks in seconds, the mirror and JS dates think in milliseconds, and Places thinks in microseconds with millisecond precision. This is not going to be a source of subtle bugs at all... :-P

> Check other files, see if this needs a license block/disclaimer.

The guidance I got in bug 1258127, comment 306 is that it's optional for test files.
Comment on attachment 8938967 [details]
Bug 1305563 - Add a bookmark mirror and two-way merger for synced bookmarks.

https://reviewboard.mozilla.org/r/209406/#review219608

> should we have a bug on file for these? (I'm guessing we will end up with a bug to enable this new magic by default, so now seems like a reasonable time to open it and to capture dependences - having them captured in this bug doesn't seem ideal)

Bug 1433177.
Comment on attachment 8938967 [details]
Bug 1305563 - Add a bookmark mirror and two-way merger for synced bookmarks.

https://reviewboard.mozilla.org/r/209406/#review221424

> Link to bug.

Bug 1433512.
TPS is green, with and without the pref set. \o/ Let's land this thing.

http://shears.yakshaving.ninja/mriya.gif

Many thanks to Mak, Richard, Mark, and Thom for their guidance, thorough reviews, and patience!
We're sorry, Autoland could not rebase your commits for you automatically. Please manually rebase your commits and try again.

hg error in cmd: hg rebase -s f81de1f4f461558e885f8c5813933711a9586f20 -d 668941ef2a95: rebasing 444305:f81de1f4f461 "Bug 1305563 - Add a bookmark mirror and two-way merger for synced bookmarks. r=mak,markh,rnewman,tcsc"
merging tools/lint/eslint/modules.json
rebasing 444306:0987607414b7 "Bug 1305563 - Add a `BufferedBookmarksEngine` that can be toggled with a pref. r=markh,tcsc" (tip)
merging services/sync/modules/engines.js
merging services/sync/tests/unit/test_bookmark_engine.js
merging services/sync/tests/unit/test_telemetry.js
merging tools/lint/eslint/modules.json
warning: conflicts while merging services/sync/tests/unit/test_bookmark_engine.js! (edit, then use 'hg resolve --mark')
unresolved conflicts (see hg resolve, then hg rebase --continue)
Pushed by kcambridge@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/3b57cf8b3767
Add a bookmark mirror and two-way merger for synced bookmarks. r=mak,markh,rnewman,tcsc
https://hg.mozilla.org/integration/autoland/rev/7e6ae9bac16d
Add a `BufferedBookmarksEngine` that can be toggled with a pref. r=markh,tcsc
https://hg.mozilla.org/mozilla-central/rev/3b57cf8b3767
https://hg.mozilla.org/mozilla-central/rev/7e6ae9bac16d
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 60
\o/ - nice work Kit! Let the testing begin :)
Sync is still awfully slow for the initial sync, similar to the test I did in Bug 1426103 comment 17 :
Testing with Nightly build id 20180203220331 on Windows 10 64-bit, For 2238 bookmarks it took 4 minutes for the first/sending sync, 11-12 minutes for the second/receiving device sync. 
I set the preference services.sync.engine.bookmarks.buffer to true before logging in to Sync(before each sync) to test the new engine.
I have 10-20 bookmarks on the bookmarks menu and the bookmarks toolbar, and 4 folders in the bookmarks menu, with 471 to 600 bookmarks in each folder.
Hi there, I'm sorry I didn't follow up with you on bug 1426103. I filed bug 1433806 for bookmarks mirror perf and profiling, let's move our discussion there.

(In reply to nivtwig from comment #218)
> I set the preference services.sync.engine.bookmarks.buffer to true before
> logging in to Sync(before each sync) to test the new engine.

You're brave. :-) It goes without saying, but, for other folks following along on this bug, please make sure you regularly back up your `places.sqlite`. In theory, this won't trash Places; I've been using the new engine as my default for the past week, and the bugs we've fixed so far have been in the merger. Follow bug 1433177 if you're interested. We'll put up a post on the Firefox Nightly Blog and encourage more people to dogfood once this gets a QA pass and we fix more of the obvious issues.

> I have 10-20 bookmarks on the bookmarks menu and the bookmarks toolbar, and
> 4 folders in the bookmarks menu, with 471 to 600 bookmarks in each folder.

My tree is somewhat similar: just under 3500 bookmarks, about 100 in the toolbar and subfolders, over 2k in the menu, and the remainder in menu subfolders.

If you haven't already, could you please enable "Debug" logging for about:sync-log (https://addons.mozilla.org/en-US/firefox/addon/about-sync/ is easiest; it'll also give you validation results, which I'd be very curious to see for your full 35k tree), and paste the times here? (Side question: do you notice Firefox freezing or janking during the first sync at all?)

Don't enable "Trace" logging, it'll dump the full tree structure, which comes out to about 4 MB/log for me. I can only imagine what it will do for a tree with 35k. :-)

Here are my times:

> 1517726197459 Sync.Collection DEBUG GET success 200 https://sync-542-us-west-2.sync.services.mozilla.com/1.5/82835063/storage/bookmarks?full=1&sort=index&limit=1000
> 1517726197481 Sync.BrowserIDManager DEBUG _ensureValidToken already has one
> 1517726197864 Sync.Collection DEBUG GET success 200 https://sync-542-us-west-2.sync.services.mozilla.com/1.5/82835063/storage/bookmarks?full=1&sort=index&limit=1000&offset=1000
> 1517726197875 Sync.BrowserIDManager DEBUG _ensureValidToken already has one
> 1517726198279 Sync.Collection DEBUG GET success 200 https://sync-542-us-west-2.sync.services.mozilla.com/1.5/82835063/storage/bookmarks?full=1&sort=index&limit=1000&offset=2000
> 1517726198291 Sync.BrowserIDManager DEBUG _ensureValidToken already has one
> 1517726198476 Sync.Collection DEBUG GET success 200 https://sync-542-us-west-2.sync.services.mozilla.com/1.5/82835063/storage/bookmarks?full=1&sort=index&limit=1000&offset=3000
> 1517726222871 Sync.Engine.Bookmarks INFO  Records: 3493 applied, 3493 successfully, 0 failed to apply, 0 newly failed to apply, 0 reconciled.

26 seconds to download, decrypt, and store all my bookmarks in the buffer. I think we can do better here; see bug 1435588.

Merging and application:

> 1517726222876 Sync.Engine.Bookmarks.Mirror  DEBUG Building remote tree from mirror
> 1517726222962 Sync.Telemetry  DEBUG recording event: {"object":"mirror","method":"fetch","value":"remoteTree","extra":{"time":"85.97415599999658","count":"3493"}}
> 1517726222963 Sync.Engine.Bookmarks.Mirror  DEBUG Building local tree from Places
> 1517726222965 Sync.Telemetry  DEBUG recording event: {"object":"mirror","method":"fetch","value":"localTree","extra":{"time":"2.063951999996789","count":"16"}}
> 1517726222965 Sync.Engine.Bookmarks.Mirror  DEBUG Fetching content info for new mirror items
> 1517726223053 Sync.Telemetry  DEBUG recording event: {"object":"mirror","method":"fetch","value":"newRemoteContents","extra":{"time":"87.41916800000763","count":"3476"}}
> 1517726223054 Sync.Engine.Bookmarks.Mirror  DEBUG Fetching content info for new Places items
> 1517726223055 Sync.Telemetry  DEBUG recording event: {"object":"mirror","method":"fetch","value":"newLocalContents","extra":{"time":"1.1903730000049109","count":"12"}}
> 1517726223055 Sync.Engine.Bookmarks.Mirror  DEBUG Building complete merged tree
> 1517726223110 Sync.Telemetry  DEBUG recording event: {"object":"mirror","method":"merge","value":"dupe"}
> 1517726223110 Sync.Telemetry  DEBUG recording event: {"object":"mirror","method":"merge","value":"structure","extra":{"type":"new"}}
> 1517726223110 Sync.Telemetry  DEBUG recording event: {"object":"mirror","method":"merge","value":"structure","extra":{"type":"new"}}
> 1517726223111 Sync.Telemetry  DEBUG recording event: {"object":"mirror","method":"merge","value":"structure","extra":{"type":"new"}}
> 1517726223115 Sync.Engine.Bookmarks.Mirror  DEBUG Applying merged tree
> 1517726223116 Sync.Engine.Bookmarks.Mirror  DEBUG Setting up merge states table
> 1517726223215 Sync.Engine.Bookmarks.Mirror  DEBUG Rewriting tag queries in mirror
> 1517726223222 Sync.Engine.Bookmarks.Mirror  DEBUG Inserting new URLs into Places
> 1517726224899 Sync.Engine.Bookmarks.Mirror  DEBUG Updating value states for local bookmarks
> 1517726225777 Sync.Engine.Bookmarks.Mirror  DEBUG Updating structure states for local bookmarks
> 1517726225805 Sync.Engine.Bookmarks.Mirror  DEBUG Removing remotely deleted items from Places
> 1517726225837 Sync.Engine.Bookmarks.Mirror  DEBUG Flagging remotely deleted items as merged
> 1517726225837 Sync.Engine.Bookmarks.Mirror  DEBUG Recording observer notifications
> 1517726225837 Sync.Engine.Bookmarks.Mirror  DEBUG Recording observer notifications for removed items
> 1517726225838 Sync.Engine.Bookmarks.Mirror  DEBUG Recording observer notifications for changed GUIDs
> 1517726225843 Sync.Engine.Bookmarks.Mirror  DEBUG Recording observer notifications for new items
> 1517726227178 Sync.Engine.Bookmarks.Mirror  DEBUG Recording observer notifications for moved items
> 1517726227181 Sync.Engine.Bookmarks.Mirror  DEBUG Recording observer notifications for changed items
> 1517726227184 Sync.Engine.Bookmarks.Mirror  DEBUG Recording observer notifications for changed annos
> 1517726227190 Sync.Engine.Bookmarks.Mirror  DEBUG Recording notifications for changed keywords
> 1517726227192 Sync.Engine.Bookmarks.Mirror  DEBUG Staging locally changed items for upload
> 1517726228345 Sync.Engine.Bookmarks.Mirror  DEBUG Fetching records for local items to upload
> 1517726228393 Sync.Engine.Bookmarks.Mirror  DEBUG Replaying recorded observer notifications
> 1517726228393 Sync.Engine.Bookmarks.Mirror  DEBUG Notifying bookmark observers
> 1517726231033 Sync.Engine.Bookmarks.Mirror  DEBUG Notifying anno observers

9 seconds to build the local and remote trees, merge them, stage changes, apply to Places (906ms to fire the merge triggers is better than I was expecting for my tree; those are the "Updating {value, structure} states for local bookmarks" entries).

Definitely not in the range of minutes that you're seeing, though. :-/ This is also on a 2013 MBA with an SSD, running macOS 10.12.6. I've heard Windows has slower I/O, but not "two orders of magnitude" slower.
(In reply to Kit Cambridge (they/them) [:kitcambridge] from comment #219)
> (In reply to nivtwig from comment #218)
> > I set the preference services.sync.engine.bookmarks.buffer to true before
> > logging in to Sync(before each sync) to test the new engine.
> 
> You're brave. :-) It goes without saying, but, for other folks following
> along on this bug, please make sure you regularly back up your
> `places.sqlite`. In theory, this won't trash Places; I've been using the new

Of course, I backed up my bookmarks :) 
I had ~40000, and the 2238 bookmarks I tested with, are just a subset for testing the "average" case, after I backed up, and deleted the rest.

> If you haven't already, could you please enable "Debug" logging for
> about:sync-log (https://addons.mozilla.org/en-US/firefox/addon/about-sync/
> is easiest; it'll also give you validation results, which I'd be very
> curious to see for your full 35k tree), and paste the times here? (Side
> question: do you notice Firefox freezing or janking during the first sync at
> all?)

The times and more observations related to performance are in Bug 1433806 comment 2 

In addition I found a possible bug :
I set the bookmarks engine to the buffer engine by setting the pref services.sync.engine.bookmarks.buffer to true, but I noticed that it was suddenly/automatically reverted back to false several times, after some time. After a little investigation I found that disconnecting from Sync resets it back to false. It seems like a bug, the engine selection should be kept.
(In reply to Kit Cambridge (they/them) [:kitcambridge] from comment #219)
> If you haven't already, could you please enable "Debug" logging for
> about:sync-log (https://addons.mozilla.org/en-US/firefox/addon/about-sync/
> is easiest; it'll also give you validation results, which I'd be very
> curious to see for your full 35k tree), and paste the times here? (Side

I forgot to mention that I didn't find how or where to get the validation results.

I will try my full ~40k (currently) tree later, because it will probably take a lot of time, when I can already show with 2238 bookmarks (close to the average case) that it takes 12 minutes for a Sync (although it probably is slow due to the fact my original bookmarks were ~40000, most of them deleted to make the 2238 bookmarks tree, but the server still keeps deleted records, which slows the Sync process )
(In reply to nivtwig from comment #220)
> In addition I found a possible bug :
> I set the bookmarks engine to the buffer engine by setting the pref
> services.sync.engine.bookmarks.buffer to true, but I noticed that it was
> suddenly/automatically reverted back to false several times, after some
> time. After a little investigation I found that disconnecting from Sync
> resets it back to false. It seems like a bug, the engine selection should be
> kept.

By design, all prefs under services.sync.* are reset when Sync is disconnected. This can be a PITA as, eg, logging prefs are also reset, but it's the safest option for the vast majority of users, so advanced users just need to be aware of that.

IOW, that's not a bug.
No longer blocks: 1414435
Blocks: 1702686
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: