Closed Bug 676563 Opened 13 years ago Closed 7 years ago

Bookmarks sync does not sync bookmark timestamps

Categories

(Firefox :: Sync, enhancement, P1)

enhancement

Tracking

RESOLVED FIXED
Tracking Status
firefox55 --- fixed

People

(Reporter: aelilea, Assigned: tcsc)

References

(Depends on 1 open bug)

Details

(Keywords: dev-doc-needed, Whiteboard: [sync-engine-addition][sync:bookmarks][sync-activity-stream])

Attachments

(1 file)

User Agent: Mozilla/5.0 (X11; Linux i686; rv:7.0a2) Gecko/20110804 Firefox/7.0a2
Build ID: 20110804042002

Steps to reproduce:

Imported bookmarks to new device via sync


Actual results:

The "added" timestamps on the imported bookmarks is set to the time of import, forgetting the original bookmark date. Similarly for "Last modified".


Expected results:

That the "Last modified" timestamp is changed on import from sync arguably makes some sense. But imho the bookmark creation timestamp should be synced, to make it possible to order/search by creation date.
If the sync philosophy is "all devices hold the same data", then the "Last modified" timestamp should probably be synced too, making the fact that a bookmark has been synced invisible.
Component: Bookmarks & History → Firefox Sync: Backend
Product: Firefox → Mozilla Services
QA Contact: bookmarks → sync-backend
Version: 7 Branch → unspecified
Status: UNCONFIRMED → NEW
Ever confirmed: true
Severity: normal → enhancement
OS: Linux → All
Hardware: x86 → All
Whiteboard: [sync-engine-addition]
Assignee: nobody → jarguello
Assignee: jarguello → nobody
Whiteboard: [sync-engine-addition] → [sync-engine-addition][sync:bookmarks]
This is a mild form of data loss: after using Sync, it's no longer possible to usefully sort bookmarks by date.
It's worth noting that creation times stored on creating devices use the device's local clock, which is wrong surprisingly often.

The naïve approach of just adding desktop or Android's creation timestamp as a field will lead to a number of surprises, like bookmarks created in 1970, or far in the future, or just off by a few hours/days/weeks/months.
See Also: → 1299914
(In reply to Richard Newman [:rnewman] from comment #4)
> The naïve approach of just adding desktop or Android's creation timestamp as
> a field will lead to a number of surprises, like bookmarks created in 1970,
> or far in the future, or just off by a few hours/days/weeks/months.

ISTM that currently we *always* have a bookmark creation date that is off by some arbitrary time, with different devices reporting different dates for the same bookmark. That doesn't seem much better than *possibly* arbitrary dates that are at least consistent when you look at the same bookmark on different devices.

IOW, I don't see how using the local device timestamp is going to be worse in practice than using now().
Depends -- if timestamps are allocated at moment of sync, then on a device with a correct clock, starting from empty, what we have now is pretty good! The modified time of the record will usually be within a couple of seconds of bookmarking.

Where we're currently really let down is in connecting to an already populated account.

Options are some combination of:

* Transmit our local creation time to other devices as part of the record. It will sometimes be wrong. Needs a format change.
  * Optionally: track clock skew as we do already on a per-protocol basis.
  * Optionally: use the local creation time, but proactively encourage users to fix their clocks.
* Fetch a server timestamp and reupload when a page is bookmarked. Needs a format change, expensive, doesn't help with old bookmarks.
* Use the server timestamp when we download. It will often be within a second of the real time, thanks to instant sync. No format change.

Whether it's better for some users to have a misleading date (because your phone clock is wrong and you've never noticed), or all users to have no date… well, I'd rather reject the dichotomy!

Perhaps a decent combination is:

* Add creation times to the record on first upload.
* On subsequent downloads, if the record has a creation date later than the server's modified time, fix the record, and use the fixed time.
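A minimal sketch of that combination, assuming millisecond timestamps and purely illustrative names:

```
// Illustrative only: clamp an incoming creation time to the server's
// modified time for that record, and use the clamped value everywhere.
function fixDateAdded(recordDateAdded, serverModifiedMillis) {
  if (typeof recordDateAdded != "number" || isNaN(recordDateAdded)) {
    // No creation time in the record yet: "created before modified".
    return serverModifiedMillis;
  }
  // A creation date later than the last server modification is impossible.
  return Math.min(recordDateAdded, serverModifiedMillis);
}
```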
Whiteboard: [sync-engine-addition][sync:bookmarks] → [sync-engine-addition][sync:bookmarks][sync-activity-stream]
Priority: -- → P2
Priority: P2 → P1
Assignee: nobody → tchiovoloni
Adding dev-doc-needed as a blunt way of noting that Thom needs to file a PR on

https://github.com/mozilla-services/docs/blob/master/source/sync/objectformats.rst

when fixing this bug!
Status: NEW → ASSIGNED
Keywords: dev-doc-needed
Comment on attachment 8815096 [details]
Bug 676563 - Add dateAdded field to synced bookmarks

https://reviewboard.mozilla.org/r/96084/#review96258

This looks great to me, but rnewman is an encyclopedia of sync edge cases, so I'll ask him for review too.

::: services/sync/modules/engines/bookmarks.js:130
(Diff revision 1)
>        syncId: this.id,
>        parentSyncId: this.parentid,
>      };
> +    if (this.dateAdded &&
> +        typeof this.dateAdded === "number" &&
> +        this.dateAdded < Date.now()) {

see comment below - while it is almost certainly impossible for it to be exactly equal to Date.now(), I wonder if using <= here would make things more consistent?  (Assuming that the comment below is indeed sane)

::: toolkit/components/places/PlacesUtils.jsm:290
(Diff revision 1)
>      return v;
>    },
>    keyword: simpleValidateFunc(v => v === null || typeof v == "string"),
>    description: simpleValidateFunc(v => v === null || typeof v == "string"),
>    loadInSidebar: simpleValidateFunc(v => v === true || v === false),
> +  dateAdded: simpleValidateFunc(v => typeof v === "number" && v < Date.now()),

I wonder if this should be <= Date.now() - seems extremely unlikely, but possible (especially on Windows) that this might be the same tick as it was actually created locally.

(It's probably impossible for that to be true today, but might be in the future)
Attachment #8815096 - Flags: review?(markh) → review+
Attachment #8815096 - Flags: review?(rnewman)
See Also: → 1267213
It's probably worth noting that the approach taken here is 

1. to use the local timestamp (which will often be wrong).
2. to only respect this for insertion, and ignore it for update (and even then only use it if it's not in the future). The rationale here is that it's not clear the value carried by an update is ever going to be the "right" one, and it seems more likely to come from a confused client.

This seems suboptimal, but using a server timestamp would make me really uncomfortable unless we stored it somewhere other than dateAdded (Maybe this is unfounded?).
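In concrete terms, a minimal sketch of those two rules (names illustrative, not the patch's actual code):

```
// Sketch only: dateAdded from a record is honoured when inserting a new
// local bookmark (never on update), and only if it isn't in the future.
function dateAddedForInsert(recordDateAdded) {
  if (typeof recordDateAdded == "number" && recordDateAdded <= Date.now()) {
    return recordDateAdded;
  }
  return Date.now(); // missing or future value: fall back to "now"
}
```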
To talk around the issue a little:

Server timestamps are _more_ trustworthy than local timestamps, assuming the server event happens close to the local event. The reasons we tend not to use them are that a modification to a record changes the timestamp, and that a record might be uploaded long after the originating local event.

Sync traditionally avoids comparing server time to local time because we might land on the wrong side of a conflict.

Those are not really concerns for us here. Indeed, from one perspective introducing server time _improves_ the quality of our data.


A bookmark will get a local creation timestamp on its originating device.

The record will receive a server modified timestamp when first uploaded.

It'll then receive new, later modified timestamps when changed.

We can use the modification timestamp to check and bound creation timestamps.


Let's assume that bookmarks were created no earlier than 23 years ago (NCSA Mosaic) at 755112174098.

A new device looking at an existing (pre-this-bug) record knows for sure that the bookmark was created some time in the window (755112174098, modified]. That's a pretty big window, but it's better than "dunno".

Given that we don't know _any_ creation timestamp, it's perhaps reasonable to take the modified time as "Created before" -- and I think that's a reasonable interpretation for "dateAdded".

If the bookmark was created on a device that was already connected to Sync, and didn't change since, then this modified time will be only milliseconds after the bookmark was actually created.


Each time we get a later modified time, we know we can ignore it: it's later.


If we see a record without a creation time, we mark it for "weak reupload".

If another device has an earlier creation timestamp, it will eventually replace the one in the record and reupload.

In this way we crawl back to the earliest timestamp of any connected devices.

* If we no longer have the device that originally uploaded the bookmarks, we'll never get the real creation time, so I think a heuristic is acceptable. These bookmarks will show up as 'added' when first uploaded.
* If we do, and its clock is slow, we'll get that device's incorrectly-early timestamp eventually.
* If we do, and its clock is fast, we'll get the server's more accurate timestamp, which is great!
* If we do, and its clock is accurate, eventually all devices will have that correct timestamp.


How do we avoid errors?

* A remote creation timestamp later than the server modified time (or the record creation time from another device) is obviously wrong, and we'll ignore it; the uploading client has a bad clock.
* A creation timestamp before 755112174098 is too early to be realistic, so we'll ignore that, too.


The only downsides here:

* You can end up with a mix of timestamps from different machines and the server, so it's possible to end up with a reordered 'sequence' of bookmarking activity. Right now you can't even get _any_ sequence, so this is better, and timestamps from different machines are a reality of this bug.
* The meaning of "dateAdded" is weakened from "this is the exact local clock time on which this record was created" to "this is the upper bound on a wall clock of when this record was created", and it can now change as more information comes to light. This is the tradeoff we make to go from "dunno" to "best guess".
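To make the bookkeeping concrete, here's a rough sketch of the ratcheting described above; the constant, function, and reupload queue names are illustrative, not the shipped implementation:

```
// Illustrative names and constants only.
const EARLIEST_PLAUSIBLE = 755112174098; // the lower bound discussed above

// Decide which dateAdded to keep for an incoming record, and whether to
// queue a weak reupload so other devices converge on the earliest value.
function reconcileDateAdded(record, serverModifiedMillis, localDateAdded, weakReuploadIds) {
  let remote = record.dateAdded;
  // A creation time later than the server modified time, or earlier than
  // any plausible bookmark, came from a bad clock: fall back to the server
  // modified time as a "created before" upper bound.
  if (typeof remote != "number" || isNaN(remote) ||
      remote <= EARLIEST_PLAUSIBLE || remote > serverModifiedMillis) {
    remote = serverModifiedMillis;
  }
  // Ratchet backwards to the earliest timestamp any device knows about.
  let best = Math.min(remote, localDateAdded || Infinity);
  // Missing on the server, or we know an earlier value: queue a weak reupload.
  if (!record.dateAdded || best < record.dateAdded) {
    weakReuploadIds.add(record.id);
  }
  return best;
}
```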
Comment on attachment 8815096 [details]
Bug 676563 - Add dateAdded field to synced bookmarks

https://reviewboard.mozilla.org/r/96084/#review96592

::: services/sync/modules/engines/bookmarks.js:133
(Diff revision 1)
> +    if (this.dateAdded &&
> +        typeof this.dateAdded === "number" &&
> +        this.dateAdded < Date.now()) {
> +      // Only include dateAdded if it's not in the future. Note that dateAdded
> +      // is only used for insert, and is ignored for update.
> +      result.dateAdded = this.dateAdded;

I think my large comment explains some more logic I'd like to see added here and elsewhere, but right here you should also check a lower bound for the date: the release of Mosaic 0.10 on March 14th 1993.
Attachment #8815096 - Flags: review?(rnewman)
Comment on attachment 8815096 [details]
Bug 676563 - Add dateAdded field to synced bookmarks

https://reviewboard.mozilla.org/r/96084/#review100572

::: services/sync/modules/engines.js:1629
(Diff revision 2)
> +        this._log.warn("Weak reupload failed", e);
> +      }
>        Observers.notify("weave:engine:sync:uploaded", counts, this.name);
> +    } else if (this._needWeakReupload.size) {
> +      try {
> +        let counts = this._weakReupload(up);

You should probably do the small amount of work (just defining `let counts = {failed: 0};`) to avoid having this duplicated `else` block.

::: services/sync/modules/engines/bookmarks.js:130
(Diff revision 2)
>        syncId: this.id,
>        parentSyncId: this.parentid,
>      };
> +    let date = this.dateAdded;
> +    let serverModifiedDate = +this.modified * 1000;
> +    if (!date || isNaN(date) || date >= serverModifiedDate) {

Add a comment: server modified time is an upper bound on bookmark creation date, and should only be hit if a client's clock was wrong at some point.

::: services/sync/modules/engines/bookmarks.js:134
(Diff revision 2)
> +    let serverModifiedDate = +this.modified * 1000;
> +    if (!date || isNaN(date) || date >= serverModifiedDate) {
> +      date = serverModifiedDate;
> +    }
> +    if (date && !isNaN(date) &&
> +        date < Date.now() &&

`<=`.

But this bit I'm not sure about. If this client's clock is wrong, this can cause us to not persist a correct remote value!

Consider a case where `serverModifiedDate > Date.now()`!

We already know the date is bounded by the server modified time, and we trust the remote clock, so I think we can simply remove this check.
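To make the suggestion concrete, a sketch of that block with the local-clock comparison dropped (the lower-bound constant is taken from a later revision of the patch, so treat its presence here as assumed):

```
// Sketch only: bound dateAdded by the server modified time and by the
// historical lower bound, but don't compare it against the local clock.
let date = this.dateAdded;
let serverModifiedDate = +this.modified * 1000;
if (!date || isNaN(date) || date >= serverModifiedDate) {
  // Server modified time is an upper bound on the creation date.
  date = serverModifiedDate;
}
if (date > PlacesSyncUtils.bookmarks.EARLIEST_BOOKMARK_TIMESTAMP) {
  result.dateAdded = date;
}
```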

::: services/sync/tests/unit/test_bookmark_engine.js:810
(Diff revision 2)
> +
> +    let record1 = await store.createRecord(item1GUID);
> +    let record2 = await store.createRecord(item2GUID);
> +
> +    equal(item1.dateAdded, record1.dateAdded, "dateAdded in past should be synced");
> +    greater(item2.dateAdded, record2.dateAdded, "dateAdded in future should be ignored");

Which is perhaps not true, per above.

::: toolkit/components/places/PlacesSyncUtils.jsm:1039
(Diff revision 2)
>        updateInfo.syncId} does not exist`);
>    }
>  
> +  if (updateInfo.hasOwnProperty("dateAdded")) {
> +    let keepDateAdded = updateInfo.dateAdded > BookmarkSyncUtils.EARLIEST_BOOKMARK_TIMESTAMP
> +                     && updateInfo.dateAdded <= Date.now()

Same check here.

This is a fundamental question: should Places store dates that are ahead of its local clock? I'd argue yes, because we acknowledge that the local clock can be wrong.

(Even consider the case of instant/push-driven syncs for a device that's only a few minutes wrong — it'll never save bookmark creation dates that arrive via Sync!)
Attachment #8815096 - Flags: review?(rnewman)
Comment on attachment 8815096 [details]
Bug 676563 - Add dateAdded field to synced bookmarks

https://reviewboard.mozilla.org/r/96084/#review100572

> Same check here.
> 
> This is a fundamental question: should Places store dates that are ahead of its local clock? I'd argue yes, because we acknowledge that the local clock can be wrong.
> 
> (Even consider the case of instant/push-driven syncs for a device that's only a few minutes wrong — it'll never save bookmark creation dates that arrive via Sync!)

Hm, this (and the issues above) seem pretty tricky to me. I'll follow your advice, but I just want to mention a couple things that stand out to me as slightly dubious.

1. This will introduce a bug where a device whose clock is ahead could effectively make your "recently bookmarked" list much less useful. That is, its items will always take priority. (This seems minor, and just spitballing a possible solution: we could use `min(lastModified, dateAdded)` to populate that list, since `lastModified` should never be in the future, even with this scheme; a sketch follows this list.)
2. My gut feeling says that there's an issue given that the rest of the algorithm attempts to march backwards in time using the earliest available timestamp -- I can't come up with a situation where this would actually break anything, though (which is the largest part of why I don't have a problem following the advice in this issue)
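A tiny sketch of the `min(lastModified, dateAdded)` idea from point 1 (purely illustrative):

```
// Spitballing only: compute recency as the earlier of the two timestamps,
// so a far-future dateAdded from a skewed clock can't pin an item to the
// top of a "recently bookmarked" list.
function recencyOf(bookmark) {
  return Math.min(bookmark.lastModified, bookmark.dateAdded);
}
```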
Comment on attachment 8815096 [details]
Bug 676563 - Add dateAdded field to synced bookmarks

https://reviewboard.mozilla.org/r/96084/#review100572

> Hm, this (and the issues above) seem pretty tricky to me. I'll follow your advice, but I just want to mention a couple things that stand out to me as slightly dubious.
> 
> 1. This will introduce a bug where a device whose clock is ahead could effectively make your "recently bookmarked" list much less useful. That is, its items will always take priority. (This seems minor, and just spitballing a possible solution: we could use `min(lastModified, dateAdded)` to populate that list, since `lastModified` should never be in the future, even with this scheme).
> 2. My gut feeling says that there's an issue given that the rest of the algorithm attempts to march backwards in time using the earliest available timestamp -- I can't come up with a situation where this would actually break anything, though (which is the largest part of why I don't have a problem following the advice in this issue)

Less useful, but not necessarily *much* less useful. Remember that we will already upper-bound to server timestamps when applying incoming records, and we will re-sync values back, so the effect of a client with a far-future clock is that all of its bookmarking activity is eventually clamped to its sync interval.

We sync bookmarks immediately, so each new bookmark is almost immediately matched up on the server with a very close server timestamp (perhaps only milliseconds later). Any other device -- inaccurate or not -- will fix the record to `min(modified, dateAdded)` and reupload. On the inaccurate device, then, a new bookmark temporarily takes on the 'current' far-future timestamp, then a few minutes later gets a corrected value volleyed back from another client.
It sounds like we should land bug 1326243 before this?
Depends on: 1326243
No longer depends on: 1326243
Comment on attachment 8815096 [details]
Bug 676563 - Add dateAdded field to synced bookmarks

https://reviewboard.mozilla.org/r/96084/#review103980

::: services/sync/modules/engines.js:1625
(Diff revision 5)
>        postQueue.flush(true);
> +    }
> +
> +    if (this._needWeakReupload.size) {
> +      try {
> +        this._weakReupload(up, counts);

Mutable in-out blergh. Perhaps return counts from `_weakReupload`, destructure and increment here instead? Less mutability to keep in mind…

::: services/sync/modules/engines.js:1650
(Diff revision 5)
> +        throw resp;
> +      }
> +      if (!batchOngoing) {
> +        counts.sent += pendingSent;
> +        pendingSent = 0;
> +      }

Should we update any local persisted sync metadata for the local record at this point?

::: services/sync/modules/engines.js:1666
(Diff revision 5)
> +          throw ex;
> +        }
> +        continue;
> +      }
> +      let { enqueued } = postQueue.enqueue(out);
> +      if (!enqueued) {

Why would this fail? If it fails, do we want to continue posting anything, or might this introduce problems?

::: services/sync/modules/engines/bookmarks.js:129
(Diff revision 5)
>        kind: this.type,
>        syncId: this.id,
>        parentSyncId: this.parentid,
>      };
> +    let date = this.dateAdded;
> +    let serverModifiedDate = +this.modified * 1000;

`serverModifiedMillis`

::: services/sync/modules/engines/bookmarks.js:137
(Diff revision 5)
> +      // `date >= serverModifiedDate` means that a client's clock was wrong at
> +      // some point.
> +      date = serverModifiedDate;
> +    }
> +    if (date && !isNaN(date) &&
> +        date > PlacesSyncUtils.bookmarks.EARLIEST_BOOKMARK_TIMESTAMP) {

I would break all of this new stuff out into a function:

```
/**
 * Returns `undefined` if no sensible timestamp could be found. Otherwise, returns the earliest sensible timestamp between `existingMillis` and `serverMillis`.
 */
function ratchetTimestampBackwards(existingMillis, serverMillis, lowerBound) {
  …
}
```
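One possible body for that helper, sketched under the assumption that both candidates are epoch milliseconds:

```
function ratchetTimestampBackwards(existingMillis, serverMillis, lowerBound) {
  // Keep only candidates that look like plausible timestamps.
  let candidates = [existingMillis, serverMillis].filter(
    ts => typeof ts == "number" && !isNaN(ts) && ts > lowerBound);
  if (!candidates.length) {
    return undefined; // no sensible timestamp found
  }
  // Prefer the earliest plausible value.
  return Math.min(...candidates);
}
```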

::: services/sync/modules/engines/bookmarks.js:149
(Diff revision 5)
>    // `PlacesSyncUtils.bookmarks.fetch`.
>    fromSyncBookmark(item) {
>      this.parentid = item.parentSyncId;
>      this.parentName = item.parentTitle;
> +    if (item.dateAdded) {
> +      this.dateAdded = item.dateAdded

Semicolon;

::: toolkit/components/places/Bookmarks.jsm:901
(Diff revision 5)
>        if (syncChangeDelta) {
> -        let isChangingIndex = info.hasOwnProperty("index") &&
> -                              info.index != item.index;
>          // Sync stores child indices in the parent's record, so we only bump the
>          // item's counter if we're updating at least one more property in
> -        // addition to the index and last modified time.
> +        // addition to the index, last modified time, and dateAdded

Period at end of sentence.

::: toolkit/components/places/PlacesSyncUtils.jsm:582
(Diff revision 5)
>     *  - kind (all): A string representing the item's kind.
>     *  - syncId (all): The item's sync ID.
>     *  - parentSyncId (all): The sync ID of the item's parent.
>     *  - parentTitle (all): The title of the item's parent, used for de-duping.
>     *    Omitted for the Places root and parents with empty titles.
> +   *  - dateAdded (all): Date the bookmark was added, or date created in sync

Timestamp (in milliseconds) at which the bookmark was added, …

::: toolkit/components/places/PlacesUtils.jsm:290
(Diff revision 5)
>      return v;
>    },
>    keyword: simpleValidateFunc(v => v === null || typeof v == "string"),
>    description: simpleValidateFunc(v => v === null || typeof v == "string"),
>    loadInSidebar: simpleValidateFunc(v => v === true || v === false),
> +  dateAdded: simpleValidateFunc(v => typeof v === "number"

Does this validator apply to incoming records? Is it possible for incoming records to have a missing field?
Attachment #8815096 - Flags: review?(rnewman)
Comment on attachment 8815096 [details]
Bug 676563 - Add dateAdded field to synced bookmarks

https://reviewboard.mozilla.org/r/96084/#review103980

> Should we update any local persisted sync metadata for the local record at this point?

Like what?

> Why would this fail? If it fails, do we want to continue posting anything, or might this introduce problems?

Mainly if the record is too big, and we want to continue in this case. Things like 412, network errors, etc. will throw.

I'll add a comment.

> I would break all of this new stuff out into a function:
> 
> ```
> /**
>  * Returns `undefined` if no sensible timestamp could be found. Otherwise, returns the earliest sensible timestamp between `existingMillis` and `serverMillis`.
>  */
> function ratchetTimestampBackwards(existingMillis, serverMillis, lowerBound) {
>   …
> }
> ```

Well, we don't know the existingMillis until later; this is for deciding which server timestamp to use. I'll see if I can consolidate it better.

> Does this validator apply to incoming records? Is it possible for incoming records to have a missing field?

Yes, however `validateItemProperties` only validates properties that are present by default.
(In reply to Thom Chiovoloni [:tcsc] from comment #24)

> > Should we update any local persisted sync metadata for the local record at this point?
> 
> Like what?

For example: what if the local bookmark was materially changed after the sync began, but before the speculative upload? We uploaded the changed record (even though we perhaps lost upstream data as a result), but we are still, after Bug 1258127, tracking that record as being changed in the database. If it changes again remotely, there will be a conflict; if not, we'll redundantly upload it again.

As a general rule, each time we upload a record, we should mark it as such, even if it's a weak upload.


> > Why would this fail? If it fails, do we want to continue posting anything, or might this introduce problems?
> 
> Mainly if the record is too big, and we want to continue in this case.

I don't think we do. If the record is too big, we probably wouldn't have downloaded it, but if we did, and it's now too big, bookmark syncing becomes impossible -- the entire system is unable to handle the failure to upload a single record and maintain consistency.

In general, all records must be uploaded, or none.
(In reply to Richard Newman [:rnewman] from comment #26)
> (In reply to Thom Chiovoloni [:tcsc] from comment #24)
> 
> > > Should we update any local persisted sync metadata for the local record at this point?
> > 
> > Like what?
> 
> For example: what if the local bookmark was materially changed after the
> sync began, but before the speculative upload? We uploaded the changed
> record (even though we perhaps lost upstream data as a result), but we are
> still, after Bug 1258127, tracking that record as being changed in the
> database. If it changes again remotely, there will be a conflict; if not,
> we'll redundantly upload it again.
> 
> As a general rule, each time we upload a record, we should mark it as such,
> even if it's a weak upload.
> 

So you think we should explicitly ensure that the syncStatus of these items is NORMAL, and not UNKNOWN?

> 
> > > Why would this fail? If it fails, do we want to continue posting anything, or might this introduce problems?
> > 
> > Mainly if the record is too big, and we want to continue in this case.
> 
> I don't think we do. If the record is too big, we probably wouldn't have
> downloaded it, but if we did, and it's now too big, bookmark syncing becomes
> impossible -- the entire system is unable to handle the failure to upload a
> single record and maintain consistency.
> 
> In general, all records must be uploaded, or none.

What about the case where the addition of the "dateAdded" field pushes us over the limit? That's the primary case where this might happen, that I can think of at least. Giving up on syncing this one record seems like the right choice here since 

1. not having an accurate dateAdded isn't a huge deal (and clients shouldn't expect it to be 100% accurate), and 
2. syncing will continue working for the user until they make other changes to that record that need to be uploaded, which might never happen.

Not having a dateAdded field when we know a better date locally doesn't seem like it would cause any problems. This is currently the strategy we take for a number of other record types, e.g. forms, history, ...

My understanding is that capturing the notion that "this is a best effort upload" was the whole reason this was done separately from the main upload.
(In reply to Thom Chiovoloni [:tcsc] from comment #27)

> So you think we should explicitly ensure that the syncStatus of these items
> is NORMAL, and not UNKNOWN?

Very likely, yes. (Not sure what to do about the change count; perhaps Kit has an opinion.)


> What about the case where the addition of the "dateAdded" field pushes us
> over the limit? That's the primary case where this might happen, that I can
> think of at least. Giving up on syncing this one record seems like the right
> choice here since 

The worry is that some other change is present in the uploaded record. If it's just a date, fine… but how confident are we that there is no other change sneaking into the upload?

In particular, imagine a situation like:

- Download some records.
- Reconcile them.
- Upload changes since last time.
- At that moment in time, move a downloaded bookmark BB from folder EE to folder FF. (Or have an add-on do it automatically!)
- Speculatively upload everything that needed a timestamp, including BB and FF.
- FF was too big to upload, but BB made it. (EE won't be tracked for upload until next time.)
- Quit Firefox.

Now the server has a BB record that claims it's in FF, while both EE and FF agree that BB is in EE. Oops!

Ordinarily we would converge later: we'd upload EE and FF and BB, because they were all marked as changed. In normal operation an over-sized record would safely stop us from syncing bookmarks at this point: FF was too big, so we'll never commit a batch, so none of these changes will ever reach the server. But BB escaped!

Ensuring that every best-effort record makes it to the server isn't a complete fix (what if FF weren't a record we just downloaded?), but it's a step closer.

A complete fix would involve immediately triggering another bookmarks sync, augmenting the set of local changes with the set of records we'd like to reupload. If we trust our regular bookmark syncs, this is the safest approach.


> My understanding is capturing the notion that "this is a best effort upload"
> was the whole reason that this was done separately from the main upload.

Just because it's best-effort (and thus not persisted to disk) doesn't mean we don't need to make an effort to ensure consistency in our upload.
This patch checks if the record has been locally modified, and skips it if it has, before doing the weak reupload. AFAICT that resolves both issues you bring up.
Comment on attachment 8815096 [details]
Bug 676563 - Add dateAdded field to synced bookmarks

https://reviewboard.mozilla.org/r/96084/#review106134

The lint changes made this a little unpleasant to review :/

::: services/sync/modules/engines.js:1657
(Diff revisions 6 - 7)
>          pendingSent = 0;
>        }
>      });
>  
> -    for (let id of this._needWeakReupload) {
> +    return Promise.all(
> +      Array.from(this._needWeakReupload, Task.async(function* (id) {

I'm wincing at 5,000 async calls invoking 5,000 database queries returning 5,000 promises in an array.

You also run the risk of having a record sneak into the list and be modified before you finish doing all this work.

How 'bout we build the list of records, then fetch the list of changed IDs (usually very small) with a single query, wrapped in promiseSpinningly, and subtract the latter from the former?
Attachment #8815096 - Flags: review?(rnewman) → review-
Comment on attachment 8815096 [details]
Bug 676563 - Add dateAdded field to synced bookmarks

https://reviewboard.mozilla.org/r/96084/#review106134

Those came from rebasing :(

> I'm wincing at 5,000 async calls invoking 5,000 database queries returning 5,000 promises in an array.
> 
> You also run the risk of having a record sneak into the list and be modified before you finish doing all this work.
> 
> How 'bout we build the list of records, then fetch the list of changed IDs (usually very small) with a single query, wrapped in promiseSpinningly, and subtract the latter from the former?

The main reason I didn't want to do this is that we've had problems with sqlite in the past caused by building large lists of IDs (https://bugzilla.mozilla.org/show_bug.cgi?id=1310525), and so it would require chunking.

But you're right, doing this in a batch is likely worth it.
Comment on attachment 8815096 [details]
Bug 676563 - Add dateAdded field to synced bookmarks

https://reviewboard.mozilla.org/r/96084/#review106134

> The main reason I didn't want to do this is that we've had problems with sqlite in the past caused by building large lists of IDs (https://bugzilla.mozilla.org/show_bug.cgi?id=1310525), and so it would require chunking.
> 
> But you're right, doing this in a batch is likely worth it.

Forgot to mention -- the current approach doesn't have the risk of a record being modified since we check for modification after serializing the record.
Comment on attachment 8815096 [details]
Bug 676563 - Add dateAdded field to synced bookmarks

https://reviewboard.mozilla.org/r/96084/#review106134

> Forgot to mention -- the current approach doesn't have the risk of a record being modified since we check for modification after serializing the record.

Probably worth mentioning that that issue won't be triggered with a large `IN (...)` query, so we don't need to do the query in chunks. It's possible there's a better way to write the query though, certainly.
See Also: → 1335198
See Also: → 1335201
Comment on attachment 8815096 [details]
Bug 676563 - Add dateAdded field to synced bookmarks

https://reviewboard.mozilla.org/r/96084/#review109684

::: services/sync/modules/engines.js:1862
(Diff revisions 7 - 8)
>    },
>  
>    /**
> -   * Should resolve to true if a record has local changes that should prevent it's
> -   * weak reupload.
> -   */
> +   * Returns a map of (id, unencrypted record) that will be used to perform
> +   * the weak reupload. Subclasses may override this to filter out items we
> +   * shouldn't do upload as part of a weak reupload (items that have changed,

s/do upload/upload

::: services/sync/modules/engines/bookmarks.js:577
(Diff revisions 7 - 8)
>  
> -  hasLocalChanges(id) {
> -    return PlacesSyncUtils.bookmarks.hasChanged(id);
> +  buildWeakReuploadMap(idSet) {
> +    let map = SyncEngine.prototype.buildWeakReuploadMap.call(this, idSet);
> +    let ids = Async.promiseSpinningly(PlacesSyncUtils.bookmarks.getChangedIds(map.keys()));
> +    for (let id of ids) {
> +      map.delete(id);

Delete the IDs from the `idSet` *before* calling the super implementation. That way we don't call `createRecord` for records we're about to throw away.

::: toolkit/components/places/PlacesSyncUtils.jsm:120
(Diff revisions 7 - 8)
>      }
> -    return result[0].getResultByName("syncChangeCounter") == 0;
> +    let db = yield PlacesUtils.promiseDBConnection();
> +    let result = yield db.execute(`
> +      SELECT guid FROM moz_bookmarks
> +      WHERE (syncChangeCounter >= 1 OR
> +             syncStatus != ${PlacesUtils.bookmarks.SYNC_STATUS.NORMAL})

Why do you need this status check?

There are only three statuses: UNKNOWN (in which case they won't be changed), NEW (in which case we should start them with a change counter of 1), and NORMAL.

::: toolkit/components/places/PlacesSyncUtils.jsm:121
(Diff revisions 7 - 8)
> -    return result[0].getResultByName("syncChangeCounter") == 0;
> +    let db = yield PlacesUtils.promiseDBConnection();
> +    let result = yield db.execute(`
> +      SELECT guid FROM moz_bookmarks
> +      WHERE (syncChangeCounter >= 1 OR
> +             syncStatus != ${PlacesUtils.bookmarks.SYNC_STATUS.NORMAL})
> +        AND guid IN (${guids.join(",")})`);

For a client that's about to do a weak upload, do you ever think that there will be more locally changed records than weak entries?

I don't think you need to bother with this at all. Don't take GUIDs as input, don't inline GUIDs into the query. Just return *all* of the changed IDs.
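A sketch of that simplification, reusing names from the diff above (treat the exact shape as assumed, not what landed):

```
// Sketch: return every locally changed GUID in one query; the engine then
// subtracts them from the weak-reupload set.
var getChangedIds = Task.async(function* () {
  let db = yield PlacesUtils.promiseDBConnection();
  let rows = yield db.execute(`
    SELECT guid FROM moz_bookmarks
    WHERE syncChangeCounter >= 1 OR
          syncStatus != ${PlacesUtils.bookmarks.SYNC_STATUS.NORMAL}`);
  // Conversion from GUIDs to sync IDs omitted here.
  return rows.map(row => row.getResultByName("guid"));
});
```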
Attachment #8815096 - Flags: review?(rnewman) → review-
Comment on attachment 8815096 [details]
Bug 676563 - Add dateAdded field to synced bookmarks

https://reviewboard.mozilla.org/r/96084/#review109684

> Delete the IDs from the `idSet` *before* calling the super implementation. That way we don't call `createRecord` for records we're about to throw away.

This seems unsafe to me. A change could happen to one of the records during the createRecord calls, and we'd fail to exclude it from the set of records we're going to upload. The extra database IO from createRecord is unfortunate, but we'd expect the set of changed IDs to be small, so it should be rare that we do any unnecessary work, and when it happens it should be a small amount.

> For a client that's about to do a weak upload, do you ever think that there will be more locally changed records than weak entries?
> 
> I don't think you need to bother with this at all. Don't take GUIDs as input, don't inline GUIDs into the query. Just return *all* of the changed IDs.

Good point. The common case is very few or no changed IDs.
Comment on attachment 8815096 [details]
Bug 676563 - Add dateAdded field to synced bookmarks

https://reviewboard.mozilla.org/r/96084/#review111290

::: toolkit/components/places/PlacesSyncUtils.jsm:104
(Diff revisions 8 - 10)
>    syncIdToGuid(syncId) {
>      return ROOT_SYNC_ID_TO_GUID[syncId] || syncId;
>    },
>  
>    /**
> -   * Resolves to an array of the ids within the input `syncIds` (which may be
> +   * Resolves to an array of the ids of bookmarks that have a nonzero change

s/ids/GUIDs, for clarity.
Attachment #8815096 - Flags: review?(rnewman) → review+
Comment on attachment 8815096 [details]
Bug 676563 - Add dateAdded field to synced bookmarks

https://reviewboard.mozilla.org/r/96084/#review111290

> s/ids/GUIDs, for clarity.

`syncIDs` would follow the terminology used elsewhere in the file (since we run guidToSyncId on the result from the database), but I'll assume that's fine as well.
Blocks: 1337978
See Also: → 1341939
Pushed by tchiovoloni@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/3215abf8a6cb
Add dateAdded field to synced bookmarks r=markh,rnewman
https://hg.mozilla.org/mozilla-central/rev/3215abf8a6cb
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Depends on: 1392271
Blocks: 538017
Component: Firefox Sync: Backend → Sync
Product: Cloud Services → Firefox