Bug 710448 (Closed): Opened 8 years ago, Closed 8 years ago

Fix record duplicate detection and reconciling

Categories

(Firefox :: Sync, defect, P2)


Tracking


VERIFIED FIXED
mozilla12
Tracking Status
firefox11 --- verified

People

(Reporter: gps, Assigned: gps)

References

(Blocks 3 open bugs)

Details

(Whiteboard: [fixed in services])

Attachments

(1 file, 9 obsolete files)

There are a number of remaining issues from bug 709424 around duplicate records and reconciling that still need to be addressed. Here are philikon's suggestions:

* Make _handleDupe() always prefer the remote/incoming GUID.
* Fix the history engine to not even look for dupes (make _findDupe and _handleDupe no-ops; see the sketch after this list). mozIAsyncHistory::updatePlaces() applies GUID changes transparently, so we get "prefer the remote/incoming GUID" for free there. I consider the re-reconciling YAGNI because at most you're going to miss some of your history visits, but never all of them. (This is a different quality than missing a bookmark rename or a bookmark move, and perf matters for history.)
* Fix the bookmarks engine's _handleDupe(). It does too much right now, especially with the re-reconciliation that this patch brings us.
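
For illustration, a minimal sketch of the history-engine suggestion above (the HistoryEngine name and prototype placement are assumptions for this sketch, not the landed patch):

  // Disable dupe detection for history: updatePlaces() already applies
  // incoming GUID changes transparently, so there is nothing to find.
  HistoryEngine.prototype._findDupe = function _findDupe(item) {
    return null;  // Never report a duplicate.
  };
  HistoryEngine.prototype._handleDupe = function _handleDupe(item, dupeID) {
    // Intentionally empty: the remote/incoming GUID wins for free.
  };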

I'm assigning a higher priority because these bugs represent potential data loss and degraded performance.
This is highly visible in add-ons sync when users put their add-ons in different states on each client and then attempt to sync; see bug 712328, for instance.
Depends on: 712328
Attached patch Rewritten reconcile logic, v1 (obsolete) — Splinter Review
This is my first stab at rewriting _reconcile() and friends. I'm still tracking down a few xpcshell test failures. There could be some grave logic issues lingering in the code. I'm submitting this so others can see the road I'm going down.
Assignee: nobody → gps
Status: NEW → ASSIGNED
Attachment #585601 - Flags: feedback?(philipp)
Attached patch Rewritten reconcile logic, v2 (obsolete) — Splinter Review
Newer patch. New tests in syncengine_sync for all the corner cases. The only failing xpcshell tests are in clients_engine. Not sure why they are failing.
Attachment #585601 - Attachment is obsolete: true
Attachment #585601 - Flags: feedback?(philipp)
Attachment #585631 - Flags: feedback?(philipp)
Attached patch Rewritten reconcile logic, v3 (obsolete) — Splinter Review
This patch passes xpcshell and TPS tests.
Attachment #585631 - Attachment is obsolete: true
Attachment #585631 - Flags: feedback?(philipp)
Attachment #585960 - Flags: review?(philipp)
Attached patch Rewritten reconcile logic, v4 (obsolete) — Splinter Review
I reverted the _reconcile() API back to the old true/false convention, as I don't think it's appropriate to change the API in Aurora. We could do this in Nightly if we really want to (I don't really want to).
Attachment #585960 - Attachment is obsolete: true
Attachment #585960 - Flags: review?(philipp)
Attachment #586162 - Flags: review?(philipp)
Comment on attachment 586162 [details] [diff] [review]
Rewritten reconcile logic, v4

Review of attachment 586162 [details] [diff] [review]:
-----------------------------------------------------------------

Some general notes:

(a) I would prefer if you got rid of the unrelated whitespace changes to keep the patch as concise as possible.

(b) I have not read the tests yet. I have also given this only one pass so far. I think this deserves at least one more pass from me. It definitely needs super-review, so please make sure you loop Richard in as soon as possible. Either now with this patch or with an updated patch where you already address some of my comments. I'll leave this up to you. For now I'm not changing the review flags.

(c) I haven't done a case-by-case comparison, but it's likely you're changing the cases and number of times when _createRecord() and itemExists() are called. This will have perf implications. It would be good to have an idea of any changes in perf characteristics. To construct a test case with lots of dupes to exercise this, for instance, I suggest syncing two profiles and then choosing Reset Sync -> Merge in one of them.

::: services/sync/modules/engines.js
@@ +1028,5 @@
> +   *
> +   * This function essentially determines whether to apply an incoming record.
> +   *
> +   * @param item record From server to be tested for application.
> +   * @return boolean indicating whether to apply incoming record.

Nit: formatting and wording

@param item
       Incoming record to be tested for application.

@return true if incoming record should be applied, false if not.

@@ +1047,5 @@
> +
> +    this._log.debug("Reconciling " + item.id + ". Exists locally? " +
> +                    existsLocally + ". Locally modified? " + locallyModified +
> +                    ". Local age: " + localAge + " Incoming age: " +
> +                    remoteAge);

This will be printed for every incoming item. Feels more like a trace level thing to me.

@@ +1071,5 @@
> +      if (remoteAge < localAge) {
> +        return true;
> +      } else {
> +        return false;
> +      }

Just do `return remoteAge < localAge;` here. I would also get rid of the line of whitespace and squeeze it right next to the `if` above. That way it's visually much closer to the comment that describes this code.

@@ +1080,1 @@
>  

Nit: excessive whitespace. It feels like this is just the first sentence of the comment that follows.

@@ +1080,5 @@
>  
> +    // If the incoming record does not exist locally, we check for a local
> +    // duplicate existing under a different ID. By default, it isn't possible
> +    // for a duplicate to found found because _findDupe() is an empty function.
> +    // However, engines can override as appropriate.

typo: "to be found"

As much as I love your commenting style, this is almost a bit too wordy. I think saying something like the following would work too:

 We check for a local duplicate existing under a different ID. Engines have to opt into this feature by implementing the _findDupe() method.

(Why _findDupe() is an Engine and not a Store API, I don't know. I guess we'll never know... or care.)
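
For context, a hypothetical opt-in could look like the following (illustrative only; the items map and title-based matching are assumptions modeled on the test fixtures, and real engines use their own heuristics):

  // Example engine hook: report a local item that looks like the incoming
  // record as a duplicate. Returns the local GUID, or null if none.
  _findDupe: function _findDupe(item) {
    for (let [id, existing] in Iterator(this._store.items)) {
      if (id != item.id && existing.title == item.title) {
        return id;
      }
    }
    return null;
  },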

@@ +1087,5 @@
> +    // update our metadata and continue to determine what action to take on the
> +    // record.
> +    //
> +    // We always prefer the remote/incoming ID over the local one because:
> +    //   a) It is simpler than having an abitrary heuristic.

The (a) note would only be understood by somebody who knows the hysterical raisins. To anybody else it raises an eyebrow, so I would just get rid of it.

@@ +1100,4 @@
>  
> +        // It is possible the local ID exists on the server. So, we mark the ID
> +        // as deleted to ensure no traces are left.
> +        this._deleteId(dupeID);

I'm not opposed to this, but I do wonder under which circumstances you think the local dupe ID is already on the server... With a well-behaving client (Firefox Sync), this shouldn't happen. At least I can't think of a case (except accidental GUID collision.)

@@ +1101,5 @@
> +        // It is possible the local ID exists on the server. So, we mark the ID
> +        // as deleted to ensure no traces are left.
> +        this._deleteId(dupeID);
> +
> +        existsLocally = this._store.itemExists(dupeID);

I would say that it's not only safe to assume but part of the _findDupe contract that the dupe ID returned belongs in fact to an item that exists locally. So would this not always be `true`?

@@ +1107,5 @@
> +        // It might be tempting to conditionally change the item ID based on
> +        // whether the original item exists. However, itemExists() does not
> +        // imply the item is unknown. So, it is reasonable for an engine to
> +        // do something with the old ID. This caveat should disappear when the
> +        // engine API is rewritten.

I don't understand this comment. We're in the `if (!existsLocally)` path. Why would there be this conditional? What exactly is tempting here? And what exactly would an engine do with the old ID?

@@ +1117,5 @@
> +        // appropriate reconciling can be performed.
> +        if (dupeID in this._modified) {
> +          locallyModified = true;
> +          localAge = Date.now() / 1000 - this._modified[dupeID];
> +          this._modifiedIDs.push(item.id);

Um. Let's go through this.

The incoming item's GUID, the one we keep, is `item.id`. The one we have thrown away is `dupeID`. So, it's correct to re-calculate `localAge` this way. But I don't understand why we're adding `item.id` to this._modifiedIDs. 

Also, shouldn't we remove `dupeID` from `this._modified` and `this._modifiedIDs`? Otherwise they'd point to a no-longer locally existing record and we'd needlessly upload a {deleted: true} record for them (which we'd then subsequently delete again). In fact, what we should do is move the information from `dupeID` to `item.id`. So something like:

  this._modified[item.id] = this._modified[dupeID];
  delete this._modified[dupeID];
  this._modifiedIDs.push(item.id);
  this._modifiedIDs.splice(this._modifiedIDs.indexOf(dupeID), 1);

Actually, it's probably a better idea to delay the calculation of `this._modifiedIDs` until _uploadOutgoing(). That way we can manipulate `this._modified` in the reconciling stage without having to keep duplicate state up to date in `this._modifiedIDs`. (Given that you don't do the double book-keeping anywhere below, I'm gonna say we definitely want to do this.)

@@ +1139,5 @@
> +        return true;
> +      }
> +      // If the item was modified locally but isn't present, it must have
> +      // been deleted. We compare modification ages and take the youngest.
> +      else {

No else needed after a `return` (this is part of the coding style, in fact.) Makes the whole comment situation nicer, too.

@@ +1145,5 @@
> +          return true;
> +        } else {
> +          this._deleteId(item.id);
> +          return false;
> +        }

So, the record doesn't exist locally but it's locally modified, which means it was deleted. If the remote change is younger (the `if` clause), then we want to apply it and not sync up our local record. So I think you're missing a

  delete this._modified[item.id];

here. Conversely, if the local change is younger (the `else` clause), we want to sync the deletion to the server. So why are you scheduling the record for a DELETE? DELETEs are only ever needed for dupes when the local ID is preferred (which we no longer do.) IMHO this should read:

  if (remoteAge < localAge) {
    delete this._modified[item.id];
  }
  return remoteAge < localAge;

@@ +1154,5 @@
> +    // done, so we don't do anything. In the ideal world, this logic wouldn't
> +    // be here and the engine would take a record and apply it. The reason we
> +    // want to defer this logic is because it would avoid a redundant and
> +    // possibly expensive dip into the storage layer to query item state.
> +    // This should get addressed in the async rewrite, so we ignore it for now.

This comment is a bit convoluted. I think in general it's quite hurtful for perf to get each record from the database first, compare it against some other object, and then conditionally apply it. There might be some engines where it's better to do this, but I bet most would perform better if we didn't.

Is that what you meant?

@@ +1194,1 @@
>      }

Same as above: `return remoteAge < localAge;`

Actually, I've noticed a pattern. We can only compare `remoteAge < localAge` if there is in fact a `localAge`, so you have always carefully gated those comparisons on `locallyModified`. With bail-out-early and my short hand applied, all returns look like this:

  if (!locallyModified) {
    return true;
  }
  return remoteAge < localAge;

I think instead of maintaining `localAge` and `locallyModified` as separate values, we could have a simple `remoteIsNewer` boolean:

  remoteIsNewer = true;
  if (item.id in this._modified) {
    remoteIsNewer = remoteAge < (Date.now() / 1000 - this._modified[item.id]);
  }

Obviously recalculate `remoteIsNewer` in the dupe case.

::: services/sync/tests/unit/head_helpers.js
@@ +401,5 @@
>  
>    changeItemID: function(oldID, newID) {
> +    if (oldID in this.items) {
> +      this.items[newID] = this.items[oldID];
> +    }

Seems unnecessary, this change.
(In reply to Philipp von Weitershausen [:philikon] from comment #6)
> (c) I haven't done a case-by-case comparison, but it's likely you're
> changing the cases and amount of times when _createRecord() and itemExists()
> are called. This will have perf implications. It would be good to have an
> idea of any changes in perf characteristics.

Here are the old scenarios:

* Local item is modified: 1 (fetch)
* Item exists locally (same ID): old=2 (exists, fetch)
* Incoming item deleted: 0
* Item doesn't exist locally: 1 (exists) + variable (whatever findDupe does - 1 is safe assumption) + one of above points (1 or 2)

Keep in mind that some of the scenarios in the old case didn't properly detect duplicates.

In the new world:

* Incoming item deleted: 1 (exists)

Now, it gets a little complicated. If findDupe() returns something, we incur that hit + an exists. For engines that don't implement findDupe(), we have 0. If a dupe isn't found, let's say we incur 1 (although this is engine dependent). If a dupe is found, we incur 1+1=2 (dupe + exists).

* Incoming item not found: 1 (exists) + dupe detection {0,1,2}
* Incoming item found locally: 2 (exists + fetch) + dupe detection {0,1,2}

We go +1 in the incoming deleted case because we now properly handle this scenario. This was buggy before.

I'm pretty sure we stay even everywhere else. We may go +-1 in some dupe cases, but I think it all cancels out on average.

I'm not sure how I would go about collecting performance data since we really don't have a good way of measuring this kind of performance AFAIK. Besides, in the cases where we regress, we have to choose between doing it right and doing it fast. In the future, we can probably shave off more data queries by implementing better APIs. But, that is beyond the scope of this bug.

> @@ +1100,4 @@
> >  
> > +        // It is possible the local ID exists on the server. So, we mark the ID
> > +        // as deleted to ensure no traces are left.
> > +        this._deleteId(dupeID);
> 
> I'm not opposed to this, but I do wonder under which circumstances you think
> the local dupe ID is already on the server... With a well-behaving client
> (Firefox Sync), this shouldn't happen. At least I can't think of a case
> (except accidental GUID collision.)

I /think/ it might be possible if you performed Firefox upgrades/downgrades in a crazy order. And, I /think/ I inserted this line to get some tests to pass. Call me crazy. I don't think it is too painful and it *does* help ensure sanity, so I'd like to keep it.

> 
> @@ +1101,5 @@
> > +        // It is possible the local ID exists on the server. So, we mark the ID
> > +        // as deleted to ensure no traces are left.
> > +        this._deleteId(dupeID);
> > +
> > +        existsLocally = this._store.itemExists(dupeID);
> 
> I would say that it's not only safe to assume but part of the _findDupe
> contract that the dupe ID returned belongs in fact to an item that exists
> locally. So would this not always be `true`?

_findDupe() has no such formal contract. Of course, we could easily define it as part of this bug.

One use case would be deriving the ID from contents of the record. This would be tricky without violating our security best practices of randomizing the ID, but it is doable. Should we preclude this use case? I'm not opposed to it. But, I'm also wary of breaking it in a patch destined for Aurora (although it might already be broken in today's _reconcile).

> 
> @@ +1107,5 @@
> > +        // It might be tempting to conditionally change the item ID based on
> > +        // whether the original item exists. However, itemExists() does not
> > +        // imply the item is unknown. So, it is reasonable for an engine to
> > +        // do something with the old ID. This caveat should disappear when the
> > +        // engine API is rewritten.
> 
> I don't understand this comment. We're in the `if (!existsLocally)` path.
> Why would there be this conditional? What exactly is tempting here? And what
> exactly would an engine do with the old ID?

It makes more sense when you consider that existsLocally was refreshed a few lines above. The "tempting" bit would be to rewrite as:

  if (existsLocally) {
    this._store.changeItemID(dupeID, item.id);
  }

But, as the comment says, it is possible the engine knows of the existence of a record, just isn't exposing it as existing through itemExists (perhaps the engine applies a filter on which items are syncable). It would be reasonable in this case for an engine to be notified when an ID is updated.

Do we have such use cases today? No, AFAIK. But, it is something I thought of during implementation and it is trivial to support (just leave out the conditional).

> 
> @@ +1117,5 @@
> > +        // appropriate reconciling can be performed.
> > +        if (dupeID in this._modified) {
> > +          locallyModified = true;
> > +          localAge = Date.now() / 1000 - this._modified[dupeID];
> > +          this._modifiedIDs.push(item.id);
> 
> Um. Let's go through this.
> 
> The incoming item's GUID, the one we keep, is `item.id`. The one we have
> thrown away is `dupeID`. So, it's correct to re-calculate `localAge` this
> way. But I don't understand why we're adding `item.id` to this._modifiedIDs.

My original reasoning was I wanted to ensure that the reconciled record was uploaded to the server with a current timestamp. Having thought about this more, I think I'm wrong: this would effectively constitute the client lying about modification times and this could throw off reconciling on other clients because the mtimes aren't proper. I'll change it.

That being said, as I typed this, I removed that line and a sync test broke. Hmmm...

> 
> Also, shouldn't we remove `dupeID` from `this._modified` and
> `this._modifiedIDs`? Otherwise they'd point to a no-longer locally existing
> record and we'd needlessly upload a {deleted: true} record for them (which
> we'd then subsequently delete again). In fact, what we should do is move the
> information from `dupeID` to `item.id`. So something like:
> 
>   this._modified[item.id] = this._modified[dupeID];
>   delete this._modified[dupeID];
>   this._modifiedIDs.push(item.id);
>   this._modifiedIDs.splice(this._modifiedIDs.indexOf(dupeID), 1);
> 
> Actually, it's probably a better idea to delay the calculation of
> `this._modifiedIDs` until _uploadOutgoing(). That way we can manipulate
> `this._modified` in the reconciling stage without having to keep duplicate
> state up to date in `this._modifiedIDs`. (Given that you don't do the double
> book-keeping anywhere below, I'm gonna say we definitely want to do this.)

I think most of that makes sense. I'll take a stab at it.

> 
> @@ +1145,5 @@
> > +          return true;
> > +        } else {
> > +          this._deleteId(item.id);
> > +          return false;
> > +        }
> 
> So, the record doesn't exist locally but it's locally modified, which means
> it was deleted. If the remote change is younger (the `if` clause), then we
> want to apply it and not sync up our local record. So I think you're missing
> a
> 
>   delete this._modified[item.id];
> 
> here. Conversely, if the local change is younger (the `else` clause), we
> want to sync the deletion to the server. So why are you scheduling the
> record for a DELETE? DELETEs are only ever needed for dupes when the local
> ID is preferred (which we no longer do.) IMHO this should read:
> 
>   if (remoteAge < localAge) {
>     delete this._modified[item.id];
>   }
>   return remoteAge < localAge;
>

I agree. Will be in next patch.

> @@ +1154,5 @@
> > +    // done, so we don't do anything. In the ideal world, this logic wouldn't
> > +    // be here and the engine would take a record and apply it. The reason we
> > +    // want to defer this logic is because it would avoid a redundant and
> > +    // possibly expensive dip into the storage layer to query item state.
> > +    // This should get addressed in the async rewrite, so we ignore it for now.
> 
> This comment is a bit convoluted. I think in general it's quite hurtful for
> perf to get each record from the database first, compare it against some
> other object, and then conditionally apply it. There might some engines
> where it's better to do this, but I bet most would perform better if we
> didn't.
> 
> Is that what you meant?

More or less. I think it is stupid for the general _reconcile logic to perform deep record inspection like this. Instead, it should just care about high-level semantics. "Oh, the incoming record is newer: here it is." Then, we defer to the actual engine if/how that record should be applied. But, I wasn't willing to tackle that for this bug b/c it would require auditing/fixing existing engines and would bloat scope. Hence the comment.

> Actually, I've noticed a pattern. We can only compare `remoteAge < localAge`
> if there is in fact a `localAge`, so you have always carefully gated those
> comparisons on `locallyModified`. With bail-out-early and my short hand
> applied, all returns look like this:
> 
>   if (!locallyModified) {
>     return true;
>   }
>   return remoteAge < localAge;
> 
> I think instead of maintaining `localAge` and `locallyModified` as separate
> values, we could have a simple `remoteIsNewer` boolean:
> 
>   remoteIsNewer = true;
>   if (item.id in this._modified) {
>     remoteIsNewer = remoteAge < (Date.now() / 1000 -
> this._modified[item.id]);
>   }
> 
> Obviously recalculate `remoteIsNewer` in the dupe case.

I like the idea, but I'm not convinced it is worth doing. I feel both solutions are equally meh. If nothing else, localAge is used as part of a logging statement, so it needs to exist. So, that effectively means replacing "remoteAge < localAge" with remoteIsNewer. Do I want to maintain another variable? meh.

> ::: services/sync/tests/unit/head_helpers.js
> @@ +401,5 @@
> >  
> >    changeItemID: function(oldID, newID) {
> > +    if (oldID in this.items) {
> > +      this.items[newID] = this.items[oldID];
> > +    }
> 
> Seems unnecessary, this change.

The change is required because the new logic unconditionally calls findDupe() and the patched RotaryEngine always returns a dupeID for a specific ID, even if it doesn't exist locally. This may change given feedback.
Attached patch Rewritten reconcile logic, v5 (obsolete) — Splinter Review
Incorporated some of feedback (mostly style nits).
Attachment #586162 - Attachment is obsolete: true
Attachment #586162 - Flags: review?(philipp)
I'd like to underscore the importance of this bug.  I've been using Sync full-time since it was added to Firefox itself, and my bookmarks are being duplicated like crazy: nearly every bookmark has 3-10 duplicates!  Also, when I use Firefox on my netbook after not using it for a while, Firefox runs at 100% CPU for about two hours while it syncs--and a quick glance at the sync debug logs seems to indicate that it's performing a full sync by downloading every remote bookmark, even though there are probably less than 100 new ones since I last used Firefox on this system.

Considering the extensive bugginess of this code and the extensive rewriting which is going on, I wonder if it was wise to push Weave into Firefox itself rather than leaving it as an extension.  I know Firefox is releasing "early and often" now (which is another topic), but it doesn't seem like a good idea to have Sync fixes waiting on an entire Firefox release (or blocked by systems which can't upgrade the browser itself at-will).

Anyway, I look forward to Sync working reliably and well--but it seems like it's got a ways to go.
(In reply to Adam Porter from comment #9)
> I'd like to underscore the importance of this bug.  I've been using Sync
> full-time since it was added to Firefox itself, and my bookmarks are being
> duplicated like crazy: nearly every bookmark has 3-10 duplicates! 

Adam, which versions of Firefox are you using on your various devices?

Lots of bookmark reliability fixes landed after the final release of the Sync add-on, so if you have an old machine running Firefox 3.6 then you're running known-buggy code.

If every single device you're using is running Firefox 9 or higher, then I'd be very interested in a detailed report.

> when I use Firefox on my netbook after not using it for a while, Firefox
> runs at 100% CPU for about two hours while it syncs--and a quick glance at
> the sync debug logs seems to indicate that it's performing a full sync by
> downloading every remote bookmark, even though there are probably less than
> 100 new ones since I last used Firefox on this system.

Most likely your device is stuck in an error situation, and it's attempting to dig itself out. If you file a *new* bug by clicking here:

  https://bugzilla.mozilla.org/enter_bug.cgi?product=Mozilla%20Services&component=Firefox%20Sync%3A%20Backend

and attach a sync log for one of the failing syncs, we can give you more advice.
Comment on attachment 587904 [details] [diff] [review]
Rewritten reconcile logic, v5

Flagging this for my feedback so that it's in my queue. It's past 11pm, so I won't get to it tonight, but this week...
Attachment #587904 - Flags: feedback?(rnewman)
Richard, thank you for your reply.

I'm using Firefox 9.0.1 from Ubuntu Oneiric on all my systems.  My laptop never runs at 100% CPU while syncing, but the netbook does consistently.  I have the same packages and versions on both systems, although I have amd64 on the netbook and i386 on the laptop.  But I also use sync on my desktop computer with a different Sync account and the 64-bit packages, and it never does the 100% CPU thing either.  

I'd be happy to provide more info if I can.  All I know at the moment is that my bookmarks have been duplicated like crazy.  I do remember that, a few months ago, I moved all of my ancient bookmarks out of the Bookmarks Menu into Unfiled Bookmarks on one system, because the global menu was causing X to crash when it tried to display the long bookmarks menu.  But even after syncing, my netbook still had all of the bookmarks in the Menu, not in Unfiled.  But that still doesn't explain them being duplicated 8-10 times in many cases.  I think I mentioned this on another bug report somewhere...

I have noticed some errors in the sync logs.  I'd be happy to provide the logs, but they contain all of my bookmarks in-full.  Is there a script or anything available to remove or obfuscate data in the debug logs?  14,000+ (with dupes) is a bit too many to do manually.  :)
Attached patch Rewritten reconcile logic, v6 (obsolete) — Splinter Review
Attachment #587904 - Attachment is obsolete: true
Attachment #587904 - Flags: feedback?(rnewman)
Comment on attachment 590352 [details] [diff] [review]
Rewritten reconcile logic, v6

Review of attachment 590352 [details] [diff] [review]:
-----------------------------------------------------------------

::: services/sync/modules/engines.js
@@ +1046,4 @@
>  
> +    let remoteAge = AsyncResource.serverTime - item.modified;
> +    let localAge  = locallyModified ?
> +      (Date.now() / 1000 - this._modified[item.id]) : null;

Future work: estimate clock drift.

@@ +1077,5 @@
> +    // At this point the incoming record is not for a deletion and must have
> +    // data. If the incoming record does not exist locally, we check for a local
> +    // duplicate existing under a different ID. The default implementation of
> +    // _findDupe() is empty, so engines have to opt in to this functionality.
> +    // implement _findDupe(), so the function does nothing.

Dud comment line?

@@ +1091,3 @@
>  
> +        // We assume the local ID never existed on the server. Otherwise, we
> +        // should call deleteId here.

As discussed, I am concerned that two implementations with subtly different reconciling logic could result in divergent records on the server, because one will not consider the incoming record to be a dupe and then upload their own.

The duping client will re-dupe these two records every time it sees them, but subsequent changes could then cause bookmarks to reappear.

I generally think that the state after a sync should be approximately the same as after a wipe and upload from a correct and complete client, and if one doesn't call deleteId in the circumstance above then that won't be the case.

@@ +1098,5 @@
> +        existsLocally = this._store.itemExists(dupeID);
> +
> +        // Changing the item's ID isn't conditional in the odd case that the
> +        // engine knows of an item but isn't exposing it through itemExists.
> +        // If the API contract were stronger, this could be changed.

"We always change an item's ID, in case the engine knows…"

@@ +1113,5 @@
> +
> +          this._modified[item.id] = this._modified[dupeID];
> +          delete this._modified[dupeID];
> +          this._modifiedIDs.push(item.id);
> +          this._modifiedIDs.splice(this._modifiedIDs.indexOf(dupeID), 1)

I contend that _modifiedIDs is only used in outgoing, and thus you're modifying it for no reason now that _modified is not shared with the tracker. See if you can get away with computing it once with Object.keys, rather than maintaining it with two object allocations each time you touch _modified.
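
That could be as simple as this sketch (placement inside _uploadOutgoing() assumed):

  // Compute the outgoing ID list once from the change manifest, instead of
  // maintaining this._modifiedIDs by hand during reconciling.
  let modifiedIDs = Object.keys(this._modified);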

@@ +1194,5 @@
> +    //
> +    // The implementation is prone to clock skew between the client and
> +    // server. It has been this way forever, so we aren't too worried.
> +    // TODO Work around issues surrounding clock skew when resolving conflicts.
> +    this._log.debug("Both local and remote changes to record: " + item.id);

Warning. The original code should really shout about data loss, so we might as well fix that now.
Attachment #590352 - Flags: feedback+
(In reply to Richard Newman [:rnewman] from comment #14)
> > +        // We assume the local ID never existed on the server. Otherwise, we
> > +        // should call deleteId here.
> 
> As discussed, I am concerned that two implementations with subtly different
> reconciling logic could result in divergent records on the server, because
> one will not consider the incoming record to be a dupe and then upload their
> own.

This would be erroneous behaviour, even now, if they don't properly clean up. I see no reason to send a DELETE request for an object *we* never uploaded.

> > +          this._modified[item.id] = this._modified[dupeID];
> > +          delete this._modified[dupeID];
> > +          this._modifiedIDs.push(item.id);
> > +          this._modifiedIDs.splice(this._modifiedIDs.indexOf(dupeID), 1)
> 
> I contend that _modifiedIDs is only used in outgoing, and thus you're
> modifying it for no reason now that _modified is not shared with the
> tracker. See if you can get away with computing it once with Object.keys,
> rather than maintaining it with two object allocations each time you touch
> _modified.

I already told him the same thing in comment 6:

> Actually, it's probably a better idea to delay the calculation of
> `this._modifiedIDs` until _uploadOutgoing(). That way we can manipulate
> `this._modified` in the reconciling stage without having to keep duplicate
> state up to date in `this._modifiedIDs`. (Given that you don't do the double
> book-keeping anywhere below, I'm gonna say we definitely want to do this.)
(In reply to Philipp von Weitershausen [:philikon] from comment #15)

> This would be erroneous behaviour, even now, if they don't properly clean
> up. I see no reason to send a DELETE request for an object *we* never
> uploaded.

The client that recognizes a duplicate must be the one that does the deletion. The other client believes that the two records are different (else they wouldn't be there), and is maintaining the truth of that on the server; there is nothing for it to clean up. The client that merges the two must assert its truth, by marking the duped record as deleted. There's no way for the merging client to inform other clients of the de-duplication, save by uploading the canonical record and deleting the others.

If you consider the postcondition of a sync as "server and client agree for known engines", then deletion is obviously required in this kind of edge case.


> I already told him the same thing in comment 6:

I'm glad we independently agree!
Attached patch Rewritten reconcile logic, v7 (obsolete) — Splinter Review
Incorporated feedback from rnewman.

Not sure who gets final review, so assigning to 2 people. Given the nature of the changes, perhaps a dual review is appropriate?
Attachment #590352 - Attachment is obsolete: true
Attachment #590858 - Flags: review?(rnewman)
Attachment #590858 - Flags: review?(philipp)
(In reply to Gregory Szorc [:gps] from comment #17)
> Not sure who gets final review, so assigning to 2 people. Given the nature
> of the changes, perhaps a dual review is appropriate?

Not just appropriate but also *required* as per review rules (super/systems review for highly invasive changes like this.)
(In reply to Richard Newman [:rnewman] from comment #16)
> If you consider the postcondition of a sync as "server and client agree for
> known engines", then deletion is obviously required in this kind of edge
> case.

I'm not sure I'm following. Sorry for being thick, but can you outline the edge case in terms of computer A, B, and REMOTE_GUID and LOCAL_DUPE_GUID (or other appropriate variable names) and how the state of the server and the two clients is mutated in syncs?

Assuming correct client implementations and the new "remote GUID always wins" behaviour, I still see no reason to ever send a DELETE request. I would like to avoid DELETE requests at all cost, in fact, because they are probably the single most expensive call we can make on our server-side databases (though I'm not sure if that's the case if the record doesn't exist at all...)
(In reply to Philipp von Weitershausen [:philikon] from comment #19)
> (In reply to Richard Newman [:rnewman] from comment #16)
> > If you consider the postcondition of a sync as "server and client agree for
> > known engines", then deletion is obviously required in this kind of edge
> > case.
> 
> I'm not sure I'm following. Sorry for being thick, but can you outline the
> edge case in terms of computer A, B, and REMOTE_GUID and LOCAL_DUPE_GUID (or
> other appropriate variable names) and how the state of the server and the
> two clients is mutated in syncs?

I'll give it a shot. This started as intuition, so let me puzzle it through!

  Client A. Toolbar > Slashdot, AAAAAA
  Client B. Toolbar > Slashdot, BBBBBB

There are a few variants of this. One is timing-based, one is reconciling-based.

Timing:

  Client B syncs first. Server now has BBBBBB.
  Client A syncs simultaneously. Server now has AAAAAA with a later timestamp than BBBBBB.
  Client B syncs again. Reconciles local BBBBBB to remote AAAAAA.
  Client A syncs again. Never sees remote BBBBBB because timestamp is older.
  Remote BBBBBB is now orphaned.
  Client A renames Slashdot to Foobar. Record is reuploaded.
  Client B correctly renames its local bookmark.
  Client C connects. Both AAAAAA and BBBBBB are downloaded. Both Slashdot and Foobar appear on C's toolbar.

Reconciling:

Imagine that each bookmark has the same tags but differing case, or otherwise will hit some unspecified (it's all unspecified!) edge case in reconciling on only one client.

  Client B syncs first. Server now has BBBBBB.
  Client A syncs. Downloads BBBBBB, decides records are not identical. Uploads AAAAAA.
  Client B syncs again. Reconciles local BBBBBB to remote AAAAAA.

  Client A now considers both AAAAAA and BBBBBB to be different records, but Client B thinks they're the same with ID AAAAAA. The server contains both records.

  Client A renames one Slashdot to Foobar, perhaps with the user not spotting the 'duplicate'. Record is reuploaded.

If it's AAAAAA, Client B will see the rename. If it's BBBBBB, Client B will download BBBBBB and *not* reconcile it to the local record, because the records aren't titled the same; a duplicate will appear, matching Client A's state.


These two states are reached because the machine that decides that one record should be merged with another is not updating the state on the server. The way to update the state on the server is to upload and delete accordingly. If we don't delete, we trust that race conditions or differences in reconciling logic are not risks (and they are, because we don't specify how reconciling should work).

Even disregarding all this, the presence of orphan records on the server is a real danger: the moment a user moves a reconciled tree of bookmarks then adds a new device, their bookmarks on the new device will be screwed up, because both the 'live' and 'orphan' records will still be downloaded and added.


Please poke holes in my reasoning!
Oh, and scenario 3: bug. We rename a bookmark in XUL Fennec before Sync starts tracking, yadda yadda. Anything that could cause an orphaned record to remain on the server can result in later sadness.

(Think about all those users who futzed with deactivate / restore backup / move and rename / set up sync again...)
Comment on attachment 590858 [details] [diff] [review]
Rewritten reconcile logic, v7

Review of attachment 590858 [details] [diff] [review]:
-----------------------------------------------------------------

Quick mini-review before I eat something.

::: services/sync/modules/engines.js
@@ +751,2 @@
>      // Array of just the IDs from this._modified. This is what we iterate over
>      // so we can modify this._modified during the iteration.

Comment no longer means anything.

@@ +1070,5 @@
> +      // newer record.
> +      if (!locallyModified) {
> +        return true;
> +      }
> +      return remoteIsNewer;

return !locallyModified || remoteIsNewer;

@@ +1184,5 @@
>  
> +    // At this point, records are different and the local record is modified.
> +    // We resolve conflicts by record age, where the newest one wins. This does
> +    // result in data loss and should be handled by giving the engine an
> +    // opportunity to merge the records.

File a bug, put bug number here?

@@ +1185,5 @@
> +    // At this point, records are different and the local record is modified.
> +    // We resolve conflicts by record age, where the newest one wins. This does
> +    // result in data loss and should be handled by giving the engine an
> +    // opportunity to merge the records.
> +    //

Spare line. Please donate it to Goodwill.

@@ +1297,5 @@
>      // Mark failed WBOs as changed again so they are reuploaded next time.
>      for (let [id, when] in Iterator(this._modified)) {
>        this._tracker.addChangedID(id, when);
>      }
>      this._modified    = {};

I'd like to fix the code style and spacing in this method, but not in this commit.

::: services/sync/tests/tps/test_addon_nonrestartless_xpi.js
@@ +79,5 @@
>  // And we uninstall it
> +
> +// There is a race condition in the test. The add-on could get uninstalled
> +// before the AddonsReconciler is listening. So, we perform a sync both
> +// before and after, just in case.

This kinda sounds like it slipped in here by mistake. Did you intend for it to be in this part of this bug?

::: services/sync/tests/unit/test_syncengine_sync.js
@@ +28,5 @@
> +    rotary: {}
> +  };
> +
> +  const USER = "foo";
> +  Svc.Prefs.set("clusterURL", "http://localhost:8080/");

Set serverURL too. Good practice, avoids test code occasionally hitting up prod...
Blocks: 720592
(In reply to Richard Newman [:rnewman] from comment #22)
> @@ +1070,5 @@
> > +      // newer record.
> > +      if (!locallyModified) {
> > +        return true;
> > +      }
> > +      return remoteIsNewer;
> 
> return !locallyModified || remoteIsNewer;

Yes, that is what the boolean logic collapses to. But, I kinda like the expressiveness, readability, and consistency of the code [throughout the function]. I'm going to ignore this comment unless you force me otherwise.
 
> 
> @@ +1297,5 @@
> >      // Mark failed WBOs as changed again so they are reuploaded next time.
> >      for (let [id, when] in Iterator(this._modified)) {
> >        this._tracker.addChangedID(id, when);
> >      }
> >      this._modified    = {};
> 
> I'd like to fix the code style and spacing in this method, but not in this
> commit.

The spacing looks wrong because I deleted the line below it and didn't readjust the '='. I'll fix it, despite your comment.

> 
> ::: services/sync/tests/tps/test_addon_nonrestartless_xpi.js
> @@ +79,5 @@
> >  // And we uninstall it
> > +
> > +// There is a race condition in the test. The add-on could get uninstalled
> > +// before the AddonsReconciler is listening. So, we perform a sync both
> > +// before and after, just in case.
> 
> This kinda sounds like it slipped in here by mistake. Did you intend for it
> to be in this part of this bug?

I discovered (and fixed) the issue when testing this patch. At the time, I wasn't sure if it was related to reconciling or what. I /think/ I eventually realized it was a race condition (by looking at logs) and added the comment. I agree it can probably be split out into a new bug.

> 
> ::: services/sync/tests/unit/test_syncengine_sync.js
> @@ +28,5 @@
> > +    rotary: {}
> > +  };
> > +
> > +  const USER = "foo";
> > +  Svc.Prefs.set("clusterURL", "http://localhost:8080/");
> 
> Set serverURL too. Good practice, avoids test code occasionally hitting up
> prod...

The code is wrong in the whole file then. I'll fix it in the new code.
Attached patch Rewritten reconcile logic, v8 (obsolete) — Splinter Review
Incorporated latest feedback.
Attachment #590858 - Attachment is obsolete: true
Attachment #590858 - Flags: review?(rnewman)
Attachment #590858 - Flags: review?(philipp)
Attachment #590975 - Flags: review?(rnewman)
Attachment #590975 - Flags: review?(philipp)
Comment on attachment 590975 [details] [diff] [review]
Rewritten reconcile logic, v8

Review of attachment 590975 [details] [diff] [review]:
-----------------------------------------------------------------

I'm feeling pretty good about this. Let's see if philikon or mconnor spot something I haven't. Just nits.

::: services/sync/modules/engines.js
@@ +1128,5 @@
> +    // state seamlessly.
> +
> +    if (!existsLocally) {
> +      // If the item doesn't exist locally and we have no local modifications
> +      // to the item (implying it was deleted), always apply the remote item.

That parenthetical should either be "local modifications would imply its deletion" or "implying that it was not deleted".

::: services/sync/tests/unit/test_syncengine_sync.js
@@ +29,5 @@
> +  };
> +
> +  const USER = "foo";
> +  Svc.Prefs.set("clusterURL", "http://localhost:8080/");
> +  Svc.Prefs.set("serverURL", "http://localhost:8080/");

const these URLs?

(Maybe even share the const with the rest of the file...)

@@ +428,5 @@
> +  do_check_true(engine._store.itemExists("entry"));
> +
> +  engine._sync();
> +
> +  do_check_eq(1, Object.keys(engine._store.items).length);

do_check_attribute_count(engine._store.items, 1);

@@ +498,5 @@
> +
> +  engine._sync();
> +
> +  // Since the remote change is newer, the incoming item should exist locally.
> +  do_check_eq(1, Object.keys(engine._store.items).length);

As before.

@@ +507,3 @@
>    do_check_eq(1, collection.count());
> +  let wbo = collection.wbo("DUPE_INCOMING");
> +  let payload = JSON.parse(JSON.parse(wbo.payload).ciphertext);

IIRC you can use collection.payload("DUPE_INCOMING") here to skip directly to wbo.payload.

@@ +580,5 @@
> +  let collection = server.getCollection(user, "rotary");
> +  do_check_eq(1, collection.count());
> +  let wbo = collection.wbo("DUPE_INCOMING");
> +  do_check_neq(undefined, wbo);
> +  let payload = JSON.parse(JSON.parse(wbo.payload).ciphertext);

As before.
Attachment #590975 - Flags: review?(rnewman) → review+
(In reply to Richard Newman [:rnewman] from comment #20)
> Please poke holes in my reasoning!

I can't find any. I had not thought of race conditions. B2G is rotting my services mind.

I withdraw my objections against DELETEs on the server in the discussed case.
(In reply to Richard Newman [:rnewman] from comment #25)
> @@ +507,3 @@
> >    do_check_eq(1, collection.count());
> > +  let wbo = collection.wbo("DUPE_INCOMING");
> > +  let payload = JSON.parse(JSON.parse(wbo.payload).ciphertext);
> 
> IIRC you can use collection.payload("DUPE_INCOMING") here to skip directly
> to wbo.payload.

This doesn't work. I'm not sure if the implementation is purposefully busted:

  payloads: function () {
    return this.wbos().map(function (wbo) {
      return JSON.parse(JSON.parse(wbo.payload).ciphertext);
    });
  },

  payload: function payload(id) {
    return this.wbo(id).payload;
  },
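
If payload() was meant to mirror payloads(), a consistent version would unwrap the ciphertext too. A sketch, assuming the same test-server helper object:

  payload: function payload(id) {
    // Match payloads(): parse the WBO envelope, then the ciphertext.
    return JSON.parse(JSON.parse(this.wbo(id).payload).ciphertext);
  },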
Attached patch Rewritten reconcile logic, v9 (obsolete) — Splinter Review
Incorporated rnewman's nits.
Attachment #590975 - Attachment is obsolete: true
Attachment #590975 - Flags: review?(philipp)
Attachment #591322 - Flags: review?(rnewman)
Attachment #591322 - Flags: review?(philipp)
Comment on attachment 591322 [details] [diff] [review]
Rewritten reconcile logic, v9

Review of attachment 591322 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good overall. Basically only have style nits, can't find any holes in the logic. r=me with nits addressed.

::: services/sync/modules/engines.js
@@ +1043,2 @@
>  
> +    // TODO Handle clock drift better.

No new TODO comments without a bug on file (and a mentioning of said bug here)!

@@ +1044,5 @@
> +    // TODO Handle clock drift better.
> +    let remoteAge = AsyncResource.serverTime - item.modified;
> +    let localAge  = locallyModified ?
> +      (Date.now() / 1000 - this._modified[item.id]) : null;
> +    let remoteIsNewer = remoteAge < localAge;

Any reason why you didn't go for the algorithm that I suggested in comment 6, defaulting remoteIsNewer to `true`? (Right now it will be `false` if the item isn't `locallyModified`.) That way you make stanzas like

  if (!locallyModified) {
    return true;
  }
  return remoteIsNewer;

more concise because they can just be

  return remoteIsNewer;

@@ +1091,5 @@
>  
> +        // The current API contract does not mandate that the ID returned by
> +        // _findDupe() actually exists. Therefore, we have to perform this
> +        // check.
> +        existsLocally = this._store.itemExists(dupeID);

Is that really part of the contract? All of the existing stores can only dupe if the record is present locally. Maybe the add-ons engine is different here, but we can easily make it do that check. I'd rather retrofit that API contract to mean "find a dupe that actually *exists* locally, than always paying the penalty of *another* call to the store here.

@@ +1148,5 @@
> +      }
> +
> +      this._log.trace("Ignoring incoming item because the local item's " +
> +                      "deletion is newer.");
> +      return false;

Stylistically, I would prefer to do

  if (remoteIsNewer) {
    delete this._modified[item.id];
  }
  return remoteIsNewer;

And again, you can save the `if (!locallyModified)` stanza if you chose a different default value for `remoteIsNewer` (see above).
Attachment #591322 - Flags: review?(philipp) → review+
Comment on attachment 591322 [details] [diff] [review]
Rewritten reconcile logic, v9

Review of attachment 591322 [details] [diff] [review]:
-----------------------------------------------------------------

Concur with philikon, but happy for you to address the larger comments in a follow-up. I don't want to hold this up any longer than is necessary for correctness.
Attachment #591322 - Flags: review?(rnewman) → review+
Blocks: 721181
(In reply to Philipp von Weitershausen [:philikon] from comment #29)
> @@ +1044,5 @@
> > +    // TODO Handle clock drift better.
> > +    let remoteAge = AsyncResource.serverTime - item.modified;
> > +    let localAge  = locallyModified ?
> > +      (Date.now() / 1000 - this._modified[item.id]) : null;
> > +    let remoteIsNewer = remoteAge < localAge;
> 
> Any reason why you didn't go for the algorithm that I suggested in comment
> 6, defaulting  remoteIsNewer to `true` (right now it will be `false` if the
> item isn't `locallyModified`). That way you make stanzas like
> 
>   if (!locallyModified) {
>     return true;
>   }
>   return remoteIsNewer;
> 
> more concise because they can just be
> 
>   return remoteIsNewer;

Actually, yes. Per other feedback, I added trace-level logs before every return which explain exactly what is being done. I forgot to add these for a few return cases. I'd like to be as explicit as possible in the logged message, which requires having 2 variables. I'll add the missing log lines in the next patch.

> @@ +1091,5 @@
> >  
> > +        // The current API contract does not mandate that the ID returned by
> > +        // _findDupe() actually exists. Therefore, we have to perform this
> > +        // check.
> > +        existsLocally = this._store.itemExists(dupeID);
> 
> Is that really part of the contract? All of the existing stores can only
> dupe if the record is present locally. Maybe the add-ons engine is different
> here, but we can easily make it do that check. I'd rather retrofit that API
> contract to mean "find a dupe that actually *exists* locally, than always
> paying the penalty of *another* call to the store here.

I'm going to make a follow-up for this. Since this extra store hit only happens if a dupe is found and that is rare, I'm not too worried about the performance implications.

> @@ +1148,5 @@
> > +      }
> > +
> > +      this._log.trace("Ignoring incoming item because the local item's " +
> > +                      "deletion is newer.");
> > +      return false;
> 
> Stylistically, I would prefer to do
> 
>   if (remoteIsNewer) {
>     delete this._modified[item.id];
>   }
>   return remoteIsNewer;
> 
> And again, you can save the `if (!locallyModified)` stanza if you chose a
> different default value for `remoteIsNewer` (see above).

I disagree on the grounds I like the explicit logging. If we weren't logging, yes, I think collapsing the logic is a good idea.
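
For illustration, the explicit style being argued for looks roughly like this (a sketch; the exact log messages are assumptions, not the landed patch):

  if (!locallyModified) {
    this._log.trace("Applying incoming record: no local modifications.");
    return true;
  }
  if (remoteIsNewer) {
    this._log.trace("Applying incoming record: it is newer than the " +
                    "local change.");
    return true;
  }
  this._log.trace("Ignoring incoming record: local change is newer.");
  return false;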
Pushed to s-c: https://hg.mozilla.org/services/services-central/rev/93c071e2d4ea
Whiteboard: [fixed in services]
This is the patch that landed in s-c. It is v9 with minor nits from philikon's review addressed.

[Approval Request Comment]
Regression caused by (bug #): 
User impact if declined: Add-on sync (added in Firefox 11) will not work properly in some scenarios. Other Sync engines have likely been silently losing data as a result of the old implementation as well.
Testing completed (on m-c, etc.): None yet. Services QA will be on this bug shortly.
Risk to taking this patch (and alternatives if risky): It doesn't have much exposure in m-c yet. We could have broken something we don't test. The worst-case scenario is that breakage is noticed in the beta cycle. We back out this patch, release a new beta, and assess whether to hide add-on sync from Firefox 11.
Attachment #591322 - Attachment is obsolete: true
Attachment #591593 - Flags: review+
Attachment #591593 - Flags: approval-mozilla-aurora?
Blocks: 721194
Comment on attachment 591593 [details] [diff] [review]
Rewritten reconcile logic, v10

[Triage Comment]
Fixes a highly visible add-on sync issue. Approved for Aurora 11.
Attachment #591593 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Depending on how timing works out, I might land this with Bug 721271 and save you some time, gps.
I think rnewman meant status-firefox11 = fixed

However, I am unable to verify it with this s-c client train build.

The s-c build I am trying to verify this on was built with a bookmark tagging regression in Firefox (which appears fixed in m-c today). Trying to replicate the comment #20 scenarios is impossible with that Fx bug. Bug 712328 still occurs in add-on reconciling.

If the xpcshell test cases have passed and bug 712328 wasn't intended to be fixed by this, then let's roll the train to m-c and I'll attempt to verify there with working bookmark tagging.
(In reply to Tracy Walker [:tracy] from comment #37)
> I think rnewman meant status-firefox11 = fixed

Good catch, thanks.

> If expcshell test cases have passed and 712328 wasn't intended fixed with
> this.  Then let's roll the train to m-c and I'll attempt to verify there
> with working BM tagging.

wfm if gps is happy.
Well, it was just theorized that bug 712328 was due to bad reconciling. I suppose it isn't. Time to work on bug 712328 then.
Merged to m-c: https://hg.mozilla.org/mozilla-central/rev/93c071e2d4ea
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla12
I have done my best to create conflicts requiring reconciliation in bookmarks, history and passwords.  Sync appears to be doing the right thing. Marking verified. However, we should keep our eyes open for further reports.
Status: RESOLVED → VERIFIED
> --- Comment #33 from Gregory Szorc [:gps] <gps@mozilla.com> 2012-01-25 13:42:40 PST ---
> User impact if declined: Add-on sync (added in Firefox 11) will not work
> properly in some scenarios. Other Sync engines have likely been silently losing
> data as a result of the old implementation as well.

Would you please elaborate on this statement?  Does this mean that
some of my bookmarks may have been being silently deleted all this
time that I've been using Weave/Sync?
(In reply to Adam Porter from comment #42)
> Would you please elaborate on this statement?  Does this mean that
> some of my bookmarks may have been being silently deleted all this
> time that I've been using Weave/Sync?

In most circumstances, probably not. The specific circumstances where data loss occurred were in the handling of deleted records. In short, the deletion would be ignored or a previously deleted item would be restored.

If you are seeing issues where Sync is silently deleting bookmarks (which I believe has been an issue in past releases but has since been rectified) or see any other kind of data corruption, please file a bug! Data loss/corruption bugs are typically treated with high priority.
(In reply to Gregory Szorc [:gps] from comment #43)
> In most circumstances, probably not. The specific circumstances where data
> loss occurred was with regards to handling of deleted records. In short, the
> deletion would be ignored or a previous deleted item would be restored.
> 
> If you are seeing issues where Sync is silently deleting bookmarks (which I
> believe has been an issue in past releases but has since been rectified) or
> see any other kind of data corruption, please file a bug! Data
> loss/corruption bugs are typically treated with high priority.

Thanks for the clarification.  Could this explain the many, many duplicates (e.g. bookmarks duplicated 5+ times) I've seen since using Weave?

I've been using Firefox since it was Phoenix and have accumulated many bookmarks, including ones imported from before Phoenix.  Even if many were silently deleted, the likelihood of my noticing is quite low--and the problem with duplication I've seen since using Weave makes it even more mind-boggling.  Do I need to run some kind of tool to monitor Firefox's bookmarks externally to make sure they aren't disappearing?  Is anyone else doing so?  Are there external unit tests?  I hate to be paranoid, but given the fundamental rewrites that some parts of Weave seem to be undergoing, I'm not sure it's trustworthy yet.
(In reply to Adam Porter from comment #44)

> Thanks for the clarification.  Could this explain the many, many duplicates
> (e.g. bookmarks duplicated 5+ times) I've seen since using Weave?

Unlikely.

If you've been using Weave, and versions of Sync prior to Firefox 4 (and even before Firefox 8 or 9, to an extent), then you've likely simply encountered other bugs.

Weave, and the Sync addon for most of its life, were full of bookmarks bugs. Those have largely been fixed, most of them more than a year ago.

> Do I need to run some kind of tool to monitor Firefox's
> bookmarks externally to make sure they aren't disappearing?

That seems unlikely.

Any data that you consider precious should be backed up. Time Machine + Firefox's own snapshotting should suffice.
Adam: If you are using the Weave/Sync extension, please uninstall it right now and upgrade to a Firefox that has Sync built-in. If you are just referring to Sync by its old name of Weave, then don't worry. But, please say "Firefox Sync" in the future [if referring to the version of Sync built into the desktop Firefox application].

Bookmarks are the most complicated item in Firefox to sync. Historically, there have been a number of bugs around bookmark sync. Some involve duplication and data loss. We do our best to ensure that these types of bugs never make it to release/stable builds. However, if you are running beta software (like Android Sync builds from a few weeks back), have old Firefox clients (like those running the Weave extension), or are running a 3rd party client [that may not fully support the bookmarks data model], then these increase your exposure to potential data integrity loss.

Also, Sync is not a backup service. So, you should periodically back up your profile. If you just care about bookmarks, there is a backup option in the bookmarks manager under the "Import and Backup" drop-down.

And, yes, Sync does have comprehensive tests for bookmarks. If you find something breaking, please don't hesitate to file a bug.
Oops, I thought "Weave" was the cool kids' codename for it.  "Sync" sounds so boring by comparison.  ;)  No, I'm using 10.0.2, on Linux (not a Mac).  I certainly make regular backups of my Firefox profile.  My point is that, I have many bookmarks, and I don't go through all of them all the time.  If one silently disappeared, I might never notice until much later when I go to access it, and then I might conclude that I never bookmarked it in the first place.  Even then, trying to dig it out of some ancient backup would be impractical, to say the least.  Guess that's why we have Google--hey, who needs bookmarks, anyway?

Don't get me wrong, I'm glad for Sync.  But I have been concerned that it was released as stable software before it was ready.  It's been part of Firefox proper for some time now, but it hasn't had good conflict resolution all this time?  :/
Blocks: 712328
No longer depends on: 712328
Component: Firefox Sync: Backend → Sync
Product: Cloud Services → Firefox