Closed Bug 709424 Opened 12 years ago Closed 12 years ago

Reconciling doesn't work correctly after duplicate detected

Categories

(Firefox :: Sync, defect, P2)

defect

Tracking

()

VERIFIED FIXED
mozilla11

People

(Reporter: gps, Assigned: gps)

References

Details

(Whiteboard: [verified in services])

Attachments

(1 file, 1 obsolete file)

philikon and I think we found a bug in the reconciler.

Steps to reproduce:

1) Set up 2 profiles with the same entity but different IDs, let's call them profiles A and B and the record IDs X and Y.
2) Make a change on record Y in profile B 
3) Perform sync on profile B
4) Delete record X in profile A
5) Perform sync on A

In A, the tracker has X marked as a changed ID. And, the engine has _findDupe() implemented. During sync, Y arrives on A. Let's follow along in Engine._reconcile():

1) item.id does not exist in _modified b/c X != Y
2) store.itemExists(Y) is false because X != Y and X is deleted anyway.
3) Incoming record isn't deleted, so step 2.5 passes
4) In the _findDupe(), we finally detect that X and Y refer to the same entity. The local ID, X, is preferred over the incoming one, Y.
5) The incoming record's ID is changed from Y to X.
6) _reconcile() returns true, telling the engine to apply the record
7) The incoming record with ID X eventually makes its way to store.applyIncoming()
8) store.applyIncoming() calls store.itemExists(X), which returns true because the store knows about the deleted record.
9) store.update() is called. This makes no sense because the local record X was deleted and update() makes no sense.

The underlying problem is that dupe detection occurs at the end of reconcile and handling duplication assumes that we just want to figure out which ID to take and do nothing more. The fact that there could be actual changes to reconcile on duplication is ignored!

I believe the solution is to effectively re-process/reconcile the record with its new ID after duplication.
I /think/ the bug also occurs for non-deletes as well. When an incoming record arrives for a dupe item, the incoming record will *always* be preferred over the local state, even if it is older. Again, this is because duplication checking occurs *after* earlier reconciling steps (which would properly determine whether to take the local or incoming state from record age).
+ Data loss bug.
- Apparently old.
+ Provoked by add-on sync?
+ Might not be noticed in the wild (an unpleasant kind of data loss).

If you can get a fix and a test done by Monday, I'd like to get this into Aurora. If it's a small and stable fix, I'll ask for Beta approval.
Assignee: nobody → gps
Severity: normal → major
Status: NEW → ASSIGNED
Priority: -- → P2
Add-on sync will likely provoke this. I did uncover this bug when diagnosing a failure in an add-on sync test I wrote ;)
Blocks: 534956
This patch makes reconciling on duplicate records more robust by feeding the de-duped record back through Engine._reconcile(). In the 2nd pass, an appropriate action is taken instead of the incoming record always winning out.

New tests ensure that the local record wins out. The tests fail with the old _reconcile() code by pass with the changes. All xpcshell tests pass with the new code.
Attachment #581174 - Flags: review?(rnewman)
Forgot changes to head_helpers.js to support xpcshell tests.
Attachment #581174 - Attachment is obsolete: true
Attachment #581174 - Flags: review?(rnewman)
Attachment #581176 - Flags: review?(rnewman)
Attachment #581176 - Flags: feedback?(philipp)
Comment on attachment 581176 [details] [diff] [review]
More robust reconciling after dupe record detection, v2

Review of attachment 581176 [details] [diff] [review]:
-----------------------------------------------------------------

Review of test part.

::: services/sync/tests/unit/test_syncengine_sync.js
@@ +367,5 @@
> +
> +  let engine = new RotaryEngine();
> +
> +  let contents = {
> +    meta: {global: {engines: {rotary:  {version: engine.version,

Nit: s/rotary:  /rotary: /

@@ +378,5 @@
> +
> +  Svc.Prefs.set("clusterURL", "http://localhost:8080/");
> +  Svc.Prefs.set("username", USER);
> +
> +  let now = Date.now()/1000 - 10;

Nit: spaces around arithmetic operators, please.

@@ +392,5 @@
> +  let wbo = new ServerWBO("DUPE_INCOMING", record, now + 2);
> +  server.insertWBO(USER, "rotary", wbo);
> +
> +  engine._store.items = {};
> +  engine._tracker.addChangedID("DUPE_LOCAL", now + 3);

Add a comment: "Simulate a deleted local item."

@@ +404,5 @@
> +  // deleted after the incoming record was updated. The server should also have
> +  // deleted the incoming record. Since the local record only existed on the
> +  // client at the beginning of the sync, it shouldn't exist on the server
> +  // after.
> +  do_check_eq(0, Object.keys(engine._store.items).length);

do_check_empty(engine._store.items);

(head_helpers.js:222)

@@ +428,5 @@
> +  const USER = "foo";
> +  Svc.Prefs.set("clusterURL", "http://localhost:8080/");
> +  Svc.Prefs.set("username", USER);
> +
> +  let now = Date.now()/1000 - 10;

Same nit.

@@ +448,5 @@
> +  do_check_eq("DUPE_LOCAL", engine._findDupe({id: "DUPE_INCOMING"}));
> +
> +  engine._sync();
> +
> +  do_check_eq(1, Object.keys(engine._store.items).length);

do_check_attribute_count(1, engine._store.items);
Comment on attachment 581176 [details] [diff] [review]
More robust reconciling after dupe record detection, v2

Review of attachment 581176 [details] [diff] [review]:
-----------------------------------------------------------------

I'm satisfied by this. Let's hope philikon doesn't find any glaring holes.

::: services/sync/modules/engines.js
@@ +1102,2 @@
>        this._handleDupe(item, dupeId);
> +      this._log.info("Reconciling de-duped record: " + item.id);

debug, please.
Attachment #581176 - Flags: review?(rnewman) → review+
Comment on attachment 581176 [details] [diff] [review]
More robust reconciling after dupe record detection, v2

gps and I established that this doesn't increase the bug's surface area, but thanks the braindeadness of _handleDupe(), it only decreases it 50% of the time.

I would like to see a follow-up bug that

* makes _handleDupe() always prefer the remote/incoming GUID
* fixes the history engine to not even look for dupes (make _findDupe and _handleDupe no-ops). mozIAsyncHistory::updatePlaces() applies GUID changes transparently, so we get "prefer the remote/incoming GUID" for free there. I consider the re-reconciling YAGNI because at most you're going to miss some of your history visits, but never all of them. (This is a different quality than missing a bookmark rename or a bookmark move, and perf matters for history)
* fix the bookmarks engine's _handleDupe(). It does too much right now, especially with the re-reconciliation that this patch brings us.
Attachment #581176 - Flags: feedback?(philipp) → feedback+
Pushed to s-c: https://hg.mozilla.org/services/services-central/rev/3969042ab0b0

I will file a follow-up bug.
Whiteboard: [fixed in services]
Blocks: 710448
verified with s-c builds of 20111216
Whiteboard: [fixed in services] → [verified in services]
Pushed to m-c: https://hg.mozilla.org/mozilla-central/rev/3969042ab0b0
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla11
verified with m-c nightly builds of 20111223

for the record, I'm using steps from original comment on bookmarks; changing title on client B deleting bookmark on client A
Status: RESOLVED → VERIFIED
Component: Firefox Sync: Backend → Sync
Product: Cloud Services → Firefox
You need to log in before you can comment on or make changes to this bug.