Closed Bug 1235269 Opened 4 years ago Closed 4 years ago

Changes made during a Sync may be ignored causing items to be missing or deleted items to be resurrected

Categories

(Firefox :: Sync, defect, P1)

defect

Tracking

()

RESOLVED DUPLICATE of bug 1258127

People

(Reporter: rnewman, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: dataloss, Whiteboard: [data-integrity])

Attachments

(1 file)

In August I created two bookmarks, numbers 49 and 50 in my 'Recipes' folder.

These bookmarks were created on desktop, seven seconds apart, clearing open tabs:

  Mon Aug 10 12:52:02 PDT 2015
  Mon Aug 10 12:52:09 PDT 2015

Firefox for iOS's consistency checker just told me those weren't present in my iOS server bookmarks buffer.

And sure enough, they're not on the server.

Desktop places.sqlite:

---
$ sqlite3 places.sqlite "select * from moz_bookmarks where guid in ('FtN9dOhARjHD', 'r1jDGaKZ_003');"
id          type        fk          parent      position    title                                          keyword_id  folder_type  dateAdded         lastModified      guid
----------  ----------  ----------  ----------  ----------  ---------------------------------------------  ----------  -----------  ----------------  ----------------  ------------
6545        1           365326      4760        49          Red Braised Pork Belly – China Sichuan Food                           1439236322479000  1439236325532000  FtN9dOhARjHD
6546        1           365324      4760        50          Hong Shao Rou Recipe—(Red Braised Pork Bell                           1439236329874000  1439236332351000  r1jDGaKZ_003
---

There's no mention of these two in $profiledir/weave/ -- not in failed, not in logs, nothing.

They're not on the server if I try to fetch them:

	Sync.RecordManager	TRACE	Importing record: https://sync-221-us-west-2.sync.services.mozilla.com/1.5/23528203/storage/bookmarks/FtN9dOhARjHD?full=1
	Sync.Resource	DEBUG	mesg: GET fail 404 https://sync-221-us-west-2.sync.services.mozilla.com/1.5/23528203/storage/bookmarks/FtN9dOhARjHD?full=1
	Sync.Resource	DEBUG	GET fail 404 https://sync-221-us-west-2.sync.services.mozilla.com/1.5/23528203/storage/bookmarks/FtN9dOhARjHD?full=1

but their parent record on the server claims to have them as children!

{"id":"zU14bWY9AY2q","type":"folder","parentName":"Food","title":"Recipes","description":null,"children":["gPtraaq3VWS5","7LnanjWSE2dR","-qt32FFduyTJ","Vcu0b7Yhyr4n","1UIsVacGOpN_","M8-C6xNEPi01","lEEm3OoVkJck","beGBJU_GID35","1UOFRd58a7EM","-mhx_iQ8mvSu","cUpFZYps3PgB","3loYBiI1WMQA","o-Oyai_TDzkI","gGRhexfu1H6Y","sLOW0Rd12Kfz","RLn7qrosrL29","nKcHznZnX6-x","Dt4Jcn2F1cJn","WebaD9RvROZJ","KSzu2knU5LMT","LQnsH-pBfuAW","eeaW4iEjXaBJ","O934Gm6wjHIz","92m9yf5C_Aet","TIBhr9etGCZm","d5d0GJyKc7gA","I2Ls3ozDhLt4","37pp7btj9nuz","iNkz8Op0q-g9","bp4y7IEMguuy","YcZz37xZzZvy","3vxmJS25Aq0Q","UjF0VSTfajsk","9QPx1xnJGBQi","NZlA1kRy4RKk","BDT-_LLi4Ssy","lqutaAqQcOfx","RQgvat9frau-","OD8_0Hl74Bgr","unsk_8WEhPiw","E94LZbmeVlBh","HJminkGHuc1M","f7NG7DDqqOjX","aFMOyHIfvVIn","IgUlW4QKuYK6","_PbdCkXyuEZa","KFHHY3ht0rba","ZrSoa9iPr81O","97MIzlWVx6cO","FtN9dOhARjHD","r1jDGaKZ_003","4g43-Q8BE4VV","kWQOZGTngK8Y"],"parentid":"5lQz_glXhEtn"}



These logs imply one of two things:

* The server accepted a POST containing the changed parent folder, but dropped the two new records on the floor.
* The client never uploaded the two new records, either because they were skipped, because we thought they were uploaded successfully, or because they were never entered into the tracker.

I'll bet on the latter. Another record, 97MIzlWVx6cO, was created 4 seconds earlier:

  Mon Aug 10 12:51:58 PDT 2015

so my hypothesis is this:

* The user saves a bookmark.
* We begin a sync. This sets ignoreAll so we don't trigger ourselves.
* The user bookmarks another link. We ignore it. They bookmark another, and we ignore that, too.
* All three bookmarks are in the same folder. We eventually come to upload that, and -- because we go to the DB to compute the record -- all three bookmarks appear in the uploaded parent record!

This bug worries me, because my sample size has reached *1* and I've already found bookmark corruption that would cause iOS to back off for fear of doing more damage.


All you need to do to hit this bug is bookmark two pages faster than Sync can complete -- four seconds seems to be enough.

One more check mark next to "blind trust in observer notifications is bad". This explains another slice of bookmark quality issues over the past few years.


Fixes or mitigations for this bug might be:

* Narrow the ignoreAll window.

* Don't use observer notifications for change tracking. See <http://160.twinql.com/syncing-and-storage-on-three-platforms/>.

* If we're uploading a folder, make sure -- through any means; fetching IDs from the server, checking timestamps against when our sync began, annotating bookmarks that we've uploaded them ever -- that we've uploaded all of its children. Eventual consistency is better than none.

* …
Note that fixes that are scoped to an individual folder within a sync aren't sufficient. The same problem exists at longer timescales:

* Hit the bookmark star to add an Unsorted Bookmark. A sync begins.
* Drag a URL to the toolbar. A bookmark is created, but neither the new bookmark nor the changed toolbar is marked as changed.
* Wait for the sync to finish.
* Weeks later, drag another bookmark to the toolbar. The uploaded toolbar record now includes two children, only one of which is new and only one of which will be uploaded to the server.


When another client downloads these incomplete sets, it'll do one of three things:


* Quietly screw up until it makes a local change, then screw up other clients. (Desktop)

* Actively fix up the damage, then push the changes. (Android)
 -- This will screw up the original desktop, which will move its local-never-synced bookmarks to Unsorted Bookmarks as it realizes the magnitude of its error.

* Refuse to sync (either the whole tree or the smallest inconsistent subtree) until the server gets its shit in order. (iOS)
 -- I might end up having the iOS client send reset commands to try to force other clients to fix things, but that doesn't make me happy.

We've been seeing these symptoms as a quiet hum of complaints for years, so it's nice to have another explanation beyond Bug 814801.
Here's some chrome JS that'll see if your account is missing records.

//////////
function removeAll(xs, ys) {
  ys.forEach((y) => xs.delete(y));
  return xs;
}

let bme = Weave.Service.engineManager.get("bookmarks");
let bms = bme._store;
let local = Object.keys(bms.getAllIDs());
let remote = JSON.parse(Weave.Service.resource(Weave.Service.storageURL + "bookmarks").get());

let missing = removeAll(new Set(local), remote);

console.log(missing);

// Set [ "FtN9dOhARjHD", "r1jDGaKZ_003" ]
//////////


A reasonable fix might be to run this and add the results to the tracker:

//////////
missing.forEach((id) => bme._tracker.addChangedID(id, Date.now()));
//////////
(In reply to Richard Newman [:rnewman] from comment #2)
> Here's some chrome JS that'll see if your account is missing records.

This reports 58 records for my real profile :(
(In reply to Mark Hammond [:markh] from comment #3)
> (In reply to Richard Newman [:rnewman] from comment #2)
> > Here's some chrome JS that'll see if your account is missing records.
> 
> This reports 58 records for my real profile :(

I have 605 bookmarks and none reported missing.  Victory!
Flags: firefox-backlog+
Priority: -- → P2
FTR, the same issue obviously exists with deletions, resulting in the server still having records that don't exist locally (because we were syncing when they were deleted). When the next client for that account does a full sync (eg, a new device) those "deleted" bookmarks spring back into life. My test profiles have thousands of such records.
I'm going to steal this as a meta bug for the general problem discussed above.
Blocks: 1214117
Summary: Desktop fails to upload some new bookmarks, leaving the server inconsistent → Changes made during a Sync may be ignored causing items to be missing or deleted items to be resurrected
I've also seen the opposite, just now: a record that's present on the server, not deleted, yet not present in its parent's child list. The consistency checker catches this, too.
To be clear, isn't this the same issue as comment 5? ie, a bookmark manually deleted during a sync causes it to be removed locally but left on the server, and thus resurrected next time a client does a full sync?
(In reply to Mark Hammond [:markh] from comment #8)
> To be clear, isn't this the same issue as comment 5? ie, a bookmark manually
> deleted during a sync causes it to be removed locally but left on the
> server, and thus resurrected next time a client does a full sync?

I think in your scenario the server is still consistent: the deletion is missed, and neither the deleted record nor the changed parent is uploaded, right?

In this situation a record is present on the server but its parent on the server doesn't mention it. That means we either:

* Uploaded the new record, but didn't upload its changed parent.
* Raced with another client uploading that parent, losing that write.
* Deleted the record locally, uploaded the parent but not the deleted record, and then redownloaded the record and didn't update the server parent.

In theory, the last scenario should mark the parent as changed when we find a parent folder for the new record, so the server parent should end up consistent. Dunno if it works that way in practice, but that's the theory :)
Priority: P2 → P1
See Also: → 1250866
See Also: 1250866
Mark: would it be worth splitting this bug into "don't fail to track changes" and "recover from having missed changes"?

The latter might be a once-per-week hook before/after a sync or something like that -- fairly tractable -- and I'd love for clients to be able to recover from this automatically.

The hook might be as simple as fetching info/collection_counts, fetching and processing the ID list as in Comment 2 if the counts don't match.
Flags: needinfo?(markh)
We now have preliminary telemetry data from iOS users. This is a very small sample size: about 63 users.

Above 0 bookmarks, your chance of your server data being correct is 68%.
Above 222 bookmarks, your chance is 50-50.
Above 1000 bookmarks, your chance is just 37%.



20 users have bookmark data that isn't correct. Only one user recovered on a subsequent attempt.

Of these 20 users, there were:

9 users with parentid disagreements -- where a record's parentid did not match up to the children array in a folder.
12 users with missing records -- where a record is mentioned as a child of a folder but not present on the server.
16 users with missing structure -- where a folder isn't present but a record claims that's where it lives.
8 users with overlapping structure -- where more than one folder claims to own a child.


Probably many of these are caused by the same root issue: failing to track a change, and an ongoing sync seeing partial state.


In short: our frequent bookmark users have a very high chance of failure, and we need to do significant work to make sure that desktop clients behave more correctly.
OS: Mac OS X → All
How would I have gotten into a state of my remote having 342 records and my local having 207 records?
Orphaned records could be one. Richard do you have a JS script which does the analysis in comment 11?
(In reply to Chris Karlof [:ckarlof] from comment #13)
> How would I have gotten into a state of my remote having 342 records and my
> local having 207 records?

Deleted records?
> Deleted records?

Yeah. I should write a script to figure this out. :)
(In reply to Nick Alexander :nalexander from comment #15)
> (In reply to Chris Karlof [:ckarlof] from comment #13)
> > How would I have gotten into a state of my remote having 342 records and my
> > local having 207 records?
> 
> Deleted records?

Most likely exactly that: every bookmark you delete locally remains as a deleted record on the server, forever and ever, until you get assigned to a new server or change your password.

Something like this should print "135" for you:

---
Components.utils.import("resource://services-sync/main.js");
Components.utils.import("resource://services-sync/record.js");

let collection = "bookmarks";

let bme = Weave.Service.engineManager.get(collection);
let recordType = bme._recordObj;
let coll = new Collection(Weave.Service.storageURL + collection, recordType, Weave.Service);

var deleted = [];
let collKey = Weave.Service.collectionKeys.keyForCollection(collection);

coll.full = true;
coll.limit = 500;
coll.recordHandler = function(item) {
  item.collection = collection;
  item.decrypt(collKey);
  if (item.cleartext.deleted) {
    deleted.push(item.id);
  }
};
coll.get();

console.log("Deleted: " + deleted.length);

---
(In reply to Chris Karlof [:ckarlof] from comment #14)
> Richard do you have a JS script which does
> the analysis in comment 11?

Sadly not; this relies on the structure extracted by the iOS bookmark mirrorer. The validation is expressed in SQL here:

https://github.com/mozilla/firefox-ios/blob/master/Storage/SQL/SQLiteBookmarksSyncing.swift#L677

It would be easy enough to write a bit of desktop JS to download everything and do similar analysis in memory.
Blocks: 1257961
See Also: → 1258127
Depends on: 1269902
Blocks: 1269902
No longer depends on: 1269902
No longer blocks: 1257961
Whiteboard: [sync-data-integrity]
Depends on: 1258127
this is essentially a dupe of bug 1258127 , just setting it as blocks for now.
Whiteboard: [sync-data-integrity] → [data-integrity]
Probably should check the other engines for similar problems, too.
(In reply to Richard Newman [:rnewman] from comment #20)
> Probably should check the other engines for similar problems, too.

Yep, we will probably only get to bookmarks this quarter.
Status: NEW → RESOLVED
Closed: 4 years ago
Flags: needinfo?(markh)
Resolution: --- → DUPLICATE
Duplicate of bug: 1258127
Component: Firefox Sync: Backend → Sync
Product: Cloud Services → Firefox
You need to log in before you can comment on or make changes to this bug.