Open Bug 1423418 Opened 6 years ago Updated 1 year ago

Bookmark errors in telemetry: "Duplicate {entry,item or tombstone} for $id in changeset"

Categories

(Firefox :: Sync, defect, P3)

defect

Tracking

()

People

(Reporter: tcsc, Unassigned)

References

Details

Noticed while investigating bug 1422672. Note that the error message here changed from `Duplicate entry` to `Duplicate item or tombstone` recently.

The ID it lists will be an arbitrary guid, but it will also often be something like `moz_places`, `toolbar`, `unfiled`.

This error seems like one of the more common ones, and is the most common unexpectederror by a wide margin in my sampled dataset.

The code in question insists that such things are impossible because of database reasons. https://searchfox.org/mozilla-central/rev/ba2b0cf4d16711d37d4bf4d267b187c9a27f6638/toolkit/components/places/PlacesSyncUtils.jsm#2005-2007
This is so bizarre. We've had issues where a GUID in `moz_bookmarks` might also appear in `moz_bookmarks_deleted`, but this seems like it's a duplicate *in the same table*.

Trying to insert a duplicate GUID into `moz_bookmarks_deleted` fails with "UNIQUE constraint failed: moz_bookmarks_deleted.guid", and, likewise, `moz_bookmarks` returns "UNIQUE constraint failed: moz_bookmarks.guid".

Maybe those GUIDs were already there? Is it possible, in some of those cases, that those are old accounts where we didn't migrate GUIDs correctly (e.g., the GUID is "toolbar", so it's different from "toolbar_____", and `guidToRecordId` leaves it alone)? It seems unlikely, because I'd have thought our `VALID_GUID` maintenance task would have caught these and given them new ones by now.

I'm wondering if our old tracker migration script might have inserted invalid GUIDs, but we already check for that: https://searchfox.org/mozilla-central/rev/ba2b0cf4d16711d37d4bf4d267b187c9a27f6638/toolkit/components/places/PlacesSyncUtils.jsm#422-426 Maybe a manual bookmark restore, or an automatic one on startup to replace a corrupt database? Though, that should never delete roots.

I'm loathe to suspect the database, so maybe it's a bug in `addRowToChangeRecords` or our query. We can paper over this using a `UNION` instead of a `UNION ALL`, but those duplicates, if that's what they are, shouldn't exist in the first place!
Flags: needinfo?(markh)
I looked at 100% of Nightly pings from the last 100 days, and the count of GUIDs reported in this error are:

{u'Girc4n0tHKfG': 265,
 u'mobile': 248,
 u'toolbar': 2981,
 u'unfiled': 1993}

Girc4n0tHKfG is odd - the fact that every other one is one of our roots is very interesting.

Also interesting is that there are only 8 devices causing these errors (although we must note that these numbers are likely to increase dramatically as we hit release).

Counting by a (guid, deviceID) tuple, we get:

(u'Girc4n0tHKfG', u'c22c4fee0ff44d1631307dfd515936ccbabc42bb17ae5db8ee2687d11e31bbd6'): 265,
(u'mobile', u'32e9077f68efcddc3ae584fc343c3754914c00f1b02e73c6e539ecc16bc8a6ae'): 60,
(u'mobile', u'985e28fe0959faf29649bc6a388d1984ee3e1b3096646259f76a40d0a26bd7b9'): 146,
(u'mobile', u'f8c7153450067bd375987a423870b892e3512675b237a229910aa81f6a1191b9'): 42,
(u'toolbar', u'40674673f5036479518e5e3e6df7c21cff10a739334e3599ba7b3a1ff801436e'): 99,
(u'toolbar', u'c22c4fee0ff44d1631307dfd515936ccbabc42bb17ae5db8ee2687d11e31bbd6'): 2882,
(u'unfiled', u'0a8b89d6f896e368f6aea7f422f9093a08ceadaecedaffa512ec2ee5aae1b090'): 314,
(u'unfiled', u'3283c913dce0ad809c09ea1ac19986b1a0ac2640ff99ada609066533fd8ec84f'): 1594,
(u'unfiled', u'dbfdb7efc4e4b9467bebdbd2d3b94997e31c60360d39b268cf2fb29bf29ef91f'): 85})

So the device with 'Girc4n0tHKfG' is the most problematic (it has that unique guid plus 2882 errors for "toolbar").

It certainly does sound like, as Kit mentioned, that both "toolbar" and "toolbar____" somehow exist in moz_bookmarks, but given the high counts for a few of those devices, it seems like VALID_GUID task probably *isn't* fixing those 8 clients - and that also doesn't account for Girc4n0tHKfG.

I wonder if it is worth changing the message depending on tombstone or not, just to help nail down exactly what table is impacted? Alternatively, we could consider just logging a warning and ignoring the change? Or even just ignoring it for now?
Flags: needinfo?(markh)
Given the small number of devices involved and that they are on Nightly, it seems likely we'll have validation info in some of their pings, which might be very interesting to see.
Flags: needinfo?(markh)
See Also: → 1428519
We have no validation info from those devices :( Some of those devices, in the last 100 days, have no successful bookmark syncs (and hence will not validate). 2 devices that do have successful syncs also have no validation, presumably due to the number of bookmarks. Interestingly, one device appears to have magically fixed itself, then got stuck again, but for every one of those devices, the last sync has that failure.
Flags: needinfo?(markh)
Flags: needinfo?(markh)
Priority: -- → P3
TBH I don't think we should paper over this - there seems a risk we will do more harm than good. I think we should instead wait and see if some kind user can provide us logs with it happening.
Flags: needinfo?(markh)
Bug 478035 might help for the case where the roots are corrupted.
https://support.mozilla.org/questions/1219438 may contain useful info for a use case.
Bug 1453994 would almost certainly cause this. The `syncedItems` CTE (https://searchfox.org/mozilla-central/rev/d2966246905102b36ef5221b0e3cbccf7ea15a86/toolkit/components/places/PlacesSyncUtils.jsm#2145-2155) selects the four roots in the base case, then recurses to select their descendants. But, if `unfiled` is a child of `menu`, we'll see it, and its children, multiple times.
See Also: → 1453994
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.