Open Bug 1386953 Opened 7 years ago Updated 2 years ago

Suspicious bookmark validation results from iOS

Categories

(Firefox for iOS :: Sync, enhancement)

Other
iOS
enhancement

Tracking Status
fxios + ---

People

(Reporter: markh, Unassigned)

Details

See the attached gist. It appears that iOS is reporting pings with the following counts:

'missingChildren': 17064,
'missingstructure': 322354,
'missingvalues': 210792,
'overlappingstructure': 65218,
'parentiddisagreement': 130720,
'serverMissing': 43382

(i.e., there is at least one ping with each of those counts, not a single ping with all of them).

These values seem insane. At the end of the gist is a selection of 5 pings where one of the values is > 10k in case that helps. Let me know if you want more info, or if you want some tweaks to that analysis.
I mentioned this yesterday in Slack:

[12:05] FYSA (@st3fan, @markh, others): the broken automatic recovery in iOS ~7.5-8 will cause excess reporting of validation failures:
   "validation":{"problems":[{"name":"missingvalues","count":2208},{"name":"missingstructure","count":6}],"took":98}}
[12:05] We dropped our DB but didn’t start downloading again from 0.
[12:07] Great test for repair in the future, though.

Someone running a pre-release build -- they're on an 8.0.1 beta -- with 300,000 bookmarks is unlikely, but not impossible… and if they hit data loss, that's the kind of ping they'd send.
       u'validation': {u'problems': [{u'count': 113058,
          u'name': u'missingvalues'}],
       u'validation': {u'problems': [{u'count': 33548,
          u'name': u'missingvalues'}],

seem to be dataloss-caused.

       u'validation': {u'problems': [{u'count': 52952,
          u'name': u'missingstructure'}],

is probably a folder with 52,000 children that the server keeps refusing to store.


       u'validation': {u'problems': [{u'count': 18908,
          u'name': u'overlappingstructure'},
         {u'count': 37816, u'name': u'parentiddisagreement'},
         {u'count': 5896, u'name': u'missingstructure'}],
       u'validation': {u'problems': [{u'count': 12498,
          u'name': u'overlappingstructure'},
         {u'count': 25368, u'name': u'parentiddisagreement'},
         {u'count': 9896, u'name': u'missingstructure'}],

are at significant scale, so that could be just about any kind of missed change -- perhaps they moved 20,000 bookmarks into a "junk" folder, which was then too big to upload -- 25,000 parent ID disagreements (because those 20K records got uploaded), plus overlapping structure from the old parents?
See Also: → 1387351
Question from triage: Is anything actionable here?
Flags: needinfo?(rnewman)
(In reply to Justin D'Arcangelo [:justindarc] from comment #3)
> Question from triage: Is anything actionable here?

Two things, I think:

- We need to keep moving ahead with configuration changes to unblock clients with large folders. Some of these validation results are due to some other client being unable to upload a folder. See Bug 1300451.

- Users who suffered data loss prior to Bug 1377646 landing -- somewhere between 7.0 and 8.x -- will simply be missing history and bookmarks, and it won't be redownloaded unless the user is node-reassigned. On the iOS side we should consider detecting this and recovering.

We could morph this bug into the latter: if there's a significant mismatch between the number of records in the buffer and the number of records on the server for bookmarks (if merging is disabled) or history, then fetch 0 < modified < earliest_local.
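A minimal sketch of the detection rule proposed above, written in Python for illustration only (the iOS client is not Python). The function names and the 0.5 ratio threshold are hypothetical; the thread only says "significant mismatch" and does not give a number.

```python
from typing import Optional, Tuple


def needs_backfill(local_count: int, server_count: int,
                   min_ratio: float = 0.5) -> bool:
    """Return True when the local buffer holds far fewer records than the server.

    min_ratio is an assumed threshold, not a value from the bug.
    """
    if server_count <= 0:
        return False
    return local_count < server_count * min_ratio


def plan_backfill(local_count: int, server_count: int,
                  earliest_local_modified: float) -> Optional[Tuple[float, float]]:
    """Return the exclusive (lower, upper) bounds for 0 < modified < earliest_local,
    or None when the counts look consistent and no backfill is needed."""
    if needs_backfill(local_count, server_count):
        return (0.0, earliest_local_modified)
    return None
```

For example, a client holding 2,000 records against 115,000 on the server would get a window ending at its earliest local timestamp, while a client holding 1,000 of 1,010 would get `None`.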
Flags: needinfo?(rnewman)
See Also: → 1300451
I've updated the gist now that bug 1387351 tells us how many bookmarks were actually checked - and things are definitely suspicious. I changed the script to:

* Look at 10% of pings, and only consider those where there are > 50k bookmarks checked *and* some "problems" reported.
* Reduced these to the max number of checked items and problems an individual device reported in a single validation.
* Looked at the top 10 number of items checked, and the top 10 number of problems reported.
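The steps above can be sketched roughly as follows. This is not the script from the gist: the `device_id` and `checked` field names are assumptions about the ping shape, and the 10% sampling step is left out. Only `validation`, `problems`, `name` and `count` appear in the pings quoted in this bug.

```python
from collections import defaultdict


def summarize(pings, min_checked=50_000, top_n=10):
    """Filter pings, reduce to per-device maxima, and return the top entries."""
    max_checked = defaultdict(int)
    max_problems = defaultdict(int)
    for ping in pings:
        validation = ping["validation"]
        checked = validation.get("checked", 0)      # assumed field name
        problems = validation.get("problems", [])
        # Keep only validations with > min_checked items and some problems.
        if checked <= min_checked or not problems:
            continue
        device = ping["device_id"]                  # assumed field name
        max_checked[device] = max(max_checked[device], checked)
        worst = max(p["count"] for p in problems)
        max_problems[device] = max(max_problems[device], worst)

    def top(table):
        return sorted(table.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

    return len(max_checked), top(max_checked), top(max_problems)
```

The first return value is the number of devices that passed the filter, which corresponds to the device count reported in the summary.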

In summary:

* 144 devices report > 50k bookmarks and had problems.
* The largest number of bookmarks checked was 401539 and it took 21s to validate, but it has relatively few problems (parentiddisagreement=920)
* The 10th largest number of bookmarks checked was 146654 and took "only" 3s to validate.
* The largest number of problems reported is 'overlappingstructure': 65218, 'parentiddisagreement': 130720. Only 65122 were reported as being checked but if we assume parentiddisagreement typically involves 2 items, this is very close to "every single bookmark has this problem". This validation took 437s.
* The 10th largest number of problems reported is 'missingstructure': 68920, with 129072 being checked and the validation taking 53s
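To make the "every single bookmark has this problem" estimate concrete, here is the arithmetic as a small Python sketch. The two-items-per-disagreement factor is the assumption stated in the summary above, and `implied_items` is a hypothetical helper name.

```python
def implied_items(problem_count, items_per_problem=2):
    """Number of distinct items implied if each problem involves N items."""
    return problem_count / items_per_problem


checked = 65122          # items reported as checked
disagreements = 130720   # parentiddisagreement count from the same device

items = implied_items(disagreements)
coverage = items / checked
print(f"{items:.0f} items implied, {coverage:.1%} of {checked} checked")
```

Under that assumption the implied item count is 65360, slightly above the 65122 reported as checked, which is why the result reads as essentially full coverage.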

So my take-aways:

* I'm highly skeptical there are 144 devices which legitimately have > 50k bookmarks - desktop almost certainly can't handle that.
* However, most of the counts are internally consistent - I see a number where parentiddisagreement is greater than the number checked, but never more than double. I don't see any other problems that report counts greater than the number checked.
* The time taken for the validation typically implies a huge amount of work being done - most validations are very quick, but these top ones are minutes (but it's difficult to put too much faith in this as, eg, I guess the device may have slept during the check. For example, one of the top 10 checked 149473 in 0.5s. However, the "took" numbers are significantly larger in general than for most pings.)

IOW, I'd guess that some devices have far more records than are actual bookmarks, but I've no further insights.
(In reply to Mark Hammond [:markh] from comment #5)

> * I'm highly skeptical there are 144 devices which legitimately have > 50k
> bookmarks - desktop almost certainly can't handle that.

I've personally interacted with desktop/Android users who have 15K-25K, and probably users with more.

This might be several users with several reinstalls/retries each, too. A couple of devices, and a few attempts, takes 144 down to maybe 50. I could see that being a valid number: I expect users with tons of Firefox bookmarks are more likely to try Firefox for iOS rather than use Safari.

Additionally, there are lots of potential explanations for users with excessive quantities of bookmarks: e.g., doing restores from bookmark backups prior to keeping GUIDs, which could cause doubling or trebling of server contents.

> * The time taken for the validation typically implies a huge amount of work
> being done

Some of these queries might end up being greater than linear, but with very small constant times. I don't think we've tested the validation queries on half a million bookmarks. Usually they should only run a few times before a merge succeeds!

Additionally, we might be timing the amount of time it took for the DB runnable to complete, which also includes waiting for other database accesses, including WAL checkpoints, to run. It's not impossible that a DB with tens of thousands of bookmarks is pretty big and pretty busy.
(In reply to Richard Newman [:rnewman] from comment #6)
> (In reply to Mark Hammond [:markh] from comment #5)
> 
> > * I'm highly skeptical there are 144 devices which legitimately have > 50k
> > bookmarks - desktop almost certainly can't handle that.
> 
> I've personally interacted with desktop/Android users who have 15K-25K,

Yes, but 25k is getting close to an upper limit of what I've ever seen on desktop, and things start to break fairly soon after that - with this volume of bookmarks, they are highly unlikely to have been carefully organized, and via bug 1321021, we know that > 16k bookmarks in a single folder isn't possible (well - wasn't - we bumped the BSO size just last week)

I can't recall ever helping a desktop user with > 50k, and certainly never > 400k.

But to be clear - you are asserting that these counts probably accurately reflect the number of bookmarks the user would report as existing on the device, and thus this isn't actionable?
For additional info, I made another worksheet which looked at 1% of pings, and for each unique device ID, recorded the maximum number of items it reported as "checked", and created a histogram from the results.

           bookmarks
count    3849.000000
mean     2148.715511
std      8280.207107
min         1.000000
25%       209.000000
50%       501.000000
75%      1261.000000
max    148398.000000

Given the min is 1, I assume that the iOS validator reports nothing when there are zero bookmarks (which obviously makes sense), so this isn't directly comparable to the telemetry captured by desktop - but IMO these figures are also suspicious - I'm not convinced that the 25th percentile of all iOS users with a bookmark have over 200, nor that the 75th percentile has over 1200.
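For reference, a standard-library sketch of the reduction described in this comment: take the maximum "checked" value per device, then summarize the distribution. The `(device_id, checked)` tuple shape is an assumption, and the worksheet itself evidently used pandas; `statistics.quantiles` with `method="inclusive"` is used here as a stand-in for the linear-interpolation percentiles that pandas reports by default.

```python
import statistics
from collections import defaultdict


def per_device_max(samples):
    """Reduce (device_id, checked) pairs to the maximum checked per device."""
    best = defaultdict(int)
    for device_id, checked in samples:
        best[device_id] = max(best[device_id], checked)
    return dict(best)


def describe(values):
    """Summary statistics in the same shape as the table above.

    Needs at least two values; std is the sample standard deviation.
    """
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),
        "min": min(values),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(values),
    }
```

Usage would be along the lines of `describe(list(per_device_max(samples).values()))`.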
Severity: normal → S3