Closed
Bug 1378569
Opened 7 years ago
Closed 7 years ago
Increase max records per batch so ~99% of sync users are able to post all their bookmarks in a single batch.
Categories
(Cloud Services Graveyard :: Server: Sync, enhancement)
Tracking
(Not tracked)
RESOLVED
WORKSFORME
People
(Reporter: markh, Assigned: rfkelly)
References
Details
(Whiteboard: [Sync Q3 OKR])
https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=1&end_date=2017-07-04&keys=__none__!__none__!__none__&max_channel_version=nightly%252F56&measure=PLACES_BOOKMARKS_COUNT&min_channel_version=null&processType=*&product=Firefox&sanitize=1&sort_keys=submissions&start_date=2017-06-12&table=0&trim=1&use_submission_date=0 shows that 5k covers over 98% of users, but note that this histogram is opt-in and not limited to Sync users; bobm may be able to get actual figures for Sync users from the servers.

Note that it may or may not make sense to try to do this as part of bug 1378567 (pro: single deployment; con: additional risk).
Assignee
Comment 1•7 years ago
Mark, what do you think about adding a telemetry flag to specifically indicate "this sync failed to fit everything into a single batch"? I'm happy to have a go at landing some (assumedly fairly straightforward) code for this if you can point me in the right direction. (Something a bit different, it'll be fun, right?)
Flags: needinfo?(markh)
Assignee
Comment 2•7 years ago
For reference, the current default size of a batch is 10,000 records and 100 MB. We depend on the client sending these numbers in the request headers, though; we don't check that the sent value of e.g. X-Weave-Total-Records actually reflects the number of records in the batch.
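If we did want the server to verify the headers, the check might look something like the following sketch. This is illustrative only and not the real sync-server code; the function name and call shape are assumptions.

```python
# Hypothetical sketch of the missing server-side check: verify that the
# client-declared X-Weave-Total-Records header is within limits and matches
# the number of records actually accumulated in the batch.

MAX_TOTAL_RECORDS = 10000

def check_batch_totals(declared_records, committed_records):
    """Return True if the client's declared total is plausible.

    declared_records: the parsed X-Weave-Total-Records value, or None
    if the client didn't send the header.
    committed_records: how many records the server actually received.
    """
    if declared_records is None:
        return True  # header not sent; nothing to verify
    if declared_records > MAX_TOTAL_RECORDS:
        return False  # client is declaring more than a batch allows
    return declared_records == committed_records
```

A check like this would let the server reject up-front a batch that could never fit, rather than discovering it mid-upload.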
> adding a telemetry flag to specifically indicate "this sync failed to fit everything into a single batch"
To be clear, the idea here would be to take the number of sync attempts with this flag, as a percentage of the total number of sync attempts, to produce the actual rate at which syncs fail to fit everything into a single batch.
Assignee
Comment 3•7 years ago
> We depend on the client sending these numbers in the request headers
On IRC, :tcsc says that clients probably do not send these headers, but they do limit the size of their batches based on the advertised values from /info/collections.

Anyway, the default limits appear to be safely larger than the numbers in that telemetry dashboard, so I think the next step here needs to be getting more nuanced telemetry to see whether further changes are needed.
Assignee: nobody → rfkelly
Reporter
Comment 4•7 years ago
(In reply to Ryan Kelly [:rfkelly] from comment #1)
> Mark, what do you think about adding a telemetry flag to specifically
> indicate "this sync failed to fit everything into a single batch"?

The issue as I see it is that at least once (plus once each time they are node-reassigned), a user is going to try to fit *all* their records into a single batch. So ISTM that looking at the 9xth percentile of WBO counts for users (probably excluding history) is what we should consider using. IOW, I think we'd see significantly different counts now compared to March, when we did the mass reassignment.

However, if you want to land something in the client, I guess we could just record the number of batches taken, using code similar to what landed in bug 1353645 (although in this case it would be in record.js). Alternatively, we could just record the total number of records, and maybe the total number of bytes, and deduce how many batches that must have taken.

(In reply to Ryan Kelly [:rfkelly] from comment #3)
> Anyways, the default limits appear to be safely larger than the numbers in
> that telemetry dashboard, so I think the next step here needs to be getting
> more nuanced telemetry to see whether further changes are needed.

As above, though, that is likely to mainly record "incremental" syncs, which we do expect to rarely hit those limits. It's the first syncs I'm primarily concerned about.
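The "deduce how many batches that must have taken" idea can be sketched as follows, assuming the default limits of 10,000 records and 100 MB per batch quoted in comment 2. This is an illustration, not actual client code; the function name is made up.

```python
import math

# Illustrative: given per-sync totals (record count and byte count) plus
# the advertised batch limits, compute the minimum number of batches the
# upload must have been split into.

def min_batches(total_records, total_bytes,
                max_total_records=10000,
                max_total_bytes=100 * 1024 * 1024):
    batches_by_records = math.ceil(total_records / max_total_records)
    batches_by_bytes = math.ceil(total_bytes / max_total_bytes)
    # Whichever limit bites harder determines the batch count; an empty
    # sync still counts as one batch.
    return max(1, batches_by_records, batches_by_bytes)
```

For example, the median bookmarks user (36 records, ~16 KB) fits in one batch, while a 225,197-record outlier would need 23 batches under the record limit alone.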
Flags: needinfo?(markh)
Assignee
Comment 5•7 years ago
Are incremental vs full syncs distinguished in any way in the client telemetry?
Reporter
Comment 6•7 years ago
Not as far as I'm aware. It wouldn't be too difficult to differentiate in new probes, though. However, isn't it true that, for bookmarks at least, we have very high confidence that if a user was node-reassigned tomorrow, they'd re-upload exactly as many bookmarks as they currently have on the server? (Hrm, excluding deletions; although we do plan on storing deletions locally, so in the future we'd expect deletions to also be counted.)
Assignee
Comment 7•7 years ago
> However, isn't it true that for bookmarks at least, we have very high confidence that if a
> user was node-reassigned tomorrow, they'd re-upload exactly as many bookmarks as they currently have on the server?
Yes, I expect this is fairly accurate. Counting server-side sounds like it's probably a good place to start, although I'm still moderately interested in some client-side telemetry to see if we can monitor this over time.
Comment 8•7 years ago
Related to Bug 1250747?
Reporter
Comment 9•7 years ago
(In reply to Richard Newman [:rnewman] from comment #8)
> Related to Bug 1250747?

Related, but I'm struggling to work out whether it's a dupe: with one eye open, that other bug appears to be about how many records per POST (regardless of the max per batch), but with the other eye open it does appear to be about the max per batch. So let's just bump both :)
Assignee
Updated•7 years ago
Whiteboard: [Sync Q3 OKR]
Assignee
Comment 10•7 years ago
I think we've done this analysis in the past, but let's take a fresh look at the data. Basically we want to know the distribution of the number of items in each collection for each user. :bobm, can we safely grab this from the db on a representative sample of storage nodes? I'm thinking just a big dump of counts produced by something like:

  SELECT userid, collection, COUNT(id)
  FROM bsoN
  GROUP BY userid, collection;

Then we can do some post-processing to extract statistics from it.
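That post-processing step might look roughly like the sketch below, assuming the dump is loaded as (userid, collection, count) rows matching the query's output. The helper name is illustrative, not part of any real tooling.

```python
import statistics

# Sketch: reduce per-user, per-collection item counts to the percentile
# summary used later in this bug (min / 1% / 10% / 50% / 90% / 99% / max).

def percentile_summary(rows, collection, points=(1, 10, 50, 90, 99)):
    """rows: iterable of (userid, collection, count) tuples."""
    counts = sorted(n for _uid, coll, n in rows if coll == collection)
    # statistics.quantiles with n=100 yields the 99 percentile cut points.
    cuts = statistics.quantiles(counts, n=100, method='inclusive')
    summary = {'min': counts[0], 'max': counts[-1]}
    for p in points:
        summary[p] = cuts[p - 1]
    return summary
```

Run once per collection, this produces exactly the kind of table reported in comments 16 and 19.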
Flags: needinfo?(bobm)
Comment 11•7 years ago
I'll collect a fresh dataset for these and make it available as before (https://mana.mozilla.org/wiki/display/SVCOPS/Sync+Dataset)
Flags: needinfo?(bobm)
Assignee
Comment 12•7 years ago
Just to confirm, this dataset looks like it can tell me exactly what I want, thanks :bobm! Please let me know when the fresh data is available.
Comment 13•7 years ago
(In reply to Ryan Kelly [:rfkelly] from comment #12)
> Just to confirm, this dataset looks like it can tell me exactly what I want,
> thanks :bobm! Please let me know when the fresh data is available.

I've sent the dataset out of band as a CSV. I can have it added to the data portal if you would prefer. Let me know if this works.
Flags: needinfo?(rfkelly)
Assignee
Comment 14•7 years ago
Thanks :bobm! It looks like many of the rows are missing some columns in the CSV; is this because that user doesn't have any data for that particular collection? Do you have a link to the script that generates this report, so I can check my understanding of the data?
Flags: needinfo?(rfkelly) → needinfo?(bobm)
Comment 15•7 years ago
(In reply to Ryan Kelly [:rfkelly] from comment #14)
> Thanks :bobm! It looks like many of the rows are missing some columns in
> the csv, is this because that user doesn't have any data for that particular
> collection?

That's correct. Missing columns indicate missing collections for that user.

> Do you have a link to the script that generates this report, so
> I can check my understanding of the data?

The perl scripts I used to generate the csv are linked in this section: https://mana.mozilla.org/wiki/display/SVCOPS/Sync+Dataset#SyncDataset-CollectingtheData
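The exact layout comes from those perl scripts, but given the answer above, a hedged sketch of a reader that treats blank fields as "collection absent for this user" rather than zero (the column names here are hypothetical):

```python
import csv
import io

# Sketch: parse the per-user counts CSV, dropping blank fields so that a
# missing collection is simply absent from the row dict instead of being
# mistaken for a zero count.

def load_counts(text, fieldnames):
    rows = []
    for raw in csv.DictReader(io.StringIO(text), fieldnames=fieldnames):
        rows.append({col: int(val)
                     for col, val in raw.items()
                     if val not in (None, '')})
    return rows
```

Keeping absent collections out of the dicts matters for the percentile analysis: a user with no bookmarks collection should not drag the bookmarks distribution down with a spurious zero.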
Flags: needinfo?(bobm)
Assignee
Comment 16•7 years ago
I crunched the provided data dump to determine the item-count and total-size stats for each collection; results as follows:

ITEM COUNTS (min - 1% - 10% - 50% - 90% - 99% - max):

  Passwords: 1 - 1 - 2 - 12 - 92 - 346 - 3924
  Addons:    1 - 1 - 1 - 2 - 8 - 25 - 199
  Clients:   1 - 1 - 1 - 1 - 1 - 3 - 41
  Crypto:    1 - 1 - 1 - 1 - 1 - 1 - 1
  Forms:     1 - 1 - 4 - 90 - 974 - 4439 - 293450
  Bookmarks: 1 - 7 - 13 - 36 - 376 - 3327 - 225197
  Prefs:     1 - 1 - 1 - 1 - 1 - 1 - 1
  History:   1 - 1 - 11 - 234 - 2945 - 11088 - 127424

TOTAL SIZES (min - 1% - 10% - 50% - 90% - 99% - max):

  Passwords: 167 - 472 - 1014 - 6552 - 51080 - 193720 - 2065492
  Addons:    251 - 295 - 315 - 654 - 2812 - 8331 - 66489
  Clients:   275 - 423 - 467 - 487 - 543 - 1513 - 184339
  Crypto:    339 - 339 - 339 - 339 - 339 - 363 - 379
  Forms:     187 - 231 - 968 - 21275 - 234287 - 1067475 - 66824818
  Bookmarks: 278 - 2478 - 5266 - 16705 - 199392 - 1730069 - 118639371
  Prefs:     8019 - 8379 - 8955 - 9659 - 10963 - 19387 - 166439
  History:   187 - 487 - 5761 - 121936 - 1495029 - 5451000 - 62205656

The 99th percentile for bookmarks on this box is 3327 items, but the max is 225197. Bob, if it's straightforward, can I please get a similar dump from two different hosts for comparison?
Flags: needinfo?(bobm)
Assignee
Comment 18•7 years ago
For the record, the current limits being reported in production are:

  {
    u'max_post_records': 100,
    u'max_post_bytes': 1048576,
    u'max_total_bytes': 104857600,
    u'max_total_records': 10000,
    u'max_record_payload_bytes': 262144
  }

The max_total_bytes and max_total_records values seem large enough to comfortably fit the 99th percentile of bookmarks sizes here.

ni? myself to repeat the analysis using the latest figures from :bobm.
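As a quick sanity check of that conclusion, the comparison being made is simply "does the 99th-percentile user fit inside one batch under these limits?" A sketch (the helper name is illustrative):

```python
# Sketch: test whether a user's whole collection fits within a single
# upload batch under the production limits quoted above.

LIMITS = {
    'max_total_records': 10000,
    'max_total_bytes': 104857600,  # 100 MB
}

def fits_in_one_batch(records, size_bytes, limits=LIMITS):
    return (records <= limits['max_total_records']
            and size_bytes <= limits['max_total_bytes'])
```

Against the comment 16 numbers: the 99th-percentile bookmarks user (3327 items, ~1.7 MB) fits in one batch; the largest user on that box (225197 items, ~118 MB) exceeds both limits.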
Flags: needinfo?(rfkelly)
Assignee
Comment 19•7 years ago
Results from db dumps on 3 separate webheads:

ITEM COUNTS (min - 1% - 10% - 50% - 90% - 99% - max):

  Passwords: 1 - 1 - 2 - 20 - 153 - 521 - 21155
  Addons:    1 - 1 - 1 - 3 - 12 - 32 - 1161
  Clients:   1 - 1 - 1 - 1 - 2 - 5 - 229
  Crypto:    1 - 1 - 1 - 1 - 1 - 1 - 1
  Forms:     1 - 1 - 8 - 283 - 2549 - 9543 - 352498
  Bookmarks: 1 - 7 - 13 - 57 - 650 - 4890 - 393092
  Prefs:     1 - 1 - 1 - 1 - 1 - 1 - 1
  History:   1 - 1 - 23 - 888 - 6758 - 19959 - 299626

TOTAL SIZES (min - 1% - 10% - 50% - 90% - 99% - max):

  Passwords: 167 - 473 - 1122 - 11231 - 84880 - 288556 - 5976736
  Addons:    251 - 295 - 315 - 929 - 3857 - 10689 - 345035
  Clients:   251 - 423 - 467 - 487 - 1007 - 2435 - 230331
  Crypto:    339 - 339 - 339 - 339 - 341 - 364 - 379
  Forms:     167 - 231 - 1888 - 67116 - 615364 - 2314938 - 95034014
  Bookmarks: 278 - 2429 - 5394 - 27980 - 347236 - 2598413 - 220248500
  Prefs:     7655 - 8403 - 9639 - 9683 - 12051 - 23015 - 261267
  History:   187 - 774 - 12643 - 459591 - 3398008 - 9619860 - 341436609

This moves the extremes out a little, but still seems comfortably within the current upload limits. I'm leaning towards declaring the current limits OK for >99% of users and closing this bug. :markh, thoughts?
Flags: needinfo?(rfkelly) → needinfo?(markh)
Reporter
Comment 20•7 years ago
SGTM, thanks.
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: needinfo?(markh)
Resolution: --- → WORKSFORME
Updated•1 year ago
Product: Cloud Services → Cloud Services Graveyard