Closed Bug 1250747 Opened 9 years ago Closed 2 years ago

Raise per-post item limit from 100 to something more reasonable

Categories

(Cloud Services Graveyard :: Server: Sync, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: rnewman, Assigned: autrilla)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [sync-data-integrity])

Attachments

(1 file)

We introduced this in Bug 692372. I don't think we need it: the backend splits up requests and runs them in chunks, and we limit overall size. I'd like to put as many BSOs in a single request as possible -- for bookmarks that's over 1500. That helps us avoid partial writes (Bug 1250189).
Flags: needinfo?(bobm)
I propose a sanity limit more like 4000. At 1MB for the whole thing that allows 250 bytes per record.
Blocks: 1196238
OS: Unspecified → All
Hardware: Unspecified → All
Do you still want/need this even if Bug 1250189 goes ahead? > the backend splits up requests and runs them in chunks No it doesn't, and even if we did, we'd have to write out all chunks in a single db transaction in order to preserve X-I-U-S and related timestamp semantics. It's probably safe to go slightly bigger than 100 though.
(In reply to Ryan Kelly [:rfkelly] from comment #2) > It's probably safe to go slightly bigger than 100 though. This sounds good to me pending some larger batch size specific load tests. Presumably, we'd just need to tweak this: https://github.com/mozilla-services/server-syncstorage/blob/master/loadtest/stress.py#L183 Moving it out to the config might make it easier for QA.
Flags: needinfo?(bobm)
(In reply to Bob Micheletto [:bobm] from comment #3) > This sounds good to me pending some larger batch size specific load tests. Note in particular that clients have a fixed number of records to upload -- if the batch limit is 100, they send fifteen requests; if the batch size is >=1500, they send one request. Have to make sure that load tests take that into account. (In reply to Ryan Kelly [:rfkelly] from comment #2) > > the backend splits up requests and runs them in chunks > > No it doesn't… My bad, I was going off the comments in the bug where the limit was introduced: "So, if you submit 500 valid items, we batch them up into 5 batches of 100 and do those queries. " "note also that you can configure the batch_size and batch_max_count independently. So e.g. we could process a maximum of 500 objects but save them in batches of 100" Perhaps I misunderstood! > … we'd have to write out all chunks in a > single db transaction in order to preserve X-I-U-S and related timestamp > semantics. > > It's probably safe to go slightly bigger than 100 though. Yeah. We regularly write out chunks of 1000/number of variables in sqlite on cellphone hardware, limited only by the MAX_VARIABLE_NUMBER built into in sqlite, so I hope we can do better on big AWS iron!
Blocks: 1252469
> "So, if you submit 500 valid items, we batch them up into 5 batches of 100 and do those queries. " > "note also that you can configure the batch_size and batch_max_count independently. > So e.g. we could process a maximum of 500 objects but save them in batches of 100" > > Perhaps I misunderstood! FTR, those comments apply to the old sync, where we'd happily give items from the same upload different timestamps. For Sync1.5 we use a single transaction to ensure consistency/monotonicity of the timestamps.
Here's a quick patch that makes this limit configurable.
Attachment #8725523 - Flags: review?(dcoates)
Attachment #8725523 - Flags: feedback?(bobm)
Comment on attachment 8725523 [details] [review] make per-batch item limit configurable LGTM
Attachment #8725523 - Flags: feedback?(bobm) → feedback+
:kthiessen this makes batchsize configurable in load tests for Sync: https://github.com/mozilla-services/server-syncstorage/blob/configurable-max-batch-size/syncstorage/tests/tests.ini#L18 We'll need that for testing Bug 1250189 as well as this bug.
Attachment #8725523 - Flags: review?(dcoates) → review+
Committed at https://github.com/mozilla-services/server-syncstorage/commit/d8f87bbbaa6d07b341265ca865d6d386b26e35f5 Note that the default remains 100. I think we should work on QAing it up towards a larger size, which I'll leave this bug open to track. We should also use the /info/configuration endpoint from Bug 1250189 to expose the server's setting to clients, so they can take advantage of limit increases if and when we make them.
Filed Bug 1254402 to adapt on iOS.
See Also: → 1254402
Blocks: 1250189
Priority: -- → P3
Whiteboard: [sync-data-integrity]
Priority: P3 → --
Ryan, what's the status on rolling this out, complete with configurability? My iPhone wants to upload 80,000 history records, which at this rate will take 800 POSTs… less than ideal.
Flags: needinfo?(rfkelly)
This flag, and the corresponding /info/configuration entry, should now be live on all machines in production. The next step will be to QA an increase in the limit, and trying your proposed value of 4000 seems like as good a place to start as any. Bob/Karl have you guys done any experiments with increasing the batch size already?
Flags: needinfo?(rfkelly)
Flags: needinfo?(kthiessen)
We haven't yet, but I know it's on the list of things to do. Is this something that we can do early next year, or does it need to be done ASAP?
Flags: needinfo?(kthiessen)
QA Contact: kthiessen
Client code doesn't know to look for it yet, so it's not a rush job. (But now we know we can build the client bits!)
Grisha, do we need this now?
Flags: needinfo?(gkruglov)
IIUC, this is specifically talking about per-payload limits. Short answer is "yes, please!". Android clients have been aware of limits defined in info/configuration since 51, and using them for both batching and non-batching uploads (in case the endpoint isn't there, defaults are defined in [0]). From a mobile client's perspective, minimizing amount of requests it needs to make is very important, both for data consistency and resource efficiency. As for specific numbers... Until limits are lifted in Bug 1291821, android clients are capped at 5000 bookmark records. Can we get close to that with the per-payload limit (maxPost*), and for when the limits are lifted, much higher than that with the batch limit? (maxTotal*) [0] https://dxr.mozilla.org/mozilla-central/source/mobile/android/services/src/main/java/org/mozilla/gecko/sync/InfoConfiguration.java?q=InfoConfiguration.java&redirect_type=direct#41-45
Flags: needinfo?(gkruglov)
Ryan, Bob, Karl: can we get this into production?
Flags: needinfo?(rfkelly)
Flags: needinfo?(kthiessen)
Flags: needinfo?(bobm)
I'll defer to Bob, but I see no objection in principle.
Flags: needinfo?(kthiessen)
(In reply to Karl Thiessen [:kthiessen] from comment #18) > I'll defer to Bob, but I see no objection in principle Sounds good, I'll set a stage node to unlimited batch size (or as big as it'll go). Let's run a series of load tests identical except for increasing (doubling?) batch sizes per comment 9. Then compare the results. We should be less focused on requests per second than records per second, which I think we'll have to guestimate for now. We'll also want to keep an eye on slow MySQL queries and upstream query time.
Flags: needinfo?(bobm)
To clarify, are you going to be testing higher batch _and_ payload limits? Higher batch limits are particularly important in light of Bug 1336001. Currently they're at 10k, and it'll be great if we can triple that at least. Comment 9 talks about payload limits; those are currently at 100, which also needs to go way up.
Flags: needinfo?(bobm)
Let's increase max_post_records to 1000. Most batch=true POSTs are less than 100KB [1]. I bet it is because they are hitting the max_post_records limit. Increasing it 10x should still be less than the 2MB request limit while letting the client send BSOs per request. FWIW, about 88% of POST requests are under 100 BSOs and won't be affected by this change. [1] batch creation requests # NumSamples = 3943; Min = 16384.00; Max = 248148.00 # each ∎ represents a count of 21 16384 - 39560 [ 335]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ (8.50%) 39560 - 62736 [ 1631]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ (41.36%) 62736 - 85913 [ 1572]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ (39.87%) 85913 - 109089 [ 262]: ∎∎∎∎∎∎∎∎∎∎∎∎ (6.64%) 109089 - 132266 [ 50]: ∎∎ (1.27%) 132266 - 155442 [ 81]: ∎∎∎ (2.05%) 155442 - 178618 [ 5]: (0.13%) 178618 - 201795 [ 2]: (0.05%) 201795 - 224971 [ 1]: (0.03%) 224971 - 248148 [ 4]: (0.10%)
(In reply to Benson Wong [:mostlygeek] from comment #21) > Let's increase max_post_records to 1000. Most batch=true POSTs are less than > 100KB [1]. I bet it is because they are hitting the max_post_records limit. > Increasing it 10x should still be less than the 2MB request limit while > letting the client send BSOs per request. Is there any reason to significantly limit the number of records per POST? Clients respect both payload size and record count limits, so there should be no reason to limit the number of records artificially to approximate a payload size limit. (And at a 2MB request limit, and 250 bytes per record, we could safely raise the POST limit to 8000. Even with the stats you mention -- 1KB per record max -- you mention, we could easily accept 2000 records per POST.) The higher the limit, the more chance we have of avoiding using a batch at all, which presumably makes the storage server much happiper -- if the limit is 2MB and 2000 records, the vast majority of users would be able to upload all of their bookmarks in a single request.
(In reply to :Grisha Kruglov from comment #20) > To clarify, are you going to be testing higher batch _and_ payload limits? > Higher batch limits are particularly important in light of Bug 1336001. > Currently they're at 10k, and it'll be great if we can triple that at least. > Comment 9 talks about payload limits; those are currently at 100, which also > needs to go way up. I'm all for testing it with both.
Flags: needinfo?(bobm)
> Is there any reason to significantly limit the number of records per POST? I don't think so, it's only that the server parses them all into memory at once. I don't think there are e.g. any O(n^2) operations on the list of records, or other things where we might get into trouble by having too many of them at once.
Flags: needinfo?(rfkelly)
Any progress on bumping these limits further? Testing or otherwise. I'd love to be able to one day put "atomic" somewhere into our uploaders' names ;-)
Flags: needinfo?(rfkelly)
Flags: needinfo?(bobm)
I'm not aware of any updates, but it also doesn't appear to be blocked on anything other than :bobm bandwidth. We may have to wait until the successful re-rollout of the batch upload stuff in order to try it in prod at all.
Flags: needinfo?(rfkelly)
Re-titling this bug to be just about the per-individual-POST item limit, and Bug 1378569 is about the overall limit on the number of items in a batch. That said, I want to get batch upload API out and stable before we do any further fiddling with the config here, so I'm going to make that bug block this one.
Summary: Raise per-batch item limit from 100 to something more reasonable → Raise per-post item limit from 100 to something more reasonable
Depends on: 1378567
(In reply to Ryan Kelly [:rfkelly] from comment #26) > I'm not aware of any updates, but it also doesn't appear to be blocked on > anything other than :bobm bandwidth. We may have to wait until the > successful re-rollout of the batch upload stuff in order to try it in prod > at all. Should we re-visit this?
Flags: needinfo?(bobm)
I think we are finally free to experiment with raising this limit :-)
And in fact, this is the final blocking bug for Bug 1250189
I'm taking this out of Bug 1250189's dependency tree, since the larger batch-upload effort has completed. But this one still feels worth doing.
No longer blocks: 1250189
:autrilla I think you should do this on the new server type.
Flags: needinfo?(autrilla)
Assignee: nobody → autrilla
Flags: needinfo?(autrilla)
QA Contact: kthiessen → nobody

This is no longer valid, we've moved to Spanner which originally limited our max_total_records to 1664. Spanner recently increased its internal limit so we can raise this 1664 value by probably around 2 fold. This is covered here: https://mozilla-hub.atlassian.net/browse/SYNC-3433

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → INVALID
Product: Cloud Services → Cloud Services Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: