Consider increasing max record size

REOPENED
Unassigned

Status

Cloud Services
Server: Sync
REOPENED
a year ago
4 months ago

People

(Reporter: markh, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments)

(Reporter)

Description

a year ago
Sync currently limits single records to 256k bytes. We've opened bugs like bug 1293496 with the expectation that even for clients which correctly limit the POST sizes, we will unstick some users with large records (which current versions of all our clients do). It turns out this is wrong, and the only way of "unsticking" clients with large, legitimate records is to increase this size.

Bug 1300378 is a client hitting this.

See also bug 1300447 (which is asking for this max size to be reported via info/configuration) and the those blocked by that bug, which are for our 3 clients to do the right thing when finding records greater than this size.
(Reporter)

Comment 1

a year ago
FWIW, bug 1300378 is hitting this max size on the "unfiled" folder - presumably due to the user having a massive number of bookmarks in that folder.
I have no particular objection to this, although it'd be good to gather some stats on the size distribution in practice, so we can judge what a sensible upper-bound would be.  I assume that in theory clients could have a legitimate record of artibrary length, but we won't be willing to make this arbitrarily large in practice.

At a practical level, increasing this could get interesting...the size limit on individual payloads is reflected in the MySQL storage schema itself (the "payload" column is a TEXT field with max size) so bumping it up on existing storage nodes could be difficult.  Bob, what do you think?
Flags: needinfo?(bobm)
(Reporter)

Comment 3

a year ago
Adding Benson as a change here may impact the go impl
The sqlite3 db used by go-syncstorage doesn't have a max size on the payload column. It probably won't have too much effect on the database. 

IIRC the average payload size is about 480bytes. It'll be interesting to see a histogram of the distribution. Most of them are very small. 

:markh how are bookmark folders stored? Do they have a list of pointers to the BSOs or do the BSOs point to the folders?
Flags: needinfo?(markh)
> how are bookmark folders stored? Do they have a list of pointers to the BSOs
> or do the BSOs point to the folders?

According to the description of "version 2" format at [1], it's both - individual bookmarks contain the GUID of their parent folder, and the folder contains list of of child GUIDs.

[1] http://docs.services.mozilla.com/sync/objectformats.html#bookmarks
Flags: needinfo?(markh)
That format, incidentally, caps the maximum number of children that a synced bookmark folder can have at about 17,000.

And means we upload 250K of redundant changes each time you add a new bookmark to your swollen Unsorted Bookmarks folder.

I've talked to users who have hit that limit. :/
(In reply to Ryan Kelly [:rfkelly] from comment #2)
> I have no particular objection to this, although it'd be good to gather some
> stats on the size distribution in practice, so we can judge what a sensible
> upper-bound would be.  I assume that in theory clients could have a
> legitimate record of artibrary length, but we won't be willing to make this
> arbitrarily large in practice.
> 
> At a practical level, increasing this could get interesting...the size limit
> on individual payloads is reflected in the MySQL storage schema itself (the
> "payload" column is a TEXT field with max size) so bumping it up on existing
> storage nodes could be difficult.  Bob, what do you think?

The payload column is a MEDIUMTEXT field without a limit, so that buys us 16,777,215 characters if we want them.  I don't think we want all of them.  Getting a sense of the actual distribution of payload sizes, and the size of the user population that needs larger records is key.  What's a reasonable limit on record size?  I suspect it's closer to 250K than 16MB.  

The Sync Schema: 
+--------------+-------------+------+-----+------------+-------+
| Field        | Type        | Null | Key | Default    | Extra |
+--------------+-------------+------+-----+------------+-------+
| id           | varchar(64) | NO   | PRI | NULL       |       |
| userid       | int(11)     | NO   | PRI | NULL       |       |
| collection   | int(11)     | NO   | PRI | NULL       |       |
| sortindex    | int(11)     | YES  |     | NULL       |       |
| modified     | bigint(20)  | NO   |     | NULL       |       |
| payload      | mediumtext  | NO   |     | NULL       |       |
| payload_size | int(11)     | NO   |     | 0          |       |
| ttl          | int(11)     | NO   | MUL | 2100000000 |       |
+--------------+-------------+------+-----+------------+-------+
Flags: needinfo?(bobm)
(In reply to Bob Micheletto [:bobm] from comment #7)
> (In reply to Ryan Kelly [:rfkelly] from comment #2)

I think it's work pointing out here that the Sync schema is created during the deploy process for a new node through configuration management, and not by the syncserver code when it spins up.
> (In reply to Richard Newman [:rnewman] from comment #6)

> And means we upload 250K of redundant changes each time you add a new
> bookmark to your swollen Unsorted Bookmarks folder.

That also means we send out 250KB x (clients - 1) of redundant data each time it changes as well.
Would it be better to fix this client side? Maybe split up the bookmark folder over several BSOs?
(In reply to Benson Wong [:mostlygeek] from comment #9)

> Would it be better to fix this client side? Maybe split up the bookmark
> folder over several BSOs?

Can't be done without an engine version bump; that's Sync's only provision for non-backward-compatible record format changes.

If we were doing that, we'd probably redesign the bookmarks record format more fundamentally.

This, along with a dozen other problems, was to be fixed in Sync 2.0….
(Reporter)

Comment 11

a year ago
(In reply to Bob Micheletto [:bobm] from comment #7)
> The payload column is a MEDIUMTEXT field without a limit, so that buys us
> 16,777,215 characters if we want them.  I don't think we want all of them. 
> Getting a sense of the actual distribution of payload sizes, and the size of
> the user population that needs larger records is key. 

I agree. Unfortunately, looking at the sizes of the records in the DB doesn't actually help us determine how many records were rejected for being too large. Is there any scope for server-side metrics that reports when a record is rejected in this way and what the size of the payload was?

> I suspect it's closer to 250K than 16MB.

Me too - I'd guess that doubling the size would have a huge impact (which still remains closer to 250k than 16MB :) Without stats though, my guesses aren't worth much...
Flags: needinfo?(bobm)
We can instrument the server to emit stats when it rejects an oversize payload, assuming the request wasn't so large as to be rejected by nginx.
(In reply to Ryan Kelly [:rfkelly] from comment #12)
> We can instrument the server to emit stats when it rejects an oversize
> payload, assuming the request wasn't so large as to be rejected by nginx.

Am I right in guessing I should see 'payload too large' in the sync.log when this happens?
Flags: needinfo?(bobm)
(Reporter)

Comment 14

a year ago
(In reply to Bob Micheletto [:bobm] from comment #13)
> Am I right in guessing I should see 'payload too large' in the sync.log when
> this happens?

Not in the client log, no. In attachment 8787927 [details], the user inspected the response with devtools (yay!) so did see what message in the response - but the client doesn't log it.

Unless you were asking about the server logs, in which case I have no idea :)
Re size: (In reply to Mark Hammond [:markh] from comment #11)

> I agree. Unfortunately, looking at the sizes of the records in the DB
> doesn't actually help us determine how many records were rejected for being
> too large. Is there any scope for server-side metrics that reports when a
> record is rejected in this way and what the size of the payload was?

It would if we expected payload sizes to follow a continuous distribution of some sort -- the truncation line should be obvious. Concur that server-side metrics would be most helpful, so long as we don't double-count records that are attempted multiple times by the same user.


> Me too - I'd guess that doubling the size would have a huge impact (which
> still remains closer to 250k than 16MB :) Without stats though, my guesses
> aren't worth much...

Two datapoints:

We know for sure that going to 1MB would definitely handle the largest bookmark folders we've seen (50K).

We know that we've seen some users with 1.7MB form records, and I doubt we'd want to routinely go that high.
(In reply to Mark Hammond [:markh] from comment #14)
> (In reply to Bob Micheletto [:bobm] from comment #13)

> Unless you were asking about the server logs, in which case I have no idea :)

Yeah, I meant server side, via this: https://github.com/mozilla-services/server-syncstorage/blob/d8f87bbbaa6d07b341265ca865d6d386b26e35f5/syncstorage/bso.py#L114
I don't think that "payload too large" message will actually show up in any server logs.  Happy to add some instrumentation though.  Bob, what's the best way to do it - just log.info a line giving the size of the rejected item, or something fancier with e.g. statsd?

> Concur that server-side metrics would be most helpful,
> so long as we don't double-count records that are attempted multiple times by the same user.

Of the top of my head, I can't think of a nice way to avoid this potential double-counting.  I guess we could log the uid and itemid and deduplicate in post-processing.
Flags: needinfo?(bobm)
(In reply to Richard Newman [:rnewman] from comment #10)

> Can't be done without an engine version bump; that's Sync's only provision
> for non-backward-compatible record format changes.

Can we create a new record types in the current engine version? An object's type is just(?) a string. For example a `folder_ext` type would look exactly like a `folder`. The only difference is its type value and how it is treated.

For backwards compatibility, a folder's children array would start with regular items and the end can be the id's of `folder_ext` objects. I am assuming older clients will see the folder_ext types as invalid and skip them. For example: 

>  { "type" : "folder", 
>    "children": ["bso1", "bso2", "bso3",
>          ....(200KB later)..., 
>          "folder_ext_bsoId1", "folder_ext_bsoId2"] ... 
>  }


I'm 0% aware of the engineering effort / impact / side effects / requirements this would have. 

On the other hand, I can change a constant in the server side code to say, 512KB and it'll probably go away for another decade.
(In reply to Ryan Kelly [:rfkelly] from comment #17)
> I don't think that "payload too large" message will actually show up in any
> server logs.  Happy to add some instrumentation though.  Bob, what's the
> best way to do it - just log.info a line giving the size of the rejected
> item, or something fancier with e.g. statsd?

I think the log line is the way to go since we don't need these as a real-time stat, since it doesn't reflect server performance or health, and we have other proxies for user behavioral changes.
 
> Of the top of my head, I can't think of a nice way to avoid this potential
> double-counting.  I guess we could log the uid and itemid and deduplicate in
> post-processing.

I'm guessing the path will be included here, so we can measure distribution by record type.
Flags: needinfo?(bobm)
(Reporter)

Updated

a year ago
Blocks: 1300378
Created attachment 8789195 [details] [review]
PR to log info about invalid bsos

Here's a simple PR that log.info's some info about invalid BSOs.  :bobm, thoughts?
Attachment #8789195 - Flags: feedback?(bobm)
Comment on attachment 8789195 [details] [review]
PR to log info about invalid bsos

The way I read it, that should give us enough information to put together some baseline stats around this issue.  

Is the 'id' field here: https://github.com/mozilla-services/server-syncstorage/pull/41/files#diff-43ff0a964143dcfda3219038a479d0b4R213 sufficient for de-duplication?
Attachment #8789195 - Flags: feedback?(bobm) → feedback+
(Reporter)

Comment 22

a year ago
(In reply to Bob Micheletto [:bobm] from comment #21)
> Is the 'id' field here:
> https://github.com/mozilla-services/server-syncstorage/pull/41/files#diff-
> 43ff0a964143dcfda3219038a479d0b4R213 sufficient for de-duplication?

That looks like the ID of the row we are attempting to upload. I think that probably is enough - we'd just want to report each ID exactly once, even though we expect the same failing row to be logged each time the affected client syncs. IOW, the count of unique IDs roughly tells us how many *records* are impacted. It's not clear that will tell us how many *clients* are impacted (IIUC that will depend on how the "bso" object stringifies itself), but that might be OK for this purpose.
Comment on attachment 8789195 [details] [review]
PR to log info about invalid bsos

Cool, OK, let's land this thing then.  :rtilder r?
Attachment #8789195 - Flags: review?(rtilder)
Comment on attachment 8789195 [details] [review]
PR to log info about invalid bsos

Looks good pending one question for :rfkelly in the github issue.
Attachment #8789195 - Flags: review?(rtilder) → review+
(Reporter)

Comment 25

a year ago
Turning this into the back-end bug - will open one for the client.
Component: Firefox Sync: Backend → Server: Sync
:bobm, do you want a deploy bug to get the metrics-gathering out into production, or would you prefer to wait and bundle it in with e.g. the batch upload stuff as a single deploy?
Flags: needinfo?(bobm)
(In reply to Ryan Kelly [:rfkelly] from comment #26)
> :bobm, do you want a deploy bug to get the metrics-gathering out into
> production, or would you prefer to wait and bundle it in with e.g. the batch
> upload stuff as a single deploy?

In the interest of simplifying the number of active tests we have running in stage, I think it would be best if you bundled it into the batch upload changes.  However, that has implications for QA so let's get their input too.  

:kthiessen what do you think?
Flags: needinfo?(bobm) → needinfo?(kthiessen)
I would prefer to bundle these changes in with the batch-upload material -- I don't mind re-running a few load tests.  In fact, I think we can probably bring the batch-upload changes into the deploy branch at this point -- I perceive the sharding refinements to the load tests to be low-risk.
Flags: needinfo?(kthiessen)
See also Bug 1321021, which discusses automatically splitting large bookmark folders.
See Also: → bug 1321021
See Also: → bug 1386953
IIUC this metrics-gathering code should be live in production now.  :bobm, at your convenience, can you please take a look in the allocation logs for messages like "Invalid BSO" and see whether we're getting them?  (Or, point me to a dashboard where I can poke around myself).
Flags: needinfo?(bobm)
> take a look in the allocation logs for messages like "Invalid BSO" and see whether we're getting them?

We are indeed getting them.  Dashboard shows 292 instances of "payload too large" failures in the past 2 days.  Mostly from tabs, history, and bookmarks collections.
Flags: needinfo?(bobm)
I did a quick analysis on the errors from the last two days, and here's how they break down:

COUNT OF 'too large' ERRORS BY COLLECTION:
  forms: 2
  clients: 5
  stylishsync: 16
  bookmarks: 56
  tabs: 90
  history: 108

DISTRIBUTION OF PAYLOAD SIZES:
  >=100000: 277
  >=200000: 277
  >=300000: 214
  >=400000: 112
  >=500000: 68
  >=600000: 45
  >=700000: 31
  >=800000: 26
  >=900000: 22
  >=1000000: 20
  >=1500000: 9

MAX PAYLOAD SIZE: 1792259
(bob tells me this dashboard may be sampled, but I think the stats should give us enough to work with even is sampled)
> Dashboard shows 292 instances of "payload too large" failures in the past 2 days

Some of these are dups though; uniqifying by (uid, collection, id) shows 145 unique failing items.
(Reporter)

Comment 35

5 months ago
In desktop telemetry I looked at 10% of pings over the last 50 days and the breakdown is:

	bytes
count 	40114
mean 	500235
std 	850629
min 	262355
25% 	305915
50% 	379751
75% 	524519
90% 	771771
95% 	952167
99% 	1777083
max 	93952891

Note that is only capturing failure to post records due to the existing size constraints - so nothing under 256k is recorded.

ie, there's at least one attempt to post a 94MB(!) payload, but increasing to 1MB will address over 95% of the failures and 2MB will address over 99%.
Created attachment 8904395 [details] [review]
PR to increase max payload size to 2MB

> increasing to [...] 2MB will address over 99%.

Let's do that thing then.  This PR increases the limit to 2MB for the python server.
Attachment #8904395 - Flags: review?(bwong)
Attachment #8904395 - Flags: review?(bobm)
Comment on attachment 8904395 [details] [review]
PR to increase max payload size to 2MB

The numbers look good to me, so r+ from that perspective.
Attachment #8904395 - Flags: review?(bobm) → review+
Attachment #8904395 - Flags: review?(bwong)
Bob, do you want to do anything special about rolling this out for testing, or just include it in the next release and go from there?
This is going out in v1.6.10
Status: NEW → RESOLVED
Last Resolved: 4 months ago
Resolution: --- → FIXED

Updated

4 months ago
Blocks: 1397553
It occurs to me that memcache won't store items bigger than 1M, so regardless of the setting here, attempts to store records in "meta" or "tabs" or "clients" records with a payload over 1M are going to fail.  Is it worth us trying to increase the size limits in memcached in production to accommodate this?

:bobm, are those the only collections that will store payloads in memcached?
Flags: needinfo?(markh)
Flags: needinfo?(bobm)
(Reporter)

Comment 41

4 months ago
I think tabs in particular could be an issue - we limit tab URLs sizes but some "tab-guy" type users might hit it given there's one BSO with all tabs. I expect clients and meta to be fine in theory :)

If there's not much cost in increasing the max when we expect it to be very rarely hit, I guess it makes more sense to increase it than to have clients hit this and get stuck. Or if that's a problem, maybe longer-term per-collection limits might make a (sad) kind of sense?
Flags: needinfo?(markh)
(Reporter)

Comment 42

4 months ago
also, but desktop scales the number of tabs it syncs back based on the max size, so users with a massive number of tabs will currently get only a small number of their tabs synced, whereas now they they are likely to get none (we'll try and sync many more, but our post will fail)
(In reply to Ryan Kelly [:rfkelly] from comment #40)
> It occurs to me that memcache won't store items bigger than 1M, so
> regardless of the setting here, attempts to store records in "meta" or
> "tabs" or "clients" records with a payload over 1M are going to fail.  Is it
> worth us trying to increase the size limits in memcached in production to
> accommodate this?
> 
> :bobm, are those the only collections that will store payloads in memcached?

Our cache config is as follows:
cached_collections = meta clients
cache_only_collections = tabs

I'd like to have a better understanding of how increasing this will impact our memory consumption.  Would a study of 413 responses to these collections suffice?
Flags: needinfo?(bobm)
(Reporter)

Comment 44

4 months ago
(In reply to Bob Micheletto [:bobm] from comment #43)
> I'd like to have a better understanding of how increasing this will impact
> our memory consumption.  Would a study of 413 responses to these collections
> suffice?

If not for tabs I'd be fine with that. What does "cache_only" mean? I also wonder if tabs really needs to be backed by memcache? They will have a high write-load, but I expect still less than history.
> What does "cache_only" mean?

It means they're *only* stored in memcache, never written through to the db.

I honestly don't know what analysis we did to lead us to put tabs in memcache, it was before my time.
(Reporter)

Comment 46

4 months ago
(In reply to Ryan Kelly [:rfkelly] from comment #45)
> I honestly don't know what analysis we did to lead us to put tabs in
> memcache, it was before my time.

Richard, do you recall? (For additional context, see comment 40 and later)
Flags: needinfo?(rnewman)
ISTM it might be straightforward to test this out in production - preff off cache-only storage of tabs on a couple of webheads and see what effect it has on performance :-)
> I'd like to have a better understanding of how increasing this will impact our memory consumption.
> Would a study of 413 responses to these collections suffice?

For a historical analysis, yes.  Be aware though that we've upped the nginx request-size limit so that it's now *larger* than the storage-size limit in memcache.  That means it's now possible for clients to upload a tabs record that's small enough to be allowed through by nginx, but big enough to cause an error if we store it int memcache.  Users who do that will currently get a 500 Server Error.
(In reply to Mark Hammond [:markh] from comment #46)
> (In reply to Ryan Kelly [:rfkelly] from comment #45)
> > I honestly don't know what analysis we did to lead us to put tabs in
> > memcache, it was before my time.
> 
> Richard, do you recall? (For additional context, see comment 40 and later)

It's been that way since the server code introduced the MySQL + memcache backend, seven years ago:

https://hg.mozilla.org/services/server-storage/rev/98b384bd07b5#l1.265

I can't find the bug for that, but I did find the one for meta/global — meta/global represented 20% of our query load: Bug 581559.

I expect the same was (is?) true for tabs: given N devices, the `tabs` collection contains only N records, and each are completely replaced each time the corresponding device syncs, dozens of times per day. That also means that each device fetches all other active devices' tab records each time it syncs, which is heavy read volume.

Tab records never pass data back to the originating client — they are always overwritten by the current open tabs. If you connect a new device, it won't treat tabs like, say, bookmarks.

They also expire in 21 days. (Though lol: Bug 756321.)


So they're churny, they're typically small, and they don't need to be durable. Perfect to keep in memory rather than supporting the write and read load that storing in the DB would require.

The same is _almost_ true of `clients`, but these days we actually would like client records to stick around, given Send Tab. Oh well.
Flags: needinfo?(rnewman)
> these days we actually would like client records to stick around, given Send Tab.  Oh well.

FWIW the clients collection is "cached" in the traditional sense of "it sticks around even if you blow away the cache":

  cached_collections = meta clients

Or were you talking about client-specified ttls on client records?  In any case, tabs are special in that they're only stored in memcache and never hit the db at all:

  cache_only_collections = tabs

They're the only collection that has this special behaviour.
So it looks like newer versions of memcached make it easy to increase the max item size via command-line option, although they discourage its use:

  https://github.com/memcached/memcached/wiki/ReleaseNotes142#configurable-maximum-item-size

Internally, memcache manages its memory by assigning "pages" to "slab classes" to cut up for storage of fixed size "chunks" as described here:

  https://github.com/memcached/memcached/wiki/UserInternals

Which in theory means that changing the max item size shouldn't have much of an affect on memory usage unless clients actually start sending larger items...unless it causes the size of a page to change and this dramatically changes memory allocation patterns.

Bob, what's the right next step here, do you want to do a bit of analysis of historical 413's on the "tabs" collection to see what we might be in for?
Status: RESOLVED → REOPENED
Flags: needinfo?(bobm)
Resolution: FIXED → ---
(In reply to Ryan Kelly [:rfkelly] from comment #51)
 
> Bob, what's the right next step here, do you want to do a bit of analysis of
> historical 413's on the "tabs" collection to see what we might be in for?

That seems like a reasonable approach to me.
Flags: needinfo?(bobm)
>  do you want to do a bit of analysis of historical 413's on the "tabs" collection to see what we might be in for?

:markh set me straight that this won't actually be useful - Desktop has long contained logic to avoid uploading tabs records that exceed the maximum item size, using a hard-coded limit in older versions where the item-size was not configurable.
You need to log in before you can comment on or make changes to this bug.