Closed Bug 1361178 Opened 7 years ago Closed 7 years ago

go-sync limits GET BSO payload sizes

Categories

(Cloud Services Graveyard :: Server: Sync, enhancement)


Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mostlygeek, Assigned: mostlygeek)

References

Details

A sync server should not limit the number of BSO records that can be returned on a fetch. This has been addressed in PR 167 [1].

As a workaround, the production go-sync box's GET limit has been increased to 100,000 records by setting LIMIT_MAX_BSO_GET_LIMIT=100000 in the configuration.

[1] https://github.com/mozilla-services/go-syncstorage/pull/167
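
For context on the failure mode: the server applies a hard cap to the limit on BSO GET requests, so a fetch the client intended to be unlimited comes back silently truncated. Below is a minimal Go sketch of that clamping behaviour; the names and the cap value are illustrative, not the actual go-syncstorage code.

    // Sketch of how a configured maximum silently truncates an
    // "unlimited" GET. Names are illustrative.
    package main

    import (
        "fmt"
        "strconv"
    )

    const maxBSOGetLimit = 1000 // cf. LIMIT_MAX_BSO_GET_LIMIT

    // effectiveLimit clamps the client's requested limit to the server cap.
    // A client that sends no limit (expecting everything back) still gets
    // at most maxBSOGetLimit records.
    func effectiveLimit(rawLimit string) int {
        if rawLimit == "" {
            return maxBSOGetLimit // "unlimited" request is silently capped
        }
        n, err := strconv.Atoi(rawLimit)
        if err != nil || n <= 0 || n > maxBSOGetLimit {
            return maxBSOGetLimit
        }
        return n
    }

    func main() {
        fmt.Println(effectiveLimit(""))     // 1000, not "all records"
        fmt.Println(effectiveLimit("5000")) // 1000
        fmt.Println(effectiveLimit("200"))  // 200
    }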
Added a few people to the CC list. 

Not sure how bad the effect of this bug is. If a client requests an unbounded (no limit) GET for all of their BSOs and only gets back 1000 of them, how broken are they? Do they have any chance of recovering?

I suppose if they didn't get their bookmark folder BSOs, they'd be in a pretty bad state.
Status: NEW → ASSIGNED
Component: Firefox Sync: Backend → Server: Sync
(In reply to Benson Wong [:mostlygeek] from comment #1)
> Not sure how bad the effect of this bug is. If a client requests an
> unbounded (no limit) GET for all of their BSOs and only gets back 1000 of
> them how broken are they?

Very :)

> Do they have any chance of recovering? 

Probably not, unless the BSOs were returned in date order (in which case a next sync should get more, and they might correctly reparent on the later sync when they appear).
> (in which case a next sync should get more, and they might correctly reparent on the later sync when they appear.)

Even then, ISTM the client will think "I've got all BSOs modified before time X" and then any future requests will have "?newer=X" on them, and will not return the skipped bookmarks.
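
To make that concrete, here is an illustrative Go sketch of the bookkeeping being described (assumed client behaviour, not actual desktop or Android code): on a "successful" fetch the client advances its high-water mark to the collection's last-modified time, so anything the cap dropped falls permanently behind ?newer=X.

    // Illustrative only. Five records exist; the cap lets three through,
    // yet the client records the collection's last-modified time as its
    // high-water mark, so the skipped records are never requested again.
    package main

    import "fmt"

    type bso struct {
        id       string
        modified float64 // server timestamp, seconds
    }

    func main() {
        collection := []bso{
            {"a", 100}, {"b", 200}, {"c", 300}, {"d", 400}, {"e", 500},
        }
        const serverCap = 3
        returned := collection[:serverCap]

        // Client books a successful sync: X := collection last-modified
        // (500 here), even though d and e were never delivered.
        lastSync := collection[len(collection)-1].modified

        // Every later incremental sync asks for ?newer=500 ...
        var later []bso
        for _, b := range collection {
            if b.modified > lastSync {
                later = append(later, b)
            }
        }
        fmt.Printf("first fetch delivered %d of %d records\n",
            len(returned), len(collection))
        fmt.Printf("?newer=%v returns %d records; the skipped ones never come back\n",
            lastSync, len(later))
    }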
(In reply to Mark Hammond [:markh] from comment #2)

> Probably not, unless the BSOs returned were returned in date order (in which
> case a next sync should get more, and they might correctly reparent on the
> later sync when they appear.)

Nah, the client will have advanced its timestamp on success -- it'll only retrieve the skipped records if they're modified or something causes the client to reset and resync.
Hmm. Even if the bug is fixed, it seems safe to migrate everybody off the server.
Deployed mozilla/go-syncstorage:1.6.7r3 to the two production go servers.
:grisha, should we migrate iOS and Android users from these servers due to this bug?
Flags: needinfo?(gkruglov)
(In reply to Bob Micheletto [:bobm] from comment #7)
> :grisha, should we migrate iOS and Android users from these servers due to
> this bug?

Why do we think this only affects mobile clients?
(In reply to Richard Newman [:rnewman] from comment #8)

> Why do we think this only affects mobile clients?

As far as I can tell from reading the code, desktop still fetches non-history collections without a limit, and the history limit is 5000, so I expect desktop to be just as affected as any other client.

You should node-reassign every user with more than 1000 records in any collection.
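
A hedged Go sketch of that selection criterion; how the per-user, per-collection counts are actually gathered (SQL against the storage backend, an admin tool, etc.) is deployment-specific and not shown here.

    // Pick users to node-reassign: anyone with more than 1000 BSOs in
    // any single collection. Data shape and numbers are illustrative.
    package main

    import "fmt"

    const getCap = 1000 // the cap that truncated GET responses

    // counts: userID -> collection name -> number of BSOs.
    func affectedUsers(counts map[string]map[string]int) []string {
        var out []string
        for uid, cols := range counts {
            for _, n := range cols {
                if n > getCap {
                    out = append(out, uid)
                    break // one oversized collection is enough
                }
            }
        }
        return out
    }

    func main() {
        counts := map[string]map[string]int{
            "user-1": {"bookmarks": 250, "history": 4800},
            "user-2": {"bookmarks": 120, "history": 900},
        }
        fmt.Println(affectedUsers(counts)) // [user-1]
    }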
Seems like a reasonable thing to do, unfortunately. Otherwise we have clients with partial views of their collections. Migration will mean that clients might even get a chance to upload their partial view of the data to a new node, which is bad. It seems like either way there's a good chance we're going to corrupt data, and migrating sooner will hopefully make the situation a bit better.

Alternatively:
I wonder if bumping modified timestamps for all records in all collections might also do the trick? It will force clients to re-download all of their collections. We're probably going to run into various reconciliation bugs here, but I reckon that's better than potentially losing subsets of users' data by having them upload incomplete collections in some cases.

Perhaps we could do that only for the more structure-sensitive collections.

Is that a reasonable approach, or would it be too hard to orchestrate?
Flags: needinfo?(gkruglov)
(In reply to :Grisha Kruglov from comment #10)

> Migration will mean that
> clients might even get a chance to upload their partial view of the data to
> a new node, which is bad. 

Client's conflict resolution couldn't/wouldn't catch this?
> > Migration will mean that
> > clients might even get a chance to upload their partial view of the data to
> > a new node, which is bad. 

Older Android clients will have _already_ screwed everything up: once Android is confident that it's downloaded everything, it fixes up bookmark trees and uploads whatever changes it needs.


> Client's conflict resolution couldn't/wouldn't catch this?

There's no conflict to detect. This is just part and parcel of Sync: we assume that it's safe for any client to restore to a server, and that once all clients have synced they'll all converge on the same good place. Naturally that assumption isn't true if a client is forced to write before they can finish reading.


> I wonder if bumping modified timestamps for all records in all collections
> might also do the trick?

On an updated Go node, that would force a redownload, but it would also cause data loss on clients as the remote timestamp wins a conflict.

If you want to force a resync on all clients without a node migration, simply permute the meta/global syncID for that collection.

I'd be inclined to simply node-reassign, tbh. It's a more tested flow.
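
For reference, a hedged Go sketch of what "permute the meta/global syncID" looks like for a single engine. The field names follow the storage format 5 meta/global record; the values and helper names are illustrative.

    // Rewriting the per-engine syncID in meta/global makes every client
    // treat that collection as brand new and resync it from scratch.
    package main

    import (
        "crypto/rand"
        "encoding/base64"
        "encoding/json"
        "fmt"
    )

    type engineInfo struct {
        Version int    `json:"version"`
        SyncID  string `json:"syncID"`
    }

    type metaGlobal struct {
        SyncID         string                `json:"syncID"`
        StorageVersion int                   `json:"storageVersion"`
        Engines        map[string]engineInfo `json:"engines"`
    }

    // newSyncID returns a fresh 12-character ID, the length Sync clients use.
    func newSyncID() string {
        buf := make([]byte, 9)
        if _, err := rand.Read(buf); err != nil {
            panic(err)
        }
        return base64.URLEncoding.EncodeToString(buf)
    }

    func main() {
        mg := metaGlobal{
            SyncID:         "aaaaaaaaaaaa",
            StorageVersion: 5,
            Engines: map[string]engineInfo{
                "bookmarks": {Version: 2, SyncID: "bbbbbbbbbbbb"},
                "history":   {Version: 1, SyncID: "cccccccccccc"},
            },
        }

        // Permuting only the bookmarks syncID forces a bookmarks resync
        // without touching the other engines.
        e := mg.Engines["bookmarks"]
        e.SyncID = newSyncID()
        mg.Engines["bookmarks"] = e

        out, _ := json.MarshalIndent(mg, "", "  ")
        fmt.Println(string(out))
    }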
Depends on: 1362180
Resolving this issue. The bug was fixed in go-syncstorage and all users were migrated away in Bug 1362180.
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Product: Cloud Services → Cloud Services Graveyard