Closed
Bug 1330839
Opened 9 years ago
Closed 9 years ago
Importance of maintaining the same GET parameters while fetching a collection using an offset
Categories
(Firefox :: Sync, defect, P1)
Firefox
Sync
Tracking
()
RESOLVED
FIXED
Iteration:
1.14
People
(Reporter: Grisha, Assigned: Grisha)
Details
(Whiteboard: [MobileAS])
After receiving an offset token while batching downloads, how should clients treat subsequent requests? Should they make an effort to maintain the same `since`, `order`, and `limit` parameters? Or is it possible to, say, maintain the same `since` value but change a `limit` parameter on a subsequent request, and expect it to be respected?
Example: there are 30 records in total to fetch.
- 1) since=1&limit=10. We get back something like offset=10.
- 2) since=1&offset=10&limit=50 - will the new limit be respected?
- 2.1) what if instead we reduced our limit? since=1&offset=10&limit=5 ? what should be expected to happen?
Docs currently seem to be somewhat ambiguous on the subject.
Context for this is the "resume incremental sync where we left off" work as part of Bug 1291821. If a sync is interrupted for a non-412 reason, we'd like to resume it at some point in the future in the most optimal way. Ideally, by re-using whatever offset token we had previously. Client's per-batch limits might change over time, as may other parameters. Primary example I can think of is an application update which changes per-batch limits. If this update also interrupts a sync-in-progress, the client needs to either maintain more state, reset and fetch more records than necessary, or be certain that its new request (with an old offset and new limit, for example) will be respected.
May an `order` change during batching downloads? Does this make sense at all, or should a client just reset its state and "try again"? What about other parameters that one might specify such as "ids" and "full"?
To quote rnewman from IRC, "I would be happy to see either consistent definition used: either all other parameters are ignored (i.e., the offset is a continuation token for a pickled query) or 'since' and order etc. must be the same, and limit can change and will be respected (i.e., the offset is either an offset or a reference to a server-side cursor that can be read from in smaller or larger chunks); in either case, the client can safely use a different limit and something sensible will happen".
How do the two sync servers handle the `offset` now?
Comment 1•9 years ago
|
||
> May an `order` change during batching downloads?
I think this is pretty clearly 'no' -- any change in either data or request parameters that would result in a different set or order of results would invalidate the batching.
My mental model for this is a Java iterator -- you could walk it differently (faster or slower: change the limit), but you can't alter the data underneath it (ConcurrentModificationException) and you can't keep walking the same iterator but somehow change the 'query' (alter order or since).
Setting ni on Benson and Ryan.
Flags: needinfo?(rfkelly)
Flags: needinfo?(bwong)
Comment 2•9 years ago
|
||
Richard's intuition is exactly right from my perspective - anything that might change the underlying data or its order should stay the same across requests. I would support an update to the docs saying explicitly that you can change `limit` but must keep all the other parameters the same. It might be worth having the server return an error if you violate this expectation.
Implementation-wise, the python server handles sorting by timestamp and sorting by sortindex different:
https://github.com/mozilla-services/server-syncstorage/blob/master/syncstorage/storage/sql/__init__.py#L383
When sorting by timestamp, the offset token is actually a "timestamp:count" pair telling us the last timestamp value that you read up to, and the number of items you read with that timestamp. It lets us formulate a more efficient query to get to the next items. But it also means we'll silently do the wrong thing if you change the sort order :-/
Flags: needinfo?(rfkelly)
Comment 3•9 years ago
|
||
I concur with Ryan and Richard. Changing the sort parameter of the query will provide different results. Even so, there's no guarantee of a consistent result between requests based on the capabilities of the sync 1.5 protocol [2].
Scenario A:
1. client A: GET .../storage/history?sort=newest (returns 10 bsos)
2. client B: POST .../storage/history/ (add 10 new bsos)
3. client A: GET .../storage/history/?sort=newest&offset=10
... same response as #1, &offset=10 skips the the new ones from #2
and you get the same batch as before
Scenario B:
1. client A: GET .../storage/history?sort=oldest (returns 10 bsos)
2. client B: POST .../storage/history/ (modifies the BSOs in #1, no longer the oldest)
3. client A: GET .../storage/history/?sort=oldest&offset=10
... you're now missing #20 to #30
BSOs #1 and #10, have been modified and are *moved*
offset=10 skips #20 to #30, since they are now the absolute oldest
Scenario B is rare but unfortunately not impossible. It could result in missing chunks of data.
In both gosync and pysync we only keep a modified timestamp for BSOs. So there's no way to prevent a writer from changing things between GET requests.
[1] https://github.com/mozilla-services/go-syncstorage/blob/master/syncstorage/db.go#L864-L866
[2] https://docs.services.mozilla.com/storage/apis-1.5.html#individual-collection-interaction
Flags: needinfo?(bwong)
| Assignee | ||
Comment 4•9 years ago
|
||
(In reply to Benson Wong [:mostlygeek] from comment #3)
> I concur with Ryan and Richard. Changing the sort parameter of the query
> will provide different results. Even so, there's no guarantee of a
> consistent result between requests based on the capabilities of the sync 1.5
> protocol [2].
>
> Scenario A:
>
> 1. client A: GET .../storage/history?sort=newest (returns 10
> bsos)
> 2. client B: POST .../storage/history/ (add 10 new
> bsos)
> 3. client A: GET .../storage/history/?sort=newest&offset=10
> ... same response as #1, &offset=10 skips the the new ones from #2
> and you get the same batch as before
>
> Scenario B:
>
> 1. client A: GET .../storage/history?sort=oldest (returns 10
> bsos)
> 2. client B: POST .../storage/history/ (modifies the
> BSOs in #1, no longer the oldest)
> 3. client A: GET .../storage/history/?sort=oldest&offset=10
> ... you're now missing #20 to #30
> BSOs #1 and #10, have been modified and are *moved*
> offset=10 skips #20 to #30, since they are now the absolute oldest
>
> Scenario B is rare but unfortunately not impossible. It could result in
> missing chunks of data.
>
> In both gosync and pysync we only keep a modified timestamp for BSOs. So
> there's no way to prevent a writer from changing things between GET requests.
>
> [1]
> https://github.com/mozilla-services/go-syncstorage/blob/master/syncstorage/
> db.go#L864-L866
> [2]
> https://docs.services.mozilla.com/storage/apis-1.5.html#individual-
> collection-interaction
These scenarios are a good example of why one should always set x-i-u-s header.
Comment 5•9 years ago
|
||
> These scenarios are a good example of why one should always set x-i-u-s
> header.
The gosync server does guarantee that the collection's modified timestamp always increases monotonically. Except during a leap second. Though it did survive the last time without any catastrophe. [1]
[1] https://en.wikipedia.org/wiki/Leap_second#Examples_of_problems_associated_with_the_leap_second
Comment 6•9 years ago
|
||
(In reply to Benson Wong [:mostlygeek] from comment #3)
> Even so, there's no guarantee of a
> consistent result between requests based on the capabilities of the sync 1.5
> protocol [2].
Clients use X-I-U-S and expect the server to detect and raise conflicts during batched downloads, as described in the docs.
'Modern' clients will handle those conflicts by resuming at a sensible spot (e.g., when sorting oldest-to-newest, they'll do an overlapping resume from the last saved timestamp, because all modifications for sort=oldest shuffle records to the end).
Comment 7•9 years ago
|
||
Grisha: You mentioned wanting to mention this explicitly in the doc repo[0] and asked that the bug be assigned to you.
[0]: https://github.com/mozilla-services/docs/
Assignee: nobody → gkruglov
Priority: -- → P1
| Assignee | ||
Comment 8•9 years ago
|
||
Doc change PR at https://github.com/mozilla-services/docs/pull/68.
Status: NEW → RESOLVED
Iteration: --- → 1.14
Closed: 9 years ago
Resolution: --- → FIXED
Whiteboard: [MobileAS]
Updated•7 years ago
|
Component: Firefox Sync: Cross-client → Sync
Product: Cloud Services → Firefox
You need to log in
before you can comment on or make changes to this bug.
Description
•