Closed Bug 1015016 Opened 10 years ago Closed 7 years ago

Implement client-side pagination when fetching lots of items

(Firefox :: Sync, defect)

RESOLVED DUPLICATE of bug 1294599

(Reporter: rfkelly, Unassigned)

(Keywords: dupeme, Whiteboard: [qa?])

(I couldn't find or recall an existing bug for this; if there is one, please triage as appropriate.)

The sync1.5 protocol has some shiny new affordances for paging through large sets of items using the "limit" and "offset" parameters, as shown in example code here:

  https://docs.services.mozilla.com/storage/apis-1.5.html#example-paging-through-a-large-set-of-items

It would be great if client code could be updated to use them in some way.  Currently we're still seeing regular "give me all 10,000 of my bookmarks" queries to the server, which impact availability of the service because they tie up db resources for a long time.
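The paging loop from the linked docs reduces to something like this sketch (Python; `get_page` is a hypothetical stand-in for an HTTP GET against the collection, and `next_offset` plays the role of the `X-Weave-Next-Offset` response header):

```python
def fetch_all_paginated(get_page, limit=100):
    """Pull down a collection in pages of at most `limit` items.

    get_page(limit, offset) -> (items, next_offset), where next_offset
    is None once the last page has been served.
    """
    items = []
    offset = None
    while True:
        page, offset = get_page(limit, offset)
        items.extend(page)
        if offset is None:
            return items
```

Each iteration is one HTTP request, which is exactly the multiple-requests reliability trade-off discussed below; a real client would also need to notice concurrent writes landing between pages.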

I plan to experiment with this on the server side as described in Bug 1007987 - basically if an unpaginated query comes in from the client, the server would split it up internally using the existing limit/offset API. I'm hopeful that this can reduce the impact of the unpaginated client queries.

But ISTM having it handled client-side would be better in the long term.

Having clients make multiple requests is something that's vaguely terrifying from a reliability standpoint.

It's not obvious whether it's better to have a client try to grab all 5,000 records, possibly failing, or suffer the known and terrible consequences of trying that in 10 chunks and failing half-way through.

I'm less concerned with, e.g., history, where interruption is less harmful.

Note that we removed batching from XUL Fennec, IIRC: it didn't help.

See also Bug 730142.

> I'm less concerned with, e.g., history, where interruption is less harmful.

Well, another option is for the server to just hard-cap your history requests at 5000 items, silently refusing to give you back any more.  IIRC we did this for a time when the sync1.1 servers were in strife - did it cause any problems?

> Note that we removed batching from XUL Fennec, IIRC: it didn't help.

Interesting. Do you recall what it was expected to help with exactly, and why it failed to help?

Sync1.1 batching used actual MySQL-level "LIMIT" and "OFFSET" clauses which were pretty hard on the server; with the new scheme we have a chance of doing it in a less resource-intensive way server-side.  It's not clear how much of a difference that might make, but it's got a chance.

> See also Bug 730142.

Also, batching + sortindex terrifies me because we don't index for that on the server, each batch would be a full scan and sort of all your items...

(In reply to Richard Newman [:rnewman] from comment #1)
> It's not obvious whether it's better to have a client try to grab all 5,000
> records, possibly failing, or suffer the known and terrible consequences of
> trying that in 10 chunks and failing half-way through.

Isn't another option that we just batch everything up client-side, and don't start processing them until all batches have been received?  That isn't going to help reduce resources on the client, but it sounds like it would keep services happy and have no real impact on the client?
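Sketched out, that buffering approach might look like this (hypothetical `get_page`/`apply_record` stand-ins for the client's fetch and apply steps; nothing touches the local store unless every batch arrived):

```python
def sync_collection(get_page, apply_record, limit=100):
    """Fetch every batch first; apply nothing until the download is
    known to be complete, so a mid-download failure leaves the local
    store untouched.
    """
    buffered = []
    offset = None
    while True:
        page, offset = get_page(limit, offset)  # may raise on a network error
        buffered.extend(page)
        if offset is None:
            break
    # Only reached if every batch arrived intact.
    for record in buffered:
        apply_record(record)
```

The cost is holding the whole collection in memory on the client, but a dropped connection half-way through no longer leaves a partially-applied (and for bookmarks, corrupt) local state.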

(In reply to Ryan Kelly [:rfkelly] from comment #2)

> Well, another option is for the server to just hard-cap your history
> requests at 5000 items, silently refusing to give you back any more.  IIRC
> we did this for a time when the sync1.1 servers were in strife - did it
> cause any problems?

Clients ought to be sending limited requests for history; if they're not, please file!

They won't send a limited request for bookmarks or other collections, because a semi-random slice of 5,000 bookmarks is guaranteed to result in corruption.


> > Note that we removed batching from XUL Fennec, IIRC: it didn't help.
> 
> Interesting. Do you recall what it was expected to help with exactly, and
> why it failed to help?

I think the assumption was that mobile devices would be less likely to have enough memory, a fast enough connection, and a reliable enough link to successfully pull down hundreds or thousands of records -- they'd have their connection dropped after 400 records, and would fail over and over.

I don't think that assumption was correct: clients that would have failed during a long request would also fail for at least one of the dozens of individual requests we'd make instead.

We've reached the point now where most Android devices are better than the laptops of four years ago, so the batching was needless complexity, on top of the cost of making lots of HTTP requests rather than just one.

And the batching was ID-based, so even worse.


> Sync1.1 batching used actual MySQL-level "LIMIT" and "OFFSET" clauses which
> were pretty hard on the server; with the new scheme we have a chance of
> doing it in a less resource-intensive way server-side.  It's not clear how
> much of a difference that might make, but it's got a chance.

I don't believe we ever used limit/offset pagination in Sync 1.1. The only batching that was safe was to grab all IDs, then fetch in bundles by passing up sets of IDs, so that's what mobile clients did. Desktop never batched.
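For reference, that ID-based bundling scheme is roughly the following (with hypothetical `get_ids`/`get_records` transport stubs). Because the ID list is pinned up front, a write landing mid-sync can't shift items between batches the way limit/offset pagination can:

```python
def fetch_by_id_batches(get_ids, get_records, batch_size=50):
    """The 1.1-era mobile scheme: grab the full list of IDs first,
    then fetch the records in fixed-size ID bundles.
    """
    ids = get_ids()
    records = []
    for i in range(0, len(ids), batch_size):
        records.extend(get_records(ids[i:i + batch_size]))
    return records
```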

Without exclusion, persistent server-side state (compute a list, then step through the list, rather than querying the DB directly), or a proper merge-based timeline, any kind of offset-based pagination will result in data loss.

1.5 has the XIUS guard, which acts as a kind of exclusion. Inefficient, but it works!
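A sketch of how the XIUS guard turns paging into a kind of optimistic exclusion (all names hypothetical; `get_page` raising `CollectionModified` models the server's 412 response to `X-If-Unmodified-Since`):

```python
class CollectionModified(Exception):
    """Another client wrote to the collection mid-download (server 412)."""

def fetch_with_xius(get_page, collection_last_modified, limit=100):
    """Pin the collection's last-modified timestamp, send it as
    X-If-Unmodified-Since on every page, and restart the whole
    download if any page comes back 412.
    """
    while True:
        pinned = collection_last_modified()
        items, offset = [], None
        try:
            while True:
                page, offset = get_page(limit, offset, xius=pinned)
                items.extend(page)
                if offset is None:
                    return items
        except CollectionModified:
            continue  # someone wrote mid-download; start over
```

Note that if the collection keeps changing, this outer loop never terminates, which is the starvation concern raised later in the thread.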

(In reply to Mark Hammond [:markh] from comment #4)

> Isn't another option that we just batch everything up client-side, and don't
> start processing them until all batches have been received?

Yup. That's the simplest version of Bug 814801.

It's far from ideal -- for one thing, it assumes that you never see inconsistent state (i.e., you never see a partial write, and all clients are flawless) -- but it solves a big swath of problems.

Beyond that one starts to get into TreeSync.

> > Isn't another option that we just batch everything up client-side, and don't
> > start processing them until all batches have been received?

FWIW this is pretty much what I plan to try out on the server side as well.

> Clients ought to be sending limited requests for history

Great!  I've seen *some* limit-ed history requests, but I thought they were only coming from android.  I've filed Bug 1015036 for a sanity-check.

> 1.5 has the XIUS guard, which acts as a kind of exclusion. Inefficient, but it works!

Haha, indeed, it is basically the worst possible solution that could actually work.

It also comes with a potential starvation problem: if a collection is being frequently updated, clients trying to pull down its contents in a paginated fashion may find they can never complete the request without hitting an XIUS conflict.  That doesn't seem too likely in practice, but it's worth thinking about.

Whiteboard: [qa?]
Keywords: dupeme

Implemented as part of download batching in bug 1294599.

Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → DUPLICATE
Component: Firefox Sync: Backend → Sync
Product: Cloud Services → Firefox