Replace toFetch with a backfill mechanism similar to Android's (e.g. HighWaterMark)

Status: NEW, assigned
Type: enhancement
Priority: P2
Severity: normal
Opened: a year ago
Last modified: 8 months ago

People

(Reporter: tcsc, Assigned: tcsc)

Tracking

(Blocks 1 bug)

Firefox Tracking Flags

(Not tracked)

Attachments

(1 attachment)

Description

a year ago
On a first sync we download the most recent 5000 history records. Other records from the server need to be backfilled with some mechanism. What desktop does is as follows:

1. Download the entire set of IDs of the collection from the server, remove the ones we already have. For me, this is currently about 90k records on the server. Store this in the toFetch set, which is persisted to disk. 

2. Every time we sync, we request up to 5000 items from our toFetch set (as well as any items that failed in the previous sync). Each request includes up to 50 items (see note below), which means we make 100 requests each time. This incurs a large amount of overhead, and makes using a new device after logging into sync somewhat unpleasant for a while!

(Note: in theory it could be 100 in the case of history. According to the comments for `DEFAULT_GUID_FETCH_BATCH_SIZE`, the value of 50 was chosen to avoid URL length limits with 64-character ids, but we know history GUIDs are only 12 characters long. Even so, the server caps the `ids` query parameter at 100 items, which still means 50 requests per sync; better, but still bad.)
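To make the overhead concrete, here's a rough sketch of how the toFetch draining described above multiplies into requests. The names (`planBackfillRequests`, `GUID_FETCH_BATCH_SIZE`, `MAX_BACKFILL_PER_SYNC`) are illustrative, not the actual engines.js code:

```javascript
// Illustrative sketch only, not the real services/sync implementation.
const GUID_FETCH_BATCH_SIZE = 50;   // ids per request, to stay under URL length limits
const MAX_BACKFILL_PER_SYNC = 5000; // ids drained from toFetch each sync

function planBackfillRequests(toFetch) {
  // Take up to 5000 ids from the persisted set, then slice them into
  // 50-id GET requests: 5000 / 50 = 100 requests every sync.
  const batch = [...toFetch].slice(0, MAX_BACKFILL_PER_SYNC);
  const requests = [];
  for (let i = 0; i < batch.length; i += GUID_FETCH_BATCH_SIZE) {
    requests.push(batch.slice(i, i + GUID_FETCH_BATCH_SIZE));
  }
  return requests;
}
```

With a 90k-record toFetch set, that's 100 requests per sync for 18 syncs before the backfill completes.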

Aside from the performance issues (which are fairly substantial), this has the downside of making toFetch expensive to set up and update, which makes bugs like bug 1421464 more difficult (or impossible) to fix.

On the other hand, Android does something like the following:

1. First, there are two engines: recentHistory and history. The recentHistory engine only syncs the N newest records; the history engine is responsible for backfilling historical records (this is oversimplifying).

2. The backfilling history engine persists the `limit`, `older`, and `offset` parameters used to make requests, and steps through this a few times each sync. (I've checked with rfkelly, and the `offset` tokens do not expire, so this is safe.) Conceptually, this is like a getBatched call without an X-I-U-S check, where subsequent iterations through the loop might happen in separate syncs.
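The persisted-cursor idea in step 2 can be sketched as follows. This is a hypothetical shape, not Android's actual code; `fetchPage` stands in for whatever makes the paged collection GET, and the `{ records, nextOffset }` return shape is assumed:

```javascript
// Hypothetical sketch of a backfill cursor whose request parameters are
// persisted to disk, so a paged collection GET can resume across syncs.
// Safe because the server's offset tokens do not expire.
class BackfillCursor {
  constructor(state = { limit: 500, older: undefined, offset: undefined, done: false }) {
    this.state = state; // persisted between syncs
  }

  // fetchPage({limit, older, offset}) -> { records, nextOffset } (assumed interface).
  async step(fetchPage) {
    if (this.state.done) {
      return [];
    }
    const { records, nextOffset } = await fetchPage(this.state);
    if (nextOffset) {
      this.state.offset = nextOffset; // resume here next time, maybe next sync
    } else {
      this.state.done = true; // server had no more pages
    }
    return records;
  }
}
```

Each sync would call `step` a few times, persisting `state` after each call, exactly like one iteration of a getBatched loop spread across syncs.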

As I see it this has two problems.

A) You would like to store `newer` too, in order to avoid requesting items that were already fetched by the recentHistory engine.

For Android this isn't an issue, since the recent engine only fetches a small number (50, I believe) of records each sync, but for desktop that doesn't seem like the right choice.

B) If you do that, you risk missing items in the case where you haven't completed your backfill and more items than your recent fetch size were uploaded to the server in the meantime.

To solve this, I think desktop would just replace the existing toFetch backfill mechanism with one that stores an array of `{ offset, older, newer }` chunks. A new chunk would be added in the same cases where we currently update toFetch, but instead of fetching all the GUIDs since the last sync, we'd just add the range `{ newer: lastSync, older: oldestRecordFromMostRecentGet }`.
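A minimal sketch of that replacement mechanism, with illustrative names (`BackfillRanges`, `noteGap`, `advance` are not the patch's actual API):

```javascript
// Illustrative sketch of the proposed chunked backfill state: instead of a
// flat set of ids, keep an ordered list of { offset, older, newer } ranges.
class BackfillRanges {
  constructor(chunks = []) {
    this.chunks = chunks; // persisted to disk
  }

  // Called in the same places we currently update toFetch: record the
  // timestamp range between the last sync and the oldest record the most
  // recent GET returned.
  noteGap(lastSync, oldestRecordFromMostRecentGet) {
    this.chunks.push({
      offset: undefined,
      older: oldestRecordFromMostRecentGet,
      newer: lastSync,
    });
  }

  // Advance the front chunk after one paged GET; drop it once the server
  // returns no continuation offset.
  advance(nextOffset) {
    if (!this.chunks.length) {
      return;
    }
    if (nextOffset) {
      this.chunks[0].offset = nextOffset;
    } else {
      this.chunks.shift();
    }
  }

  get done() {
    return this.chunks.length === 0;
  }
}
```

Because each chunk carries both `older` and `newer`, the backfill never re-requests records the recent fetch already applied (problem A), and gaps created while a backfill is still in progress get their own chunk instead of being lost (problem B).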

Thoughts?

Some of this I have in a patch that I wrote last night, when it seemed like less of a change. (It's actually not that large a change if you ignore testing, which I haven't touched yet. It would result in strictly less and simpler code if it weren't for the need to keep the id-backfilling code around to backfill previousFailed, although that can probably be refactored in a better way too.)
Priority: -- → P3
Comment 1

a year ago
Taking as background work.
Assignee: nobody → tchiovoloni
Updated

a year ago
Blocks: 1421464
Priority: P3 → P2
Comment 3

a year ago
Comment on attachment 8966353 [details]
Bug 1435007 - (WIP) Kill toFetch, replace with BackfillManager f?markh,kitcambridge

I haven't fixed up all the test changes (the testing for toFetch and previousFailed is fairly extensive), and honestly I'm unsure how to go about testing this. If you have suggestions on how to test it, I'd be interested in hearing them.

Either way, I'd mostly like feedback on the changes to engines.js, but also on how you think I should test this, or change it to be more testable.
Attachment #8966353 - Flags: feedback?(markh)
Attachment #8966353 - Flags: feedback?(kit)

Comment 4

a year ago
Comment on attachment 8966353 [details]
Bug 1435007 - (WIP) Kill toFetch, replace with BackfillManager f?markh,kitcambridge

https://reviewboard.mozilla.org/r/235044/#review240820

I only had a quick look, but I think this is a nice improvement and separates some complexity in a nice way, so f+

::: services/sync/modules/engines.js:878
(Diff revision 1)
>     */
>    async resetLocalSyncID() {
>      return this.ensureCurrentSyncID(Utils.makeGUID());
>    },
>  
> +  addToFetch(coll) {

I wonder if a better name for this might be "addToBackfill" or similar?

::: services/sync/modules/engines.js:1934
(Diff revision 1)
>    clear() {
>      this.changes = {};
>    }
>  }
> +
> +// This class is responsible for backfilling records for engines that require a

I'm not too bothered, but this might be better in its own module?
Attachment #8966353 - Flags: feedback?(markh) → feedback+
Duplicate of this bug: 1481030
Comment on attachment 8966353 [details]
Bug 1435007 - (WIP) Kill toFetch, replace with BackfillManager f?markh,kitcambridge

Permalink at https://hg.mozilla.org/mozreview/gecko/rev/244209a3fb15ea09f21a29b128b96fd5b02feb82.

Thanks, Thom! I'm really late to the party, but this looks like a great start, and it's worth thinking about backfill in our Rust adapter...even if we don't end up landing your patch in Desktop.

I'm wondering about a few things:

* Should we record permanently failed IDs somewhere? We persist them in `previousFailed`, but never see them again after two consecutive failures. OTOH, the only use I can think of besides retrying them periodically is adjusting `clientMissing` in telemetry, to account for records that we saw but couldn't apply.

* Do iOS and Android only store `offset` and `older`, and so backfill records that they already saw? Would it be simpler or more efficient to do that, and ignore those records after they're downloaded, instead of chunking to include IDs in the URL?

* Do we want X-I-U-S semantics for batching, too? It likely doesn't matter much for history, and we don't backfill for other engines...
Attachment #8966353 - Flags: feedback?(lina) → feedback+