Replace toFetch with a backfill mechanism similar to Android's (e.g. HighWaterMark)

Status: NEW, assigned
Type: enhancement
Priority: P2
Severity: normal
Opened: a year ago
Last modified: 8 months ago

People

(Reporter: tcsc, Assigned: tcsc)

Tracking

(Blocks 1 bug)

Firefox Tracking Flags

(Not tracked)

Attachments

(1 attachment)

Description

a year ago
On a first sync we download the most recent 5000 history records. Other records from the server need to be backfilled with some mechanism. What desktop does is as follows:

1. Download the entire set of IDs of the collection from the server, remove the ones we already have. For me, this is currently about 90k records on the server. Store this in the toFetch set, which is persisted to disk. 

2. Every time we sync, we request up to 5000 items from our toFetch set (as well as any items that failed in the previous sync). Each request includes up to 50 items (see note below), which means we make 100 requests each time. This incurs a large amount of overhead, and makes using a new device after logging into sync somewhat unpleasant for a while!

(Note: in theory it could be 100 in the case of history. According to the comments for `DEFAULT_GUID_FETCH_BATCH_SIZE`, the value of 50 was chosen to avoid URL length limits with 64-character ids, but we know history GUIDs are only 12 characters long. Even so, the server caps the `ids` query parameter at 100 items, which still means 50 requests per sync; better, but still bad.)
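To make the overhead concrete, here's a rough sketch of how the toFetch draining described above multiplies into requests. The names (`planBackfillRequests`, `GUID_FETCH_BATCH_SIZE`, `MAX_BACKFILL_PER_SYNC`) are illustrative, not the actual engines.js code:

```javascript
// Illustrative sketch only, not the real services/sync implementation.
const GUID_FETCH_BATCH_SIZE = 50;   // ids per request, to stay under URL length limits
const MAX_BACKFILL_PER_SYNC = 5000; // ids drained from toFetch each sync

function planBackfillRequests(toFetch) {
  // Take up to 5000 ids from the persisted set, then slice them into
  // 50-id GET requests: 5000 / 50 = 100 requests every sync.
  const batch = [...toFetch].slice(0, MAX_BACKFILL_PER_SYNC);
  const requests = [];
  for (let i = 0; i < batch.length; i += GUID_FETCH_BATCH_SIZE) {
    requests.push(batch.slice(i, i + GUID_FETCH_BATCH_SIZE));
  }
  return requests;
}
```

With a 90k-record toFetch set, that's 100 requests per sync for 18 syncs before the backfill completes.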

Aside from the performance issues (which are fairly substantial), this has the downside of making toFetch expensive to set up and update, which makes bugs like bug 1421464 more difficult (or impossible) to fix.

On the other hand, Android does something like the following:

1. First, there are two engines: recentHistory and history. The recentHistory engine only syncs the N newest records; the history engine is responsible for backfilling historical records (this is oversimplifying).

2. The backfilling history engine persists the `limit`, `older`, and `offset` parameters used to make requests, and steps through this a few times each sync. (I've checked with rfkelly, and the `offset` tokens do not expire, so this is safe.) Conceptually, this is like a getBatched call without an X-I-U-S check, where subsequent iterations through the loop might happen in separate syncs.
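The persisted-cursor idea in step 2 can be sketched as follows. This is a hypothetical shape, not Android's actual code; `fetchPage` stands in for whatever makes the paged collection GET, and the `{ records, nextOffset }` return shape is assumed:

```javascript
// Hypothetical sketch of a backfill cursor whose request parameters are
// persisted to disk, so a paged collection GET can resume across syncs.
// Safe because the server's offset tokens do not expire.
class BackfillCursor {
  constructor(state = { limit: 500, older: undefined, offset: undefined, done: false }) {
    this.state = state; // persisted between syncs
  }

  // fetchPage({limit, older, offset}) -> { records, nextOffset } (assumed interface).
  async step(fetchPage) {
    if (this.state.done) {
      return [];
    }
    const { records, nextOffset } = await fetchPage(this.state);
    if (nextOffset) {
      this.state.offset = nextOffset; // resume here next time, maybe next sync
    } else {
      this.state.done = true; // server had no more pages
    }
    return records;
  }
}
```

Each sync would call `step` a few times, persisting `state` after each call, exactly like one iteration of a getBatched loop spread across syncs.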

As I see it this has two problems.

A) You would like to store `newer` too, in order to avoid requesting items that were already fetched by the recentHistory engine.

For Android this isn't an issue, since the recent engine only fetches a small number (50, I believe) of records each sync, but for desktop that doesn't seem like the right choice.

B) If you do that, you risk missing items in the case where you haven't completed your backfill and more items than your recent fetch size were uploaded to the server in the meantime.

To solve this, I think desktop would just replace the existing toFetch backfill mechanism with one that stores an array of `{ offset, older, newer }` chunks. A new chunk would be added in the same cases where we currently update toFetch, but instead of fetching all the GUIDs since the last sync, we'd just add the range `{ newer: lastSync, older: oldestRecordFromMostRecentGet }`.
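A minimal sketch of that replacement mechanism, with illustrative names (`BackfillRanges`, `noteGap`, `advance` are not the patch's actual API):

```javascript
// Illustrative sketch of the proposed chunked backfill state: instead of a
// flat set of ids, keep an ordered list of { offset, older, newer } ranges.
class BackfillRanges {
  constructor(chunks = []) {
    this.chunks = chunks; // persisted to disk
  }

  // Called in the same places we currently update toFetch: record the
  // timestamp range between the last sync and the oldest record the most
  // recent GET returned.
  noteGap(lastSync, oldestRecordFromMostRecentGet) {
    this.chunks.push({
      offset: undefined,
      older: oldestRecordFromMostRecentGet,
      newer: lastSync,
    });
  }

  // Advance the front chunk after one paged GET; drop it once the server
  // returns no continuation offset.
  advance(nextOffset) {
    if (!this.chunks.length) {
      return;
    }
    if (nextOffset) {
      this.chunks[0].offset = nextOffset;
    } else {
      this.chunks.shift();
    }
  }

  get done() {
    return this.chunks.length === 0;
  }
}
```

Because each chunk carries both `older` and `newer`, the backfill never re-requests records the recent fetch already applied (problem A), and gaps created while a backfill is still in progress get their own chunk instead of being lost (problem B).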

Thoughts?

Some of this I have in a patch that I wrote last night, when it seemed like less of a change. (It's actually not that large a change if you ignore testing, which I haven't touched yet. It would result in strictly less and simpler code if it weren't for the need to keep the id-backfilling code around to backfill previousFailed, although that can probably be refactored in a better way too.)
Priority: -- → P3
Comment 1

a year ago
Taking as background work.
Assignee: nobody → tchiovoloni
Updated

a year ago
Blocks: 1421464
Priority: P3 → P2
Comment 3

a year ago
Comment on attachment 8966353 [details]
Bug 1435007 - (WIP) Kill toFetch, replace with BackfillManager f?markh,kitcambridge

I haven't fixed up all the test changes (the testing for toFetch and previousFailed is fairly extensive), and honestly I'm unsure how to go about testing this. If you have suggestions on how to test it, I'd be interested in hearing them.

Either way, I'd mostly like feedback on the changes to engines.js, but also on how you think I should test this, or change it to be more testable.
Attachment #8966353 - Flags: feedback?(markh)
Attachment #8966353 - Flags: feedback?(kit)

Comment 4

a year ago
Comment on attachment 8966353 [details]
Bug 1435007 - (WIP) Kill toFetch, replace with BackfillManager f?markh,kitcambridge

https://reviewboard.mozilla.org/r/235044/#review240820

I only had a quick look, but I think this is a nice improvement and separates some complexity in a nice way, so f+

::: services/sync/modules/engines.js:878
(Diff revision 1)
>     */
>    async resetLocalSyncID() {
>      return this.ensureCurrentSyncID(Utils.makeGUID());
>    },
>  
> +  addToFetch(coll) {

I wonder if a better name for this might be "addToBackfill" or similar?

::: services/sync/modules/engines.js:1934
(Diff revision 1)
>    clear() {
>      this.changes = {};
>    }
>  }
> +
> +// This class is responsible for backfilling records for engines that require a

I'm not too bothered, but this might be better in its own module?
Attachment #8966353 - Flags: feedback?(markh) → feedback+
Duplicate of this bug: 1481030
Comment on attachment 8966353 [details]
Bug 1435007 - (WIP) Kill toFetch, replace with BackfillManager f?markh,kitcambridge

Permalink at https://hg.mozilla.org/mozreview/gecko/rev/244209a3fb15ea09f21a29b128b96fd5b02feb82.

Thanks, Thom! I'm really late to the party, but this looks like a great start, and it's worth thinking about backfill in our Rust adapter...even if we don't end up landing your patch in Desktop.

I'm wondering about a few things:

* Should we record permanently failed IDs somewhere? We persist them in `previousFailed`, but never see them again after two consecutive failures. OTOH, the only use I can think of besides retrying them periodically is adjusting `clientMissing` in telemetry, to account for records that we saw but couldn't apply.

* Do iOS and Android only store `offset` and `older`, and so backfill records that they already saw? Would it be simpler or more efficient to do that, and ignore those records after they're downloaded, instead of chunking to include IDs in the URL?

* Do we want X-I-U-S semantics for batching, too? It likely doesn't matter much for history, and we don't backfill for other engines...
Attachment #8966353 - Flags: feedback?(lina) → feedback+