Closed Bug 1429250 Opened 5 years ago Closed 5 years ago

Optimize Set/Array operations in _processIncoming.

Categories

(Firefox :: Sync, enhancement, P2)

enhancement

Tracking

()

RESOLVED FIXED
Firefox 60
Tracking Status
firefox60 --- fixed

People

(Reporter: tcsc, Assigned: tcsc)

Details

Attachments

(1 file)

Surprisingly, this is one of the things we spend the longest amount of time doing during a first sync.

Specifically, toFetch may be really quite large for history during a first sync, and we traverse it multiple times in a loop during arraySub (example profile showing arraySub as the bulk of the time spent executing during a sync: https://perfht.ml/2FlVEgK)
Comment on attachment 8941261 [details]
Bug 1429250 - Avoid repeated traversals of toFetch in _processIncoming.

Requesting review from kit as suggested by eoger, it turned out to be a little less trivial than I initially suspected, and Kit refactored this code most recently.

Additional context: On large history syncs we spend a large amount time in arraySub (over a second for a first sync on my machine). This is mainly since we're using the wrong datastructure to store toFetch. It's a Set, we're using an Array, and we keep converting back and forth between it.

This patch us store toFetch in memory as a Set when we're using it, but persist it to the current representation at the end.

A downside with this approach includes the code being a bit uglier, as well as us not persisting toFetch until the end, which might cause us to delay shutdown slightly (via the JSONFile shutdown blocker) if one happens during a sync.

One alternative would be to migrate toFetch (and probably previouslyFailed since they share code and it would be annoying to make them different) store it's ids as `{[id]: true, ...}`, instead of as an array. This would be a larger change, would require a migration, but might be a little cleaner.

Alternatively, we could move toFetch to use IndexedDB, which can store Sets directly (this might be a decent first thing to tackle in bug 1426480?).
Attachment #8941261 - Flags: review?(kit)
Priority: -- → P2
Comment on attachment 8941261 [details]
Bug 1429250 - Avoid repeated traversals of toFetch in _processIncoming.

https://reviewboard.mozilla.org/r/211550/#review219454

One nit, and one concern about when we update `toFetch`. If it's unfounded, please re-flag me and I'll r+. I'd also be OK with changing the format to `{[id]: true, ...}`, as you suggest...though, if we decide to ax `toFetch` and `previousFailed`, and use IndexedDB in bug 1426480, it's probably not worth the migration.

::: services/sync/modules/engines.js:1186
(Diff revision 1)
> -    // Stage 3: Backfill records from the backlog, and those that failed to
> +      // Stage 3: Backfill records from the backlog, and those that failed to
> -    // decrypt or apply during the last sync. We only backfill up to the
> +      // decrypt or apply during the last sync. We only backfill up to the
> -    // download limit, to prevent a large backlog for one engine from blocking
> +      // download limit, to prevent a large backlog for one engine from blocking
> -    // the others. We'll keep processing the backlog on subsequent engine syncs.
> +      // the others. We'll keep processing the backlog on subsequent engine syncs.
> -    let failedInPreviousSync = this.previousFailed;
> +      let failedInPreviousSync = this.previousFailed;
> -    let idsToBackfill = Utils.arrayUnion(this.toFetch.slice(0, downloadLimit),
> +      let idsToBackfill = Utils.arrayUnion(Utils.takeUpTo(toFetch, downloadLimit),

While you're here, how about making `takeUpTo` return a set instead of an array, and changing this to `Array.from(Utils.setAddAll(Utils.takeUpTo(toFetch, downloadLimit), failedInPreviousSync))`? That would get rid of the intermediate arrays in `takeUpTo` and `arrayUnion`, and the set in `arraySub`.

::: services/sync/modules/engines.js:1239
(Diff revision 1)
> +        Utils.setDeleteAll(toFetch, ids);
>  
> -      this.toFetch = Utils.arraySub(this.toFetch, ids);
> -      this.previousFailed = Utils.arrayUnion(this.previousFailed, failedInBackfill);
> +        this.previousFailed = Utils.arrayUnion(this.previousFailed, failedInBackfill);
>  
> -      if (this.lastSync < this.lastModified) {
> +        if (this.lastSync < this.lastModified) {

I'm wondering if we should set `this.toFetch` at the same time that we fast-forward `lastSync`, here and above. (And at the same time that we update `previousFailed` here). Now that `_processIncoming` is async, I have a vague concern that quitting the browser while we're awaiting `get{Batched}`, `_maybeReconcile`, `_applyRecords` might shut down the event loop before our `finally` runs, without throwing or returning.

In theory, our jank yielder should notice this, and throw a shutdown exception...but I'm not sure if there's a case where we might miss that.
Attachment #8941261 - Flags: review?(kit)
Comment on attachment 8941261 [details]
Bug 1429250 - Avoid repeated traversals of toFetch in _processIncoming.

Clearing r? until next iteration.
Attachment #8941261 - Flags: review?(eoger)
This version manages to avoid needing migration code, and avoids issues around not updating toFetch/previousFailed soon enough. I don't think it has any real drawbacks other than some additional complexity[0], which I think is less than the versions that required migration, and any versions that introduced indexeddb code.

It does this by adding a `toJSON` method onto Sets returned by Utils.serializableSet, which just saves them as an array of their members. This is both backwards and forwards compatible with the current format (since it's 100% identical). The serializableSets are then used as the storage of the ids.

Here's [2] a profile of the first two syncs for a profile with a lot of history items, you'll that the amount of time spent in setAddAll/setDeleteAll is much smaller than what was spent in Utils.arraySub and friends (it's the same profile as before). It's hard to compare this directly with the previous profile, since this one was two syncs (whoops), but if anything you'd expect this one to be worse, as a result. JSONFile's `_save` method is longer now (since thats where the work is done), but it's still much less time than was spent in arraySub in the past.

[0]: I guess the big caveat is that the `toFetch` and `previousFailed` getters/setters no longer return/accept arrays. It throws if you mess this up so I think that's probably fine. I guess it could convert for you, if you'd prefer that LMK.

[1]: It could have also subclassed Set, but this seems less likely to hit weird issues (e.g. do I need to worry about Symbol.species? Is it going to have unexpected performance issues -- this is an optimization patch after all, etc... I also think subclassing builtins is gross, but I guess I think that about adding properties to objects you don't own too)

[2]: https://perfht.ml/2rFeqNf
Comment on attachment 8941261 [details]
Bug 1429250 - Avoid repeated traversals of toFetch in _processIncoming.

https://reviewboard.mozilla.org/r/211550/#review221052

Oh, I like this a lot more, it reads much better than converting back and forth between sets and arrays. I like your suggestion of subclassing `Set`; it feels nicer than decorating every instance. It should "just work", I don't think you need to worry about `Symbol.species`, `Symbol.toStringTag`, or anything else. (We have several places in the tree where we do `extends Map` and `extends Set`, and none of them do anything exotic with symbols). But, if it ends up being slower, let's keep what you have now, with a comment about why we don't subclass.

::: services/sync/modules/engines.js:930
(Diff revision 2)
>      this._previousFailedStorage.ensureDataReady();
>      return this._previousFailedStorage.data.ids;
>    },
> +
>    set previousFailed(ids) {
> +    if (ids.constructor.name != "Set" || !ids.toJSON) {

I think we're setting on `ChromeUtils.getClassName(ids) == "Set"` (or `Cu.getClassName`; they're equivalent) for type checks like this.

::: services/sync/modules/util.js:575
(Diff revision 2)
> +   *
> +   * @param items  Optional argument, passed to the Set constructor.
> +   */
> +  serializableSet(items) {
> +    const set = new Set(items);
> +    Object.defineProperty(set, "toJSON", {

Unless you feel strongly about it, or it's slower, I'd prefer to see `SerializableSet` subclass `Set`, instead of decorating every instance with a `toJSON` method. It looks like `ExtensionStorage` does something similar: https://searchfox.org/mozilla-central/rev/4611b9541894e90a421debb57ddbbcff55c2f369/toolkit/components/extensions/ExtensionStorage.jsm#32 See also https://searchfox.org/mozilla-central/rev/4611b9541894e90a421debb57ddbbcff55c2f369/toolkit/components/extensions/ExtensionUtils.jsm#253.

::: services/sync/tests/unit/xpcshell.ini:66
(Diff revision 2)
>  [test_enginemanager.js]
>  [test_syncengine.js]
>  [test_syncengine_sync.js]
>  # Bug 676978: test hangs on Android (see also testing/xpcshell/xpcshell.ini)
>  skip-if = os == "android"
> -run-sequentially = Frequent timeouts, bug 1395148
> +#run-sequentially = Frequent timeouts, bug 1395148

Did you mean to include these in this patch?
Attachment #8941261 - Flags: review?(kit)
Comment on attachment 8941261 [details]
Bug 1429250 - Avoid repeated traversals of toFetch in _processIncoming.

https://reviewboard.mozilla.org/r/211550/#review221104

Awesome, thanks! Please revert the changes to xpcshell.ini before landing. :-)
Attachment #8941261 - Flags: review?(kit) → review+
Comment on attachment 8941261 [details]
Bug 1429250 - Avoid repeated traversals of toFetch in _processIncoming.

https://reviewboard.mozilla.org/r/211550/#review221748

Fairly mechanical patch, thank you!
Attachment #8941261 - Flags: review?(eoger) → review+
Pushed by tchiovoloni@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/82bfe86abafc
Avoid repeated traversals of toFetch in _processIncoming. r=eoger,kitcambridge
https://hg.mozilla.org/mozilla-central/rev/82bfe86abafc
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 60
You need to log in before you can comment on or make changes to this bug.