Closed Bug 1621806 Opened 5 years ago Closed 5 years ago

Reduce frequency of client-side extension-storage syncs.

Categories

(Firefox :: Sync, enhancement, P1)

enhancement

Tracking

()

RESOLVED FIXED
Firefox 76
Tracking Status
relnote-firefox --- -
firefox75 --- fixed
firefox76 --- fixed

People

(Reporter: tcsc, Assigned: tcsc)

References

Details

(Keywords: dev-doc-needed)

Attachments

(1 file)

According to https://bugzilla.mozilla.org/show_bug.cgi?id=1620423#c3 "If the effort is relatively low, this should be the first item in the list", which we believe to be the case.

I sent an email out yesterday discussing what to do. It should be pretty easy. Email reproduced at the end of this comment.

If needed, I don't mind taking this, but I'd like someone else to make a call about whether we skip every $Nth sync or use a timer-based appraoch, and what the value of N and/or timer threshold should be (to start with -- presumably we could roll out a pref update if it was bad).

Also, do we want to record any telemetry about this? Or would server side be enough.

So, according to about:telemetry, I synced extension storage 80 times in the past day, 100% of these seems to have spent long enough to plausibly be doing some network activity.

Check your about:telemetry (archived ping data > ping type sync > raw payload > search for "extension-storage" -- almost all of my them taking > 300ms, which seems like it indicates hitting the network. note that if you restarted recently you may have to go backwards to find a ping that has more than a handful of syncs (most are over a day and have several dozen)).

Anyway, other engines avoid touching the network for the routine every-15ish-minutes syncs in other ways, The fact that we are doing real syncs of it this on every, err, sync is really just oversight I think (and the fact that it doesn't say much in telemetry at all).

Anyway, we can't do the clever optimizations the other engines do for extension-storage... but, we could... skip the sync sometimes (we seem to do it more than often enough...)

I think a place you can do this is around: https://searchfox.org/mozilla-central/source/services/sync/modules/stages/enginesync.js#176

If engine.name == "extension-storage" (or something, that string might not be lowercase), we should make a decision on whether or not to sync. I'm pretty sure that if you decide not to sync ext-storage, you just need to continue without, err, calling _syncEngine on it.

Anyway, you have several options here, the two easiest are:

  1. Add a global counter somewhere in memory is fine, (be sure it lives across syncs and you can access it here). Then, if we're going to sync extension-storage, first bump the counter and see if it's divisible by $N. If it is, skip the sync for extension storage.

$N == 5 would mean you'd skip 20% of syncs, $N == 2 50%, etc. If you switch the check to be "skip unless divisible by $N" you can reduce it further, but it's your call.

  1. Track how recent our last ext-storage sync was, and skip if it's too recent. (This is easy to describe, but "too recent" seems like it could be very hard to nail down..., so hard to say what's better -- IDC either way though)

Whatever you do, you should probably make it so that turning on a pref can make us go back to normal. Probably also want to be able to adjust the aggression of it (e.g. value of $N) from a pref as well.

Doing this seems very easy, and like the kind of thing that we could safely uplift if it was written carefully and had tests, and has the potential to reduce load on the server a ton.

One last thought: There's a why parameter in scope in that function -- I don't remember all the values it can take (i think "schedule" is the most common), but it's prudent not to do this if why == "user", since if they're going through the trouble of hammering the sync button, we should just let them do it.

Anyway, does that make sense?

This actually interacts a bit more with scheduling than I thought (forgot about resyncs), so I'm going to take this.

Assignee: nobody → tchiovoloni
Priority: -- → P1

I think I'm actually going to do it as a percentage probability, as that gives us more control (if we need to bring it below 50%, for example).

Thanks a lot for taking this on! Do you have a timeline when you will be able to land this change? In particular, will it be ready when the next Desktop release is cut?

Reduce frequency of client-side extension-storage syncs

I'm not entirely happy with this -- :lina indicated in slack that we don't actually need resyncs anymore, so maybe we should just take them out. In order to reduce the negative effect they'd otehrwise have here I bumped their threshold from 0 to the normal one, reduced their maximum to 1, and made us record that the reason for a sync is "resync" so that we have any insight into them in telemetry.

That said, another approach would be: Do two patches:

  1. Just do the extension-storage part of the patch, and turn the max-resync count to 0 in a pref
  2. Take resyncs out entirely

And then we'd only need to uplift the first patch.

Sven, this is about ready to land if someone gives us the go-ahead from ops on how likely it should be for us to skip a sync. Based on the numbers we've seen locally (the numbers I gave in the email above are actually very low -- it's likely on the order of > 100 times per day), Mark and I discussed in slack and said that we'd recommend 50% to start with (this is a pref and so if it needs changing we can do so easily).

That said, we don't want to land this with a magic number like that without someone from ops giving the goahead, so are you a person who can do that, and if not, can you needinfo someone who is?

Flags: needinfo?(sven)

We can land this with 0% and gradually roll it out using pref-rollout and Normandy if we don't want to use a magic number. Do we have to restart the browser for the pref change to take effect?

It should take effect immediately. If you'd rather it start at 0% that's fine too.

It's worth noting startup syncs (and user-triggered syncs) will happen regardless of the value, which should mean even if we're over-aggressive, the data will eventually make it to the server.

A downside of landing this with 0% would be that any bugs wouldn't be detected until we set it to nonzero. I'd prefer that we let it ride with some nonzero value on nightly to start with...

The percentage you are asking for is the percentage of syncs that are skipped, right? So setting this to 25% would skip one in four sync events?

I'd suggest to start with a value of 20%. This should give an observable reduction in traffic, without impacting the sync frequency significantly.

Due to an unexpected traffic increase, we've actually already hit our scaling limits already. We will mitigate this by splitting out part of the database to a separate server. We will still need the client side changes, but they will only hit the release in five or six weeks, I believe.

Flags: needinfo?(sven)

20% sounds good to me. Let's go with that and we can adjust with Normandy if necessary.

Pushed by tchiovoloni@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/f3feace9c2a9 Reduce frequency of client-side extension-storage syncs. r=markh
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 76

Comment on attachment 9132997 [details]
Bug 1621806 - Reduce frequency of client-side extension-storage syncs. r?lina,markh

Beta/Release Uplift Approval Request

  • User impact if declined: Ops will be unable to reduce frequency of extension storage (via the pref this adds), thus extension storage sync / kintowe continuing to be broken.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Low risk, limited to syncing of extension data.
  • String changes made/needed:
Attachment #9132997 - Flags: approval-mozilla-beta?

Comment on attachment 9132997 [details]
Bug 1621806 - Reduce frequency of client-side extension-storage syncs. r?lina,markh

tweak sync logic to reduce server-side pain, approved for 75.0b7

Attachment #9132997 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Release Note Request (optional, but appreciated)
[Why is this notable]: The kinto database (for add-on preference syncing) has been seeing high loads on CPU and RAM which can lead to an increase of 500 errors when users are syncing. Our team (sync) has been working closely with the Benson's team and the Ops team to mitigate the issue until the extension preference syncing is migrated over to use the sync15 backend. This bug is one of the mitigation steps which provides a way to reduce the frequency for extension preference syncing, and therefore reducing the load on the backend.
[Affects Firefox for Android]: No.
[Suggested wording]: Reduces the frequency of extension preference background syncing.
[Links (documentation, blog post, etc)]: This should be part of the developer release notes, preferably in the section specific to add-on developers.

relnote-firefox: --- → ?

:jcristau We expect the reduced sync frequency to be temporary. Webextensions sync is being moved to the same backend as the normal Firefox Sync, and once this work is complete the sync fequency will be once every 10 minutes again. The latest estimate I've heard is that this will be released with Firefox 79.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: