Reduce frequency of client-side extension-storage syncs.
Categories
(Firefox :: Sync, enhancement, P1)
Tracking
()
People
(Reporter: tcsc, Assigned: tcsc)
References
Details
(Keywords: dev-doc-needed)
Attachments
(1 file)
47 bytes,
text/x-phabricator-request
|
jcristau
:
approval-mozilla-beta+
|
Details | Review |
According to https://bugzilla.mozilla.org/show_bug.cgi?id=1620423#c3 "If the effort is relatively low, this should be the first item in the list", which we believe to be the case.
I sent an email out yesterday discussing what to do. It should be pretty easy. Email reproduced at the end of this comment.
If needed, I don't mind taking this, but I'd like someone else to make a call about whether we skip every $Nth sync or use a timer-based appraoch, and what the value of N and/or timer threshold should be (to start with -- presumably we could roll out a pref update if it was bad).
Also, do we want to record any telemetry about this? Or would server side be enough.
So, according to about:telemetry, I synced extension storage 80 times in the past day, 100% of these seems to have spent long enough to plausibly be doing some network activity.
Check your
about:telemetry
(archived ping data > ping type sync > raw payload > search for "extension-storage" -- almost all of my them taking > 300ms, which seems like it indicates hitting the network. note that if you restarted recently you may have to go backwards to find a ping that has more than a handful of syncs (most are over a day and have several dozen)).Anyway, other engines avoid touching the network for the routine every-15ish-minutes syncs in other ways, The fact that we are doing real syncs of it this on every, err, sync is really just oversight I think (and the fact that it doesn't say much in telemetry at all).
Anyway, we can't do the clever optimizations the other engines do for extension-storage... but, we could... skip the sync sometimes (we seem to do it more than often enough...)
I think a place you can do this is around: https://searchfox.org/mozilla-central/source/services/sync/modules/stages/enginesync.js#176
If
engine.name == "extension-storage"
(or something, that string might not be lowercase), we should make a decision on whether or not to sync. I'm pretty sure that if you decide not to sync ext-storage, you just need tocontinue
without, err, calling_syncEngine
on it.Anyway, you have several options here, the two easiest are:
- Add a global counter somewhere in memory is fine, (be sure it lives across syncs and you can access it here). Then, if we're going to sync extension-storage, first bump the counter and see if it's divisible by
$N
. If it is, skip the sync for extension storage.
$N == 5
would mean you'd skip 20% of syncs,$N == 2
50%, etc. If you switch the check to be "skip unless divisible by $N" you can reduce it further, but it's your call.
- Track how recent our last ext-storage sync was, and skip if it's too recent. (This is easy to describe, but "too recent" seems like it could be very hard to nail down..., so hard to say what's better -- IDC either way though)
Whatever you do, you should probably make it so that turning on a pref can make us go back to normal. Probably also want to be able to adjust the aggression of it (e.g. value of $N) from a pref as well.
Doing this seems very easy, and like the kind of thing that we could safely uplift if it was written carefully and had tests, and has the potential to reduce load on the server a ton.
One last thought: There's a
why
parameter in scope in that function -- I don't remember all the values it can take (i think "schedule" is the most common), but it's prudent not to do this ifwhy == "user"
, since if they're going through the trouble of hammering the sync button, we should just let them do it.Anyway, does that make sense?
Assignee | ||
Comment 1•5 years ago
|
||
This actually interacts a bit more with scheduling than I thought (forgot about resyncs), so I'm going to take this.
Assignee | ||
Comment 2•5 years ago
|
||
I think I'm actually going to do it as a percentage probability, as that gives us more control (if we need to bring it below 50%, for example).
Comment 3•5 years ago
|
||
Thanks a lot for taking this on! Do you have a timeline when you will be able to land this change? In particular, will it be ready when the next Desktop release is cut?
Assignee | ||
Comment 4•5 years ago
|
||
Reduce frequency of client-side extension-storage syncs
Assignee | ||
Comment 5•5 years ago
|
||
I'm not entirely happy with this -- :lina indicated in slack that we don't actually need resyncs anymore, so maybe we should just take them out. In order to reduce the negative effect they'd otehrwise have here I bumped their threshold from 0 to the normal one, reduced their maximum to 1, and made us record that the reason for a sync is "resync" so that we have any insight into them in telemetry.
That said, another approach would be: Do two patches:
- Just do the extension-storage part of the patch, and turn the max-resync count to 0 in a pref
- Take resyncs out entirely
And then we'd only need to uplift the first patch.
Assignee | ||
Comment 6•5 years ago
|
||
Sven, this is about ready to land if someone gives us the go-ahead from ops on how likely it should be for us to skip a sync. Based on the numbers we've seen locally (the numbers I gave in the email above are actually very low -- it's likely on the order of > 100 times per day), Mark and I discussed in slack and said that we'd recommend 50% to start with (this is a pref and so if it needs changing we can do so easily).
That said, we don't want to land this with a magic number like that without someone from ops giving the goahead, so are you a person who can do that, and if not, can you needinfo someone who is?
Comment 7•5 years ago
|
||
We can land this with 0% and gradually roll it out using pref-rollout and Normandy if we don't want to use a magic number. Do we have to restart the browser for the pref change to take effect?
Assignee | ||
Comment 8•5 years ago
|
||
It should take effect immediately. If you'd rather it start at 0% that's fine too.
It's worth noting startup syncs (and user-triggered syncs) will happen regardless of the value, which should mean even if we're over-aggressive, the data will eventually make it to the server.
Assignee | ||
Comment 9•5 years ago
|
||
A downside of landing this with 0% would be that any bugs wouldn't be detected until we set it to nonzero. I'd prefer that we let it ride with some nonzero value on nightly to start with...
Comment 10•5 years ago
|
||
The percentage you are asking for is the percentage of syncs that are skipped, right? So setting this to 25% would skip one in four sync events?
I'd suggest to start with a value of 20%. This should give an observable reduction in traffic, without impacting the sync frequency significantly.
Due to an unexpected traffic increase, we've actually already hit our scaling limits already. We will mitigate this by splitting out part of the database to a separate server. We will still need the client side changes, but they will only hit the release in five or six weeks, I believe.
Comment 11•5 years ago
|
||
20% sounds good to me. Let's go with that and we can adjust with Normandy if necessary.
Comment 12•5 years ago
|
||
Comment 13•5 years ago
|
||
bugherder |
Assignee | ||
Comment 14•5 years ago
|
||
Comment on attachment 9132997 [details]
Bug 1621806 - Reduce frequency of client-side extension-storage syncs. r?lina,markh
Beta/Release Uplift Approval Request
- User impact if declined: Ops will be unable to reduce frequency of extension storage (via the pref this adds), thus extension storage sync / kintowe continuing to be broken.
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Low risk, limited to syncing of extension data.
- String changes made/needed:
Comment 15•5 years ago
|
||
Comment on attachment 9132997 [details]
Bug 1621806 - Reduce frequency of client-side extension-storage syncs. r?lina,markh
tweak sync logic to reduce server-side pain, approved for 75.0b7
Comment 16•5 years ago
|
||
bugherder uplift |
Comment 17•5 years ago
|
||
Release Note Request (optional, but appreciated)
[Why is this notable]: The kinto database (for add-on preference syncing) has been seeing high loads on CPU and RAM which can lead to an increase of 500 errors when users are syncing. Our team (sync) has been working closely with the Benson's team and the Ops team to mitigate the issue until the extension preference syncing is migrated over to use the sync15 backend. This bug is one of the mitigation steps which provides a way to reduce the frequency for extension preference syncing, and therefore reducing the load on the backend.
[Affects Firefox for Android]: No.
[Suggested wording]: Reduces the frequency of extension preference background syncing.
[Links (documentation, blog post, etc)]: This should be part of the developer release notes, preferably in the section specific to add-on developers.
Comment 18•5 years ago
|
||
setting dev-doc-needed for comment 17 and https://developer.mozilla.org/en-US/docs/Mozilla/Firefox/Releases/75#Changes_for_add-on_developers
Comment 19•5 years ago
|
||
:jcristau We expect the reduced sync frequency to be temporary. Webextensions sync is being moved to the same backend as the normal Firefox Sync, and once this work is complete the sync fequency will be once every 10 minutes again. The latest estimate I've heard is that this will be released with Firefox 79.
Description
•