Closed Bug 1415012 Opened 7 years ago Closed 7 years ago

Sync should not trigger immediate resync if some engines failed during the sync.

Categories

(Firefox :: Sync, defect, P1)

defect

Tracking

()

RESOLVED FIXED
Firefox 58
Tracking Status
firefox57 --- wontfix
firefox58 --- fixed

People

(Reporter: tcsc, Assigned: tcsc)

Details

Attachments

(1 file)

This will cause us to sync 5x in a row after a failed sync that has a reason to keep failing.

I can't say for certain but I suspect this is contributing to the bad behavior in bug 1414880 (one user reported logs that look like this in IRC).

This is trivial to fix, also.
This seems quite bad, so taking. Patch incoming.
Assignee: nobody → tchiovoloni
Status: NEW → ASSIGNED
Priority: -- → P1
Comment on attachment 8925762 [details]
Bug 1415012 - Don't trigger a resync if the previous sync wasn't completely successful

https://reviewboard.mozilla.org/r/196930/#review202100

Ouch. Nice catch!
Attachment #8925762 - Flags: review?(kit) → review+
Pushed by tchiovoloni@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5e887aadb923
Don't trigger a resync if the previous sync wasn't completely successful r=kitcambridge
https://hg.mozilla.org/mozilla-central/rev/5e887aadb923
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 58
Comment on attachment 8925762 [details]
Bug 1415012 - Don't trigger a resync if the previous sync wasn't completely successful

Approval Request Comment
[Feature/Bug causing the regression]: Bug 1335891
[User impact if declined]: Extension data might not be synced, because the extension-storage sync servers would be under unnecessarily heavy load. It would probably increase the load for the other servers as well, but not as much (extension storage runs on a separate stack from everything else).

(Effectively, when the extension-storage sync would fail due to timeout or network issues, we'd respond by trying to sync it 5x in a row. It is likely that this contributed to bug 1414880).

[Is this code covered by automated tests?]: Yes.
[Has the fix been verified in Nightly?]: Yes.
[Needs manual test from QE? If yes, steps to reproduce]: No.
[List of other uplifts needed for the feature/fix]: N/A
[Is the change risky?]: No.
[Why is the change risky/not risky?]: It's one line and it fixes a clear (in retrospect) flaw.
[String changes made/needed]: N/A
Attachment #8925762 - Flags: approval-mozilla-beta?
We've shipped 55 and 56 with this bug, is the issue bad enough that we absolutely need to take this in a rc2?
(In reply to Julien Cristau [:jcristau] from comment #7)
> We've shipped 55 and 56 with this bug, is the issue bad enough that we
> absolutely need to take this in a rc2?
Flags: needinfo?(tchiovoloni)
I'd expect usage of webextension APIs like extension storage to be substantially higher in 57 (given that they're the only option), and this issue only occurs under high load, so IMO yes.

:glasserc might have more insight into how problematic this behavior is though, if we're confident we can keep everything above the water, then it isn't an issue.
Flags: needinfo?(tchiovoloni) → needinfo?(eglassercamp)
(In reply to Julien Cristau [:jcristau] from comment #7)
> We've shipped 55 and 56 with this bug, is the issue bad enough that we
> absolutely need to take this in a rc2?

The number of extensions using storage.sync has greatly increased as they port over to WebExtensions with Firefox 57. That increases the importance of this patch from 55 and 56.
Current thinking on our team is that the recent spike in activity is due to certain add-ons being converted to WebExtensions and those new versions being pushed to the entire user population, even on 56 (which had the same APIs). Because our load scales with the number of users of the extension, we expect our load to spike every time a new WebExtension conversion gets pushed to our user population. In other words, we might not expect more load at 57 release time. We are still expecting a couple of big-league conversions to happen, but nothing as big as we had earlier this week. Additionally, we still have some head room in our server-side capacity (we can spend more money to increase scale for a little while longer). So my feeling is that we can get away without this patch. On the other hand, having this patch helps us mitigate the risk that the above hypothesis is wrong, in exchange for the risk that comes from uplifting an additional patch. This is the first time we've been under real load, so our assumptions about the stack are still quite untested.

If we don't uplift this patch, the worst thing that could happen is that syncing storage.sync fails and causes cascading failure which adds more load to us, which could cause lots storage.sync to fail for other users. But storage.sync doesn't have to succeed 100% of the time to be useful to users, and the failure tends to be silent, so maybe it's OK. It tends to really bother the people who have the about:sync extension, which surfaces these failures as notifications.
Flags: needinfo?(eglassercamp)
For extra clarification, if there wasn't already going to be an RC2 release, I don't think this bug is serious enough to warrant one. My recommendation was under the assumption that we were already planning on doing one, in which case I do think we should take this.
I think given the bar for 57 patches at this date and based on comment 11 and comment 12, we should not uplift this to 57.
Panos, Chris, Julie also helped review this one (risk vs reward) and the final decision is to let this one ride the 58 train.
Attachment #8925762 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: