Closed Bug 1416077 Opened 7 years ago Closed 7 years ago

kinto's support for retry-after header may cause sync to be blocked for the entire duration.

Categories

(WebExtensions :: Storage, enhancement)

enhancement
Not set
normal

Tracking

(firefox58 wontfix, firefox59 fixed)

RESOLVED FIXED
mozilla59
Tracking Status
firefox58 --- wontfix
firefox59 --- fixed

People

(Reporter: markh, Assigned: glasserc)

Details

Attachments

(1 file)

If I'm reading the code correctly, a Retry-After header in a Kinto response will block *all* syncing (ie, all other engines) until the retry period expires.

http://searchfox.org/mozilla-central/rev/c99d035f00dd894feff38e4ad28a73fb679c63a6/services/common/kinto-http-client.js#2430

The code appears to wait until that period expires before resolving the promise. During this period, Sync itself believes the engine is still syncing, so will not be able to proceed to any other engines.

I *think* backoff headers are less of a problem - they will just cause the engine to fail during the period rather than waiting for it.

Benson, is the server ever going to send those headers, and if so, can it stop doing so while this bug remains open?
Flags: needinfo?(bwong)
I always thought that the kinto.js and sync operated independently of each other. 

NI :glasserc since he wrote the code.
Flags: needinfo?(bwong) → needinfo?(eglassercamp)
(In reply to Benson Wong [:mostlygeek] from comment #1)
> I always thought that the kinto.js and sync operated independently of each
> other.

Not really - it's complicated :)
 
> NI :glasserc since he wrote the code.

Ethan and I chatted in IRC and he believes my analysis is correct - but it's only relevant if the server actually sends that header - which is why I needinfo'd you. Unless you mean that Ethan wrote the relevant server code?
:leplatrem/:natim are better for answering questions regarding the server. 
AFAIK the server *does* send back off headers when things get too busy.
Flags: needinfo?(rhubscher)
Flags: needinfo?(mathieu)
Flags: needinfo?(eglassercamp)
It seems like there is a retry_after_seconds setting that we send in certain error handler cases. It looks like the design is to only send it in certain transient failure situations. Specifically, I believe it happens only when backend code throws a HTTPServiceUnavailable or IntegrityError exceptions. This may be helping us, actually, because the Sentry errors that we see all the time are integrity error exceptions caused by a strange concurrency bug that we haven't nailed down yet. The retry-after might be preventing that from becoming a sync failure.

The default value for retry_after_seconds appears to be 3. I'm not sure what it is in prod.
Looking at this a bit more in the clear light of day, it seems like the value of retry_after in prod is 30 (https://github.com/mozilla-services/cloudops-deployment/blob/master/projects/kintowe/puppet/yaml/type/kintowe.web.yaml#L36), and it seems like the retry isn't infinite -- it only happens once in the default configuration. Is a single 30-second delay that big a deal in exchange for protection from certain transient errors? It's not hard to change, even with the code that's in Gecko right now. I'll attach a patch.
Component: Sync → WebExtensions: Storage
Product: Firefox → Toolkit
Comment on attachment 8927399 [details]
Bug 1416077 - Disable retries on storage.sync requests

https://reviewboard.mozilla.org/r/198700/#review203996

30 seconds isn't too bad. From the client's POV, this patch makes things better, but I want to check it isn't going to make life more difficult for the server? ie, if this was only sent when the server really really doesn't want new requests, it might be better to live with this for now and come up with a solution that works better for the client *and* server in bug 1415222? So r+, but feel free to use your best judgment about what the right tradeoffs are.
Attachment #8927399 - Flags: review?(markh) → review+
No, this header is only sent when the server believes the error is transient (not driven by load or anything else). In other words, it's more of a "feel free to retry, perhaps in 30s", whereas the Backoff header is more of a "do not retry again in the next 30s". My question to you is, is a failing sync better than a successful one that blocks other engines for 30s? It sounds to me like you are saying "Yes, that is better", and if that's the case, then I can land this tomorrow morning my time.
(In reply to Ethan Glasser-Camp (:glasserc) from comment #8)
> My question to you is, is a failing sync
> better than a successful one that blocks other engines for 30s? It sounds to
> me like you are saying "Yes, that is better", and if that's the case, then I
> can land this tomorrow morning my time.

Yes, IMO that's better - thanks!
Flags: needinfo?(rhubscher)
Flags: needinfo?(mathieu)
Keywords: checkin-needed
Assignee: nobody → eglassercamp
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/5ec013ad4c58
Disable retries on storage.sync requests r=markh
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/5ec013ad4c58
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
Is manual testing required on this bug? If Yes, please provide some STR and the proper webextension(if required), if No set the “qe-verify-“ flag.
Flags: needinfo?(eglassercamp)
Flags: needinfo?(eglassercamp) → qe-verify-
Product: Toolkit → WebExtensions
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: