Closed Bug 1416077 Opened 7 years ago Closed 7 years ago

kinto's support for retry-after header may cause sync to be blocked for the entire duration.

Tracking

(firefox58 wontfix, firefox59 fixed)

Status:

RESOLVED FIXED

Milestone:

mozilla59

Tracking Flags:

Tracking

Status

firefox58

---

wontfix

firefox59

---

fixed

People

(Reporter: markh, Assigned: glasserc)

Details

Attachments

(1 file)

Bug 1416077 - Disable retries on storage.sync requests 7 years ago Ethan Glasser-Camp (:glasserc) 59 bytes, text/x-review-board-request	markh : review+	Details

Mark Hammond [:markh] [:mhammond]

Reporter

Description

•

7 years ago

If I'm reading the code correctly, a Retry-After header in a Kinto response will block *all* syncing (ie, all other engines) until the retry period expires.

http://searchfox.org/mozilla-central/rev/c99d035f00dd894feff38e4ad28a73fb679c63a6/services/common/kinto-http-client.js#2430

The code appears to wait until that period expires before resolving the promise. During this period, Sync itself believes the engine is still syncing, so will not be able to proceed to any other engines.

I *think* backoff headers are less of a problem - they will just cause the engine to fail during the period rather than waiting for it.

Benson, is the server ever going to send those headers, and if so, can it stop doing so while this bug remains open?

Flags: needinfo?(bwong)

Benson Wong [:mostlygeek]

Comment 1

•

7 years ago

I always thought that the kinto.js and sync operated independently of each other. 

NI :glasserc since he wrote the code.

Flags: needinfo?(bwong) → needinfo?(eglassercamp)

Mark Hammond [:markh] [:mhammond]

Reporter

Comment 2

•

7 years ago

(In reply to Benson Wong [:mostlygeek] from comment #1)
> I always thought that the kinto.js and sync operated independently of each
> other.

Not really - it's complicated :)
 
> NI :glasserc since he wrote the code.

Ethan and I chatted in IRC and he believes my analysis is correct - but it's only relevant if the server actually sends that header - which is why I needinfo'd you. Unless you mean that Ethan wrote the relevant server code?

Benson Wong [:mostlygeek]

Comment 3

•

7 years ago

:leplatrem/:natim are better for answering questions regarding the server. 
AFAIK the server *does* send back off headers when things get too busy.

Flags: needinfo?(rhubscher)

Flags: needinfo?(mathieu)

Flags: needinfo?(eglassercamp)

Ethan Glasser-Camp (:glasserc)

Assignee

Comment 4

•

7 years ago

It seems like there is a retry_after_seconds setting that we send in certain error handler cases. It looks like the design is to only send it in certain transient failure situations. Specifically, I believe it happens only when backend code throws a HTTPServiceUnavailable or IntegrityError exceptions. This may be helping us, actually, because the Sentry errors that we see all the time are integrity error exceptions caused by a strange concurrency bug that we haven't nailed down yet. The retry-after might be preventing that from becoming a sync failure.

The default value for retry_after_seconds appears to be 3. I'm not sure what it is in prod.

Ethan Glasser-Camp (:glasserc)

Assignee

Comment 5

•

7 years ago

Looking at this a bit more in the clear light of day, it seems like the value of retry_after in prod is 30 (https://github.com/mozilla-services/cloudops-deployment/blob/master/projects/kintowe/puppet/yaml/type/kintowe.web.yaml#L36), and it seems like the retry isn't infinite -- it only happens once in the default configuration. Is a single 30-second delay that big a deal in exchange for protection from certain transient errors? It's not hard to change, even with the code that's in Gecko right now. I'll attach a patch.

Comment hidden (mozreview-request)

Review commit: https://reviewboard.mozilla.org/r/198700/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/198700/

Mark Hammond [:markh] [:mhammond]

Reporter

Updated

•

7 years ago

Component: Sync → WebExtensions: Storage

Product: Firefox → Toolkit

Mark Hammond [:markh] [:mhammond]

Reporter

Comment 7

•

7 years ago

mozreview-review

Comment on attachment 8927399 [details]
Bug 1416077 - Disable retries on storage.sync requests

https://reviewboard.mozilla.org/r/198700/#review203996

30 seconds isn't too bad. From the client's POV, this patch makes things better, but I want to check it isn't going to make life more difficult for the server? ie, if this was only sent when the server really really doesn't want new requests, it might be better to live with this for now and come up with a solution that works better for the client *and* server in bug 1415222? So r+, but feel free to use your best judgment about what the right tradeoffs are.

Attachment #8927399 - Flags: review?(markh) → review+

Ethan Glasser-Camp (:glasserc)

Assignee

Comment 8

•

7 years ago

No, this header is only sent when the server believes the error is transient (not driven by load or anything else). In other words, it's more of a "feel free to retry, perhaps in 30s", whereas the Backoff header is more of a "do not retry again in the next 30s". My question to you is, is a failing sync better than a successful one that blocks other engines for 30s? It sounds to me like you are saying "Yes, that is better", and if that's the case, then I can land this tomorrow morning my time.

Mark Hammond [:markh] [:mhammond]

Reporter

Comment 9

•

7 years ago

(In reply to Ethan Glasser-Camp (:glasserc) from comment #8)
> My question to you is, is a failing sync
> better than a successful one that blocks other engines for 30s? It sounds to
> me like you are saying "Yes, that is better", and if that's the case, then I
> can land this tomorrow morning my time.

Yes, IMO that's better - thanks!

Flags: needinfo?(rhubscher)

Flags: needinfo?(mathieu)

Ethan Glasser-Camp (:glasserc)

Assignee

Updated

•

7 years ago

Keywords: checkin-needed

Ryan VanderMeulen [:RyanVM]

Updated

•

7 years ago

Assignee: nobody → eglassercamp

Pulsebot

Comment 10

•

7 years ago

Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/5ec013ad4c58
Disable retries on storage.sync requests r=markh

Keywords: checkin-needed

Cristina Coroiu [:ccoroiu]

Comment 11

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/5ec013ad4c58

Status: NEW → RESOLVED

Closed: 7 years ago

status-firefox59: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla59

Gerry Chang [:gchang]

Updated

•

7 years ago

status-firefox58: affected → wontfix

marius.santa

Comment 12

•

6 years ago

Is manual testing required on this bug? If Yes, please provide some STR and the proper webextension(if required), if No set the “qe-verify-“ flag.

Flags: needinfo?(eglassercamp)

Ethan Glasser-Camp (:glasserc)

Assignee

Updated

•

6 years ago

Flags: needinfo?(eglassercamp) → qe-verify-

BMO Automation

Updated

•

6 years ago

Product: Toolkit → WebExtensions

You need to log in before you can comment on or make changes to this bug.