Closed
Bug 877050
Opened 12 years ago
Closed 11 years ago
Cleanly drop a percentage of sync traffic with a 503, rather than allowing weird behavior under load
Categories
(Cloud Services :: Operations: Miscellaneous, task, P2)
Tracking
(Not tracked)
VERIFIED
FIXED
People
(Reporter: rfkelly, Assigned: rfkelly)
Details
(Whiteboard: [qa+])
Users are currently seeing error bars and weird behavior such as 400 errors, JavaScript tracebacks, etc. See e.g. Bug 685941 and Bug 749315. As best we understand it, these are caused by timeouts and other load-induced strangeness truncating request/response bodies.
Richard suggested on IRC that it may be better to purposely throttle our traffic with 503s, rather than allow the problems to show up as user-visible error bars. For example, randomly fail 50% of requests to /info/collections with a 503 and a Retry-After of 2 hours.
Advantages:
* clients are better behaved with a clean 503, and produce less error-bar noise
* failing things out at the initial handshake will produce less DB load than allowing them to fail in the middle of a sync
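A rough sketch of the idea, in case it helps the discussion. This is not the deployed implementation: the WSGI middleware shape, the class name LoadShedMiddleware, the constant names and the injectable rng parameter are all assumptions made for illustration. Only the 50% fraction, the /info/collections path and the 2 hour Retry-After come from the proposal above.

```python
import random

# Values taken from the proposal above; the names themselves are hypothetical.
SHED_PATH_SUFFIX = "/info/collections"   # the initial handshake request
SHED_FRACTION = 0.5                      # "randomly fail 50% of requests"
RETRY_AFTER_SECONDS = 2 * 60 * 60        # "Retry-After 2 hours", in seconds


class LoadShedMiddleware:
    """Hypothetical WSGI middleware: answer a fraction of handshakes with a 503."""

    def __init__(self, app, fraction=SHED_FRACTION, rng=random.random):
        self.app = app
        self.fraction = fraction
        # rng is injectable so the behaviour can be checked deterministically.
        self.rng = rng

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        # Only shed at the handshake, before any expensive DB work has started.
        if path.endswith(SHED_PATH_SUFFIX) and self.rng() < self.fraction:
            body = b"service temporarily unavailable\n"
            start_response("503 Service Unavailable", [
                ("Retry-After", str(RETRY_AFTER_SECONDS)),
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Everything else passes through to the wrapped application untouched.
        return self.app(environ, start_response)
```

Shedding only on the handshake path means a client either gets a clean 503 up front or proceeds with a full sync, rather than failing partway through.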
Thoughts?
Updated•12 years ago
Whiteboard: [qa+]
Assignee
Comment 3•11 years ago
Fixed for Sync1.5 in Bug 975306.
Bob do you want any action on this bug for sync1.1 or should we just close it out?
Assignee: nobody → rfkelly
Status: NEW → ASSIGNED
Flags: needinfo?(bobm)
Comment 4•11 years ago
(In reply to Ryan Kelly [:rfkelly] from comment #3)
> Fixed for Sync1.5 in Bug 975306.
>
> Bob do you want any action on this bug for sync1.1 or should we just close
> it out?
No. We can put nodes into back-off at the various points of redirection in our Sync 1.1 production environment, which should be fine.
Flags: needinfo?(bobm)
Assignee
Comment 5•11 years ago
Great, closing it out then.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED