Closed
Bug 1232776
Opened 9 years ago
Closed 9 years ago
Treeherder's own resultset ingestion is causing 17 million HTTP 429 responses/week
Categories
(Tree Management :: Treeherder: API, defect, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: emorley, Assigned: emorley)
Attachments
(1 file)
Treeherder prod has served 16.5 million HTTP 429 responses in the last 7 days, compared to 8 million HTTP 200 responses:
https://insights.newrelic.com/accounts/677903/explorer?eventType=Transaction&timerange=week&filters=%255B%257B%2522key%2522%253A%2522appName%2522%252C%2522value%2522%253A%2522treeherder-prod%2522%257D%255D&facet=response.status
These are coming from Treeherder's resultset ingestion, e.g.:
https://rpm.newrelic.com/accounts/677903/applications/4180461/filterable_errors#/show/4e9295-d92ce55a-a36b-11e5-bb1e-b82a72d22a14/stack_trace?top_facet=transactionUiName&bottom_facet=host&primary_facet=error.class&_k=0e2tld
There are a few issues:
1) treeherder-client doesn't honour HTTP 429s, so once we hit the rate limit it keeps sending requests, which just makes things worse
2) The "fetch missing" pushlog task then makes things even worse: if the main pushlog task hits a 429, we schedule many "fetch missing" tasks, increasing load further
3) We still see lots of junk revisions due to bad data (a la bug 1090289)
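To illustrate issue 1, here is a minimal sketch of what honouring a 429 could look like on the client side. It is not treeherder-client's actual code: `send_honouring_429`, `retry_delay` and `Throttled` are hypothetical names, and `send` stands in for whatever callable performs one HTTP request and returns `(status, headers, body)`. It assumes the server may send a `Retry-After` header in delta-seconds form and falls back to capped exponential backoff otherwise.

```python
import time


class Throttled(Exception):
    """Raised when the server still answers HTTP 429 after all attempts."""


def retry_delay(headers, attempt, base=1.0, cap=60.0):
    """Return the number of seconds to wait before the next attempt.

    Prefers the server's Retry-After value (delta-seconds form only);
    otherwise uses exponential backoff. Both are capped at `cap`.
    """
    # Header names are case-insensitive, so normalise before looking up.
    lowered = {key.lower(): value for key, value in headers.items()}
    value = lowered.get("retry-after", "").strip()
    if value.isdigit():
        return min(float(value), cap)
    return min(base * (2 ** attempt), cap)


def send_honouring_429(send, max_attempts=5, sleep=time.sleep):
    """Call `send()` until it returns something other than HTTP 429.

    `send` is a hypothetical zero-argument callable returning
    (status, headers, body). `sleep` is injectable so the logic can be
    exercised without real waiting.
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        # Back off instead of retrying immediately; no point sleeping
        # after the final attempt.
        if attempt < max_attempts - 1:
            sleep(retry_delay(headers, attempt))
    raise Throttled("still rate limited after %d attempts" % max_attempts)
```

For issue 2, a caller could catch `Throttled` and treat it as "try again later" rather than as a sign that pushes are missing, so a rate-limited fetch would not fan out into extra "fetch missing" tasks.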
Comment 1 (Assignee) • 9 years ago
Meant to say:
* These are what are filling up the logs/disk in bug 1229020
* Not sure why this suddenly started a few weeks ago. Have we broken something, or were we just really close to the rate limit?
Comment 2 • 9 years ago
This temporarily increases the rate limit on /resultset/ from 220 to 400, until we can address the root causes.
Attachment #8698659 -
Flags: review?(emorley)
Updated (Assignee) • 9 years ago
Attachment #8698659 -
Flags: review?(emorley) → review+
Comment 3 • 9 years ago
Commit pushed to master at https://github.com/mozilla/treeherder
https://github.com/mozilla/treeherder/commit/d994fd8194dbacb66dd3258e030f05fe94ebf0af
Bug 1232776 - Temporarily increase rate limit on /resultset/ endpoint
For probably not-so-great reasons, we seem to be hitting the limits of
what we can submit in production. Until we've addressed the root causes,
let's temporarily increase things, since the rate limiting is causing
more problems than it's solving.
Comment 4 (Assignee) • 9 years ago
The dependent bugs have resolved the issue; there's more to do in bug 1191934 et al, but this bug can be closed for now.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED