Over the last 7 days, treeherder prod has returned 16.5 million HTTP 429 responses compared to 8 million HTTP 200s:
https://insights.newrelic.com/accounts/677903/explorer?eventType=Transaction&timerange=week&filters=%255B%257B%2522key%2522%253A%2522appName%2522%252C%2522value%2522%253A%2522treeherder-prod%2522%257D%255D&facet=response.status

These are coming from Treeherder's resultset ingestion, e.g.:
https://rpm.newrelic.com/accounts/677903/applications/4180461/filterable_errors#/show/4e9295-d92ce55a-a36b-11e5-bb1e-b82a72d22a14/stack_trace?top_facet=transactionUiName&bottom_facet=host&primary_facet=error.class&_k=0e2tld

There are a few issues:
1) treeherder-client doesn't honour HTTP 429s, so once we hit the rate limit, it just makes things worse.
2) The "fetch missing pushlog" task then makes things even worse: if the main pushlog task hits a 429, we schedule many "fetch missing" tasks, increasing load further.
3) We still see lots of junk revisions due to bad data (a la bug 1090289).
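For issue 1, here is a rough sketch of what "honouring" a 429 could look like on the client side. This is not treeherder-client's actual code: `retry_delay`, `post_with_backoff` and the `send` callable are hypothetical names, and `send` stands in for whatever performs the real HTTP POST and returns a `(status, headers)` pair. The idea is to wait for the server's `Retry-After` value when it is given in seconds, and otherwise fall back to capped exponential backoff, rather than retrying immediately.

```python
import time
from typing import Callable, Mapping, Tuple

# Hypothetical response shape: (HTTP status code, response headers).
Response = Tuple[int, Mapping[str, str]]


def retry_delay(headers: Mapping[str, str], attempt: int,
                base: float = 1.0, cap: float = 60.0) -> float:
    """Return how many seconds to wait before the next attempt.

    Uses Retry-After when it is a number of seconds; otherwise (header
    missing, or given as an HTTP date) falls back to exponential backoff.
    The result is always capped at `cap`.
    """
    value = headers.get("Retry-After", "")
    try:
        return max(0.0, min(float(value), cap))
    except ValueError:
        return min(base * (2 ** attempt), cap)


def post_with_backoff(send: Callable[[], Response],
                      max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Call `send` until it stops returning 429 or attempts run out.

    Returns the final status code. `sleep` is injectable so the logic
    can be exercised without actually waiting.
    """
    for attempt in range(max_attempts):
        status, headers = send()
        if status != 429:
            return status
        # Only wait if there is another attempt to come.
        if attempt < max_attempts - 1:
            sleep(retry_delay(headers, attempt))
    return 429
```

Giving up after a bounded number of attempts, instead of retrying forever, matters here because unbounded retries from many submitters are what keep the service pinned at the limit.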
Meant to say:
* These are what are filling up the logs/disk in bug 1229020.
* Not sure why this suddenly started a few weeks ago. Have we broken something, or were we just really close to the rate limit?
Created attachment 8698659
PR

This temporarily increases the rate limit on /resultset/ from 220 to 400, until we can address the root causes.
Attachment #8698659 - Flags: review?(emorley)
Attachment #8698659 - Flags: review?(emorley) → review+
Commit pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/d994fd8194dbacb66dd3258e030f05fe94ebf0af
Bug 1232776 - Temporarily increase rate limit on /resultset/ endpoint

For probably not-so-great reasons, we seem to be hitting the limits of what we can submit in production. Until we've addressed the root causes, let's temporarily increase things, since the rate limiting is causing more problems than it's solving.
The dependent bugs have resolved the issue; there's more to do in bug 1191934 et al, but this bug can be closed for now.
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED