Closed Bug 1230179 Opened 8 years ago Closed 8 years ago

Prod returned 85,000 301 redirects in the last 7 days

Categories

(Tree Management :: Treeherder: Infrastructure, defect, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: emorley)

According to New Relic Insights, prod returned 85K 301 redirects in the last 7 days:
https://insights.newrelic.com/accounts/677903/explorer?eventType=Transaction&timerange=week&filters=%255B%257B%2522key%2522%253A%2522response%252estatus%2522%252C%2522value%2522%253A%2522301%2522%257D%255D&facet=appName

Unfortunately, unlike the APM section, Insights doesn't let us see the URL or user agent, so we have to resort to the logs instead. I've filed a ticket with New Relic asking them to add support:
https://support.newrelic.com/tickets/172627/edit

Taking just the gunicorn logs from th-prod-web1, it looks like there are several sources:

   4018 GET /api/project/mozilla-central/performance/data?interval=172800&signatures=REMOVED HTTP/1.1" 301 - "-" "libcurl/7.19.7 r-curl/0.8 httr/1.0.0.9000"
   3865 GET /api/project/mozilla-central/performance/data?interval=518400&signatures=REMOVED HTTP/1.1" 301 - "-" "libcurl/7.19.7 r-curl/0.8 httr/1.0.0.9000"
   2079 GET /api/project/mozilla-central/performance/data?interval=1036800&signatures=REMOVED HTTP/1.1" 301 - "-" "libcurl/7.19.7 r-curl/0.8 httr/1.0.0.9000"
   2039 GET /api/project/mozilla-central/performance/data?interval=259200&signatures=REMOVED HTTP/1.1" 301 - "-" "libcurl/7.19.7 r-curl/0.8 httr/1.0.0.9000"
   2022 GET /api/project/mozilla-central/performance/data?interval=1296000&signatures=REMOVED HTTP/1.1" 301 - "-" "libcurl/7.19.7 r-curl/0.8 httr/1.0.0.9000"
   1905 GET /api/project/try/jobs/NNNNNNN HTTP/1.1" 301 - "-" "Python-urllib/2.7"
   1407 GET /api/project/bmo-master/resultset HTTP/1.1" 301 - "-" "TreeBot/0.1"
    375 GET /api/project/mozilla-inbound/jobs/NNNNNNN HTTP/1.1" 301 - "-" "Python-urllib/2.7"
    344 GET /api/project/mozilla-aurora/resultset?revision=REMOVED HTTP/1.1" 301 - "-" "python-requests/2.4.3 CPython/2.7.3 Linux/2.6.32-504.3.3.el6.x86_64"
    276 GET /api/project/fx-team/jobs/NNNNNNN HTTP/1.1" 301 - "-" "Python-urllib/2.7"
    221 GET /api/project/mozilla-central/resultset?revision=REMOVED HTTP/1.1" 301 - "-" "python-requests/2.4.3 CPython/2.7.3 Linux/2.6.32-504.3.3.el6.x86_64"
     46 GET /api/project/mozilla-central/resultset/NNNN HTTP/1.1" 301 - "-" "libcurl/7.19.7 r-curl/0.8 httr/1.0.0.9000"
     45 GET /api/project/bmo-master/resultset/NNN/status HTTP/1.1" 301 - "-" "TreeBot/0.1"
     23 GET /api/project/b2g-inbound/jobs/NNNNNNN HTTP/1.1" 301 - "-" "Python-urllib/2.7"
     22 GET /api/project/mozilla-central/jobs/NNNNNNN HTTP/1.1" 301 - "-" "Python-urllib/2.7"
      8 GET /api/project/mozilla-aurora/jobs/NNNNNNN HTTP/1.1" 301 - "-" "Python-urllib/2.7"
      3 GET /api/project/mozilla-central/resultset/NNNN HTTP/1.1" 301 - "-" "libcurl/7.43.0 r-curl/0.9.4 httr/1.0.0"
      2 GET /api/project/mozilla-beta/jobs/NNNNNNN HTTP/1.1" 301 - "-" "Python-urllib/2.7"
      1 GET /api/project/mozilla-central/resultset/NNNN HTTP/1.1" 301 - "-" "libcurl/7.19.7 r-curl/0.9.3 httr/1.0.0"
      1 GET /api/project/mozilla-central/performance/signatures?interval=518400 HTTP/1.1" 301 - "-" "libcurl/7.19.7 r-curl/0.8 httr/1.0.0.9000"
      1 GET /api/project/ash/resultset?revision=REMOVED HTTP/1.1" 301 - "-" "python-requests/2.4.3 CPython/2.7.3 Linux/2.6.32-504.3.3.el6.x86_64"
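
A tally like the one above can be produced in several ways; the following is only a minimal sketch, not the command that was actually used here. It assumes the gunicorn access log uses the combined-style layout seen in the log lines quoted in this bug, and the names `LOG_LINE`, `normalise_path` and `count_redirects` are hypothetical. IDs are collapsed to `N` and signature/revision values to `REMOVED` so that similar requests group together.

```python
import re
from collections import Counter

# Pulls the request line, status code and user agent out of an access log line
# (assumed layout: "METHOD path HTTP/x.y" status size "referer" "user agent").
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def normalise_path(path):
    """Collapse numeric IDs and signature/revision values so similar requests group."""
    path = re.sub(r"/\d+(?=/|\?|$)", "/N", path)
    path = re.sub(r"(signatures|revision)=[^&]+", r"\1=REMOVED", path)
    return path


def count_redirects(lines):
    """Count 301 responses, keyed by (method, normalised path, user agent)."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("status") == "301":
            key = (
                match.group("method"),
                normalise_path(match.group("path")),
                match.group("agent"),
            )
            counts[key] += 1
    return counts
```

Passing an open log file to `count_redirects` and iterating over `counts.most_common()` gives the entries in descending order of frequency.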

Will, are the libcurl entries from a tool you have locally?
Flags: needinfo?(wlachance)
Not from me. Saptarshi, are you using curl to fetch the data? If so, you could make your script marginally more efficient (and reduce load for treeherder) by changing:

/api/project/mozilla-central/performance/data?params

to:

/api/project/mozilla-central/performance/data/?params

(same goes for any other endpoints you're using)
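
For scripts that build these URLs by hand, one way to avoid the redirect is to normalise the path before sending the request. This is a minimal sketch using only the Python standard library; the helper name `with_trailing_slash` is hypothetical and is not part of TreeherderClient.

```python
from urllib.parse import urlsplit, urlunsplit


def with_trailing_slash(url):
    """Return url with a slash appended to its path (before any query string) if missing."""
    parts = urlsplit(url)
    path = parts.path if parts.path.endswith("/") else parts.path + "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(with_trailing_slash(
        "https://treeherder.mozilla.org/api/project/mozilla-central/performance/data?interval=172800"
    ))
```

The query string is left untouched; only the path component gains the slash, and URLs that already end in one are returned unchanged.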
Flags: needinfo?(wlachance) → needinfo?(sguha)
Yes, I have a daily cron job which extracts the raw data and stores it.
Sorry for the mistake.
Flags: needinfo?(sguha)
Ok, if you could fix it as suggested that would be good. No big deal. :)
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WORKSFORME
Done.
Leaving open for the other instances.

I've fixed one in:
https://github.com/globau/treebot/pull/1
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Some of the other instances may be:
https://github.com/mozilla/funsize/blob/66717d862f8b7e429c172f1719530a697a72a5b5/funsize/utils.py#L35
and
https://github.com/mozilla/autophone/blob/44f7029f481dc9f38eb8aa70c6019a35b902ad5b/autophonepulsemonitor.py#L432

After fixing these, I think it may be worth compiling a list of use cases where people are not using TreeherderClient, adding support for them to it, and then persuading people to use the client to avoid issues like this (and others, such as not setting a user agent). If we move to a model where people need credentials to avoid a GET rate limit (those credentials would likely be self-serve, with no admin approval needed), it will be much easier for them if they are using the client.
Depends on: 1230383
Depends on: 1230402
I think there's one more. Could you add the trailing slash to this too?

127.0.0.1 - - [16/Dec/2015:13:05:15 -0800] "GET /api/project/mozilla-central/performance/signatures?interval=691200 HTTP/1.1" 301 - "-" "SaptarshiGuhaTalos/1.0"

Thanks :-)
Flags: needinfo?(sguha)
Before the query string (all paths end with a trailing slash), so like:
https://treeherder.mozilla.org/api/project/{{branch}}/performance/signatures/?interval={{interval}}
Fixed. Thanks
Depends on: 1234233
I'm now seeing:

127.0.0.1 - - [20/Dec/2015:19:12:54 -0800] "GET /api/project/mozilla-central/resultset/5279 HTTP/1.1" 301 - "-" "SaptarshiGuhaTalos/1.0"

I think the best bet is just to disable APPEND_SLASH so that these requests 404, which makes the issue more obvious; I've filed bug 1234233 for that :-)
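
For reference, a minimal sketch of what disabling this looks like in a Django settings module. `APPEND_SLASH` is a standard Django setting that defaults to `True`; the file name shown is an assumption, and the actual change is tracked in bug 1234233.

```python
# settings.py (hypothetical location; the real Treeherder settings module may differ)

# With CommonMiddleware enabled and APPEND_SLASH left at its default of True,
# Django answers a slash-less URL that matches no URL pattern with a redirect
# to the same URL with a slash appended (when that URL does match a pattern).
# With False, such requests fall through to a 404 instead.
APPEND_SLASH = False
```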
Oh! So this should be GET /api/project/mozilla-central/resultset/5279/ (with a trailing slash)?
Okay, fixed
Yup, thank you :-)
Depends on: 1236894
Commit pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/39d572d952a5ad435548fcc0dfadd28d16a15e97
Bug 1230179 - Docs: Fix the links to Swagger so they don't 301 redirect

Avoids this redirect seen in prod gunicorn logs:

[05/Jan/2016:05:55:42 -0800] "GET /docs HTTP/1.1" 301 -
"http://treeherder.readthedocs.org/retrieving_data.html"
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:46.0) Gecko/20100101 Firefox/46.0"
There are now only 10 such redirects on prod in the last 7 days, rather than the 85,000 previously:
https://insights.newrelic.com/accounts/677903/explorer?eventType=Transaction&timerange=week&filters=%255B%257B%2522key%2522%253A%2522response%252estatus%2522%252C%2522value%2522%253A%2522301%2522%257D%252C%257B%2522key%2522%253A%2522appName%2522%252C%2522value%2522%253A%2522treeherder-prod%2522%257D%255D&facet=appName

And bug 1234233 will prevent us from regressing in the future.
Status: REOPENED → RESOLVED
Closed: 8 years ago → 8 years ago
Resolution: --- → FIXED