Closed Bug 1230222 Opened 9 years ago Closed 8 years ago

[Meta] Encourage tools that interact with our API to set informative user agents

Categories

(Tree Management :: Treeherder: API, defect, P3)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: emorley)

References

Details

(Keywords: meta)

There are times when we're looking at New Relic or gunicorn logs and are trying to work out where a request originated from.

For submissions to us, we now have the hawk client_id to help inform this, however:
1) This doesn't help identify GETs
2) The client_id is in the auth header, which isn't present in the gunicorn logs or New Relic transaction traces (albeit the latter will be helped by bug 1124278), unlike the user agent

treeherder-client uses a user agent of eg:
  treeherder-pyclient/1.8.0

TreeBot uses eg:
  TreeBot/0.1

We should try to identify other tools that don't set a custom UA, and file bugs/open PRs to add one.

There are also places within Treeherder itself where we should be setting a UA but don't (eg the bugscache lookups, which don't use treeherder-client) - plus we should of course do the right thing with requests we make to third party services too (like hg.mozilla.org).
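A minimal sketch of what setting a custom UA could look like for a Python tool, using only the standard library. The tool name `example-tool/1.0` and the `build_request` helper are hypothetical; the `name/version` form just mirrors the treeherder-client and TreeBot examples above, and the URL is one of the API paths seen in the logs. No request is actually sent.

```python
import urllib.request

# Hypothetical tool name and version, following the "name/version"
# pattern used by treeherder-pyclient and TreeBot.
USER_AGENT = "example-tool/1.0"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request that carries an identifying User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


request = build_request(
    "https://treeherder.mozilla.org/api/project/mozilla-aurora/resultset/"
    "?revision=ca6ab5be342e"
)
# urllib stores header names capitalized, so the key is "User-agent".
print(request.get_header("User-agent"))
```

Tools built on python-requests can do the equivalent by passing a `headers={"User-Agent": ...}` dict with each call or setting it once on a session.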
Depends on: 1191403
Hi Saptarshi! I don't suppose you could set a custom user agent for the script that was mentioned in bug 1230179 comment 2? It will just allow us to more easily tell where requests are coming from in the case of API deprecation, or when requests are causing too much load etc (examples other tools use are in comment 0 here). 

Thanks :-)
Flags: needinfo?(sguha)
Absolutely. I've changed everything and my requests ought to have

"SaptarshiGuhaTalos/1.0"

as the user agent.
If you'd like a more canonical string, I can change it easily.
Flags: needinfo?(sguha)
That's great - thank you :-)
Depends on: 1230610
Depends on: 1248272
Depends on: 1248277
Depends on: 1248278
Depends on: 1248279
Depends on: 1248281
Depends on: 1248282
Depends on: 1248283
Depends on: 1248284
Depends on: 1248468
Depends on: 1248650
Depends on: 1248676
Depends on: 1248710
Depends on: 1248932
Depends on: 1248944
Looking much more useful now (and some of the dependent bugs aren't deployed yet):

  90261 treeherder-pyclient/2.0.1
  73589 HTTP-Monitor/1.1
  60359 ouija
  46617 treeherder/treeherder.mozilla.org
  15304 treeherder-nodeclient/0.7.0
   2926 autophone
   2307 SaptarshiGuhaTalos/1.0
    817 NewRelicPinger/1.0 (677903)
    425 TreeBot/0.1
    416 mozscreenshots/0.3.1
    410 Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)
    133 curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2
     97 funsize
     70 python-requests/2.9.1
     26 Twitterbot/1.0
      8 Mozilla/6.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/8.0 Mobile/10A5376e Safari/8536.25
      4 mozplatformqa-jenkins
      4 -
      1 Safari/11601.4.4 CFNetwork/760.2.6 Darwin/15.3.0 (x86_64)
      1 Python-urllib/1.17
      1 Goldfire Server
      1 Go 1.1 package http
      1 Flamingo_SearchEngine (+http://www.flamingosearch.com/bot)

The blank UA entries are:
IP-REDACTED - - [18/Feb/2016:06:31:56 +0000] "POST /api/project/gaia/resultset/ HTTP/1.1" 200 37 "-" "-"
IP-REDACTED - - [18/Feb/2016:08:23:54 +0000] "POST /api/project/gaia/resultset/ HTTP/1.1" 200 37 "-" "-"
IP-REDACTED - - [18/Feb/2016:08:57:40 +0000] "POST /api/project/gaia-master/resultset/ HTTP/1.1" 200 37 "-" "-"
IP-REDACTED - - [18/Feb/2016:11:40:47 +0000] "POST /api/project/gaia/resultset/ HTTP/1.1" 200 37 "-" "-"
...guessing gaia-taskcluster perhaps? (I can't check whether it's been deployed due to a Heroku bug not letting me access the app since it's locked, even though admins are supposed to be able to do so; have filed https://help.heroku.com/tickets/336512).

The curl entries are all to /server-status?auto - and are due to the deploy script's drain/undrain feature.

The Python-urllib entry is to /revision.txt?cachescramble=1455818831.65 and is due to whatsdeployed:
https://github.com/peterbe/whatsdeployed/blob/21cdd8350ad074fd0c0573a6a61f611e52695325/app.py#L68
I think we're virtually ready to block non-specific UAs (for non-browser clients only):

[emorley@treeherder1.webapp.scl3 ~]$ awk -F\" '{print $6}' /var/log/httpd/treeherder.mozilla.org/access_log |
  grep -v 'Mozilla/' | sort | uniq -c | sort -nr
  42058 treeherder-pyclient/2.0.1
  33387 treeherder/treeherder.mozilla.org
  32401 HTTP-Monitor/1.1
  19662 ouija
   7965 treeherder-nodeclient/0.7.0
   6243 SaptarshiGuhaTalos/1.0
   2178 autophone
    364 NewRelicPinger/1.0 (677903)
    172 Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)
    168 TreeBot/0.1
     32 mozscreenshots/0.3.1
     18 funsize
      9 mozmill-ci
      6 -
      3 Twitterbot/1.0
      1 IrssiUrlLog/0.2
      1 Flamingo_SearchEngine (+http://www.flamingosearch.com/bot)
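
For anyone without awk to hand, the same tally can be sketched in Python. This assumes combined-format access log lines like those quoted in this bug, where the user agent is the sixth `"`-delimited field; the function name `count_user_agents` is made up for illustration.

```python
from collections import Counter


def count_user_agents(lines):
    """Tally non-browser user agents, most frequent first."""
    counts = Counter()
    for line in lines:
        fields = line.split('"')
        if len(fields) < 6:
            continue  # not a combined-format line
        user_agent = fields[5]  # awk's $6 with -F\" is index 5 here
        if "Mozilla/" in user_agent:
            continue  # same filter as grep -v 'Mozilla/'
        counts[user_agent] += 1
    return counts.most_common()


# Sample lines copied from the log excerpts in this bug.
sample = [
    'IP-REDACTED - - [18/Feb/2016:06:31:56 +0000] "POST /api/project/gaia/resultset/ HTTP/1.1" 200 37 "-" "-"',
    'IP-REDACTED - - [18/Feb/2016:08:23:54 +0000] "POST /api/project/gaia/resultset/ HTTP/1.1" 200 37 "-" "-"',
    '[12/May/2016:12:14:56 +0000] "GET /revision.txt?cachescramble=1463055296.49 HTTP/1.0" 200 41 "-" "Python-urllib/1.17"',
]
for user_agent, count in count_user_agents(sample):
    print(f"{count:7d} {user_agent}")
```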
Latest:

[emorley@treeherder1.webapp.scl3 ~]$ awk -F\" '{print $6}' /var/log/httpd/treeherder.mozilla.org/access_log |  grep -v 'Mozilla/' | sort | uniq -c | sort -nr
  46894 treeherder/treeherder.mozilla.org
  41783 ouija
  41686 HTTP-Monitor/1.1
  36042 treeherder-pyclient/2.1.0
  11579 treeherder-nodeclient/0.7.0
   6517 SaptarshiGuhaTalos/1.0
   2684 autophone
   1975 treeherder-pyclient/2.0.1
   1115 Go-http-client/1.1
    473 NewRelicPinger/1.0 (677903)
    228 Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)
    223 TreeBot/0.1
    206 mozmill-ci
    178 curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2
    167 funsize
    117 mozscreenshots/0.3.1
     26 Opera/9.80 (X11; Linux x86_64; Edition Linux Mint) Presto/2.12.388 Version/12.16
     24 mozplatformqa-jenkins
     20 python-requests/2.9.1
      7 Twitterbot/1.0
      6 wpt-fetchlogs
      6 ltx71 - (http://ltx71.com/)
      1 Scrapy/1.0.5 (+http://scrapy.org)
      1 -

And for stage:

[emorley@treeherder1.stage.webapp.scl3 ~]$ awk -F\" '{print $6}' /var/log/httpd/treeherder.allizom.org/access_log |  grep -v 'Mozilla/' | sort | uniq -c | sort -nr
  45639 treeherder/treeherder.allizom.org
  41563 HTTP-Monitor/1.1
  29195 treeherder-pyclient/2.1.0
  10699 treeherder-nodeclient/0.7.0
   1003 treeherder-pyclient/2.0.1
    439 NewRelicPinger/1.0 (677903)
    187 mozplatformqa-jenkins
    111 arewefastyet
    100 mozmill-ci
     89 treeherder-pyclient/1.8.0
      5 autophone
      1 ltx71 - (http://ltx71.com/)

The requests with the Go UA were of the form:
GET /api/project/try/artifact/100032679/

The libcurl ones are for server-status, and so are not affected by DRF blacklisting:
/server-status

The python-requests ones:
//api/project/mozilla-aurora/jobs/?job_guid=79d27713-76c6-4aaa-a86c-c143851b2745
//api/project/mozilla-aurora/resultset/?revision=ca6ab5be342e
Depends on: 1267683
On prod, the only remaining UA that matches the blacklist is:
  python-requests/2.9.1
...which I believe to be leftover machines that didn't get the fix from bug 1248277 deployed.
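
For illustration, the kind of check the blacklist performs could be sketched as below. This is an assumption-laden sketch, not Treeherder's actual code: the real blacklist entries and where the check is hooked into DRF aren't given in this bug, and the prefixes here are simply library-default UAs that appear in the logs above.

```python
# Assumed prefixes: library-default UAs seen in the logs in this bug.
# The actual blacklist used by Treeherder is not listed here.
GENERIC_UA_PREFIXES = (
    "python-requests/",
    "Python-urllib/",
    "Go-http-client/",
    "curl/",
)


def is_generic_user_agent(user_agent):
    """Return True if the UA looks like an unmodified library default."""
    return user_agent.startswith(GENERIC_UA_PREFIXES)


for ua in ("python-requests/2.9.1", "treeherder-pyclient/2.1.0"):
    print(ua, "->", "blocked" if is_generic_user_agent(ua) else "allowed")
```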

On stage there was just:

[12/May/2016:12:14:56 +0000] "GET /revision.txt?cachescramble=1463055296.49 HTTP/1.0" 200 41 "-" "Python-urllib/1.17"

-> whatsdeployed; have filed:
https://github.com/peterbe/whatsdeployed/issues/13

[12/May/2016:06:07:36 +0000] "GET /a2billing/ HTTP/1.1" 400 26 "-" "python-requests/2.9.1"

-> Some spam / someone scanning for exploitable frameworks or similar.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Depends on: 1524941