
[Meta] Encourage tools that interact with our API to set informative user agents

RESOLVED FIXED

Status

Product: Tree Management
Component: Treeherder: API
Priority: P3
Severity: normal
Reported: a year ago
Modified: 9 months ago

People

(Reporter: emorley, Assigned: emorley)

Tracking

Keywords: meta

Details

There are times when we're looking at New Relic or gunicorn logs and trying to work out where a request originated.

For submissions to us we now have the Hawk client_id to help with this; however:
1) This doesn't help identify GETs
2) The client_id is in the auth header, which isn't present in the gunicorn logs or New Relic transaction traces (albeit the latter will be helped by bug 1124278), unlike the user agent

treeherder-client uses a user agent of eg:
  treeherder-pyclient/1.8.0

TreeBot uses eg:
  TreeBot/0.1

We should try to identify tools that don't set a custom UA, and file bugs/open PRs to add one.

There are also places within Treeherder itself where we should be setting a UA but don't (eg the bugscache lookups, which don't use treeherder-client), plus we should of course do the right thing with requests we make to third-party services too (like hg.mozilla.org).
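For a Python tool built only on the standard library, the default `Python-urllib/x.y` user agent can be replaced per request. A minimal sketch, assuming a hypothetical tool name `my-treeherder-tool` at version `0.1` (substitute your own project's name, in the same `name/version` style as `treeherder-pyclient/1.8.0` or `TreeBot/0.1`):

```python
import urllib.request

# Hypothetical name/version: replace with something that identifies your tool.
USER_AGENT = "my-treeherder-tool/0.1"


def build_request(url):
    """Return a urllib Request that sends our own User-Agent header
    instead of the default Python-urllib/x.y one."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


if __name__ == "__main__":
    req = build_request("https://treeherder.mozilla.org/revision.txt")
    # urllib stores header names via str.capitalize(), hence "User-agent".
    print(req.get_header("User-agent"))
    # Actually sending it would be: urllib.request.urlopen(req)
```

With the `requests` library the equivalent is passing `headers={"User-Agent": USER_AGENT}` to `requests.get()`, or setting it once on a `requests.Session` via `session.headers.update(...)` so every request from the tool carries it.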
Depends on: 1191403
Hi Saptarshi! I don't suppose you could set a custom user agent for the script that was mentioned in bug 1230179 comment 2? It will just allow us to more easily tell where requests are coming from in the case of API deprecation, or when requests are causing too much load etc (examples other tools use are in comment 0 here). 

Thanks :-)
Flags: needinfo?(sguha)
Absolutely. I've changed everything and my requests ought to have

"SaptarshiGuhaTalos/1.0"

as the user agent.
If you'd like a more canonical string, I can change it easily.
Flags: needinfo?(sguha)
That's great - thank you :-)
Depends on: 1230610
Depends on: 1248272
Depends on: 1248277
Depends on: 1248278
Depends on: 1248279
Depends on: 1248281
Depends on: 1248282
Depends on: 1248283
Depends on: 1248284
Only use treeherder-client (which sets a UA):
https://github.com/mozilla/mozilla_ci_tools
https://github.com/adusca/try_extender
https://github.com/chmanchester/trigger-bot
https://github.com/mozilla/releasetasks
https://hg.mozilla.org/build/puppet

Already set a UA:
https://github.com/globau/treebot

Browser based, so browser UA + referrer is fine:
https://hg.mozilla.org/hgcustom/version-control-tools/

Have a PR open to add a UA:
https://github.com/mozilla/mozmill-ci
https://github.com/mozilla-raptor/post-to-treeherder
https://github.com/mozilla/autophone
https://github.com/jmaher/alert_manager
https://github.com/mozilla/pulse_actions
https://github.com/sydvicious/mozplatformqa-jenkins
https://github.com/mjzffr/treeherder-submission-example
https://github.com/mozilla/funsize

Left:
treeherder-node (bug 1191403)
Depends on: 1248468
Depends on: 1248650
Depends on: 1248676
Depends on: 1248710
Depends on: 1248932
I keep on finding more - it's amazing how many projects are using our API now!

https://github.com/mnoorenberghe/mozscreenshots
https://github.com/h4writer/arewefastyet
https://github.com/dminor/ouija
https://github.com/klahnakoski/TestLog-ETL
https://github.com/jcranmer/m-c-tools-code-coverage
Depends on: 1248944
Looking much more useful now (and some of the dependent bugs aren't deployed yet):

  90261 treeherder-pyclient/2.0.1
  73589 HTTP-Monitor/1.1
  60359 ouija
  46617 treeherder/treeherder.mozilla.org
  15304 treeherder-nodeclient/0.7.0
   2926 autophone
   2307 SaptarshiGuhaTalos/1.0
    817 NewRelicPinger/1.0 (677903)
    425 TreeBot/0.1
    416 mozscreenshots/0.3.1
    410 Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)
    133 curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2
     97 funsize
     70 python-requests/2.9.1
     26 Twitterbot/1.0
      8 Mozilla/6.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/8.0 Mobile/10A5376e Safari/8536.25
      4 mozplatformqa-jenkins
      4 -
      1 Safari/11601.4.4 CFNetwork/760.2.6 Darwin/15.3.0 (x86_64)
      1 Python-urllib/1.17
      1 Goldfire Server
      1 Go 1.1 package http
      1 Flamingo_SearchEngine (+http://www.flamingosearch.com/bot)

The blank UA entries are:
IP-REDACTED - - [18/Feb/2016:06:31:56 +0000] "POST /api/project/gaia/resultset/ HTTP/1.1" 200 37 "-" "-"
IP-REDACTED - - [18/Feb/2016:08:23:54 +0000] "POST /api/project/gaia/resultset/ HTTP/1.1" 200 37 "-" "-"
IP-REDACTED - - [18/Feb/2016:08:57:40 +0000] "POST /api/project/gaia-master/resultset/ HTTP/1.1" 200 37 "-" "-"
IP-REDACTED - - [18/Feb/2016:11:40:47 +0000] "POST /api/project/gaia/resultset/ HTTP/1.1" 200 37 "-" "-"
...guessing gaia-taskcluster perhaps? (I can't check whether it's been deployed due to a Heroku bug not letting me access the app since it's locked, even though admins are supposed to be able to do so; have filed https://help.heroku.com/tickets/336512).

The curl entries are all to /server-status?auto - and are due to the deploy script's drain/undrain feature.

The Python-urllib entry is to /revision.txt?cachescramble=1455818831.65 and is due to whatsdeployed:
https://github.com/peterbe/whatsdeployed/blob/21cdd8350ad074fd0c0573a6a61f611e52695325/app.py#L68
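Working out which paths a given UA is hitting (as done above for the blank, curl and Python-urllib entries) can be scripted with a small parser. A sketch, assuming the lines follow Apache's standard "combined" log format, which the excerpts above appear to; `paths_by_agent` is a hypothetical helper name:

```python
import re
from collections import Counter, defaultdict

# Apache "combined" log format (assumed):
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
LOG_RE = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def paths_by_agent(lines, agents):
    """Map each user agent in `agents` to a Counter of the paths it requested.
    Lines that don't parse as combined-format entries are skipped."""
    result = defaultdict(Counter)
    for line in lines:
        match = LOG_RE.match(line.strip())
        if match and match.group("agent") in agents:
            result[match.group("agent")][match.group("path")] += 1
    return result


if __name__ == "__main__":
    sample = [
        'IP-REDACTED - - [18/Feb/2016:06:31:56 +0000] "POST /api/project/gaia/resultset/ HTTP/1.1" 200 37 "-" "-"',
        'IP-REDACTED - - [18/Feb/2016:08:57:40 +0000] "POST /api/project/gaia-master/resultset/ HTTP/1.1" 200 37 "-" "-"',
    ]
    for agent, paths in paths_by_agent(sample, {"-"}).items():
        print(repr(agent), dict(paths))
```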
I think we're almost ready to block non-specific UAs (for non-browser requests only):

[emorley@treeherder1.webapp.scl3 ~]$ awk -F\" '{print $6}' /var/log/httpd/treeherder.mozilla.org/access_log |
  grep -v 'Mozilla/' | sort | uniq -c | sort -nr
  42058 treeherder-pyclient/2.0.1
  33387 treeherder/treeherder.mozilla.org
  32401 HTTP-Monitor/1.1
  19662 ouija
   7965 treeherder-nodeclient/0.7.0
   6243 SaptarshiGuhaTalos/1.0
   2178 autophone
    364 NewRelicPinger/1.0 (677903)
    172 Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)
    168 TreeBot/0.1
     32 mozscreenshots/0.3.1
     18 funsize
      9 mozmill-ci
      6 -
      3 Twitterbot/1.0
      1 IrssiUrlLog/0.2
      1 Flamingo_SearchEngine (+http://www.flamingosearch.com/bot)
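The `awk`/`grep`/`sort`/`uniq` pipeline above can be mirrored in Python when the same tally is wanted from a script. A minimal sketch: splitting on `"` and taking index 5 corresponds to awk's `$6` with `-F\"`, and UAs containing `Mozilla/` are dropped as `grep -v` does. One deliberate difference: lines with too few quote-delimited fields are skipped here, whereas awk would emit an empty field for them. The name `count_user_agents` and the script filename are illustrative:

```python
from collections import Counter


def count_user_agents(lines, exclude="Mozilla/"):
    """Tally User-Agent strings, most common first, skipping UAs that
    contain `exclude` (like `grep -v 'Mozilla/'`)."""
    counts = Counter()
    for line in lines:
        fields = line.split('"')  # like awk -F\"
        if len(fields) < 6:
            continue  # not a combined-format line (awk would yield "")
        agent = fields[5]  # awk's $6, since awk fields are 1-indexed
        if exclude in agent:
            continue
        counts[agent] += 1
    return counts.most_common()  # like sort | uniq -c | sort -nr


if __name__ == "__main__":
    import sys

    # Usage: python count_uas.py < /var/log/httpd/treeherder.mozilla.org/access_log
    for agent, n in count_user_agents(sys.stdin):
        print(f"{n:7d} {agent}")
```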
Latest:

[emorley@treeherder1.webapp.scl3 ~]$ awk -F\" '{print $6}' /var/log/httpd/treeherder.mozilla.org/access_log |  grep -v 'Mozilla/' | sort | uniq -c | sort -nr
  46894 treeherder/treeherder.mozilla.org
  41783 ouija
  41686 HTTP-Monitor/1.1
  36042 treeherder-pyclient/2.1.0
  11579 treeherder-nodeclient/0.7.0
   6517 SaptarshiGuhaTalos/1.0
   2684 autophone
   1975 treeherder-pyclient/2.0.1
   1115 Go-http-client/1.1
    473 NewRelicPinger/1.0 (677903)
    228 Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)
    223 TreeBot/0.1
    206 mozmill-ci
    178 curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2
    167 funsize
    117 mozscreenshots/0.3.1
     26 Opera/9.80 (X11; Linux x86_64; Edition Linux Mint) Presto/2.12.388 Version/12.16
     24 mozplatformqa-jenkins
     20 python-requests/2.9.1
      7 Twitterbot/1.0
      6 wpt-fetchlogs
      6 ltx71 - (http://ltx71.com/)
      1 Scrapy/1.0.5 (+http://scrapy.org)
      1 -

And for stage:

[emorley@treeherder1.stage.webapp.scl3 ~]$ awk -F\" '{print $6}' /var/log/httpd/treeherder.allizom.org/access_log |  grep -v 'Mozilla/' | sort | uniq -c | sort -nr
  45639 treeherder/treeherder.allizom.org
  41563 HTTP-Monitor/1.1
  29195 treeherder-pyclient/2.1.0
  10699 treeherder-nodeclient/0.7.0
   1003 treeherder-pyclient/2.0.1
    439 NewRelicPinger/1.0 (677903)
    187 mozplatformqa-jenkins
    111 arewefastyet
    100 mozmill-ci
     89 treeherder-pyclient/1.8.0
      5 autophone
      1 ltx71 - (http://ltx71.com/)

The Go UAs were of the form:
GET /api/project/try/artifact/100032679/

The libcurl ones are for server-status, and so are not affected by DRF blacklisting:
/server-status

The python-requests ones:
//api/project/mozilla-aurora/jobs/?job_guid=79d27713-76c6-4aaa-a86c-c143851b2745
//api/project/mozilla-aurora/resultset/?revision=ca6ab5be342e
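A check for non-specific user agents, of the kind being discussed here, could look like the following. This is a sketch, not Treeherder's actual implementation: the pattern list is a hypothetical one built from the generic library defaults that show up in the logs above, browsers (`Mozilla/...`) are exempted in line with the "non-browser only" plan, and treating a blank (`-`) UA as non-specific is an assumption. The function name is illustrative:

```python
import re

# Hypothetical blocklist: default UAs of generic HTTP libraries/tools seen in
# the logs above. Not Treeherder's real list.
GENERIC_UA_RE = re.compile(
    r"^(?:python-requests|python-urllib|go-http-client|curl)/"
    r"|^go \d+(?:\.\d+)* package http",
    re.IGNORECASE,
)


def is_non_specific_user_agent(user_agent):
    """Return True if the UA doesn't identify the tool making the request."""
    if not user_agent or user_agent == "-":
        return True  # assumption: a missing UA is treated as non-specific
    if user_agent.startswith("Mozilla/"):
        return False  # browsers: browser UA + referrer is considered enough
    return GENERIC_UA_RE.match(user_agent) is not None
```

Matching on prefixes rather than exact strings means a version bump (eg `python-requests/2.9.1` to a later release) is still caught, while tool-specific UAs such as `treeherder-pyclient/2.1.0` or `ouija` pass through.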
Depends on: 1267683
On prod, the only remaining UA that matches the blacklist is:
  python-requests/2.9.1
...which I believe to be leftover machines that didn't get the fix from bug 1248277 deployed.

On stage there was just:

[12/May/2016:12:14:56 +0000] "GET /revision.txt?cachescramble=1463055296.49 HTTP/1.0" 200 41 "-" "Python-urllib/1.17"

-> whatsdeployed; have filed:
https://github.com/peterbe/whatsdeployed/issues/13

[12/May/2016:06:07:36 +0000] "GET /a2billing/ HTTP/1.1" 400 26 "-" "python-requests/2.9.1"

-> Some spam / someone scanning for exploitable frameworks or similar.
Status: NEW → RESOLVED
Last Resolved: 9 months ago
Resolution: --- → FIXED