[Meta] Encourage tools that interact with our API to set informative user agents

RESOLVED FIXED

Status

Tree Management
Treeherder: API
P3
normal
RESOLVED FIXED
2 years ago
a year ago

People

(Reporter: emorley, Assigned: emorley)

Tracking

({meta})

Details

(Assignee)

Description

2 years ago
There are times when we're looking at New Relic or gunicorn logs and are trying to work out where a request originated from.

For submissions to us, we now have the hawk client_id to help inform this, however:
1) This doesn't help identify GETs
2) The client_id is in the auth header, which isn't present in the gunicorn logs or New Relic transaction traces (albeit the latter will be helped by bug 1124278), unlike the user agent

treeherder-client uses a user agent of eg:
  treeherder-pyclient/1.8.0

TreeBot uses eg:
  TreeBot/0.1

We should try to identify other tools (besides those above) that don't set a custom UA, and file bugs/open PRs to add one.

There are also places within Treeherder itself where we should be setting a UA but don't (eg the bugscache lookups, which don't use treeherder-client) - plus we should of course do the right thing with requests we make to third-party services too (like hg.mozilla.org).
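
For tools that don't go through treeherder-client, setting a UA is usually a one-line change. A minimal sketch using only the Python standard library follows; the helper names make_user_agent and build_request are hypothetical (not part of any Treeherder tooling), and the "name/version" shape just mirrors the examples quoted above.

```python
from urllib.request import Request


def make_user_agent(tool_name, version=None):
    """Build a 'name/version' user agent, e.g. 'TreeBot/0.1'.

    Hypothetical helper: the format mirrors the UAs quoted in this bug.
    """
    if not tool_name or any(ch.isspace() for ch in tool_name):
        raise ValueError("tool name must be non-empty and contain no whitespace")
    return f"{tool_name}/{version}" if version else tool_name


def build_request(url, user_agent):
    """Return a urllib Request that carries an explicit User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


# Example: prepare (but do not send) a GET against an endpoint seen in the logs.
request = build_request(
    "https://treeherder.mozilla.org/api/project/mozilla-aurora/resultset/"
    "?revision=ca6ab5be342e",
    make_user_agent("TreeBot", "0.1"),
)
# urllib stores header names capitalised as 'User-agent'.
print(request.get_header("User-agent"))
```

Passing the request to urllib.request.urlopen() would send it with that header; other HTTP libraries generally accept a similar headers mapping.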
(Assignee)

Updated

2 years ago
Depends on: 1191403
(Assignee)

Comment 1

2 years ago
Hi Saptarshi! I don't suppose you could set a custom user agent for the script that was mentioned in bug 1230179 comment 2? It will just allow us to more easily tell where requests are coming from in the case of API deprecation, or when requests are causing too much load etc (examples other tools use are in comment 0 here). 

Thanks :-)
Flags: needinfo?(sguha)

Comment 2

Absolutely. I've changed everything and my requests ought to have

"SaptarshiGuhaTalos/1.0"

as the user agent.
If you'd like a more canonical string, I can change it easily.
Flags: needinfo?(sguha)
(Assignee)

Comment 3

2 years ago
That's great - thank you :-)
(Assignee)

Updated

2 years ago
Depends on: 1230610
(Assignee)

Updated

2 years ago
Depends on: 1248272
(Assignee)

Updated

2 years ago
Depends on: 1248277
(Assignee)

Updated

2 years ago
Depends on: 1248278
(Assignee)

Updated

2 years ago
Depends on: 1248279
(Assignee)

Updated

2 years ago
Depends on: 1248281
(Assignee)

Updated

2 years ago
Depends on: 1248282
(Assignee)

Updated

2 years ago
Depends on: 1248283
(Assignee)

Updated

2 years ago
Depends on: 1248284
(Assignee)

Comment 4

2 years ago
Only use treeherder-client (which sets a UA):
https://github.com/mozilla/mozilla_ci_tools
https://github.com/adusca/try_extender
https://github.com/chmanchester/trigger-bot
https://github.com/mozilla/releasetasks
https://hg.mozilla.org/build/puppet

Already set a UA:
https://github.com/globau/treebot

Browser based, so browser UA + referrer is fine:
https://hg.mozilla.org/hgcustom/version-control-tools/

Have a PR open to add a UA:
https://github.com/mozilla/mozmill-ci
https://github.com/mozilla-raptor/post-to-treeherder
https://github.com/mozilla/autophone
https://github.com/jmaher/alert_manager
https://github.com/mozilla/pulse_actions
https://github.com/sydvicious/mozplatformqa-jenkins
https://github.com/mjzffr/treeherder-submission-example
https://github.com/mozilla/funsize

Left:
treeherder-node (bug 1191403)
(Assignee)

Updated

2 years ago
Depends on: 1248468
(Assignee)

Updated

2 years ago
Depends on: 1248650
(Assignee)

Updated

2 years ago
Depends on: 1248676
(Assignee)

Updated

2 years ago
Depends on: 1248710
(Assignee)

Updated

2 years ago
Depends on: 1248932
(Assignee)

Comment 5

2 years ago
I keep on finding more - it's amazing how many projects are using our API now!

https://github.com/mnoorenberghe/mozscreenshots
https://github.com/h4writer/arewefastyet
https://github.com/dminor/ouija
https://github.com/klahnakoski/TestLog-ETL
https://github.com/jcranmer/m-c-tools-code-coverage
(Assignee)

Updated

2 years ago
Depends on: 1248944
(Assignee)

Comment 6

2 years ago
Looking much more useful now (and some of the dependent bugs aren't deployed yet):

  90261 treeherder-pyclient/2.0.1
  73589 HTTP-Monitor/1.1
  60359 ouija
  46617 treeherder/treeherder.mozilla.org
  15304 treeherder-nodeclient/0.7.0
   2926 autophone
   2307 SaptarshiGuhaTalos/1.0
    817 NewRelicPinger/1.0 (677903)
    425 TreeBot/0.1
    416 mozscreenshots/0.3.1
    410 Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)
    133 curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2
     97 funsize
     70 python-requests/2.9.1
     26 Twitterbot/1.0
      8 Mozilla/6.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/8.0 Mobile/10A5376e Safari/8536.25
      4 mozplatformqa-jenkins
      4 -
      1 Safari/11601.4.4 CFNetwork/760.2.6 Darwin/15.3.0 (x86_64)
      1 Python-urllib/1.17
      1 Goldfire Server
      1 Go 1.1 package http
      1 Flamingo_SearchEngine (+http://www.flamingosearch.com/bot)

The blank UA entries are:
IP-REDACTED - - [18/Feb/2016:06:31:56 +0000] "POST /api/project/gaia/resultset/ HTTP/1.1" 200 37 "-" "-"
IP-REDACTED - - [18/Feb/2016:08:23:54 +0000] "POST /api/project/gaia/resultset/ HTTP/1.1" 200 37 "-" "-"
IP-REDACTED - - [18/Feb/2016:08:57:40 +0000] "POST /api/project/gaia-master/resultset/ HTTP/1.1" 200 37 "-" "-"
IP-REDACTED - - [18/Feb/2016:11:40:47 +0000] "POST /api/project/gaia/resultset/ HTTP/1.1" 200 37 "-" "-"
...guessing gaia-taskcluster perhaps? (I can't check whether it's been deployed due to a Heroku bug not letting me access the app since it's locked, even though admins are supposed to be able to do so; have filed https://help.heroku.com/tickets/336512).

The curl entries are all to /server-status?auto - and are due to the deploy script's drain/undrain feature.

The Python-urllib entry is to /revision.txt?cachescramble=1455818831.65 and is due to whatsdeployed:
https://github.com/peterbe/whatsdeployed/blob/21cdd8350ad074fd0c0573a6a61f611e52695325/app.py#L68
(Assignee)

Comment 7

2 years ago
I think we're virtually ready to block non-specific UAs (for non-browser clients only):

[emorley@treeherder1.webapp.scl3 ~]$ awk -F\" '{print $6}' /var/log/httpd/treeherder.mozilla.org/access_log |
  grep -v 'Mozilla/' | sort | uniq -c | sort -nr
  42058 treeherder-pyclient/2.0.1
  33387 treeherder/treeherder.mozilla.org
  32401 HTTP-Monitor/1.1
  19662 ouija
   7965 treeherder-nodeclient/0.7.0
   6243 SaptarshiGuhaTalos/1.0
   2178 autophone
    364 NewRelicPinger/1.0 (677903)
    172 Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)
    168 TreeBot/0.1
     32 mozscreenshots/0.3.1
     18 funsize
      9 mozmill-ci
      6 -
      3 Twitterbot/1.0
      1 IrssiUrlLog/0.2
      1 Flamingo_SearchEngine (+http://www.flamingosearch.com/bot)
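
For anyone wanting to repeat this tally without shell access, the awk/grep/sort pipeline above can be approximated in Python. This is only a sketch: the function names are hypothetical, it assumes the same log layout as the lines quoted in this bug (user agent as the sixth double-quote-delimited field), and unlike awk it skips lines that have too few fields rather than counting them as empty.

```python
from collections import Counter


def user_agent_of(line):
    """Return the sixth '"'-delimited field (awk's $6), or None if absent."""
    fields = line.split('"')
    return fields[5] if len(fields) > 5 else None


def count_non_browser_user_agents(lines):
    """Tally user agents, skipping browser-like ones that contain 'Mozilla/'.

    Returns (user_agent, count) pairs ordered by descending count.
    """
    counts = Counter()
    for line in lines:
        ua = user_agent_of(line)
        if ua is None or "Mozilla/" in ua:
            continue
        counts[ua] += 1
    return counts.most_common()


# Sample input taken from log lines quoted elsewhere in this bug.
SAMPLE = [
    'IP-REDACTED - - [18/Feb/2016:06:31:56 +0000] '
    '"POST /api/project/gaia/resultset/ HTTP/1.1" 200 37 "-" "-"',
    'IP-REDACTED - - [18/Feb/2016:08:23:54 +0000] '
    '"POST /api/project/gaia/resultset/ HTTP/1.1" 200 37 "-" "-"',
]

for ua, count in count_non_browser_user_agents(SAMPLE):
    print(f"{count:7d} {ua}")
```

To run it over a real log, pass an open file object instead of SAMPLE, since iterating a file yields one line at a time.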
(Assignee)

Comment 8

a year ago
Latest:

[emorley@treeherder1.webapp.scl3 ~]$ awk -F\" '{print $6}' /var/log/httpd/treeherder.mozilla.org/access_log |  grep -v 'Mozilla/' | sort | uniq -c | sort -nr
  46894 treeherder/treeherder.mozilla.org
  41783 ouija
  41686 HTTP-Monitor/1.1
  36042 treeherder-pyclient/2.1.0
  11579 treeherder-nodeclient/0.7.0
   6517 SaptarshiGuhaTalos/1.0
   2684 autophone
   1975 treeherder-pyclient/2.0.1
   1115 Go-http-client/1.1
    473 NewRelicPinger/1.0 (677903)
    228 Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)
    223 TreeBot/0.1
    206 mozmill-ci
    178 curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.18 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2
    167 funsize
    117 mozscreenshots/0.3.1
     26 Opera/9.80 (X11; Linux x86_64; Edition Linux Mint) Presto/2.12.388 Version/12.16
     24 mozplatformqa-jenkins
     20 python-requests/2.9.1
      7 Twitterbot/1.0
      6 wpt-fetchlogs
      6 ltx71 - (http://ltx71.com/)
      1 Scrapy/1.0.5 (+http://scrapy.org)
      1 -

And for stage:

[emorley@treeherder1.stage.webapp.scl3 ~]$ awk -F\" '{print $6}' /var/log/httpd/treeherder.allizom.org/access_log | grep -v 'Mozilla/' | sort | uniq -c | sort -nr
  45639 treeherder/treeherder.allizom.org
  41563 HTTP-Monitor/1.1
  29195 treeherder-pyclient/2.1.0
  10699 treeherder-nodeclient/0.7.0
   1003 treeherder-pyclient/2.0.1
    439 NewRelicPinger/1.0 (677903)
    187 mozplatformqa-jenkins
    111 arewefastyet
    100 mozmill-ci
     89 treeherder-pyclient/1.8.0
      5 autophone
      1 ltx71 - (http://ltx71.com/)

The Go UAs were for requests of the form:
GET /api/project/try/artifact/100032679/

The libcurl ones are for server-status and so are not affected by the DRF blacklisting:
/server-status

The python-requests ones:
//api/project/mozilla-aurora/jobs/?job_guid=79d27713-76c6-4aaa-a86c-c143851b2745
//api/project/mozilla-aurora/resultset/?revision=ca6ab5be342e
(Assignee)

Updated

a year ago
Depends on: 1267683
(Assignee)

Comment 9

a year ago
On prod, the only remaining UA that matches the blacklist is:
  python-requests/2.9.1
...which I believe to be leftover machines that didn't get the fix from bug 1248277 deployed.

On stage there was just:

[12/May/2016:12:14:56 +0000] "GET /revision.txt?cachescramble=1463055296.49 HTTP/1.0" 200 41 "-" "Python-urllib/1.17"

-> whatsdeployed; have filed:
https://github.com/peterbe/whatsdeployed/issues/13

[12/May/2016:06:07:36 +0000] "GET /a2billing/ HTTP/1.1" 400 26 "-" "python-requests/2.9.1"

-> Some spam / someone scanning for exploitable frameworks or similar.
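
For reference, the kind of check being discussed can be sketched as a simple prefix match on the UA. This is an illustration only, not the actual Treeherder/DRF implementation: the prefix list and the function name are assumptions, chosen from the generic library UAs that show up in the logs above.

```python
# Hypothetical list of generic library user agents; the real blacklist
# used by Treeherder is not shown in this bug and may differ.
GENERIC_UA_PREFIXES = (
    "python-requests/",
    "python-urllib/",
    "go-http-client/",
)


def is_generic_user_agent(user_agent):
    """Return True if the UA looks like an unmodified HTTP library default.

    Matching is case-insensitive. In this sketch a missing or blank UA is
    not treated as generic; a real deployment might choose otherwise.
    """
    normalised = (user_agent or "").strip().lower()
    return normalised.startswith(GENERIC_UA_PREFIXES)


for candidate in ("python-requests/2.9.1", "treeherder-pyclient/2.1.0"):
    verdict = "blocked" if is_generic_user_agent(candidate) else "allowed"
    print(f"{candidate}: {verdict}")
```

Because the check only looks at the UA string, requests that never reach the API layer (such as the /server-status ones mentioned earlier) would be unaffected by it.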
(Assignee)

Updated

a year ago
Status: NEW → RESOLVED
Last Resolved: a year ago
Resolution: --- → FIXED