Closed Bug 1279213 Opened 8 years ago Closed 8 years ago

Reduce the footgun potential of the TreeherderClient() constructor

Categories

(Tree Management Graveyard :: Treeherder: Client Libraries, defect, P3)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: emorley)

References

Details

Attachments

(1 file)

In bug 1276742, there were problems submitting to a local Treeherder instance using the Python client, since TreeherderClient had been instantiated using something like:

TreeherderClient(host='local.treeherder.mozilla.org')

This looks fine until you realise the unspecified 'protocol' argument defaults to 'https'.

Possible ways to make this less footgun-prone:

1) Change the constructor to accept eg:
   TreeherderClient(api='http://local.treeherder.mozilla.org')
   TreeherderClient(api='https://treeherder.allizom.org')

2) Add a hardcoded check in the constructor to warn if using `host=local.treeherder.mozilla.org` without setting the protocol to 'http'. (I would probably also rename `protocol` to `use_ssl` or similar and make it a bool).

3) Switch to an API like:
   TreeherderClient(backend='production')
   TreeherderClient(backend='local')
   TreeherderClient(backend='https://my-custom-domain.com')
   (where production/stage/local map to constants in the client)

4) Have proper auto-generated docs for the Python client on read the docs, that show the constructor argument defaults, so they are more apparent.


Note for all of the above, any API changes would be made backwards-compatible.

In addition, even if we do #1-3, we should probably still do #4 anyway.
5) Add support for HTTPS using the local Vagrant instance, either using a self-signed cert, or maybe even switch to an ngrok (https://ngrok.com/) based solution.
Can we just make the host= parameter be able to take a full url, and then deprecate the protocol parameter in a backwards-compatible way? We should be able to use the urlparse library for parsing out the full url:

https://docs.python.org/2/library/urlparse.html
I'm happy to add that option into the mix - however the reason I didn't suggest it initially was:
a) backwards compatibility is still easy to add when changing the argument name (as comment 0 mentioned, I'm not planning on making this a breaking change)
b) the term 'host' doesn't normally include the scheme (https://en.wikipedia.org/wiki/Uniform_Resource_Identifier#Syntax)
c) it requires using urlparse, which whilst not a big deal, could be avoided if we changed the argument name

There aren't actually a huge number of consumers that will need changing:
https://github.com/search?l=python&q=TreeherderClient&type=Code

Many of those could actually be switched to passing zero arguments to the constructor, since they use prod.
Attachment #8764209 - Flags: review?(wlachance)
Assignee: nobody → emorley
(In reply to Ed Morley [:emorley] from comment #1)
> 5) Add support for HTTPS using the local Vagrant instance, either using a
> self-signed cert, or maybe even switch to an ngrok (https://ngrok.com/)
> based solution.

I think even if we did decide to do this (there are a number of issues with both approaches from my initial research), passing a URL rather than just a hostname seems like a cleaner API, so we should do this regardless.
Comment on attachment 8764209 [details] [review]
[treeherder] mozilla:pyclient-server-url > mozilla:master

This looks great! Just one very minor comment which you can take or leave.
Attachment #8764209 - Flags: review?(wlachance) → review+
Commits pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/ad4d364d9185632918adfe7879dbaeadcc4484df
Bug 1279213 - Python client: Remove the per-method timeout parameters

The `timeout` parameter to requests.{get,post} only control the timeout
waiting for a response to the socket, and not the total request time. As
such, it doesn't really make sense to want to specify different timeouts
for different API requests (since the time to connect should be similar,
even if one API request is more time-consuming to generate overall).

This commit removes the per-method timeout argument, leaving the
TreeherderClient constructor as the single place to control the timeout.

https://github.com/mozilla/treeherder/commit/fae3a9ec75585d532566521d6b580add64f678a3
Bug 1279213 - Python client: Reduce the default timeout from 120s to 30s

Since 120s is way too long.

https://github.com/mozilla/treeherder/commit/75997d5ccb373e2b3293b7a87847587847670ed7
Bug 1279213 - Python client: Combine _get_uri and _get_project_uri

Since they are virtually identical, and downstream consumers (eg
`_get_json()`) are having to check for the presence of `project` to
figure out which to call.

https://github.com/mozilla/treeherder/commit/6001db37614f8b41c97499d74b3aac9f749e0bff
Bug 1279213 - Python client: Combine host and protocol into server_url

This makes it harder to inadvertently use HTTPS with local Vagrant
hostnames, as well as reduces the number of config variables users of
the client have to keep track of.

The docs have been tweaked to encourage people using production
Treeherder to just omit the `server_url` argument entirely, which
reduces the boilerplate, and also means they'll be less affected by
changes in the future.

https://github.com/mozilla/treeherder/commit/b4119b03582bfbe9d3aa2c2554be032824be20da
Bug 1279213 - Derive the Treeherder User Agent hostname from SITE_URL

The User Agent Treeherder uses when making requests to other sites (eg
hg.mozilla.org) is of form:
`treeherder/treeherder.mozilla.org`

A hostname derived from `SITE_URL` makes more sense as a site-identity
than `TREEHERDER_REQUEST_HOST` (which is the hostname we make requests
*to*, albeit it's typically the same host), plus we need to stop using
`TREEHERDER_REQUEST_HOST`, since it will be removed in the next commit.

https://github.com/mozilla/treeherder/commit/a1c84ec768a97197ebdc2cd809ca7b1ef4175605
Bug 1279213 - Make Treeherder use the new client server_url parameter

And use `SITE_URL` rather than requiring the separate environment
variables just for requests to Treeherder's own API. (Thereby reducing
the number of environment variables I have to toggle back and forth for
the Heroku migration).
I've also removed the now unused environment variables from Heroku treeherder-{stage,prod} (the prototype isn't on master)

$ ths config:remove TREEHERDER_REQUEST_HOST TREEHERDER_REQUEST_PROTOCOL
Unsetting TREEHERDER_REQUEST_HOST, TREEHERDER_REQUEST_PROTOCOL and restarting treeherder-stage... done, v106
 !    Release command executing: this config change will not be available until the command succeeds.
$ thp config:remove TREEHERDER_REQUEST_HOST TREEHERDER_REQUEST_PROTOCOL
Unsetting TREEHERDER_REQUEST_HOST, TREEHERDER_REQUEST_PROTOCOL and restarting treeherder-prod... done, v39
 !    Release command executing: this config change will not be available until the command succeeds.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
(In reply to Ed Morley [:emorley] from comment #8)
> I've also removed the now unused environment variables from Heroku
> treeherder-{stage,prod} (the prototype isn't on master)

The prototype Heroku instance has now been updated to a newer checkout, so ran:

$ thh config:remove TREEHERDER_REQUEST_HOST TREEHERDER_REQUEST_PROTOCOL
Unsetting TREEHERDER_REQUEST_HOST, TREEHERDER_REQUEST_PROTOCOL and restarting treeherder-heroku... done, v870  !    Release command executing: this config change will not be available until the command succeeds.

(This will help ensure the configs look synced when re-running the config compare script as part of bug 1277304).
Blocks: 1286240
Product: Tree Management → Tree Management Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: