Bug 1203556 - Implement a GET rate limit for Treeherder's API
Status: RESOLVED WONTFIX (opened 9 years ago, closed 7 years ago)
Component: Tree Management :: Treeherder: API (defect, P3)
Tracking: Not tracked
Reporter: emorley; Assignee: Unassigned
Bug 1203518 was caused by thousands of requests to:
https://treeherder.mozilla.org/api/project/try/jobs/?count=2000&result_set_id__in=&return_type=list
We rate limit POSTs to our API, but not GETs to it.
We should do that :-)
https://github.com/mozilla/treeherder/blob/801596504f5c626058a060b56c75480e0e71c00c/treeherder/settings/base.py#L282-L288
https://github.com/mozilla/treeherder/blob/master/treeherder/webapp/api/throttling.py
https://github.com/mozilla/treeherder/blob/90ba77e596e2c3329e0b840fe45e661e8e2a0370/treeherder/webapp/api/jobs.py#L19
https://github.com/mozilla/treeherder/blob/90ba77e596e2c3329e0b840fe45e661e8e2a0370/treeherder/webapp/api/resultset.py#L21
Comment 1 (Reporter) • 9 years ago
We could always set quite a low limit (not so low that the UI hits it, but low for tooling) and then allow people to apply for read-only API keys as part of bug 1160111, which would give them higher limits? (Similar to many other sites' APIs.)
Comment 3 • 9 years ago
(In reply to Ed Morley [:emorley] from comment #1)
> We could always set quite a low limit (not so low that the UI hits it, but
> low for tooling) and then allow people to apply for read-only API keys as
> part of bug 1160111 that gave them higher limits? (Similar to many other
> site's APIs).
I'm not inherently opposed to the idea, but I think we do want to encourage people to use the Treeherder API to mine data and create dashboards, and I'm a bit concerned that making them jump through hoops might discourage this. Also, using the API is one of the easiest routes to populating a local server with performance data (which we might want to expand upon later to encompass jobs and result sets).
Can we start by just setting some kind of high (but reasonable) limit to make sure people don't DDoS our server, and then expand on it if necessary?
Comment 4 (Reporter) • 9 years ago
That's exactly what has been proposed.
Note that the solution in bug 1160111 will eventually be self-serve, so even then it won't be a big hoop.
Comment 5 (Reporter) • 9 years ago
(In reply to Ed Morley [:emorley] from comment #4)
> That's exactly what has been proposed.
In that: bug 1160111 might not be ready for a while (and presumably won't have full self-serve initially), and we're not going to impose a low limit here whilst that's the case, but could lower it in the future.
Once it is self-serve, that seems a small price to pay to avoid recurrences of today's tree closure due to one IP hammering us.
Comment 6 (Reporter) • 9 years ago
Sounds like we need to use AnonRateThrottle:
http://www.django-rest-framework.org/api-guide/throttling/
"The AnonRateThrottle will only ever throttle unauthenticated users. The IP address of the incoming request is used to generate a unique key to throttle against."
Comment 7 (Reporter) • 9 years ago
Though we'll have to check that in production:
(a) Zeus sets X-Forwarded-For correctly
(b) When Apache proxies the request to gunicorn, X-Forwarded-For is passed through too
...otherwise all requests will appear to django-rest-framework as having come from the same IP, and we'll have some fun and games.
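Assuming those headers do get passed through, DRF's NUM_PROXIES setting controls how the throttle classes derive the client IP from X-Forwarded-For. A sketch, assuming two trusted proxies (Zeus, then Apache) in front of gunicorn:

    REST_FRAMEWORK = {
        # Hypothetical value: with two trusted proxies in front of gunicorn,
        # DRF takes the address two entries from the end of X-Forwarded-For.
        # Left at the default (None), it trusts the whole header, which a
        # client could spoof to dodge a per-IP throttle.
        'NUM_PROXIES': 2,
    }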
Comment 8 (Reporter) • 9 years ago
With some of the other recent changes I think this is less urgent.
It's also going to end up being more of a can of worms than it appears at first glance IMO.
(eg multiple people sharing an IP address using the Treeherder UI but who are not logged in etc).
Priority: P1 → P3
Updated (Reporter) • 9 years ago
Assignee: nobody → emorley
Comment 9 (Reporter) • 9 years ago
Rough thoughts on how to do this:
* Adjust Hawk credentials so that even when they are not authorised, they can be used for GETs.
* Improve docs for Hawk auth, emphasising that credential use is recommended, even when not submitting data. (The Python client already supports their use for GETs etc).
* Make sure the throttle key handling is working properly for Hawk client_ids
* Add support to the throttle key handling for a user's Django login too
* As a fallback, add an IP-based throttle (a rough sketch of this keying approach follows after this comment)
* Check that zeus/gunicorn/django settings are correctly passing the IP through to the throttling layer
* Pick initially *very* lenient throttling defaults
* Make sure we add lots of logging
* Add support to Treeherder's UI for detecting when the throttling limit has been hit (the limits would be set so high this should never happen, but just in case, eg for shared IP addresses), and display a message along the lines of "Too many requests from this IP, please log in to raise your limit".
* Open PRs against various projects that interact with our API (starting with those that appear high in the "counts by User Agent" analysis of the webhead logs) to get them to use credentials
* Add extensive docs for all this on Read the Docs + email newsgroups etc
Even a very high rate limit (eg 20,000 requests per hour per user or IP address) would still have helped with bug 1289830, where 1 million requests were made in 4 hours.
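To make the keying idea in the list above concrete, here is a rough sketch of a DRF throttle that keys on the Hawk client_id when one is present and falls back to the requesting IP otherwise. The SimpleRateThrottle API is real, but the way the client_id is read off the request is an assumption about how hawkrest exposes it, and the scope name is made up:

    from rest_framework.throttling import SimpleRateThrottle

    class HawkOrIPGetThrottle(SimpleRateThrottle):
        # Hypothetical scope name; the rate would live under
        # DEFAULT_THROTTLE_RATES['get'] in the settings.
        scope = 'get'

        def get_cache_key(self, request, view):
            if request.method != 'GET':
                return None  # leave POSTs to the existing write throttles
            # Assumption: hawkrest exposes the credentials id somewhere on
            # request.auth; the attribute name here is illustrative only.
            client_id = getattr(request.auth, 'id', None)
            ident = client_id or self.get_ident(request)  # fall back to IP
            return self.cache_format % {'scope': self.scope, 'ident': ident}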
Comment 10 • 9 years ago
Adding an additional limit, based on User-Agent, will help with the ActiveData-ETL case. ActiveData's ETL machines are constantly changing IPs because they are spot instances.
Comment 11 (Reporter) • 9 years ago
(In reply to Ed Morley [:emorley] from comment #9)
> Rough thoughts on how to do this:
Also:
* Make sure responses have things like the `Retry-After` header set, so clients know when to try again.
* Make sure the Python client (and optionally the Node client) treat an HTTP 429 response as an indication that they should back off (sketched below).
* See if we can add "remaining API request quota" counts to the credentials management page.
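As an illustration of the client-side half of this, a sketch (not the actual treeherder-client code) of retrying a GET when the server answers 429 and sets Retry-After:

    import time
    import requests

    def get_with_backoff(url, max_retries=3):
        for attempt in range(max_retries):
            response = requests.get(url)
            if response.status_code != 429:
                response.raise_for_status()
                return response.json()
            # Honour Retry-After if the server set it; the 30s default is arbitrary.
            time.sleep(int(response.headers.get('Retry-After', 30)))
        raise RuntimeError('Still throttled after %d attempts' % max_retries)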
An interesting alternative throttling strategy we can use in the future:
http://jsatt.com/blog/abusing-django-rest-framework-part-2-non-rate-based-throttling/
(In reply to Kyle Lahnakoski [:ekyle] from comment #10)
> Adding an additional limit, based on User-Agent, will help with the
> ActiveData-ETL case. ActiveData's ETL machines are constantly changing IPs
> because they are spot instances.
The problem is that User Agents can be spoofed. In the ActiveData-ETL case, we'd just get you to use Hawk credentials for all requests, then your user id would give you your own limit :-)
Comment 12 (Reporter) • 9 years ago
> The problem is that User Agents can be spoofed.
ie a malicious party could:
a) Use a random user agent each time, thereby working around any limit completely
b) Spoof eg the ActiveData-ETL user agent and use up your limit, causing your legitimate requests to fail
Comment 13 • 9 years ago
My suggestion for User-Agent limiting was to resolve the problem of friendly agents making too many requests. Limiting malicious parties seems harder, and does not seem to be an existential problem.
Updated (Reporter) • 8 years ago
Assignee: emorley → nobody
Comment 14 (Reporter) • 7 years ago
Let's not worry about this for now.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX