Closed Bug 1410418 Opened 7 years ago Closed 6 years ago

Thousands of API job submission HTTP 500s ("RuntimeError: Django did not handle an incoming Hawk request properly")

Categories

(Tree Management :: Treeherder: API, defect, P1)


Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: emorley)

References

Details

Last night Treeherder experienced thousands of HTTP 500s of form:

```
Traceback (most recent call last): 
  File "/app/.heroku/python/lib/python2.7/site-packages/django/core/handlers/base.py", line 131, in get_response 
    response = middleware_method(request, response) 
  File "/app/.heroku/python/lib/python2.7/site-packages/newrelic/hooks/framework_django.py", line 333, in wrapper 
    return wrapped(*args, **kwargs) 
  File "/app/.heroku/python/lib/python2.7/site-packages/hawkrest/middleware.py", line 22, in process_response 
    raise RuntimeError('Django did not handle an incoming ' 
RuntimeError: Django did not handle an incoming Hawk request properly 
```

These requests are either someone trying to fuzz our API or else a genuine submitter. I've not yet heard any complaints from people who submit to us over the REST API, and the IP address from which the requests originated resolves to somewhere in China.

The exception itself is from:
https://github.com/kumar303/hawkrest/blob/1.0.0/hawkrest/middleware.py#L22-L23

...and should theoretically only ever occur if the hawkrest package wasn't set up properly (i.e. the middleware installed but the authentication backend not).
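
For context, a correct hawkrest setup pairs the response-signing middleware with the matching DRF authentication class. A minimal sketch (class paths as I understand the hawkrest docs; surrounding entries and credentials are placeholders):

```python
# settings.py (sketch): hawkrest expects both pieces to be configured.
# If only the middleware is present, a Hawk-signed request can reach
# process_response() without having been authenticated, which is the
# misconfiguration the RuntimeError is meant to flag.

MIDDLEWARE_CLASSES = [
    'django.middleware.common.CommonMiddleware',
    # Signs outgoing responses for Hawk-authenticated requests.
    'hawkrest.middleware.HawkResponseMiddleware',
]

REST_FRAMEWORK = {
    'DEFAULT_AUTHENTICATION_CLASSES': [
        # Verifies incoming Hawk credentials and marks the request as handled.
        'hawkrest.HawkAuthentication',
    ],
}

# Credentials lookup used by hawkrest (IDs/keys here are placeholders).
HAWK_CREDENTIALS = {
    'some-client-id': {
        'id': 'some-client-id',
        'key': 'some-secret',
        'algorithm': 'sha256',
    },
}
```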

However, I found one edge case where this might not hold true (albeit one that probably only a fuzzer would hit), for which I've filed:
https://github.com/kumar303/hawkrest/issues/37

However it's not clear whether this edge case is the cause here, since the User-Agent of the failing requests was treeherder-pyclient/4.0.0, which should not be sending the kind of invalid payloads that would trigger it.

So either:
a) Someone was fuzzing us and faked the User-Agent
b) The requests-hawk library used by the python client has some other bug
c) There's some other case in hawkrest that can result in this state
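
Ref (b):
For comparison, a genuine submitter using requests-hawk signs requests roughly like this (URL, credentials and payload below are placeholders), so a well-behaved client shouldn't be producing malformed Hawk headers:

```python
# Sketch of a legitimate Hawk-signed submission via requests-hawk.
import requests
from requests_hawk import HawkAuth

# Placeholder credentials; these would come from the Treeherder credentials UI.
auth = HawkAuth(id='some-client-id', key='some-secret')

resp = requests.post(
    'https://treeherder.example/api/jobs/',  # placeholder URL
    json={'jobs': []},                       # placeholder payload
    auth=auth,
)
resp.raise_for_status()
```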

Ref (c):
One other possibility is a side effect of the recent changes to Django's middleware behaviour, where the short-circuiting is different (i.e. the logic for whether to bail out early when one layer of the stack fails, or to still send the request and response through the full middleware stack). However IIRC this new behaviour only kicks in when using the new `MIDDLEWARE` settings.py option rather than the old `MIDDLEWARE_CLASSES`, and we're still using the old one. So this is perhaps a red herring.
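
For reference, the behavioural difference hinges purely on which settings key declares the stack; a minimal illustration (middleware list trimmed):

```python
# settings.py: Treeherder still declares its stack the pre-Django-1.10 way,
# so the old short-circuiting semantics apply.
MIDDLEWARE_CLASSES = [
    'django.middleware.common.CommonMiddleware',
    'hawkrest.middleware.HawkResponseMiddleware',
]

# The new-style semantics (where response/exception handling short-circuits
# differently) only apply once the stack is declared under MIDDLEWARE instead:
# MIDDLEWARE = [
#     'django.middleware.common.CommonMiddleware',
#     'hawkrest.middleware.HawkResponseMiddleware',
# ]
```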

Either way we should probably improve the upstream hawkrest tests, since glancing at them I see a few deficiencies. Sadly this is the price we pay for using yet another Mozilla home-grown solution that doesn't have as many eyes on it :-(
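
As a starting point, one regression test worth adding upstream might look something like this. This is a hypothetical sketch: it assumes, per the linked middleware lines, that a request carrying a Hawk Authorization header which the authentication class never processed makes process_response() raise.

```python
# Hypothetical pytest-style regression test for hawkrest.
import pytest
from django.http import HttpResponse
from django.test import RequestFactory

from hawkrest.middleware import HawkResponseMiddleware


def test_unhandled_hawk_request_raises_runtime_error():
    # A request that looks like a Hawk request, but which the
    # authentication class never saw (the misconfigured case).
    request = RequestFactory().get(
        '/api/',
        HTTP_AUTHORIZATION='Hawk id="x", ts="1", nonce="n", mac="m"',
    )
    with pytest.raises(RuntimeError):
        HawkResponseMiddleware().process_response(request, HttpResponse())
```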
(In reply to Ed Morley [:emorley] from comment #0)
> However IIRC this new behaviour only kicks in when using the new
> `MIDDLEWARE` settings.py option rather than the old `MIDDLEWARE_CLASSES`,
> and we're still using the old one. So this is perhaps a red herring.

Filed bug 1412797 for switching to the new Django middleware API.
Depends on: 1412797
This issue hasn't occurred again.

If it does, bug 1412797 will hopefully rule a few things out. Also, once we deprecate REST API job submissions (bug 1349182) we can remove Hawk auth entirely.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED