Closed Bug 1283555 Opened 8 years ago Closed 8 years ago

thclient.client HTTPError 403 submitting to https://treeherder.mozilla.org/api/project/try/jobs/: {"detail":"Hawk authentication failed"}

Categories

(Tree Management :: Treeherder: Data Ingestion, defect)

defect
Not set
major

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: armenzg, Unassigned)

References

Details

About an hour ago I was being able to post jobs to Treeherder from Heroku [1].
I believe this was before we deleted and recreated the credentials. I'm not 100%.

I have created 'pulse_actions_production'. Could you please approve it?

I want to have this credential available in case we can't fix the current 'pulse_actions' credentials.

[1]
Here's my few production submissions:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-aurora&revision=3053d997f2e9f0fc7dd3960927c92496df1d9ace&filter-searchStr=sch

I tested:
* backfilling
** https://tc-gp-public-31d.s3-us-west-2.amazonaws.com/ateam/pulse-action-dev/d17a6153-17df-4f75-892a-079ff51c562e
* add new jobs
** https://tc-gp-public-31d.s3-us-west-2.amazonaws.com/ateam/pulse-action-dev/241920a6-98bb-457f-83f6-a379711f1d20
* trigger all talos jobs
** https://tc-gp-public-31d.s3-us-west-2.amazonaws.com/ateam/pulse-action-dev/da126ef2-586b-4fe9-aaa3-99ef905b04e3
pulse_actions is approved and should work -- I;ll look at the logs now.

I'd prefer we didn't namespace at the username level.
AuthenticationFailed: Hawk authentication failed: The token has expired. Is your system clock correct?
Any suggestions on what I can do? I'm reading about clocks and Heroku.
I have this on Heroku
> Thu Jun 30 18:41:27 UTC 2016
and this locally (it took me a second to switch)
> Thu Jun 30 14:41:28 EDT 2016

Should I be changing the timezone?
Blocks: 1245943
I've got this from Heroku support [1]
Sometime after 9am UTC this stopped working [2]

[1]
##############################################
Hi Armen,

It's not possible to check that the system clock for an app is up to date, unless you implement something in your app that retrieves and displays the system date. You can easily determine it using a new Dyno:

$ heroku run date -a pulse-actions ; date -u
Running date on ⬢ pulse-actions... up, run.7669
Thu Jun 30 19:56:58 UTC 2016
Thu Jun 30 19:56:58 UTC 2016

The first date string being displayed by the heroku run command, the second being run by my local system. Depending on network speed and timing, you can get those to deviate by a single second, but I haven't seen a larger deviation than that.

Is it possible that your application is having a timezone translation issue, and it's generating a token using a localtime translation rather than UTC time / epoch seconds?

Thanks!
Jason

[2]
Jun 30 07:34:34 pulse-actions app/worker2.1: root	 Created Treeherder 'Sch' job. 
Jun 30 07:50:55 pulse-actions app/worker2.1: root	 Created Treeherder 'Sch' job. 
Jun 30 07:53:25 pulse-actions app/worker2.1: root	 Created Treeherder 'Sch' job. 
Jun 30 07:59:12 pulse-actions app/worker2.1: root	 Created Treeherder 'Sch' job. 
Jun 30 08:44:34 pulse-actions app/worker2.1: root	 Created Treeherder 'Sch' job. 
Jun 30 08:56:53 pulse-actions app/worker2.1: root	 Created Treeherder 'Sch' job. 
Jun 30 08:58:10 pulse-actions app/worker2.1: root	 Created Treeherder 'Sch' job. 
Jun 30 12:25:21 pulse-actions app/worker2.1: root	 We will have to skip scheduling a Treeherder 'Sch' job 
Jun 30 12:27:44 pulse-actions app/worker2.1: root	 As mentioned above we did not schedule a Treeherder 'Sch' job. 
Jun 30 12:27:45 pulse-actions app/worker2.1: root	 We will have to skip scheduling a Treeherder 'Sch' job 
Jun 30 12:29:12 pulse-actions app/worker2.1: root	 As mentioned above we did not schedule a Treeherder 'Sch' job. 
Jun 30 13:56:49 pulse-actions app/worker2.1: root	 WARNING:	 We will have to skip scheduling a Treeherder 'Sch' job 
Jun 30 13:58:08 pulse-actions app/worker2.1: root	 WARNING:	 As mentioned above we did not schedule a Treeherder 'Sch' job. 
Jun 30 14:00:25 pulse-actions app/worker2.1: root	 WARNING:	 We will have to skip scheduling a Treeherder 'Sch' job 
Jun 30 14:03:11 pulse-actions app/worker2.1: root	 WARNING:	 We will have to skip scheduling a Treeherder 'Sch' job
For the record, my host and Heroku match:
armenzg@armenzg-thinkpad:~/repos/pulse_actions$ heroku run date -a pulse-actions ; date -u
Running date on ⬢ pulse-actions... up, run.7937
Thu Jun 30 21:09:51 UTC 2016
Thu Jun 30 21:09:51 UTC 2016
Hi Ed, where did you get this information from?
> AuthenticationFailed: Hawk authentication failed: The token has expired. Is your system clock correct?

If that was related to this bug then something is reporting the issue incorrectly.
After few hours of debugging a "clock" issue, it seems that the issue was actually that when I copy from Treeherder's credentials page, I get few spaces prepended to the secret.

In other words:
1) If the clock error message was related to this; we need to investigate if there is a bug misreporting bad secret with bad clock
2) File bug for copying the secret from the credentials page

Feel free to close or morph this bug.
No longer blocks: 1245943
I skim read the log on one of the Treeherder webheads - the error message could have come from any submitter, it was just the only one I saw.

Also, from IRC: the treeherder python client log output should explain the auth failure -- do you have log handlers set up for that? (in the case of wrong id or secret, it doesn't go into details, but if the clock is out of sync it will say that) the clock sync error in the logs was perhaps from someone else

Setting needinfo on me to check the logs to see who the clock errors were coming from and also see if we can improve the copy-pasting UX for the credentials page.
Flags: needinfo?(emorley)
A third option for comment 7:
3) Make pulse_actions strip env variables

This is the error I would get:
Jun 30 09:56:10 pulse-actions app/worker2.1: thclient.client	 HTTPError 403 submitting to https://treeherder.mozilla.org/api/project/mozilla-inbound/jobs/: {"detail":"Hawk authentication failed"} 
Jun 30 09:56:10 pulse-actions app/worker2.1: root	 403 Client Error: FORBIDDEN for url: https://treeherder.mozilla.org/api/project/mozilla-inbound/jobs/ 
Jun 30 09:56:10 pulse-actions app/worker2.1: Traceback (most recent call last): 
Jun 30 09:56:10 pulse-actions app/worker2.1:   File "pulse_actions/worker.py", line 218, in message_handler 
Jun 30 09:56:10 pulse-actions app/worker2.1:     acknowledge=CONFIG['acknowledge']) 
Jun 30 09:56:10 pulse-actions app/worker2.1:   File "pulse_actions/worker.py", line 329, in route 
Jun 30 09:56:10 pulse-actions app/worker2.1:     end_request_kwargs = start_request(repo_name=repo_name, revision=revision) 
Jun 30 09:56:10 pulse-actions app/worker2.1:   File "pulse_actions/worker.py", line 242, in start_request 
Jun 30 09:56:10 pulse-actions app/worker2.1:     JOB_FACTORY.submit_running(treeherder_job) 
Jun 30 09:56:10 pulse-actions app/worker2.1:   File "/app/.heroku/python/lib/python2.7/site-packages/thsubmitter/treeherder_submitter.py", line 141, in submit_running 
Jun 30 09:56:10 pulse-actions app/worker2.1:     self.submitter._submit(job=job, state=state, **kwargs) 
Jun 30 09:56:10 pulse-actions app/worker2.1:   File "/app/.heroku/python/lib/python2.7/site-packages/thsubmitter/treeherder_submitter.py", line 224, in _submit 
Jun 30 09:56:10 pulse-actions app/worker2.1:     self.client.post_collection(job.data['project'], job_collection) 
Jun 30 09:56:10 pulse-actions app/worker2.1:   File "/app/.heroku/python/lib/python2.7/site-packages/thclient/client.py", line 929, in post_collection 
Jun 30 09:56:10 pulse-actions app/worker2.1:     collection_inst.get_collection_data(), timeout) 
Jun 30 09:56:10 pulse-actions app/worker2.1:   File "/app/.heroku/python/lib/python2.7/site-packages/thclient/client.py", line 712, in _post_json 
Jun 30 09:56:10 pulse-actions app/worker2.1:     resp.raise_for_status() 
Jun 30 09:56:10 pulse-actions app/worker2.1:   File "/app/.heroku/python/lib/python2.7/site-packages/requests/models.py", line 844, in raise_for_status 
Jun 30 09:56:10 pulse-actions app/worker2.1:     raise HTTPError(http_error_msg, response=self) 
Jun 30 09:56:10 pulse-actions app/worker2.1: HTTPError: 403 Client Error: FORBIDDEN for url: https://treeherder.mozilla.org/api/project/mozilla-inbound/jobs/
Ah yes, your error message was correct:

Jun 30 09:56:10 pulse-actions app/worker2.1: thclient.client	 HTTPError 403 submitting to https://treeherder.mozilla.org/api/project/mozilla-inbound/jobs/: {"detail":"Hawk authentication failed"} 

If the clock was out of sync, this would have read:
... {"detail":"Hawk authentication failed: The token has expired. Is your system clock correct?"}

My apologies for not asking whether you had the error message before jumping into the treeherder logs.

As such it was just that the wrong credentials were used.

This leaves:

1) Reducing the potential for accidentally including whitespace when *setting* environment variables:

How was the environment variable set for pulse actions? Via the Heroku CLI, Heroku web UI (classic or new-style?) or a some other method? I'm not sure stripping whitespace from the environment variables in pulse_actions is the correct fix, but happy to leave that up to you :-)

2) That if people triple-click on the credentials page and then copy the text, for at least Firefox this includes whitespace (even if the markup doesn't).

This is a Firefox bug, for which I've filed bug 1284473. We could try to a few hacks to work around it (eg like we did for the more common use-case in bug 1270682), however if we wanted to do that, let's file a new bug. But I'm not sure it's worth the added complication, since bug 1270682 was more about convenience for a frequent task (since people were spotting the added whitespace, it was just tedious to remove manually), whereas in this case I think we should focus equally on why it wasn't spotted on pasting.
Flags: needinfo?(emorley)
> If the clock was out of sync, this would have read:
> ... {"detail":"Hawk authentication failed: The token has expired. Is your system clock correct?"}

To expand on this point:

For two cases only (one of which is the clock sync issue), hawkrest gives a more details error message. For all others it's vague to avoid assisting someone trying to brute-force. See:
https://github.com/kumar303/hawkrest/blob/0.0.10/hawkrest/__init__.py#L82-L90
Depends on: 1284503
* Heroku web UI new style

When I used the CLI I noticed the right shifting of the password compared to other env variables.
Actually the Heroku UI is pretty broken for this case. It hides the leading whitespace when pasted.

I've filed:
https://help.heroku.com/tickets/372074

Marking this bug as invalid since the root cause was wrong credentials, and the remaining action items are being tracked elsewhere (Heroku ticket, Firefox platform bug).

Content (for those not added to the ticket):

"""
STR:
1) Ensure using the new style Heroku dashboard
2) Visit the settings page for an app (eg https://dashboard.heroku.com/apps/APP_NAME/settings)
3) Click 'Reveal Config Vars'
4) In the empty new variable row at the end of the variable list, enter "FOO" for the name, and copy the following string (including leading whitespace), that should then be pasted in the text field for the value:
```
             an-api-key-copied-from-somewhere
```
5) Observe how the value appears in the text field
6) Press 'Add'
7) Reload the settings page and press 'Reveal Config Vars' again
8) Click 'Edit' next to the `FOO` variable
9) Observe how the value appears in the text field
10) From the CLI run `heroku config:get FOO` and note the result

Expected:
The value seen in the text field in the UI at step (5) and (9) matches the value that was actually saved (ie clearly shows there was leading whitespace). 

Actual:
After pasting the value, at (5) the text field just shows `an-api-key-copied-from-somewhere` with no indication of leading whitespace. And similarly at (9) the text field still shows no leading whitespace. However at (10), the CLI returns a value containing the leading whitespace.

We hit this after copying an API key from another service's web dashboard (that inadvertently contained leading whitespace), which was then pasted into a Heroku app via the web UI. We then spent some time trying to debug authentication failures, since the whitespace wasn't visible, so it looked like the pasted API key was correct, when in fact the stored value contained leading whitespace.

If the web UI chose to intentionally trim whitespace on paste (and actually save the trimmed value) then that would be great (though might break some use-cases, so I could understand why you may not do that). It's just important that regardless of that behaviour or not, the value displayed to the user (both on input & later edits) should always match that stored.
"""
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → INVALID
Heroku have now fixed the UI bug \o/
You need to log in before you can comment on or make changes to this bug.