Investigate Bad Credentials error
Categories
(Taskcluster :: Services, defect)
Tracking
(Not tracked)
People
(Reporter: owlish, Assigned: dustin)
References
(Blocks 1 open bug)
Details
Manifests itself as a "Bad credentials" comment on GitHub.
From the logs:
message: "HttpError: Bad credentials
at /app/node_modules/@octokit/request/dist-node/index.js:66:23
at processTicksAndRejections (internal/process/task_queues.js:97:5)
at async /app/services/github/src/github-auth.js:12:16
at async Handlers.taskDefinedHandler (/app/services/github/src/handlers.js:878:20)"
Reporter
Comment 2 • 5 years ago
I took a closer look at the bug that appeared to be a duplicate, and I'm not sure the causes are the same in both cases. This specific error has been seen quite often over the last couple of weeks, so I am re-opening this one to investigate. If the other one gets fixed along the way, all the better :)
Reporter
Comment 3 • 5 years ago
Some examples of this error:
message: "HttpError: Bad credentials
at /app/node_modules/@octokit/request/dist-node/index.js:66:23
at processTicksAndRejections (internal/process/task_queues.js:97:5)
at async /app/services/github/src/github-auth.js:12:16
at async Object.<anonymous> (/app/services/github/src/api.js:293:22)"
message: "HttpError: Bad credentials
at /app/node_modules/@octokit/request/dist-node/index.js:66:23
at processTicksAndRejections (internal/process/task_queues.js:97:5)
at async /app/services/github/src/github-auth.js:12:16
at async Handlers.statusHandler (/app/services/github/src/handlers.js:528:7)"
message: "HttpError: Bad credentials
at /app/node_modules/@octokit/request/dist-node/index.js:66:23
at processTicksAndRejections (internal/process/task_queues.js:97:5)
at async /app/services/github/src/github-auth.js:12:16
at async Handlers.deprecatedStatusHandler (/app/services/github/src/handlers.js:419:5)"
message: "HttpError: Bad credentials
at /app/node_modules/@octokit/request/dist-node/index.js:66:23
at processTicksAndRejections (internal/process/task_queues.js:97:5)
at async /app/services/github/src/github-auth.js:12:16
at async Handlers.taskDefinedHandler (/app/services/github/src/handlers.js:878:20)"
message: "HttpError: Bad credentials
at /app/node_modules/@octokit/request/dist-node/index.js:66:23
at processTicksAndRejections (internal/process/task_queues.js:97:5)
at async /app/services/github/src/github-auth.js:12:16
at async Handlers.statusHandler (/app/services/github/src/handlers.js:508:7)"
All of them come from the retry function. The earliest version I can see this in is v28.0.0, which was released on March 12 and deployed to Firefox CI on April 13. Dustin brought this to my attention on April 8, mentioning he had seen it some weeks before (in the community cluster). Aki reported this on April 17 (also after seeing it several times).
The v28.0.0 release introduced a change in logging. My current theory is that we now over-report this error during the retries.
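To illustrate the theory (a hypothetical sketch, not the actual taskcluster-github code; helper names like monitor.reportError are made up here): if the error is reported inside the retry loop, every transient failure produces a report even when a later attempt succeeds, whereas reporting only after the retries are exhausted would not.

// Hypothetical sketch only -- not the service's real retry helper.
// Noisy variant: reports on every failed attempt, even if a later retry succeeds.
async function callWithRetriesNoisy(fn, monitor, attempts = 7) {
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      monitor.reportError(err);          // fires on each transient failure
      if (i === attempts - 1) throw err;
    }
  }
}

// Quieter variant: reports only once all retries are exhausted.
async function callWithRetriesQuiet(fn, monitor, attempts = 7) {
  let lastErr;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;                     // remember the failure, stay quiet for now
    }
  }
  monitor.reportError(lastErr);          // a single report for a genuine failure
  throw lastErr;
}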
Reporter
Comment 4 • 5 years ago
Dustin and Aki - you were the people who brought this error to my attention, so I'm needinfo'ing you. Did you notice whether this message on GitHub was accompanied by an actual problem (status not appearing, status not updating, tasks not running, etc.)?
Assignee
Comment 5 • 5 years ago
Note that https://sentry.prod.mozaws.net/operations/taskcluster-community/issues/7147969/?query=is:unresolved%20%22Bad%20Credentials%22 dates back to Jan 24.
I believe that in some cases a task is missing. But I'm not sure -- it's hard to tell with TC CI since there are so many tasks!
Comment 6 • 5 years ago
This happened to me yesterday. I didn't notice anything wrong, other than 2 instances of the error posted to the PR.
Edit: is it possible to add more information to the error comment? If so, that may help us debug.
Reporter
Comment 7 • 5 years ago
In my opinion, the problem here is that this error is reported at all. We have 7 retries at each API call - if one attempt fails due to some error, we retry. And you need good credentials to post the error comment on GitHub in the first place :) Which means that whatever the error was, it's gone by then. Since everything else seems to be working, the error is transient and is mitigated by the retries; it just doesn't need to be reported and confuse people.
Why it occurs in the first place is a different question, and there is a separate bug for that.
Anyway, if you would like to see more detail in the error and debug it, I am not the one to stop you :) Dustin also wanted to see more info in that error. If that, in your opinion, is the right resolution of this bug, I'm fine with it.
The log entry above is an example of the information we have access to and can put into that message. What kind of information would you want to see in that error comment - what would be interesting?
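For reference, the Octokit request error we catch carries an HTTP status and the response headers (GitHub includes an x-github-request-id header), so those could be surfaced in the posted comment. A rough sketch with a hypothetical reportErrorComment helper, not the service's actual comment-posting code:

// Hypothetical sketch: enrich the GitHub error comment with fields from the Octokit error.
async function reportErrorComment(github, { owner, repo, issueNumber }, err) {
  const requestId = (err.response && err.response.headers &&
    err.response.headers['x-github-request-id']) || 'unknown';
  const body = [
    'Taskcluster hit an error talking to the GitHub API:',
    '* message: ' + err.message,
    '* HTTP status: ' + err.status,
    '* x-github-request-id: ' + requestId,
  ].join('\n');
  await github.issues.createComment({ owner, repo, issue_number: issueNumber, body });
}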
Assignee
Comment 8 • 5 years ago
https://github.community/t5/GitHub-API-Development-and/Random-401-errors-after-using-freshly-generated-installation/td-p/22905 suggests that we're not the only ones seeing these, and that retrying is a good strategy. For things that we retry on, I think there's no need to log anything or report anything in GitHub comments.
It doesn't look like the existing retry code retries on 401s, though.
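As a sketch of what widening that could look like (a hypothetical predicate, not the service's actual retry code):

// Hypothetical retry predicate; the real logic lives in the service's retry helpers.
function shouldRetry(err) {
  const transientServerError = err.status >= 500;   // what was already treated as transient
  const spuriousUnauthorized = err.status === 401;  // occasional bogus 401 from fresh installation tokens
  return transientServerError || spuriousUnauthorized;
}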
Comment 9 • 5 years ago
As a user, I'd prefer to never see that error, and for things to Just Work. I was suggesting adding logging to the error message if you wanted to get to the root cause of what was happening when we see that error. If that's not useful, I'm perfectly happy dropping those messages.
Reporter
Comment 10 • 5 years ago
(In reply to Dustin J. Mitchell [:dustin] (he/him) from comment #8)
> https://github.community/t5/GitHub-API-Development-and/Random-401-errors-after-using-freshly-generated-installation/td-p/22905 suggests that we're not the only ones seeing these, and that retrying is a good strategy. For things that we retry on, I think there's no need to log anything or report anything in GitHub comments.
> It doesn't look like the existing retry code retries on 401s, though.

You seem to be ignoring the fact that the frequency of these errors skyrocketed in the last month. How do you explain that? That is what this bug is about.
Assignee
Comment 12 • 5 years ago
https://github.com/taskcluster/taskcluster/pull/2731
We're also seeing
https://sentry.prod.mozaws.net/operations/taskcluster-community/issues/8179151/activity/
'Expiration time' claim ('exp') must be a numeric value representing the future time at which the assertion expires
And checking in the logs, that also appears to be a 401. So this will retry such errors as well.
Assignee
Comment 13 • 5 years ago
> And checking in the logs, that also appears to be a 401. So this will retry such errors as well.
This might not be the best idea -- we'll see. It's possible these errors are occurring due to a race condition where we use credentials just as they are expiring.
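One common mitigation for that kind of race (a sketch under the assumption that the app JWT is built with the jsonwebtoken package; this is not necessarily how the service generates it) is to backdate iat and keep exp well under GitHub's 10-minute limit, so small clock drift or a slow request doesn't land an already-expired assertion:

// Hypothetical sketch of GitHub App JWT generation with clock-skew margins.
const jwt = require('jsonwebtoken');

function appJWT(appId, privateKey) {
  const now = Math.floor(Date.now() / 1000);
  return jwt.sign(
    {
      iat: now - 60,        // backdate to tolerate clock drift
      exp: now + 9 * 60,    // stay under GitHub's 10-minute maximum with some margin
      iss: appId,
    },
    privateKey,
    { algorithm: 'RS256' },
  );
}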
Comment 14 • 5 years ago
In https://github.com/escapewindow/test-xpi-public/pull/7, these errors seem to coincide with task graphs where task status isn't being reported back to the PR. (In Taskcluster the tasks are green; in the GitHub PR they remain yellow.)
Assignee
Comment 15 • 5 years ago
That's what we have been seeing, too. The next release of TC (29.1.4 or higher) will have the fix.