Investigate Bad Credentials error
Categories
(Taskcluster :: Services, defect)
Tracking
(Not tracked)
People
(Reporter: owlish, Assigned: dustin)
References
(Blocks 1 open bug)
Details
Manifests itself as a "Bad credentials" comment on GitHub.
From the logs:
message: "HttpError: Bad credentials
at /app/node_modules/@octokit/request/dist-node/index.js:66:23
at processTicksAndRejections (internal/process/task_queues.js:97:5)
at async /app/services/github/src/github-auth.js:12:16
at async Handlers.taskDefinedHandler (/app/services/github/src/handlers.js:878:20)"
Reporter
Comment 2 • 5 years ago
I took a closer look at the bug that appeared to be a duplicate, and I'm not sure the causes are the same in both cases. This specific error has been seen quite often over the last couple of weeks, so I am re-opening this one to investigate. If the other one gets fixed along the way, all the better :)
Reporter
Comment 3 • 5 years ago
Some examples of this error:
message: "HttpError: Bad credentials
at /app/node_modules/@octokit/request/dist-node/index.js:66:23
at processTicksAndRejections (internal/process/task_queues.js:97:5)
at async /app/services/github/src/github-auth.js:12:16
at async Object.<anonymous> (/app/services/github/src/api.js:293:22)"
message: "HttpError: Bad credentials
at /app/node_modules/@octokit/request/dist-node/index.js:66:23
at processTicksAndRejections (internal/process/task_queues.js:97:5)
at async /app/services/github/src/github-auth.js:12:16
at async Handlers.statusHandler (/app/services/github/src/handlers.js:528:7)"
message: "HttpError: Bad credentials
at /app/node_modules/@octokit/request/dist-node/index.js:66:23
at processTicksAndRejections (internal/process/task_queues.js:97:5)
at async /app/services/github/src/github-auth.js:12:16
at async Handlers.deprecatedStatusHandler (/app/services/github/src/handlers.js:419:5)"
message: "HttpError: Bad credentials
at /app/node_modules/@octokit/request/dist-node/index.js:66:23
at processTicksAndRejections (internal/process/task_queues.js:97:5)
at async /app/services/github/src/github-auth.js:12:16
at async Handlers.taskDefinedHandler (/app/services/github/src/handlers.js:878:20)"
message: "HttpError: Bad credentials
at /app/node_modules/@octokit/request/dist-node/index.js:66:23
at processTicksAndRejections (internal/process/task_queues.js:97:5)
at async /app/services/github/src/github-auth.js:12:16
at async Handlers.statusHandler (/app/services/github/src/handlers.js:508:7)"
All of them come from the retry function. The earliest version I can see this in is v28.0.0, which was released on March 12 and deployed to Firefox CI on April 13. Dustin brought this to my attention on April 8, mentioning he had seen it some weeks before (in the community cluster). Aki reported this on April 17 (also after seeing it several times).
The v28.0.0 release introduced a change in logging. My current theory is that we now over-report this error during the retries.
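To illustrate the theory (a hypothetical sketch, not the actual taskcluster-github code; helper names like monitor.reportError are made up here): if the error is reported inside the retry loop, every transient failure produces a report even when a later attempt succeeds, whereas reporting only after the retries are exhausted would not.

// Hypothetical sketch only -- not the service's real retry helper.
// Noisy variant: reports on every failed attempt, even if a later retry succeeds.
async function callWithRetriesNoisy(fn, monitor, attempts = 7) {
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      monitor.reportError(err);          // fires on each transient failure
      if (i === attempts - 1) throw err;
    }
  }
}

// Quieter variant: reports only once all retries are exhausted.
async function callWithRetriesQuiet(fn, monitor, attempts = 7) {
  let lastErr;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;                     // remember the failure, stay quiet for now
    }
  }
  monitor.reportError(lastErr);          // a single report for a genuine failure
  throw lastErr;
}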
Reporter
Comment 4 • 5 years ago
Dustin and Aki - you were the people who brought this error to my attention, so I'm needinfo'ing you. Did you notice whether this message on GitHub was accompanied by an actual problem (status not appearing, status not updating, tasks not running, etc.)?
Assignee
Comment 5 • 5 years ago
Note that https://sentry.prod.mozaws.net/operations/taskcluster-community/issues/7147969/?query=is:unresolved%20%22Bad%20Credentials%22 dates back to Jan 24.
I believe that in some cases a task is missing. But I'm not sure -- it's hard to tell with TC CI since there are so many tasks!
Comment 6 • 5 years ago
This happened to me yesterday. I didn't notice anything wrong, other than 2 instances of the error posted to the PR.
Edit: is it possible to add more information to the error comment? If so, that may help us debug.
Reporter
Comment 7 • 5 years ago
In my opinion, the problem here is that this error is reported at all. We have 7 retries at each API call - if one attempt fails due to some error, we retry. And you need good credentials to post the error comment on GitHub in the first place :) Which means that whatever the error was, it's gone by then. Since everything else seems to be working, the error is transient and is mitigated by the retries; it just doesn't need to be reported and confuse people.
Why it occurs in the first place is a different question, and there is a separate bug for that.
Anyway, if you would like to see more detail in the error and debug it, I am not the one to stop you :) Dustin also wanted to see more info in that error. If that, in your opinion, is the right resolution of this bug, I'm fine with it.
The log entry above is an example of the information we have access to and can put into that message. What kind of information would you want to see in that error comment - what would be interesting?
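For reference, the Octokit request error we catch carries an HTTP status and the response headers (GitHub includes an x-github-request-id header), so those could be surfaced in the posted comment. A rough sketch with a hypothetical reportErrorComment helper, not the service's actual comment-posting code:

// Hypothetical sketch: enrich the GitHub error comment with fields from the Octokit error.
async function reportErrorComment(github, { owner, repo, issueNumber }, err) {
  const requestId = (err.response && err.response.headers &&
    err.response.headers['x-github-request-id']) || 'unknown';
  const body = [
    'Taskcluster hit an error talking to the GitHub API:',
    '* message: ' + err.message,
    '* HTTP status: ' + err.status,
    '* x-github-request-id: ' + requestId,
  ].join('\n');
  await github.issues.createComment({ owner, repo, issue_number: issueNumber, body });
}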
Assignee
Comment 8 • 5 years ago
https://github.community/t5/GitHub-API-Development-and/Random-401-errors-after-using-freshly-generated-installation/td-p/22905 suggests that we're not the only ones seeing these, and that retrying is a good strategy. For things that we retry on, I think there's no need to log anything or report anything in GitHub comments.
It doesn't look like the existing retry code retries on 401s, though.
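As a sketch of what widening that could look like (a hypothetical predicate, not the service's actual retry code):

// Hypothetical retry predicate; the real logic lives in the service's retry helpers.
function shouldRetry(err) {
  const transientServerError = err.status >= 500;   // what was already treated as transient
  const spuriousUnauthorized = err.status === 401;  // occasional bogus 401 from fresh installation tokens
  return transientServerError || spuriousUnauthorized;
}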
Comment 9 • 5 years ago
As a user, I'd prefer to never see that error, and for things to Just Work. I was suggesting adding logging to the error message if you wanted to get to the root cause of what was happening when we see that error. If that's not useful, I'm perfectly happy dropping those messages.
Reporter
Comment 10 • 5 years ago
(In reply to Dustin J. Mitchell [:dustin] (he/him) from comment #8)
> https://github.community/t5/GitHub-API-Development-and/Random-401-errors-after-using-freshly-generated-installation/td-p/22905 suggests that we're not the only ones seeing these, and that retrying is a good strategy. For things that we retry on, I think there's no need to log anything or report anything in GitHub comments.
> It doesn't look like the existing retry code retries on 401s, though.

You seem to be ignoring the fact that the frequency of these errors skyrocketed in the last month. How do you explain that? That is what this bug is about.
Assignee
Comment 12 • 5 years ago
https://github.com/taskcluster/taskcluster/pull/2731
We're also seeing
https://sentry.prod.mozaws.net/operations/taskcluster-community/issues/8179151/activity/
'Expiration time' claim ('exp') must be a numeric value representing the future time at which the assertion expires
And checking in the logs, that also appears to be a 401. So this will retry such errors as well.
Assignee
Comment 13 • 5 years ago
> And checking in the logs, that also appears to be a 401. So this will retry such errors as well.
This might not be the best idea -- we'll see. It's possible these errors are occurring due to a race condition where we use credentials just as they are expiring.
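One common mitigation for that kind of race (a sketch under the assumption that the app JWT is built with the jsonwebtoken package; this is not necessarily how the service generates it) is to backdate iat and keep exp well under GitHub's 10-minute limit, so small clock drift or a slow request doesn't land an already-expired assertion:

// Hypothetical sketch of GitHub App JWT generation with clock-skew margins.
const jwt = require('jsonwebtoken');

function appJWT(appId, privateKey) {
  const now = Math.floor(Date.now() / 1000);
  return jwt.sign(
    {
      iat: now - 60,        // backdate to tolerate clock drift
      exp: now + 9 * 60,    // stay under GitHub's 10-minute maximum with some margin
      iss: appId,
    },
    privateKey,
    { algorithm: 'RS256' },
  );
}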
Comment 14 • 5 years ago
In https://github.com/escapewindow/test-xpi-public/pull/7, these errors seem to coincide with task graphs where task status isn't being reported back to the PR. (In Taskcluster the tasks are green; in the GitHub PR they remain yellow.)
Assignee
Comment 15 • 5 years ago
That's what we have been seeing, too. The next release of TC (29.1.4 or higher) will have the fix.