Closed Bug 1395357 Opened 7 years ago Closed 7 years ago

Use auth0 for mozilla-releng.net login, and get TC credentials from there

Categories

(Taskcluster :: Services, enhancement)

Type: enhancement
Priority: Not set
Severity: blocker

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dustin, Assigned: bastien.abadie)

This is the culmination of work to stop using Okta.  The idea is to use https://docs.taskcluster.net/reference/integrations/taskcluster-login/docs/getting-user-creds instead of the current federated-login approach.
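For anyone picking this up, here is a rough sketch of the exchange described in that doc: the frontend gets an access token from Auth0, then trades it for temporary Taskcluster credentials. The endpoint URL and the response field names below are assumptions written from memory, so check them against the linked page before relying on them. The helper names are hypothetical.

```python
import json
import urllib.request

# ASSUMPTION: endpoint path recalled from the taskcluster-login docs; verify it.
LOGIN_CREDENTIALS_URL = "https://login.taskcluster.net/v1/oidc-credentials/mozilla-auth0"


def build_credentials_request(access_token, url=LOGIN_CREDENTIALS_URL):
    """Build (without sending) the request that presents the Auth0 access token."""
    return urllib.request.Request(
        url, headers={"Authorization": "Bearer " + access_token}
    )


def parse_credentials(body):
    """Pull the temporary credentials out of the JSON response (assumed shape)."""
    data = json.loads(body)
    creds = data["credentials"]
    return {
        "clientId": creds["clientId"],
        "accessToken": creds["accessToken"],
        # Temporary credentials normally carry a certificate; tolerate its absence.
        "certificate": creds.get("certificate"),
        "expires": data.get("expires"),
    }


def fetch_credentials(access_token):
    """Perform the network call and return the parsed credentials."""
    request = build_credentials_request(access_token)
    with urllib.request.urlopen(request) as response:
        return parse_credentials(response.read().decode("utf-8"))
```

The credentials expire, so the frontend would need to repeat this call with a still-valid access token rather than caching the result indefinitely.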

Okta, and with it the current federated login support, is going away in October, so there's a time limit here.

I'll work with Rok to see how we can do this.
Rok, I just remembered you're away for September.

And then I noticed that the login process is in Elm
  https://github.com/mozilla-releng/services/blob/master/lib/frontend_common/TaskclusterLogin.elm

Is there someone else who knows enough Elm to accomplish this while you're away?
Flags: needinfo?(rgarbas)
We talked this morning:

 * This is probably a day's work, and can happen when he returns
 * bug 1395574 may provide a stopgap (the 15-minute logout would be annoying, but not a showstopper)

Note: if someone does pick this up before Rok is back, you'll want to start by requesting a client:

https://mozilla.service-now.com/sp?id=sc_cat_item&sys_id=1e9746c20f76aa0087591d2be1050ecb

The technology you want is OIDC.  The callback URL should be different from the callback URL currently used for Taskcluster.  I think you'll only need to allow LDAP logins (but it's up to you -- passwordless would let anyone log in, although they will have few TC scopes).
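To illustrate what the new client is used for, here is a minimal sketch of building a standard OIDC authorization request with its own callback URL. The domain, client ID and callback URL are placeholders, and the response type is an assumption: a browser-only frontend may well use a different flow. Which login methods are allowed (LDAP only or not) is decided when the client is set up, not in this URL.

```python
import secrets
from urllib.parse import urlencode


def build_authorize_url(domain, client_id, redirect_uri, state=None):
    """Return the URL the browser is sent to in order to start an OIDC login."""
    params = {
        # ASSUMPTION: authorization-code flow; adjust for the flow actually used.
        "response_type": "code",
        "client_id": client_id,
        # Must match a callback URL registered for this client.
        "redirect_uri": redirect_uri,
        "scope": "openid profile email",
        # Random value echoed back on the callback, used to reject forged responses.
        "state": state or secrets.token_urlsafe(16),
    }
    return "https://{}/authorize?{}".format(domain, urlencode(params))


# Placeholder values, not the real client settings.
url = build_authorize_url(
    "auth.example.com",
    "hypothetical-client-id",
    "https://frontend.example.net/login",
)
```

The callback handler should compare the returned `state` with the one it generated before accepting the response.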
Assignee: dustin → nobody
Flags: needinfo?(rgarbas)
Blocks: 1380028
We just found out that Okta is being disabled on the 1st of October. Jordan, could you please find someone who might be able to make these changes? I know some other folks in releng were learning Elm, as well as people in relman and on bwong's team.
I worked on the previous Taskcluster auth for mozilla-releng/services (in Elm & Python). I'll be working on this issue in the next few days and make a PR.
Assignee: nobody → babadie
Status: NEW → ASSIGNED
Thank you Bastien!  Let me know if you have questions.
(In reply to Bastien Abadie from comment #4)
> I worked on the previous Taskcluster auth for mozilla-releng/services (in
> Elm & Python). I'll be working on this issue in the next few days and make a
> PR.

Thanks Bastien! Let me know if you don't think you'll be able to make the Oct 1st deadline or if you can't find someone to review the PR. Rok is not back until Oct 1st.
The deadline for getting this into production is Friday. Will the work be complete by then, Bastien, or do you need help?
Flags: needinfo?(babadie)
I'm working on this issue, you can view progress directly on the Github pull request [1].

I should have it working on shipit_frontend by Wednesday, but i'm not sure about the integration in other projects (releng_frontend & shipit_signoff).

Another problem is the release to production itself: i don't think i have all necessary access to do that



[1]: https://github.com/mozilla-releng/services/pull/626
Flags: needinfo?(babadie)
(In reply to Bastien Abadie from comment #8)
> I'm working on this issue, you can view progress directly on the Github pull
> request [1].
> 
> I should have it working on shipit_frontend by Wednesday, but i'm not sure
> about the integration in other projects (releng_frontend & shipit_signoff).
> 

shipit_signoff I think we can forget about for now. It shouldn't be consumed at all.

releng_frontend we definitely will want to fix as I believe we use that for treestatus frontend. Probably tooltool frontend too but, iiuc, that's not complete yet anyway.

Bastien, if you could look into a hack for releng_frontend before Oct 1st, that would be greatly appreciated. If not, please let us know so we can prepare accordingly.


> Another problem is the release to production itself: i don't think i have
> all necessary access to do that
> 

This is a bigger issue. How can we determine if you can do this or not before Rok gets back? If this is just a matter of Heroku bits being flipped I should be able to help. Do we have anyone on point who can do Services deployments to production while Rok is away? I thought we had docs and rollback plans in place for at least tooltool and treestatus?
As a stop-gap, we have some work in place to support the existing redirect-to-login flow once Okta dies (it turns out Okta dies on Friday; somehow "late October" has by now become "September 29" in the IAM team's estimation).  I don't think that's going to be a great user experience, and it's not forever, so I appreciate the continued diligence to get this updated -- but if you miss by a few days, users will (I hope!) still be able to log in.
I got authentication working again through Auth0+Taskcluster on shipit_frontend; I'm now working on releng_frontend.
My pull request [1] is now valid for both shipit_frontend & releng_frontend and could be tested on staging.

I just requested 2 OIDC credentials on ServiceNow (REQ0051627 & RITM0056825): we need them before pushing to staging.

Jordan, I cannot update the Taskcluster secret [2] with the new values. Do you have write access on repo:github.com/mozilla-releng/services:branch:staging?
Once this secret is updated, I can push to staging. I'll also need someone from RelEng to test the releng_frontend on staging, as I do not have the necessary credentials for that either.


[1]: https://github.com/mozilla-releng/services/pull/626
[2]: https://tools.taskcluster.net/secrets/repo%3Agithub.com%2Fmozilla-releng%2Fservices%3Abranch%3Astaging
Blocks: 1404461
(In reply to Bastien Abadie from comment #12)
> My pull request [1] is now valid for both shipit_frontend & releng_frontend
> and could be tested on staging.
> 

Bastien, thank you for your work on this. Given comment 10, it sounds like we are okay to wait until Monday. From a risk assessment, it would be better to wait.

It sounds like you have done all the heavy lifting and this should make things dramatically easier for Rok to apply and deploy.
Thanks Bastien!
We are still waiting on an Auth0 credential request to enable the code on staging, then production.

Any idea who I can ping to speed up the ServiceNow process?
Flags: needinfo?(jlund)
Flags: needinfo?(dustin)
(answered in irc)
Flags: needinfo?(jlund)
Flags: needinfo?(dustin)
Auth0 PR[1] was merged to master and deployed to staging for both mozilla-releng/services frontend projects:
 - https://staging.mozilla-releng.net
 - https://signoff.staging.mozilla-releng.net/

We will deploy to production tomorrow.


[1] https://github.com/mozilla-releng/services/pull/626
Works for me!

I might suggest you remove the "manage credentials" link to tools.taskcluster.net, though -- if the user is logged into the releng-services, but not tools, that will be pretty confusing.

The manual has an example that shows a user's scopes:
  https://docs.taskcluster.net/manual/using/integration/frontend
so that might be a helpful way to replace the "manage credentials" functionality, if necessary.
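As a rough illustration of what such a scope display could be built on, here is a sketch of checking a user's scopes against what a page needs, using the Taskcluster convention that a scope ending in `*` covers every scope with that prefix. The function names and the example scope strings are hypothetical, not taken from the manual or from the services repo.

```python
def scope_satisfied(required, held):
    """Return True if any held scope grants the required scope."""
    for scope in held:
        if scope == required:
            return True
        # A trailing "*" is a prefix wildcard: "a:b:*" grants "a:b:anything".
        if scope.endswith("*") and required.startswith(scope[:-1]):
            return True
    return False


def missing_scopes(required_scopes, held):
    """List the required scopes the user does not have, in the order given."""
    return [s for s in required_scopes if not scope_satisfied(s, held)]


# Hypothetical scope names, for illustration only.
user_scopes = ["project:example:treestatus/*", "queue:get-artifact:public/*"]
needed = ["project:example:treestatus/update", "project:example:shipit/release"]
print(missing_scopes(needed, user_scopes))
```

Showing the missing scopes next to the login name would tell a user why an action is unavailable without sending them to another site.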
We've been having problems lately with updating Treestatus. I'm wondering if this change is the cause?

Basically what's happening is that Treestatus doesn't show the UI for updating tree statuses, as if I'm not signed in at all. But I see up in the corner of the page "mozilla-ldap/wkocher@mozilla.com", and my Taskcluster scopes still include all of the treestatus scopes.

Maybe there's some racy issue here, because it shows the UI correctly the first time I load Treestatus after signing in, but if I refresh the page, the UI is gone?
Flags: needinfo?(dustin)
Logging into Treeherder or https://mozilla-releng.net/ doesn't work at all for either Bastien or me, so trees can't be reopened at the moment.
Rok, please take a look.
Severity: normal → blocker
Flags: needinfo?(rgarbas)
There seemed to be other auth0 issues going on hours ago that affected logins.  I was able to log into this service now.  Possibly it's been corrected?
Yes, the issue was on the Taskcluster side and has been resolved by pmoore.
Auth0 Login is now working again on shipit & treestatus
:bastien I'm closing this issue now since it looks like it has been resolved (please reopen if I'm wrong)
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Flags: needinfo?(rgarbas)
Resolution: --- → FIXED
Flags: needinfo?(dustin)
Component: Login → Services