Closed Bug 1316641 Opened 9 years ago Closed 8 years ago

Create runbook for auth0

Categories

(Infrastructure & Operations :: MOC: Documentation, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jeff, Assigned: Usul)

Details

No description provided.
(In reply to Jeff Bryner [:jeff] (use NEEDINFO) from comment #0)
> Please work with Tristan Weir [:weir] to create/update the runbook for auth0 pre go live for air.mo.
Air.Mo is going live with Auth0 on November 14, so I am at your disposal to make this happen.
Component: MOC: Service Requests → MOC: Documentation
QA Contact: lypulong
Assignee: nobody → ludovic
(In reply to Tristan Weir [:weir] (use NEEDINFO) from comment #2)
> Air.Mo is going live with Auth0 on November 14, so I am at your disposal to
> make this happen.

Please fill in the linked Mana page with the information we'll need to know about. Explanations are at the beginning of the page. Once it's ready, I'll review it and try to give you feedback.
Flags: needinfo?(tweir)
Updated Mana page. I copied the format of the Duo runbook. Please let me know what other information you need.
Flags: needinfo?(tweir) → needinfo?(ludovic)
(In reply to Tristan Weir [:weir] (use NEEDINFO) from comment #4)
> Updated Mana page. I copied the format of the Duo runbook. Please let me
> know what other information you need.

Auth0 sends out notifications on incidents/outages to which moc@ is subscribed via email.
Can we set that up to go to PagerDuty? If they provide a PagerDuty integration, that would be very cool. If they don't, I'll give you an email address to subscribe to incidents.

So the only thing we need to do is open a ticket and let you guys know?

"If the incident affects a lot of users,"
How do we qualify that?
Flags: needinfo?(ludovic)
(In reply to Ludovic Hirlimann [:Usul] from comment #5)
> Auth0 sends out notifications on incidents/outages to which moc@ is
> subscribed via email. can we set that up to be pagerduty ?

Details getting sent to pagerduty and moc+notifications@mozilla.com would be ideal. Please make sure notifications do not go to plain moc@.
(In reply to Ludovic Hirlimann [:Usul] from comment #5)
> (In reply to Tristan Weir [:weir] (use NEEDINFO) from comment #4)
>
> "If the incident affects a lot of users,"
>
> How do we qualify that ?

Great question. This is what the Duo runbook had. Is there a current threshold for # of impacted users before an all-moco@ e-mail goes out?

> Details getting sent to pagerduty and moc+notifications@mozilla.com would be ideal. Please make sure
> notifications do not go to plain moc@.

Acknowledged.
(In reply to Tristan Weir [:weir] (use NEEDINFO) from comment #7)
> (In reply to Ludovic Hirlimann [:Usul] from comment #5)
> > (In reply to Tristan Weir [:weir] (use NEEDINFO) from comment #4)
> >
> > "If the incident affects a lot of users,"
> >
> > How do we qualify that ?
>
> Great question. This is what the Duo runbook had. Is there a current
> threshold for # of impacted users before an all-moco@ e-mail goes out?

Nope, so we are good with that.

> > Details getting sent to pagerduty and moc+notifications@mozilla.com would be ideal. Please make sure
> > notifications do not go to plain moc@.
>

Can you subscribe mozilla-auth0@mozilla.pagerduty.com too, so we get notifications in PD?

> Acknowledged.
So we need to open tickets. Shall we create individual accounts or a generic MOC account for that? Do we need to do it, or do you *need* to do it?
(In reply to Ludovic Hirlimann [:Usul] from comment #9)
> So we need to open tickets , shall we create individual acounts or a MOc
> generic account for that ? Do we need to do it or do you *need* to do it ?

I'm working with our Auth0 rep to identify the best way to do this. I will update the runbook when I have a better answer. At the moment, it looks like an Auth0 account holder needs to do this. I will add the names of those Mozilla account holders to the runbook.
I see that the Mana page says things are being sent to PagerDuty. Is that using the email from comment 9 or something prior?
(In reply to Ludovic Hirlimann [:Usul] from comment #11)
> I see that tha mana page says stuff are being sent to pagerduty is that
> using the email from comment 9 or soemthing prior ?

The one from comment 9. I've made it explicit in the Mana page and added a link to the Auth0 subscription page.
Where should we escalate to ?
Flags: needinfo?(tweir)
I've updated the runbook escalation chart after discussion with the team. Please let me know if it makes sense now.
Flags: needinfo?(tweir)
Can't we use an escalation policy in pagerduty ?
A few updates and requests:

1. We have a method for adding MOC personnel to delegated admin accounts so they can submit support tickets to Auth0. We need a list of appropriate MOC personnel and we will add them.

2. We had an Auth0 outage today that was not caught by their own monitoring. They are investigating why their service failed and their monitoring failed, but in the interim MOC should also establish monitoring. We originally thought we could rely on Auth0, but obviously we cannot.
Please enable http connectivity checks (Pingdom or other preferred solution) on:
auth.mozilla.auth0.com
manage.mozilla.auth0.com

(In reply to Ludovic Hirlimann [:Usul] from comment #15)
> Can't we use an escalation policy in pagerduty ?

3. Can we walk through this? I don't understand what you are looking for or what an escalation policy should look like.
Flags: needinfo?(ludovic)
After chatting with :unixfairy - would it be possible to add a Pingdom check (possibly ASAP, before the runbook is completed) for connectivity failure to:

auth-dev.mozilla.auth0.com (TCP 443/HTTPS)
auth.mozilla.auth0.com (TCP 443/HTTPS)

Thanks!
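For anyone who wants to reproduce the requested connectivity check outside Pingdom, here is a minimal Python sketch. It only tests that a TCP connection to port 443 can be opened; it is not Pingdom's implementation, and the function names `tcp_check` and `run_checks` are made up for illustration. The hostnames are the two listed in the comment above.

```python
import socket
from typing import Callable, Dict, Iterable

# Hostnames from the comment above; port 443 as requested there.
HOSTS = ["auth-dev.mozilla.auth0.com", "auth.mozilla.auth0.com"]


def tcp_check(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


def run_checks(
    hosts: Iterable[str],
    checker: Callable[[str], bool] = tcp_check,
) -> Dict[str, bool]:
    """Run `checker` against every host and return a host -> up/down map."""
    return {host: checker(host) for host in hosts}
```

Calling `run_checks(HOSTS)` returns a dictionary such as `{"auth.mozilla.auth0.com": True, ...}`. A TCP-level check like this only shows that the port answers, not that the login page renders.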
Follow up to comment 17: Here's a better check also used by the Mozilla Participation team (on their Pingdom):

https://auth.mozilla.auth0.com/login?client=7wyIItkJX4t7vYEaDmGrwP9k2fBh5qWP
add the check "Check for string: should contain "auth0-lock-passwordless-button-text"

https://auth-dev.mozilla.auth0.com/login?client=7PQFR1tyqr6TIqdHcgbRcYcbmbgYflVE
add the check "Check for string: should contain "auth0-lock-passwordless-button-text"

This ensures the HTML is actually displayed. These client IDs are testrp.allizom.security.org client IDs and are normally to be kept forever.
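The "should contain" check described above can be sketched in Python roughly as follows. This is an illustration and not how Pingdom does it: `fetch_body` and `content_check` are hypothetical names, and the only value taken from the comment is the expected string. Note that `urllib.request.urlopen` follows redirects by default.

```python
from typing import Callable
from urllib.request import urlopen

# Marker string quoted in the comment above.
EXPECTED = "auth0-lock-passwordless-button-text"


def fetch_body(url: str, timeout: float = 10.0) -> str:
    """Fetch a URL and return the response body as text (follows redirects)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def content_check(
    url: str,
    needle: str = EXPECTED,
    fetch: Callable[[str], str] = fetch_body,
) -> bool:
    """Return True only if the page was fetched and contains `needle`."""
    try:
        return needle in fetch(url)
    except OSError:
        # urllib's URLError and HTTPError are OSError subclasses.
        return False
```

Passing the fetcher in as a parameter keeps the string comparison testable without network access. An HTTP error page, or a page that no longer contains the marker, makes the check fail.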
(In reply to Tristan Weir [:weir] (use NEEDINFO) from comment #16)
> A few updates and requests:
> 1. We have a method for adding MOC personnel to delegated admin accounts so
> they can submit support tickets to Auth0. We need a list of appropriate MOC
> personnel and we will add them.

This is currently :sal :vinh :ashlee :ryanc :fauweh :pir :usul :daveio

> 2. We had an Auth0 outage today that was not caught by their own monitoring.
> They are investigating why their service failed and their monitoring failed,
> but in the interim MOC should also establish monitoring. We originally
> thought we could rely on Auth0, but obviously we cannot.
> Please enable http connectivity checks (Pingdom or other preferred solution)
> on:
> auth.mozilla.auth0.com
> manage.mozilla.auth0.com

Spawned bug 1323393 for this.

> (In reply to Ludovic Hirlimann [:Usul] from comment #15)
> > Can't we use an escalation policy in pagerduty ?
> 3. Can we walk through this? I don't understand what you are looking for or
> what an escalation policy should look like.

Of course I can. Accepted the invite for the meeting.
Flags: needinfo?(ludovic)
Added a note on our onboarding process to request Auth0 access.
I guess subscribing mozilla-auth0@mozilla.pagerduty.com will open PD incidents for *anything* prod related they send out. This will be tricky unless we know what's safe to ignore?

[#42638] [Auth0 status] Investigating : Auth0 MFA (Guardian) is unresponsive
[#42639] [Auth0 status] Identified : Auth0 MFA (Guardian) is unresponsive
[#42640] [Auth0 status] Resolved : Auth0 MFA (Guardian) is unresponsive
I suspect we might document that in the runbook (?). For example, we're not using Guardian, so any messages regarding Auth0 MFA (Guardian) are safe to ignore right now (there's no current plan to use this, though we'd have to update the runbook if we change plans).
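As an illustration of the "safe to ignore" rule discussed here, the following Python sketch classifies status-notification subjects. The subject format is inferred only from the three sample subjects quoted in this bug. The regular expression, the `IGNORED_KEYWORDS` list and the function names are assumptions made for this example, not an Auth0 or PagerDuty feature.

```python
import re
from typing import Iterable, Optional, Tuple

# Pattern inferred from the sample subjects, e.g.
# "[#42638] [Auth0 status] Investigating : Auth0 MFA (Guardian) is unresponsive"
SUBJECT_RE = re.compile(r"^\[#(\d+)\]\s*\[Auth0 status\]\s*(\w+)\s*:\s*(.+)$")

# Components not in use, per the discussion; would need updating if plans change.
IGNORED_KEYWORDS = ("Guardian",)


def parse_subject(subject: str) -> Optional[Tuple[int, str, str]]:
    """Split a subject into (notification id, state, summary), or None."""
    match = SUBJECT_RE.match(subject.strip())
    if match is None:
        return None
    return int(match.group(1)), match.group(2), match.group(3)


def safe_to_ignore(
    subject: str, ignored: Iterable[str] = IGNORED_KEYWORDS
) -> bool:
    """True if the summary mentions a component listed as unused."""
    parsed = parse_subject(subject)
    if parsed is None:
        # Unknown format: err on the side of paging someone.
        return False
    summary = parsed[2].lower()
    return any(keyword.lower() in summary for keyword in ignored)
```

Subjects that do not match the expected pattern are deliberately treated as not ignorable, so a format change on the sender's side fails toward alerting rather than silence.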
(In reply to Guillaume Destuynder [:kang] (NEEDINFO to ensure replies) from comment #22)
> I suspect we might document that in the runbook (?)
> For ex we're not using Guardian so any message regarding Auth0 MFA
> (Guardian) are safe to ignore right now (there's no current plan to use
> this, though we'd have to update the runbook if we change plans)

I looked to see if outage notifications for Guardian can be turned off, but it does not appear they can. However, I've added some text to the runbook to say ignore it.

Runbook is ready for final review.
(In reply to Tristan Weir [:weir] (use NEEDINFO) from comment #23)
> (In reply to Guillaume Destuynder [:kang] (NEEDINFO to ensure replies) from
> comment #22)
> > I suspect we might document that in the runbook (?)
> > For ex we're not using Guardian so any message regarding Auth0 MFA
> > (Guardian) are safe to ignore right now (there's no current plan to use
> > this, though we'd have to update the runbook if we change plans)
>
> I looked to see if notification outages for Guardian can be turned off, but
> it does not appear they can. However, I've added some text to the runbook to
> say ignore it.
>
> Runbook is ready for final review.

Made a small change. Asking for broader feedback.
**** I forgot one of my teammates. Can you add :jlaz to the admin group? Also, we are discussing sending emails to all@ with regard to status.mozilla.com/org.
And we are good to go document wise.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
(In reply to Guillaume Destuynder [:kang] (NEEDINFO to ensure replies) from comment #18)
> Follow up to comment 17:
> Here's a better check also used by the Mozilla Participation team (on their
> Pingdom):
>
> https://auth.mozilla.auth0.com/login?client=7wyIItkJX4t7vYEaDmGrwP9k2fBh5qWP
> add the check "Check for string: should contain
> "auth0-lock-passwordless-button-text"
>
> https://auth-dev.mozilla.auth0.com/login?client=7PQFR1tyqr6TIqdHcgbRcYcbmbgYflVE
> add the check "Check for string: should contain
> "auth0-lock-passwordless-button-text"
>
> This ensure the HTML is actually displayed. These client id's are
> testrp.allizom.security.org client ids and are normally to be kept forever.

kang, this validation is broken with the most recent auth0 changes. Do you have any updates on how we can fix our monitoring here? When visiting the production URL above for example, we see:

"Oops!, something went wrong
There could be a misconfiguration in the system or a service outage. We track these errors automatically, but if the problem persists feel free to contact us.
Please try again."
Status: RESOLVED → REOPENED
Flags: needinfo?(gdestuynder)
QA Contact: mcristofi
Resolution: FIXED → ---
I think this is because you follow the 302 redirect. If you do not follow it, this would still work, except the check string should be something that currently exists in the page (such as "This is the custom Mozilla Login Experience, designed and built by Mozilla's IAM Project." or anything from https://raw.githubusercontent.com/mozilla-iam/auth0-deploy/master/pages/login.html).

Also, here's a URL where the redirect won't end up in an error, if you can't "not" follow the 302 or if that's easier:
https://auth.mozilla.auth0.com/login?client=7wyIItkJX4t7vYEaDmGrwP9k2fBh5qWP&protocol=oauth2&response_type=code

Basically, this adds the 2 required parameters.
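To make the "2 required parameters" concrete, here is a small Python sketch that appends them to an existing login URL. The parameter names and values (`protocol=oauth2`, `response_type=code`) come from the URL in this comment. The helper name `with_required_params` is invented for this example.

```python
from typing import Mapping
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# The two parameters named in the comment above.
REQUIRED_PARAMS = {"protocol": "oauth2", "response_type": "code"}


def with_required_params(
    url: str, required: Mapping[str, str] = REQUIRED_PARAMS
) -> str:
    """Return `url` with any missing required query parameters appended."""
    parts = urlsplit(url)
    # dict keeps insertion order, so existing parameters stay first.
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    for key, value in required.items():
        query.setdefault(key, value)  # do not overwrite existing values
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Applied to the production check URL from comment 18, this yields the URL given in this comment. Running it a second time leaves the URL unchanged.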
Flags: needinfo?(gdestuynder) → needinfo?(kferrando)
Thanks for the GET parameters kang, that got us working again. For the simple ping tests we are running on these in New Relic, we are not able to prevent following redirects.
Status: REOPENED → RESOLVED
Closed: 9 years ago → 8 years ago
Flags: needinfo?(kferrando)
Resolution: --- → FIXED
See Also: → 1324139