Status

RESOLVED FIXED
2 years ago
5 months ago

People

(Reporter: jeff, Assigned: Usul)


(In reply to Jeff Bryner [:jeff]  (use NEEDINFO) from comment #0)
> 

Please work with Tristan Weir [:weir] to create or update the Auth0 runbook before air.mo goes live.
Air.Mo is going live with Auth0 on November 14, so I am at your disposal to make this happen.
Component: MOC: Service Requests → MOC: Documentation
QA Contact: lypulong
(Assignee)

Updated

2 years ago
Assignee: nobody → ludovic
(Assignee)

Comment 3

2 years ago
(In reply to Tristan Weir [:weir] (use NEEDINFO) from comment #2)
> Air.Mo is going live with Auth0 on November 14, so I am at your disposal to
> make this happen.

Please fill in the linked Mana page with the information we'll need to know about. Explanations are at the beginning of the page. Once it is ready, I'll review it and try to give you feedback.
Flags: needinfo?(tweir)
(Assignee)

Updated

2 years ago
Blocks: 1313687
Updated Mana page. I copied the format of the Duo runbook. Please let me know what other information you need.
Flags: needinfo?(tweir) → needinfo?(ludovic)
(Assignee)

Comment 5

2 years ago
(In reply to Tristan Weir [:weir] (use NEEDINFO) from comment #4)
> Updated Mana page. I copied the format of the Duo runbook. Please let me
> know what other information you need.

Auth0 sends out notifications on incidents/outages, to which moc@ is subscribed via email. Can we set that up to go to PagerDuty? If they provide PD integration, it would be very cool. If they don't, I'll give you an email address to subscribe to incidents.

So the only thing we need to do is open a ticket and let you guys know?

"If the incident affects a lot of users,"

How do we qualify that?
Flags: needinfo?(ludovic)
(In reply to Ludovic Hirlimann [:Usul] from comment #5)
> Auth0 sends out notifications on incidents/outages to which moc@ is
> subscribed via email. can we set that up to be pagerduty ?

Details getting sent to pagerduty and moc+notifications@mozilla.com would be ideal. Please make sure notifications do not go to plain moc@.
(In reply to Ludovic Hirlimann [:Usul] from comment #5)
> (In reply to Tristan Weir [:weir] (use NEEDINFO) from comment #4)
> 
> "If the incident affects a lot of users,"
> 
> How do we qualify that ?

Great question. This is what the Duo runbook had. Is there a current threshold for # of impacted users before an all-moco@ e-mail goes out?

> Details getting sent to pagerduty and moc+notifications@mozilla.com would be ideal. Please make sure 
> notifications do not go to plain moc@.

Acknowledged.
(Assignee)

Comment 8

2 years ago
(In reply to Tristan Weir [:weir] (use NEEDINFO) from comment #7)
> (In reply to Ludovic Hirlimann [:Usul] from comment #5)
> > (In reply to Tristan Weir [:weir] (use NEEDINFO) from comment #4)
> > 
> > "If the incident affects a lot of users,"
> > 
> > How do we qualify that ?
> 
> Great question. This is what the Duo runbook had. Is there a current
> threshold for # of impacted users before an all-moco@ e-mail goes out?

Nope, so we are good with that.
 
> > Details getting sent to pagerduty and moc+notifications@mozilla.com would be ideal. Please make sure 
> > notifications do not go to plain moc@.
> 

Can you subscribe mozilla-auth0@mozilla.pagerduty.com too, so we get notifications in PD?
> Acknowledged.
(Assignee)

Comment 9

2 years ago
So we need to open tickets. Shall we create individual accounts or a generic MOC account for that? Do we need to do it, or do you *need* to do it?
(In reply to Ludovic Hirlimann [:Usul] from comment #9)
> So we need to open tickets , shall we create individual accounts or a MOC
> generic account for that ? Do we need to do it or do you *need* to do it ?

I'm working with our Auth0 rep to identify the best way to do this. I will update the runbook when I have a better answer. At the moment, it looks like an Auth0 account holder needs to do this. I will add the names of those Mozilla account holders to the runbook.
(Assignee)

Comment 11

2 years ago
I see that the Mana page says notifications are being sent to PagerDuty. Is that using the email from comment 9 or something prior?
(In reply to Ludovic Hirlimann [:Usul] from comment #11)
> I see that the mana page says stuff are being sent to pagerduty is that
> using the email from comment 9 or something prior ?

The one from comment 9. I've made it explicit in the Mana page and added a link to the Auth0 subscription page.
(Assignee)

Comment 13

2 years ago
Where should we escalate to?
(Assignee)

Updated

2 years ago
Flags: needinfo?(tweir)
I've updated the runbook escalation chart after discussion with the team. Please let me know if it makes sense now.
Flags: needinfo?(tweir)
(Assignee)

Comment 15

2 years ago
Can't we use an escalation policy in PagerDuty?
A few updates and requests:
1. We have a method for adding MOC personnel to delegated admin accounts so they can submit support tickets to Auth0. We need a list of appropriate MOC personnel and we will add them.

2. We had an Auth0 outage today that was not caught by their own monitoring. They are investigating why their service failed and their monitoring failed, but in the interim MOC should also establish monitoring. We originally thought we could rely on Auth0, but obviously we cannot.
Please enable http connectivity checks (Pingdom or other preferred solution) on:
auth.mozilla.auth0.com
manage.mozilla.auth0.com

(In reply to Ludovic Hirlimann [:Usul] from comment #15)
> Can't we use an escalation policy in pagerduty ?
3. Can we walk through this? I don't understand what you are looking for or what an escalation policy should look like.
Flags: needinfo?(ludovic)
After chatting with :unixfairy: would it be possible to add a Pingdom check (ideally ASAP, before the runbook is completed) for connectivity failure to:

auth-dev.mozilla.auth0.com (TCP 443/HTTPS)
auth.mozilla.auth0.com (TCP 443/HTTPS)

thanks!
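
For illustration, a connectivity check of this kind can be approximated with a plain TCP connect. The Python sketch below is not how Pingdom implements its checks; the function name `tcp_reachable` and the timeout value are assumptions made for the example.

```python
import socket


def tcp_reachable(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened in time."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    for host in ("auth-dev.mozilla.auth0.com", "auth.mozilla.auth0.com"):
        print(host, "reachable" if tcp_reachable(host) else "UNREACHABLE")
```

A successful TCP connect only shows that the port accepts connections; it does not show that the login page renders.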
Follow up to comment 17:
Here's a better check also used by the Mozilla Participation team (on their Pingdom):

https://auth.mozilla.auth0.com/login?client=7wyIItkJX4t7vYEaDmGrwP9k2fBh5qWP
Add the check "Check for string": should contain "auth0-lock-passwordless-button-text"

https://auth-dev.mozilla.auth0.com/login?client=7PQFR1tyqr6TIqdHcgbRcYcbmbgYflVE
Add the check "Check for string": should contain "auth0-lock-passwordless-button-text"

This ensures the HTML is actually displayed. These client IDs are testrp.allizom.security.org client IDs and are normally expected to be kept forever.
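
As an illustration of the "check for string" idea above, here is a minimal Python sketch using only the standard library. It is not the Pingdom configuration itself; the helper names `page_contains` and `check_url` and the timeout are assumptions, and only the marker string comes from this bug.

```python
import urllib.request

# Marker string taken from the check described in this bug.
MARKER = "auth0-lock-passwordless-button-text"


def page_contains(body: str, marker: str = MARKER) -> bool:
    """Return True when the fetched HTML includes the expected marker."""
    return marker in body


def check_url(url: str, marker: str = MARKER, timeout: float = 10.0) -> bool:
    """Fetch url and report whether the response body contains marker.

    Network errors, HTTP error statuses, and timeouts count as failures.
    Note that urlopen follows redirects by default.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except OSError:
        return False
    return page_contains(body, marker)
```

Keeping the string test separate from the fetch makes it possible to exercise the matching logic without touching the network.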
(Assignee)

Updated

2 years ago
Depends on: 1323393
(Assignee)

Comment 19

2 years ago
(In reply to Tristan Weir [:weir] (use NEEDINFO) from comment #16)
> A few updates and requests:
> 1. We have a method for adding MOC personnel to delegated admin accounts so
> they can submit support tickets to Auth0. We need a list of appropriate MOC
> personnel and we will add them.

this is currently
:sal
:vinh
:ashlee
:ryanc
:fauweh
:pir
:usul
:daveio
 
> 2. We had an Auth0 outage today that was not caught by their own monitoring.
> They are investigating why their service failed and their monitoring failed,
> but in the interim MOC should also establish monitoring. We originally
> thought we could rely on Auth0, but obviously we cannot.
> Please enable http connectivity checks (Pingdom or other preferred solution)
> on:
> auth.mozilla.auth0.com
> manage.mozilla.auth0.com

spawned bug 1323393 for this.
 
> (In reply to Ludovic Hirlimann [:Usul] from comment #15)
> > Can't we use an escalation policy in pagerduty ?
> 3. Can we walk through this? I don't understand what you are looking for or
> what an escalation policy should look like.

Of course I can. Accepted the invite for the meeting.
Flags: needinfo?(ludovic)
(Assignee)

Comment 20

2 years ago
Added a note on our onboarding process to request Auth0 access.
I guess subscribing mozilla-auth0@mozilla.pagerduty.com will open PD incidents for anything prod-related they send out. This will be tricky unless we know what's safe to ignore.


[#42638] [Auth0 status] Investigating : Auth0 MFA (Guardian) is unresponsive
[#42639] [Auth0 status] Identified : Auth0 MFA (Guardian) is unresponsive
[#42640] [Auth0 status] Resolved : Auth0 MFA (Guardian) is unresponsive
I suspect we should document that in the runbook (?)
For example, we're not using Guardian, so any message regarding Auth0 MFA (Guardian) is safe to ignore right now (there's no current plan to use it, though we'd have to update the runbook if we change plans).
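
To illustrate the "safe to ignore" rule, the Python sketch below classifies notification subjects of the form shown in the sample above. It is a standalone example, not a PagerDuty or Auth0 feature; the names `SUBJECT_RE`, `IGNORED_COMPONENTS` and `is_actionable` are hypothetical, and the subject format is assumed from the three sample lines only.

```python
import re

# Subject layout assumed from the samples in this bug:
# "[#42638] [Auth0 status] Investigating : Auth0 MFA (Guardian) is unresponsive"
SUBJECT_RE = re.compile(
    r"^\[#(?P<id>\d+)\] \[Auth0 status\] (?P<state>\w+) : (?P<title>.+)$"
)

# Components not in use, so their notifications can be ignored.
IGNORED_COMPONENTS = ("Auth0 MFA (Guardian)",)


def is_actionable(subject: str) -> bool:
    """Return False only for notifications about an ignored component."""
    match = SUBJECT_RE.match(subject.strip())
    if match is None:
        # Unknown format: err on the side of treating it as actionable.
        return True
    title = match.group("title")
    return not any(component in title for component in IGNORED_COMPONENTS)
```

Treating unrecognized subjects as actionable is a deliberate choice here, so that a change in Auth0's subject format does not silently suppress alerts.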
(In reply to Guillaume Destuynder [:kang] (NEEDINFO to ensure replies) from comment #22)
> I suspect we might document that in the runbook (?)
> For ex we're not using Guardian so any message regarding Auth0 MFA
> (Guardian) are safe to ignore right now (there's no current plan to use
> this, though we'd have to update the runbook if we change plans)

I looked to see if outage notifications for Guardian can be turned off, but it does not appear they can. However, I've added some text to the runbook saying to ignore them.

Runbook is ready for final review.
(Assignee)

Comment 24

2 years ago
(In reply to Tristan Weir [:weir] (use NEEDINFO) from comment #23)
> (In reply to Guillaume Destuynder [:kang] (NEEDINFO to ensure replies) from
> comment #22)
> > I suspect we might document that in the runbook (?)
> > For ex we're not using Guardian so any message regarding Auth0 MFA
> > (Guardian) are safe to ignore right now (there's no current plan to use
> > this, though we'd have to update the runbook if we change plans)
> 
> I looked to see if notification outages for Guardian can be turned off, but
> it does not appear they can. However, I've added some text to the runbook to
> say ignore it.
> 
> Runbook is ready for final review.

Made a small change. Asking for broader feedback.
(Assignee)

Comment 25

2 years ago
Crap, I forgot one of my teammates. Can you add :jlaz to the admin group?

Also, we are discussing sending emails to all@ with regard to status.mozilla.com/org.
(Assignee)

Comment 27

2 years ago
And we are good to go document wise.
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED
(In reply to Guillaume Destuynder [:kang] (NEEDINFO to ensure replies) from comment #18)
> Follow up to comment 17:
> Here's a better check also used by the Mozilla Participation team (on their
> Pingdom):
> 
> https://auth.mozilla.auth0.com/login?client=7wyIItkJX4t7vYEaDmGrwP9k2fBh5qWP
> add the check "Check for string: should contain
> "auth0-lock-passwordless-button-text"
> 
> https://auth-dev.mozilla.auth0.com/
> login?client=7PQFR1tyqr6TIqdHcgbRcYcbmbgYflVE
> add the check "Check for string: should contain
> "auth0-lock-passwordless-button-text"
> 
> This ensure the HTML is actually displayed. These client id's are
> testrp.allizom.security.org client ids and are normally to be kept forever.

kang, this validation is broken with the most recent auth0 changes. Do you have any updates on how we can fix our monitoring here?

When visiting the production URL above for example, we see:

"Oops!, something went wrong

There could be a misconfiguration in the system or a service outage. We track these errors automatically, but if the problem persists feel free to contact us.
Please try again."
Status: RESOLVED → REOPENED
Flags: needinfo?(gdestuynder)
QA Contact: mcristofi
Resolution: FIXED → ---
I think this is because you follow the 302 redirect. If you do not follow it, this would still work, except that the check string should be something that currently exists in the page, such as "This is the custom Mozilla Login Experience, designed and built by Mozilla's IAM Project." or anything from https://raw.githubusercontent.com/mozilla-iam/auth0-deploy/master/pages/login.html

Alternatively, if you cannot disable following the 302, or if it is simply easier, here is a URL where the redirect does not end in an error:

https://auth.mozilla.auth0.com/login?client=7wyIItkJX4t7vYEaDmGrwP9k2fBh5qWP&protocol=oauth2&response_type=code

Basically, this adds the two required parameters, protocol and response_type.
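
A small Python sketch of how the check URL can be extended with those two query parameters. The helper name `with_required_params` is an assumption for illustration; the parameter names and values are the ones in the URL above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_required_params(url: str) -> str:
    """Append protocol and response_type to url unless already present."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    # setdefault keeps any value the caller already supplied.
    query.setdefault("protocol", "oauth2")
    query.setdefault("response_type", "code")
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    base = "https://auth.mozilla.auth0.com/login?client=7wyIItkJX4t7vYEaDmGrwP9k2fBh5qWP"
    print(with_required_params(base))
```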
Flags: needinfo?(gdestuynder) → needinfo?(kferrando)
Thanks for the GET parameters kang, that got us working again. 

For the simple ping tests we are running on these in New Relic, we are not able to prevent following redirects.
Status: REOPENED → RESOLVED
Last Resolved: 2 years ago → 5 months ago
Flags: needinfo?(kferrando)
Resolution: --- → FIXED
See Also: → bug 1324139