RCA 2017-06-08 - SSO Gmail cutover to Auth0

RESOLVED FIXED

People

(Reporter: sal, Assigned: jen)

Details

(Whiteboard: [ready for cab] [team:enterprise-apps])

(Reporter)

Updated

2 years ago
Assignee: nobody → jhayashi
Whiteboard: [action items pending assignment] [team:enterprise-apps]
Issue originally tracked in ServiceNow INC0023364.

Bug 1371289 covers related Pingdom checks that need updating as a result of the change to Auth0; it is not the incident tracking bug, as originally thought.
# First notice of issues from 07:00 PDT maintenance 
2017-06-08 12:47:34 <jen>  moc:unixfairy - SSO Gmail cutover to Auth0

# Mitigated - Duration: 2h4m / 124 minutes
2017-06-08 14:51:16 <jen> moc:unixfairy:  Ok the SSO problem should be resolved for 99% of the users

# Fully recovered - Duration: 19h27m / 1167 minutes
2017-06-09 08:14:54 <jen> moc:unixfairy: the sso google problem is 100% fixed - everyone should be able to get in by going to sso.mozilla.com
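
For anyone cross-checking the durations above, here is a minimal sketch of the arithmetic. It assumes all three log timestamps are in the same timezone and that durations are rounded to the nearest minute; the helper names are hypothetical and are not part of any RCA tooling.

```python
from datetime import datetime

# Timestamp format used in the chat log lines above.
FMT = "%Y-%m-%d %H:%M:%S"


def elapsed_minutes(start: str, end: str) -> int:
    """Elapsed time between two log timestamps, rounded to the nearest minute.

    Assumption: both timestamps are in the same timezone.
    """
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return round(delta.total_seconds() / 60)


def as_hours_minutes(minutes: int) -> str:
    """Format a minute count in the 'XhYm' style used in the timeline."""
    hours, mins = divmod(minutes, 60)
    return f"{hours}h{mins}m"


first_notice = "2017-06-08 12:47:34"
mitigated = "2017-06-08 14:51:16"
recovered = "2017-06-09 08:14:54"

for label, end in (("Mitigated", mitigated), ("Fully recovered", recovered)):
    minutes = elapsed_minutes(first_notice, end)
    print(f"{label}: {as_hours_minutes(minutes)} / {minutes} minutes")
```

Both durations are measured from the first notice at 12:47:34, not from the start of the 07:00 PDT maintenance.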
:jen, could we get some assignment/tracking bugs/details for the action items please?
Status: NEW → ASSIGNED
Flags: needinfo?(jhayashi)
(Assignee)

Comment 4

2 years ago
:pir - we haven't met to review this, those are just some suggestions I added to the google doc.  I'm not sure people are going to approve those action items?  Is there someone who is going to go over this with me?
Flags: needinfo?(jhayashi)
I'm not sure which "we" or "people" you're referring to there.

Action items are the responsibility of those involved in the incident that required the RCA and the service owners of the services involved since they're the people who should have the best insight into the system and any interactions that occurred to cause the problem. That would be, presumably, you and opssec?

Whatever would best prevent issues recurring.

Once the RCA doc is complete (it is assigned to you, as the service owner), it goes to the rca-discuss list for review, and people may have further comments, but nothing is organised to go over things before that. If you want or need to discuss things with people then please do.
(Assignee)

Comment 6

2 years ago
I haven't done an RCA before, so I'm not familiar with the process. I thought someone would walk me through to make sure we completed the form correctly. Additionally, a third party would ensure we weren't too close to the problem and were objectively breaking down the problem, similar to what we did with Shyam's certificate RCA. Note that I'm not asking for a public forum; I would just like to ensure I am actually preventing the same issue from recurring and not going through the motions.

If you're saying that someone else already reviewed and feels like the RCA google doc is complete, then I'll just go ahead and submit the bugs for the action items I proposed.
(In reply to Jennifer Hayashi [:jen] from comment #6)
> I thought someone would walk me through to make sure we completed the form
> correctly.

I spot-check things to make sure everything is there and nothing is obviously missing, odd, or wrong before passing it along; the review is then done by the rca-discuss list. There's no walkthrough, I'm afraid, but I can help with any questions.

If you think anyone else involved can add more information then by all means needinfo them in the bug and ask them to add details.

> Additionally, a third party would ensure we weren't too close to
> the problem and were objectively breaking down the problem similar to what
> we did with Shyam's certificate RCA.   Note that I'm not asking for a public
> forum, I just would like to ensure I actually am preventing the same issue
> from recurring and not going through the motions.

That was a full postmortem; normal RCAs just go out for review by email. If any of the reviewers feel strongly about it then there may be requests for more information, or people may feel it warrants a postmortem to discuss it in that level of detail.

> If you're saying that someone else already reviewed and feels like the RCA
> google doc is complete, then I'll just go ahead and submit the bugs for the
> action items I proposed.

It seems to be complete to me. The timeline, descriptions and root cause are the most important parts and the timeline appears very thorough (thanks for that).
(seems to be complete bar assignee/tracking information for the action items, I mean)
Depends on: 1372648
Depends on: 1372652
Whiteboard: [action items pending assignment] [team:enterprise-apps] → [ready for cab] [team:enterprise-apps]
Nothing further needed here, thanks.
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED