Closed Bug 997234 Opened 7 years ago Closed 7 years ago

AWS monitoring for FxA/Sync: consumption and routing

Categories

(Cloud Services :: Operations: Metrics/Monitoring, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: kthiessen, Assigned: mostlygeek)

Details

(Whiteboard: [qa+])

Per our discussions yesterday, we're going to need a way to consume AWS notifications and get them to the right places/people.

Cc'ing :mostlygeek, :gene, and :edwong, so they can chime in on the who and how.  And preferably when.
QA Contact: kthiessen
So right now we have some performance and availability monitoring that we set up in Stackdriver. If a metric falls outside our expected range, Stackdriver sends a notification to PagerDuty, which then works through the call list.

The call list is currently me and :ckolos. We'll need to take time to improve that with a call rotation.
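
For anyone who wants the flow above in concrete terms, here is a minimal sketch of the idea: check a metric against a range, and on a breach page each person on the call list in order until someone acknowledges. It does not use the Stackdriver or PagerDuty APIs. The names `Threshold`, `page_call_list`, `route_alert`, the `latency_ms` metric, and the responder labels are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Threshold:
    """Acceptable range for one metric (hypothetical names and values)."""
    metric: str
    low: float
    high: float

    def is_breached(self, value: float) -> bool:
        # A value outside [low, high] counts as a breach.
        return not (self.low <= value <= self.high)


def page_call_list(
    call_list: Sequence[str],
    acknowledges: Callable[[str], bool],
) -> Optional[str]:
    """Page responders in order; return the first one who acknowledges."""
    for responder in call_list:
        if acknowledges(responder):
            return responder
    # Nobody acknowledged; a real setup would escalate further here.
    return None


def route_alert(
    threshold: Threshold,
    value: float,
    call_list: Sequence[str],
    acknowledges: Callable[[str], bool],
) -> Optional[str]:
    """Return who took the page, or None if no page was needed or answered."""
    if not threshold.is_breached(value):
        return None
    return page_call_list(call_list, acknowledges)


if __name__ == "__main__":
    latency = Threshold(metric="latency_ms", low=0, high=500)
    call_list = ["primary", "secondary"]  # placeholder responders
    # Simulate the primary missing the page and the secondary answering.
    print(route_alert(latency, 900, call_list, lambda r: r == "secondary"))
```

A call rotation would amount to reordering `call_list` on a schedule, which is the part that still needs to be worked out.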

If anybody is interested, I can give a quick tour of Stackdriver demonstrating:

- where to find information on FxA / Sync
- monitoring / alerting features

It would be great to get QA more familiar with it, as there's a lot of data showing how the system as a whole is performing.
Whiteboard: [qa+]
Have we scheduled this tour?  I know we're using Stackdriver a lot more than we were two months ago.  Are we confident that the right people are getting paged under the right conditions?  If I don't hear anything by the end of Q2, I'll close this out as complete.
Taking this ticket. 

Notes: 

- Stackdriver with PagerDuty integration has been great
- write up docs and procedures for the MOC and have them be the first point of contact
- figure out with the MOC how to escalate / rotate call schedules
Assignee: nobody → bwong
Any word on this?
Adding ni? to Ben because it's been a month and he may be buried in bugmail.
Flags: needinfo?(bwong)
:kthiessen we have Stackdriver and PagerDuty set up and it has been working really well. We're on hold for now on what we want to do with the new MOC.

Not entirely sure what this bug is for anymore. :)
Flags: needinfo?(bwong)
Closing this as complete since no one has spoken up saying that present monitoring is inadequate.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
QA Contact: kthiessen