Closed
Bug 1306597
Opened 8 years ago
Closed 6 years ago
Set up CloudWatch & event subscriptions for Heroku RDS instances
Categories
(Tree Management :: Treeherder: Infrastructure, defect, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: emorley, Assigned: emorley)
Our monitoring options for the RDS instances used on Heroku are:
1) AWS CloudWatch (e.g. free disk space, CPU usage, other hardware metrics)
2) Event subscriptions (e.g. RDS instance configuration changes, failover, reboots, ...)
3) New Relic MySQL plugin
4) Database stats recorded by the New Relic Python agent, from the app's perspective.
#3 is a bit more involved (since it would require a service on another machine to run the plugin) and already has bug 1201063 filed.
#4 is already occurring.
This bug is about #1-2.
Initially I'll have alerts/notifications sent to just me. Once they have proven non-spammy, I'll send them to the treeherder-internal list. Finally, we can select a subset of the alerts to send to MOC too.
https://console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarm:alarmFilter=ANY
https://console.aws.amazon.com/rds/home?region=us-east-1#event-subscriptions:
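For option 2, the same kind of subscription could in principle be created from a script rather than the console. The following is only a rough sketch using boto3's RDS `create_event_subscription` call; the subscription name, SNS topic ARN and helper function names are hypothetical placeholders, and the event categories listed are just the ones named in this bug (more can be added).

```python
def event_subscription_params(name, sns_topic_arn, instance_ids=None):
    """Build keyword arguments for boto3's rds.create_event_subscription.

    `name` and `sns_topic_arn` are placeholders supplied by the caller.
    """
    params = {
        "SubscriptionName": name,
        "SnsTopicArn": sns_topic_arn,
        "SourceType": "db-instance",
        # RDS event category names; extend this list as needed.
        "EventCategories": ["configuration change", "failover", "low storage"],
        "Enabled": True,
    }
    if instance_ids:
        # Leaving SourceIds out subscribes to every source of this type.
        params["SourceIds"] = list(instance_ids)
    return params


def create_event_subscription(name, sns_topic_arn, instance_ids=None,
                              region="us-east-1"):
    """Create the subscription; needs AWS credentials with RDS permissions."""
    import boto3  # imported lazily so the builder above works without it

    rds = boto3.client("rds", region_name=region)
    params = event_subscription_params(name, sns_topic_arn, instance_ids)
    return rds.create_event_subscription(**params)
```

Only `create_event_subscription` talks to AWS; `event_subscription_params` can be inspected or unit tested locally.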
Comment 1 • 7 years ago (by assignee)
CloudWatch alerts on max queue size, or on some of the other equally obscure metrics, might have helped with the diagnosis of bug 1386331.
Updated • 7 years ago (by assignee)
Priority: P2 → P1
Comment 2 • 7 years ago (by assignee)
Alert types to enable:
* disk usage
* CPU
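A minimal sketch of what these two alarms might look like if created through boto3's CloudWatch `put_metric_alarm` call. The instance identifier, SNS topic ARN, thresholds, periods and helper names are all placeholders assumed for illustration, not values taken from this bug.

```python
def rds_alarm_params(instance_id, metric, comparison, threshold, sns_topic_arn):
    """Build keyword arguments for boto3's cloudwatch.put_metric_alarm."""
    return {
        "AlarmName": f"{instance_id}-{metric}",
        "Namespace": "AWS/RDS",
        "MetricName": metric,
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,            # seconds per datapoint (assumed value)
        "EvaluationPeriods": 3,   # consecutive periods checked (assumed value)
        "Threshold": threshold,
        "ComparisonOperator": comparison,
        "AlarmActions": [sns_topic_arn],
    }


def example_alarms(instance_id, sns_topic_arn):
    """Disk and CPU alarms with illustrative thresholds."""
    gib = 1024 ** 3
    return [
        # FreeStorageSpace is reported in bytes; 10 GiB is an assumed floor.
        rds_alarm_params(instance_id, "FreeStorageSpace",
                         "LessThanThreshold", 10 * gib, sns_topic_arn),
        # CPUUtilization is a percentage; 90% is an assumed ceiling.
        rds_alarm_params(instance_id, "CPUUtilization",
                         "GreaterThanThreshold", 90.0, sns_topic_arn),
    ]


def create_alarms(instance_id, sns_topic_arn, region="us-east-1"):
    """Create the alarms; needs credentials with CloudWatch permissions."""
    import boto3  # imported lazily so the builders above work without it

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    for params in example_alarms(instance_id, sns_topic_arn):
        cloudwatch.put_metric_alarm(**params)
```

Keeping the parameter building separate from the AWS call makes it easy to review thresholds before anything is created.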
Updated • 6 years ago (by assignee)
Assignee: emorley → nobody
Status: ASSIGNED → NEW
Priority: P1 → P2
Comment 3 • 6 years ago (by assignee)
Today I received a low disk space alert for the dev RDS instance, which was resolved by an instance restart (my guess is stray temp tables or similar, though it is very strange). The alert only went to me, since the notification settings have been unchanged since comment 0; that's not ideal going forward.
As such I've enabled failover, low disk space, ... notifications for all RDS instances, which will now be sent to treeherder-internal@ rather than just me. To modify the settings, go to:
https://console.aws.amazon.com/rds/home?region=us-east-1#event-subscriptions:
This doesn't include more granular alerts for things like CPU usage, since those have to be configured via CloudWatch (which we don't currently have sufficient IAM permissions to do) and would need a fair amount of tweaking to avoid false positives. However, we can always add those at a later date.
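On the false-positive tuning: CloudWatch alarms can be set to fire only when M datapoints out of the last N evaluation periods breach the threshold (the `DatapointsToAlarm` and `EvaluationPeriods` settings). The snippet below is a simplified local simulation of that idea, not CloudWatch's actual evaluation logic; the function name, sample values and threshold are made up for illustration.

```python
def breaches_m_of_n(samples, threshold, datapoints_to_alarm, evaluation_periods):
    """Return True if any run of `evaluation_periods` consecutive samples
    contains at least `datapoints_to_alarm` values above `threshold`."""
    for end in range(evaluation_periods, len(samples) + 1):
        window = samples[end - evaluation_periods:end]
        breaching = sum(1 for value in window if value > threshold)
        if breaching >= datapoints_to_alarm:
            return True
    return False


# Hypothetical CPU percentages, one value per period.
spiky = [20, 95, 25, 30, 22, 97, 21, 24]      # isolated spikes
sustained = [20, 92, 95, 91, 96, 40, 30, 25]  # prolonged high load

# A 1-of-1 rule fires on a single spike; a 3-of-5 rule ignores it.
single_spike_fires = breaches_m_of_n(spiky, 90, 1, 1)
spikes_ignored = not breaches_m_of_n(spiky, 90, 3, 5)
sustained_fires = breaches_m_of_n(sustained, 90, 3, 5)
```

Requiring several breaching datapoints trades a slower alert for fewer spurious ones, which is the kind of tweaking that would be needed before sending CPU alerts to a shared list.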
Assignee: nobody → emorley
Status: NEW → RESOLVED
Closed: 6 years ago
Priority: P2 → P1
Resolution: --- → FIXED