Update the MySQL slow query alert with SAX alert suppression

RESOLVED FIXED

Status

Product: Cloud Services
Component: Metrics: Pipeline
Priority: P1
Severity: normal
Status: RESOLVED FIXED
Reported: 3 years ago
Modified: 2 years ago

People

(Reporter: trink, Assigned: trink)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Updated

3 years ago
Assignee: nobody → mtrinkala
Priority: -- → P1
(Assignee)

Comment 1

3 years ago
https://github.com/mozilla-services/puppet-config/pull/1607

Notes:
- Previously, when the rate-of-change alert fired, about 85% of the false positives could be identified as the maintenance event. However, if the alert is suppressed rather than throttled, the rate-of-change check will continue to fire, and the SAX patterns for the next hour have to be taken into account.

- It appears the maintenance windows on some of the sync boxes have changed within the last week, invalidating the regular maintenance pattern.

- To address both of these issues, we compare the current day to the previous day using SAX and alert if the minimum distance between the two plots is greater than zero. Drawbacks:
  - Requires 2 days of data before monitoring becomes active (on initial startup, i.e., with no preserved state).
  - When an alert fires, the suppression has to be set to 24 hours: since we are comparing full days, the alert could continue to fire until we are outside the comparison window containing the alert.

- We don't have a defined use case for a valid alert. For testing, we faked the maintenance not running properly, which alters the signature and causes an alert.

- Untested in production, since we need to deploy a new version of Heka (:whd). If we fire alerts on non-actionable events too frequently, the effect will be no useful monitoring, just one annoyance alert per day.
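
For readers unfamiliar with SAX, the day-over-day comparison described in the notes can be sketched roughly as follows. This is an illustrative Python sketch, not the code from the pull request (which runs inside Heka). The function and class names, the 8-segment word, the 4-letter alphabet, and the assumption that the series length is a multiple of the segment count are all hypothetical choices made for the example. It uses the textbook SAX steps (z-normalization, piecewise aggregate approximation, Gaussian breakpoints) and the standard MINDIST lower bound, with a 24-hour suppression after an alert.

```python
import math
from statistics import NormalDist, mean, pstdev


def breakpoints(alphabet_size):
    """Cut points that split the standard normal into equiprobable regions."""
    nd = NormalDist()
    return [nd.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]


def sax_word(series, segments, alphabet_size):
    """Z-normalize, average into `segments` chunks (PAA), map to symbol indices."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        normalized = [0.0] * len(series)
    else:
        normalized = [(x - mu) / sigma for x in series]
    size = len(series) // segments  # assumes len(series) % segments == 0
    cuts = breakpoints(alphabet_size)
    word = []
    for i in range(segments):
        seg_mean = mean(normalized[i * size:(i + 1) * size])
        # Symbol index = number of breakpoints below the segment mean.
        word.append(sum(1 for c in cuts if seg_mean > c))
    return word


def mindist(word_a, word_b, series_len, alphabet_size):
    """SAX MINDIST: adjacent or equal symbols contribute zero distance."""
    cuts = breakpoints(alphabet_size)
    total = 0.0
    for a, b in zip(word_a, word_b):
        lo, hi = min(a, b), max(a, b)
        if hi - lo > 1:
            total += (cuts[hi - 1] - cuts[lo]) ** 2
    return math.sqrt(series_len / len(word_a)) * math.sqrt(total)


def should_alert(previous_day, current_day, segments=8, alphabet_size=4):
    """Alert when the two days' SAX words are a nonzero MINDIST apart."""
    prev = sax_word(previous_day, segments, alphabet_size)
    cur = sax_word(current_day, segments, alphabet_size)
    return mindist(prev, cur, len(current_day), alphabet_size) > 0


class DailySaxMonitor:
    """Hypothetical wrapper that suppresses repeat alerts for 24 hours."""

    def __init__(self, suppress_seconds=24 * 3600):
        self.suppress_seconds = suppress_seconds
        self.suppressed_until = 0.0

    def check(self, previous_day, current_day, now):
        # While suppressed, stay quiet even if the days still differ.
        if now < self.suppressed_until:
            return False
        if should_alert(previous_day, current_day):
            self.suppressed_until = now + self.suppress_seconds
            return True
        return False
```

In this sketch, two days with the same maintenance spike produce identical words and no alert, while a day whose spike moved to a different segment (or went missing) yields a nonzero distance. Because adjacent symbols count as zero distance, small day-to-day wobble does not trigger an alert by itself.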
Status: NEW → ASSIGNED
Flags: needinfo?(whd)
(Assignee)

Updated

3 years ago
Blocks: 1220257
(Assignee)

Updated

3 years ago
No longer blocks: 1220257
Status: ASSIGNED → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED
(Assignee)

Updated

2 years ago
Flags: needinfo?(whd)