Closed Bug 1171627 Opened 9 years ago Closed 9 years ago

Operational alerts for pipeline health (stackdriver/datadog)

Categories

(Cloud Services Graveyard :: Metrics: Pipeline, defect, P1)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: kparlante, Assigned: relud)

Details

(Whiteboard: [unifiedTelemetry][40b9])

Including but not limited to:
- network use
- memory use
Priority: -- → P1
Whiteboard: [unifiedTelemetry][b5]
Assignee: whd → dthornton
we now have alerts in place for disk usage, memory usage, and ntp drift.
Status: NEW → ASSIGNED
the remaining alerts i'm going to configure are when instance sizes are reaching bandwidth limits, and when 5xx's on the elb are too high.
Whiteboard: [unifiedTelemetry][b5] → [unifiedTelemetry][40b9]
elb alerts are in place; bandwidth alerts are going to be more difficult, and i'm still working out how to accomplish those.
Iteration: --- → 42.3 - Aug 10
Iteration: 42.3 - Aug 10 → 43.1 - Aug 24
We haven't had any issues in the last few months that went undetected, so I'll call this done. We can configure bandwidth alerts if they become relevant, probably 42.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Product: Cloud Services → Cloud Services Graveyard