Closed Bug 1872150 Opened 2 years ago Closed 2 years ago

crash stats outage (15 minutes) due to elasticsearch cluster going unresponsive

Categories

(Socorro :: Webapp, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: wsmwk, Unassigned)

Details

All URLs starting with https://crash-stats.mozilla.org/ fail.

Seems better now

It looks like the Elasticsearch cluster became unresponsive between 13:28 UTC and 13:47 UTC: CPU spiked and node heap usage dropped. Everything looks OK now. I'll leave this open and close it out later if everything is still fine.

Everything continues to work, so I'm going to close this out.

We declared an incident for the outage and I wrote up my observations in the incident document:

https://docs.google.com/document/d/1Hm2WWwSKmTARn_vBJakEpYjYUm4M6LzXkTJivb8liMc/edit

We'll do a retrospective in the next couple of weeks.

Status: NEW → RESOLVED
Closed: 2 years ago
Priority: -- → P1
Resolution: --- → FIXED
Summary: Internal Server Error at https://crash-stats.mozilla.org/ → crash stats outage (15 minutes) due to elasticsearch cluster going unresponsive