Closed Bug 1289458 Opened 8 years ago Closed 8 years ago

add monitoring for Elasticsearch

Categories

(Socorro :: Infra, task, P1)

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1298035

People

(Reporter: willkg, Assigned: jschneider)

(Moving this from https://github.com/mozilla/socorro-infra/issues/244)

Over the last couple of months, we've had issues where Elasticsearch shards run out of memory and start failing on queries, which causes results to be some percentage off from what they should be.

https://bugzilla.mozilla.org/show_bug.cgi?id=1288179
https://bugzilla.mozilla.org/show_bug.cgi?id=1276690
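
For illustration only, here is a minimal sketch of how a partial result of this kind could be detected from a search response. It relies on the `_shards` section that Elasticsearch includes in search responses (`total`, `successful`, `failed`). The function name `shard_failure_ratio` and the sample response values are hypothetical and are not taken from Socorro.

```python
def shard_failure_ratio(search_response):
    """Return the fraction of shards that failed for one search response.

    `search_response` is the decoded JSON body of an Elasticsearch search.
    A ratio above 0.0 means the results are based on only part of the data.
    """
    shards = search_response.get("_shards", {})
    total = shards.get("total", 0)
    failed = shards.get("failed", 0)
    if total == 0:
        # No shard information available; report no known failures.
        return 0.0
    return failed / total


# Made-up response for illustration: 2 of 10 shards failed.
example_response = {
    "_shards": {"total": 10, "successful": 8, "failed": 2},
    "hits": {"total": 1234, "hits": []},
}
print(shard_failure_ratio(example_response))
```

A monitor could alert whenever this ratio is non-zero, since any failed shard means the counts shown to users may be too low.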

Last week, JP added two nodes to the ES cluster and said he'd add monitoring for it, too.

This bug covers adding monitoring to the ES cluster.
Assigning JP since he said he's working on it now.
Assignee: nobody → jschneider
Marking infra bugs that are important to get fixed ASAP as P1.
Priority: -- → P1
To test the monitoring in stage, temporarily kill an ES node (e.g. the master node) in the stage ES cluster. This page should then stop working: https://crash-stats.allizom.org/monitoring/healthcheck/
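
The check described above could be scripted roughly as follows. This is a sketch under assumptions: it treats "stop working" as any non-200 response or connection error, which the comment does not actually specify, and the helper name `healthcheck_ok` and its `opener` parameter are hypothetical.

```python
import urllib.error
import urllib.request

HEALTHCHECK_URL = "https://crash-stats.allizom.org/monitoring/healthcheck/"


def healthcheck_ok(url=HEALTHCHECK_URL, timeout=10, opener=urllib.request.urlopen):
    """Return True if the healthcheck page answers with HTTP 200.

    Assumption: a failing healthcheck shows up as a non-200 status or as a
    connection error. `opener` is injectable so the function can be
    exercised without network access.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # urlopen raises HTTPError (a URLError subclass) for 4xx/5xx
        # responses, and URLError/OSError for connection problems.
        return False
```

Calling `healthcheck_ok()` before and after killing the node should return `True` and then `False` if the monitoring behaves as described.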
Additionally, I've set up monitoring in Datadog.
It can be accessed via Dashboards --> All Dashboards --> Elasticsearch (right side, toward the bottom).
See Also: → 1298035
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → DUPLICATE