Fri 08:34:58 PDT  elasticsearch3.bugs.scl3.mozilla.com:nodes - Elasticsearch is CRITICAL: CHECK_NRPE: Socket timeout after 60 seconds.
The other ES nodes complained about replica shards being missing. Running the elasticsearch che elasticsearch3.bugs.scl3 was not responding
Grr. Meant to clear the comment box and edit in a real text editing window.
Short form - I think elasticsearch3.bugs.scl3 JVM ran out of memory and locked up. Restarting elasticsearch resolved the issue.

Long form - Shortly before this check alerted, the other ES nodes complained about replica shards being missing. Running the same command as nagios with the -vv flag showed which shards were missing. Formatting for prettiness and conciseness:

Index 'public_comments20140317_131004'
  replica down on shard 0
  replica down on shard 1
  replica down on shard 2
Index 'public_bugs20140317_131002'
  replica down on shard 0
  replica down on shard 1
  replica down on shard 2
Index 'bug_hierarchy_20140515'
  replica down on shard 0

Looking at the /head plugin on elasticsearch[1-2].bugs.scl3 confirmed that they both thought elasticsearch3.bugs.scl3 was down; elasticsearch3.bugs.scl3's plugins were unreachable.

The ES log on elasticsearch3 shows a few "java.lang.OutOfMemoryError: Java heap space" errors at 7:24 and 7:25 before it decides, in its infinite wisdom, that the master of the cluster (elasticsearch2) has left and it tries to elect itself as master before (presumably) wedging itself.
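
For anyone who hits this again, a rough sketch of how the per-index listing above could be tallied automatically. It assumes the "Index '<name>'" and "replica down on shard N" shapes exactly as I reformatted them in this comment, not the raw output of the nagios check, and the function name missing_replicas is made up for illustration.

```python
import re

# Assumption: these token shapes come from the reformatted listing in this
# comment, not from the check's real verbose output.
TOKEN_RE = re.compile(r"Index '([^']+)'|replica down on shard (\d+)")


def missing_replicas(text):
    """Map each index name to the sorted shard numbers whose replica is down."""
    result = {}
    current = None
    # Scan tokens in order, so the input may be one long line or many lines.
    for match in TOKEN_RE.finditer(text):
        index_name, shard = match.group(1), match.group(2)
        if index_name is not None:
            current = index_name
            result.setdefault(current, [])
        elif current is not None:
            result[current].append(int(shard))
    return {name: sorted(shards) for name, shards in result.items()}
```

Feeding it the listing from this bug would give three shards each for the two public_* indices and one shard for bug_hierarchy_20140515, i.e. the replicas that lived on the wedged node.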
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard