elasticsearch3.bugs.scl3.mozilla.com:nodes - Elasticsearch is CRITICAL

RESOLVED FIXED

Status

RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: afernandez, Assigned: cliang)

Details

(Whiteboard: [kanban:https://kanbanize.com/ctrl_board/4/1208] )

Attachments

(1 attachment)

(Reporter)

Description

4 years ago
Fri 08:34:58 PDT [5225] elasticsearch3.bugs.scl3.mozilla.com:nodes - Elasticsearch is CRITICAL: CHECK_NRPE: Socket timeout after 60 seconds.
(Assignee)

Comment 1

4 years ago
The other ES nodes complained about replica shards being missing.  Running the elasticsearch che  elasticsearch3.bugs.scl3 was not responding
(Assignee)

Comment 2

4 years ago
Grr.  Meant to clear the comment box and edit in a real text editing window.

Updated

4 years ago
Whiteboard: [kanban:https://kanbanize.com/ctrl_board/4/1208]
(Assignee)

Updated

4 years ago
Assignee: nobody → cliang
(Assignee)

Comment 3

4 years ago
Created attachment 8485037 [details]
bugs_public.log
(Assignee)

Comment 4

4 years ago
Short form -
I think the JVM on elasticsearch3.bugs.scl3 ran out of heap memory and locked up.  Restarting elasticsearch resolved the issue.

Long form -

Shortly before this check alerted, the other ES nodes complained about missing replica shards.  Running the same check command that nagios uses, with the -vv flag added, showed which shards were missing.  Reformatted for readability and brevity:

  Index 'public_comments20140317_131004' 
    replica down on shard 0 
    replica down on shard 1 
    replica down on shard 2 
  Index 'public_bugs20140317_131002' 
    replica down on shard 0 
    replica down on shard 1 
    replica down on shard 2 
  Index 'bug_hierarchy_20140515' 
    replica down on shard 0
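
For anyone retracing this without the nagios plugin handy: a rough sketch of how a similar per-shard listing could be pulled from the cluster health API with level=shards.  This is not the check nagios runs.  The function names and the node URL in the usage note are made up for illustration, and the field names (primary_active, unassigned_shards) are what I would expect in the health response, so verify them against your ES version.

```python
import json
import urllib.request


def fetch_health(base_url):
    """Fetch shard-level cluster health. base_url is an assumption,
    e.g. a node's HTTP endpoint on the default port 9200."""
    url = base_url.rstrip("/") + "/_cluster/health?level=shards"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def missing_replicas(health):
    """Map index name -> sorted shard ids whose primary is active
    but which still have unassigned (replica) copies."""
    report = {}
    for index_name, index in sorted(health.get("indices", {}).items()):
        down = [
            int(shard_id)
            for shard_id, shard in index.get("shards", {}).items()
            if shard.get("primary_active") and shard.get("unassigned_shards", 0) > 0
        ]
        if down:
            report[index_name] = sorted(down)
    return report


def format_report(report):
    """Render the report in the same layout as the listing above."""
    lines = []
    for index_name, shards in report.items():
        lines.append("Index '%s'" % index_name)
        for shard_id in shards:
            lines.append("  replica down on shard %d" % shard_id)
    return "\n".join(lines)
```

Usage would be something like print(format_report(missing_replicas(fetch_health("http://localhost:9200")))), run against one of the nodes that is still responding.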

Looking at the /head plugin on elasticsearch[1-2].bugs.scl3 confirmed that they both thought elasticsearch3.bugs.scl3 was down; elasticsearch3.bugs.scl3's plugins were unreachable.

The ES log on elasticsearch3 shows a few "java.lang.OutOfMemoryError: Java heap space" errors at 7:24 and 7:25.  After that it decides, in its infinite wisdom, that the master of the cluster (elasticsearch2) has left, and it tries to elect itself as master before (presumably) wedging itself.
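
A minimal sketch of how the OutOfMemoryError entries could be pulled out of a log like the attached one.  The helper name is hypothetical and it only does a plain substring match, so it makes no assumption about the timestamp layout of the log lines.

```python
def find_oom_lines(lines, needle="java.lang.OutOfMemoryError"):
    """Return (line_number, text) pairs for every line containing needle.

    lines can be any iterable of strings, such as an open file object.
    Line numbers start at 1.
    """
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if needle in line
    ]
```

Something like find_oom_lines(open("bugs_public.log")) would then list the matching entries with their line numbers, which makes it easier to see what the node logged just before and after it ran out of heap.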
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard