Closed Bug 1050116 Opened 10 years ago Closed 10 years ago

elasticsearch8.webapp.phx1.mozilla.com is 503 status

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: dgarvey, Assigned: nmaul)

References

Details

(Whiteboard: [kanban:https://kanbanize.com/ctrl_board/4/700] )

david garvey:dgarvey

Reporter

Description

•

10 years ago

I rebooted the box for elasticsearch8.webapp.phx1.mozilla.com and noticed it is not in the cluster.

:kanban

Updated

•

10 years ago

Whiteboard: [kanban:https://kanbanize.com/ctrl_board/4/700]

david garvey:dgarvey

Reporter

Comment 1

•

10 years ago

[root@elasticsearch8.webapp.phx1 ~]# curl http://localhost:9200/
{
  "ok" : true,
  "status" : 503,
  "name" : "elasticsearch8_phx1",
  "version" : {
    "number" : "0.90.10",
    "build_hash" : "0a5781f44876e8d1c30b6360628d59cb2a7a2bbb",
    "build_timestamp" : "2014-01-10T10:18:37Z",
    "build_snapshot" : false,
    "lucene_version" : "4.6"
  },
  "tagline" : "You Know, for Search"
}
[root@elasticsearch8.webapp.phx1 ~]# 


Also, it appears that elasticsearch7 has the same issue. I don't know how rebooting #8 caused #7 to drop out of the cluster.
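
The root-endpoint response above can also be checked programmatically. The following is a minimal Python sketch, not something taken from this bug: it parses the JSON body pasted in this comment and reports whether the node's `status` field is 200. The helper names `root_status` and `node_is_serving` are hypothetical. Note that the body contains `"ok" : true` even though `status` is 503, so the sketch assumes a check should look at `status` rather than `ok`.

```python
import json

# Response body copied from the curl output in this comment.
SAMPLE_BODY = """
{
  "ok" : true,
  "status" : 503,
  "name" : "elasticsearch8_phx1",
  "version" : {
    "number" : "0.90.10",
    "build_hash" : "0a5781f44876e8d1c30b6360628d59cb2a7a2bbb",
    "build_timestamp" : "2014-01-10T10:18:37Z",
    "build_snapshot" : false,
    "lucene_version" : "4.6"
  },
  "tagline" : "You Know, for Search"
}
"""


def root_status(body):
    """Return the 'status' field of a root-endpoint response, or None if absent."""
    return json.loads(body).get("status")


def node_is_serving(body):
    """Hypothetical health rule: treat the node as serving only when status is 200."""
    return root_status(body) == 200


if __name__ == "__main__":
    print(root_status(SAMPLE_BODY), node_is_serving(SAMPLE_BODY))
```

In practice the body would come from `http://localhost:9200/` on the node itself, as in the curl command above, rather than from a pasted string.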

C. Liang [:cyliang]

Comment 2

•

10 years ago

elasticsearch7 looks like it doesn't know who the master of the cluster is.  The cluster may have been in split-brain mode (where it thought that elasticsearch8 was master and now elasticsearch7 thinks that it is alone in the world).  If elasticsearch7 thinks that it is alone, it probably needs a reboot to re-ping for a master.

/var/log/elasticsearch/es_prod_phx1.log on elasticsearch7 shows:

[2014-08-07 05:15:13,961][WARN ][discovery.zen.ping.multicast] [elasticsearch7_phx1] received ping response ping_response{target [[elasticsearch8_phx1][Z4sE-xVNRhGhGpxkhVr_2w][inet[/10.8.81.149:9300]]], master [null], cluster_name[es_prod_phx1]} with no matching id [10160831]
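
The interesting part of that log line is `master [null]`: elasticsearch8 answered the ping but reported no master. As a rough sketch (the regular expression and the `parse_ping_response` name are my own assumptions, written against this one log line only, not a documented log format), the fields can be pulled out like this:

```python
import re

# Log line copied from es_prod_phx1.log as quoted in this comment.
LOG_LINE = (
    "[2014-08-07 05:15:13,961][WARN ][discovery.zen.ping.multicast] "
    "[elasticsearch7_phx1] received ping response ping_response{target "
    "[[elasticsearch8_phx1][Z4sE-xVNRhGhGpxkhVr_2w][inet[/10.8.81.149:9300]]], "
    "master [null], cluster_name[es_prod_phx1]} with no matching id [10160831]"
)

# Assumed pattern, derived only from the single line above.
PING_RE = re.compile(
    r"\]\s*\[(?P<node>[^\]]+)\] received ping response ping_response\{"
    r"target \[\[(?P<target>[^\]]+)\].*?"
    r"master \[(?P<master>.*?)\], cluster_name\[(?P<cluster>[^\]]+)\]"
)


def parse_ping_response(line):
    """Return the logging node, responding node, reported master, and cluster name."""
    match = PING_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # The log prints the literal text "null" when the responder knows no master.
    if fields["master"] == "null":
        fields["master"] = None
    return fields


if __name__ == "__main__":
    print(parse_ping_response(LOG_LINE))
```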

:kanban

Updated

•

10 years ago

Assignee: server-ops-webops → nmaul

Jake Maul [:jakem]

Assignee

Comment 4

•

10 years ago

After 'yum-wrapper upgrade' on ES7 and a reboot, it still can't find the cluster.

I'm switching this cluster from multicast to unicast host discovery. The downside is each host has to be listed, but this is relatively easy to do in puppet. The upside is, IME on other clusters, it seems to be a good bit more reliable at discovering hosts. I'll update again if/when anything comes of this.
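
For reference, the switch described here usually comes down to two settings in `elasticsearch.yml` on Elasticsearch 0.90: `discovery.zen.ping.multicast.enabled` and `discovery.zen.ping.unicast.hosts`. The actual puppet change is not shown in this bug, so the following Python sketch only illustrates how such lines could be rendered from a host list. The function name is hypothetical, and the host list is an assumption built from the two host names mentioned in this bug; the real cluster has more nodes that are not named here.

```python
import json


def unicast_discovery_settings(hosts):
    """Render elasticsearch.yml lines that disable multicast and list unicast hosts."""
    hosts = list(hosts)
    if not hosts:
        raise ValueError("unicast discovery needs at least one host")
    return "\n".join([
        "discovery.zen.ping.multicast.enabled: false",
        # A JSON array of strings is also a valid YAML flow sequence.
        "discovery.zen.ping.unicast.hosts: " + json.dumps(hosts),
    ])


if __name__ == "__main__":
    # Assumed host list for illustration only.
    print(unicast_discovery_settings([
        "elasticsearch7.webapp.phx1.mozilla.com",
        "elasticsearch8.webapp.phx1.mozilla.com",
    ]))
```

Every node has to carry the same list, which is the "each host has to be listed" downside mentioned above.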

Jake Maul [:jakem]

Assignee

Comment 5

•

10 years ago

Okay, 7 came up and joined, and has taken over several shards.

That gets us out of the emergency situation of no redundancy (ES needs at least 3 nodes).

I've disabled puppet for 2 days on ES8, as well as chkconfig off'd elasticsearch on it. This is to try to prevent it from coming back online and then disrupting things by failing randomly.
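
One way to read the "at least 3 nodes" remark is in terms of master-election quorum. The sketch below assumes quorum is a strict majority of master-eligible nodes, which is the usual recommendation for the `discovery.zen.minimum_master_nodes` setting. This bug does not say how that setting was configured on this cluster, and the function names are hypothetical.

```python
def master_quorum(master_eligible_nodes):
    """Strict majority of master-eligible nodes: floor(n / 2) + 1."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


def tolerable_failures(master_eligible_nodes):
    """How many nodes can be lost while a majority can still elect a master."""
    return master_eligible_nodes - master_quorum(master_eligible_nodes)


if __name__ == "__main__":
    for n in (2, 3):
        print(n, master_quorum(n), tolerable_failures(n))
```

Under that assumption, a two-node cluster cannot lose either node and still form a majority, while a three-node cluster can lose one.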

Adrian J Fernandez [:Aj]

Comment 6

•

10 years ago

(Meant to update earlier.)

The original bug was 1050028, but I closed that one because this one has more information.
Since elasticsearch7.webapp.phx1 needed to be rebooted anyway, the following were updated:
Storage: P410i Slot: 0 [5.70] -> [6.40]
BIOS: 05/05/2011 -> 07/02/2013

Jake Maul [:jakem]

Assignee

Comment 7

•

10 years ago

Back in service. Seems to be a problem caused by certain kernels and auditd? That work is happening in bug 1050013, nothing more to do here.

Status: NEW → RESOLVED

Closed: 10 years ago

Resolution: --- → FIXED

BMO Automation

Updated

•

5 years ago

Product: Infrastructure & Operations → Infrastructure & Operations Graveyard

Categories

(Infrastructure & Operations Graveyard :: WebOps: Other, task)
