Closed
Bug 963883
Opened 11 years ago
Closed 11 years ago
ES Cluster down, add automated check so reboots are done automatically
Categories
(bugzilla.mozilla.org :: Infrastructure, defect)
RESOLVED
FIXED
People
(Reporter: ekyle, Assigned: fubar)
References
(Blocks 1 open bug)
Details
The public cluster is down again. Please reboot.
Please add an automated check:
Send an HTTP POST request to each of
http://elasticsearch1.bugs.scl3.mozilla.com:9200/public_bugs/bug_version/_search
http://elasticsearch2.bugs.scl3.mozilla.com:9200/public_bugs/bug_version/_search
http://elasticsearch3.bugs.scl3.mozilla.com:9200/public_bugs/bug_version/_search
http://elasticsearch4.bugs.scl3.mozilla.com:9200/private_bugs/bug_version/_search
http://elasticsearch5.bugs.scl3.mozilla.com:9200/private_bugs/bug_version/_search
http://elasticsearch6.bugs.scl3.mozilla.com:9200/private_bugs/bug_version/_search
with the following body:
{
  "query": {
    "filtered": {
      "query": {"match_all": {}},
      "filter": {
        "and": [
          {"match_all": {}},
          {"range": {"modified_ts": {"gte": 1388534400000}}}
        ]
      }
    }
  },
  "from": 0,
  "size": 0,
  "sort": [],
  "facets": {"0": {"statistical": {"field": "modified_ts"}}}
}
A timeout, an error response, or a partial failure (any failed shards) indicates that the cluster needs a reboot.
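The check described above could be scripted roughly as follows. This is a minimal, hypothetical sketch using only the Python standard library, not the check that was actually deployed. The names QUERY, classify_response and probe, and the 10-second timeout, are assumptions. It also assumes that "partial failure" means a non-zero `_shards.failed` count in the search reply, and it treats `"timed_out": true` as a timeout. The range filter's 1388534400000 is epoch milliseconds for 2014-01-01T00:00:00Z.

```python
import json
import socket
import urllib.error
import urllib.request

# Search body copied from the request above. It uses pre-2.0
# Elasticsearch syntax ("filtered" query, "and" filter, "facets").
QUERY = {
    "query": {"filtered": {
        "query": {"match_all": {}},
        "filter": {"and": [
            {"match_all": {}},
            # 1388534400000 ms = 2014-01-01T00:00:00Z
            {"range": {"modified_ts": {"gte": 1388534400000}}},
        ]},
    }},
    "from": 0,
    "size": 0,
    "sort": [],
    "facets": {"0": {"statistical": {"field": "modified_ts"}}},
}

TIMEOUT_SECONDS = 10  # assumption: the bug does not specify a timeout


def classify_response(status, body):
    """Classify one search reply as "ok", "timeout", "bad response" or "partial failure"."""
    if status != 200:
        return "bad response"
    try:
        data = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return "bad response"
    if not isinstance(data, dict) or not isinstance(data.get("_shards"), dict):
        return "bad response"
    if data.get("timed_out"):
        return "timeout"  # the search hit its server-side time limit
    if data["_shards"].get("failed", 0) > 0:
        return "partial failure"  # some shards did not answer
    return "ok"


def probe(url, timeout=TIMEOUT_SECONDS):
    """POST QUERY to one node's _search URL and classify the outcome."""
    request = urllib.request.Request(
        url,
        data=json.dumps(QUERY).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify_response(response.status, response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:  # 4xx/5xx status from the node
        return classify_response(err.code, "")
    except socket.timeout:  # read timed out
        return "timeout"
    except urllib.error.URLError as err:  # connect failure or connect timeout
        return "timeout" if isinstance(err.reason, socket.timeout) else "bad response"
    except OSError:  # e.g. connection reset while reading
        return "bad response"


if __name__ == "__main__":
    import sys

    # Usage: python3 es_probe.py URL [URL ...]; anything but "ok" means reboot.
    for url in sys.argv[1:]:
        print(url, probe(url))
```

Keeping classify_response separate from the network call means the decision logic can be tested against canned replies without a live cluster.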
** Note that the public and private clusters have similar but different URLs: hosts 1-3 use the public_bugs index, and hosts 4-6 use private_bugs.
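Because the two clusters differ only in host number and index name, a monitoring script could generate the six URLs rather than hard-coding them. The helper below is a hypothetical illustration. The name search_urls and the CLUSTERS mapping are assumptions, and the hosts and indices are taken from the list above.

```python
# Host numbers and index names as listed in the request above.
CLUSTERS = {
    "public_bugs": (1, 2, 3),
    "private_bugs": (4, 5, 6),
}


def search_urls(clusters=CLUSTERS):
    """Return each node's _search URL, paired with its own cluster's index."""
    urls = []
    for index, hosts in clusters.items():  # dicts keep insertion order
        for n in hosts:
            urls.append(
                f"http://elasticsearch{n}.bugs.scl3.mozilla.com:9200/{index}/bug_version/_search"
            )
    return urls
```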
*** The OutOfMemoryError (heap exhaustion) issue is a big problem with ES version 0.20.x.
Comment 1 • 11 years ago (Assignee)
Clusters restarted. Will see about the monitoring bits.
Assignee: server-ops-webops → klibby
Comment 2 • 11 years ago
Hi Kyle & Kendall,
I am only seeing 19024 total documents.
-Harsha
Comment 3 • 11 years ago (Reporter)
The public cluster is still down (http://elasticsearch[123].bugs.scl3.mozilla.com:9200). Did it go down again, or did it never come back up?
Comment 4 • 11 years ago (Assignee)
More heap errors. I've restarted it and will check with :phrawzty on how to add the index.cache.field.type setting. :solarce may be upgrading another ES cluster tomorrow morning; if he does and it goes well, I'll likely ping you about upgrading these two.
Updated • 11 years ago (Reporter)
Comment 5 • 11 years ago (Assignee)
All of the ES nodes are now monitored by Nagios, including ES health checks.
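The comment does not say which health checks were configured. As an illustration only, a Nagios-style plugin could read Elasticsearch's /_cluster/health endpoint and map its status field (green, yellow, red) to the standard Nagios plugin exit codes (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). Mapping yellow to WARNING, treating an unreachable node as CRITICAL, and the function names are all assumptions in this sketch.

```python
import json
import sys
import urllib.request

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = ("OK", "WARNING", "CRITICAL", "UNKNOWN")

# Assumed mapping from Elasticsearch cluster health colour to Nagios state.
STATUS_TO_EXIT = {"green": OK, "yellow": WARNING, "red": CRITICAL}


def health_to_exit(body):
    """Translate a /_cluster/health JSON body into (exit_code, message)."""
    try:
        status = json.loads(body)["status"]
    except (ValueError, KeyError, TypeError):
        return UNKNOWN, "UNKNOWN - unparseable cluster health response"
    code = STATUS_TO_EXIT.get(status, UNKNOWN)
    return code, f"{LABELS[code]} - cluster status is {status}"


def main(base_url, timeout=10):
    """Query one node and print a one-line Nagios status; return the exit code."""
    try:
        url = f"{base_url}/_cluster/health"
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            code, message = health_to_exit(resp.read().decode("utf-8"))
    except OSError as err:  # connection errors, HTTP errors and timeouts
        code, message = CRITICAL, f"CRITICAL - {err}"
    print(message)
    return code


if __name__ == "__main__" and len(sys.argv) == 2:
    # e.g. python3 check_es_health.py http://elasticsearch1.bugs.scl3.mozilla.com:9200
    sys.exit(main(sys.argv[1]))
```

Separating health_to_exit from the HTTP call keeps the status mapping easy to adjust. For example, a site could decide that yellow (unassigned replicas) should page as CRITICAL instead.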
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Updated • 11 years ago
Component: WebOps: Bugzilla → Infrastructure
Product: Infrastructure & Operations → bugzilla.mozilla.org