Closed Bug 963883 Opened 11 years ago Closed 11 years ago

ES Cluster down, add automated check so reboots are done automatically

Categories

(bugzilla.mozilla.org :: Infrastructure, defect)

x86_64
Windows 7
defect
Not set
normal

Tracking

RESOLVED FIXED

People

(Reporter: ekyle, Assigned: fubar)

References

(Blocks 1 open bug)

Details

The public cluster is down again. Please reboot.

Please add an automated check: send an HTTP POST to each of

http://elasticsearch1.bugs.scl3.mozilla.com:9200/public_bugs/bug_version/_search
http://elasticsearch2.bugs.scl3.mozilla.com:9200/public_bugs/bug_version/_search
http://elasticsearch3.bugs.scl3.mozilla.com:9200/public_bugs/bug_version/_search
http://elasticsearch4.bugs.scl3.mozilla.com:9200/private_bugs/bug_version/_search
http://elasticsearch5.bugs.scl3.mozilla.com:9200/private_bugs/bug_version/_search
http://elasticsearch6.bugs.scl3.mozilla.com:9200/private_bugs/bug_version/_search

with the following content

{
    "query":{"filtered":{
        "query":{"match_all":{}},
        "filter":{"and":[
            {"match_all":{}},
            {"range":{"modified_ts":{"gte":1388534400000}}}
        ]}
    }},
    "from":0,
    "size":0,
    "sort":[],
    "facets":{"0":{"statistical":{"field":"modified_ts"}}}
}

A timeout, bad response, or partial failure will indicate a need for reboot.

** Please notice the public cluster and private cluster have similar, but different URLs.

*** This OutOfMemoryException issue is a big problem with ES version 20.x
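The check requested above could be sketched roughly as follows. This is a minimal illustration, not the check that was eventually deployed. The node URLs and the query body are copied from the report. The function names (`classify`, `check_node`), the 30-second timeout, and the reading of "partial failure" as a non-zero `_shards.failed` count or a `timed_out` flag in the search response are assumptions made for the example.

```python
import json
import urllib.error
import urllib.request

# URLs copied from the report: nodes 1-3 serve public_bugs, nodes 4-6 serve private_bugs.
PUBLIC_NODES = [
    f"http://elasticsearch{i}.bugs.scl3.mozilla.com:9200/public_bugs/bug_version/_search"
    for i in (1, 2, 3)
]
PRIVATE_NODES = [
    f"http://elasticsearch{i}.bugs.scl3.mozilla.com:9200/private_bugs/bug_version/_search"
    for i in (4, 5, 6)
]

# Query body copied from the report.
QUERY = {
    "query": {"filtered": {
        "query": {"match_all": {}},
        "filter": {"and": [
            {"match_all": {}},
            {"range": {"modified_ts": {"gte": 1388534400000}}},
        ]},
    }},
    "from": 0,
    "size": 0,
    "sort": [],
    "facets": {"0": {"statistical": {"field": "modified_ts"}}},
}


def classify(status, body):
    """Return None if the response looks healthy, else a short reason string.

    Hypothetical helper: the failure criteria are one interpretation of
    "timeout, bad response, or partial failure" from the report.
    """
    if status != 200:
        return f"bad response: HTTP {status}"
    try:
        doc = json.loads(body)
    except ValueError:
        return "bad response: body is not JSON"
    if not isinstance(doc, dict) or not isinstance(doc.get("_shards"), dict):
        return "bad response: missing _shards section"
    if doc.get("timed_out"):
        return "timeout: search reported timed_out"
    shards = doc["_shards"]
    if shards.get("failed", 0) > 0 or shards.get("successful") != shards.get("total"):
        return "partial failure: not all shards answered"
    return None


def check_node(url, timeout=30.0):
    """POST the query to one node; return None if healthy, else a reason string."""
    request = urllib.request.Request(
        url, data=json.dumps(QUERY).encode("utf-8"), method="POST"
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify(response.status, response.read().decode("utf-8", "replace"))
    except urllib.error.HTTPError as exc:
        return classify(exc.code, "")
    except OSError as exc:  # covers URLError, refused connections, socket timeouts
        return f"timeout or connection error: {exc}"


if __name__ == "__main__":
    for node_url in PUBLIC_NODES + PRIVATE_NODES:
        reason = check_node(node_url)
        print(node_url, "OK" if reason is None else f"NEEDS REBOOT ({reason})")
```

Keeping `classify` separate from the network call means the failure criteria can be adjusted and tested without touching a live cluster.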
Clusters restarted. Will see about the monitoring bits.
Assignee: server-ops-webops → klibby
Hi Kyle & Kendall, I am only seeing 19024 total documents. -Harsha
The public cluster is still down (http://elasticsearch[123].bugs.scl3.mozilla.com:9200). Did it go down again, or did it never come back up?
More heap errors. I've restarted it and will check with :phrawzty on how to add the index.cache.field.type setting. :solarce may be upgrading another ES cluster tomorrow morning; if he does and it goes well, I'll likely ping you about upgrading these two.
Blocks: 959670
No longer blocks: 922877
All of the ES nodes are now monitored by nagios, including ES health checks.
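The bug does not say how the Nagios ES health checks were implemented. As a rough illustration only, a Nagios-style plugin could map the `status` field returned by Elasticsearch's `_cluster/health` endpoint onto the standard plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function names, port 9200, the 10-second timeout, and the green/yellow/red mapping are assumptions for this sketch.

```python
import json
import sys
import urllib.request

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def health_to_nagios(body):
    """Map a _cluster/health JSON body to (exit_code, message).

    Hypothetical mapping: green -> OK, yellow -> WARNING, red -> CRITICAL.
    """
    try:
        status = json.loads(body).get("status")
    except (ValueError, AttributeError):
        return UNKNOWN, "UNKNOWN - unparseable health response"
    if status == "green":
        return OK, "OK - cluster status green"
    if status == "yellow":
        return WARNING, "WARNING - cluster status yellow"
    if status == "red":
        return CRITICAL, "CRITICAL - cluster status red"
    return UNKNOWN, f"UNKNOWN - unexpected status {status!r}"


def main(host):
    """Query one node's cluster health, print a status line, return the exit code."""
    url = f"http://{host}:9200/_cluster/health"
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            code, message = health_to_nagios(response.read().decode("utf-8", "replace"))
    except OSError as exc:  # unreachable node, HTTP error, or timeout
        code, message = CRITICAL, f"CRITICAL - cannot reach {url}: {exc}"
    print(message)
    return code


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("UNKNOWN - usage: check_es_health.py HOST")
        sys.exit(UNKNOWN)
    sys.exit(main(sys.argv[1]))
```

A check like this reports cluster state but would not by itself catch the heap errors described earlier in this bug, which is why a search-based probe may still be useful alongside it.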
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Component: WebOps: Bugzilla → Infrastructure
Product: Infrastructure & Operations → bugzilla.mozilla.org