Closed
Bug 963245
Opened 12 years ago
Closed 11 years ago
MDN: ElasticHttpError: Non-OK response returned (500):
Categories
(Infrastructure & Operations Graveyard :: WebOps: Community Platform, task)
x86
macOS
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: groovecoder, Unassigned)
Details
ElasticHttpError: Non-OK response returned (500): u'SearchPhaseExecutionException[Failed to execute phase [query], total failure; shardFailures {[_na_][mdnprod-main_index][0]: No active shards}{[_na_][mdnprod-main_index][1]: No active shards}{[_na_][mdnprod-main_index][2]: No active shards}{[_na_][mdnprod-main_index][3]: No active shards}{[_na_][mdnprod-main_index][4]: No active shards}]'
https://errormill.mozilla.org/mdn/mdn/group/154108/
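For triage, the shard failures packed into the exception text above can be pulled apart into a readable list. This is a minimal sketch that assumes the message always follows the `{[node][index][shard]: reason}` shape seen in this report; the regular expression and the `parse_shard_failures` helper are illustrative, not part of any Elasticsearch client library.

```python
import re

# One shard failure entry as it appears in the message above, for example:
# {[_na_][mdnprod-main_index][0]: No active shards}
# Assumption: every entry follows this {[node][index][shard]: reason} shape.
SHARD_FAILURE = re.compile(r"\{\[([^\]]*)\]\[([^\]]*)\]\[(\d+)\]: ([^}]*)\}")


def parse_shard_failures(message):
    """Return (node, index, shard, reason) tuples found in an error message."""
    return [
        (node, index, int(shard), reason)
        for node, index, shard, reason in SHARD_FAILURE.findall(message)
    ]


# Shortened copy of the message from this bug (two of the five shards).
message = (
    "SearchPhaseExecutionException[Failed to execute phase [query], "
    "total failure; shardFailures "
    "{[_na_][mdnprod-main_index][0]: No active shards}"
    "{[_na_][mdnprod-main_index][1]: No active shards}]"
)

failures = parse_shard_failures(message)
for node, index, shard, reason in failures:
    print(f"{index} shard {shard}: {reason}")
```

In the full message every one of the five shards of `mdnprod-main_index` reports "No active shards", which is why the search phase failed outright rather than returning partial results.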
Updated•12 years ago (Reporter)
Updated•12 years ago
Assignee: server-ops-webops → bburton
Comment 1•12 years ago
We apologize for the disruption in service. Nagios alerted the MOC about the cluster health and this was escalated to WebOps as your bug was being filed.
The ElasticSearch cluster that provides services to MDN encountered a cluster-wide issue when trying to perform garbage collection, caused by hitting its memory ceiling in combination with a problem in a particular index for another service.
I'm investigating what caused the issues with the particular index as well as what tuning and upgrades are available, as many OOM fixes are in 0.90.x and later.
https://developer.mozilla.org/en-US/search?q=string is now returning results as expected for me.
Please let me know if I can answer any questions.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Comment 2•12 years ago (Reporter)
This is back. :(
https://errormill.mozilla.org/mdn/mdn/group/146983/
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 3•12 years ago
(In reply to Luke Crouch [:groovecoder] from comment #2)
> This is back. :(
>
> https://errormill.mozilla.org/mdn/mdn/group/146983/
Still working on fixing the other index, which is causing some cluster instability. I will leave this open until it is fully resolved, but the mdn_prod index is now green.
Comment 4•12 years ago
Cluster status is now green.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
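The "green" status mentioned in the comments is the value reported by Elasticsearch's `_cluster/health` endpoint, which returns a JSON document with a `status` field of `green`, `yellow`, or `red`. The sketch below shows one way to check it from a script; the base URL passed to `fetch_health` is a placeholder, and the helper names are hypothetical rather than taken from the MDN or WebOps tooling.

```python
import json
from urllib.request import urlopen


def health_status(payload):
    """Extract the status string from a cluster health JSON document."""
    return json.loads(payload)["status"]


def is_serving(status):
    """Return True when all primary shards are allocated.

    green: all primary and replica shards are allocated.
    yellow: all primaries are allocated, some replicas are not.
    red: at least one primary shard is not allocated.
    """
    return status in ("green", "yellow")


def fetch_health(base_url, timeout=5):
    """Fetch the raw health document; base_url is a placeholder value."""
    with urlopen(f"{base_url}/_cluster/health", timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Offline example using a hand-written sample payload.
sample = '{"cluster_name": "example", "status": "red"}'
status = health_status(sample)
print(status, "serving" if is_serving(status) else "not serving")
```

Against a live cluster this would be called as `health_status(fetch_health("http://es.example.internal:9200"))`, where the host name is an assumption for illustration only.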
Comment 5•12 years ago
This occurred again @ 12:44 pm PST.
Solarce is currently investigating.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 6•12 years ago
Today's issue has been resolved; the mdn index was only unavailable for a few minutes.
Bug 963824 is tracking the plan for a permanent resolution of the issue.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Comment 7•12 years ago (Reporter)
Getting these errors again.
https://rpm.newrelic.com/accounts/263620/applications/3172075/traced_errors/1484382839
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 8•12 years ago
Looks to be related to bug 993671. :cyliang comments that the permanent fix is an ES upgrade. I know there is a bug tracking that, but off the top of my head I don't have context on it.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Comment 9•12 years ago (Reporter)
Again: https://rpm.newrelic.com/accounts/263620/applications/3172075/traced_errors/1492808873
Do we need an ES reboot?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 10•12 years ago (Reporter)
Fixed by :cyliang.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Comment 11•12 years ago
ES cosseted and rebooted. The upgrade bug in question is 963824. Discussions (outside of Bugzilla) about ameliorating this issue in other ways have started.
Comment 12•12 years ago (Reporter)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 13•12 years ago
Continued issues with 0.20.x GC pauses are affecting uptime. Due to the shared nature and mixed use of the SCL3 cluster, we are moving forward with giving MDN its own cluster for prod; this is being tracked in bug 995457.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Comment 14•11 years ago
This happened again today. (For future reference, bugs 1000488, 1000489, 1000490, and 1000493 have the Nagios alert texts.)
Comment 15•11 years ago (Reporter)
And again today. Can we get another restart? https://rpm.newrelic.com/accounts/263620/applications/3172075/traced_errors/1567702582
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 16•11 years ago
Passing to :cyliang, as she's been working on the final resolution.
Assignee: bburton → server-ops-webops
Severity: blocker → normal
Status: REOPENED → RESOLVED
Closed: 12 years ago → 11 years ago
Resolution: --- → FIXED
Updated•7 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard