Bug 1380128
Opened 7 years ago
Closed 7 years ago
SuperSearch partial downtime July 11 2017
Categories: Socorro :: Webapp (task)
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: peterbe; Assignee: Unassigned
Details
We currently have memory bloat in the production Elasticsearch (ES) cluster. At the time of writing, 1 of 20 shards is failing. See https://sentry.prod.mozaws.net/operations/socorro-prod/issues/622977/ (ConnectionErrors) and https://sentry.prod.mozaws.net/operations/socorro-prod/issues/344569/ (the monitoring health check detecting failing shards).
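A minimal sketch of how a failing shard shows up in a query result, assuming the standard `_shards` block (`total`, `successful`, `failed`) that Elasticsearch includes in search responses. The function name `shard_health` is hypothetical and is not Socorro's actual health check. The `missing_fraction` value is the share of shards that did not answer; it only approximates the share of missing crashes if documents are spread evenly across shards.

```python
def shard_health(response: dict) -> dict:
    """Summarise the `_shards` block of an Elasticsearch search response.

    Hypothetical helper for illustration; not part of Socorro.
    """
    shards = response.get("_shards", {})
    total = shards.get("total", 0)
    successful = shards.get("successful", 0)
    failed = shards.get("failed", 0)
    # Share of shards that did not contribute results (0.0 if no shards reported).
    missing_fraction = failed / total if total else 0.0
    return {
        "total": total,
        "successful": successful,
        "failed": failed,
        "healthy": failed == 0,
        "missing_fraction": missing_fraction,
    }


# Example response fragment matching the situation described above: 1 of 20 shards failing.
example = {"_shards": {"total": 20, "successful": 19, "failed": 1}}
print(shard_health(example))
# → {'total': 20, 'successful': 19, 'failed': 1, 'healthy': False, 'missing_fraction': 0.05}
```

With 1 of 20 shards failing, roughly 1/20th of matching documents would be absent from query results, which matches the warning in the status message.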
Comment 1 (Reporter) • 7 years ago
I've posted a Status Message warning ("The ElasticSearch cluster is currently partially failing. See Bug 1380128. Some SuperSearch reporting yields 1/20th too few crashes."). Note to self: status messages are managed at https://crash-stats.mozilla.com/admin/status/
Comment 2 (Reporter) • 7 years ago
The health check reports that the cluster is working again: https://crash-stats.mozilla.com/monitoring/healthcheck/
Comment 3 (Reporter) • 7 years ago
The status message is now disabled, and the partial outage has been announced on the stability list. All is well again.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Comment 4 (Reporter) • 7 years ago
Interesting side note: the Custom search (which is how I can see the _shards counts via the webapp) seems to cache responses. I bypassed the cache by adding another index to the default custom query.
Comment 5 • 7 years ago
This is why we can't have nice things: https://pageshot.net/P33ar2CBs7bScZdM/app.datadoghq.com
Comment 6 (Reporter) • 7 years ago
(In reply to Miles Crabill [:miles] from comment #5)
> This is why we can't have nice things:
> https://pageshot.net/P33ar2CBs7bScZdM/app.datadoghq.com

Part of me optimistically dreams that once we're on ES 5, we'll have much better control of the JVM heap, i.e. that ES handles it better for us.