We currently have memory bloat in the ES cluster on prod. At the time of writing, 1 of 20 shards is failing. See https://sentry.prod.mozaws.net/operations/socorro-prod/issues/622977/ (ConnectionErrors) and https://sentry.prod.mozaws.net/operations/socorro-prod/issues/344569/ (the monitoring health check noticing some shards failing).
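For reference, a minimal sketch of how the failing-shard count can be read out of a search response. It assumes the response carries the standard Elasticsearch `_shards` section (`total`, `successful`, `failed`). The `summarize_shards` helper and the example payload are hypothetical, and treating 1 failed shard out of 20 as roughly 1/20th of the data missing assumes documents are spread evenly across shards.

```python
def summarize_shards(response):
    """Return (failed, total, fraction_failed) from an ES search response dict."""
    shards = response["_shards"]
    total = shards["total"]
    # Fall back to total - successful if "failed" is absent from the payload.
    failed = shards.get("failed", total - shards["successful"])
    fraction = failed / total if total else 0.0
    return failed, total, fraction


# Hypothetical response body, trimmed to the relevant section.
example = {"_shards": {"total": 20, "successful": 19, "failed": 1}}

failed, total, fraction = summarize_shards(example)
print(f"{failed} of {total} shards failing ({fraction:.0%} of shards)")
```

A result with `failed > 0` means the hit counts in that response are likely an undercount, which is what the health check is flagging.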
I've posted a Status Message warning ("The ElasticSearch cluster is currently partially failing. See Bug 1380128 Some SuperSearch reporting yields 1/20th too few crashes."). Note to self: https://crash-stats.mozilla.com/admin/status/
The healthcheck reports that it is working again: https://crash-stats.mozilla.com/monitoring/healthcheck/
The status message is now disabled. The partial outage was announced to the stability list. All is well again.
Interesting side note: the Custom search (which is how I can see the _shards counts via the webapp) seems to cache responses. I avoided the caching by adding another index to the default custom query.
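I have not checked how the webapp actually builds its cache keys, so the following is only a sketch of why adding an index would sidestep the cache. It assumes the key is derived from the full request, i.e. the index list plus the query body. The `cache_key` function and the index names are hypothetical.

```python
import hashlib
import json


def cache_key(indices, query):
    """Build a deterministic cache key from the index list and query body.

    Assumption: the cache keys on the whole request. The real webapp may
    key on something else entirely.
    """
    payload = json.dumps(
        {"indices": sorted(indices), "query": query}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


query = {"query": {"match_all": {}}}

# Hypothetical index names, for illustration only.
default_key = cache_key(["index_a"], query)
busted_key = cache_key(["index_a", "index_b"], query)

print(default_key != busted_key)
```

Under that assumption, any change to the index list produces a different key, so the request misses the cached response and goes to the cluster again.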
This is why we can't have nice things: https://pageshot.net/P33ar2CBs7bScZdM/app.datadoghq.com
(In reply to Miles Crabill [:miles] from comment #5)
> This is why we can't have nice things:
> https://pageshot.net/P33ar2CBs7bScZdM/app.datadoghq.com

Part of me optimistically dreams that once we're in ES 5, we'll have much better control of the JVM heap. I.e. that ES does that better for us.