Closed Bug 1380128 Opened 7 years ago Closed 7 years ago

SuperSearch partial downtime July 11 2017

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: peterbe, Unassigned)

Details

Peter Bengtsson [:peterbe]

Reporter

Description

•

7 years ago

We currently have memory bloat in the Elasticsearch (ES) cluster on prod.
At the time of writing, 1 of 20 shards is failing.

See https://sentry.prod.mozaws.net/operations/socorro-prod/issues/622977/ (ConnectionErrors) and https://sentry.prod.mozaws.net/operations/socorro-prod/issues/344569/ (monitoring health check noticing some shards failing)
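
For reference, a shard failure like this shows up in the `_shards` section of an Elasticsearch search response, which reports `total`, `successful`, and `failed` counts. A minimal sketch of a check over that section follows. The function name and the sample response are made up for illustration and are not Socorro's actual healthcheck:

```python
def shard_health(response):
    """Summarize the `_shards` section of an Elasticsearch search response.

    `response` is the decoded JSON body. Returns a (healthy, message) tuple.
    """
    shards = response.get("_shards", {})
    total = shards.get("total", 0)
    successful = shards.get("successful", 0)
    failed = shards.get("failed", 0)
    # Healthy only if every shard answered and none reported a failure.
    healthy = total > 0 and failed == 0 and successful == total
    message = f"{successful}/{total} shards successful, {failed} failed"
    return healthy, message


# Hypothetical response body resembling the situation in this bug.
sample = {
    "_shards": {"total": 20, "successful": 19, "failed": 1},
    "hits": {"total": 0, "hits": []},
}
healthy, message = shard_health(sample)
print(healthy, message)
```

A check along these lines only reports what a single query saw, so it would presumably need to run periodically to notice recovery.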

Peter Bengtsson [:peterbe]

Reporter

Comment 1

•

7 years ago

I've posted a Status Message warning:
("The ElasticSearch cluster is currently partially failing. See Bug 1380128 Some SuperSearch reporting yields 1/20th too few crashes.")
note-to-self: https://crash-stats.mozilla.com/admin/status/
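
To make the "1/20th too few" figure concrete: if crashes are spread roughly evenly across shards (an assumption, since real distributions vary), a query answered by 19 of 20 shards sees about 95% of the matching documents. A rough sketch of that arithmetic, with hypothetical function names:

```python
from fractions import Fraction


def missing_fraction(total_shards, failed_shards):
    """Fraction of documents expected to be missing, assuming even distribution."""
    return Fraction(failed_shards, total_shards)


def estimate_true_count(observed, total_shards, failed_shards):
    """Scale an observed hit count up to an estimate of the full count."""
    successful = total_shards - failed_shards
    if successful <= 0:
        raise ValueError("no successful shards to extrapolate from")
    return observed * total_shards / successful


print(missing_fraction(20, 1))          # 1/20
print(estimate_true_count(950, 20, 1))  # 1000.0
```

The extrapolation is only an estimate. It cannot recover which crashes are missing, only roughly how many.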

Peter Bengtsson [:peterbe]

Reporter

Comment 2

•

7 years ago

The healthcheck reports that it is working again: https://crash-stats.mozilla.com/monitoring/healthcheck/

Peter Bengtsson [:peterbe]

Reporter

Comment 3

•

7 years ago

The status message is now disabled. The partial outage was announced to the stability list.

All is well again.

Status: NEW → RESOLVED

Closed: 7 years ago

Resolution: --- → FIXED

Peter Bengtsson [:peterbe]

Reporter

Comment 4

•

7 years ago

Interesting side note: the Custom search (which is how I can see the _shards counts via the webapp) seems to cache responses. I avoided the caching by adding another index to the default custom query.
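
One plausible explanation, offered as an assumption rather than a description of the webapp's actual code: if responses are cached under a key derived from the query itself, then any change to the query, such as an extra index, produces a different key and therefore a cache miss. A minimal sketch with hypothetical names (`cache_key`, `cached_search`, `fake_search`, and the index names are all illustrative):

```python
import hashlib
import json

_cache = {}


def cache_key(indices, query):
    """Build a deterministic key from the index list and the query body."""
    payload = json.dumps({"indices": sorted(indices), "query": query}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached_search(indices, query, run_search):
    """Return a cached response when the same (indices, query) pair was seen before."""
    key = cache_key(indices, query)
    if key not in _cache:
        _cache[key] = run_search(indices, query)
    return _cache[key]


calls = []


def fake_search(indices, query):
    # Stand-in for a real Elasticsearch call; records each invocation.
    calls.append(list(indices))
    return {"_shards": {"total": 20, "successful": 20, "failed": 0}}


query = {"query": {"match_all": {}}}
cached_search(["index_a"], query, fake_search)
cached_search(["index_a"], query, fake_search)             # served from cache
cached_search(["index_a", "index_b"], query, fake_search)  # new key, new call
print(len(calls))  # 2
```

Under that assumption, adding an index works as a cache bypass only because it changes the key, not because it disables caching.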

Miles Crabill [:miles]

Comment 5

•

7 years ago

This is why we can't have nice things: https://pageshot.net/P33ar2CBs7bScZdM/app.datadoghq.com

Peter Bengtsson [:peterbe]

Reporter

Comment 6

•

7 years ago

(In reply to Miles Crabill [:miles] from comment #5)
> This is why we can't have nice things:
> https://pageshot.net/P33ar2CBs7bScZdM/app.datadoghq.com

Part of me optimistically dreams that once we're on ES 5, we'll have much better control of the JVM heap, i.e. that ES manages it better for us.

Categories

(Socorro :: Webapp, task)

Security

(public)
