Increase number of replicas for our elasticsearch database

RESOLVED WONTFIX

Status

Socorro
Infra
RESOLVED WONTFIX
3 years ago
3 years ago

People

(Reporter: adrian, Unassigned)

Description

3 years ago
Jake suggested in an email that we raise the number of replicas in our elasticsearch indices from 1 to 2, meaning we would have 3 copies of each document instead of the current 2.
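For reference, a minimal sketch of what the change would look like, assuming it is applied through the index settings API (`PUT /<index-name>/_settings` with `index.number_of_replicas`). The helper names below are hypothetical and not part of Socorro; the code only builds the request body and does not contact a cluster.

```python
import json


def replica_settings_body(number_of_replicas):
    """Build the JSON body for an index settings update.

    Hypothetical helper for illustration; not taken from Socorro.
    """
    if number_of_replicas < 0:
        raise ValueError("number_of_replicas must be >= 0")
    return json.dumps({"index": {"number_of_replicas": number_of_replicas}})


def copies_per_document(number_of_replicas):
    """Each document has one primary copy plus one copy per replica."""
    return 1 + number_of_replicas


# Body that would accompany: PUT /<index-name>/_settings
body = replica_settings_body(2)
```

With 1 replica, `copies_per_document` gives 2; with 2 replicas it gives 3, which matches the numbers above.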

There are several questions here: 

1. How much do we care about data consistency in elasticsearch? (As long as we have the data in our primary storage, we can always backfill anything missing, but that is a pain to do and preferable to avoid.)
2. We currently have a LOT of free disk space in our cluster thanks to the new machines and a reconfiguration of the old ones. However, we are planning to process 100% of our crashes in the near future (we currently process only about 12%). How would having 2 replicas impact data size? How many more machines would we have to add to the cluster to support it? 
3. Other questions? 
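As a starting point for question 2, here is a rough sketch of the storage arithmetic. It assumes index size scales linearly with the number of copies and with the volume of indexed crashes. The function names and the 10 TB figure are placeholders for illustration, not measurements from our cluster.

```python
def storage_growth_factor(current_replicas, new_replicas):
    """Ratio of total copies after the change to total copies before it."""
    return float(1 + new_replicas) / (1 + current_replicas)


def projected_size(current_size_tb, current_replicas, new_replicas,
                   volume_multiplier=1.0):
    """Estimate total index size after changing the replica count.

    volume_multiplier models a change in how many crashes get indexed
    (1.0 means the indexed volume stays the same). All inputs are
    assumptions supplied by the caller.
    """
    factor = storage_growth_factor(current_replicas, new_replicas)
    return current_size_tb * factor * volume_multiplier


# Placeholder example: a 10 TB index set going from 1 replica to 2.
estimate_tb = projected_size(10.0, 1, 2)
```

Going from 1 replica to 2 multiplies disk usage by 1.5 under these assumptions. The `volume_multiplier` argument only matters if the indexed volume actually grows along with the processing rate.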

I am bringing this discussion to a public place so that anyone with an opinion can jump in. Please comment and discuss this issue so we can make an informed decision.

Comment 1

3 years ago
From email:

"Note that we currently index the dump twice, the pipe dump and the json dump. We will stop indexing the pipe dump at some point, hopefully soon, so the average size of documents will decrease by a significant factor."

Any guesses on the size of this, or the timeline on doing it?

Comment 2

3 years ago
100% processing is unlikely to impact ES. 

We spec'd storing 100% for a baseline, but our current plan is to throttle after processing. We'll write down UUID + support classifier output in a separate, short-lived table for 100%, but throttle recording the crash in the primary Socorro data stores.

What is the impetus for having two replicas instead of one? Is it purely a redundancy issue, or are there other operational concerns or service-level advantages that should be spelled out?

Closing due to bug 1118471.
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → WONTFIX