OrangeFactor has its own Elasticsearch instance, and at the moment I don't know whether it's being backed up. Could someone confirm whether it is, and if not, what the options are (e.g. VM storage backup vs. an Elasticsearch index backup tool)?
The nodes are:
orangefactor-elasticsearch1.webapp.scl3.mozilla.com
orangefactor-elasticsearch2.webapp.scl3.mozilla.com
orangefactor-elasticsearch3.webapp.scl3.mozilla.com
orangefactor-elasticsearch4.webapp.scl3.mozilla.com
orangefactor-elasticsearch5.webapp.scl3.mozilla.com
Many thanks!
None of the hosts referenced are being directly backed up. The best option for Elasticsearch is the index backup tool. If you or someone else can write a script to back up the Elasticsearch data, and let me know where the data is being backed up to, I can take it from there.
In the puppet repo I see there's a newlasticsearch module with various backup/snapshot features that are used by other ES clusters. Presumably this is what we need? Do you know who might be able to set this up?
My best guess would be "Data & BI Services Team"
Thank you - moving there. In the puppet repo's newlasticsearch module, I see:
- newlasticsearch::auto_snapshot
- newlasticsearch::auto_snapshot_job
- newlasticsearch::auto_knapsack_job
- newlasticsearch::snapshot_repo
- newlasticsearch::backup
- newlasticsearch::backup2
I'm struggling to work out what the difference is between them, and which we want. Elmo appears to use both newlasticsearch::auto_knapsack_job and newlasticsearch::auto_snapshot_job (see modules/elmo/manifests/search/prod_snapshots.pp). Backing up an ES instance feels like it should be a solved problem at Mozilla, given we have several clusters containing data we presumably care about? Eric, I see you adjusted some of the Elmo ES parts a while back. Do you have any ideas? Many thanks :-)
This would be a webops thing. I believe older versions of knapsack are compatible with OrangeFactor's old ES. OrangeFactor uses the orange_factor puppet module, which in turn uses newlasticsearch. It does not appear to do any snapshots/backups at the moment. If you'd like us to turn that on, just let us know the priority so we can schedule it in a sprint.
Ah thank you for the clarification. Yes please, can we enable backups of OrangeFactor within the next week or two, if possible? (Just since the RRA called it out specifically). Many thanks :-)
The cluster in comment 0 has two indices, `bugs` and `bzcache`. It's only really the former that we need a backup of (since `bzcache` will be automatically regenerated), though both is fine if easier, since `bzcache` is only 6MB compared to the 500MB of `bugs`.
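For reference, a minimal sketch of what a script-based dump of `bugs` could look like. It only covers the serialisation step: turning search hits into bulk-format lines that could later be re-imported. How the hits are fetched from this cluster is deliberately left out, the function names and the `bug_info` type name are hypothetical, and none of this has been tried against this cluster's ES version.

```python
import json


def hits_to_bulk_lines(hits):
    """Yield bulk-format lines for each search hit.

    Each hit produces two lines: an "index" action line carrying the
    index, type and id, followed by the document source.
    """
    for hit in hits:
        action = {
            "index": {
                "_index": hit["_index"],
                "_type": hit["_type"],
                "_id": hit["_id"],
            }
        }
        yield json.dumps(action, sort_keys=True)
        yield json.dumps(hit["_source"], sort_keys=True)


def write_dump(hits, path):
    """Write hits to `path` in bulk format; return the number of documents."""
    line_count = 0
    with open(path, "w") as dump_file:
        for line in hits_to_bulk_lines(hits):
            dump_file.write(line + "\n")
            line_count += 1
    # Two lines are written per document (action + source).
    return line_count // 2
```

A file in this shape is plain newline-delimited JSON, so it can be copied off the nodes by whatever backs up ordinary files.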
Thanks for the info, we will schedule this work for our next 2-week sprint which starts June 8th.
hey Ed, Just to set expectations here - we might not have time to get to this in Q2 since we're working on upgrading the other ES clusters. If we do pick it up, we'll let you know (you'll see the bug being assigned etc) but right now it doesn't look like we can pick up this work in Q2.
Ok, thank you for the update.
The version of knapsack we have in our repos is not compatible with this very old version of Elasticsearch, and the newer method of ES exports isn't supported by this version either. I looked into getting an older version of knapsack, but there isn't a pre-compiled one, so it has to be built with Maven. I attempted that but failed; I don't know Java well and don't really know Maven at all. Doing this properly is a large chunk of work (it involves more than just building the extension; I also looked at a hacky workaround but couldn't get that going either), and given how old this cluster is, we're going to politely wontfix this. I had hoped it would be easier, but it turned into quite a bit more work than expected.
Sounds like a massive pain - thank you for trying anyway, and totally understand the wontfix!