Closed Bug 1239365 Opened 8 years ago Closed 8 years ago

Move to hosted ES for socorro

Categories

(Socorro :: Infra, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: jschneider, Unassigned)

Details

Attachments

(1 file)

Adrian has requested we resize the staging socorro ES cluster to be closer to prod so he can test in staging, and this is a fantastic idea.

While we were talking about it, Lonnen floated the idea of moving to hosted ES.

Let's move to hosted ES!
Phrawzty, I wondered if I might get a consultation from you about this, just to fill in any gaps in my knowledge about the design of our cluster/ES processing.  

I'm interested in validating that a move to hosted ES doesn't lose us any custom fancypants stuff we might be using in our current env, and want to think a bit about how we move all the data.
Flags: needinfo?(dmaher)
Here's a simple plan to move to a new ES stage cluster: 

1. Get the new ES cluster set up
2. Point processors to both the existing and the new clusters
3. Let data flow in for about a week
4. Switch webapp to the new cluster

This way, when we switch we will have enough data to test our reports consistently; if anything goes wrong we can just switch back; and we do not need to backfill any data, because for stage we don't really care about data older than a week.

Note that this does not apply to production: we would have to run a pretty long and intense backfill for prod. The part where we set up the processors to send to both clusters still applies, though.
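
To make step 2 of the plan concrete, here is a minimal sketch of a dual-write wrapper. It is not Socorro's actual storage code: `DualWriteStorage`, `InMemoryStore` and the `save()` signature are hypothetical names chosen for illustration. The assumption is that a failed write to the new cluster should be logged rather than fail the processor, since the existing cluster stays the source of truth until the webapp is switched over.

```python
import logging

logger = logging.getLogger(__name__)


class DualWriteStorage:
    """Hypothetical wrapper: send each document to the old and the new cluster."""

    def __init__(self, primary, secondary):
        self.primary = primary      # existing cluster, still the source of truth
        self.secondary = secondary  # new cluster being evaluated

    def save(self, crash_id, document):
        # A failure on the existing cluster propagates as it does today.
        self.primary.save(crash_id, document)
        try:
            self.secondary.save(crash_id, document)
        except Exception:
            # The new cluster is on trial: log the failure and carry on.
            logger.exception("write to new cluster failed for %s", crash_id)
            return False
        return True


class InMemoryStore:
    """Stand-in for an ES client, only here to make the sketch runnable."""

    def __init__(self, fail=False):
        self.docs = {}
        self.fail = fail

    def save(self, crash_id, document):
        if self.fail:
            raise IOError("simulated write failure")
        self.docs[crash_id] = document
```

With this shape, switching back is just a matter of dropping the wrapper: the existing cluster never stopped receiving data.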
Adrian's commentary is sound. Regarding hosted ES, do you mean a vanilla ES cluster hosted by some third party, or do you mean the AWS Elasticsearch Service[0]?

In the former case, we'd need to ensure that our model can be implemented in their environment, or that their proposed model fits our needs (re: roles). In the latter, uh, it's a black box so ¯\_(ツ)_/¯.

In either case, we'd also need to make sure that the template(s) we use is/are still functional in the target environment.  Finally - and this is probably the aspect most likely to break - we'd need to verify Groovy[1] functionality.
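
One rough way to check that a template still behaves in the target environment is to fetch the resulting index mapping from both clusters and compare field types. The sketch below only covers the comparison, on plain dicts shaped like the `properties` section of an ES mapping; fetching the mappings is left out, and the helper names and any field names used with it are made up for illustration.

```python
def flatten_mapping(properties, prefix=""):
    """Flatten a mapping 'properties' dict into {dotted.field.name: type}."""
    flat = {}
    for name, spec in properties.items():
        path = prefix + name
        if "properties" in spec:
            # Object field: recurse into its sub-fields.
            flat.update(flatten_mapping(spec["properties"], path + "."))
        else:
            flat[path] = spec.get("type")
    return flat


def diff_mappings(current, target):
    """Return {field: (type_in_current, type_in_target)} for every mismatch.

    A field missing on one side shows up with None on that side.
    """
    a = flatten_mapping(current)
    b = flatten_mapping(target)
    return {
        field: (a.get(field), b.get(field))
        for field in sorted(set(a) | set(b))
        if a.get(field) != b.get(field)
    }
```

An empty result means the two mappings agree on every field type; it says nothing about analyzers or other index settings, which would need their own check.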


[0] https://aws.amazon.com/elasticsearch-service/
[1] https://github.com/mozilla/socorro/blob/ffd43518deb4a6e30ac21a656f671870690a2c98/docs/development/elasticsearch.rst#installation
Flags: needinfo?(dmaher)
Good news, phrawzty, as of bug 1237290 we do not need Groovy anymore! :)

I'll update the docs.
Attachment #8712677 - Flags: review?(jschneider)
This was applied to stage a few minutes ago.
Attachment #8712677 - Flags: review?(jschneider)
This has been rolled back: we failed to write to the new ES cluster because of some permission issues in AWS, as far as I understood. We are going to give it another go next Monday.
The issue here is that AWS does not place the ES cluster servers in the VPC, so we can't limit addresses to the internal IP addresses like we can for S3 or other services.

One option is to use NAT and route the traffic meant for ES through that NAT. Another is to sign requests with an AWS key/secret for auth. I'm talking to AWS about this and about their roadmap for these security features.
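
For reference, the request-signing option would look roughly like the sketch below, which builds AWS Signature Version 4 headers using only the standard library. It assumes the hosted service accepts SigV4 with the service name `es`, and it handles only the simple case of a request with no query string and an already URI-encoded path. The function name, host and credentials are placeholders; in practice an existing AWS SDK or signing library would be the safer choice.

```python
import datetime
import hashlib
import hmac


def _hmac(key, msg):
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sigv4_headers(method, host, path, access_key, secret_key,
                  region="us-east-1", service="es", payload=b"", now=None):
    """Return the headers to add to a request to sign it (sketch only)."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")

    # Step 1: canonical request (empty query string assumed).
    payload_hash = hashlib.sha256(payload).hexdigest()
    canonical_headers = "host:%s\nx-amz-date:%s\n" % (host, amz_date)
    signed_headers = "host;x-amz-date"
    canonical_request = "\n".join(
        [method, path, "", canonical_headers, signed_headers, payload_hash])

    # Step 2: string to sign.
    scope = "%s/%s/%s/aws4_request" % (date_stamp, region, service)
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest()])

    # Step 3: derive the signing key from the secret, then sign.
    key = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    for part in (region, service, "aws4_request"):
        key = _hmac(key, part)
    signature = hmac.new(
        key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()

    authorization = (
        "AWS4-HMAC-SHA256 Credential=%s/%s, SignedHeaders=%s, Signature=%s"
        % (access_key, scope, signed_headers, signature))
    return {"x-amz-date": amz_date, "Authorization": authorization}
```

The trade-off against the NAT option is that every client writing to ES (processors, webapp, any cron jobs) would need to carry credentials and sign each request, whereas NAT keeps the clients unchanged but adds a network hop to maintain.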
Hey JP, any news from this front?
Flags: needinfo?(jschneider)
We are looking at where all of the data winds up going, and I'm not sure we want to retool (rearrange chairs on the deck of the Titanic, so to speak).

We are improving monitoring and alerting on the app to watch the middleware and Elasticsearch clusters more reliably, and adding monitoring back to Elasticsearch itself.

It does not seem like we are continuing down this path, so closing as wontfix.  If I'm wrong, and this is still desired, feel free to reopen.
Status: NEW → RESOLVED
Closed: 8 years ago
Flags: needinfo?(jschneider)
Resolution: --- → WONTFIX