Closed Bug 1361503 Opened 8 years ago Closed 8 years ago

[Tracker] Switching symbols S3 bucket

Categories

(Socorro :: Symbols, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: peterbe, Unassigned)

References

()

Details

We need to switch which S3 buckets we use for storing and retrieving symbols. Please see the attached URL to a Google Doc. It's shared (for editing) with all Mozillian staff. (Google Docs are better for editing than a bug that slowly morphs across multiple comments.)
Miles, see the section about "Plan A" in the Google Doc. Can you confirm that we can set up replication between webeng's org.mozilla.crash-stats.symbols-public and org.mozilla.crash-stats.symbols-private and Cloud Ops's public and private symbols buckets? What are the caveats and constraints?
0. First of all, is it possible?
1. If we write to the webeng bucket, will it appear in the equivalent Cloud Ops bucket?
2. If we write to the Cloud Ops bucket, will it appear in the equivalent webeng bucket?
3. How big is the delay between writing to one bucket and it appearing in the other (in the other org)?
Flags: needinfo?(miles)
0: Yes.
1-2: CRR (cross-region replication) is one-directional, and for security reasons the only way we want to use it is ops=>webeng. Writes to the ops bucket will be replicated to the webeng bucket.
3: Not sure; it needs to be tested. Should be short.
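For reference, here is a minimal sketch of what a one-way ops=>webeng replication rule might look like, expressed as the JSON document that S3's replication API accepts (for example via `aws s3api put-bucket-replication --replication-configuration` or boto3's `put_bucket_replication`). The configuration is attached to the source (ops) bucket, whose name the bug does not give; the IAM role ARN and both account IDs below are hypothetical placeholders. Both buckets also need versioning enabled before S3 will accept a replication rule.

```python
import json


def build_replication_config(role_arn: str, dest_bucket: str, dest_account: str) -> dict:
    """Return a one-rule S3 replication configuration.

    The config is attached to the *source* bucket. Objects flow only from that
    source to dest_bucket; nothing is ever copied back the other way.
    """
    return {
        # IAM role S3 assumes to read from the source and write to the destination.
        "Role": role_arn,
        "Rules": [
            {
                "ID": "ops-to-webeng-symbols",  # hypothetical rule name
                "Status": "Enabled",
                "Prefix": "",  # empty prefix: replicate every new object
                "Destination": {
                    "Bucket": f"arn:aws:s3:::{dest_bucket}",
                    # Cross-account copy: make the destination account own the replicas.
                    "Account": dest_account,
                    "AccessControlTranslation": {"Owner": "Destination"},
                },
            }
        ],
    }


if __name__ == "__main__":
    config = build_replication_config(
        role_arn="arn:aws:iam::111111111111:role/hypothetical-symbols-crr",
        dest_bucket="org.mozilla.crash-stats.symbols-public",
        dest_account="222222222222",  # hypothetical webeng account ID
    )
    print(json.dumps(config, indent=2))
```

Nothing in the rule points back at the source bucket, which is the one-directional behaviour described above: writes that land directly in the webeng bucket would not flow to ops.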
Flags: needinfo?(miles)
(In reply to Miles Crabill [:miles] from comment #2)
> 0: Yes.
> 1-2: CRR is one directional, plus the only way we want to use it is
> ops=>webeng for security concerns. Writes to the ops bucket will be
> replicated to the webeng bucket.

Today the Socorro webapp is responsible for uploading files to org.mozilla.crash-stats.symbols-public and org.mozilla.crash-stats.symbols-private. (I can go into detail about how this works.) Are you saying that, the day we enable all of this, we need to switch the webapp to Cloud Ops AWS credentials to do the uploads?
Some day Tecken is going to take over symbol upload (away from the Socorro webapp), but that is likely months away from being production grade.
I'm saying that we are not putting ops infra credentials in the webeng infra. So CRR would only go ops=>webeng. It sounds like that isn't ideal, given that the webapp currently uploads symbols to those buckets. Given that, migrating the webapp is going to be a priority for us. I'm working with Will on getting that spun up ASAP; he's dockerizing the webapp and should be done early next week, at which point we'll be able to test things out.
(In reply to Miles Crabill [:miles] from comment #4)
> I'm saying that we are not putting ops infra credentials in the webeng infra.
>
> So: CRR would only go ops=>webeng.

It's a race! Which will be first: Tecken Upload or Socorro in CloudOps?
If Tecken Upload finishes first, we can basically ask RelEng to change their URL from `https://crash-stats.mozilla.com/symbols/upload` to `https://symbols.mozilla.org/upload`, and from that day ALL new symbols should go into the NEW buckets. Even then, having CRR will still be useful, since the Socorro processor still reads from the old buckets. (Also, as a stopgap, we can make Socorro Symbol Upload redirect to symbols.mozilla.org, since I intend to manually copy over all API tokens from Socorro that have to do with symbol uploads. So if RelEng does a POST to https://crash-stats.mozilla.com/symbols/upload, it'll 301 redirect to https://symbols.mozilla.org/upload.)
If the migration of Socorro into CloudOps finishes first, we just need to change the S3 bucket names in 4 places (environment variables).
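The stopgap redirect is simple to sketch. The real crash-stats webapp would do this inside its own web framework; the standalone `http.server` handler below is only a hypothetical stand-in that answers both GET and POST on `/symbols/upload` with a redirect to `https://symbols.mozilla.org/upload`. It does not read or forward the request body.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

OLD_PATH = "/symbols/upload"
NEW_URL = "https://symbols.mozilla.org/upload"


def redirect_for(path: str, status: int = 301):
    """Return (status, location) if path is the old upload endpoint, else None."""
    # Drop any query string and trailing slash before comparing paths.
    if path.split("?", 1)[0].rstrip("/") == OLD_PATH:
        return status, NEW_URL
    return None


class UploadRedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the crash-stats upload view."""

    def _handle(self):
        target = redirect_for(self.path)
        if target is None:
            self.send_error(404)
            return
        status, location = target
        # The request body (the symbols archive) is deliberately not read.
        self.send_response(status)
        self.send_header("Location", location)
        self.send_header("Content-Length", "0")
        self.end_headers()

    do_GET = _handle
    do_POST = _handle


if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8000), UploadRedirectHandler).serve_forever()
```

One caveat on the status code: HTTP allows clients to turn a POST into a GET when following a 301, so an upload client may need to re-POST explicitly. A 308 (Permanent Redirect) forbids that method change, which is why `redirect_for` takes the status as a parameter.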
I have put a CRR policy in place that will replicate the contents of the public symbols bucket in webeng infra to an intermediary bucket in ops infra (not the future prod bucket). This is because CRR can only go from one region to another, so the initial CRR will be us-west-2=>us-west-1 and the final CRR will be us-west-2=>us-west-2. Because nothing can ever be easy. According to AWS support, the CRR process could take anywhere from a few days to a few weeks. Here we go!
Update: existing objects are actually being CRR'd to the new bucket!
Peter and I just had a conversation about the status of this migration.
Status: We have set up CRR us-west-2=>us-west-1, webeng=>ops prod. It's currently working. Tecken supports multiple buckets, which gives us flexibility.
Some points:
1. CRR does not fit our use case. It is almost entirely limited to copying objects between buckets across regions.
2. The `aws s3 sync` CLI command (a one-time sync) between two buckets (either from webeng us-west-2 or from ops us-west-1 to ops us-west-2) would probably be more appropriate for our use case, and we could run it multiple times (manually) to keep the buckets roughly in sync. Once we switch to uploading to the us-west-2 bucket, we can run a final one-directional sync (without deleting anything in the new bucket) and then forget about it.
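To make the "final one-directional sync" concrete, here is a sketch of the decision it has to make, in the spirit of `aws s3 sync s3://<old-bucket> s3://<new-bucket>` run without `--delete`. Bucket listings are modelled as plain dicts (key mapped to size and last-modified time) instead of real S3 calls, and the object keys are made up. The comparison rule (copy when missing, different size, or newer) approximates the CLI's default behaviour; it is not a description of its exact internals.

```python
from typing import Dict, List, Tuple

# A bucket listing: object key -> (size in bytes, last-modified Unix timestamp).
Listing = Dict[str, Tuple[int, float]]


def keys_to_copy(source: Listing, dest: Listing) -> List[str]:
    """Keys a one-way, no-delete sync would copy from source to dest.

    A key is copied if it is missing from dest, differs in size, or is newer
    in source. Keys that exist only in dest are never touched (no deletes).
    """
    to_copy = []
    for key, (size, mtime) in sorted(source.items()):
        existing = dest.get(key)
        if existing is None:
            to_copy.append(key)
            continue
        dest_size, dest_mtime = existing
        if size != dest_size or mtime > dest_mtime:
            to_copy.append(key)
    return to_copy


if __name__ == "__main__":
    # Hypothetical keys: old bucket vs. new bucket after the upload switch.
    old = {"v1/a.pdb/1/a.sym": (100, 1.0), "v1/b.pdb/2/b.sym": (200, 1.0)}
    new = {"v1/a.pdb/1/a.sym": (100, 1.0), "v1/c.pdb/3/c.sym": (50, 5.0)}
    print(keys_to_copy(old, new))  # prints ['v1/b.pdb/2/b.sym']
```

Because keys that exist only in the destination are ignored, symbols uploaded directly to the new bucket after the switch survive any number of re-runs, and each re-run only copies what changed since the last one.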
Peter and I had another conversation about the status of this migration. Only 1 TB out of a total of ~70 TB in the original bucket has been migrated after 2+ weeks of CRR. Given that, we are going to scrap the CRR entirely, keep the symbols buckets in mozilla-webeng, and use IAM roles to give instances in CloudOps prod access. At least for now, we are not going to switch symbols buckets. Resolving WONTFIX.
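The numbers above make the decision easy to check. Assuming CRR kept copying at the same average rate it had managed so far (1 TB in about 14 days), this small extrapolation estimates how long the remaining ~69 TB would take:

```python
def days_to_finish(copied_tb: float, total_tb: float, elapsed_days: float) -> float:
    """Days left at the observed average rate, assuming the rate stays constant."""
    rate_tb_per_day = copied_tb / elapsed_days
    return (total_tb - copied_tb) / rate_tb_per_day


if __name__ == "__main__":
    remaining = days_to_finish(copied_tb=1, total_tb=70, elapsed_days=14)
    print(f"{remaining:.0f} days (~{remaining / 365:.1f} years)")
```

That comes out to about 966 days, roughly 2.6 years. Since "2+ weeks" is a lower bound on the elapsed time, the real rate was at most this fast, so the estimate is optimistic.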
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX