Closed
Bug 943492
Opened 11 years ago
Closed 8 years ago
Sort out symbol cleanup in the S3 world
Categories
(Socorro :: Infra, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: selenamarie, Assigned: miles)
References
Details
Attachments
(1 file)
255 bytes,
text/x-vhdl
We've run into a bit of an excess breakpad symbols issue, and I'm wondering what we really ought to be keeping. Here's a look at where we're at currently:
symbols_ubuntu: 3497 (2483 prior to 2013, 1014 builds in 2013)
symbols_opensuse: 2063 (no build dates)
symbols_ffx: 6049 (1193 prior to 2013, 4856 builds in 2013)
And there's more. We've got 1.6 TB of symbols that are largely unused and don't appear to be useful for debugging. Here's a possibility Ted and I discussed in IRC:
* Keep current_release - 2 + ESR builds
* Keep 6 months of nightly/aurora/beta
Reporter | ||
Comment 1•11 years ago
From IRC: These don't need to be online:
/mnt/socorro/symbols/symbols_xr
/mnt/socorro/symbols/symbols_sunbird
Comment 2•11 years ago
Our current retention policy is "whatever Ted puts in the cleanup script": http://hg.mozilla.org/build/tools/file/tip/buildfarm/breakpad/cleanup-breakpad-symbols.py Currently we attempt to retain releases (including betas) forever and nightlies (from any branch) for 30 days.
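The policy described in this comment (releases and betas kept forever, nightlies kept for 30 days) can be sketched as a small predicate. This is a minimal illustration, not the logic of cleanup-breakpad-symbols.py: the function name and the build-type labels are hypothetical.

```python
from datetime import date, timedelta

# 30-day nightly retention, as stated in the comment above.
NIGHTLY_RETENTION = timedelta(days=30)

# Hypothetical labels; the real script may classify builds differently.
KEEP_FOREVER = {"release", "beta"}


def should_delete(build_type: str, build_date: date, today: date) -> bool:
    """Return True if symbols for this build fall outside the retention window."""
    if build_type in KEEP_FOREVER:
        return False  # releases (including betas) are retained forever
    # Everything else is treated like a nightly and expires after 30 days.
    return today - build_date > NIGHTLY_RETENTION
```

For example, `should_delete("nightly", date(2013, 1, 1), date(2013, 3, 1))` is true because the build is 59 days old, while any build labelled `"release"` is kept regardless of age.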
Comment 3•9 years ago
This doesn't block bug 1071724 currently, but we should get a handle on it after we migrate.
Comment 4•9 years ago
Beyond codifying and improving our existing retention policies, we'll need to get a handle on how cleanup is actually going to work with the symbols in S3. rhelmer, lonnen and I discussed this when we ran down the list of migration issues in November and we punted on it at the time because we can afford to let the symbol store grow for a while until we figure it out. I believe our short-term plan, if necessary, was to just spin up an EC2 VM, mount the s3 bucket using s3fs, and run the existing cleanup script against the s3 bucket (which is ridiculous but will probably work). Longer-term I'd like to write something that's using the data stored by the symbol upload API: https://github.com/mozilla/socorro/blob/9e8fec7d47fc23e51a28f07d4ee12c81c8a2d609/webapp-django/crashstats/symbols/models.py#L27-L33
Summary: Retention policy for breakpad symbols → Sort out symbol cleanup in the S3 world
Comment 5•8 years ago
We've been running for like a year without actually thinking about this, hooray for the cloud! We should probably figure out what we're going to do here. I can spend some time on this next quarter if we get a plan together. I'm going to be in SF in a few weeks if any of you would like to have a little meetup about it.
Comment 6•8 years ago
When we met in Portland we decided that the S3 storage wasn't a big problem. Cost-wise it's a fart in the wind compared to other costs such as EC2 instances. We talked about having multiple buckets with varying TTLs, but when we realized it's not a huge deal we decided to, basically, not worry about it. However, we decided we could set *a* TTL for the bucket. One that is really long so it acts as a "common denominator" even for the OS symbols worth keeping for a long time. Did we ever do that?
Comment 7•8 years ago
Ted, I checked https://console.aws.amazon.com/s3/home?region=us-west-2#&bucket=org.mozilla.crash-stats.symbols-public&prefix= and there is no Lifecycle set up there. So can you decide what's the minimum amount of time we (read: JP) can set as a TTL for deletion of symbols that are getting ancient?
Flags: needinfo?(ted)
Comment 8•8 years ago
(In reply to Peter Bengtsson [:peterbe] from comment #6)
> However, we decided we could set *a* TTL for the bucket. One that is really
> long so it acts as a "common denominator" even for the OS symbols worth
> keeping for a long time. Did we ever do that?
We threw out 2 years as a ballpark figure. Can we override the lifecycle for individual symbol files if we want? It would be nice to be able to set OS symbols to be kept forever, but it's probably not a big deal if they aren't.
Flags: needinfo?(ted)
Comment 9•8 years ago
You mean, we set a general 2-year TTL for the whole public bucket but pinpoint a couple of specific files that should last, say, 10 years instead? JP, is that possible? To be clear (so you don't have to read the past comments) this S3 bucket has no TTL at the moment. There are some files that ought to live for longer than the default TTL. Can you override some files in the bucket to live longer than the default TTL? Ted, how many files are we talking about? So few you can manually fix them? Or do we need to script it with some regexes etc?
Flags: needinfo?(jschneider)
Comment 10•8 years ago
I was more thinking that we could add a way for the symbol uploader to override it for specific symbol uploads, but it's probably not worth the effort. Anything that is still in use could just get re-uploaded by my scripts anyway.
Comment 11•8 years ago
So if you have the ones that need to last longer than 2 years locally, then every time you update them, it should overwrite and thus postpone the expiry by 2 years every time. How about we simply set it to 4 years and keep it that simple? Is that too short? If, god forbid, we expired some old OS-level (e.g. Windows XP) symbols, do we have a way to re-upload them? ...or is that dependent on you and you not getting hit by a bus?
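The "overwrite postpones the expiry" reasoning can be made concrete with a little date arithmetic. This is a rough sketch assuming that expiration is counted from the most recent upload of an object and ignoring any rounding S3 applies to the exact deletion time; the function name is hypothetical.

```python
from datetime import date, timedelta

TTL = timedelta(days=730)  # the 2-year figure discussed in this thread


def expires_on(last_uploaded: date) -> date:
    """Approximate expiry date for an object, counted from its latest upload."""
    return last_uploaded + TTL


first_upload = date(2015, 1, 1)
re_upload = date(2016, 6, 1)

# Uploaded once and never touched again.
print(expires_on(first_upload))  # 2016-12-31
# Re-uploaded (overwritten) later: the clock starts over.
print(expires_on(re_upload))     # 2018-06-01
```

Under that assumption, any symbol file that a periodic upload job keeps overwriting never reaches the TTL, while files nobody re-uploads age out.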
Comment 12•8 years ago
2 years should be plenty long. I have a cron job that uploads symbols; it's currently dependent on my local machine (as I believe we've discussed in the past) but I should hopefully fix that, so it won't be a problem.
Updated•8 years ago
Component: General → Infra
Comment 13•8 years ago
JP, can you please set up a TTL for all files in the bucket org.mozilla.crash-stats.symbols-public [0]? The default time across all objects is 2 years.
[0] https://console.aws.amazon.com/s3/home?region=us-west-2#&bucket=org.mozilla.crash-stats.symbols-public&prefix=
Assignee: nobody → jschneider
Flags: needinfo?(jschneider)
Comment 14•8 years ago
(In reply to Peter Bengtsson [:peterbe] from comment #9)
> You mean, we set a general 2 years TTL for the whole public bucket but
> pinpoint a couple specific files that should last, say, 10 years instead?
>
> JP, is that possible?
This can be done with prefixes:
https://docs.aws.amazon.com/AmazonS3/latest/dev/intro-lifecycle-rules.html
Also per that doc, if there's any concern about needing to ever retrieve these, it's pretty easy to move things to Glacier automatically too (for instance keep in S3 for 1 year and Glacier for 10).
The cost for Glacier is far less but is slow and expensive to recover from.
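As a rough illustration of prefix-scoped rules with an optional Glacier transition, the sketch below builds a lifecycle configuration as a plain Python dict. The prefixes `app/` and `os/`, the rule IDs, and the helper name are all hypothetical; nothing in this thread says the bucket is laid out this way.

```python
def lifecycle_rule(rule_id, prefix, expire_days, glacier_after_days=None):
    """Build one S3 lifecycle rule scoped to a key prefix."""
    rule = {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Expiration": {"Days": expire_days},
    }
    if glacier_after_days is not None:
        # Move objects to Glacier first; they are deleted at expire_days.
        rule["Transitions"] = [
            {"Days": glacier_after_days, "StorageClass": "GLACIER"}
        ]
    return rule


lifecycle_config = {
    "Rules": [
        # Hypothetical prefix: delete after 2 years.
        lifecycle_rule("expire-app-symbols", "app/", expire_days=730),
        # Hypothetical prefix: 1 year in S3, then 10 more years in Glacier.
        lifecycle_rule(
            "archive-os-symbols",
            "os/",
            expire_days=365 + 3650,
            glacier_after_days=365,
        ),
    ]
}
```

A dict of this shape is what boto3's `put_bucket_lifecycle_configuration` takes as its `LifecycleConfiguration` argument; the two prefixes are kept non-overlapping so that no question of rule precedence arises.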
Comment 15•8 years ago
I'd prefer to avoid Glacier
Assignee | ||
Comment 17•8 years ago
(In reply to Robert Helmer [:rhelmer] from comment #14)
> (In reply to Peter Bengtsson [:peterbe] from comment #9)
> > You mean, we set a general 2 years TTL for the whole public bucket but
> > pinpoint a couple specific files that should last, say, 10 years instead?
> >
> > JP, is that possible?
>
> This can be done with prefixes:
>
> https://docs.aws.amazon.com/AmazonS3/latest/dev/intro-lifecycle-rules.html
>
> Also per that doc, if there's any concern about needing to ever retrieve
> these, it's pretty easy to move things to Glacier automatically too (for
> instance keep in S3 for 1 year and Glacier for 10).
>
> The cost for Glacier is far less but is slow and expensive to recover from.
This is indeed possible. However, looking at the contents of this bucket I don't see a reasonable prefix for objects/directories that should be subject to deletion vs. those that should be retained. It is important to note that lifecycle rules are simple prefixes, not regex, and (I believe) there is no concept of rule precedence. If you can provide me with a prefix for deletion, I will use that, otherwise I will add the policy to everything in the bucket as mentioned above.
The simplest policy:
prefix: / (everything)
expiration_days: 730 (2 years)
Holding off on the needinfo to see if there's a better way to do this.
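The "simplest policy" above could be written out as a lifecycle configuration document along these lines. This is a sketch, not the playbook attached to this bug: the helper name and rule ID are hypothetical. One assumption worth flagging: in the S3 lifecycle API an empty prefix is what selects every object, so the sketch defaults to `""` where the comment writes `/`.

```python
import json


def expire_everything(days, prefix=""):
    """Lifecycle configuration with a single expiration rule.

    An empty prefix matches every key in the bucket.
    """
    return {
        "Rules": [
            {
                "ID": f"expire-after-{days}-days",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


policy = expire_everything(730)  # 730 days, i.e. 2 years
print(json.dumps(policy, indent=2))
```

The printed JSON has the shape that `aws s3api put-bucket-lifecycle-configuration` accepts through its `--lifecycle-configuration` option. Passing a prefix such as `"v1/"` narrows the rule to keys under that prefix.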
Flags: needinfo?(ted)
Flags: needinfo?(peterbe)
Comment 18•8 years ago
That's correct. You can apply it with prefix /. I.e. everything.
Flags: needinfo?(ted)
Flags: needinfo?(peterbe)
Assignee | ||
Comment 19•8 years ago
The lifecycle rule has been applied. I ran into issues with Ansible/Boto because the bucket has dots in its name, so I added the rule manually through the AWS console. I'll attach the Ansible playbook corresponding to the rule I added. The prefix I used is /v1/, which should still be everything of relevance in the bucket.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 20•8 years ago
Ansible playbook that corresponds to the rule I manually added w/ AWS console.