Removal of unneeded objects from net-mozaws-prod-delivery-firefox
Category: Cloud Services :: Operations: Product Delivery (task)
Tracking: Not tracked
Reporter: nthomas; Assignee: oremj
Attachments: 3 files
Having dug into the data from bug 1638181, I have found objects we can clean up. I'm going to split these into logical groupings and provide CSV manifests that can be used with S3 Batch. I'm assuming we will use that to tag objects and then use a lifecycle policy to do the deletion, but I'm happy to regenerate them in another format if you prefer.
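The tag-then-expire approach could look roughly like this minimal sketch: S3 Batch's object-tagging operation would mark each manifest's objects, and a lifecycle rule filtered on that tag would expire them. The tag key/value and the one-day delay are hypothetical. On a versioned bucket, `Expiration` only adds a delete marker to the current version, so the sketch also sets `NoncurrentVersionExpiration` to actually reclaim the space.

```python
"""Sketch: lifecycle rule that expires objects carrying a cleanup tag (tag names hypothetical)."""


def build_tag_expiration_rule(tag_key, tag_value, days=1):
    """Return a LifecycleConfiguration dict expiring objects tagged tag_key=tag_value."""
    return {
        "Rules": [
            {
                "ID": f"expire-{tag_key}-{tag_value}",
                "Filter": {"Tag": {"Key": tag_key, "Value": tag_value}},
                "Status": "Enabled",
                # Current version: replaced by a delete marker after `days` days.
                "Expiration": {"Days": days},
                # Versioned bucket: permanently remove versions once noncurrent for `days` days.
                "NoncurrentVersionExpiration": {"NoncurrentDays": days},
            }
        ]
    }


def apply_rule(bucket, config):
    """Install the configuration. This REPLACES any existing lifecycle rules on the bucket."""
    import boto3  # imported lazily so the builder above works without AWS access

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )
```

Because `put_bucket_lifecycle_configuration` overwrites the bucket's whole lifecycle configuration, any rules already in place would need to be fetched and merged in before applying this one.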
Comment 1 (Reporter), 5 years ago
This is everything that doesn't start with pub/: a 66.0.4-candidates directory, probably left over from some manual work with the AWS CLI. It's in the wrong place and can be removed. 7074 objects using 119 GB.
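Selecting this group amounts to a prefix test over the bucket inventory. A minimal sketch, assuming the inventory has already been parsed into (key, size-in-bytes) pairs; the function name and input shape are illustrative and not taken from the actual inventory tooling:

```python
"""Sketch: find objects outside pub/ and total their size (input shape is assumed)."""


def outside_pub(rows):
    """rows: iterable of (key, size_in_bytes) pairs, e.g. parsed from an inventory CSV.

    Returns (matching_keys, total_bytes) for every key that doesn't start with "pub/".
    """
    keys = []
    total = 0
    for key, size in rows:
        if not key.startswith("pub/"):
            keys.append(key)
            total += int(size)  # inventory CSV fields arrive as strings
    return keys, total
```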
Comment 2 (Reporter), 5 years ago
These are all the nightly updates that aren't very useful after a few days. 2105844 objects, 102 TB.
You might need to trim the first line (the bucket,key,version_id header) from the CSV; I'm not sure whether S3 Batch wants it.
Details (for my records):
- all have a prefix of `/pub/firefox/nightly/20` and a suffix of `.mar`; they're mostly complete MARs up until May 13th 2020, but there are some partials too
- deliberately excludes the objects we want to keep for update watersheds in Balrog on the nightly channel (except those referenced by Firefox-mozilla-central-nightly-20200615092624, which aren't in my inventory report)
- I'll need to revisit this again after bug 1648285 has been running a while
- I've omitted the version_id because it improves compression of the manifest, even though the docs don't recommend this. Our objects aren't changing, so there's no risk of deleting a newly added file. I also checked that no object has more than one version_id
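The selection criteria and the single-version check above could be sketched as follows. This is a minimal sketch under several assumptions: the inventory is already parsed into keys or (key, version_id) pairs, inventory keys carry no leading slash (matching the `pub/` form used elsewhere in this bug), and `keep` stands in for the Balrog watershed set, whose construction isn't shown. The function names are hypothetical.

```python
"""Sketch: select stale nightly MARs and flag keys with multiple versions (names hypothetical)."""
from collections import defaultdict

PREFIX = "pub/firefox/nightly/20"  # assumed: inventory keys have no leading slash
SUFFIX = ".mar"


def stale_nightly_updates(keys, keep):
    """Return keys under PREFIX ending in SUFFIX, minus any that must be kept."""
    return [k for k in keys if k.startswith(PREFIX) and k.endswith(SUFFIX) and k not in keep]


def multi_version_keys(rows):
    """rows: (key, version_id) pairs. Return sorted keys seen with more than one version_id."""
    versions = defaultdict(set)
    for key, version_id in rows:
        versions[key].add(version_id)
    return sorted(k for k, v in versions.items() if len(v) > 1)
```

An empty result from `multi_version_keys` is what makes dropping version_id from the manifest safe: each key then maps to exactly one object version.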
Comment 3 (Reporter), 5 years ago
Localised builds for aurora/beta/release, from when buildbot was publishing releases into the firefox/nightly directory. The copy in firefox/releases is the canonical one. 1858762 objects, 30 TB.
Details:
- prefix of `pub/firefox/nightly/201` and matches the regexp `/.*-(aurora|beta|release)-l10n/`
- newest file was in `2018-02-06-20-05-32-mozilla-release-l10n`
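The prefix-plus-regexp selection above maps directly onto Python's `re` module. A minimal sketch over a list of inventory keys; the function name is hypothetical. `re.match` anchors at the start of the key, which the leading `.*` in the pattern makes harmless.

```python
"""Sketch: select old aurora/beta/release l10n builds from nightly (function name hypothetical)."""
import re

PREFIX = "pub/firefox/nightly/201"
L10N_DIR = re.compile(r".*-(aurora|beta|release)-l10n/")


def old_l10n_keys(keys):
    """Return keys under PREFIX that sit inside an aurora/beta/release l10n directory."""
    return [k for k in keys if k.startswith(PREFIX) and L10N_DIR.match(k)]
```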
Updated (Assignee), 5 years ago
Comment 4 (Assignee), 5 years ago
66.0.4-candidates/ has been deleted. It looks like S3 Batch doesn't support a delete operation, so we'll likely just need to script the deletion.
Comment 5 (Assignee), 5 years ago
I think this script should be fine for the larger lists of objects. Can you review it before I run it?
```python
import csv
import sys

import boto3

CHUNK = 1000  # DeleteObjects accepts at most 1000 keys per request
DIR = '/Users/oremj/Documents/work/one-offs/bug1651313'

# Manifest rows are bucket,key; skip the header line.
with open(DIR + '/stale_nightly_update.csv') as f:
    reader = csv.reader(f)
    data = list(reader)[1:]

# Refuse to run if the manifest spans more than one bucket.
bucket = data[0][0]
if any(bucket != b for b, k in data):
    print("Not all buckets the same.")
    sys.exit(1)

client = boto3.client('s3')
for i in range(0, len(data), CHUNK):
    batch = data[i:i + CHUNK]
    objects = [{'Key': k} for b, k in batch]
    delete_op = {
        'Objects': objects,
        'Quiet': True,  # response then lists only failures, not every deleted key
    }
    res = client.delete_objects(Bucket=bucket, Delete=delete_op)
    for err in res.get('Errors', []):
        print(err)
```
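To gauge the error rate rather than only printing each failure, the loop could collect the `delete_objects` responses and tally the entries in their `Errors` lists, each of which carries a `Key`, `Code` and `Message`. A small sketch; the helper name is hypothetical:

```python
"""Sketch: tally DeleteObjects failures by error code (helper name hypothetical)."""
from collections import Counter


def summarize_errors(responses):
    """responses: iterable of delete_objects response dicts. Returns a Counter keyed by 'Code'."""
    counts = Counter()
    for res in responses:
        for err in res.get("Errors", []):
            counts[err.get("Code", "Unknown")] += 1
    return counts
```

Dividing `sum(counts.values())` by the number of rows in the manifest would give the overall failure rate.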
Comment 6 (Assignee), 5 years ago
(In reply to Nick Thomas [:nthomas] (UTC+13) from comment #2)
Created attachment 9162111: stale_nightly_updates (csv.zst)
These objects have been deleted.
Comment 7 (Assignee), 5 years ago
l10n has been deleted as well.
Comment 8 (Reporter), 5 years ago
The script looks fine to me; hopefully the error rate was low. Are you happy to continue with more cleanups this way, or would you prefer to change the process?
Comment 9 (Assignee), 5 years ago
Yep, works for me.