Closed Bug 1651313 Opened 5 years ago Closed 5 years ago

Removal of unneeded objects from net-mozaws-prod-delivery-firefox

Categories

(Cloud Services :: Operations: Product Delivery, task)

task

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nthomas, Assigned: oremj)

Attachments

(3 files)

Having dug into the data from bug 1638181, I have found objects we can clean up. I'm going to split these into logical groupings and provide CSV manifests that can be used with S3 Batch. I'm assuming we will use that to tag objects and then use a lifecycle policy to do the deletion, but I'm happy to regenerate the manifests in another format if you prefer.
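The tag-then-expire plan could be driven by a lifecycle rule filtered on whatever tag the S3 Batch job applies. This is a minimal sketch, not the configuration actually used: the tag key `cleanup`, the value `bug1651313` and the helper name are hypothetical, and it assumes the bucket is versioned, so noncurrent versions get their own expiration. It only builds the configuration; applying it would go through boto3's `put_bucket_lifecycle_configuration`, which replaces the bucket's entire existing lifecycle configuration.

```python
def cleanup_lifecycle_rule(tag_key: str, tag_value: str, days: int = 1) -> dict:
    """Build an S3 lifecycle rule that expires objects carrying one tag.

    tag_key/tag_value are hypothetical: they stand for whatever tag the
    S3 Batch tagging job applied. Expiration counts days since object
    creation, so old objects are picked up on the next lifecycle run.
    """
    return {
        'ID': f'expire-{tag_key}-{tag_value}',
        'Filter': {'Tag': {'Key': tag_key, 'Value': tag_value}},
        'Status': 'Enabled',
        # On a versioned bucket this adds a delete marker and makes the
        # current version noncurrent; on an unversioned one it deletes it.
        'Expiration': {'Days': days},
        # Assumption: the bucket is versioned, so also purge the
        # noncurrent versions left behind by the expiration above.
        'NoncurrentVersionExpiration': {'NoncurrentDays': days},
    }


config = {'Rules': [cleanup_lifecycle_rule('cleanup', 'bug1651313')]}

# Applying it needs credentials. This call REPLACES the bucket's whole
# lifecycle configuration, so any existing rules must be merged in first:
# import boto3
# boto3.client('s3').put_bucket_lifecycle_configuration(
#     Bucket='net-mozaws-prod-delivery-firefox',
#     LifecycleConfiguration=config,
# )
```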

This manifest covers every object whose key doesn't start with pub/. That is a 66.0.4-candidates directory, probably left over from some manual work with the AWS CLI. It's in the wrong place and can be removed: 7,074 objects using 119 GB.
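Producing a manifest plus a count/size summary like the one above might look roughly like this. It is a sketch under stated assumptions: inventory rows are taken to be already parsed into hypothetical `(bucket, key, size_in_bytes)` tuples (a simplification of real inventory data), the sample rows are made up, and "GB" is read as 10^9 bytes.

```python
import csv
import io


def select_non_pub(rows):
    """Yield (bucket, key, size) rows whose key is outside pub/."""
    for bucket, key, size in rows:
        if not key.startswith('pub/'):
            yield bucket, key, size


def write_manifest(rows, out):
    """Write bucket,key lines to `out` and return (object_count, total_bytes)."""
    writer = csv.writer(out, lineterminator='\n')
    count = total = 0
    for bucket, key, size in rows:
        writer.writerow([bucket, key])
        count += 1
        total += size
    return count, total


# Hypothetical sample rows standing in for parsed inventory data.
inventory = [
    ('net-mozaws-prod-delivery-firefox', '66.0.4-candidates/build1/a.txt', 10),
    ('net-mozaws-prod-delivery-firefox', 'pub/firefox/releases/a.txt', 20),
    ('net-mozaws-prod-delivery-firefox', '66.0.4-candidates/build1/b.txt', 5),
]
buf = io.StringIO()
count, total = write_manifest(select_non_pub(inventory), buf)
print(f'{count} objects using {total / 1e9:.3f} GB')
```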

These are all the nightly updates that aren't very useful after a few days: 2,105,844 objects, 102 TB.

You might need to trim the header line (bucket,key,version_id) from the CSV; I'm not sure whether S3 Batch wants it or not.
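If the header does have to go, dropping it is a one-pass copy. A minimal sketch using Python's `csv` module; the `strip_header` name and the rule "only drop the first row if it equals the expected header" are illustrative assumptions, chosen so an already-trimmed manifest passes through unchanged. With real files, open both with `newline=''` as the `csv` docs recommend.

```python
import csv
import io


def strip_header(src, dst, header=('bucket', 'key', 'version_id')):
    """Copy CSV rows from src to dst, dropping a leading header row.

    Only the first row is compared against `header`, so running this on
    a manifest that has already been trimmed leaves it unchanged.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator='\n')
    first = next(reader, None)
    if first is not None and tuple(first) != tuple(header):
        writer.writerow(first)  # not a header row: keep it
    writer.writerows(reader)


# In-memory example; real use would pass open file objects instead.
src = io.StringIO('bucket,key,version_id\nmybucket,pub/a.mar,\n')
dst = io.StringIO()
strip_header(src, dst)
```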

Details (for my records):

  • all have a prefix of pub/firefox/nightly/20 and a suffix of .mar
  • they're mostly complete MARs up until May 13th 2020, but there are some partials too
  • deliberately excludes the objects we want to keep for update watersheds in Balrog on the nightly channel (except those referenced by Firefox-mozilla-central-nightly-20200615092624, which aren't in my inventory report)
  • I'll need to revisit this after bug 1648285 has been running for a while
  • I've omitted the version_id because it helps the compression of the manifest, even though the docs don't recommend this. Our objects aren't changing, so there's no risk of deleting a newly added file. I also checked that there aren't any objects with more than one version_id.
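The selection rules in that list can be expressed as a filter over inventory keys. This is a sketch under assumptions, not the script that produced the manifest: the `keep` set stands in for the Balrog watershed MARs (their actual keys aren't given here), inventory rows are hypothetical `(key, version_id)` pairs, and the sample keys are made up. It also shows the one-version-per-key check that makes omitting version_id safe.

```python
from collections import Counter

NIGHTLY_PREFIX = 'pub/firefox/nightly/20'


def stale_nightly_mars(keys, keep):
    """Return sorted nightly .mar keys that are not in the keep set."""
    return sorted(
        k for k in keys
        if k.startswith(NIGHTLY_PREFIX) and k.endswith('.mar') and k not in keep
    )


def multi_version_keys(rows):
    """Return sorted keys that appear with more than one version_id."""
    counts = Counter(key for key, _version_id in rows)
    return sorted(k for k, n in counts.items() if n > 1)


# Hypothetical inventory rows: (key, version_id).
rows = [
    ('pub/firefox/nightly/2020/05/a/firefox.complete.mar', 'v1'),
    ('pub/firefox/nightly/2020/05/b/firefox.complete.mar', 'v1'),
    ('pub/firefox/nightly/2020/05/b/firefox.txt', 'v1'),
    ('pub/firefox/releases/77.0/update/firefox.complete.mar', 'v1'),
]
# Hypothetical watershed MAR that must survive the cleanup.
keep = {'pub/firefox/nightly/2020/05/a/firefox.complete.mar'}

stale = stale_nightly_mars((k for k, _ in rows), keep)
# Omitting version_id from the manifest is only safe if this is empty.
assert not multi_version_keys(rows)
```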

Localised builds for aurora/beta/release, from when buildbot was publishing releases into the firefox/nightly directory. The copy in firefox/releases is the canonical one. 1,858,762 objects, 30 TB.

Details:

  • prefix of pub/firefox/nightly/201 and matches regexp of /.*-(aurora|beta|release)-l10n/
  • newest file was in 2018-02-06-20-05-32-mozilla-release-l10n,
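A sketch of the l10n selection, assuming keys are plain strings: the prefix check comes first, then the regexp from the list is applied with `re.match` (its leading `.*` lets it find the directory anywhere in the key). The helper name is hypothetical, and the example key layout under pub/firefox/nightly/ is illustrative apart from the release-l10n directory name quoted above.

```python
import re

L10N_PREFIX = 'pub/firefox/nightly/201'
# Buildbot l10n directories, e.g. ...-mozilla-release-l10n/
L10N_RE = re.compile(r'.*-(aurora|beta|release)-l10n/')


def is_stale_l10n(key: str) -> bool:
    """True for buildbot-era localised aurora/beta/release uploads."""
    return key.startswith(L10N_PREFIX) and L10N_RE.match(key) is not None


# Illustrative keys; only the directory name in the first is from the bug.
examples = [
    'pub/firefox/nightly/2018/02/2018-02-06-20-05-32-mozilla-release-l10n/a.tar.bz2',
    'pub/firefox/nightly/2018/02/2018-02-06-10-00-00-mozilla-central-l10n/a.tar.bz2',
    'pub/firefox/releases/58.0/x-mozilla-release-l10n/a.tar.bz2',
]
selected = [k for k in examples if is_stale_l10n(k)]
```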
Assignee: nobody → oremj
Status: NEW → ASSIGNED

66.0.4-candidates/ has been deleted. It looks like S3 Batch doesn't support a delete operation, so we'll likely just need to script the deletion.

I think this script should be fine for the larger lists of objects; can you review it before I run it?

import csv
import sys

import boto3

# DeleteObjects accepts at most 1000 keys per request.
CHUNK = 1000
DIR = '/Users/oremj/Documents/work/one-offs/bug1651313'

# Load the manifest, skipping the header row.
with open(DIR + '/stale_nightly_update.csv', newline='') as f:
    reader = csv.reader(f)
    data = list(reader)[1:]

# Refuse to run if the manifest spans more than one bucket.
bucket = data[0][0]
if any(bucket != b for b, k in data):
    print("Not all buckets the same.")
    sys.exit(1)

client = boto3.client('s3')
for i in range(0, len(data), CHUNK):
    batch = data[i:i + CHUNK]
    objects = [{'Key': k} for b, k in batch]
    delete_op = {
        'Objects': objects,
        # Quiet mode: the response only lists keys that failed to delete.
        'Quiet': True,
    }
    res = client.delete_objects(Bucket=bucket, Delete=delete_op)
    for err in res.get('Errors', []):
        print(err)
Flags: needinfo?(nthomas)
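Because the script runs with Quiet set, the printed `Errors` entries are the only record of failures. A sketch of a hypothetical helper (not part of the script above) that tallies those entries by error code and collects the failed keys for a retry pass, so the error rate can be reported afterwards. Each entry in a `delete_objects` response's `Errors` list carries `Key`, `Code` and `Message` fields; the sample responses and the submitted-key total below are made up.

```python
from collections import Counter


def summarise_errors(responses):
    """Tally delete_objects errors by Code and return (counter, failed_keys).

    `responses` is an iterable of delete_objects response dicts; in Quiet
    mode only failures appear, under the 'Errors' key.
    """
    by_code = Counter()
    failed = []
    for res in responses:
        for err in res.get('Errors', []):
            by_code[err.get('Code', 'Unknown')] += 1
            failed.append(err['Key'])
    return by_code, failed


# Hypothetical responses shaped like boto3's delete_objects output.
responses = [
    {'Errors': [{'Key': 'pub/a.mar', 'Code': 'InternalError', 'Message': 'example'}]},
    {},
    {'Errors': [{'Key': 'pub/b.mar', 'Code': 'InternalError', 'Message': 'example'},
                {'Key': 'pub/c.mar', 'Code': 'AccessDenied', 'Message': 'example'}]},
]
by_code, failed = summarise_errors(responses)
total_keys = 3000  # hypothetical: keys submitted across all batches
print(f'error rate: {len(failed) / total_keys:.4%}', dict(by_code))
```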

(In reply to Nick Thomas [:nthomas] (UTC+13) from comment #2)

Created attachment 9162111 [details]
stale_nightly_updates (csv.zst)

These objects have been deleted.

Flags: needinfo?(nthomas)

l10n has been deleted as well.

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED

Script looks fine to me; hopefully the error rate was low. Are you happy to continue with more cleanups this way, or would you rather change the process?

Yep, works for me.
