Closed Bug 1373754 Opened 8 years ago Closed 3 years ago

Investigate AWS snapshots in each region

Categories

(Infrastructure & Operations :: RelOps: General, task, P5)

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: garndt, Unassigned)

References

Details

Attachments

(1 file)

Attached file abandoned_snapshots.py
We have a large number of snapshots in each of our regions. From what I understand these are snapshots of volumes for AMIs, not volumes for running instances, but I could be wrong. I wrote up a script that (hacky as it may be) should show all snapshots that are currently not associated with an active AMI, which leads me to believe that AMIs are being removed/deregistered without their corresponding snapshots being deleted. At what I believe is the current storage price of $0.049/GB/month, this adds up to quite a bit of money if these snapshots are lingering around unused. We need to verify that they really are leftovers that are no longer used and, if so, remove them. The script is attached; here is the output from running it today:

us-west-1 rogue snapshots: 2341 size: 132851 GiB
us-west-2 rogue snapshots: 2149 size: 124610 GiB
us-east-1 rogue snapshots: 2363 size: 134546 GiB
us-east-2 rogue snapshots: 449 size: 35260 GiB
eu-central-1 rogue snapshots: 2271 size: 134888 GiB

total rogue snapshots: 9573
total size: 562155 GiB
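The attached abandoned_snapshots.py is not reproduced here; a minimal sketch of the same idea, assuming boto3 with credentials configured and using the regions from the output above, might look like this:

```python
# Sketch only (not the attached script): count snapshots owned by this account
# that are not referenced by any of the account's AMIs, per region.
import boto3

REGIONS = ["us-west-1", "us-west-2", "us-east-1", "us-east-2", "eu-central-1"]

for region in REGIONS:
    ec2 = boto3.client("ec2", region_name=region)

    # Snapshot IDs referenced by the block device mappings of AMIs we own.
    referenced = set()
    for image in ec2.describe_images(Owners=["self"])["Images"]:
        for mapping in image.get("BlockDeviceMappings", []):
            snap_id = mapping.get("Ebs", {}).get("SnapshotId")
            if snap_id:
                referenced.add(snap_id)

    # Snapshots we own that no current AMI references ("rogue").
    rogue = []
    paginator = ec2.get_paginator("describe_snapshots")
    for page in paginator.paginate(OwnerIds=["self"]):
        rogue.extend(s for s in page["Snapshots"] if s["SnapshotId"] not in referenced)

    total_gib = sum(s["VolumeSize"] for s in rogue)
    print(f"{region} rogue snapshots: {len(rogue)} size: {total_gib} GiB")
```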
Based on the counts, I'm guessing these are associated with building Windows AMIs?
I believe so. Pete took a look at them, and based on their sizes they also seem to be for Windows: Linux uses 8 GB snapshots for its AMIs, and most of these are much larger than that.
I think OCC only deletes one snapshot per AMI[1], but there may be more than one created per AMI (due to having multiple drives). I haven't dug deeper into it yet, but that could be a possible cause. -- [1] https://github.com/mozilla-releng/OpenCloudConfig/blob/769bc87944edaefbdc41328ab64dcd656a9a478f/ci/update-workertype.sh#L205
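For illustration, a hedged sketch of a cleanup that collects every snapshot referenced by an AMI's block device mappings before deregistering it (this is not the OCC script; the function name and identifiers are hypothetical):

```python
import boto3

def deregister_ami_and_snapshots(ec2, ami_id):
    """Deregister an AMI and delete *all* of its backing EBS snapshots."""
    image = ec2.describe_images(ImageIds=[ami_id])["Images"][0]
    snapshot_ids = [
        m["Ebs"]["SnapshotId"]
        for m in image.get("BlockDeviceMappings", [])
        if m.get("Ebs", {}).get("SnapshotId")
    ]
    ec2.deregister_image(ImageId=ami_id)
    # A multi-drive (e.g. Windows) image has one snapshot per mapped volume;
    # deleting only the first one leaves the rest behind.
    for snap_id in snapshot_ids:
        ec2.delete_snapshot(SnapshotId=snap_id)
    return snapshot_ids

# Example usage (identifiers are placeholders):
# ec2 = boto3.client("ec2", region_name="us-west-2")
# deregister_ami_and_snapshots(ec2, "ami-0123456789abcdef0")
```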
Another thing to check: when AMIs are copied across regions, the snapshots are presumably copied too, so we should verify whether the cleanup script also purges the snapshots in those other regions.
Found in triage. These are still peanuts compared to artifact storage and active EBS volumes, but for the sake of hygiene we should clean this up periodically. I'd like to see a monthly report that contains the manual commands to delete rogue snapshots if there are no anomalies after inspection.
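A sketch of what such a monthly report could emit, assuming boto3: it prints the manual aws ec2 delete-snapshot commands for review rather than deleting anything itself.

```python
# Sketch of a "monthly report": list the manual CLI commands to delete rogue
# snapshots so a human can inspect them before running anything.
import boto3

for region in ["us-west-1", "us-west-2", "us-east-1", "us-east-2", "eu-central-1"]:
    ec2 = boto3.client("ec2", region_name=region)
    referenced = {
        m["Ebs"]["SnapshotId"]
        for image in ec2.describe_images(Owners=["self"])["Images"]
        for m in image.get("BlockDeviceMappings", [])
        if m.get("Ebs", {}).get("SnapshotId")
    }
    for page in ec2.get_paginator("describe_snapshots").paginate(OwnerIds=["self"]):
        for snap in page["Snapshots"]:
            if snap["SnapshotId"] not in referenced:
                print(
                    f"aws ec2 delete-snapshot --region {region} "
                    f"--snapshot-id {snap['SnapshotId']}  "
                    f"# {snap['VolumeSize']} GiB, started {snap['StartTime']:%Y-%m-%d}"
                )
```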
Priority: -- → P5
Found in triage. The worker build process is changing, i.e. images may not even be created this way for much longer, which will make it easier to clean this up at that point.
Component: Operations → Operations and Service Requests

This is still an issue.

I think it would be worthwhile for Pete, Wander, and Rob to sit down (perhaps virtually) and go through the list of existing EBS snapshots again and see what can be purged. Maybe that even leads to a heuristic that can become a cleanup script? (One possible heuristic is sketched after this comment.)

Here is some investigation we did earlier this year around instance cleanup: https://docs.google.com/spreadsheets/d/1IOOLikW3ms1yEs8HNzr6YetSU77U5jdy8RZQ6fFkvJU/edit#gid=0

Still lots of EBS snapshots to go through.
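
A sketch of one possible purge heuristic, under the assumption that EC2 writes "Created by CreateImage(i-...) for ami-... from vol-..." into the descriptions of AMI-backing snapshots; the cutoff and function name are placeholders, not an agreed policy:

```python
# Heuristic sketch: a snapshot is a purge candidate if no current AMI
# references it, it is older than a cutoff, and its description points at an
# AMI that has since been deregistered.
import re
from datetime import datetime, timedelta, timezone

CUTOFF = datetime.now(timezone.utc) - timedelta(days=90)  # example cutoff
AMI_RE = re.compile(r"for (ami-[0-9a-f]+)")

def is_purge_candidate(snapshot, referenced_snapshot_ids, existing_ami_ids):
    # snapshot: one dict from describe_snapshots; the other two are sets of IDs.
    if snapshot["SnapshotId"] in referenced_snapshot_ids:
        return False
    if snapshot["StartTime"] > CUTOFF:
        return False
    match = AMI_RE.search(snapshot.get("Description", ""))
    # Only flag snapshots that were clearly created for an AMI that is gone.
    return bool(match) and match.group(1) not in existing_ami_ids
```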

Assignee: nobody → relops
Component: Operations and Service Requests → RelOps: General
Product: Taskcluster → Infrastructure & Operations
QA Contact: klibby

We'll focus on migration and revisit this if needed.

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WONTFIX
