Closed
Bug 1373754
Opened 8 years ago
Closed 3 years ago
Investigate AWS snapshots in each region
Categories
(Infrastructure & Operations :: RelOps: General, task, P5)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: garndt, Unassigned)
References
Details
Attachments
(1 file)
1023 bytes, text/x-python-script
We have a large number of snapshots in each of our regions. From what I understand, these are snapshots of volumes for AMIs, not volumes for running instances, but I could be wrong.
I wrote a script (hacky as it may be) that shows all snapshots not currently associated with an active AMI. The results lead me to believe that AMIs are being removed/deregistered without the corresponding snapshots being removed.
Based on what I think is the right storage cost, $0.049/GB/month, this can add up to quite a bit of money if these snapshots are lingering around and really are not used.
We need to confirm that these snapshots are indeed left over and unused, and if so, remove them.
The script is attached; here is the output from running it today (a rough sketch of the same approach follows the output below):
us-west-1
rogue snapshots: 2341 size: 132851 GiB
--------------------------------------------------
us-west-2
rogue snapshots: 2149 size: 124610 GiB
--------------------------------------------------
us-east-1
rogue snapshots: 2363 size: 134546 GiB
--------------------------------------------------
us-east-2
rogue snapshots: 449 size: 35260 GiB
--------------------------------------------------
eu-central-1
rogue snapshots: 2271 size: 134888 GiB
--------------------------------------------------
total rogue snapshots: 9573 total size: 562155 GiB
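For reference, a minimal boto3 sketch of the same idea. This is not the attached script; the region list and the detection logic are assumptions reconstructed from the output above.

import boto3

# Regions taken from the output above; adjust as needed.
REGIONS = ["us-west-1", "us-west-2", "us-east-1", "us-east-2", "eu-central-1"]

def rogue_snapshots(region):
    """Return snapshots we own that no registered AMI references."""
    ec2 = boto3.client("ec2", region_name=region)
    # Snapshot IDs referenced by AMIs registered to this account.
    referenced = set()
    for image in ec2.describe_images(Owners=["self"])["Images"]:
        for mapping in image.get("BlockDeviceMappings", []):
            snap_id = mapping.get("Ebs", {}).get("SnapshotId")
            if snap_id:
                referenced.add(snap_id)
    # Snapshots owned by this account that no registered AMI points at.
    rogue = []
    for page in ec2.get_paginator("describe_snapshots").paginate(OwnerIds=["self"]):
        for snap in page["Snapshots"]:
            if snap["SnapshotId"] not in referenced:
                rogue.append(snap)
    return rogue

total_count = total_gib = 0
for region in REGIONS:
    rogue = rogue_snapshots(region)
    size = sum(s["VolumeSize"] for s in rogue)
    total_count += len(rogue)
    total_gib += size
    print(region)
    print(f"rogue snapshots: {len(rogue)} size: {size} GiB")
    print("-" * 50)
print(f"total rogue snapshots: {total_count} total size: {total_gib} GiB")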
Comment 1•8 years ago
Based on the counts, I'm guessing these are associated with building Windows AMIs?
Reporter
Comment 2•8 years ago
I believe so. Pete took a look at them, and based on their size they also seem to be for Windows. Linux uses 8 GB snapshots for its AMIs, and most of these are much larger than that.
Comment 3•8 years ago
I think OCC only deletes one snapshot per AMI [1], but there may be more than one snapshot created per AMI (due to the images having multiple drives). I haven't dug deeper into it yet, but that could be a possible cause; see the small illustration after the footnote below.
--
[1] https://github.com/mozilla-releng/OpenCloudConfig/blob/769bc87944edaefbdc41328ab64dcd656a9a478f/ci/update-workertype.sh#L205
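To illustrate the multiple-drives point, here is a hypothetical boto3 snippet that lists AMIs backed by more than one snapshot; the region is picked for illustration only.

import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")  # region chosen for illustration

for image in ec2.describe_images(Owners=["self"])["Images"]:
    # Each EBS block device mapping is backed by its own snapshot, so an AMI
    # with several drives leaves several snapshots behind if only one is deleted.
    snap_ids = [
        m["Ebs"]["SnapshotId"]
        for m in image.get("BlockDeviceMappings", [])
        if "SnapshotId" in m.get("Ebs", {})
    ]
    if len(snap_ids) > 1:
        print(image["ImageId"], image.get("Name"), snap_ids)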
Comment 4•8 years ago
Another thing to check: when AMIs are copied across regions, the snapshots are presumably copied too, so we should verify whether the cleanup script also purges snapshots in those other regions.
Comment 5•7 years ago
Found in triage.
These are still peanuts compared to artifact storage and active EBS volumes, but for the sake of hygiene we should clean this up periodically.
I'd like to see a monthly report containing the manual commands to delete rogue snapshots, to be run if inspection turns up no anomalies (a sketch of such a report follows below).
Priority: -- → P5
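For example, such a report could just emit one aws CLI command per rogue snapshot for manual review. A minimal sketch, assuming the snapshot dicts come from whatever detection logic we settle on; the snapshot ID below is made up.

def print_delete_commands(region, snapshots):
    # One copy-pasteable command per snapshot, annotated with its size.
    for snap in snapshots:
        print(
            f"aws ec2 delete-snapshot --region {region} "
            f"--snapshot-id {snap['SnapshotId']}  # {snap['VolumeSize']} GiB"
        )

if __name__ == "__main__":
    # Example with a made-up snapshot entry.
    print_delete_commands(
        "us-west-2",
        [{"SnapshotId": "snap-0123456789abcdef0", "VolumeSize": 120}],
    )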
Comment 6•7 years ago
Found in triage.
The worker build process is changing, i.e. images may no longer be created this way, which should make them easy to clean up at that point.
Updated•6 years ago
Component: Operations → Operations and Service Requests
Comment 8•6 years ago
This is still an issue.
I think it would be worthwhile for Pete, Wander, and Rob to sit down (perhaps virtually), go through the list of existing EBS snapshots again, and see what can be purged. Maybe that even leads to a heuristic that can become a cleanup script?
Comment 9•6 years ago
Here is some investigation we did earlier this year around instance cleanup: https://docs.google.com/spreadsheets/d/1IOOLikW3ms1yEs8HNzr6YetSU77U5jdy8RZQ6fFkvJU/edit#gid=0
Still lots of EBS snapshots to go through.
Updated•6 years ago
Assignee: nobody → relops
Component: Operations and Service Requests → RelOps: General
Product: Taskcluster → Infrastructure & Operations
QA Contact: klibby
Comment 10•3 years ago
We'll focus on the migration and revisit this if needed.
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WONTFIX