Closed Bug 1431467 Opened 7 years ago Closed 4 years ago

gecko-t-*-gpu workers have multiple EBS volumes configured, don't appear to use both

Categories

(Infrastructure & Operations :: RelOps: General, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: gps, Unassigned)

As documented in bug 1431291, g2.2xlarge instances are costing Mozilla a lot of money: hundreds of thousands of dollars per year. We're currently spending more money on g2.2xlarge instances than the instances we use for build tasks.
It appears that the gecko-t-*-gpu worker types have multiple EBS volumes configured. e.g.
    "BlockDeviceMappings": [
      { "DeviceName": "/dev/sda1", "Ebs": { "VolumeType": "gp2", "VolumeSize": 30, "DeleteOnTermination": true } },
      { "DeviceName": "/dev/sdb", "Ebs": { "VolumeType": "gp2", "VolumeSize": 120, "DeleteOnTermination": true } },
      { "DeviceName": "/dev/sdc", "Ebs": { "VolumeType": "gp2", "VolumeSize": 120, "DeleteOnTermination": true } }
    ]
My recollection is that each EBS volume gets a separate drive. However, if you look at task logs running on these instances, I don't think we're using the Y:\ drive at all.
I /think/ we can remove one of the volumes and nothing will break. That's assuming the one EBS volume still gets allocated to Z:\. (I'm not sure if that's true.)
I don't think the extra EBS volume is costing us that much money in the grand scheme of things. But waste is waste.
There's something about provisioning IOPS too, where you get so many IOPS per GB?
Flags: needinfo?(rthijssen)
gecko-t-win10-64-gpu has only 2 block devices configured (the system volume plus 1 extra):
    "BlockDeviceMappings": [
      { "DeviceName": "/dev/sda1", "Ebs": { "DeleteOnTermination": true, "VolumeSize": 120, "VolumeType": "gp2" } },
      { "DeviceName": "/dev/sdb", "Ebs": { "DeleteOnTermination": true, "VolumeSize": 120, "VolumeType": "gp2" } }
    ]
Tasks on it seem to be using Z:\. So I /think/ we can remove /dev/sdc from gecko-t-win7-32-gpu and call it a day on this bug?
And, yes, the more GB you allocate to a gp2 volume, the higher its baseline IOPS. Alternatively, you can spring for provisioned IOPS. But you pay more for that. Given how expensive the g2.2xlarge instances are and that I/O is a likely culprit for slow tasks, we might want to consider that...
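For reference, a rough sketch of how gp2 baseline IOPS scale with volume size. The figures here (3 IOPS per GiB, a floor of 100, and a cap of 16,000) are assumptions taken from AWS's published gp2 description rather than anything stated in this bug, and the cap in particular has changed over time. The function name is made up for illustration.

```python
def gp2_baseline_iops(size_gib, per_gib=3, floor=100, cap=16_000):
    """Estimate gp2 baseline IOPS for a volume of size_gib GiB.

    Assumed rule (not from this bug): 3 IOPS per GiB,
    never below `floor`, never above `cap`.
    """
    return min(cap, max(floor, per_gib * size_gib))


# Volume sizes that appear in the worker type definitions quoted above.
for size in (30, 120):
    print(size, "GiB ->", gp2_baseline_iops(size), "baseline IOPS")
```

Under those assumptions, shrinking a volume also lowers its baseline IOPS, which matters more for a volume that tasks actually use than for one that sits idle.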
windows 7 powershell implementations aren't able to partition volumes to get the drive mappings we need. windows 10 ps implementations can. the extra volume on win 7 is a workaround which allows us to get the same drive mappings (c: system, y: cache, z: tasks) without partitioning one of the volumes to host both y: and z: as we do on windows 10.
Flags: needinfo?(rthijssen)
for completeness, the missing implementation from the powershell libraries on windows 7 is New-Partition https://docs.microsoft.com/en-us/powershell/module/storage/new-partition?view=win10-ps&viewFallbackFrom=win7-ps
Maybe we can decrease the size of the unused volume to the minimum required so we're not spending money on all that extra storage? EBS storage is ~$0.10/GB/month. We're on pace for 300,000 g2.2xlarge instance hours this month. That's ~$5,000/mo for one 120 GB volume per instance. Now, that's divided between the Windows 7 and 10 (and possibly Linux) workers. But we're still looking at ~$10,000/year for spawning these unused volumes on the Windows 7 workers at existing usage rates.
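The ~$5,000/mo estimate above can be reproduced with a short calculation. This is a sketch under two assumptions that are not stated in the bug: that a month averages 730 hours, and that EBS storage is billed in proportion to the time a volume exists. The function and constant names are illustrative.

```python
HOURS_PER_MONTH = 730  # assumption: 8760 hours per year / 12


def monthly_volume_cost(instance_hours, volume_gb, price_per_gb_month=0.10):
    """Cost of attaching one volume of volume_gb to every instance.

    instance_hours: total instance hours in the month (300,000 in the comment)
    price_per_gb_month: the ~$0.10/GB/month figure quoted in the comment
    """
    volume_months = instance_hours / HOURS_PER_MONTH  # full-time volume equivalents
    return volume_months * volume_gb * price_per_gb_month


cost = monthly_volume_cost(300_000, 120)
print(f"~${cost:,.0f} per month for one 120 GB volume per instance")
```

This covers all g2.2xlarge hours together. The yearly figure for the Windows 7 workers alone depends on their share of those hours, which this bug does not break down.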
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED