Closed
Bug 1303150
Opened 8 years ago
Closed 6 years ago
Make TC AWS costs less eye-watering
Categories
(Taskcluster :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: dustin, Unassigned)
Attachments
(3 files)
## Total Cost

Looking just at the TC accounts, the services costing us >1% of our spend are

  EC2 - Compute
  S3 - Storage
  CloudFront - Transfer
  EBS - Storage
  EC2 - Transfer
  S3 - Transfer

so anything beyond that is unlikely to move the needle. The first two items are the big ones, and if we want to make a big dent in this cost, those are the places to start.

## EC2 - Compute ($96k in August)

About $6k of that is ondemand instances. I see only four running: two for docker cloud, one for elastic container service (I emailed about that a few months ago..), and one that belongs to pete. My devel host is in there too. There are a bunch of stopped windows systems as well -- looks like just the base machines in use1 and usw1, but a bunch more in usw2. So it's hard to see why that's $6k. Breaking it down by instance type shows a patchwork, but mostly c4.*, which seems to be what docker cloud is using. So I'm guessing that's cloud-mirror.

The remainder is spot. I've attached a mapping from instance type to workerType. The top offenders are

  m3.xlarge - $28k - desktop-test-xlarge / gecko-decision
  c3.xlarge - $15k - desktop-test-xlarge / gecko-decision
  c3.2xlarge - $20k - windows build / android
  m1.medium - $10k - desktop-test
  c4.4xlarge - $7k - linux build
  m3.2xlarge - $6k - flame-kk / android

I've included a graph of those instance types by week.

## S3 - Storage ($47k)

I've attached a shot of our S3 usage by week. The elephant in the room is "taskcluster-public-artifacts" at about $45k/mo. "taskcluster-private-artifacts" and "taskcluster-artifacts" are distant second and third at around $1k/mo and the rest is noise. The worrying bit is the obvious trend in S3 costs -- $2k/wk to $12k/wk in the last year.

Could we get rid of the "taskcluster-artifacts" bucket?

Our S3 transfer rates look to be almost entirely regional, which I assume is the "free" category :)

## CloudFront - Transfer ($9k)

Per Travis, these are not our responsibility.
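As a rough cross-check of the EC2 figures above, the listed instance types can be tallied against the spot portion of the bill. This is only a sketch: the dollar amounts are the rounded values from the description, and the function and variable names are made up for illustration.

```python
# Monthly cost in thousands of USD per instance type, as listed in the description.
TOP_OFFENDERS_K = {
    "m3.xlarge": 28,
    "c3.xlarge": 15,
    "c3.2xlarge": 20,
    "m1.medium": 10,
    "c4.4xlarge": 7,
    "m3.2xlarge": 6,
}
EC2_TOTAL_K = 96  # EC2 compute for August
ONDEMAND_K = 6    # approximate on-demand portion


def summarize(costs_k, total_k, ondemand_k):
    """Rank instance types by cost and compare their sum to the spot spend."""
    spot_k = total_k - ondemand_k
    listed_k = sum(costs_k.values())
    ranked = sorted(costs_k.items(), key=lambda kv: kv[1], reverse=True)
    return {
        "spot_k": spot_k,
        "listed_k": listed_k,
        "listed_share_of_spot": listed_k / spot_k,
        "ranked": ranked,
    }


summary = summarize(TOP_OFFENDERS_K, EC2_TOTAL_K, ONDEMAND_K)
for instance_type, cost_k in summary["ranked"]:
    print(f"{instance_type:12s} ${cost_k}k")
print(f"listed types cover {summary['listed_share_of_spot']:.0%} of spot spend")
```

With these rounded numbers the six listed types account for nearly all of the spot spend, which is consistent with treating them as the place to look first.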
Reporter
Comment 1•8 years ago
Reporter
Comment 2•8 years ago
Reporter
Comment 3•8 years ago
Considered remediations, with monthly savings

S3 (max of $47k)
 - delete unused taskcluster-artifacts bucket (bug 1303147) ($1k)
 - delete old try artifacts (bug 1303153) (??)
 - configure more branches (e.g., integration) with shorter retention (??)

EC2 (max of $96k)
 - audit hidden jobs: maybe kill whole swathes of permaorange? (small - ?? $4k)
 - run tier-2 jobs on fewer branches (maybe just central and not integration?) (temporary savings only)
 - look at the cost/performance tradeoff of the various desktop-test-* instances (?? $20k)
Comment 4•8 years ago
Some other things that could (and should eventually) be cleaned are listed below. They do not amount to much though.

 - Unused AMIs: > 1300 snapshots costing around $300/month
 - Unused EBS volumes: 61 created before August 30th that are "available" but not attached ($300/month)
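One way to reproduce the unattached-volume count is to filter the volume records that EC2 returns. The sketch below works on plain dicts shaped like the `Volumes` entries from boto3's `describe_volumes` (keys `VolumeId`, `State`, `CreateTime`); the helper name, the sample records, and the cutoff year are assumptions for illustration, since the comment does not give a year.

```python
from datetime import datetime, timezone


def unattached_before(volumes, cutoff):
    """Return IDs of volumes that are 'available' (unattached) and older than cutoff."""
    return [
        v["VolumeId"]
        for v in volumes
        if v["State"] == "available" and v["CreateTime"] < cutoff
    ]


# Hypothetical sample records. With AWS credentials, the real list could come from:
#   boto3.client("ec2").describe_volumes(
#       Filters=[{"Name": "status", "Values": ["available"]}])["Volumes"]
sample = [
    {"VolumeId": "vol-aaa", "State": "available",
     "CreateTime": datetime(2016, 7, 1, tzinfo=timezone.utc)},
    {"VolumeId": "vol-bbb", "State": "in-use",
     "CreateTime": datetime(2016, 7, 1, tzinfo=timezone.utc)},
    {"VolumeId": "vol-ccc", "State": "available",
     "CreateTime": datetime(2016, 9, 5, tzinfo=timezone.utc)},
]

# Assumed cutoff: the year is a guess, only "August 30th" appears in the comment.
cutoff = datetime(2016, 8, 30, tzinfo=timezone.utc)
print(unattached_before(sample, cutoff))
```

Reviewing the resulting list by hand before deleting anything seems prudent, since an "available" volume may still hold data someone wants.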
Comment 5•8 years ago
Another idea for S3 - change to use the infrequent access policy.
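A lifecycle rule is one way this could be done. The sketch below only builds the configuration dict in the shape that boto3's `put_bucket_lifecycle_configuration` accepts; the rule ID, the default prefix, and the helper name are hypothetical, and the 30-day floor reflects my understanding that S3 does not transition objects to Standard-IA any earlier than that.

```python
def infrequent_access_rule(prefix="", after_days=30):
    """Build a lifecycle configuration that moves objects to STANDARD_IA.

    `prefix` and the rule ID are illustrative; an empty prefix matches every object.
    """
    if after_days < 30:
        # Assumed S3 constraint: Standard-IA transitions need at least 30 days.
        raise ValueError("after_days must be at least 30 for STANDARD_IA")
    return {
        "Rules": [
            {
                "ID": "move-to-infrequent-access",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": after_days, "StorageClass": "STANDARD_IA"}
                ],
            }
        ]
    }


config = infrequent_access_rule(after_days=30)
print(config)
# Applying it would look roughly like this (not run here):
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="taskcluster-public-artifacts", LifecycleConfiguration=config)
```

Whether this saves money depends on how often artifacts are read back, because Infrequent Access charges for retrieval and has a minimum storage duration.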
Comment 6•8 years ago
Travis is looking into moving our CloudFront cost ($8k/month) to another cost center because it should have been billed differently.
Comment 7•8 years ago
This will not solve an immediate problem, but I opened up bug 1303214 to lower the default artifact expiration for jobs running on Try in Buildbot.
Comment 8•8 years ago
Also, once we determine the total size used by try jobs (from buildbot and taskcluster), I will send an email to dev-platform suggesting that we remove try artifacts that are older than 14 days.
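To size that proposal, the bytes held by artifacts past the 14-day mark can be summed from a listing of creation times and sizes. This is a minimal sketch: the `(created, size_bytes)` tuple layout and the sample values are assumptions, not the format of any real inventory.

```python
from datetime import datetime, timedelta, timezone


def expired_bytes(artifacts, now, max_age_days=14):
    """Sum the sizes of artifacts created more than max_age_days before `now`.

    `artifacts` is an iterable of (created, size_bytes) pairs (assumed layout).
    """
    cutoff = now - timedelta(days=max_age_days)
    return sum(size for created, size in artifacts if created < cutoff)


now = datetime(2016, 9, 16, tzinfo=timezone.utc)  # arbitrary reference date
listing = [
    (now - timedelta(days=3), 1_000),    # recent, kept
    (now - timedelta(days=20), 5_000),   # older than 14 days
    (now - timedelta(days=90), 20_000),  # older than 14 days
]
print(expired_bytes(listing, now))
```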
Reporter
Comment 9•8 years ago
In bug 1303153 I estimated that we can save about $25k in S3 storage without much effort, with diminishing returns after that. So the total savings available so far is about $35k.

We already have a stated retention policy of 14 days for try jobs, so I don't think we need to re-request that permission. However, we do need to ask about integration branches.

I think the place to look for further cost savings is in EC2, and in particular at the utilization of the test instances. If we can get another $20k savings there, then that just leaves $30k for releng to shave off and we are at the $85k combined goal.
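The arithmetic behind that goal, spelled out as a small check. All figures are the rounded monthly amounts quoted in this comment, in thousands of USD; the variable names are just labels for this sketch.

```python
GOAL_K = 85               # combined monthly savings goal
SAVINGS_SO_FAR_K = 35     # identified so far, including about $25k from S3
EC2_TEST_INSTANCES_K = 20 # hoped-for savings from test instance utilization

# Whatever is not covered by the two items above is left for releng.
releng_share_k = GOAL_K - SAVINGS_SO_FAR_K - EC2_TEST_INSTANCES_K
print(f"left for releng: ${releng_share_k}k/month")
```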
Reporter
Comment 10•8 years ago
I put a bunch of useful data on all tasks, durations, branches, workerType, tiers, create date, and S3 storage at https://s3.amazonaws.com/taskcluster-bug1303153/tasks.csv. Please have a look and analyze the heck out of it.
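A possible starting point for that analysis is a per-workerType rollup of task count, run time, and storage. The column names below (`workerType`, `duration_s`, `s3_bytes`) are guesses, since the actual header of tasks.csv is not shown in this bug, and the sample rows are invented.

```python
import csv
import io
from collections import defaultdict

# Invented sample with assumed column names; the real tasks.csv may differ.
SAMPLE = """workerType,duration_s,s3_bytes
desktop-test,600,1000
desktop-test-xlarge,1200,5000
desktop-test,300,2000
"""


def totals_by_worker_type(fileobj):
    """Aggregate task count, total duration, and S3 bytes per workerType."""
    totals = defaultdict(lambda: {"tasks": 0, "duration_s": 0.0, "s3_bytes": 0})
    for row in csv.DictReader(fileobj):
        entry = totals[row["workerType"]]
        entry["tasks"] += 1
        entry["duration_s"] += float(row["duration_s"])
        entry["s3_bytes"] += int(row["s3_bytes"])
    return dict(totals)


result = totals_by_worker_type(io.StringIO(SAMPLE))
for worker_type, entry in sorted(result.items()):
    print(worker_type, entry)
```

For the real file, pass `open("tasks.csv", newline="")` instead of the in-memory sample and adjust the column names to match its header.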
Updated•6 years ago
Assignee: garndt → nobody
Reporter
Comment 12•6 years ago
This has tracked a lot of work, and we're no longer terribly concerned with cutting costs.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED