Closed Bug 1303974 Opened 9 years ago Closed 9 years ago

gpu instance limits on aws account

Categories

(Taskcluster :: Services, defect)

Platform: Unspecified, Windows
Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: grenade, Assigned: garndt)

Details

since adding the gecko-g-* test worker types which use an ec2 instance size of g2.2xlarge, i've noticed that no more than 10 spin up at any given time. i see in the aws console that we have limits on the number of these instances, which today were 5 instances per region. we'll likely need that quota lifted on the tc aws account to support windows tests that rely on gpu processing. limits for us-west-2 are visible here: https://us-west-2.console.aws.amazon.com/ec2/v2/home?region=us-west-2#Limits:
Blocks: 1280474
No longer blocks: 1244750
I'll leave this one to The Decider. If I recall, these were at $9/hr spot last week, so a limit of 1000 might be a bit extreme.
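For scale, a rough worst-case spend is the capacity limit multiplied by the hourly spot price. This is a minimal sketch that assumes every instance up to the limit runs for the full hour at the quoted ~$9/hr. `worst_case_hourly_cost` is a hypothetical helper for illustration, not part of the provisioner.

```python
def worst_case_hourly_cost(max_capacity: int, spot_price_per_hour: float) -> float:
    """Upper bound on hourly spend, assuming every instance up to
    max_capacity is running at the given spot price (hypothetical helper)."""
    if max_capacity < 0 or spot_price_per_hour < 0:
        raise ValueError("capacity and price must be non-negative")
    return max_capacity * spot_price_per_hour


# Figures from the comment: a limit of 1000 at roughly $9/hr spot.
print(worst_case_hourly_cost(1000, 9.0))  # 9000.0
# Compare with the ~10 instances currently observed running.
print(worst_case_hourly_cost(10, 9.0))  # 90.0
```

Real spend would be lower, since spot prices fluctuate and the pool rarely sits at full capacity, but the upper bound shows why a large bump deserves scrutiny.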
Assignee: nobody → garndt
How many instances would we need? What are the limits on these instances within releng?
the limits under the releng account look to be set to the same defaults (5 per region), but there must be some other factor at play because i see 52 g2.2xlarge instances up and running in us-west-2 right now. Q or catlee: can you comment on what the instance limits on the releng account are set to (and how they are set, if the limits page in the ec2 console is not the definitive source of truth)?
Flags: needinfo?(q)
Flags: needinfo?(catlee)
I don't recall having to request a higher limit. Perhaps spot instances aren't subject to those limits?
Flags: needinfo?(catlee)
I believe spot instances are not subject to these per-instance-type limits. If we decided to use on-demand instances, we would need to request a limit increase. There are, however, overall limits on spot and spot fleet requests and on total instances, and those can be increased.
Flags: needinfo?(q)
Currently our AWS provisioner does not handle on-demand instance provisioning as far as I'm aware (John Ford would know for certain when he's around). We should, however, be able to adjust the number of spot instances the provisioner launches without touching our on-demand limit. What I'm looking to understand first is the number of spot instances we would like to spawn (maxCapacity). We can certainly allow more instances than are currently configured; I'm just being mindful when we make bumps like this, especially since the cost per hour is a lot higher than for our other instance types. I do not want to delay or slow the migration to Windows in TaskCluster.
Reviewing the current configuration for one of the gpu worker types, I noticed that minPrice is set to 0.7. Why do we have a minPrice > 0? Looking at the pricing history, there are often regions/zones with instance prices cheaper than 0.7.
Flags: needinfo?(rthijssen)
I'm also confirming with John about what minPrice actually does.
Ok, I have followed up with John. It's possible that the min price is being set to ensure that these types of nodes are not spot killed as easily. Setting the min price to this isn't required from a provisioning standpoint, but if we are noticing they are getting killed off, then we might want to set it to something non-zero.
i don't know what minPrice does, but i suspect not very much. all of the gecko-* windows worker types have minPrice set to 0.7 (always have), but i see that we often get spot prices of 0.07 for our win7 and win10 c3.2xlarge instances, which leads me to believe it doesn't do what it says on the tin.
Flags: needinfo?(rthijssen)
it seems there's nothing to be done. we aren't limited on spot instances.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → INVALID
Component: AWS-Provisioner → Services