Closed Bug 1578900 Opened 5 years ago Closed 5 years ago

[aws provider] InsufficientCapacity / InsufficientFreeAddressesInSubnet errors when spinning up a lot of instances

Categories

(Taskcluster :: Services, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: owlish, Assigned: owlish)

Details

This bug was discovered in load testing the aws provider.

What "a lot" means depends on the instance type: hundreds for xlarge instances, thousands for nano instances.

My notes from the parent ticket copied here:

I think there are several ways of dealing with this:

  • Implement retries with exponential backoff for these particular errors (which seems to complicate the code terribly)
  • Experiment with idempotent requests
  • Experiment with the requestSpotFleet/requestSpotInstances endpoints
  • Experiment with the spot options of runInstances
  • See if this problem can be solved by talking to AWS support
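
For reference, the first option could look roughly like the sketch below. It is only an illustration, not the provider's code: `ProviderError`, `backoff_delays` and `run_with_retries` are hypothetical names, the error codes are copied from the bug title (the exact strings AWS returns may differ), and `request` stands in for whatever function actually issues the launch call.

```python
import random
import time

# Error codes copied from the bug title; the exact strings AWS returns may differ.
RETRYABLE_CODES = {"InsufficientCapacity", "InsufficientFreeAddressesInSubnet"}


class ProviderError(Exception):
    """Hypothetical error type carrying an AWS-style error code."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


def backoff_delays(base=1.0, cap=60.0, attempts=5, rng=random.random):
    """Yield one delay per retry: a random fraction of an exponentially growing window."""
    for attempt in range(attempts):
        window = min(cap, base * 2 ** attempt)
        yield rng() * window


def run_with_retries(request, sleep=time.sleep, **backoff):
    """Call request(); retry only on capacity-style errors, re-raise anything else."""
    last_error = None
    # The leading 0.0 is the initial attempt, made without waiting.
    for delay in [0.0, *backoff_delays(**backoff)]:
        if delay:
            sleep(delay)
        try:
            return request()
        except ProviderError as err:
            if err.code not in RETRYABLE_CODES:
                raise
            last_error = err
    # All attempts failed with a retryable error.
    raise last_error
```

The random jitter is there so that many workers hitting the same error do not all retry at the same moment. Keeping the retry loop in one wrapper like this is one way to limit how much it complicates the calling code.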

We need to make sure the load is spread across subnets and regions. Along with this, we need to make the requests idempotent (or find an equivalent safeguard), otherwise things would get even worse than they are now.
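
A rough sketch of what spreading plus idempotency could mean in practice, under some assumptions: `plan_requests`, `pool_id`, `epoch` and the batch size are hypothetical names and values, and the code only builds parameter dictionaries rather than calling AWS. `SubnetId`, `MinCount`, `MaxCount` and `ClientToken` are parameters of the EC2 RunInstances call; `ClientToken` is the one EC2 uses to recognise a repeated request.

```python
import hashlib


def plan_requests(pool_id, total, subnets, batch_size=10, epoch=0):
    """Split `total` instances into batches, rotating through `subnets`.

    Each batch gets a deterministic token, so re-sending the same batch
    after a failure reuses the same ClientToken instead of minting a new one.
    """
    requests = []
    remaining = total
    index = 0
    while remaining > 0:
        count = min(batch_size, remaining)
        # Round-robin over subnets so no single subnet takes the whole load.
        subnet = subnets[index % len(subnets)]
        # Hypothetical token scheme: hash of pool id, provisioning epoch and batch index.
        token = hashlib.sha256(f"{pool_id}:{epoch}:{index}".encode()).hexdigest()[:32]
        requests.append({
            "SubnetId": subnet,
            "MinCount": count,
            "MaxCount": count,
            "ClientToken": token,
        })
        remaining -= count
        index += 1
    return requests
```

For example, `plan_requests("pool", 25, ["subnet-a", "subnet-b"])` produces three batches of 10, 10 and 5 instances, alternating between the two subnets. Bumping `epoch` on each new provisioning round would give fresh tokens, while a retry within the same round keeps them unchanged.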

Status: NEW → ASSIGNED
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED