Closed Bug 1140646 Opened 9 years ago Closed 9 years ago

aws-provisioner: Auto adjustment of spot bid

Categories

(Taskcluster :: Services, defect)

Platform: x86 macOS
Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: garndt, Assigned: jhford)

Details

Currently, there are times when all the instances within a worker type are killed because of spot pricing, and they then take a long time to recover. For instance, c3.2xlarge can be obtained for $0.20/hour for portions of the day, but there are hours when the price can easily go to $0.30 or $0.40, causing these nodes to be killed.

Perhaps there is a way to automatically scale the spot bid based on historical demand, so that the provisioner, within an allowable limit, can raise the bid if it takes too long to provision instances.
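A minimal sketch of the escalation idea, assuming a hypothetical `next_bid` helper, a hypothetical 15-minute patience threshold, and a hypothetical $0.05 step. None of these come from the provisioner itself; the only point is "raise the bid when capacity is slow to arrive, but never past the allowable limit".

```python
def next_bid(current_bid: float, cap: float, pending_minutes: float,
             patience_minutes: float = 15.0, step: float = 0.05) -> float:
    """Return the bid ($/hour) to use on the next provisioning attempt.

    Hypothetical policy: if capacity has been pending longer than
    `patience_minutes`, raise the bid by `step`; the result never exceeds `cap`.
    """
    if pending_minutes < patience_minutes:
        # Not waiting long enough to escalate; just enforce the cap.
        return min(current_bid, cap)
    # Waited too long: bump the bid, clamped to the allowable limit.
    return min(current_bid + step, cap)
```

For example, with a $0.40 cap, a $0.20 bid stays at $0.20 after 5 minutes of waiting, rises to about $0.25 after 20 minutes, and a $0.38 bid is clamped to $0.40 rather than going to $0.43.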
We should experiment with this, but for another reason.

Background:
Regardless of the spot bid we make, we only pay the current spot price.
So if the goal is to have nodes available for less than X $/hour, we just set the spot bid to X.
If the spot price is Y and Y <= X, we will get spot instances and we will be charged Y $/hour
(i.e., regardless of the bid, we are only charged the current spot price).
Notice:
 * X = on-demand price will work if we're willing to pay that (we're probably not)
 * X = on-demand price doesn't imply that nodes will always be available
 * Spot price Y can be higher than the on-demand price!
 * Never set X > on-demand price; there are examples of people paying thousands of dollars
   for a single spot hour because they set an extremely high spot bid.
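The pricing rule above can be sketched as a pair of small functions. This models only the behaviour described in this comment (win when Y <= X, pay Y, keep X at or below the on-demand price); the function names and the $0.42 on-demand figure are hypothetical and not real AWS prices or APIs.

```python
from typing import Optional


def hourly_charge(bid: float, spot_price: float) -> Optional[float]:
    """Price paid for one spot hour, or None if the bid loses.

    Models the rule described above: if spot_price <= bid we get the
    instance and pay spot_price, never the bid itself.
    """
    if spot_price <= bid:
        return spot_price
    return None


def safe_bid(desired_bid: float, on_demand_price: float) -> float:
    """Clamp a bid so it never exceeds the on-demand price."""
    return min(desired_bid, on_demand_price)


# Illustrative numbers only: bid X = 0.40 $/hour.
print(hourly_charge(0.40, 0.21))  # 0.21: we pay the spot price, not our bid
print(hourly_charge(0.40, 0.45))  # None: spot price above our bid, no instance
print(safe_bid(5.00, 0.42))       # 0.42, for a hypothetical on-demand price of 0.42
```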

So the argument for auto-adjusting the spot bid is twofold:
 1) AWS could technically use our spot bid to increase the spot price artificially.
    - however, all data shows that the actual spot price is fairly random :)
 2) Optimize for involuntary spot node terminations. If we keep the spot bid as close to the 
    actual spot price as possible, our spot nodes are more likely to be killed when the spot
    price varies. This is good because when AWS kills a spot node we get the current billing
    hour for free.
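Point (2) can be made concrete with a rough cost model. This is a sketch under the hourly billing model assumed in this comment (every started hour is billed, but a partial hour cut short by an AWS spot termination is free); `spot_node_cost` is a hypothetical helper, and a constant hourly price is assumed for simplicity.

```python
import math


def spot_node_cost(hours_run: float, hourly_price: float, killed_by_aws: bool) -> float:
    """Total cost of one spot node under the hourly billing model assumed here.

    Every started hour is billed, except that a partial hour cut short by an
    AWS spot termination is free. For simplicity the price is held constant.
    """
    full_hours = math.floor(hours_run)
    billed_hours = full_hours
    if hours_run > full_hours and not killed_by_aws:
        # We stopped the node ourselves mid-hour: the started hour is billed.
        billed_hours += 1
    return billed_hours * hourly_price
```

A node that runs 2.5 hours at $0.20/hour costs about $0.40 if AWS kills it, versus about $0.60 if we terminate it ourselves, which is the saving (2) is after, and also why it is small.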

(1) and (2) are both reasons to keep the spot bid X as close to the actual spot price Y as possible.
The downside is that doing so makes it harder for our system to get spot nodes: the delay before nodes
are launched grows, because we may first need to raise the spot bid.

Either way, the money saved by (2) is probably not all that significant. So if we do implement this,
we should probably just do it for the entertainment value :)
Summary: aws-provisioner - Auto adjustment of spot pricing → aws-provisioner: Auto adjustment of spot bid
Component: TaskCluster → AWS-Provisioner
Product: Testing → Taskcluster
So we basically do this right now. We have a cap, but we use pricing data to try to optimize around that cap, under the assumption that when we aren't at risk of being spot-killed, Amazon will set the spot price at, e.g., 85% of our bid. As implemented, switching to this strategy when we moved away from the old provisioner played a part in drastically reducing our EC2 costs.
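The capped strategy described above might look roughly like this. It is a hypothetical sketch, not the provisioner's actual code (which this thread does not show): `choose_bid` is an invented name, "pricing data" is reduced to a list of recent spot prices, and the 0.85 ratio is the example figure from the comment.

```python
from typing import Sequence


def choose_bid(recent_prices: Sequence[float], cap: float, ratio: float = 0.85) -> float:
    """Pick a bid so the highest recent spot price is about `ratio` of it.

    Falls back to `cap` when no price history is available, and never
    returns more than `cap`.
    """
    if not recent_prices:
        return cap
    # If the spot price tends to sit at ~ratio * bid, invert that relation.
    target = max(recent_prices) / ratio
    return min(target, cap)
```

With recent prices of $0.15 and $0.17 and a $0.40 cap, this bids about $0.20; if recent prices reach $0.40 with a $0.30 cap, the cap wins and the bid is $0.30.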

Soon, we'll be biasing our bids based on the quality of regions and the number of spot kills, and I think this strategy is the way to go.

I'm going to mark this bug as RESOLVED->FIXED because it's what we do.  If we want to further optimize our bidding, let's open a new bug.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Assignee: nobody → jhford
Component: AWS-Provisioner → Services