aws_watch_pending.py starts instances one after the other, which can take a long time and leads to long waits while instances spin up. We should start instances in parallel instead. The logic could resemble:
- determine how many instances to start
- attempt to start that many in parallel
- count how many succeeded and reduce the 'needed' count by that number
- repeat while necessary
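A minimal sketch of that loop, assuming a hypothetical `start_one` callback that tries to start a single instance and returns True on success (the real aws_watch_pending.py code may be structured differently). `max_rounds` and `max_workers` are assumed parameters: the round cap keeps "repeat while necessary" from looping forever when, for example, spot requests are never fulfilled.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def _attempt(start_one: Callable[[], bool]) -> bool:
    """Run one start attempt, treating any exception as a failure."""
    try:
        return bool(start_one())
    except Exception:
        return False


def start_instances_in_parallel(
    needed: int,
    start_one: Callable[[], bool],  # hypothetical: starts one instance
    max_rounds: int = 5,            # assumed cap on retry rounds
    max_workers: int = 10,          # assumed concurrency limit
) -> int:
    """Start `needed` instances concurrently, retrying shortfalls in later rounds.

    Returns the number of instances successfully started.
    """
    started = 0
    for _ in range(max_rounds):
        if needed <= 0:
            break
        # Submit all remaining start attempts at once instead of serially.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            futures = [pool.submit(_attempt, start_one) for _ in range(needed)]
            results = [f.result() for f in futures]
        # Reduce the 'needed' count by however many succeeded this round.
        succeeded = sum(results)
        started += succeeded
        needed -= succeeded
    return started
```

Threads are a reasonable fit here because each start attempt is an I/O-bound API call; a failed attempt simply leaves `needed` higher for the next round.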
Rail mentioned that 'spot' instances on Amazon take longer to start, which I believe could be a factor in the slow spin-up time.
Spot instances need to bootstrap themselves: talk to Puppet and reboot. Additionally, there is some lag between a request and its fulfillment. Sometimes the bid price is too low and requests are not fulfilled at all.
I mentioned this on IRC (edited version): We could start spot instances in advance by measuring how many Linux builds are running at a given time. If we have fewer spot instances than would be needed to handle all of those builds when they finish, we could start more up ahead of time.
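A minimal sketch of that heuristic, assuming each running Linux build will need some fixed number of spot instances once it finishes. `jobs_per_build` is a hypothetical parameter (not from the original discussion), and `available_spot` is assumed to count both idle and already-requested spot instances.

```python
def spot_instances_to_prestart(
    running_builds: int,
    available_spot: int,     # assumed: idle + already-requested spot instances
    jobs_per_build: int = 1,  # hypothetical: spot jobs each finished build spawns
) -> int:
    """Return how many extra spot instances to start ahead of demand."""
    # Demand expected once the currently running builds finish.
    expected_demand = running_builds * jobs_per_build
    # Only start the shortfall; never a negative number.
    return max(0, expected_demand - available_spot)
```

For example, with 10 running builds and 4 available spot instances this suggests starting 6 more, while 3 builds against 5 available instances suggests starting none.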