When requesting spot instances, give up on an az the first time there is no slave name

RESOLVED FIXED

Status

Release Engineering
General Automation
3 years ago
3 years ago

People

(Reporter: nthomas, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Attachments

(1 attachment)

When we're running close to capacity, aws_watch_pending.log often has a lot of this:

2014-08-12 15:46:07,107 - Need 272 of tst-linux64 in us-east-1d
2014-08-12 15:46:07,107 - Using m1.medium (us-east-1, us-east-1d) 0.0081 (value: 0.0081) < 0.07
2014-08-12 15:46:07,108 - No slave name available for us-east-1, tst-linux64, None
2014-08-12 15:46:07,108 - No slave name available for us-east-1, tst-linux64, None
2014-08-12 15:46:07,108 - No slave name available for us-east-1, tst-linux64, None

and then the last line repeated many more times, once per needed instance. And this:

2014-08-12 15:45:59,896 - No free IP available in us-east-1c for subnets ['subnet-ae35ccc4', 'subnet-8f32cbe5', 'subnet-ff3542d7', 'subnet-b8643190', 'subnet-fb97bc8f', 'subnet-844b7ec2', 'subnet-ed35cc87', 'subnet-5cd0d828', 'subnet-7ca5f03a']
2014-08-12 15:45:59,896 - No free IP available for tst-linux64 in us-east-1c

We could short-circuit that by checking whether r is False in do_request_spot_instances().
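A minimal sketch of that short-circuit, assuming a hypothetical pool of free slave names and a hypothetical per-instance helper that returns False when no name is left. The function names and signatures here are illustrative only; they are not cloud-tools' actual do_request_spot_instances() or its patch.

```python
import logging

log = logging.getLogger("aws_watch_pending_sketch")


def request_one_spot_instance(az, free_names):
    """Hypothetical stand-in for a single spot request.

    Returns the slave name it used, or False if no slave name is available.
    """
    if not free_names:
        log.info("No slave name available for %s", az)
        return False
    return free_names.pop(0)


def request_spot_instances(needed, az, free_names):
    """Request up to `needed` instances in one az, giving up on the first miss."""
    started = []
    for _ in range(needed):
        r = request_one_spot_instance(az, free_names)
        if r is False:
            # The name pool only shrinks inside this loop, so one miss means
            # every remaining iteration would miss too: stop here instead of
            # logging the same failure once per needed instance.
            break
        started.append(r)
    return started


if __name__ == "__main__":
    # Hypothetical names: 34 available, 38 needed.
    names = ["bld-linux64-spot-%03d" % i for i in range(34)]
    print(len(request_spot_instances(38, "us-east-1d", names)))  # → 34
```

With 38 needed and 34 names available, this starts 34 and logs the "no slave name" message once rather than four times.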
Summary: When request spot instances, give up on az first time there is no slave name → When requesting spot instances, give up on an az the first time there is no slave name
Created attachment 8471950
[cloud-tools] Return early

Might be as simple as this? If not, I'll leave it to you!
Attachment #8471950 - Flags: feedback?(rail)
Comment on attachment 8471950
[cloud-tools] Return early

LGTM!
Attachment #8471950 - Flags: review+
Attachment #8471950 - Flags: feedback?(rail)
Attachment #8471950 - Flags: feedback+
Comment on attachment 8471950
[cloud-tools] Return early

https://hg.mozilla.org/build/cloud-tools/rev/504bd721f9b4
Attachment #8471950 - Flags: checked-in+
Working. Saw a case needing 38 bld-linux64: we started 34, then failed to get the rest without lots of log spew. We still try all the other AZs and instance types when we can't get a name, but I'm not going to attempt to fix that here.
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED