Closed Bug 1302567 Opened 8 years ago Closed 8 years ago

watch_pending should send an alert when we're outbid for spot instances

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nthomas, Assigned: aselagea)

Attachments

(1 file)

60 bytes, text/x-github-pull-request
nthomas
: review+
E.g. an SNS alert that shows up in #buildduty and co. We could make it a CRITICAL problem if there are no availability zones where our bid is large enough, and a WARNING if only some proportion of zones are affected. Harder would be to assess whether we can meet current pending within the current limits for the region.
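
The CRITICAL/WARNING split proposed here could be sketched roughly as follows. This is only an illustration of the idea, not code from build-cloud-tools: the function name, the dict of per-zone prices, and the zone names are all assumptions.

```python
def classify_outbid(bid_price, market_prices):
    """Classify how badly we are outbid for one instance type.

    bid_price: our configured bid (hypothetical input).
    market_prices: dict mapping availability zone -> current spot price
                   (hypothetical shape, not the real cloudtools structure).
    """
    if not market_prices:
        # No zones to bid in at all is treated like being outbid everywhere.
        return "CRITICAL"
    outbid = [az for az, price in market_prices.items() if price > bid_price]
    if len(outbid) == len(market_prices):
        return "CRITICAL"  # no zone where our bid is large enough
    if outbid:
        return "WARNING"   # only some proportion of zones are too expensive
    return "OK"


print(classify_outbid(0.25, {"us-east-1a": 0.30, "us-east-1b": 0.20}))
```

A real check would likely want a threshold on the proportion rather than "any zone", but that choice is not specified in this bug.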
Component: General Automation → Buildduty
QA Contact: catlee → bugspam.Callek
Assignee: nobody → aselagea
When I filed this, I didn't realize we'd have alerts like
 <relengbot> [sns alert] Nov 03 12:51:30 aws-manager2.srv.releng.scl3.mozilla.com aws_watch_pending.py: 2016-11-03 12:51:30,988 - No spot choices for b-2008

That's coming from 
 https://github.com/mozilla-releng/build-cloud-tools/blob/bfbe747c6aacc3e437de67946d626dbf5ee9207a/cloudtools/scripts/aws_watch_pending.py#L155
and in turn
 https://github.com/mozilla-releng/build-cloud-tools/blob/master/cloudtools/aws/spot.py#L317
so it will fire when the actual price is higher than the bid price in our config. Papertrail is checking every minute, so there's no issue with lag. The messages could be a little better, e.g. include how many we wanted to start, and explicitly say that the bid is less than current prices.

They are a little misleading too. We actually only start instances if the actual price < 0.8 * our bid price, see
 https://github.com/mozilla-releng/build-cloud-tools/blob/master/cloudtools/aws/spot.py#L135
So there's a gap where we wouldn't get alerts, but also wouldn't start anything.
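
To make the gap concrete, here is a small sketch of the three price ranges described above. The 0.8 factor is from spot.py as linked; the function and constant names are made up for illustration and do not exist in cloudtools.

```python
# Hypothetical name; the 0.8 value is the fraction mentioned in this comment.
MAX_BID_FRACTION = 0.8


def spot_state(market_price, bid_price, fraction=MAX_BID_FRACTION):
    """Describe what happens at a given market price, per the behaviour above."""
    if market_price > bid_price:
        return "alert"       # "No spot choices" is logged and the alert fires
    if market_price < fraction * bid_price:
        return "start"       # cheap enough, instances are requested
    return "silent-gap"      # no alert, but nothing is started either


# With a bid of 0.20 the start threshold is 0.16.
for price in (0.10, 0.18, 0.25):
    print(price, spot_state(price, 0.20))
```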
We talked about this yesterday at the buildduty mtg.

(In reply to Nick Thomas [:nthomas] from comment #2)
> The messages could be a little better, eg include how many we wanted to
> start, and explicitly say something about bid is less than current prices.

I think this is an easy change for us to make. Adding more info will make the alert more actionable.

> They are a little misleading too. We actually only start instances if the
> actual price < 0.8 * our bid price, see
>  https://github.com/mozilla-releng/build-cloud-tools/blob/master/cloudtools/
> aws/spot.py#L135
> So there's a gap where we wouldn't get alerts, but also wouldn't start
> anything.

OK, so we should also base the log output on our fractional cost (currently 0.8 * bid price) rather than the straight bid price. Again, probably an easy change.

Alin: is this enough to proceed?
Flags: needinfo?(aselagea)
Yup, thanks!
Flags: needinfo?(aselagea)
Attached file bug_1302567
We indeed log debug messages when the market price > bid_price, so I changed that to do it when the market price > bid_price * 0.8.

I noticed that messages like "No spot choices for *" are triggered when the market price is higher than our bid price in all the regions where we bid for that instance type, so I think the alert message would be a bit too long if it printed all the prices. Instead, I only print a generic message saying that we are outbid in all regions for that instance type.

e.g. "b-2008 - market price too expensive in all available regions"

I also found that we log the number of online instances and the number of needed instances. When we hit such issues, the log message would look like this (with some adjustments):

e.g. b-2008 - started 0 c4.2xlarge spot instances; need 96
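
For reference, a rough sketch of how the two messages above could be produced with the 0.8 threshold applied. This is not the code from the pull request; the function name, arguments, and the per-region price dict are assumptions for illustration only.

```python
def outbid_messages(moz_type, instance_type, bid_price, market_prices,
                    started, needed, fraction=0.8):
    """Build the two log lines discussed above.

    market_prices: dict mapping region -> current market price
                   (hypothetical shape, not the real cloudtools structure).
    """
    messages = []
    threshold = bid_price * fraction
    # Generic message only when every region is above our effective limit.
    if market_prices and all(p > threshold for p in market_prices.values()):
        messages.append(
            "%s - market price too expensive in all available regions" % moz_type)
    # Report how many instances we managed to start versus how many we wanted.
    if started < needed:
        messages.append("%s - started %d %s spot instances; need %d"
                        % (moz_type, started, instance_type, needed))
    return messages


for line in outbid_messages("b-2008", "c4.2xlarge", 0.5,
                            {"us-east-1": 0.45, "us-west-2": 0.6}, 0, 96):
    print(line)
```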

In Papertrail we could set up an alert matching either of the two (although the second one seems to appear more often than the first one).
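
As an illustration of what "matching either of the two" means, here is a Python regex that accepts both message shapes. Papertrail has its own search syntax, which is not shown here; the pattern and function name below are assumptions, and the second alternative only matches the case where zero instances were started.

```python
import re

# Matches either the generic "too expensive" message or a
# "started 0 <type> spot instances; need N" line.
ALERT_PATTERN = re.compile(
    r"market price too expensive in all available regions"
    r"|started 0 \S+ spot instances; need \d+"
)


def should_alert(line):
    """Return True if a log line matches either alert message."""
    return ALERT_PATTERN.search(line) is not None


print(should_alert("b-2008 - started 0 c4.2xlarge spot instances; need 96"))
```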
Attachment #8811205 - Flags: review?(nthomas)
Comment on attachment 8811205 [details] [review]
bug_1302567

Please re-request review when you've addressed comments on the pull request.
Attachment #8811205 - Flags: review?(nthomas)
Comment on attachment 8811205 [details] [review]
bug_1302567

Updated the PR. Sorry for those corrections that resulted in more commits, I can create a clean PR if desired.
Attachment #8811205 - Flags: review?(nthomas)
Comment on attachment 8811205 [details] [review]
bug_1302567

Use the 'Squash and merge' option and they'll magically become one commit (in the history of the main repo).
Attachment #8811205 - Flags: review?(nthomas) → review+
Unfortunately I don't have merge rights, so I'll need to kindly ask you to do the merge.
Created an SNS alert for this: "AWS market price too expensive".
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard