watch_pending should send an alert when we're outbid for spot instances

RESOLVED FIXED

Status

Infrastructure & Operations
CIDuty
RESOLVED FIXED
2 years ago
2 months ago

People

(Reporter: nthomas, Assigned: aselagea)

Attachments

(1 attachment)

60 bytes, text/x-github-pull-request
nthomas
: review+
(Reporter)

Description

2 years ago
For example, an SNS alert that shows up in #buildduty and co. We could make it a CRITICAL problem if there are no availability zones where our bid is large enough, and a WARNING if our bid is too low in only some proportion of them. Harder would be to assess whether we can meet current pending within the current limits for the region.
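
A minimal sketch of the severity rule proposed above, assuming we already have the current spot price per availability zone. The function name `bid_severity` and the `zone_prices` dict are hypothetical and are not part of build-cloud-tools; the zone names and prices in the usage note are made up.

```python
def bid_severity(bid, zone_prices):
    """Classify spot availability for one instance type.

    bid         -- our maximum bid price (hypothetical input)
    zone_prices -- dict of availability zone name -> current spot price
    """
    # Zones where our bid is at least the current market price.
    usable = [zone for zone, price in zone_prices.items() if price <= bid]
    if not usable:
        # No zone at all where our bid is large enough.
        return "CRITICAL"
    if len(usable) < len(zone_prices):
        # Outbid in some zones, but not all of them.
        return "WARNING"
    return "OK"
```

For instance, `bid_severity(0.25, {"us-east-1a": 0.30, "us-east-1b": 0.20})` would be a WARNING, because only one of the two zones is still within our bid. The harder question of whether pending can be met within region limits is not covered here.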

Updated

2 years ago
Component: General Automation → Buildduty
QA Contact: catlee → bugspam.Callek
Duplicate of this bug: 1304831
(Assignee)

Updated

2 years ago
Assignee: nobody → aselagea
(Reporter)

Comment 2

2 years ago
When I filed this, I didn't realize we'd have alerts like
 <relengbot> [sns alert] Nov 03 12:51:30 aws-manager2.srv.releng.scl3.mozilla.com aws_watch_pending.py: 2016-11-03 12:51:30,988 - No spot choices for b-2008

That's coming from 
 https://github.com/mozilla-releng/build-cloud-tools/blob/bfbe747c6aacc3e437de67946d626dbf5ee9207a/cloudtools/scripts/aws_watch_pending.py#L155
and in turn
 https://github.com/mozilla-releng/build-cloud-tools/blob/master/cloudtools/aws/spot.py#L317
so it will fire when the actual price is higher than the bid price in our config. Papertrail is checking every minute, so there's no issue with lag. The messages could be a little better, eg include how many we wanted to start, and explicitly say something about bid is less than current prices.

They are a little misleading too. We actually only start instances if the actual price < 0.8 * our bid price, see
 https://github.com/mozilla-releng/build-cloud-tools/blob/master/cloudtools/aws/spot.py#L135
So there's a gap where we wouldn't get alerts, but also wouldn't start anything.
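
To illustrate the gap, here is a rough sketch of the three price bands described in this comment. It is not the code in spot.py; `classify_price` and the band labels are hypothetical, and only the 0.8 factor and the two comparisons come from the text above.

```python
BID_FRACTION = 0.8  # the factor quoted above


def classify_price(market_price, bid_price, fraction=BID_FRACTION):
    """Return which of the three bands a market price falls into."""
    if market_price < fraction * bid_price:
        # Cheap enough: instances would be started.
        return "start"
    if market_price <= bid_price:
        # Between 0.8 * bid and bid: nothing starts, and no alert fires.
        return "silent gap"
    # Above our bid: the "No spot choices" alert path.
    return "alert"
```

With a bid of 0.50, a market price of 0.45 lands in the silent gap: it is not above the bid, so nothing is logged, but it is also not below 0.40, so nothing is started.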

Comment 3

2 years ago
We talked about this yesterday at the buildduty mtg.

(In reply to Nick Thomas [:nthomas] from comment #2)
> The messages could be a little better, eg include how many we wanted to
> start, and explicitly say something about bid is less than current prices.

I think this is an easy change for us to make. Adding more info will make the alert more actionable.

> They are a little misleading too. We actually only start instances if the
> actual price < 0.8 * our bid price, see
>  https://github.com/mozilla-releng/build-cloud-tools/blob/master/cloudtools/
> aws/spot.py#L135
> So there's a gap where we wouldn't get alerts, but also wouldn't start
> anything.

OK, so we should also change the log output to be based on our fractional cost (currently 0.8 of the bid price) rather than the straight bid price. Again, probably an easy change.

Alin: is this enough to proceed?
Flags: needinfo?(aselagea)
(Assignee)

Comment 4

2 years ago
Yup, thanks!
Flags: needinfo?(aselagea)
(Assignee)

Comment 5

2 years ago
Created attachment 8811205 [details] [review]
bug_1302567

We do log debug messages when the market price > bid_price, so I changed that to log them when the market price > bid_price * 0.8.

Noticed that messages like "No spot choices for *" are triggered when the market price is higher than our bid price in all the regions we bid in for that instance type, so I think the alert message would be too long if it printed all the prices. Instead, I only print a generic message saying that we are outbid in all regions for that instance type.

e.g. "b-2008 - market price too expensive in all available regions"

I also found that we log the number of online instances and the number of needed instances. When we have such issues, the log message would look like this (with some adjustments):

e.g. b-2008 - started 0 c4.2xlarge spot instances; need 96

In Papertrail we could set up an alert matching either of the two (although the second one seems to appear more often than the first one).
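
As a rough illustration of matching either message, here is a sketch using Python regular expressions. It assumes the log lines look exactly like the two examples above, which may not hold after the "adjustments" mentioned; `match_alert` and the pattern names are hypothetical, and this is not Papertrail's own search syntax.

```python
import re

# Patterns modelled on the two example messages in this comment.
TOO_EXPENSIVE = re.compile(
    r"(?P<pool>\S+) - market price too expensive in all available regions"
)
STARTED_NONE = re.compile(
    r"(?P<pool>\S+) - started 0 (?P<itype>\S+) spot instances; "
    r"need (?P<need>\d+)"
)


def match_alert(line):
    """Return (kind, pool, needed) if the line should alert, else None."""
    m = TOO_EXPENSIVE.search(line)
    if m:
        return ("too_expensive", m.group("pool"), None)
    m = STARTED_NONE.search(line)
    # Only alert when nothing started although instances were needed.
    if m and int(m.group("need")) > 0:
        return ("started_none", m.group("pool"), int(m.group("need")))
    return None
```

A line such as "b-2008 - started 4 c4.2xlarge spot instances; need 96" would not match, since some instances did start.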
(Assignee)

Updated

2 years ago
Attachment #8811205 - Flags: review?(nthomas)
(Reporter)

Comment 6

2 years ago
Comment on attachment 8811205 [details] [review]
bug_1302567

Please re-request review when you've addressed comments on the pull request.
Attachment #8811205 - Flags: review?(nthomas)
(Assignee)

Comment 7

2 years ago
Comment on attachment 8811205 [details] [review]
bug_1302567

Updated the PR. Sorry that the corrections resulted in more commits; I can create a clean PR if desired.
Attachment #8811205 - Flags: review?(nthomas)
(Reporter)

Comment 8

2 years ago
Comment on attachment 8811205 [details] [review]
bug_1302567

Use the 'Squash and merge' option and they'll magically become one commit (in the history of the main repo).
Attachment #8811205 - Flags: review?(nthomas) → review+
(Assignee)

Comment 9

2 years ago
Unfortunately I don't have merge rights, so I'll need to kindly ask you to do the merge.
(Assignee)

Comment 11

2 years ago
Created an SNS alert for this: "AWS market price too expensive".
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED

Updated

2 months ago
Product: Release Engineering → Infrastructure & Operations