eg an SNS alert that shows up in #buildduty and co. We could make it a CRITICAL problem if there are no availability zones where our bid is large enough, and a WARNING if the bid is too low in only some proportion of them. It would be harder to assess whether we can meet the current pending count within the current limits for the region.
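A rough sketch of the severity rule proposed above, for illustration only. The function name, the `price_by_zone` mapping and the "OK" level are hypothetical and are not part of build-cloud-tools.

```python
def bid_coverage_severity(bid_price, price_by_zone):
    """Classify how well a spot bid covers the current market prices.

    bid_price     -- our configured bid (hypothetical input)
    price_by_zone -- dict of availability zone -> current spot price
                     (hypothetical input; no AWS call is made here)
    """
    # Zones where the bid is at least the current market price.
    affordable = [zone for zone, price in price_by_zone.items()
                  if price <= bid_price]
    if not affordable:
        # No zone where our bid is large enough (also the empty-input case).
        return "CRITICAL"
    if len(affordable) < len(price_by_zone):
        # The bid is too low in only some of the zones.
        return "WARNING"
    return "OK"
```

For example, a bid of 0.50 against prices of 0.40 and 0.60 in two zones would give WARNING, and CRITICAL if both prices were above 0.50.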
Component: General Automation → Buildduty
QA Contact: catlee → bugspam.Callek
2 years ago
Duplicate of this bug: 1304831
When I filed this, I didn't realize we'd have alerts like

<relengbot> [sns alert] Nov 03 12:51:30 aws-manager2.srv.releng.scl3.mozilla.com aws_watch_pending.py: 2016-11-03 12:51:30,988 - No spot choices for b-2008

That's coming from https://github.com/mozilla-releng/build-cloud-tools/blob/bfbe747c6aacc3e437de67946d626dbf5ee9207a/cloudtools/scripts/aws_watch_pending.py#L155 and in turn https://github.com/mozilla-releng/build-cloud-tools/blob/master/cloudtools/aws/spot.py#L317, so it will fire when the actual price is higher than the bid price in our config. Papertrail is checking every minute, so there's no issue with lag.

The messages could be a little better, eg include how many we wanted to start, and explicitly say something about bid is less than current prices.

They are a little misleading too. We actually only start instances if the actual price < 0.8 * our bid price, see https://github.com/mozilla-releng/build-cloud-tools/blob/master/cloudtools/aws/spot.py#L135
So there's a gap where we wouldn't get alerts, but also wouldn't start anything.
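To make the gap concrete, here is a minimal sketch of the two thresholds described above. It is not the code in spot.py; `classify_price` and its return labels are made up for illustration, and only the 0.8 factor and the two comparisons come from the comment.

```python
def classify_price(market_price, bid_price, start_fraction=0.8):
    """Show which of the three ranges a market price falls into.

    Hypothetical helper: the comparisons mirror the behaviour described
    in the comment, not the actual cloudtools implementation.
    """
    if market_price > bid_price:
        # Price above our bid: this is when the "No spot choices" alert fires.
        return "alert"
    if market_price < start_fraction * bid_price:
        # Price comfortably below the bid: instances get started.
        return "start"
    # Between 0.8 * bid and bid: nothing starts and nothing alerts.
    return "gap"
```

With a bid of 1.00, a market price of 0.90 lands in the gap: it is not above the bid, so no alert fires, but it is not below 0.80 either, so nothing is started.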
We talked about this yesterday at the buildduty mtg.

(In reply to Nick Thomas [:nthomas] from comment #2)
> The messages could be a little better, eg include how many we wanted to
> start, and explicitly say something about bid is less than current prices.

I think this is an easy change for us to make. Adding more info will make the alert more actionable.

> They are a little misleading too. We actually only start instances if the
> actual price < 0.8 * our bid price, see
> https://github.com/mozilla-releng/build-cloud-tools/blob/master/cloudtools/aws/spot.py#L135
> So there's a gap where we wouldn't get alerts, but also wouldn't start
> anything.

OK, so we should also change the log output based on our fractional cost (currently 0.8) rather than the straight bid price. Again, probably an easy change.

Alin: is this enough to proceed?
Created attachment 8811205 [details] [review]
bug_1302567

We do log debug messages when the market price > bid_price, so I changed that to log when the market price > bid_price * 0.8.

I noticed that messages like "No spot choices for *" are triggered when the market price is higher than our bid price in all the regions we bid in for that instance type, so I think the alert message would get too long if it printed all the prices. Instead, I print only a generic message saying that we are outbid in all regions for that instance type, e.g.
"b-2008 - market price too expensive in all available regions"

I also found that we log the number of online instances and the number of needed instances. When we hit such issues, the log message would look like this (with some adjustments), e.g.
b-2008 - started 0 c4.2xlarge spot instances; need 96

In papertrail we could set up an alert matching either of the two (although the second one seems to appear more often than the first one).
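The two messages could be produced along these lines. This is only a sketch under assumptions: the function names and the `market_prices` mapping are hypothetical and do not reflect the actual patch; only the message texts and the 0.8 factor are taken from the comment above.

```python
def too_expensive_message(moz_type, bid_price, market_prices, fraction=0.8):
    """Return the generic alert text if every region is too expensive.

    market_prices -- dict of region -> current spot price (hypothetical
                     input shape, not the structure used in cloudtools)
    Returns None when at least one region is still affordable.
    """
    threshold = bid_price * fraction
    if market_prices and all(price > threshold
                             for price in market_prices.values()):
        return ("%s - market price too expensive in all available regions"
                % moz_type)
    return None


def started_message(moz_type, started, instance_type, needed):
    """Format the started/needed log line quoted in the comment."""
    return ("%s - started %d %s spot instances; need %d"
            % (moz_type, started, instance_type, needed))
```

Keeping the first message generic avoids listing a price per region, and the second message carries the number needed, so a papertrail search could match on either string.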
Comment on attachment 8811205 [details] [review]
bug_1302567

Please re-request review when you've addressed the comments on the pull request.

Comment on attachment 8811205 [details] [review]
bug_1302567

Updated the PR. Sorry that the corrections resulted in more commits; I can create a clean PR if desired.

Comment on attachment 8811205 [details] [review]
bug_1302567

Use the 'Squash and merge' option and they'll magically become one commit (in the history of the main repo).
Attachment #8811205 - Flags: review?(nthomas) → review+
Unfortunately I don't have merge rights, so I'll need to kindly ask you to do the merge.
Created an SNS alert for this: "AWS market price too expensive".
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED