Closed Bug 970552 Opened 11 years ago Closed 11 years ago

Do not use spot instances for some builders

Categories

(Release Engineering :: General, defect)

Platform: x86_64 Linux
Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rail, Assigned: rail)

Attachments

(5 files, 1 obsolete file)

We shouldn't use spot instances for some builders (PGO/release?) and/or branches (beta/release?).
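A minimal sketch of the kind of filter proposed here, in Python like the buildbot configs. The keyword list, the branch names and the helper builder_allows_spot are assumptions drawn from the candidates listed above; this is not the patch that landed.

```python
# Hypothetical filter: keep PGO/release builders and beta/release branches
# off spot instances. Keywords and branch names are assumptions.
NO_SPOT_BUILDER_KEYWORDS = ("pgo", "release")
NO_SPOT_BRANCHES = frozenset({"mozilla-beta", "mozilla-release"})


def builder_allows_spot(builder_name: str, branch: str) -> bool:
    """Return False if the builder or its branch should stay on on-demand slaves."""
    name = builder_name.lower()
    if any(keyword in name for keyword in NO_SPOT_BUILDER_KEYWORDS):
        return False
    return branch not in NO_SPOT_BRANCHES


if __name__ == "__main__":
    print(builder_allows_spot("Linux x86-64 mozilla-central pgo-build", "mozilla-central"))  # False
    print(builder_allows_spot("Linux x86-64 mozilla-central build", "mozilla-central"))  # True
```

Substring matching is deliberately crude; a real config would more likely tag builders explicitly than guess from their names.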
Assignee: nobody → rail
WCPGW?
Attachment #8374131 - Flags: review?(catlee)
Attachment #8374131 - Flags: review?(catlee) → review+
Attachment #8374131 - Flags: checked-in+
in production
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Attached patch tests.diff
Attachment #8374363 - Flags: review?(catlee)
Attachment #8374363 - Flags: review?(catlee) → review+
2014-02-12 16:05:26-0800 [-] Error choosing next slave for builder 'release-mozilla-release-linux_repack_9/10', choosing randomly instead
2014-02-12 16:05:26-0800 [-] Unhandled Error
    Traceback (most recent call last):
      File "/builds/buildbot/build1/lib/python2.7/site-packages/twisted/python/context.py", line 37, in callWithContext
        return func(*args,**kw)
      File "/builds/buildbot/build1/lib/python2.7/site-packages/twisted/enterprise/adbapi.py", line 429, in _runInteraction
        result = interaction(trans, *args, **kw)
      File "/builds/buildbot/build1/lib/python2.7/site-packages/buildbot-0.8.2_hg_f23f5672becd_production_0.8-py2.7.egg/buildbot/process/builder.py", line 517, in _claim_buildreqs
        sb = self._choose_slave(available_slaves)
      File "/builds/buildbot/build1/lib/python2.7/site-packages/buildbot-0.8.2_hg_f23f5672becd_production_0.8-py2.7.egg/buildbot/process/builder.py", line 548, in _choose_slave
        return self.nextSlave(self, available_slaves)
    --- <exception caught here> ---
      File "/builds/buildbot/build1/lib/python2.7/site-packages/buildbotcustom/misc.py", line 267, in _nextSlave
        return func(builder, available_slaves)
      File "/builds/buildbot/build1/lib/python2.7/site-packages/buildbotcustom/misc.py", line 463, in _nextSlave_skip_spot
        valid.append(s)
    exceptions.IndexError: list index out of range

Additionally, it would be great to avoid running any release builds on spot instances, because there may be no chance to get to the slave to debug a failure.
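The log shows a wrapper (_nextSlave in buildbotcustom/misc.py) calling the real chooser and, when it raises, picking a slave at random. A minimal sketch of that pattern, assuming slaves are plain hostnames; safe_next_slave, buggy_skip_spot and the hostnames are hypothetical, not the buildbotcustom code.

```python
import logging
import random

log = logging.getLogger("nextslave-sketch")


def safe_next_slave(func):
    """Wrap a slave chooser so that an exception never blocks scheduling.

    Hypothetical sketch: if ``func`` raises, log the traceback and fall back
    to a random choice among the available slaves (None if there are none).
    """
    def _next_slave(builder, available_slaves):
        try:
            return func(builder, available_slaves)
        except Exception:
            log.exception("Error choosing next slave for builder %r, "
                          "choosing randomly instead", builder)
            if not available_slaves:
                return None
            return random.choice(available_slaves)
    return _next_slave


def buggy_skip_spot(builder, available_slaves):
    # Reproduces the bug: taking [-1] of an empty list raises IndexError
    # when every available slave is a spot instance.
    no_spot = sorted(s for s in available_slaves if "spot" not in s)
    return no_spot[-1]


if __name__ == "__main__":
    choose = safe_next_slave(buggy_skip_spot)
    # Logs the IndexError to stderr, then prints bld-linux64-spot-001.
    print(choose("release-mozilla-release-linux_repack_9/10", ["bld-linux64-spot-001"]))
```

Note that the random fallback is exactly what defeats the skip-spot intent: with only spot slaves idle, the release job lands on a spot instance anyway.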
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attached patch [wip] nextSlave.diff (obsolete)
Attached patch nextSlave.diff
I think I found the issue: sorted(no_spot_slaves, _recentSort(builder))[-1] raises IndexError when no_spot_slaves is empty ([]), so it's better to return None early.
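A minimal sketch of the fix described above, assuming each slave is a dict with a name and a last_build timestamp. recent_sort is a hypothetical stand-in for _recentSort(builder), whose real ordering is not shown in this bug. The sketch also assumes, as the Buildbot 0.8 nextSlave hook allows, that returning None means "don't start a build on any of these slaves right now".

```python
import functools


def recent_sort(builder):
    """Hypothetical stand-in for buildbotcustom's _recentSort(builder).

    Returns a cmp-style comparator; here it orders slaves by an assumed
    ``last_build`` timestamp, oldest first.
    """
    def _cmp(a, b):
        return (a["last_build"] > b["last_build"]) - (a["last_build"] < b["last_build"])
    return _cmp


def is_spot(slave):
    # Assumption: spot slaves carry "spot" in their hostname.
    return "spot" in slave["name"]


def next_slave_skip_spot(builder, available_slaves):
    """Pick a non-spot slave, or return None if there is none.

    The early return is the fix: sorted([])[-1] raises IndexError.
    """
    no_spot_slaves = [s for s in available_slaves if not is_spot(s)]
    if not no_spot_slaves:
        return None
    ordered = sorted(no_spot_slaves, key=functools.cmp_to_key(recent_sort(builder)))
    return ordered[-1]


if __name__ == "__main__":
    slaves = [
        {"name": "bld-linux64-spot-001", "last_build": 5},
        {"name": "bld-linux64-ec2-001", "last_build": 1},
        {"name": "bld-linux64-ec2-002", "last_build": 3},
    ]
    print(next_slave_skip_spot("some-builder", slaves)["name"])  # bld-linux64-ec2-002
    print(next_slave_skip_spot("some-builder", slaves[:1]))  # None
```

The original runs on Python 2.7, where sorted() accepts a cmp function positionally; Python 3 dropped that argument, hence functools.cmp_to_key here.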
Attachment #8375262 - Attachment is obsolete: true
Attachment #8375265 - Flags: review?(catlee)
Attachment #8375265 - Flags: review?(catlee) → review+
Live in production.
Attached patch non-unified.diff
+ non-unified
Attachment #8375658 - Flags: review?(catlee)
Attachment #8375658 - Flags: review?(catlee) → review+
In production
Status: REOPENED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Since the kill ratio for spot instances has been almost 0% since we landed the bidding improvements (see below), let's use spot instances everywhere except releases.

^bld-linux64
date        total jobs  jobs on spots  spot retries  on-demand retries
2014-03-01  1725        1356 (78%)     2 (0%)        2 (0%)
2014-03-02  1036        762 (73%)      1 (0%)        0 (0%)
2014-03-03  2564        2046 (79%)     68 (3%)       0 (0%)
2014-03-04  3263        2636 (80%)     27 (1%)       1 (0%)
2014-03-05  2987        2306 (77%)     38 (1%)       2 (0%)
2014-03-06  3456        2688 (77%)     29 (1%)       1 (0%)
2014-03-07  3003        2425 (80%)     10 (0%)       1 (0%)
2014-03-08  1303        951 (72%)      0 (0%)        0 (0%)
2014-03-09  998         685 (68%)      0 (0%)        0 (0%)
2014-03-10  2282        1966 (86%)     15 (0%)       0 (0%)
2014-03-11  2730        2385 (87%)     2 (0%)        0 (0%)
2014-03-12  2883        2616 (90%)     9 (0%)        0 (0%)
2014-03-13  3109        2728 (87%)     3 (0%)        0 (0%)

It may sound blasphemous, but we could even reconsider our logic that avoids running retried jobs on spot instances! :)
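The percentages in the bld-linux64 table are consistent with truncated integer division, with spot retries measured against jobs on spots (2014-03-03: 68/2046 truncates to 3%, whereas 68/2564 would give 2%). That reading, and the on-demand denominator (total minus spot jobs), are inferences rather than anything stated in the bug. A small sketch that recomputes the columns under those assumptions:

```python
# Rows from the ^bld-linux64 table:
# (date, total jobs, jobs on spots, spot retries, on-demand retries)
ROWS = [
    ("2014-03-01", 1725, 1356, 2, 2),
    ("2014-03-02", 1036, 762, 1, 0),
    ("2014-03-03", 2564, 2046, 68, 0),
    ("2014-03-04", 3263, 2636, 27, 1),
    ("2014-03-05", 2987, 2306, 38, 2),
    ("2014-03-06", 3456, 2688, 29, 1),
    ("2014-03-07", 3003, 2425, 10, 1),
    ("2014-03-08", 1303, 951, 0, 0),
    ("2014-03-09", 998, 685, 0, 0),
    ("2014-03-10", 2282, 1966, 15, 0),
    ("2014-03-11", 2730, 2385, 2, 0),
    ("2014-03-12", 2883, 2616, 9, 0),
    ("2014-03-13", 3109, 2728, 3, 0),
]


def pct(part, whole):
    """Whole-number percentage, truncated (assumed to match the table)."""
    return part * 100 // whole if whole else 0


def summarize(row):
    """Return (date, spot share %, spot retry %, on-demand retry %)."""
    date, total, on_spot, spot_retries, od_retries = row
    on_demand = total - on_spot  # assumption: everything not on spot ran on-demand
    return (date, pct(on_spot, total), pct(spot_retries, on_spot), pct(od_retries, on_demand))


if __name__ == "__main__":
    for row in ROWS:
        print(*summarize(row))
```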
Attachment #8391006 - Flags: review?(catlee)
Comment on attachment 8391006 kill-skip-spot.diff
Review of attachment 8391006:

Yeah, we could perhaps change it to run on spot if num_retries <= 1 instead of num_retries == 0.
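A minimal sketch of the combined rule discussed in the last two comments: release builders never go to spot, and other jobs may use spot while num_retries is within a threshold (1 per the review suggestion, versus 0 today). The names may_use_spot, is_release_builder and MAX_SPOT_RETRIES are hypothetical, and detecting release builders by a "release-" prefix is an assumption based on the builder name in the traceback.

```python
# Hypothetical scheduling rule; not the code in kill-skip-spot.diff.
MAX_SPOT_RETRIES = 1  # review suggestion: num_retries <= 1 instead of == 0


def is_release_builder(builder_name: str) -> bool:
    # Assumption: release builders are named "release-<branch>-...",
    # e.g. 'release-mozilla-release-linux_repack_9/10'.
    return builder_name.startswith("release-")


def may_use_spot(builder_name: str, num_retries: int,
                 max_spot_retries: int = MAX_SPOT_RETRIES) -> bool:
    """True if a job for this builder may be scheduled on a spot instance."""
    if is_release_builder(builder_name):
        return False
    return num_retries <= max_spot_retries


if __name__ == "__main__":
    print(may_use_spot("release-mozilla-release-linux_repack_9/10", 0))  # False
    print(may_use_spot("hypothetical-linux64-build", 1))  # True
    print(may_use_spot("hypothetical-linux64-build", 2))  # False
```

Passing max_spot_retries=0 reproduces the stricter "never retry on spot" behaviour, so the threshold can be tuned without touching the logic.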
Attachment #8391006 - Flags: review?(catlee) → review+
In production
Status: REOPENED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Component: General Automation → General