Closed Bug 1198317 Opened 9 years ago Closed 9 years ago

reduce the number of available b-2008-ix instances in TRY in order to force y-2008-spot instantiation

Categories

(Infrastructure & Operations :: RelOps: Puppet, task)

x86_64
Windows Server 2008
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: grenade, Assigned: grenade)

Details

(Whiteboard: [windows][aws])

Attachments

(1 file)

disabled instances:
b-2008-ix-0036
b-2008-ix-0039
b-2008-ix-0019
b-2008-ix-0038
b-2008-ix-0025
b-2008-ix-0054
b-2008-ix-0022
b-2008-ix-0058
b-2008-ix-0026
b-2008-ix-0059
disabled instances extended to include:

b-2008-ix-0030
b-2008-ix-0174
b-2008-ix-0057
b-2008-ix-0023
b-2008-ix-0047
b-2008-ix-0046
b-2008-ix-0041
b-2008-ix-0044
b-2008-ix-0061
b-2008-ix-0055
b-2008-ix-0043
b-2008-ix-0049
b-2008-ix-0035
b-2008-ix-0029
b-2008-ix-0031
b-2008-ix-0062
b-2008-ix-0045
all machines returned to pool.
will disable more tomorrow.
progress:
- reduced ix capacity to a single instance (b-2008-ix-0043)
- pushed win32, win64 m-c build to try (https://treeherder.mozilla.org/#/jobs?repo=try&revision=272cab1322fc)
- observed messages in watch pending log indicating our max bid price (0.4) would not be successful
- updated max bid price for y-2008 to 0.5 (https://github.com/mozilla/build-cloud-tools/pull/109)
- observed successful spot requests in ec2 console (3 for use1, 3 for usw2, as expected/configured in slavealloc)
- observed spot instances starting, successfully running userdata, naming themselves and mailing logs
- now awaiting build output at https://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/try-builds/rthijssen@mozilla.com-272cab1322fc
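
The bid-price step above can be illustrated with a minimal sketch, assuming the usual spot model in which a request is only fulfilled while the max bid is at or above the current spot price. The function name and the example prices are hypothetical; they do not come from build-cloud-tools, and the bug does not record the actual market price at the time.

```python
from decimal import Decimal


def regions_where_bid_succeeds(max_bid, current_prices):
    """Return the regions whose current spot price is at or below max_bid.

    Hypothetical helper for illustration; not part of build-cloud-tools.
    """
    bid = Decimal(str(max_bid))
    return sorted(
        region
        for region, price in current_prices.items()
        if Decimal(str(price)) <= bid
    )


# Illustrative prices only (assumption): chosen to fall between the old
# bid (0.4) and the new bid (0.5) so the effect of the change is visible.
example_prices = {"us-east-1": "0.45", "us-west-2": "0.47"}

print(regions_where_bid_succeeds("0.4", example_prices))  # []
print(regions_where_bid_succeeds("0.5", example_prices))  # ['us-east-1', 'us-west-2']
```

Decimal is used rather than float so that a bid exactly equal to the price compares cleanly.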
Attached image spot-cltbld.png
the us-east-1 instances appear to have hung mid-build. rdp'ing to the instances (001 - 003) as cltbld shows this cmd prompt, which is running but apparently going nowhere.
Attachment #8653370 - Flags: feedback?(mcornmesser)
the us-west-2 instances have all terminated. I cannot find any evidence that they did any work before terminating (slave_health/treeherder). The PaperTrail logs end like this:
Aug 27 02:09:48 y-2008-spot-101.try.releng.usw2.mozilla.com USER32:  The process c:\windows\SysWOW64\shutdown.exe (Y-2008-SPOT-101) has initiated the shutdown of computer Y-2008-SPOT-101 on behalf of user Y-2008-SPOT-101\cltbld for the following reason: No title for this reason could be found   Reason Code: 0x800000ff   Shutdown Type: shutdown   Comment: #015
I think we've demonstrated that the spinning up and terminating processes work. We obviously have work to do to get mozilla-build's undies untwisted, but that's another bug...
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
There are alerts in #buildduty that indicate there's a buildbot misconfiguration/missing configuration:

[sns alert] Thu 06:08:03 PDT buildbot-master78.bb.releng.usw2.mozilla.com watch_twistd_log.py: Count: 675 | First instance: 2015-08-27 05:28:27-0700 | Most recent instance: 2015-08-27 06:00:02-0700 | Twistd exception: twisted.cred.error.UnauthorizedLogin - unknown 10.132.67.67
[sns alert] Thu 06:08:03 PDT buildbot-master78.bb.releng.usw2.mozilla.com watch_twistd_log.py: Count: 681 | First instance: 2015-08-27 05:28:27-0700 | Most recent instance: 2015-08-27 06:00:01-0700 | Twistd exception: twisted.cred.error.UnauthorizedLogin - unknown 10.132.67.101

I've verified that those are Windows spot instances: 10.132.67.67 (y-2008-spot-103) and 10.132.67.101 (y-2008-spot-102).
All of the alerts were for use1 IPs; I didn't see any for usw2.
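
For anyone triaging similar alerts, here is a rough sketch of how the rejected IPs could be pulled out of the alert text and matched to hostnames. It assumes the alert format shown above; the regular expression, the function name and the hard-coded host map are illustrative and are not part of watch_twistd_log.py.

```python
import re

# Matches the alert format quoted above (assumption: the format is stable).
ALERT_RE = re.compile(
    r"Count: (?P<count>\d+) \|.*?"
    r"UnauthorizedLogin - unknown (?P<ip>\d+\.\d+\.\d+\.\d+)"
)

# IP-to-host mapping as verified by hand in this bug.
KNOWN_HOSTS = {
    "10.132.67.67": "y-2008-spot-103",
    "10.132.67.101": "y-2008-spot-102",
}


def summarize_unauthorized_logins(lines):
    """Map each rejected IP to (hostname or None, highest alert count seen)."""
    summary = {}
    for line in lines:
        match = ALERT_RE.search(line)
        if not match:
            continue  # not an UnauthorizedLogin alert
        ip = match.group("ip")
        count = int(match.group("count"))
        previous = summary.get(ip, (None, 0))[1]
        summary[ip] = (KNOWN_HOSTS.get(ip), max(count, previous))
    return summary


alerts = [
    "[sns alert] Thu 06:08:03 PDT buildbot-master78.bb.releng.usw2.mozilla.com "
    "watch_twistd_log.py: Count: 675 | First instance: 2015-08-27 05:28:27-0700 | "
    "Most recent instance: 2015-08-27 06:00:02-0700 | Twistd exception: "
    "twisted.cred.error.UnauthorizedLogin - unknown 10.132.67.67",
    "[sns alert] Thu 06:08:03 PDT buildbot-master78.bb.releng.usw2.mozilla.com "
    "watch_twistd_log.py: Count: 681 | First instance: 2015-08-27 05:28:27-0700 | "
    "Most recent instance: 2015-08-27 06:00:01-0700 | Twistd exception: "
    "twisted.cred.error.UnauthorizedLogin - unknown 10.132.67.101",
]

for ip, (host, count) in summarize_unauthorized_logins(alerts).items():
    print(ip, host, count)
```

An IP that is missing from the host map is reported with a hostname of None, which flags it for manual lookup.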
Attachment #8653370 - Flags: feedback?(mcornmesser)