Closed Bug 486672 Opened 11 years ago Closed 11 years ago

temporarily move some moz2-* windows slaves to try to help get rid of patch backlog

Categories

(Release Engineering :: General, defect, P1, blocker)

x86
Windows Server 2003
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bhearsum, Assigned: bhearsum)

Details

No description provided.
Severity: normal → blocker
Priority: -- → P1
Please reassign some slaves from the production pool-of-slaves to try-server to clear out the backlog there. This is only a temporary measure, until the new VMs being cloned in bug#485883 and bug#485885 come online.

As the production pool-of-slaves machines are identical to the try-server, this will be a quick move, and should help with the backlog clearout immediately.

Note: this also means we will be running the main production pool-of-slaves below capacity, but the hope here is that we can move these slaves back again before the main pool gets too far behind. Hopefully late today.
OS: Mac OS X → All
Hardware: x86 → All
Note: the try backlog is only on win32, so we are only moving win32 slaves. There are only 3 try backlog jobs for mac, which will clear fast anyway, and zero try backlog for linux, so those are both fine.
Note: we're moving 5 win32 slaves over, so with the new ones being cloned right now, this should give us 9 new slaves on try win32 for a while today. Watch this bug for try-server backlog updates, but we expect that backlog to be 100% gone sometime today.
(In reply to comment #3)
> we expect that backlog to be 100%
> gone sometime today.

Woot!  Thanks, that's great news.

(How can we monitor the backlog?)
Here's what we did to move them to the try master:
* Disconnect from production-master
* Remove Buildbot autostart job
* Clobber a couple of trees to make sure we don't run out of disk space
* mv ~/.ssh ~/.ssh.prod
* scp -r cltbld@try-linux-slave01:.ssh .
* ssh trybld@build.mozilla.org to make sure uploads will work
* mkdir /e/builds/tryslave
* buildbot create-slave /e/builds/tryslave sm-try-master:9010 moz2-win32-slaveN $password
* Switch network to Sandbox, reboot
* start the slave
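
For reference, the file-level steps above (the .ssh swap and the new slave directory) could be scripted roughly as follows. This is only a sketch under assumptions: the `run` helper, the `DRY_RUN`, `SLAVE_NAME` and `SLAVE_PASSWORD` variables, and the `move_to_try` function name are made up for illustration, and the defaults are placeholders rather than real values. Disconnecting from production-master, removing the autostart job, clobbering, the network switch, the reboot and starting the slave are still manual.

```shell
#!/bin/sh
# Sketch only. Prints each command by default; set DRY_RUN=0 to really run them.
DRY_RUN="${DRY_RUN:-1}"
# Placeholders mirroring "moz2-win32-slaveN" and "$password" in the steps above.
SLAVE_NAME="${SLAVE_NAME:-moz2-win32-slaveN}"
SLAVE_PASSWORD="${SLAVE_PASSWORD:-CHANGEME}"

# run (hypothetical helper): echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

move_to_try() {
    # Keep the production keys around so they can be restored later.
    run mv "$HOME/.ssh" "$HOME/.ssh.prod" || return 1
    # Copy the try keys from an existing try slave.
    run scp -r cltbld@try-linux-slave01:.ssh "$HOME/" || return 1
    # Interactive login to check that uploads will work; log out to continue.
    run ssh trybld@build.mozilla.org
    # Create the slave directory and point it at the try master.
    run mkdir /e/builds/tryslave || return 1
    run buildbot create-slave /e/builds/tryslave sm-try-master:9010 \
        "$SLAVE_NAME" "$SLAVE_PASSWORD"
}
```

Sourcing this and calling `move_to_try` with the defaults only prints the five commands, which makes it easy to review before running it with `DRY_RUN=0` and a real slave name and password.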

When we're ready to move them back we'll have to do the following:
* Delete all of /e/builds/tryslave
* Delete ~/.ssh
* mv ~/.ssh.prod ~/.ssh
* Enable the buildbot autostart job
* Switch the network, reboot
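
The file-level part of the move back could be sketched the same way. Again this is an illustration only: `run`, `DRY_RUN` and `move_back_to_production` are hypothetical names, and the check for `~/.ssh.prod` is an extra safety assumption not in the list above. Re-enabling the autostart job, switching the network and rebooting remain manual.

```shell
#!/bin/sh
# Sketch only. Prints each command by default; set DRY_RUN=0 to really run them.
DRY_RUN="${DRY_RUN:-1}"

# run (hypothetical helper): echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

move_back_to_production() {
    # Extra guard (an assumption): never delete ~/.ssh unless the
    # production backup is there to replace it.
    if [ "$DRY_RUN" != "1" ] && [ ! -d "$HOME/.ssh.prod" ]; then
        echo "no $HOME/.ssh.prod found; refusing to delete $HOME/.ssh" >&2
        return 1
    fi
    # Remove the try slave directory and the try keys.
    run rm -rf /e/builds/tryslave || return 1
    run rm -rf "$HOME/.ssh" || return 1
    # Put the production keys back.
    run mv "$HOME/.ssh.prod" "$HOME/.ssh"
}
```

As before, calling `move_back_to_production` with the defaults only prints the three commands.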

We ended up grabbing 6 slaves from production:
moz2-win32-slave01
02
06
11
13
16

slave11 is already connected, and the others will do so in short order.
OS: All → Mac OS X
Hardware: All → x86
I should note that we do _not_ have any additional try talos slaves, which will probably cause try talos to fall behind the builders.
resetting o.s. - there's no macs being moved here, aiui.
OS: Mac OS X → Windows Server 2003
We're currently at 7 pending builds, and 8 pending unittest runs on win32. This is down from (IIRC) 12 pending builds and 13 unittest runs this morning.
OS: Windows Server 2003 → Mac OS X
Ugh, Bugzilla is changing random fields again. (Switching OS back to win2k3)
OS: Mac OS X → Windows Server 2003
We've got 3 of the 4 new slaves in the pool and we're down to 5 builds and 3 unittest runs pending.
We've gotten down to 0 pending builds for win32 on try server. I'm going to start moving the moz2* slaves back as they become idle.
slave01 and 16 have been returned to the main pool.
slave11 and 13 are back in the main pool, too.
The last two slaves are back in the main pool. Many thanks to Catlee for helping move them back and forth.

Still no pending win32 builds on the try server, so I think we can call this FIXED.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
The Try Server was able to keep up with the incoming requests over the weekend. We currently have 1 linux unittest run pending, 1 mac build + 1 mac unittest run, and nothing pending on win32.
Product: mozilla.org → Release Engineering