Closed
Bug 758275
Opened 12 years ago
Closed 12 years ago
reimage w32 builders as w64 builders
Categories
(Infrastructure & Operations :: RelOps: General, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: arich, Assigned: arich)
References
Details
According to coop, releng is close to getting all Windows builds functional on w64 servers. This means that we'll be reimaging most w32 machines as w64 machines. Two groups of servers will be affected: w32 servers in scl1 and w32 servers in mtv1.

scl1 (w32-ix, 42 machines):
w32-ix-slave01 w32-ix-slave02 w32-ix-slave03 w32-ix-slave04 w32-ix-slave05 w32-ix-slave07 w32-ix-slave08 w32-ix-slave09 w32-ix-slave10 w32-ix-slave11 w32-ix-slave13 w32-ix-slave14 w32-ix-slave15 w32-ix-slave16 w32-ix-slave17 w32-ix-slave18 w32-ix-slave19 w32-ix-slave20 w32-ix-slave21 w32-ix-slave22 w32-ix-slave23 w32-ix-slave24 w32-ix-slave25 w32-ix-slave26 w32-ix-slave27 w32-ix-slave28 w32-ix-slave29 w32-ix-slave30 w32-ix-slave31 w32-ix-slave32 w32-ix-slave33 w32-ix-slave34 w32-ix-slave35 w32-ix-slave36 w32-ix-slave37 w32-ix-slave38 w32-ix-slave39 w32-ix-slave40 w32-ix-slave41 w32-ix-slave42 w32-ix-slave43 w32-ix-slave44

mtv1 (mw32-ix, 16 machines):
mw32-ix-slave11 mw32-ix-slave12 mw32-ix-slave13 mw32-ix-slave14 mw32-ix-slave15 mw32-ix-slave16 mw32-ix-slave17 mw32-ix-slave18 mw32-ix-slave19 mw32-ix-slave20 mw32-ix-slave21 mw32-ix-slave22 mw32-ix-slave23 mw32-ix-slave24 mw32-ix-slave25 mw32-ix-slave26

In order to reimage them, the following needs to happen:

Prep work, can be done now:
* create dns/dhcp entries for the machines in the winbuild domain. Take the MAC info from inventory to create new entries: w64-ix-slave43 - w64-ix-slave84 for the w32-ix machines and w64-ix-slave85 - w64-ix-slave100 for the mw32 machines. Create entries for both the primary and management interfaces of every machine. (matt)
* verify that rack space with power and network is available in scl1 (bug 758245)
* configure the VLAN for the primary (and, if necessary, management) interfaces of machines moving from mtv1 (dustin)

Reimaging work for the machines in scl1 (42 machines) must wait until we're ready to pull the trigger:
* change the VLAN for the primary (and, if necessary, management) interfaces (dustin)
* verify that the primary and mgmt interfaces work (matt/dustin)
* modify nagios (arr)
* modify inventory (matt)
* modify corp dns (remove old A and PTR records, update bmo CNAME) (arr)
* push new image (matt)

The machines in mtv1 will need to be moved to scl1 and require a hardware upgrade before being put into service there, so this will take longer and should be done in a separate batch.

Reimaging work for the machines in mtv1 (16 machines) must wait until we're ready to pull the trigger:
* unrack machines (matt/jake)
* perform hardware upgrades (matt/jake)
* rack machines in scl1 (matt/jake)
* verify that the primary and mgmt interfaces work (matt/jake/dustin)
* modify nagios (arr)
* modify inventory (matt)
* modify corp dns (remove old A and PTR records, update bmo CNAME) (arr)
* push new image (matt)
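The new-name ranges in the description can be checked with simple counting: 42 scl1 machines against w64-ix-slave43..84, and 16 mtv1 machines against a 16-name range starting at w64-ix-slave85 (comments 15 and 18 show it ending at w64-ix-slave100). The sketch below is a minimal illustration, not the tooling used in this bug. It assumes that old hosts are assigned new numbers sequentially in ascending order. Comment 18 confirms this for the mw32 batch only, so the w32 pairing is illustrative.

```python
# Sketch: pair each old hostname with a new w64 name, in order, and refuse
# ranges whose size doesn't match the number of machines.
# ASSUMPTION: sequential assignment in ascending host order. Comment 18
# confirms this for the mw32 batch only; the w32 order here is illustrative.
from __future__ import annotations


def w32_hosts() -> list[str]:
    # scl1 batch: w32-ix-slave01..44; 06 and 12 are not in the list.
    return [f"w32-ix-slave{n:02d}" for n in range(1, 45) if n not in (6, 12)]


def mw32_hosts() -> list[str]:
    # mtv1 batch: mw32-ix-slave11..26.
    return [f"mw32-ix-slave{n}" for n in range(11, 27)]


def plan(old_hosts: list[str], first: int, last: int) -> dict[str, str]:
    """Map old hostnames to w64-ix-slave<first>..<last>, inclusive."""
    new_numbers = range(first, last + 1)
    if len(new_numbers) != len(old_hosts):
        raise ValueError(
            f"{len(old_hosts)} machines but {len(new_numbers)} new names"
        )
    return {old: f"w64-ix-slave{n}" for old, n in zip(old_hosts, new_numbers)}


scl1 = plan(w32_hosts(), 43, 84)    # 42 machines -> w64-ix-slave43..84
mtv1 = plan(mw32_hosts(), 85, 100)  # 16 machines -> w64-ix-slave85..100
```

Calling `plan(mw32_hosts(), 85, 101)` raises `ValueError` (16 machines but 17 new names). A pre-flight check like this is meant to catch exactly that kind of off-by-one before DNS and DHCP entries are created.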
Comment 1•12 years ago
Armen and I are frantically trying to fix the w64 image. We have narrowed the problem down to the buildbot update Python script (which I think was affected by the SCL3 move), and we should have that done by tomorrow if things go smoothly. I will start the dns/dhcp entries tomorrow/over the weekend. The actual imaging will only take me a day once we have everything else hammered out.
Updated•12 years ago
Blocks: PGOSilverBullet
Comment 2•12 years ago
(In reply to Amy Rich [:arich] [:arr] from comment #0)
> Reimaging work for machines in scl1 (42 machines), must wait till we're
> ready to pull the trigger:
> * change VLAN for primary (and, if necessary) management interfaces (dustin)
> * verify that primary and mgmt interfaces work (matt/dustin)
> * modify nagios (arr)
> * modify inventory (matt)
> * modify corp dns (remove old A and PTR records, update bmo CNAME) (arr)
> * push new image (matt)

We're ready to start converting the w32 ix machines in scl1 to w64 on our end. How soon can this be done? What items on the IT punch list above are still outstanding? I can coordinate with whoever will be doing the re-imaging (MaRu, I assume) to take batches of w32 slaves offline as required for the rest of the week.

> The machines in mtv1 will need to be moved to scl1 and require a hardware
> upgrade before being put into service there, so this will take longer and
> should be done in a separate batch.

We have another merge coming next Monday, after which Aurora will also be building on w64, and we should be clear to move 16 of the 26 machines from mtv -> scl. What portion of these prelim steps could be performed while the machines are still in mtv? Would we upgrade the hardware on the machines staying (for now) in mtv, or wait until they are also ready to move (esr17)?
Assignee
Comment 3•12 years ago
The only thing we can do ahead of time is pre-populate DNS and DHCP on the Windows domain controller. Everything else has to wait until we take each machine out of commission as a w32 builder, because the work will disrupt DNS, DHCP, the hardware, or the network for that machine, and builds will fail. The DNS/DHCP pre-population will be done today. Maru will be the one doing the reimaging of the w32 machines, and he'll need to coordinate with dustin to get the switch ports cut over. We should come up with batches of machines and track this via etherpad.

As far as upgrading the hardware that's staying in mtv1, it's time-consuming (and there's limited space) to unrack, upgrade, and rerack them, so we'd hoped to do that when the physical machine moves happen. Is there a reason you're looking to do that now?
Comment 4•12 years ago
(In reply to Amy Rich [:arich] [:arr] from comment #3)
> As far as upgrading the hardware that's staying in mtv1, it's time consuming
> (and there's limited space) to unrack, upgrade, and rerack them, so we'd
> hoped to do that when the physical machine moves happen. Is there a reason
> you're looking to do that now?

No, just looking for any potential time savings.
Assignee
Comment 5•12 years ago
(In reply to Chris Cooper [:coop] from comment #4)
> (In reply to Amy Rich [:arich] [:arr] from comment #3)
> > As far as upgrading the hardware that's staying in mtv1, it's time consuming
> > (and there's limited space) to unrack, upgrade, and rerack them, so we'd
> > hoped to do that when the physical machine moves happen. Is there a reason
> > you're looking to do that now?
>
> No, just looking for any potential time savings.

This would be the opposite of time saving. :}
Comment 6•12 years ago
I've only looked at slave43 and slave45 so far, but I've seen some problems:
* both were logged in as administrator
* both had the old VNC password
* both had the old cltbld password
* neither has an e:\builds\moz2_slave dir
Comment 7•12 years ago
All the w32-ix-slaves in scl1 have been re-imaged now. Leaving open to track the mw32-ix machines in mtv.
Comment 8•12 years ago
Do we have an ETA on when we are moving and upgrading these machines?
Assignee
Comment 9•12 years ago
Moving any of the mw32 machines is on hold until we get the power problem in 3/mdf fixed.
Depends on: 761237
Assignee
Comment 11•12 years ago
Coop, are we moving forward with the planned relocation/reimage of the 16 machines in mtv1 now that the power issue has been fixed?
Comment 12•12 years ago
(In reply to Amy Rich [:arich] [:arr] from comment #11)
> Coop, are we moving forward with the planned relocation/reimage of the 16
> machines in mtv1 now that the power issue has been fixed?

Now that merge day has passed, let's start the process of moving the 16 iX machines and getting them upgraded/reimaged.
Comment 13•12 years ago
For the record, yesterday we also moved TB (Thunderbird) jobs to win64 slaves, I think on all branches up to mozilla-beta (only mozilla-release and esr are left).
Assignee
Comment 14•12 years ago
Matt needs to pre-load the domain controllers (old and new) with the DHCP info for the remaining 16 machines (mw32-ix-slave11 - mw32-ix-slave26).
Assignee
Comment 15•12 years ago
DNS entries have been made for:
w64-ix-slave85 w64-ix-slave85-mgmt
w64-ix-slave86 w64-ix-slave86-mgmt
w64-ix-slave87 w64-ix-slave87-mgmt
w64-ix-slave88 w64-ix-slave88-mgmt
w64-ix-slave89 w64-ix-slave89-mgmt
w64-ix-slave90 w64-ix-slave90-mgmt
w64-ix-slave91 w64-ix-slave91-mgmt
w64-ix-slave92 w64-ix-slave92-mgmt
w64-ix-slave93 w64-ix-slave93-mgmt
w64-ix-slave94 w64-ix-slave94-mgmt
w64-ix-slave95 w64-ix-slave95-mgmt
w64-ix-slave96 w64-ix-slave96-mgmt
w64-ix-slave97 w64-ix-slave97-mgmt
w64-ix-slave98 w64-ix-slave98-mgmt
w64-ix-slave99 w64-ix-slave99-mgmt
w64-ix-slave100 w64-ix-slave100-mgmt
Comment 16•12 years ago
DHCP entries have been made as well.
Comment 17•12 years ago
And now I've added the DHCP and DNS entries for winbuild.
Assignee
Comment 18•12 years ago
The mapping of the old to new hostnames for this batch (new name first, then the old name it replaces):
w64-ix-slave85 mw32-ix-slave11
w64-ix-slave86 mw32-ix-slave12
w64-ix-slave87 mw32-ix-slave13
w64-ix-slave88 mw32-ix-slave14
w64-ix-slave89 mw32-ix-slave15
w64-ix-slave90 mw32-ix-slave16
w64-ix-slave91 mw32-ix-slave17
w64-ix-slave92 mw32-ix-slave18
w64-ix-slave93 mw32-ix-slave19
w64-ix-slave94 mw32-ix-slave20
w64-ix-slave95 mw32-ix-slave21
w64-ix-slave96 mw32-ix-slave22
w64-ix-slave97 mw32-ix-slave23
w64-ix-slave98 mw32-ix-slave24
w64-ix-slave99 mw32-ix-slave25
w64-ix-slave100 mw32-ix-slave26
w64-ix-slave85-mgmt mw32-ix-slave11-mgmt
w64-ix-slave86-mgmt mw32-ix-slave12-mgmt
w64-ix-slave87-mgmt mw32-ix-slave13-mgmt
w64-ix-slave88-mgmt mw32-ix-slave14-mgmt
w64-ix-slave89-mgmt mw32-ix-slave15-mgmt
w64-ix-slave90-mgmt mw32-ix-slave16-mgmt
w64-ix-slave91-mgmt mw32-ix-slave17-mgmt
w64-ix-slave92-mgmt mw32-ix-slave18-mgmt
w64-ix-slave93-mgmt mw32-ix-slave19-mgmt
w64-ix-slave94-mgmt mw32-ix-slave20-mgmt
w64-ix-slave95-mgmt mw32-ix-slave21-mgmt
w64-ix-slave96-mgmt mw32-ix-slave22-mgmt
w64-ix-slave97-mgmt mw32-ix-slave23-mgmt
w64-ix-slave98-mgmt mw32-ix-slave24-mgmt
w64-ix-slave99-mgmt mw32-ix-slave25-mgmt
w64-ix-slave100-mgmt mw32-ix-slave26-mgmt
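A mapping posted as alternating hostnames, new name first, can be sanity-checked mechanically before inventory and nagios are updated. This is a minimal sketch, and the helper names are hypothetical (not part of any releng tooling). It parses the listing into an old -> new dict, rejects incomplete pairs or duplicate old names, and reports any host whose -mgmt rename does not mirror its primary rename.

```python
# Hypothetical helpers: parse a whitespace-separated "new old new old ..."
# listing like the one in comment 18 and check it for consistency.
from __future__ import annotations


def parse_pairs(text: str) -> dict[str, str]:
    """Return an old -> new mapping from alternating new/old hostnames."""
    tokens = text.split()
    if len(tokens) % 2:
        raise ValueError("odd number of hostnames; a pair is incomplete")
    mapping: dict[str, str] = {}
    # Tokens alternate: new name first, then the old name it replaces.
    for new, old in zip(tokens[0::2], tokens[1::2]):
        if old in mapping:
            raise ValueError(f"{old} is listed twice")
        mapping[old] = new
    return mapping


def mgmt_mismatches(mapping: dict[str, str]) -> list[str]:
    """List hosts whose -mgmt rename doesn't follow their primary rename."""
    problems = []
    for old, new in mapping.items():
        if old.endswith("-mgmt"):
            continue
        expected = f"{new}-mgmt"
        actual = mapping.get(f"{old}-mgmt")
        if actual != expected:
            problems.append(f"{old}-mgmt -> {actual}, expected {expected}")
    return problems


# A shortened excerpt of the listing above.
sample = """
w64-ix-slave85 mw32-ix-slave11 w64-ix-slave86 mw32-ix-slave12
w64-ix-slave85-mgmt mw32-ix-slave11-mgmt w64-ix-slave86-mgmt mw32-ix-slave12-mgmt
"""
mapping = parse_pairs(sample)
```

Because `str.split()` with no arguments splits on any run of whitespace, the same parser accepts the listing whether it is one long line or one pair per line.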
Comment 19•12 years ago
Requesting input on which of the machines in comment #18 to leave "as is". We need to retain 2 from the group of 11-15, 20, 26. Since this work requires unracking the machines, there's no sense making it harder or riskier than it needs to be. Optionally, if we do the work in batches, the "other 2" can be physically upgraded; they just would not be reimaged. Our business need is to keep 10 w32 core builders online. Please propose a list of 14 boxes to move, and we can coordinate on taking those out of service so this work can move forward. Thanks.
Comment 20•12 years ago
Can we get:
a) an update on when the selection will be made
b) an estimate of the time needed to do the hardware upgrade and reimage
Thanks!
Assignee
Comment 21•12 years ago
Hal, I've re-asked your question in the appropriate (hardware) bug 774829.
Comment 22•12 years ago
Hal, in response to your items:
a) a list has been given to DCOps to verify
b) I will get a time frame from DCOps
Comment 23•12 years ago
Bug 774829 has the list of boxes; DCOps is ready to get started on this. I'll coordinate with Hal.
Assignee
Updated•12 years ago
Assignee: mlarrain → arich
Assignee
Comment 24•12 years ago
All of the slated mw32 machines have been imaged as w64.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Comment 25•12 years ago
Woot! And now they're in production! Thanks!
Comment 26•12 years ago
Can someone review this SQL statement to delete these slaves from slavealloc?

delete * from slaves where notes like '%bug 774829%';

mysql> select name from slaves where notes like '%bug 774829%';
+-----------------+
| name            |
+-----------------+
| mw32-ix-slave13 |
| mw32-ix-slave14 |
| mw32-ix-slave15 |
| mw32-ix-slave16 |
| mw32-ix-slave17 |
| mw32-ix-slave18 |
| mw32-ix-slave19 |
| mw32-ix-slave20 |
| mw32-ix-slave21 |
| mw32-ix-slave22 |
| mw32-ix-slave23 |
| mw32-ix-slave24 |
| mw32-ix-slave25 |
| mw32-ix-slave26 |
+-----------------+
14 rows in set (0.01 sec)
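As posted, `delete * from slaves ...` is not valid MySQL. The single-table DELETE form takes no column list or `*`, so the statement would need to be `DELETE FROM slaves WHERE notes LIKE '%bug 774829%';`. Running the matching SELECT first, as above, is a good habit. The sketch below adds a guard that only deletes when the row count still matches the preview. It uses Python's built-in `sqlite3` module as a stand-in for the slavealloc MySQL database. The table layout and sample rows are hypothetical.

```python
# Sketch: preview-then-delete with a row-count guard, using sqlite3 as a
# stand-in for the slavealloc MySQL database. The table layout and rows below
# are hypothetical.
import sqlite3


def delete_matching(conn: sqlite3.Connection, pattern: str, expected: int) -> int:
    """Delete slaves whose notes match `pattern`, but only if exactly
    `expected` rows match (the count seen in the preview SELECT)."""
    # Same WHERE clause as the DELETE, so the preview and the delete agree.
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM slaves WHERE notes LIKE ?", (pattern,)
    ).fetchone()
    if count != expected:
        raise RuntimeError(f"expected {expected} rows, found {count}; not deleting")
    with conn:  # commit on success, roll back if an exception escapes
        cur = conn.execute("DELETE FROM slaves WHERE notes LIKE ?", (pattern,))
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE slaves (name TEXT PRIMARY KEY, notes TEXT)")
with conn:
    conn.executemany(
        "INSERT INTO slaves VALUES (?, ?)",
        [(f"mw32-ix-slave{n}", "retasked, bug 774829" if n >= 13 else "")
         for n in range(11, 27)],
    )
deleted = delete_matching(conn, "%bug 774829%", expected=14)
```

Binding the pattern as a parameter (`?`) rather than pasting it into the SQL string keeps the preview and the delete byte-for-byte identical. It also avoids quoting mistakes around the `%` wildcards.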
Assignee
Comment 27•12 years ago
Armen: those are the ones that got retasked, yes.
Comment 28•12 years ago
(In reply to Amy Rich [:arich] [:arr] from comment #27)
> Armen: those are the ones that got retasked, yes.

I've screwed up mysql statements before and I was hoping for someone to double-check my delete statement :)
Comment 29•12 years ago
(In reply to Armen Zambrano G. [:armenzg] from comment #28)
> (In reply to Amy Rich [:arich] [:arr] from comment #27)
> > Armen: those are the ones that got retasked, yes.
>
> I've screwed up mysql statements before and I was hoping for someone to
> double check my delete statement :)

nvm. It seems that we're not going to remove decommissioned slaves from slavealloc; instead, the UI will be changed to stop showing them.
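The alternative discussed here keeps decommissioned slaves in slavealloc and hides them in the UI, instead of deleting their rows. The sketch below shows one common way to do that "hide by default" approach. It assumes a hypothetical `decommissioned` flag column, because the actual slavealloc schema and UI code are not shown in this bug. Again, `sqlite3` stands in for MySQL.

```python
# Sketch of "hide, don't delete". ASSUMPTION: a hypothetical `decommissioned`
# flag column; the actual slavealloc schema and UI code are not shown here.
from __future__ import annotations

import sqlite3


def decommission(conn: sqlite3.Connection, pattern: str) -> int:
    """Flag matching slaves as decommissioned instead of deleting their rows."""
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "UPDATE slaves SET decommissioned = 1 WHERE notes LIKE ?", (pattern,)
        )
    return cur.rowcount


def visible_slaves(conn: sqlite3.Connection, show_all: bool = False) -> list[str]:
    """Slave names for display; decommissioned ones are hidden by default."""
    sql = "SELECT name FROM slaves"
    if not show_all:
        sql += " WHERE decommissioned = 0"
    sql += " ORDER BY name"
    return [name for (name,) in conn.execute(sql)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE slaves (name TEXT PRIMARY KEY, notes TEXT,"
    " decommissioned INTEGER NOT NULL DEFAULT 0)"
)
with conn:
    conn.executemany(
        "INSERT INTO slaves (name, notes) VALUES (?, ?)",
        [("mw32-ix-slave11", ""), ("mw32-ix-slave12", ""),
         ("mw32-ix-slave13", "retasked, bug 774829")],
    )
flagged = decommission(conn, "%bug 774829%")
```

The trade-off is that history such as names, notes, and bug references survives for later lookups. The cost is that every query the UI runs has to remember the filter, which is why it is centralized in one function here.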
Comment 30•12 years ago
That sounds kinda silly. What's the bug for that?
Comment 31•12 years ago
(In reply to Dustin J. Mitchell [:dustin] from comment #30)
> That sounds kinda silly. What's the bug for that?

I don't know. Perhaps it is not filed. coop, do we have a bug for hiding decommissioned slaves by default?
Updated•11 years ago
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations