Bug 519071 (closed): opened 15 years ago, closed 15 years ago

Create 10 new linux64 VMs cloned from moz2-linux64-slave01

Categories: mozilla.org Graveyard :: Server Operations (task)

Hardware: x86
OS: Linux
Type: task
Priority: Not set
Severity: normal

Tracking: (Not tracked)
Status: RESOLVED FIXED

People: (Reporter: joduinn, Assigned: phong)

Whiteboard: ETA 10/09/2009

To start providing linux64 build coverage, please clone 10 VMs called:

moz2-linux64-slave03
...
moz2-linux64-slave12

Note: if you need to power down moz2-linux64-slave01 to do this, please ping us on irc first, so we can coordinate. This VM is currently in production use, so we'll need to let developers know before we power it off.
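For the record, if someone wants to script the cloning rather than drive it through the VI client, something along these lines would do it, assuming a vSphere CLI like govc is available and configured (connection details via the GOVC_URL/GOVC_USERNAME/GOVC_PASSWORD environment variables). This is just an illustrative sketch, not necessarily how IT does it:

  # clone moz2-linux64-slave03 .. moz2-linux64-slave12 from slave01
  for i in $(seq -w 3 12); do
    govc vm.clone -vm moz2-linux64-slave01 "moz2-linux64-slave${i}"
  done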
Phong, how are we on capacity?  

John - about a week or more out on adding ESX capacity.
Assignee: server-ops → phong
I think we should wait until we have a linux 64-bit ref image before we clone a bunch of new slaves. John, how do you feel about that?
(In reply to comment #2)
> I think we should wait until we have a linux 64-bit ref image before we clone a
> bunch of new slaves. John, how do you feel about that?

See details in bug#519072. Ideally, yes, that's how I'd like to do it also. However, given the urgency from Shaver on Friday afternoon, his explicit preference is getting linux64 systems up *now* and redoing them later. Hence cloning VMs asap, and doing the refimage-and-redo-cloning later.
Depends on: 519310
I talked to Shaver - we're a week out for ESX capacity and he's okay waiting.
Whiteboard: ETA 10/09/2009
Do we have a VM I could clone this from?
Build, ping?  Which VM to clone?
(In reply to comment #5)
> Do we have a VM I could clone this from?
(In reply to comment #6)
> Build, ping?  Which VM to clone?

Please clone from "moz2-linux64-slave01", as it says in the summary and in comment #0. :-)
Can we take that VM offline any time to clone?
I'd suggest using moz2-linux64-slave02, since that impacts tracemonkey and m-1.9.1 rather than m-1.9.2 and m-c (which are busy with the 3.6b1 freeze). As John said in comment #0, please ping us on IRC to arrange a time for us to shut the VM down.
ping - any update?
I think we missed some timing here before Phong left for OC. It'd be really helpful if someone could shut that VM down and update this bug (and even ping oncall about it) so it can be cloned.
Finally!! We have RelEng available to take the slave down gracefully, at the same time as Phong is available to clone the VM, and with sheriff approval to remove moz2-linux64-slave01 from production for a few hours.

Phong notified in irc.
(In reply to comment #12)
> Finally!! We have RelEng available to take the slave down gracefully, at the
> same time as Phong is available to clone the VM, and with sheriff approval to
> remove moz2-linux64-slave01 from production for a few hours.
> 
> Phong notified in irc.

Clone made to use as a template, called "CentOS-5-x64-ref", which we can use for creating the 10 new VMs. Note that this is really a *pseudo* ref-image, because we don't know what is in it or how to reproduce or support it. However, it is the best option we have until bug#519074 is resolved.

Phong put moz2-linux64-slave01 back in production when he was finished cloning, and all is working fine.
I've made a number of changes to the ref image to accommodate making many clones (rough shell equivalents are sketched after this list):
* remove /builds/moz2_slave/*, so that 11 moz2-linux64-slave01's don't duke it out trying to connect to pm at the same time
* remove /tools/buildbot* so that we pull the production tag to set new clones up
* disabled the vncserver using chkconfig; we're not launching the build through it at all, and Xvfb is running anyway (we'll need to revisit this when we set up unit tests, to make sure the cron jobs are present, etc.)
* disabled X for the same reason (set runlevel 3 for boot, in /etc/inittab)
* disabled graphical boot (GRAPHICAL=no in /etc/sysconfig/init)
* moved ~cltbld/.ssh to ~/.ssh-prod, copied staging keys in from a linux32 box to ~/.ssh-staging, and symlinked ~/.ssh
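In rough shell terms, the list above amounts to something like this (run as root on the ref VM; reconstructed from memory, so treat it as a sketch rather than the exact history):

  # clear per-slave state so clones don't all reconnect as slave01
  rm -rf /builds/moz2_slave/*
  # drop the tools checkout; new clones pull the production tag themselves
  rm -rf /tools/buildbot*
  # no GUI needed: Xvfb provides the display for builds
  chkconfig vncserver off
  # boot to runlevel 3 instead of 5 (no X)
  sed -i 's/^id:5:initdefault:/id:3:initdefault:/' /etc/inittab
  # no graphical boot
  sed -i 's/^GRAPHICAL=.*/GRAPHICAL=no/' /etc/sysconfig/init
  # keep prod and staging ssh keys side by side; symlink whichever set applies
  mv ~cltbld/.ssh ~cltbld/.ssh-prod
  # (staging keys were copied from a linux32 slave into ~cltbld/.ssh-staging)
  ln -s .ssh-prod ~cltbld/.ssh    # or .ssh-staging, as appropriate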

I think we should
* increase /builds from 15G to 30G. I thought we weren't clobbering very much on linux64, but that's no longer true now that the slaves handle several branches (a possible resize is sketched below)
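A sketch of what that resize could look like, assuming /builds is ext3 on an LVM logical volume (the VG/LV names below are hypothetical, and the virtual disk would need growing in ESX first):

  # after growing the vmdk and the underlying partition/PV:
  lvextend -L 30G /dev/VolGroup00/builds    # hypothetical VG/LV names
  resize2fs /dev/VolGroup00/builds          # grow the filesystem to match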
Should also
* use /builds/slave instead of /builds/moz2_slave while there are only two machines to fix
* set a generic hostname on the ref platform
* fix the warning on shutdown about the MAC in use being different than configured
* apply the vncserver, X, and graphical boot changes to moz2-linux64-slave01/02, and update the ref platform doc (I'll get this Monday if no-one beats me to it)
* make the noatime change to /etc/fstab like the linux32 slaves (see the sketch after this list)
* check the linux32 ref doc for other changes we should make
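For the noatime item, a minimal sketch (mirroring what the linux32 slaves do; the exact fstab fields will vary per machine):

  # add noatime to the /builds entry in /etc/fstab, e.g.
  #   /dev/VolGroup00/builds  /builds  ext3  defaults,noatime  1 2
  # then apply it without a reboot:
  mount -o remount,noatime /builds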

Ref platform VM was left shut down.
I did one additional thing:
* Remove host keys in /etc/ssh so they'll be regenerated whenever the machine gets cloned.
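In shell terms, roughly:

  # CentOS's sshd init script regenerates any missing host keys on startup,
  # so each clone gets unique keys on first boot
  rm -f /etc/ssh/ssh_host_*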
All 10 VMs are online and added to Nagios.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
(In reply to comment #15)
> Should also
> * use /builds/slave instead of /builds/moz2_slave while there are only two
> machines to fix
> * apply the vncserver, X, and graphical boot changes to
> moz2-linux64-slave01/02, and update the ref platform doc (I'll get this Monday
> if no-one beats me to it)

Both older slaves have been updated as above, as well as the ref doc at
  https://wiki.mozilla.org/ReferencePlatforms/Linux-CentOS-5.0_64-bit
The new slaves have had /builds/moz2_slave renamed to /builds/slave (plus the /etc/default/buildbot fix) too.
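For the record, that rename on each new slave amounted to roughly the following (the sed target in /etc/default/buildbot is approximate; it just needs to point the init script at the new directory):

  mv /builds/moz2_slave /builds/slave
  sed -i 's|/builds/moz2_slave|/builds/slave|g' /etc/default/buildbot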
Product: mozilla.org → mozilla.org Graveyard