Closed Bug 640706 Opened 13 years ago Closed 13 years ago

Please set up 2 new Linux VMs per data center to be used for check-ins from the build farm

Categories

(Infrastructure & Operations :: RelOps: General, task, P2)

x86
Linux

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: coop, Assigned: coop)

References

Details

Attachments

(3 files)

Bug 517304 has the full details about why we want to do this, but in short, we'd like to start using dedicated machines/keys for jobs that require r/w access to hg/cvs. Release tagging, version bumping, and automated blocklist updating are examples of these types of jobs.

The r/w jobs are intermittent, so we don't need many machines. Two VMs per data center should suffice so that masters don't have to cross data center boundaries to use these slaves.

These new VMs can be clones of the existing Linux ref image. If we can name them something to distinguish them as the r/w slaves, that would be helpful, even if it's just adding "-rw-" to their hostname.
Ping - is there an ETA for this so I can plan around it?
(In reply to comment #1)
> Ping - is there an ETA for this so I can plan around it?

Ping - any ETA?
I haven't heard the background on this - would it make more sense to get one VM set up first, and get that into puppet so we can blast the other 5 out quickly and repeatably?  Will these be slaves, masters, or jumphosts?  Or what?
And let's make that first place scl1, yeah?
(In reply to comment #3)
> I haven't heard the background on this - would it make more sense to get one VM
> set up first, and get that into puppet so we can blast the other 5 out quickly
> and repeatably?  Will these be slaves, masters, or jumphosts?  Or what?

These will be dedicated buildslaves, as outlined in bug 517304.

One VM to start is fine, in scl1 is also fine if that makes the most sense.

Puppet config will be identical to a current linux builder, aside from the extra set of ssh keys that will be used for r/w repo access.
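For reference, the extra key setup on a slave boils down to something like this (key filename, user, and paths here are illustrative, not the final puppet-managed values):

    # generate a dedicated keypair for r/w repo access (the key itself would be puppet-managed)
    ssh-keygen -t rsa -b 2048 -f /home/cltbld/.ssh/hgwriter_rsa -N '' -C 'hgwriter slave'

    # point pushes to hg.mozilla.org at the dedicated key
    printf 'Host hg.mozilla.org\n    IdentityFile ~/.ssh/hgwriter_rsa\n' >> /home/cltbld/.ssh/config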
OK, so we'll be using KVM, since we're using virtualization in scl1.  They're slaves, and thus should use the linux builder image (centos5).

Hostname remains to be determined.  I propose

 linux-hgwriter-slaveNN

keeping with the usual build zero-padding.  But let's wait for a sign-off on that proposal from coop before generating the VMs.

Should we set up the inventory, DNS, DHCP, and nagios checks on this bug, or open a new bug for that purpose?
These will be build slaves, and we only have build buildmasters in scl1 and sjc1, so there's no need to put anything new in mtv1.  That means we're only looking at linux-hgwriter-slave{01,02,03,04}, although two of those are KVM and two are VMware.  Meh.

Zandr, Amy, are we missing any other detail to start setting up these VMs?  What about the last question in the previous comment?
(In reply to comment #7)
> Zandr, Amy, are we missing any other detail to start setting up these VMs? 

ping - any ETA?
As mentioned in the meeting on Monday, scl1 is currently out of capacity for new VMs, and we're ordering hardware to rectify that.  Once we have the memory, we'll need to evac all of the vms from one host to the other, then add the RAM.  We'll need to do this twice (once for each machine).

We'll get inventory, DNS, DHCP, and nagios set up once we have the capacity in scl1.  

Also, we're waiting for a sign off from coop re the names per comment 6.
Once coop agrees with the name, we can set the VMs in sjc1 up without blocking on the KVM hardware in scl1.  Coop?
(In reply to comment #10)
> Once coop agrees with the name, we can set the VMs in sjc1 up without blocking
> on the KVM hardware in scl1.  Coop?

Suggested nomenclature is fine.
New VMs in sjc1:

 linux-hgwriter-slave01
 linux-hgwriter-slave02

both are from the CentOS-5.0-ref-tools-vm template, in the INTEL01 cluster, and on datastore on eq01-bm01.

I'll have their MAC addresses shortly.
Assignee: server-ops-releng → dustin
 linux-hgwriter-slave01: 00:50:56:a5:4d:65
 linux-hgwriter-slave02: 00:50:56:a5:63:7c

(BTW, these have 30GB drives, like the other linux32 vms)
Over to amy for DHCP, DNS, Nagios, Inventory.

These should be monitored just like moz2-linux-slaveNN.  These hosts should probably be in their own hostgroup, though.
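Something along these lines in the nagios object config would do for the hostgroup (group name and member list are just a suggestion; only 01/02 exist so far, the scl1 pair can be appended later):

    define hostgroup {
        hostgroup_name  linux-hgwriter-slaves
        alias           Linux hgwriter build slaves
        members         linux-hgwriter-slave01,linux-hgwriter-slave02
    }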
Assignee: dustin → arich
Added to inventory, dns, and dhcp.

Please verify that the entries are correct.

https://inventory.mozilla.org/systems/show/3522/
https://inventory.mozilla.org/systems/show/3523/
Also added to nagios.
Assignee: arich → dustin
Attachment #528152 - Flags: review?(coop)
Added to slavealloc.
Attachment #528152 - Flags: review?(coop) → review?(bear)
Attachment #528152 - Flags: review?(bear) → review+
OK, slaves are trying to auth to preproduction-master, although they're being rejected.  Back to Amy to set up the two KVM slaves in scl1.
Assignee: dustin → arich
dustin: how much memory and cpu did you allocate to the machines in sjc1?  I can match that in scl1.
I used the default for a slave on vmware, which I think is one CPU.  RAM is 2G, swap 512M, disk 40G in total, with 30G for /builds.
Two vanilla servers for scl1 built off the centos-55 image with 40G of disk and 2G of RAM.
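For the record, the creation on the ganeti side amounts to roughly this per host (disk template, node name, and OS/variant name below are illustrative, not the exact values used):

    gnt-instance add -t plain -o image+centos-55 \
      -B memory=2048 --disk 0:size=40G \
      -n <kvm-node>.build.scl1.mozilla.com \
      linux-hgwriter-slave03.build.scl1.mozilla.com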


linux-hgwriter-slave03.build.mozilla.org is an alias for linux-hgwriter-slave03.build.scl1.mozilla.com.
linux-hgwriter-slave03.build.scl1.mozilla.com has address 10.12.48.12

linux-hgwriter-slave04.build.mozilla.org is an alias for linux-hgwriter-slave04.build.scl1.mozilla.com.
linux-hgwriter-slave04.build.scl1.mozilla.com has address 10.12.48.13
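In zone-file terms that's roughly the following (zone layout shown is illustrative; the dig output above is what's authoritative):

    ; build.scl1.mozilla.com zone
    linux-hgwriter-slave03  IN  A      10.12.48.12
    linux-hgwriter-slave04  IN  A      10.12.48.13

    ; build.mozilla.org zone
    linux-hgwriter-slave03  IN  CNAME  linux-hgwriter-slave03.build.scl1.mozilla.com.
    linux-hgwriter-slave04  IN  CNAME  linux-hgwriter-slave04.build.scl1.mozilla.com.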

        host linux-hgwriter-slave03 { hardware ethernet aa:00:00:e7:fa:fe; fixed-address 10.12.48.12; }
        host linux-hgwriter-slave04 { hardware ethernet aa:00:00:ac:c1:ca; fixed-address 10.12.48.13; }


https://inventory.mozilla.org/systems/show/3596/
https://inventory.mozilla.org/systems/show/3597/

bkero informs me that the ganeti-instance-image install process only supports three partition arrangements:

/boot, swap, /
/boot, /
swap, /root

So these hosts are using the first.

Our options seem to be: make a directory for /builds (the space is there, it's just shared); munge things by hand on each host to grow the disk and create a new filesystem; or stop using an image-based install.

I think the first option is the best if we can get away with it, but I'm not sure we want to have /builds and / on the same partition.
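For the record, the munge-by-hand option would amount to roughly this per host, which is why it's unappealing (device and partition names below are illustrative):

    # on the ganeti master: grow the virtual disk
    gnt-instance grow-disk linux-hgwriter-slave03 0 10G
    # then, inside the guest: partition the new space and mount it as /builds
    fdisk /dev/vda                       # create a partition in the added space
    mkfs.ext3 /dev/vda4
    echo '/dev/vda4 /builds ext3 defaults 1 2' >> /etc/fstab
    mkdir -p /builds && mount /builds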
The other slaves - even VMs - segregate /builds so that when (not if, when) it fills up, it doesn't take everything else down.  If these are the only two slaves we'll ever put on KVM, then I'd be OK with skipping it in this case.  But if we're going to put more slaves on KVM, or these slaves will last forever, then I think we should solve the /builds problem now, rather than later.
If it makes you feel any safer, the ext[234] filesystems have a reserved disk space percentage set aside to be used only by root for when the disks fill up. By default this is set to 5%, so the hosts are still accessible and system-critical services operate even when userland is 'filled'.
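You can check or adjust that reservation with tune2fs (device name below is just an example):

    # show the reserved block count/settings on the root filesystem
    tune2fs -l /dev/sda3 | grep -i reserved
    # set the reserved percentage explicitly to 5% if it ever gets changed
    tune2fs -m 5 /dev/sda3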
I spoke with Dustin and we're going to rely on the 5% root only margin and keep the disk layout as it is now.  The two machines in scl1 are ready for releng to take them and do their slave magic on them and see how it goes with the centos55 image.
Assignee: arich → dustin
more of the same.  Let's see what puppet can do with an unconfigured centos55 box!
Attachment #529835 - Flags: review?(bear)
Attachment #529835 - Flags: review?(bear) → review+
Comment on attachment 529835 [details] [diff] [review]
m640706-puppet-manifests-p2-r1.patch

changeset:   321:8a97715469ff
user:        Dustin J. Mitchell <dustin@mozilla.com>
date:        Thu May 05 14:08:06 2011 -0500
summary:     Bug 640706: add linux-hgwriter-slave{03,04}; r=bear
Minimal patches to puppet to make the KVM slaves able to run puppet successfully.  This doesn't exactly build from a bare centos5.5 system - see https://wiki.mozilla.org/ReferencePlatforms/Linux-CentOS-5.5 for the other stuff that's required.  But without these changes, the slaves won't boot because they'll hang in puppet.

Once this is landed, I'll hand this over to coop who can hopefully give these slaves a test run on a dev master before putting them into production.  I'm curious to know if they produce functional Firefox builds.
Attachment #530782 - Flags: review?(catlee)
Attachment #530782 - Flags: review?(catlee) → review+
landed and deployed.

Coop, over to you to test these out:
 * try some dev builds on them to see if they produce a usable Firefox (for future reference)
 * get them doing the hgwriter stuff
Assignee: dustin → coop
Status: NEW → ASSIGNED
Priority: P3 → P2
(In reply to comment #29)
>  * try some dev builds on them to see if they produce a usable Firefox (for
> future reference)

linux-hgwriter-slave0[1,2] seem to be able to build fine.

linux-hgwriter-slave0[3,4] are missing some key build tools (e.g. autoconf-2.13). Going to try re-syncing from puppet.
(In reply to comment #30)
> linux-hgwriter-slave0[3,4] are missing some key build tools (e.g.
> autoconf-2.13). Going to try re-syncing from puppet.

Not a puppet issue, as Dustin reminded me, since this is a new ref image.

I updated the ref platform instructions (https://wiki.mozilla.org/ReferencePlatforms/Linux-CentOS-5.5) with how to install the required packages for building. These slaves can now build correctly.
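For anyone following along later, the fix was just installing the missing toolchain packages from that page on slave0[3,4], e.g.:

    # package name is illustrative; the wiki page has the authoritative list and versions
    yum install autoconf213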

I think I'm happy with the VMs. Resolving this bug and continuing work back in bug 517304.
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations