Closed
Bug 640706
Opened 13 years ago
Closed 13 years ago
Please setup 2 new Linux VMs per data center to be used for check-ins from the build farm
Categories: (Infrastructure & Operations :: RelOps: General, task, P2)
Tracking: (Not tracked)
Status: RESOLVED FIXED
People: (Reporter: coop, Assigned: coop)
Attachments (3 files)
529 bytes, patch, bear: review+
518 bytes, patch, bear: review+
1.24 KB, patch, catlee: review+
Bug 517304 has the full details about why we want to do this, but in short, we'd like to start using dedicated machines/keys for jobs that require r/w access to hg/cvs. Release tagging, version bumping, and automated blocklist updating are examples of these types of jobs. The r/w jobs are intermittent, so we don't need many machines. Two VMs per data center should suffice so that masters don't have to cross data center boundaries to use these slaves. These new VMs can be clones of the existing Linux ref image. If we can name them something to distinguish them as the r/w slaves, that would be helpful, even if it's just adding "-rw-" to their hostname.
Comment 1•13 years ago (Assignee)
Ping - is there an ETA for this so I can plan around it?
Comment 2•13 years ago
(In reply to comment #1)
> Ping - is there an ETA for this so I can plan around it?

Ping - any ETA?
Comment 3•13 years ago
I haven't heard the background on this - would it make more sense to get one VM set up first, and get that into puppet so we can blast the other 5 out quickly and repeatably? Will these be slaves, masters, or jumphosts? Or what?
Comment 4•13 years ago
And let's make that first place scl1, yeah?
Comment 5•13 years ago (Assignee)
(In reply to comment #3)
> I haven't heard the background on this - would it make more sense to get one VM
> set up first, and get that into puppet so we can blast the other 5 out quickly
> and repeatably? Will these be slaves, masters, or jumphosts? Or what?

These will be dedicated buildslaves, as outlined in bug 517304. One VM to start is fine, and starting in scl1 is also fine if that makes the most sense. Puppet config will be identical to a current linux builder, aside from the extra set of ssh keys that will be used for r/w repo access.
Comment 6•13 years ago
OK, so we'll be using a KVM VM, since KVM is what we're using for virtualization in scl1. They're slaves, and thus should use the linux builder image (centos5). Hostname remains to be determined. I propose linux-hgwriter-slaveNN, keeping with the usual build zero-padding. But let's wait for a sign-off on that proposal from coop before generating the VMs. Should we set up the inventory, DNS, DHCP, and nagios checks on this bug, or open a new bug for that purpose?
Comment 7•13 years ago
These will be build slaves, and we only have build buildmasters in scl1 and sjc1, so there's no need to put anything new in mtv1. That means we're only looking at linux-hgwriter-slave{01,02,03,04}, although two of those are KVM and two are VMware. Meh. Zandr, Amy, are we missing any other detail to start setting up these VMs? What about the last question in the previous comment?
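The naming scheme proposed in comment 6 (linux-hgwriter-slaveNN with two-digit zero-padding) is mechanical enough to generate. A minimal sketch in Python; `hgwriter_hostnames` is a hypothetical helper, not part of any existing releng tooling:

```python
def hgwriter_hostnames(count: int, start: int = 1) -> list[str]:
    """Hostnames following the proposed linux-hgwriter-slaveNN scheme.

    NN is zero-padded to two digits, like the other build slaves.
    """
    return [f"linux-hgwriter-slave{n:02d}" for n in range(start, start + count)]


if __name__ == "__main__":
    # Two slaves each in sjc1 and scl1 -> slave01 through slave04.
    for name in hgwriter_hostnames(4):
        print(name)
```

Zero-padding keeps the names sorting correctly in inventory and slavealloc listings up to slave99.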
Comment 8•13 years ago
(In reply to comment #7)
> Zandr, Amy, are we missing any other detail to start setting up these VMs?

Ping - any ETA?
Comment 9•13 years ago
As mentioned in the meeting on Monday, scl1 is currently out of capacity for new VMs, and we're ordering hardware to rectify that. Once we have the memory, we'll need to evacuate all of the VMs from one host to the other, then add the RAM. We'll need to do this twice (once for each machine). We'll get inventory, DNS, DHCP, and nagios set up once we have the capacity in scl1. Also, we're waiting for a sign-off from coop regarding the names per comment 6.
Comment 10•13 years ago
Once coop agrees with the name, we can set the VMs in sjc1 up without blocking on the KVM hardware in scl1. Coop?
Comment 11•13 years ago (Assignee)
(In reply to comment #10)
> Once coop agrees with the name, we can set the VMs in sjc1 up without blocking
> on the KVM hardware in scl1. Coop?

Suggested nomenclature is fine.
Comment 12•13 years ago
New VMs in sjc1:
linux-hgwriter-slave01
linux-hgwriter-slave02
Both are from the CentOS-5.0-ref-tools-vm template, in the INTEL01 cluster, and on a datastore on eq01-bm01. I'll have their MAC addresses shortly.
Assignee: server-ops-releng → dustin
Comment 13•13 years ago
linux-hgwriter-slave01: 00:50:56:a5:4d:65
linux-hgwriter-slave02: 00:50:56:a5:63:7c
(BTW, these have 30GB drives, like the other linux32 VMs)
Comment 14•13 years ago
Over to amy for DHCP, DNS, Nagios, Inventory. These should be monitored just like moz2-linux-slaveNN. These hosts should probably be in their own hostgroup, though.
Assignee: dustin → arich
Comment 15•13 years ago
Added to inventory, DNS, and DHCP. Please verify that the entries are correct.
https://inventory.mozilla.org/systems/show/3522/
https://inventory.mozilla.org/systems/show/3523/
Comment 16•13 years ago
Also added to nagios.
Comment 17•13 years ago
Assignee: arich → dustin
Attachment #528152 -
Flags: review?(coop)
Comment 18•13 years ago
Added to slavealloc.
Updated•13 years ago
Attachment #528152 -
Flags: review?(coop) → review?(bear)
Updated•13 years ago
Attachment #528152 -
Flags: review?(bear) → review+
Comment 19•13 years ago
OK, slaves are trying to auth to preproduction-master, although they're being rejected. Back to Amy to set up the two KVM slaves in scl1.
Assignee: dustin → arich
Comment 20•13 years ago
dustin: how much memory and cpu did you allocate to the machines in sjc1? I can match that in scl1.
Comment 21•13 years ago
I used the default for a slave on VMware, which I think is one CPU. RAM is 2G, swap 512M, disk 40G in total, with 30G for /builds.
Comment 22•13 years ago
Two vanilla servers for scl1, built off the centos-55 image with 40G of disk and 2G of RAM.

linux-hgwriter-slave03.build.mozilla.org is an alias for linux-hgwriter-slave03.build.scl1.mozilla.com.
linux-hgwriter-slave03.build.scl1.mozilla.com has address 10.12.48.12
linux-hgwriter-slave04.build.mozilla.org is an alias for linux-hgwriter-slave04.build.scl1.mozilla.com.
linux-hgwriter-slave04.build.scl1.mozilla.com has address 10.12.48.13

host linux-hgwriter-slave03 {
    hardware ethernet aa:00:00:e7:fa:fe;
    fixed-address 10.12.48.12;
}
host linux-hgwriter-slave04 {
    hardware ethernet aa:00:00:ac:c1:ca;
    fixed-address 10.12.48.13;
}

https://inventory.mozilla.org/systems/show/3596/
https://inventory.mozilla.org/systems/show/3597/

bkero informs me that the ganeti-instance-image install process only supports three partition arrangements:
/boot, swap, /
/boot, /
swap, /root

So these hosts are using the first. Our options seem to be to make a directory for /builds (the space is there, it's just shared with /), munge things by hand each time to grow the disk and create a new filesystem, or not use an image build. I think the first option is best if we can get away with it, but I'm not sure we want to have /builds and / on the same partition.
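The dhcpd host entries above follow a fixed per-slave pattern, so if more hgwriter slaves were added they could be generated instead of hand-written. A minimal Python sketch, assuming the ISC dhcpd `host` syntax shown above; the `DhcpHost` type and `render_host_stanza` function are hypothetical, not existing tooling:

```python
import ipaddress
import re
from typing import NamedTuple

# Six colon-separated lowercase hex octets, e.g. aa:00:00:e7:fa:fe.
MAC_RE = re.compile(r"^[0-9a-f]{2}(?::[0-9a-f]{2}){5}$")


class DhcpHost(NamedTuple):
    """One static DHCP reservation (hypothetical helper type)."""
    name: str
    mac: str
    ip: str


def render_host_stanza(host: DhcpHost) -> str:
    """Render an ISC dhcpd `host` declaration with a fixed address."""
    if not MAC_RE.match(host.mac):
        raise ValueError(f"bad MAC address: {host.mac!r}")
    ipaddress.IPv4Address(host.ip)  # raises a ValueError subclass if malformed
    return (
        f"host {host.name} {{\n"
        f"    hardware ethernet {host.mac};\n"
        f"    fixed-address {host.ip};\n"
        f"}}"
    )


# Values copied from the scl1 entries in this comment.
SCL1_HOSTS = [
    DhcpHost("linux-hgwriter-slave03", "aa:00:00:e7:fa:fe", "10.12.48.12"),
    DhcpHost("linux-hgwriter-slave04", "aa:00:00:ac:c1:ca", "10.12.48.13"),
]

if __name__ == "__main__":
    print("\n".join(render_host_stanza(h) for h in SCL1_HOSTS))
```

Validating the MAC and IP before rendering catches copy-paste typos early, before dhcpd refuses to load the config.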
Comment 23•13 years ago
The other slaves - even VMs - segregate /builds so that when (not if, when) it fills up, it doesn't take everything else down. If these are the only two slaves we'll ever put on KVM, then I'd be OK with skipping it in this case. But if we're going to put more slaves on KVM, or these slaves will last forever, then I think we should solve the /builds problem now, rather than later.
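The concern here is whether /builds is its own filesystem, so that filling it cannot fill /. A minimal Python sketch of how one might check that on a slave, assuming the build directory is /builds as on the other slaves; `is_separate_filesystem` is a hypothetical helper, not part of the existing puppet config:

```python
import os


def is_separate_filesystem(path: str = "/builds", root: str = "/") -> bool:
    """Return True if `path` lives on a different filesystem than `root`.

    Paths on the same filesystem share a device ID (st_dev), so a
    different st_dev means `path` is mounted separately and filling
    it up will not consume the free space of `root`.
    """
    if not os.path.isdir(path):
        # A missing directory is certainly not a separate mount.
        return False
    return os.stat(path).st_dev != os.stat(root).st_dev


if __name__ == "__main__":
    where = "its own partition" if is_separate_filesystem() else "shared with /"
    print(f"/builds is on {where}")
```

On the KVM slaves built with the "/boot, swap, /" layout, this would report /builds as shared with /.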
Comment 24•13 years ago
If it makes you feel any safer, the ext[234] filesystems have a reserved disk space percentage set aside for use only by root when the disks fill up. By default, this is set to 5%, so the hosts remain accessible and system-critical services keep operating even when the userland is 'filled'.
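To see this root-only margin on a running slave, one can compare the two free-block counts that statvfs reports. A minimal Python sketch, assuming a Linux host; the function names are hypothetical. Note that / on these slaves is somewhat smaller than the full 40G disk (it shares it with /boot and swap), so the real reserve is a little under the nominal figure:

```python
import os


def root_only_free_bytes(path: str = "/") -> int:
    """Free bytes on the filesystem holding `path` that only root can use.

    statvfs reports f_bfree (all free blocks) and f_bavail (free blocks an
    unprivileged process may use), both in units of f_frsize. On ext2/3/4
    the gap between them is essentially the root-reserved margin.
    """
    st = os.statvfs(path)
    return (st.f_bfree - st.f_bavail) * st.f_frsize


def nominal_reserve_bytes(fs_size_bytes: int, percent: int = 5) -> int:
    """Reserve implied by a reserved-blocks percentage (5% is the ext default)."""
    return fs_size_bytes * percent // 100


if __name__ == "__main__":
    gib = 1024 ** 3
    # A 40G filesystem with the default 5% reserve keeps 2G for root.
    print(nominal_reserve_bytes(40 * gib) / gib)  # 2.0
    print(root_only_free_bytes("/") / gib)
```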
Comment 25•13 years ago
I spoke with Dustin and we're going to rely on the 5% root only margin and keep the disk layout as it is now. The two machines in scl1 are ready for releng to take them and do their slave magic on them and see how it goes with the centos55 image.
Assignee: arich → dustin
Comment 26•13 years ago
more of the same. Let's see what puppet can do with an unconfigured centos55 box!
Attachment #529835 -
Flags: review?(bear)
Updated•13 years ago
Attachment #529835 -
Flags: review?(bear) → review+
Comment 27•13 years ago
Comment on attachment 529835
m640706-puppet-manifests-p2-r1.patch

changeset: 321:8a97715469ff
user: Dustin J. Mitchell <dustin@mozilla.com>
date: Thu May 05 14:08:06 2011 -0500
summary: Bug 640706: add linux-hgwriter-slave{03,04}; r=bear
Comment 28•13 years ago
Minimal patches to puppet to make the KVM slaves able to run puppet successfully. This doesn't exactly build from a bare centos5.5 system - see https://wiki.mozilla.org/ReferencePlatforms/Linux-CentOS-5.5 for the other stuff that's required. But without these changes, the slaves won't boot because they'll hang in puppet. Once this is landed, I'll hand this over to coop, who can hopefully give these slaves a test run in a dev master before putting them into production. I'm curious to know if they produce functional Firefox builds.
Attachment #530782 -
Flags: review?(catlee)
Updated•13 years ago
Attachment #530782 -
Flags: review?(catlee) → review+
Comment 29•13 years ago
Landed and deployed. Coop, over to you to test these out:
* try some dev builds on them to see if they produce a usable Firefox (for future reference)
* get them doing the hgwriter stuff
Assignee: dustin → coop
Updated•13 years ago (Assignee)
Status: NEW → ASSIGNED
Priority: P3 → P2
Comment 30•13 years ago (Assignee)
(In reply to comment #29)
> * try some dev builds on them to see if they produce a usable Firefox (for
> future reference)

linux-hgwriter-slave0[1,2] seem to be able to build fine. linux-hgwriter-slave0[3,4] are missing some key build tools (e.g. autoconf-2.13). Going to try re-syncing from puppet.
Comment 31•13 years ago (Assignee)
(In reply to comment #30)
> linux-hgwriter-slave0[3,4] are missing some key build tools (e.g.
> autoconf-2.13). Going to try re-syncing from puppet.

Not a puppet issue, as Dustin reminded me, since this is a new ref image. I updated the ref platform instructions with how to install the required packages for building:
https://wiki.mozilla.org/ReferencePlatforms/Linux-CentOS-5.5

These slaves can now build correctly. I think I'm happy with the VMs. Resolving this bug and continuing work back in bug 517304.
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Updated•11 years ago
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations