Closed Bug 640706 Opened 13 years ago Closed 13 years ago

Please set up 2 new Linux VMs per data center to be used for check-ins from the build farm

Categories

(Infrastructure & Operations :: RelOps: General, task, P2)

x86
Linux

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: coop, Assigned: coop)

References

Details

Attachments

(3 files)

Bug 517304 has the full details about why we want to do this, but in short, we'd like to start using dedicated machines/keys for jobs that require r/w access to hg/cvs. Release tagging, version bumping, and automated blocklist updating are examples of these types of jobs.

The r/w jobs are intermittent, so we don't need many machines. Two VMs per data center should suffice so that masters don't have to cross data center boundaries to use these slaves.

These new VMs can be clones of the existing Linux ref image. If we can name them something to distinguish them as the r/w slaves, that would be helpful, even if it's just adding "-rw-" to their hostname.
Ping - is there an ETA for this so I can plan around it?
(In reply to comment #1)
> Ping - is there an ETA for this so I can plan around it?

Ping - any ETA?
I haven't heard the background on this - would it make more sense to get one VM set up first, and get that into puppet so we can blast the other 5 out quickly and repeatably?  Will these be slaves, masters, or jumphosts?  Or what?
And let's make that first place scl1, yeah?
(In reply to comment #3)
> I haven't heard the background on this - would it make more sense to get one VM
> set up first, and get that into puppet so we can blast the other 5 out quickly
> and repeatably?  Will these be slaves, masters, or jumphosts?  Or what?

These will be dedicated buildslaves, as outlined in bug 517304.

One VM to start is fine, in scl1 is also fine if that makes the most sense.

Puppet config will be identical to a current linux builder, aside from the extra set of ssh keys that will be used for r/w repo access.
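For reference, the extra key setup on a slave boils down to something like this (key filename, user, and paths here are illustrative, not the final puppet-managed values):

    # generate a dedicated keypair for r/w repo access (the key itself would be puppet-managed)
    ssh-keygen -t rsa -b 2048 -f /home/cltbld/.ssh/hgwriter_rsa -N '' -C 'hgwriter slave'

    # point pushes to hg.mozilla.org at the dedicated key
    printf 'Host hg.mozilla.org\n    IdentityFile ~/.ssh/hgwriter_rsa\n' >> /home/cltbld/.ssh/config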
OK, so we'll be using KVM, since we're using virtualization in scl1.  They're slaves, and thus should use the linux builder image (centos5).

Hostname remains to be determined.  I propose

 linux-hgwriter-slaveNN

keeping with the usual build zero-padding.  But let's wait for a sign-off on that proposal from coop before generating the VMs.

Should we set up the inventory, DNS, DHCP, and nagios checks on this bug, or open a new bug for that purpose?
These will be build slaves, and we only have build buildmasters in scl1 and sjc1, so there's no need to put anything new in mtv1.  That means we're only looking at linux-hgwriter-slave{01,02,03,04}, although two of those are KVM and two are VMware.  Meh.

Zandr, Amy, are we missing any other detail to start setting up these VMs?  What about the last question in the previous comment?
(In reply to comment #7)
> Zandr, Amy, are we missing any other detail to start setting up these VMs? 

ping - any ETA?
As mentioned in the meeting on Monday, scl1 is currently out of capacity for new VMs, and we're ordering hardware to rectify that.  Once we have the memory, we'll need to evac all of the vms from one host to the other, then add the RAM.  We'll need to do this twice (once for each machine).

We'll get inventory, DNS, DHCP, and nagios set up once we have the capacity in scl1.  

Also, we're waiting for a sign off from coop re the names per comment 6.
Once coop agrees with the name, we can set the VMs in sjc1 up without blocking on the KVM hardware in scl1.  Coop?
(In reply to comment #10)
> Once coop agrees with the name, we can set the VMs in sjc1 up without blocking
> on the KVM hardware in scl1.  Coop?

Suggested nomenclature is fine.
New VMs in sjc1:

 linux-hgwriter-slave01
 linux-hgwriter-slave02

both are from the CentOS-5.0-ref-tools-vm template, in the INTEL01 cluster, and on datastore on eq01-bm01.

I'll have their MAC addresses shortly.
Assignee: server-ops-releng → dustin
 linux-hgwriter-slave01: 00:50:56:a5:4d:65
 linux-hgwriter-slave02: 00:50:56:a5:63:7c

(BTW, these have 30GB drives, like the other linux32 vms)
Over to amy for DHCP, DNS, Nagios, Inventory.

These should be monitored just like moz2-linux-slaveNN.  These hosts should probably be in their own hostgroup, though.
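Something along these lines in the nagios object config would do for the hostgroup (group name and member list are just a suggestion; only 01/02 exist so far, the scl1 pair can be appended later):

    define hostgroup {
        hostgroup_name  linux-hgwriter-slaves
        alias           Linux hgwriter build slaves
        members         linux-hgwriter-slave01,linux-hgwriter-slave02
    }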
Assignee: dustin → arich
Added to inventory, dns, and dhcp.

Please verify that the entries are correct.

https://inventory.mozilla.org/systems/show/3522/
https://inventory.mozilla.org/systems/show/3523/
Also added to nagios.
Assignee: arich → dustin
Attachment #528152 - Flags: review?(coop)
Added to slavealloc.
Attachment #528152 - Flags: review?(coop) → review?(bear)
Attachment #528152 - Flags: review?(bear) → review+
OK, slaves are trying to auth to preproduction-master, although they're being rejected.  Back to Amy to set up the two KVM slaves in scl1.
Assignee: dustin → arich
dustin: how much memory and cpu did you allocate to the machines in sjc1?  I can match that in scl1.
I used the default for a slave on vmware, which I think is one CPU.  RAM is 2G, swap 512M, disk 40G in total, with 30G for /builds.
Two vanilla servers for scl1 built off the centos-55 image with 40G of disk and 2G of RAM.
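For the record, the creation on the ganeti side amounts to roughly this per host (disk template, node name, and OS/variant name below are illustrative, not the exact values used):

    gnt-instance add -t plain -o image+centos-55 \
      -B memory=2048 --disk 0:size=40G \
      -n <kvm-node>.build.scl1.mozilla.com \
      linux-hgwriter-slave03.build.scl1.mozilla.com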


linux-hgwriter-slave03.build.mozilla.org is an alias for linux-hgwriter-slave03.build.scl1.mozilla.com.
linux-hgwriter-slave03.build.scl1.mozilla.com has address 10.12.48.12

linux-hgwriter-slave04.build.mozilla.org is an alias for linux-hgwriter-slave04.build.scl1.mozilla.com.
linux-hgwriter-slave04.build.scl1.mozilla.com has address 10.12.48.13
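In zone-file terms that's roughly the following (zone layout shown is illustrative; the dig output above is what's authoritative):

    ; build.scl1.mozilla.com zone
    linux-hgwriter-slave03  IN  A      10.12.48.12
    linux-hgwriter-slave04  IN  A      10.12.48.13

    ; build.mozilla.org zone
    linux-hgwriter-slave03  IN  CNAME  linux-hgwriter-slave03.build.scl1.mozilla.com.
    linux-hgwriter-slave04  IN  CNAME  linux-hgwriter-slave04.build.scl1.mozilla.com.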

        host linux-hgwriter-slave03 { hardware ethernet aa:00:00:e7:fa:fe; fixed-address 10.12.48.12; }
        host linux-hgwriter-slave04 { hardware ethernet aa:00:00:ac:c1:ca; fixed-address 10.12.48.13; }


https://inventory.mozilla.org/systems/show/3596/
https://inventory.mozilla.org/systems/show/3597/

bkero informs me that the ganeti-instance-image install process only supports three partition arrangements:

/boot, swap, /
/boot, /
swap, /root

So these hosts are using the first.

Our options seem to be: make a directory for /builds (the space is there, it's just shared); munge things by hand on each host to grow the disk and create a new filesystem; or stop using an image-based install.

I think the first option is the best if we can get away with it, but I'm not sure we want to have /builds and / on the same partition.
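For the record, the munge-by-hand option would amount to roughly this per host, which is why it's unappealing (device and partition names below are illustrative):

    # on the ganeti master: grow the virtual disk
    gnt-instance grow-disk linux-hgwriter-slave03 0 10G
    # then, inside the guest: partition the new space and mount it as /builds
    fdisk /dev/vda                       # create a partition in the added space
    mkfs.ext3 /dev/vda4
    echo '/dev/vda4 /builds ext3 defaults 1 2' >> /etc/fstab
    mkdir -p /builds && mount /builds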
The other slaves - even VMs - segregate /builds so that when (not if, when) it fills up, it doesn't take everything else down.  If these are the only two slaves we'll ever put on KVM, then I'd be OK with skipping it in this case.  But if we're going to put more slaves on KVM, or these slaves will last forever, then I think we should solve the /builds problem now, rather than later.
If it makes you feel any safer, the ext[234] filesystems have a reserved disk space percentage set aside to be used only by root for when the disks fill up. By default this is set to 5%, so the hosts are still accessible and system-critical services operate even when userland is 'filled'.
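You can check or adjust that reservation with tune2fs (device name below is just an example):

    # show the reserved block count/settings on the root filesystem
    tune2fs -l /dev/sda3 | grep -i reserved
    # set the reserved percentage explicitly to 5% if it ever gets changed
    tune2fs -m 5 /dev/sda3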
I spoke with Dustin and we're going to rely on the 5% root only margin and keep the disk layout as it is now.  The two machines in scl1 are ready for releng to take them and do their slave magic on them and see how it goes with the centos55 image.
Assignee: arich → dustin
more of the same.  Let's see what puppet can do with an unconfigured centos55 box!
Attachment #529835 - Flags: review?(bear)
Attachment #529835 - Flags: review?(bear) → review+
Comment on attachment 529835 [details] [diff] [review]
m640706-puppet-manifests-p2-r1.patch

changeset:   321:8a97715469ff
user:        Dustin J. Mitchell <dustin@mozilla.com>
date:        Thu May 05 14:08:06 2011 -0500
summary:     Bug 640706: add linux-hgwriter-slave{03,04}; r=bear
Minimal patches to puppet to make the KVM slaves able to run puppet successfully.  This doesn't exactly build from a bare centos5.5 system - see https://wiki.mozilla.org/ReferencePlatforms/Linux-CentOS-5.5 for the other stuff that's required.  But without these changes, the slaves won't boot because they'll hang in puppet.

Once this is landed, I'll hand this over to coop who can hopefully give these slaves a test run on a dev master before putting them into production.  I'm curious to know if they produce functional Firefox builds.
Attachment #530782 - Flags: review?(catlee)
Attachment #530782 - Flags: review?(catlee) → review+
landed and deployed.

Coop, over to you to test these out:
 * try some dev builds on them to see if they produce a usable Firefox (for future reference)
 * get them doing the hgwriter stuff
Assignee: dustin → coop
Status: NEW → ASSIGNED
Priority: P3 → P2
(In reply to comment #29)
>  * try some dev builds on them to see if they produce a usable Firefox (for
> future reference)

linux-hgwriter-slave0[1,2] seem to be able to build fine.

linux-hgwriter-slave0[3,4] are missing some key build tools (e.g. autoconf-2.13). Going to try re-syncing from puppet.
(In reply to comment #30)
> linux-hgwriter-slave0[3,4] are missing some key build tools (e.g.
> autoconf-2.13). Going to try re-syncing from puppet.

Not a puppet issue, as Dustin reminded me, since this is a new ref image.

I updated the ref platform instructions (https://wiki.mozilla.org/ReferencePlatforms/Linux-CentOS-5.5) with how to install the required packages for building. These slaves can now build correctly.
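For anyone following along later, the fix was just installing the missing toolchain packages from that page on slave0[3,4], e.g.:

    # package name is illustrative; the wiki page has the authoritative list and versions
    yum install autoconf213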

I think I'm happy with the VMs. Resolving this bug and continuing work back in bug 517304.
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations