Closed Bug 607179 Opened 12 years ago Closed 11 years ago

create new buildbot-master{4,6}.build.m.o on VMs using puppet

Categories

(Release Engineering :: General, defect, P2)

x86
All
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: joduinn, Assigned: catlee)

References

Details

(Whiteboard: [buildduty])

Attachments

(12 files, 7 obsolete files)

15.09 KB, patch
catlee
: review+
bhearsum
: checked-in+
Details | Diff | Splinter Review
5.67 KB, patch
catlee
: review+
rail
: review+
bhearsum
: checked-in+
Details | Diff | Splinter Review
8.06 KB, patch
catlee
: review+
bhearsum
: checked-in+
Details | Diff | Splinter Review
3.48 KB, patch
catlee
: review+
bhearsum
: checked-in+
Details | Diff | Splinter Review
1.46 KB, patch
catlee
: review+
bhearsum
: checked-in+
Details | Diff | Splinter Review
8.45 KB, patch
bhearsum
: review+
catlee
: checked-in+
Details | Diff | Splinter Review
15.29 KB, patch
dustin
: review+
catlee
: checked-in+
Details | Diff | Splinter Review
26.82 KB, patch
dustin
: review+
catlee
: checked-in+
Details | Diff | Splinter Review
5.33 KB, text/plain
Details
8.07 KB, text/plain
Details
1.92 KB, text/plain
Details
1.62 KB, text/plain
Details
Each of these masters should be running 3 buildbot master processes: one for build jobs (8010) and two for test jobs (8011, 8012). These should be identical to buildbot-master1,2 and have the same nagios, munin, etc. settings.
Are these supposed to be based in SCL, MPT, Castro, or a combination?
(In reply to comment #1)
> Are these supposed to be based in SCL, MPT, Castro, or a combination?

For now, lets do these in SCL, alongside existing buildbot-master1,2.

My thinking was to:
- get these 5 pods all up and running, in an identical manner
- allocate slaves across the masters
- do rolling restarts of these production pods without needing a downtime
...then we can have a later project to move a few masters and slaves out from SCL to other colos.
Summary: create new buildbot-master{03,04,05}.build.m.o → create new buildbot-master{4,5,6}.build.m.o
buildbot-master4 is having hardware issues

buildbot-master6 was done last week and buildbot-master5 was just enabled today.

closing as fixed as soon as I can get a confirmation that the firewall holes are all punched
http://hg.mozilla.org/build/tools/rev/64585307d058 to fix production-masters.json, which buildapi requires. We need to make these updates, land them, and update buildapi on cruncher before jobs start getting done on a master.
Followup - http://hg.mozilla.org/build/tools/rev/e3046037e323
FQDN is buildbot-master5.build.mozilla.org, while we have buildbot-master1.build.scl1.mozilla.com.
Whiteboard: [buildduty]
This patch creates all the new master config files we need for buildbot-master4, 5, and 6. It also removes the old, now unused, tests_master3 on bm01.

We still have a bunch of work to do setting up the prerequisites on the machines, but I think we can land this at any time.
Attachment #519439 - Flags: review?(catlee)
Depends on: 639628
Attached patch makefile to set-up a new master (obsolete) — Splinter Review
Catlee suggested doing something that would help us set up a new master...so here it is! This should set up the entire master directory as well as crontabs and a template mysql config file. That file, as well as passwords.py and BuildSlaves.py, needs to be updated manually afterwards. Not sure how to do that better.

There's still other manual things that need to be done, but those are things like filing IT bugs, updating wikis, etc.

Haven't tested this on an actual master yet, but it worked locally.

I'm not super happy with pip downloading things from the internet, or using 'tip' of MozillaPulse, but not sure how to make either of those things better.
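A rough sketch of the kind of Makefile being described (target names and layout here are illustrative assumptions, not the attachment's actual contents):

```make
# Hypothetical sketch only; the real Makefile is the attachment on this bug.
MASTER_NAME ?= buildbot-master5
BASEDIR     ?= /builds/buildbot/$(MASTER_NAME)

all: virtualenv configs crontabs

virtualenv:
	virtualenv $(BASEDIR)
	$(BASEDIR)/bin/pip install -r master-pip.txt

configs:
	hg clone http://hg.mozilla.org/build/buildbot-configs $(BASEDIR)/buildbot-configs

crontabs:
	@echo "remember: passwords.py, BuildSlaves.py and the mysql config"
	@echo "template still need to be filled in by hand"
```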
Attachment #519491 - Flags: feedback?(rail)
Attachment #519491 - Flags: feedback?(catlee)
Using https://wiki.mozilla.org/User:Bhearsum/Master_Setup as a staging area for a new Master Setup doc that uses this Makefile.
Awesome! But won't this need to be run as root to write to /etc/cron.d, which will muck up the permissions on everything else ?
(In reply to comment #9)
> Awesome! But won't this need to be run as root to write to /etc/cron.d, which
> will muck up the permissions on everything else ?

....yes :). How about running everything except "cron" as part of all, and then prompting the user to run the "cron" target as root?
Yeah, or maybe 'su - root /path/to/Makefile cron' and the user just needs to input the root p/w ? Need to set some envvars too ?
Comment on attachment 519491 [details] [diff] [review]
makefile to set-up a new master

I'd add cron entries to a temporary file, then just su -c 'cp tempfile /etc/cron.d/$(MASTER_NAME)'.

BTW, if you add these templates to build/tools, I will use them for automatic master setup in pre-production.
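Sketched out, that approach looks something like this (file names and the cron task are illustrative assumptions):

```shell
# Stage the cron entries in a temp file as the unprivileged user; only the
# final copy into /etc/cron.d needs root, so no other permissions get mucked up.
MASTER_NAME=buildbot-master5
CRON_TMP="${TMPDIR:-/tmp}/cron.${MASTER_NAME}"
cat > "$CRON_TMP" <<'EOF'
# m h dom mon dow user  command
*/5 * * * * cltbld /builds/buildbot/master/bin/some-cron-task.sh
EOF
# The single root step (shown as a comment here):
#   su -c "cp $CRON_TMP /etc/cron.d/$MASTER_NAME"
echo "staged cron entries in $CRON_TMP"
```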
Attachment #519491 - Flags: feedback?(rail) → feedback+
Comment on attachment 519491 [details] [diff] [review]
makefile to set-up a new master

Nice!

I kind of prefer having .pth files in the virtualenv to point to tools and buildbotcustom rather than messing with PYTHONPATH in the Makefile. It makes it easier to do things on the command line.

I think we should refer to an explicit version of pulse here too.  Does putting the url in master-pip.txt work?
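For context, the .pth mechanism is very simple (paths below are illustrative assumptions; this demo writes to a temp directory rather than a real virtualenv):

```shell
# A .pth file dropped into site-packages just lists directories that Python
# appends to sys.path at startup, so PYTHONPATH tweaks in the Makefile go away.
DEMO_SITE_PACKAGES="${TMPDIR:-/tmp}/pth-demo"
mkdir -p "$DEMO_SITE_PACKAGES"
printf '%s\n' /builds/buildbot/tools/lib/python \
              /builds/buildbot/buildbotcustom \
    > "$DEMO_SITE_PACKAGES/master.pth"
# In a real master this file would live in
# <virtualenv>/lib/python2.x/site-packages/master.pth.
cat "$DEMO_SITE_PACKAGES/master.pth"
```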
Attachment #519491 - Flags: feedback?(catlee) → feedback+
(In reply to comment #13)
> Comment on attachment 519491 [details] [diff] [review]
> makefile to set-up a new master
> 
> Nice!
> 
> I kind of prefer having .pth files in the virtualenv to point to tools and
> buildbotcustom rather than messing with PYTHONPATH in the Makefile. It makes it
> easier to do things on the command line.

Hm, I didn't know you could do that. I'll have a look.

> I think we should refer to an explicit version of pulse here too.  Does putting
> the url in master-pip.txt work?

Pulse has no versions, but we could point at a specific changeset, I think. Any idea which we should use?

(In reply to comment #11)
> Yeah, or maybe 'su - root /path/to/Makefile cron' and the user just needs to
> input the root p/w ? Need to set some envvars too ?

Yeah, we'll need to set MASTER_NAME and maybe BASEDIR, but I think that's it...

(In reply to comment #12)
> BTW, if you add these templates to build/tools, I will use them for automatic
> master setup in pre-production.

I was actually hoping to avoid keeping static templates at all, to avoid the maintenance hassle. Could preproduction use the Makefile target instead? If we need to, eg, make the crontab file overridable, I can do that.
> > I think we should refer to an explicit version of pulse here too.  Does putting
> > the url in master-pip.txt work?
> 
> Pulse has no versions, but we could point at a specific changeset, I think. Any
> idea which we should use?

ad95569a089e (current tip) is what we have on the masters right now I believe.
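For reference, pip requirements files accept direct archive URLs, so pinning Pulse to that changeset could look something like this hypothetical master-pip.txt line (the repository host and path are illustrative, not the real location):

```
# pin MozillaPulse to a known-good changeset instead of tip
http://hg.example.org/mozillapulse/archive/ad95569a089e.tar.gz#egg=MozillaPulse
```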
Attachment #519491 - Attachment is obsolete: true
Attachment #519671 - Flags: review?(rail)
Attachment #519671 - Flags: review?(catlee)
Comment on attachment 519439 [details] [diff] [review]
create config files for new masters

Needs further updates.
Attachment #519439 - Attachment is obsolete: true
Attachment #519439 - Flags: review?(catlee)
Attachment #519671 - Attachment is obsolete: true
Attachment #519695 - Flags: review?(rail)
Attachment #519695 - Flags: review?(catlee)
Attachment #519671 - Flags: review?(rail)
Attachment #519671 - Flags: review?(catlee)
random note from a meeting:
- we should feed setup-master.py production-masters.json, and move whatever master info is in it, there.
(In reply to comment #20)
> random note from a meeting:
> - we should feed setup-master.py production-masters.json, and move whatever
> master info is in it, there.

Filed this as bug 642411
Depends on: 642413
Depends on: 642416
Depends on: 642418
Attachment #519694 - Flags: review?(catlee) → review+
Attachment #519695 - Flags: review?(catlee) → review+
Not sure if this should land before the masters are active or not.
Attachment #519906 - Flags: review?(catlee)
Attachment #519695 - Flags: review?(rail) → review+
Attachment #519695 - Flags: checked-in+
Attachment #519694 - Flags: checked-in+
Attached patch make hg overridable (obsolete) — Splinter Review
Not urgent, but this is useful on some masters, as it turns out.
Attachment #519954 - Flags: review?(catlee)
Comment on attachment 519954 [details] [diff] [review]
make hg overridable

Going to need a couple more fixes, actually.
Attachment #519954 - Attachment is obsolete: true
Attachment #519954 - Flags: review?(catlee)
Attachment #519906 - Flags: review?(catlee) → review+
Attachment #519959 - Flags: review?(catlee) → review+
Just brought up the new build master on buildbot-master5, going to move a slave or two to it soon.
Comment on attachment 519959 [details] [diff] [review]
create crontab as cltbld, make hg overridable, fix syntax errors in crontab

Landed this on default
Attachment #519959 - Flags: checked-in+
Attachment #519906 - Flags: checked-in+
Current status:
All masters on buildbot-master5 are now set up, though few slaves are on them yet.
buildbot-master4 and 6 are waiting until we figure out what we're doing in terms of Puppet + new masters, since they're both bare OS installs.
I noticed that the cronjobs weren't firing properly because they were chmoded to 664. I also noticed that MozillaPulse depends on something that "pip freeze" didn't catch; I added it to the requirements file.
Attachment #520187 - Flags: review?(catlee)
Attachment #520187 - Flags: review?(catlee) → review+
Attachment #520187 - Flags: checked-in+
Summary: create new buildbot-master{4,5,6}.build.m.o → create new buildbot-master{4,6}.build.m.o on VMs using puppet
Assignee: nobody → catlee
Priority: P4 → P2
Depends on: 646320
Depends on: 646482
Attached patch First pass at puppet manifests (obsolete) — Splinter Review
masters on buildbot-master4 are up and running (hacked to be staging masters) with these manifests

Using puppet to manage puppet is maybe a little crazy; there are lots of bootstrapping issues there, and I can rip it out if wanted. I figure we need to encode setting up a puppet master somewhere.
Attachment #523430 - Flags: feedback?(dustin)
Attachment #523430 - Flags: feedback?(bhearsum)
Attachment #523430 - Flags: feedback?(bhearsum) → feedback+
Comment on attachment 523430 [details] [diff] [review]
First pass at puppet manifests

+# buildmaster requires that $num_masters be set on the node prior to including this class
+# $num_masters should be a count of how many master instances are on the
+# machine, and it used to set up nagios checks appropriately

I think that there are puppet-y ways around this, although I don't know them now.  Please file a bug for that?

All of your Exec's will execute on every run of puppet.

I agree that using puppet to set up puppet masters is a bit too meta, and certainly out of scope for this project.
Attachment #523430 - Flags: feedback?(dustin) → feedback+
Blocks: 647374
(In reply to comment #31)
> Comment on attachment 523430 [details] [diff] [review]
>   --> https://bugzilla.mozilla.org/attachment.cgi?id=523430
> First pass at puppet manifests
> 
> +# buildmaster requires that $num_masters be set on the node prior to including
> this class
> +# $num_masters should be a count of how many master instances are on the
> +# machine, and it used to set up nagios checks appropriately
> 
> I think that there are puppet-y ways around this, although I don't know them
> now.  Please file a bug for that?

bug 647374


> All of your Exec's will execute on every run of puppet.

Hmmmm...which ones?  Most of the 'exec' calls are guarded with 'creates', which AIUI prevents the execution of the command if the file referenced by 'creates' exists.  The 'Exec' statements are in the 'requires' of other resources, and I think that means that the resource gets refreshed if and when the referenced 'exec' runs.

> I agree that using puppet to set up puppet masters is a bit too meta, and
> certainly out of scope for this project.
Ah, I missed the "creates".

As for 'require', you're thinking of 'notify' (or 'subscribe') - and that requires the use of 'refreshonly'.

http://docs.puppetlabs.com/references/latest/type.html#exec
(In reply to comment #33)
> Ah, I missed the "creates".
> 
> As for 'require', you're thinking of 'notify' (or 'subscribe') - and that
> requires the use of 'refreshonly'.
> 
> http://docs.puppetlabs.com/references/latest/type.html#exec

I think other (non-exec) resources can require an Exec resource just fine. You need 'refreshonly' on exec resources if you want them to run when their subscribe targets are updated.

Is there one Exec or exec in particular we can dissect?
Require works fine, but it *orders* the actions on the resources.  It doesn't control whether the actions occur.

Your execs are all protected by creates or subscribe, so you're all set.  Sorry that I didn't say that explicitly in comment 33.
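To make the distinction concrete, a minimal sketch (resource and path names are assumptions, not taken from the actual manifests):

```puppet
# 'creates' makes an exec idempotent: it is skipped once the named file exists.
exec { "setup-basedir":
    command => "/usr/bin/virtualenv /builds/buildbot/master",
    creates => "/builds/buildbot/master/bin/python",
}

# 'require' only orders this resource after the exec; it does not re-run it.
file { "/builds/buildbot/master/master.cfg":
    require => Exec["setup-basedir"],
}

# A refresh-driven exec needs refreshonly plus subscribe (or notify):
exec { "reconfig-master":
    command     => "/builds/buildbot/master/bin/buildbot reconfig .",
    cwd         => "/builds/buildbot/master",
    refreshonly => true,
    subscribe   => File["/builds/buildbot/master/master.cfg"],
}
```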
Depends on: 642411
Adjust the names, ports, etc. of masters on buildbot-master{04,06} according to the new naming scheme.
Attachment #525595 - Flags: review?(bhearsum)
Attachment #525596 - Flags: review?(dustin)
Attachment #525596 - Flags: review?(bhearsum)
Attachment #525596 - Flags: review?(bhearsum) → review+
Attachment #525595 - Flags: review?(bhearsum) → review+
Comment on attachment 525596 [details] [diff] [review]
Fix up bm04, bm06, and change "roles" to "role"

This needs an update to dbimport.py as well (under lib/python/slavealloc/scripts/).  Looks fine to me otherwise.
Attachment #525596 - Flags: review?(dustin) → review-
Same as before, except fixing up dbimport.py
Attachment #525596 - Attachment is obsolete: true
Attachment #525675 - Flags: review?(dustin)
Comment on attachment 525595 [details] [diff] [review]
Fix up buildbot-configs for new naming

http://hg.mozilla.org/build/buildbot-configs/rev/7b6c08bc2173
Attachment #525595 - Flags: checked-in+
Blocks: 646563
Attachment #525675 - Flags: review?(dustin) → review+
Comment on attachment 525675 [details] [diff] [review]
Fix up bm04, bm06, and change "roles" to "role"

http://hg.mozilla.org/build/tools/rev/166de6b9071b
Attachment #525675 - Flags: checked-in+
Attached patch Puppet manifests (obsolete) — Splinter Review
Notable change from before:

* factored out some of the hardcoded values like names of users, groups, but then had to work around puppet's variable scope

* removed the puppet master module

* declare explicit versions of rpms to install

* manage NTP
Attachment #523430 - Attachment is obsolete: true
Attachment #526029 - Flags: review?(dustin)
Attachment #526029 - Flags: review?(bhearsum)
Comment on attachment 526029 [details] [diff] [review]
Puppet manifests

>diff --git a/modules/buildmaster/files/buildbot-start b/modules/buildmaster/files/buildbot-start

Typical practice for putting initscripts in version control is to give them a .initd extension, so buildbot.initd

>+define buildmaster::buildbot_master($basedir, $master_type, $http_port) {

I think that if you want to use buildmaster variables, you need
  include buildmaster
Then you can remove the same from the node definition.

>+        "/home/$master_user/.${master_name}_update_from_files.conf":
>+            owner => $master_user,
>+            group => $master_group,
>+            mode => 600,
>+            subscribe => Exec["setup-$basedir"],
>+            content => template("buildmaster/update_from_files.erb");
>+    }

I think that subscribe should be require there, right?

>+        "startup_file-$basedir":
>+            command => "/bin/echo $full_master_dir > /etc/default/buildbot.d/$basedir",

This will work, but it'd be a lot easier with File and its contents => ... key.

>+# TODO: you have to manage parents of $master_basedir

This is just /builds, right?  An ensure => directory couldn't hurt.

>+        "update-configs":
>+            require => Exec["clone-configs"],
>+            command => "/usr/bin/hg pull -u",
>+            user => $master_user,
>+            cwd => "$master_basedir/buildbot-configs";

This will run on every puppet invocation.  Is that what you want?

>@@ -0,0 +1,44 @@
>+class master {

This could use some explanatory comments - how is this different from the buildmaster module?

>+    package {
>+        "epel-release":
>+            source => "http://download.fedora.redhat.com/pub/epel/5/i386/epel-release-5-4.noarch.rpm",
>+            provider => "rpm";

Does this need to be before the other packages somehow?  I wonder if we could use puppet's phases (stages?) functionality for that, rather than making every package invocation depend on Package["epel-release"]?

>+    file {  
>+        "/root/.vimrc":
>+            source => "puppet:///modules/master/.vimrc";
>+        "/root/.bashrc":
>+            source => "puppet:///modules/master/.bashrc";

Are these really master-specific, or would they be better in a 'base' package - or maybe 'prefs'?

>+        "/usr/local/bin/check-for-rpm.sh":
>+            source => "puppet:///modules/master/check-for-rpm.sh",
>+            mode => 755;

This is definitely not master-specific..

>+        "/etc/ntp.conf":
>+            source => "puppet:///modules/master/ntp.conf",
>+            mode => 644,
>+            owner => root,
>+            group => root;
>+    }
>+    # TODO: Move into network:: ?
>+    service {
>+        "ntpd": 
>+            subscribe => File["/etc/ntp.conf"],
>+            enable => true,
>+            ensure => running;
>+    }

Agreed - or into an "ntp" module, if you'd prefer.

This looks great - none of the above changes are very big, so I'll say r+ assuming they're made or justified.

I don't know what you want to do with ben's r? - I'm happy to see this landed and get rolling with these masters.  What do you think?
Attachment #526029 - Flags: review?(dustin) → review+
I think I've addressed all your comments. Here are the notable changes according to the interdiff:
* Disable file buckets
* Include "buildmaster" inside the buildmaster::buildbot_master define, and remove it from node definitions
* Rename buildbot-start to buildbot.initd
* /etc/default/buildbot.d/$basedir uses file { content => }
* setup-$basedir runs "hg pull -u" first; exec "update-configs" is now gone
* user "cltbld" depends on group "cltbld"
* manage /builds
* sprinkled "This file under configuration management" comments liberally
* factored out some code from masters module into packages/prefs/ntp modules
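For illustration, the "/etc/default/buildbot.d/$basedir uses file { content => }" change might look roughly like this (variable names taken from the review excerpts; the rest is assumed):

```puppet
# Replaces the earlier "echo $full_master_dir > ..." exec with a declarative
# file resource, which puppet manages idempotently.
file { "/etc/default/buildbot.d/$basedir":
    owner   => "root",
    group   => "root",
    mode    => 644,
    content => "$full_master_dir\n",
}
```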
Attachment #526029 - Attachment is obsolete: true
Attachment #527591 - Flags: review?(dustin)
Attachment #526029 - Flags: review?(bhearsum)
Comment on attachment 527591 [details] [diff] [review]
Comments addressed

I like it - nice work!
Attachment #527591 - Flags: review?(dustin) → review+
The manual steps I did to get ganglia working:

add /etc/yum.repos.d/packages-mozilla.repo with the repo enabled
yum install ganglia-gmond-modules-python
create /etc/ganglia/gmond.conf (make sure to specify the right cluster and multicast address for each datacenter)
restart gmond
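Sketched as a script, the steps above might look like this (the cluster name, multicast address, and config contents are illustrative; this writes to a temp location rather than /etc):

```shell
# Stage an example gmond.conf; on a real host this goes to /etc/ganglia/.
GMOND_CONF="${TMPDIR:-/tmp}/gmond.conf.example"
cat > "$GMOND_CONF" <<'EOF'
cluster {
  name = "RelEngSCL1"        /* must match the server's data_source name */
}
udp_send_channel {
  mcast_join = 239.2.11.201  /* per-datacenter multicast address */
  port = 8649
}
EOF
# Remaining steps on a real master (need root, so shown as comments):
#   yum install ganglia-gmond-modules-python
#   cp "$GMOND_CONF" /etc/ganglia/gmond.conf
#   service gmond restart
echo "wrote $GMOND_CONF"
```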
Attachment #527597 - Attachment mime type: application/octet-stream → text/plain
Attachment #527599 - Attachment mime type: application/octet-stream → text/plain
Attachment #527601 - Attachment mime type: application/octet-stream → text/plain
Attachment #527606 - Attachment mime type: application/octet-stream → text/plain
Note that this got backed out in bug 652540
Blocks: 653555
buildbot-master04,06 have been set up.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Here are the definitions we have for the three build ganglia servers:


node "ganglia1.build.scl1.mozilla.com" {
    ganglia-server {
        ganglia:
            gridname => "RelEng scl1",
            data_source => [
                "\"RelEngSCL1\" scl-production-puppet.build.scl1.mozilla.com slavealloc.build.scl1.mozilla.com buildbot-master1.build.scl1.mozilla.com buildbot-master2.build.scl1.mozilla.com buildbot-master04.build.scl1.mozilla.com buildbot-master5.build.scl1.mozilla.com buildbot-master06.build.scl1.mozilla.com",
                "\"RelEngSJC1\" ganglia2.build.sjc1.mozilla.com:8651",
                "\"RelEngMTV1\" ganglia3.build.mtv1.mozilla.com:8651",
                "\"Market-Post Tower\" im-ganglia01.mozilla.org:8651"];
    }
    ganglia-client {
        ganglia:
            cluster => "RelEngSCL1",
            mode => "mcast",
            addr => "239.2.11.201";
    } 
}

node "ganglia2.build.sjc1.mozilla.com" {
    ganglia-server {
        ganglia:
            gridname => "RelEng sjc1",
            data_source => [
                "\"RelEngSJC1\" cruncher.build.sjc1.mozilla.com production-master.build.sjc1.mozilla.com production-master1.build.sjc1.mozilla.com production-master2.build.sjc1.mozilla.com production-master3.build.sjc1.mozilla.com production-puppet.build.sjc1.mozilla.com talos-master.mozilla.org dm-wwwbuild01.mozilla.org"];
    }
    ganglia-client {
        ganglia:
            cluster => "RelEngSJC1",
            mode => "mcast",
            addr => "239.2.11.202";
    }

}

node "ganglia3.build.mtv1.mozilla.com" {
    ganglia-server {
        ganglia:
            gridname => "RelEng mtv1",
            data_source => [
                "\"RelEngMTV1\" foopy01.build.mtv1.mozilla.com foopy02.build.mtv1.mozilla.com remote-talos-webhost-1.build.mtv1.mozilla.com remote-talos-webhost-2.build.mtv1.mozilla.com remote-talos-webhost-3.build.mtv1.mozilla.com"];
    }
    ganglia-client {
        ganglia:
            cluster => "RelEngMTV1",
            mode => "mcast",
            addr => "239.2.11.203";
    }
}
Product: mozilla.org → Release Engineering