Closed
Bug 607179
Opened 14 years ago
Closed 13 years ago
create new buildbot-master{4,6}.build.m.o on VMs using puppet
Categories
(Release Engineering :: General, defect, P2)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: joduinn, Assigned: catlee)
References
Details
(Whiteboard: [buildduty])
Attachments
(12 files, 7 obsolete files)
15.09 KB, patch | catlee: review+ | bhearsum: checked-in+
5.67 KB, patch | catlee: review+, rail: review+ | bhearsum: checked-in+
8.06 KB, patch | catlee: review+ | bhearsum: checked-in+
3.48 KB, patch | catlee: review+ | bhearsum: checked-in+
1.46 KB, patch | catlee: review+ | bhearsum: checked-in+
8.45 KB, patch | bhearsum: review+ | catlee: checked-in+
15.29 KB, patch | dustin: review+ | catlee: checked-in+
26.82 KB, patch | dustin: review+ | catlee: checked-in+
5.33 KB, text/plain
8.07 KB, text/plain
1.92 KB, text/plain
1.62 KB, text/plain
Each of these masters should run three buildbot master processes: one for build jobs (8010) and two for test jobs (8011, 8012). They should be identical to buildbot-master1 and buildbot-master2, with the same nagios, munin, etc. settings.
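A minimal sketch of the per-host layout described above, written as a Python sanity check. The MasterInstance and check_layout names and the "build"/"tests" labels are hypothetical, and the description does not say which of a master's ports (slave or HTTP status) these numbers are.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MasterInstance:
    role: str  # "build" or "tests" (hypothetical labels)
    port: int  # port number as given in the description


# One build master and two test masters per host, per the description.
LAYOUT = [
    MasterInstance("build", 8010),
    MasterInstance("tests", 8011),
    MasterInstance("tests", 8012),
]


def check_layout(layout):
    """Return a list of problems; an empty list means the layout looks right."""
    problems = []
    ports = [m.port for m in layout]
    if len(ports) != len(set(ports)):
        problems.append("duplicate ports")
    roles = [m.role for m in layout]
    if roles.count("build") != 1:
        problems.append("expected exactly one build master")
    if roles.count("tests") != 2:
        problems.append("expected exactly two test masters")
    return problems


if __name__ == "__main__":
    print(check_layout(LAYOUT) or "layout OK")
```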
Comment 1•14 years ago
Are these supposed to be based in SCL, MPT, Castro, or a combination?
Reporter
Comment 2•14 years ago
(In reply to comment #1)
> Are these supposed to be based in SCL, MPT, Castro, or a combination?

For now, let's do these in SCL, alongside the existing buildbot-master1,2. My thinking was to:
- get these 5 pods all up and running, in an identical manner
- allocate slaves across the masters
- do rolling restarts of these production pods without needing a downtime

...then we can have a later project to move a few masters and slaves out from SCL to other colos.
Updated•13 years ago
Summary: create new buildbot-master{03,04,05}.build.m.o → create new buildbot-master{4,5,6}.build.m.o
Comment 3•13 years ago
buildbot-master4 is having hardware issues. buildbot-master6 was done last week and buildbot-master5 was just enabled today. Closing as fixed as soon as I can get confirmation that the firewall holes are all punched.
Comment 4•13 years ago
http://hg.mozilla.org/build/tools/rev/64585307d058 to fix production-masters.json, which buildapi requires. We need to make these updates, land them, and update buildapi on cruncher before jobs start getting done on a master.
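Since buildapi needs every master listed in production-masters.json before jobs run there, a pre-flight check along these lines could catch a missing entry. This is a sketch only: the file's real schema is not shown in this bug, so the list-of-objects layout with a "hostname" key is an assumption, and missing_masters is a hypothetical helper.

```python
import json


def missing_masters(masters_json, hostnames):
    """Return the hostnames that have no entry in production-masters.json.

    Assumes (hypothetically) that the file is a JSON list of objects, each
    with a "hostname" key holding the master's FQDN.
    """
    entries = json.loads(masters_json)
    known = {entry["hostname"] for entry in entries}
    return sorted(set(hostnames) - known)


if __name__ == "__main__":
    sample = json.dumps([
        {"name": "bm1-build", "hostname": "buildbot-master1.build.scl1.mozilla.com"},
    ])
    print(missing_masters(sample, ["buildbot-master5.build.mozilla.org"]))
```

Because matching is on the exact FQDN, a naming mismatch like the one in the next comment would also show up as "missing".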
Comment 5•13 years ago
Followup: http://hg.mozilla.org/build/tools/rev/e3046037e323. buildbot-master5's FQDN is buildbot-master5.build.mozilla.org, while buildbot-master1's is buildbot-master1.build.scl1.mozilla.com.
Updated•13 years ago
Whiteboard: [buildduty]
Comment 6•13 years ago
This patch creates all the new master config files we need for buildbot-master4, 5, and 6. It also removes the old, now unused, tests_master3 on bm01. We still have a bunch of work to do setting up the prerequisites on the machines, but I think we can land this at any time.
Attachment #519439 -
Flags: review?(catlee)
Comment 7•13 years ago
Catlee suggested doing something that would help us set up a new master...so here it is! This should set up the entire master directory, as well as crontabs and a template mysql config file. That file, as well as passwords.py and BuildSlaves.py, needs to be updated manually afterwards; I'm not sure how to do that better. There are still other manual things that need to be done, but those are things like filing IT bugs, updating wikis, etc. I haven't tested this on an actual master yet, but it worked locally. I'm not super happy with pip downloading things from the internet, or with using 'tip' of MozillaPulse, but I'm not sure how to make either of those things better.
Attachment #519491 -
Flags: feedback?(rail)
Attachment #519491 -
Flags: feedback?(catlee)
Comment 8•13 years ago
Using https://wiki.mozilla.org/User:Bhearsum/Master_Setup as a staging area for a new Master Setup doc that uses this Makefile.
Comment 9•13 years ago
Awesome! But won't this need to be run as root to write to /etc/cron.d, which will muck up the permissions on everything else?
Comment 10•13 years ago
(In reply to comment #9)
> Awesome! But won't this need to be run as root to write to /etc/cron.d, which
> will muck up the permissions on everything else ?

....yes :). How about running everything except "cron" as part of "all", and then prompting the user to run the "cron" target as root?
Comment 11•13 years ago
Yeah, or maybe 'su - root /path/to/Makefile cron', and the user just needs to input the root password? Need to set some env vars too?
Comment 12•13 years ago
Comment on attachment 519491 [details] [diff] [review]
makefile to set-up a new master

I'd add cron entries to a temporary file, then just su -c 'cp tempfile /etc/cron.d/$(MASTER_NAME)'.

BTW, if you add these templates to build/tools, I will use them for automatic master setup in pre-production.
Attachment #519491 -
Flags: feedback?(rail) → feedback+
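A sketch of the approach from comment 12: render the cron entries into a temporary file as the normal user, and escalate only for the final copy into /etc/cron.d. Files in /etc/cron.d take a user field between the schedule and the command. The entry, the "cltbld" user and the "bm5-build" master name below are illustrative assumptions, and the copy command is returned as an argument list (for example, for subprocess.run) rather than executed.

```python
import os
import shlex
import tempfile


def write_cron_tempfile(entries, user):
    """Render /etc/cron.d-style lines into a temporary file; return its path.

    Each entry is a (schedule, command) pair; cron.d files need a user field
    between the schedule and the command, so `user` is inserted there.
    """
    lines = [f"{schedule} {user} {command}\n" for schedule, command in entries]
    fd, path = tempfile.mkstemp(prefix="crontab-")
    with os.fdopen(fd, "w") as fh:
        fh.writelines(lines)
    os.chmod(path, 0o644)  # readable by cron, not group/world-writable
    return path


def install_command(tmp_path, master_name):
    """Build (without running) the privileged copy step as an argv list."""
    copy = f"cp {shlex.quote(tmp_path)} /etc/cron.d/{shlex.quote(master_name)}"
    return ["su", "-c", copy]


if __name__ == "__main__":
    # Hypothetical entry and master name, for illustration only.
    path = write_cron_tempfile([("*/5 * * * *", "/bin/true")], "cltbld")
    print(install_command(path, "bm5-build"))
```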
Assignee
Comment 13•13 years ago
Comment on attachment 519491 [details] [diff] [review]
makefile to set-up a new master

Nice!

I kind of prefer having .pth files in the virtualenv to point to tools and buildbotcustom rather than messing with PYTHONPATH in the Makefile. It makes it easier to do things on the command line.

I think we should refer to an explicit version of pulse here too. Does putting the url in master-pip.txt work?
Attachment #519491 -
Flags: feedback?(catlee) → feedback+
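For reference, a .pth file is a plain text file in the virtualenv's site-packages directory; when site-packages is processed at startup, Python's site module appends each existing directory listed in it to sys.path, so nothing needs to be exported in PYTHONPATH. Each line should be the directory that contains a package (for example, the parent of a buildbotcustom checkout), not the package itself. The sketch below uses throwaway directories; the "checkouts" name is hypothetical, and in a real virtualenv the target directory would be sysconfig.get_paths()["purelib"].

```python
import os
import site
import sys
import tempfile


def write_pth(site_packages, name, paths):
    """Write <name>.pth into site_packages, one absolute directory per line.

    When site-packages is processed (at startup, or via site.addsitedir),
    each listed directory that exists is appended to sys.path.
    """
    pth = os.path.join(site_packages, f"{name}.pth")
    with open(pth, "w") as fh:
        for p in paths:
            fh.write(os.path.abspath(p) + "\n")
    return pth


def demo():
    """Self-check with throwaway directories instead of a real virtualenv."""
    fake_site = tempfile.mkdtemp()
    fake_lib = tempfile.mkdtemp()  # stands in for e.g. a tools/lib/python checkout
    write_pth(fake_site, "checkouts", [fake_lib])
    site.addsitedir(fake_site)  # processes .pth files the way startup does
    return os.path.abspath(fake_lib) in sys.path


if __name__ == "__main__":
    print(demo())
```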
Comment 14•13 years ago
(In reply to comment #13)
> Comment on attachment 519491 [details] [diff] [review]
> makefile to set-up a new master
>
> Nice!
>
> I kind of prefer having .pth files in the virtualenv to point to tools and
> buildbotcustom rather than messing with PYTHONPATH in the Makefile. It makes it
> easier to do things on the command line.

Hm, I didn't know you could do that. I'll have a look.

> I think we should refer to an explicit version of pulse here too. Does putting
> the url in master-pip.txt work?

Pulse has no versions, but we could point at a specific changeset, I think. Any idea which we should use?

(In reply to comment #11)
> Yeah, or maybe 'su - root /path/to/Makefile cron' and the user just needs to
> input the root p/w ? Need to set some envvars too ?

Yeah, we'll need to set MASTER_NAME and maybe BASEDIR, but I think that's it...

(In reply to comment #12)
> BTW, if you add these templates to build/tools, I will use them for automatic
> master setup in pre-production.

I was actually hoping to avoid keeping static templates at all, to avoid the maintenance hassle. Could preproduction use the Makefile target instead? If we need to, e.g., make the crontab file overridable, I can do that.
Assignee
Comment 15•13 years ago
> > I think we should refer to an explicit version of pulse here too. Does putting
> > the url in master-pip.txt work?
>
> Pulse has no versions, but we could point at a specific changeset, I think. Any
> idea which we should use?
ad95569a089e (current tip) is what we have on the masters right now, I believe.
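pip's VCS requirement syntax lets master-pip.txt pin a Mercurial repository to one changeset, in the form hg+URL@REVISION#egg=PROJECT. The sketch below only formats such a line; the repository URL in the usage example is a placeholder, since the pulse repository's location is not given in this bug, and the egg name is assumed to be MozillaPulse.

```python
def pinned_hg_requirement(repo_url, changeset, egg):
    """Format a pip requirement that installs a Mercurial repo at one changeset.

    pip's VCS syntax is hg+<url>@<revision>#egg=<project>.
    """
    if not changeset or changeset == "tip":
        raise ValueError("pin to an explicit changeset, not tip")
    return f"hg+{repo_url}@{changeset}#egg={egg}"


if __name__ == "__main__":
    # Placeholder URL: the real pulse repository location isn't given here.
    print(pinned_hg_requirement("http://hg.example.com/pulse", "ad95569a089e", "MozillaPulse"))
```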
Comment 16•13 years ago
Attachment #519491 -
Attachment is obsolete: true
Attachment #519671 -
Flags: review?(rail)
Attachment #519671 -
Flags: review?(catlee)
Comment 17•13 years ago
Comment on attachment 519439 [details] [diff] [review]
create config files for new masters

Needs further updates.
Attachment #519439 -
Attachment is obsolete: true
Attachment #519439 -
Flags: review?(catlee)
Comment 18•13 years ago
Attachment #519694 -
Flags: review?(catlee)
Comment 19•13 years ago
Attachment #519671 -
Attachment is obsolete: true
Attachment #519695 -
Flags: review?(rail)
Attachment #519695 -
Flags: review?(catlee)
Attachment #519671 -
Flags: review?(rail)
Attachment #519671 -
Flags: review?(catlee)
Comment 20•13 years ago
Random note from a meeting:
- we should feed production-masters.json to setup-master.py, and move whatever master info is currently in setup-master.py into production-masters.json.
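One way setup-master.py could take its per-master settings from production-masters.json is a simple lookup by master name. This is a sketch under an assumed schema: the "name", "hostname", "basedir", "http_port" and "role" keys and the master_config helper are hypothetical, and the real file may be laid out differently.

```python
import json

FIELDS = ("hostname", "basedir", "http_port", "role")  # hypothetical schema


def master_config(masters_json, name):
    """Look up one master's settings in production-masters.json by name.

    Assumes a JSON list of objects carrying "name" plus the FIELDS keys.
    """
    for entry in json.loads(masters_json):
        if entry["name"] == name:
            return {key: entry[key] for key in FIELDS}
    raise KeyError(f"no master named {name!r} in production-masters.json")


if __name__ == "__main__":
    sample = json.dumps([{
        "name": "bm5-build",
        "hostname": "buildbot-master5.build.mozilla.org",
        "basedir": "/builds/buildbot/build1",
        "http_port": 8010,
        "role": "build",
    }])
    print(master_config(sample, "bm5-build"))
```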
Comment 21•13 years ago
(In reply to comment #20)
> random note from a meeting:
> - we should feed setup-master.py production-masters.json, and move whatever
> master info is in it, there.

Filed this as bug 642411
Assignee
Updated•13 years ago
Attachment #519694 -
Flags: review?(catlee) → review+
Assignee
Updated•13 years ago
Attachment #519695 -
Flags: review?(catlee) → review+
Comment 22•13 years ago
Not sure if this should land before the masters are active or not.
Attachment #519906 -
Flags: review?(catlee)
Updated•13 years ago
Attachment #519695 -
Flags: review?(rail) → review+
Updated•13 years ago
Attachment #519695 -
Flags: checked-in+
Updated•13 years ago
Attachment #519694 -
Flags: checked-in+
Comment 23•13 years ago
Not urgent, but this is useful on some masters, as it turns out.
Attachment #519954 -
Flags: review?(catlee)
Comment 24•13 years ago
Comment on attachment 519954 [details] [diff] [review]
make hg overridable

Going to need a couple more fixes, actually.
Attachment #519954 -
Attachment is obsolete: true
Attachment #519954 -
Flags: review?(catlee)
Comment 25•13 years ago
Attachment #519959 -
Flags: review?(catlee)
Assignee
Updated•13 years ago
Attachment #519906 -
Flags: review?(catlee) → review+
Assignee
Updated•13 years ago
Attachment #519959 -
Flags: review?(catlee) → review+
Comment 26•13 years ago
Just brought up the new build master on buildbot-master5; going to move a slave or two to it soon.
Comment 27•13 years ago
Comment on attachment 519959 [details] [diff] [review]
create crontab as cltbld, make hg overridable, fix syntax errors in crontab

Landed this on default
Attachment #519959 -
Flags: checked-in+
Updated•13 years ago
Attachment #519906 -
Flags: checked-in+
Comment 28•13 years ago
Current status: All masters on buildbot-master5 are now set up, though few slaves are on them yet. buildbot-master4 and buildbot-master6 are waiting until we figure out what we're doing in terms of Puppet + new masters, since both are bare OS installs.
Comment 29•13 years ago
I noticed that the cron jobs weren't firing properly because their files were chmod'ed to 664. I also noticed that MozillaPulse depends on something that "pip freeze" didn't catch, so I added it to the requirements file.
Attachment #520187 -
Flags: review?(catlee)
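The comment above reports that 664-mode cron files never fired. A likely explanation, stated here as an assumption rather than something confirmed in this bug, is that cron skips crontab files writable by group or others. A small check for those mode bits could look like this:

```python
import os
import stat
import tempfile


def cron_file_mode_ok(path):
    """True if the file is writable by neither group nor others."""
    mode = os.stat(path).st_mode
    return not mode & (stat.S_IWGRP | stat.S_IWOTH)


def demo():
    """Show that a 664 file fails the check and a 644 file passes."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        os.chmod(path, 0o664)
        before = cron_file_mode_ok(path)
        os.chmod(path, 0o644)
        after = cron_file_mode_ok(path)
    finally:
        os.remove(path)
    return before, after


if __name__ == "__main__":
    print(demo())
```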
Assignee
Updated•13 years ago
Attachment #520187 -
Flags: review?(catlee) → review+
Updated•13 years ago
Attachment #520187 -
Flags: checked-in+
Assignee
Updated•13 years ago
Summary: create new buildbot-master{4,5,6}.build.m.o → create new buildbot-master{4,6}.build.m.o on VMs using puppet
Assignee
Updated•13 years ago
Assignee: nobody → catlee
Priority: P4 → P2
Assignee
Comment 30•13 years ago
The masters on buildbot-master4 are up and running with these manifests (hacked to be staging masters).

Using puppet to manage puppet is maybe a little crazy; there are lots of bootstrapping issues there, and I can rip it out if wanted. I figure we need to encode setting up a puppet master somewhere.
Attachment #523430 -
Flags: feedback?(dustin)
Attachment #523430 -
Flags: feedback?(bhearsum)
Updated•13 years ago
Attachment #523430 -
Flags: feedback?(bhearsum) → feedback+
Comment 31•13 years ago
Comment on attachment 523430 [details] [diff] [review]
First pass at puppet manifests

+# buildmaster requires that $num_masters be set on the node prior to including this class
+# $num_masters should be a count of how many master instances are on the
+# machine, and it used to set up nagios checks appropriately

I think that there are puppet-y ways around this, although I don't know them now. Please file a bug for that?

All of your Exec's will execute on every run of puppet.

I agree that using puppet to set up puppet masters is a bit too meta, and certainly out of scope for this project.
Attachment #523430 -
Flags: feedback?(dustin) → feedback+
Assignee
Comment 32•13 years ago
(In reply to comment #31)
> Comment on attachment 523430 [details] [diff] [review]
> --> https://bugzilla.mozilla.org/attachment.cgi?id=523430
> First pass at puppet manifests
>
> +# buildmaster requires that $num_masters be set on the node prior to including
> this class
> +# $num_masters should be a count of how many master instances are on the
> +# machine, and it used to set up nagios checks appropriately
>
> I think that there are puppet-y ways around this, although I don't know them
> now. Please file a bug for that?

bug 647374

> All of your Exec's will execute on every run of puppet.

Hmmmm...which ones? Most of the 'exec' calls are guarded with 'creates', which AIUI prevents the execution of the command if the file referenced by 'creates' exists.

The 'Exec' statements are in the 'requires' of other resources, and I think that means that the resource gets refreshed if and when the referenced 'exec' runs.

> I agree that using puppet to set up puppet masters is a bit too meta, and
> certainly out of scope for this project.
Comment 33•13 years ago
Ah, I missed the "creates".

As for 'require', you're thinking of 'notify' (or 'subscribe') - and that requires the use of 'refreshonly'.

http://docs.puppetlabs.com/references/latest/type.html#exec
Assignee
Comment 34•13 years ago
(In reply to comment #33)
> Ah, I missed the "creates".
>
> As for 'require', you're thinking of 'notify' (or 'subscribe') - and that
> requires the use of 'refreshonly'.
>
> http://docs.puppetlabs.com/references/latest/type.html#exec

I think other (non-exec) resources can require an Exec resource just fine. You need 'refreshonly' on exec resources if you want them to run when their subscribe targets are updated.

Is there one Exec or exec in particular we can dissect?
Comment 35•13 years ago
Require works fine, but it *orders* the actions on the resources. It doesn't control whether the actions occur. Your execs are all protected by creates or subscribe, so you're all set. Sorry that I didn't say that explicitly in comment 33.
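To make the distinction above concrete, here is a small Python analogy (not Puppet code, and a deliberate simplification of Puppet's real behaviour): "require" and "subscribe" only order resources, while "creates" and "refreshonly" decide whether an exec actually runs. The resource names and the /builds/buildbot-configs path in EXAMPLE are hypothetical, loosely modelled on the manifests discussed here; graphlib needs Python 3.9 or later.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def simulate_run(resources, existing_paths):
    """Return the names of resources that would act in one run (an analogy).

    resources maps a name to a dict with:
      kind:        "exec" or "file"
      require:     names handled first (ordering only)
      subscribe:   names whose change triggers a refresh (also ordering)
      creates:     exec is skipped if this path already exists
      refreshonly: exec runs only when a subscribed resource changed
      in_sync:     file already has the desired state
    """
    graph = {
        name: set(r.get("require", ())) | set(r.get("subscribe", ()))
        for name, r in resources.items()
    }
    changed = []
    for name in TopologicalSorter(graph).static_order():
        r = resources[name]
        if r["kind"] == "file":
            if not r.get("in_sync", False):
                changed.append(name)
        elif r.get("refreshonly"):
            if any(dep in changed for dep in r.get("subscribe", ())):
                changed.append(name)
        elif r.get("creates") not in existing_paths:
            changed.append(name)  # no creates guard, or its target is missing
    return changed


# Hypothetical resources loosely modelled on the manifests in this bug.
EXAMPLE = {
    "clone-configs": {"kind": "exec", "creates": "/builds/buildbot-configs"},
    "update-configs": {"kind": "exec", "require": ["clone-configs"]},
    "config-file": {"kind": "file", "require": ["clone-configs"], "in_sync": True},
    "reconfig": {"kind": "exec", "refreshonly": True, "subscribe": ["config-file"]},
}

if __name__ == "__main__":
    print(simulate_run(EXAMPLE, {"/builds/buildbot-configs"}))
```

With the clone already present, only the unguarded "update-configs" exec acts, which is the every-run behaviour flagged later in the review of the Puppet manifests.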
Assignee
Comment 36•13 years ago
Adjust the names, ports, etc. of masters on buildbot-master{04,06} according to the new naming scheme.
Attachment #525595 -
Flags: review?(bhearsum)
Assignee
Comment 37•13 years ago
Attachment #525596 -
Flags: review?(dustin)
Attachment #525596 -
Flags: review?(bhearsum)
Updated•13 years ago
Attachment #525596 -
Flags: review?(bhearsum) → review+
Updated•13 years ago
Attachment #525595 -
Flags: review?(bhearsum) → review+
Comment 38•13 years ago
Comment on attachment 525596 [details] [diff] [review]
Fix up bm04, bm06, and change "roles" to "role"

This needs an update to dbimport.py as well (under lib/python/slavealloc/scripts/). Looks fine to me otherwise.
Attachment #525596 -
Flags: review?(dustin) → review-
Assignee
Comment 39•13 years ago
Same as before, except fixing up dbimport.py
Attachment #525596 -
Attachment is obsolete: true
Attachment #525675 -
Flags: review?(dustin)
Assignee
Comment 40•13 years ago
Comment on attachment 525595 [details] [diff] [review]
Fix up buildbot-configs for new naming

http://hg.mozilla.org/build/buildbot-configs/rev/7b6c08bc2173
Attachment #525595 -
Flags: checked-in+
Updated•13 years ago
Attachment #525675 -
Flags: review?(dustin) → review+
Assignee
Comment 41•13 years ago
Comment on attachment 525675 [details] [diff] [review]
Fix up bm04, bm06, and change "roles" to "role"

http://hg.mozilla.org/build/tools/rev/166de6b9071b
Attachment #525675 -
Flags: checked-in+
Assignee
Comment 42•13 years ago
Notable changes from before:
* factored out some of the hardcoded values, like names of users and groups, but then had to work around puppet's variable scope
* removed the puppet master module
* declare explicit versions of rpms to install
* manage NTP
Attachment #523430 -
Attachment is obsolete: true
Attachment #526029 -
Flags: review?(dustin)
Attachment #526029 -
Flags: review?(bhearsum)
Comment 43•13 years ago
Comment on attachment 526029 [details] [diff] [review]
Puppet manifests

>diff --git a/modules/buildmaster/files/buildbot-start b/modules/buildmaster/files/buildbot-start

Typical practice for putting initscripts in version control is to give them an .initd extension, so buildbot.initd.

>+define buildmaster::buildbot_master($basedir, $master_type, $http_port) {

I think that if you want to use buildmaster variables, you need

include buildmaster

Then you can remove the same from the node definition.

>+ "/home/$master_user/.${master_name}_update_from_files.conf":
>+ owner => $master_user,
>+ group => $master_group,
>+ mode => 600,
>+ subscribe => Exec["setup-$basedir"],
>+ content => template("buildmaster/update_from_files.erb");
>+ }

I think that subscribe should be require there, right?

>+ "startup_file-$basedir":
>+ command => "/bin/echo $full_master_dir > /etc/default/buildbot.d/$basedir",

This will work, but it'd be a lot easier with File and its content => ... key.

>+# TODO: you have to manage parents of $master_basedir

This is just /builds, right? An ensure => directory couldn't hurt.

>+ "update-configs":
>+ require => Exec["clone-configs"],
>+ command => "/usr/bin/hg pull -u",
>+ user => $master_user,
>+ cwd => "$master_basedir/buildbot-configs";

This will run on every puppet invocation. Is that what you want?

>@@ -0,0 +1,44 @@
>+class master {

This could use some explanatory comments: how is this different from the buildmaster module?

>+ package {
>+ "epel-release":
>+ source => "http://download.fedora.redhat.com/pub/epel/5/i386/epel-release-5-4.noarch.rpm",
>+ provider => "rpm";

Does this need to be before the other packages somehow? I wonder if we could use puppet's phases (stages?) functionality for that, rather than making every package invocation depend on Package["epel-release"]?

>+ file {
>+ "/root/.vimrc":
>+ source => "puppet:///modules/master/.vimrc";
>+ "/root/.bashrc":
>+ source => "puppet:///modules/master/.bashrc";

Are these really master-specific, or would they be better in a 'base' package, or maybe 'prefs'?

>+ "/usr/local/bin/check-for-rpm.sh":
>+ source => "puppet:///modules/master/check-for-rpm.sh",
>+ mode => 755;

This is definitely not master-specific.

>+ "/etc/ntp.conf":
>+ source => "puppet:///modules/master/ntp.conf",
>+ mode => 644,
>+ owner => root,
>+ group => root;
>+ }
>+ # TODO: Move into network:: ?
>+ service {
>+ "ntpd":
>+ subscribe => File["/etc/ntp.conf"],
>+ enable => true,
>+ ensure => running;
>+ }

Agreed, or into an "ntp" module, if you'd prefer.

This looks great; none of the above changes are very big, so I'll say r+ assuming they're made or justified. I don't know what you want to do with Ben's r?. I'm happy to see this landed and get rolling with these masters. What do you think?
Attachment #526029 -
Flags: review?(dustin) → review+
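The review's point about replacing the "echo ... > file" exec with a File resource and its content key comes down to idempotence: a declarative file resource changes nothing when the file already matches. A Python sketch of that behaviour follows; ensure_file_content is a hypothetical helper, and the path and contents in demo() are stand-ins for /etc/default/buildbot.d/$basedir and $full_master_dir.

```python
import os
import tempfile


def ensure_file_content(path, content, mode=0o644):
    """Make path hold exactly `content` with `mode`; return True if anything changed.

    Unlike an exec running `echo ... > path`, a second call with the same
    arguments does nothing and reports no change.
    """
    try:
        with open(path) as fh:
            same_content = fh.read() == content
    except FileNotFoundError:
        same_content = False
    changed = False
    if not same_content:
        with open(path, "w") as fh:
            fh.write(content)
        changed = True
    if (os.stat(path).st_mode & 0o777) != mode:
        os.chmod(path, mode)
        changed = True
    return changed


def demo():
    """Apply the same desired state twice; only the first call changes anything."""
    # Stand-in for /etc/default/buildbot.d/$basedir (hypothetical name and value).
    target = os.path.join(tempfile.mkdtemp(), "bm5-build")
    first = ensure_file_content(target, "/builds/buildbot/bm5-build\n")
    second = ensure_file_content(target, "/builds/buildbot/bm5-build\n")
    return first, second


if __name__ == "__main__":
    print(demo())
```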
Assignee
Comment 44•13 years ago
I think I've addressed all your comments. Here are the notable changes according to the interdiff:
* Disable file buckets
* Include "buildmaster" inside the buildmaster::buildbot_master define, and remove it from node definitions
* Rename buildbot-start to buildbot.initd
* /etc/default/buildbot.d/$basedir uses file { content => }
* setup-$basedir runs "hg pull -u" first; exec "update-configs" is now gone
* user "cltbld" depends on group "cltbld"
* manage /builds
* sprinkled "This file under configuration management" comments liberally
* factored out some code from masters module into packages/prefs/ntp modules
Attachment #526029 -
Attachment is obsolete: true
Attachment #527591 -
Flags: review?(dustin)
Attachment #526029 -
Flags: review?(bhearsum)
Comment 45•13 years ago
Comment 46•13 years ago
Comment 47•13 years ago
Comment 48•13 years ago
Comment on attachment 527591 [details] [diff] [review]
Comments addressed

I like it - nice work!
Attachment #527591 -
Flags: review?(dustin) → review+
Comment 49•13 years ago
Comment 50•13 years ago
The manual steps I did to get ganglia working:
- add /etc/yum.repos.d/packages-mozilla.repo with the repo enabled
- yum install ganglia-gmond-modules-python
- create /etc/ganglia/gmond.conf (make sure to specify the right cluster and multicast address for each datacenter)
- restart gmond
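The per-datacenter part of gmond.conf is small enough to generate. This sketch renders only the cluster name and multicast channels, using the cluster names and multicast groups from the ganglia server definitions in comment 54; it assumes the rest of gmond.conf stays at the package defaults, and 8649 is gmond's default port. The gmond_stanzas helper and the datacenter keys are hypothetical.

```python
# Cluster names and multicast groups per datacenter, from comment 54.
CLUSTERS = {
    "scl1": ("RelEngSCL1", "239.2.11.201"),
    "sjc1": ("RelEngSJC1", "239.2.11.202"),
    "mtv1": ("RelEngMTV1", "239.2.11.203"),
}


def gmond_stanzas(datacenter, port=8649):
    """Render the gmond.conf blocks that differ per datacenter.

    Everything else in gmond.conf is assumed to stay at its defaults.
    """
    name, addr = CLUSTERS[datacenter]
    return (
        "cluster {\n"
        f'  name = "{name}"\n'
        "}\n"
        "udp_send_channel {\n"
        f"  mcast_join = {addr}\n"
        f"  port = {port}\n"
        "  ttl = 1\n"
        "}\n"
        "udp_recv_channel {\n"
        f"  mcast_join = {addr}\n"
        f"  port = {port}\n"
        f"  bind = {addr}\n"
        "}\n"
    )


if __name__ == "__main__":
    print(gmond_stanzas("scl1"))
```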
Assignee
Updated•13 years ago
Attachment #527597 -
Attachment mime type: application/octet-stream → text/plain
Assignee
Updated•13 years ago
Attachment #527599 -
Attachment mime type: application/octet-stream → text/plain
Assignee
Updated•13 years ago
Attachment #527601 -
Attachment mime type: application/octet-stream → text/plain
Assignee
Updated•13 years ago
Attachment #527606 -
Attachment mime type: application/octet-stream → text/plain
Assignee
Comment 51•13 years ago
Comment on attachment 527591 [details] [diff] [review]
Comments addressed

http://hg.mozilla.org/build/puppet-manifests/rev/d9bd97684a00
Attachment #527591 -
Flags: checked-in+
Comment 52•13 years ago
Note that this got backed out in bug 652540
Assignee
Comment 53•13 years ago
buildbot-master04,06 have been set up.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Comment 54•13 years ago
Here are the definitions we have for the three build ganglia servers:

node "ganglia1.build.scl1.mozilla.com" {
    ganglia-server {
        ganglia:
            gridname => "RelEng scl1",
            data_source => [
                "\"RelEngSCL1\" scl-production-puppet.build.scl1.mozilla.com slavealloc.build.scl1.mozilla.com buildbot-master1.build.scl1.mozilla.com buildbot-master2.build.scl1.mozilla.com buildbot-master04.build.scl1.mozilla.com buildbot-master5.build.scl1.mozilla.com buildbot-master06.build.scl1.mozilla.com",
                "\"RelEngSJC1\" ganglia2.build.sjc1.mozilla.com:8651",
                "\"RelEngMTV1\" ganglia3.build.mtv1.mozilla.com:8651",
                "\"Market-Post Tower\" im-ganglia01.mozilla.org:8651"];
    }
    ganglia-client {
        ganglia:
            cluster => "RelEngSCL1",
            mode => "mcast",
            addr => "239.2.11.201";
    }
}

node "ganglia2.build.sjc1.mozilla.com" {
    ganglia-server {
        ganglia:
            gridname => "RelEng sjc1",
            data_source => [
                "\"RelEngSJC1\" cruncher.build.sjc1.mozilla.com production-master.build.sjc1.mozilla.com production-master1.build.sjc1.mozilla.com production-master2.build.sjc1.mozilla.com production-master3.build.sjc1.mozilla.com production-puppet.build.sjc1.mozilla.com talos-master.mozilla.org dm-wwwbuild01.mozilla.org"];
    }
    ganglia-client {
        ganglia:
            cluster => "RelEngSJC1",
            mode => "mcast",
            addr => "239.2.11.202";
    }
}

node "ganglia3.build.mtv1.mozilla.com" {
    ganglia-server {
        ganglia:
            gridname => "RelEng mtv1",
            data_source => [
                "\"RelEngMTV1\" foopy01.build.mtv1.mozilla.com foopy02.build.mtv1.mozilla.com remote-talos-webhost-1.build.mtv1.mozilla.com remote-talos-webhost-2.build.mtv1.mozilla.com remote-talos-webhost-3.build.mtv1.mozilla.com"];
    }
    ganglia-client {
        ganglia:
            cluster => "RelEngMTV1",
            mode => "mcast",
            addr => "239.2.11.203";
    }
}
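Each data_source entry above follows gmetad's format: a quoted source name, an optional polling interval, then hosts with an optional ":port". Hosts without a port (such as the SCL masters) are polled on gmond's default 8649, while the other grids are reached through their gmetad on 8651. A sketch of a parser that makes that explicit, assuming each array element becomes one data_source line in gmetad.conf once Puppet has unescaped the \" quotes (the ganglia module's template is not shown here):

```python
import shlex

DEFAULT_GMOND_PORT = 8649


def parse_data_source(spec):
    """Split a gmetad data_source value into (name, [(host, port), ...]).

    Example input: '"RelEngSJC1" ganglia2.build.sjc1.mozilla.com:8651'.
    Hosts without an explicit port fall back to gmond's default, 8649.
    An optional polling interval (a bare integer after the name) is skipped.
    """
    tokens = shlex.split(spec)  # handles quoted names such as "Market-Post Tower"
    name, rest = tokens[0], tokens[1:]
    if rest and rest[0].isdigit():
        rest = rest[1:]
    hosts = []
    for token in rest:
        host, _, port = token.partition(":")
        hosts.append((host, int(port) if port else DEFAULT_GMOND_PORT))
    return name, hosts


if __name__ == "__main__":
    print(parse_data_source('"Market-Post Tower" im-ganglia01.mozilla.org:8651'))
```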
Updated•11 years ago
Product: mozilla.org → Release Engineering