Closed Bug 617321 Opened 14 years ago Closed 14 years ago

add try buildbot master instances to buildbot-master1,2, and MV

Categories

(Release Engineering :: General, defect, P3)

x86_64
Linux
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bhearsum, Assigned: lsblakk)

Details

(Whiteboard: [buildmasters])

Attachments

(2 files, 1 obsolete file)

Right now, with only one try build master, it's impossible to do a rolling upgrade without affecting wait times. Given how long some compile cycles take, we could go up to 3 hours without starting new jobs. I suggest that we add one more in MPT and one in Santa Clara. The MPT one gives us Mac redundancy; the Santa Clara one gives us access to more fast Linux and Windows machines (which don't appear to be fully utilized in the main build pool) and location redundancy on those platforms.
Assignee: nobody → lsblakk
Priority: -- → P3
Summary: need at least one more try build master → add try buildbot master instance to buildbot-master1,2 and one to MPT build-master
Morphing summary after IRC discussion with bhearsum and lsblakk.

Each buildbot-master1,2 machine is currently running 3 buildbot master instances:
1 builds
2 tests

We want to add another buildbot master instance to each machine, as follows:
1 builds
2 tests
1 try-builds

This makes all master machines identical, treats try masters just like any other part of our production infrastructure, and, once we solve bug 607179, will give us more granular rolling upgrades for both production and try. Related, but not blocking: I'll work with zandr+mrz to get the IX machines in 650castro, and the Mac builders in MPT, moved to SCL.
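A minimal sketch of why identical machines help rolling upgrades, assuming the per-machine layout above (1 builds, 2 tests, 1 try-builds on each of buildbot-master1 and buildbot-master2). The data structure and function names are hypothetical and only model the counts; they are not part of any Buildbot or RelEng tool.

```python
from collections import Counter

# Hypothetical model of the proposed per-machine layout; the kinds mirror the
# comment above ("builds", "tests", "try"), but the structure is illustrative.
PROPOSED_LAYOUT = {"builds": 1, "tests": 2, "try": 1}
MACHINES = ["buildbot-master1", "buildbot-master2"]


def instances(machines, layout):
    """Expand a per-machine layout into (machine, kind, index) tuples."""
    return [
        (machine, kind, i)
        for machine in machines
        for kind, count in layout.items()
        for i in range(count)
    ]


def capacity_during_upgrade(all_instances, down_machine):
    """Count instances of each kind still running while one machine is upgraded."""
    return Counter(kind for machine, kind, _ in all_instances if machine != down_machine)


if __name__ == "__main__":
    everything = instances(MACHINES, PROPOSED_LAYOUT)
    print(len(everything))  # 8 instances across both machines
    print(capacity_during_upgrade(everything, "buildbot-master1"))
```

With one try master per machine, taking either machine down for an upgrade still leaves a try master accepting jobs, which is the wait-time problem the original single try master could not avoid.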
Summary: add try buildbot master instance to buildbot-master1,2 and one to MPT build-master → add try buildbot master instance to buildbot-master1,2
Note to myself so I don't forget over the weekend: on sc01 I have added a try_master2, and on sc02 a try_master3. Both are virtualenvs with production-0.8 buildbot cloned and setup.py build run, but not setup.py install yet, because pycrypto.org is not responding. Need to try that again on Monday morning. I have also cloned buildbotcustom, buildbot-configs, and tools, and copied in buildbot-wrangler.py and the Makefile from builder_master, edited for the correct paths.

Still need to make sure those are on the appropriate production branches, set up the master instance, update production-masters.json for managing with fabric, and then test adding try slaves to the masters. Will also need nagios updated, as well as any cleanup scripts.
What are sc01 and sc02? Please don't forget to update https://intranet.mozilla.org/RelEngWiki/index.php/Masters and catlee's masters.json.
Comment on attachment 502836: new config files for try masters, and updated setup-master.py. Looks OK to me.
Attachment #502836 - Flags: review?(bhearsum) → review+
Comment on attachment 502836: new config files for try masters, and updated setup-master.py. Landed on default as http://hg.mozilla.org/build/buildbot-configs/rev/a389a93d5019; will be merged to production tomorrow.
Attachment #502836 - Flags: checked-in+
I set the nagios checks back to -C 3:3 on both master boxes, since they were squawking and the try masters were not running.
Masters are running - have installed mozillapulse, MySQL-python, updated nrpe.cfg to 4:4 and restarted the service. Next step: add some builders.
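The nrpe.cfg change from 3:3 to 4:4 reads as "alert unless exactly N master processes are running". A minimal sketch of that threshold logic, assuming the check uses the standard Nagios plugin range syntax (start:end, with "~" for negative infinity, an empty end for positive infinity, and "@" to alert inside the range); the function names are hypothetical and the actual nrpe.cfg line is not shown in this bug.

```python
import math
from typing import Tuple


def parse_range(spec: str) -> Tuple[float, float, bool]:
    """Parse a Nagios-style range such as "4:4", "10", "10:", "~:10" or "@5:8".

    Returns (start, end, inside): alert when the value is outside [start, end],
    or inside it when the range is prefixed with "@".
    """
    inside = spec.startswith("@")
    if inside:
        spec = spec[1:]
    if ":" in spec:
        start_s, end_s = spec.split(":", 1)
        start = -math.inf if start_s == "~" else float(start_s or 0)
        end = math.inf if end_s == "" else float(end_s)
    else:
        # A bare number N means the range 0:N.
        start, end = 0.0, float(spec)
    return start, end, inside


def should_alert(value: float, spec: str) -> bool:
    """True if a check returning `value` should alert for the given range."""
    start, end, inside = parse_range(spec)
    in_range = start <= value <= end
    return in_range if inside else not in_range
```

For example, should_alert(3, "4:4") is True (one of the four masters on a box is down), should_alert(4, "4:4") is False, and should_alert(3, "3:3") is False, which is why the checks were set back to 3:3 while the try masters were not running.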
Dustin mentions that these masters need to be added to statusdb - will check on this with Catlee in the morning. Also - which slaves should be pointed to these masters?
Check out /etc/cron.d/*master* for exceptions, master cleanup, and statusdb dumping.
Summary: add try buildbot master instance to buildbot-master1,2 → add try buildbot master instances to buildbot-master1,2, and MV
I'm going to add a master instance to the MTV location as well so that the try slaves in MV can connect to it, and also to re-purpose test-master02 for actual use.
Depends on: 627803
that'll teach me to not run test-masters myself first, missed the list of names in setup-masters
Attachment #505901 - Attachment is obsolete: true
Attachment #505902 - Flags: review?(bhearsum)
Attachment #505901 - Flags: review?(bhearsum)
Comment on attachment 505902: adds config for try_master1 on buildbot-master3, removes tm02 config from mozilla-tests. We're not zero-padding the new Buildbot masters, so you'll need to adjust the URL in the config with that in mind. Looks fine otherwise; r=me with that changed.
Attachment #505902 - Flags: review?(bhearsum) → review+
Comment on attachment 505902: adds config for try_master1 on buildbot-master3, removes tm02 config from mozilla-tests. Thanks for catching my mistake; the master itself doesn't have a zero-padded hostname, so I was aware of the new naming. Committed to the default branch: http://hg.mozilla.org/build/buildbot-configs/rev/279deb46c95f
Attachment #505902 - Flags: checked-in+
Flags: needs-reconfig?
try_master1 is now up and running on test-master02.build.mozilla.org (waiting to become buildbot-master3.build.mozilla.org). I've updated cron.d, nagios, the Masters list, and production-masters.json (http://people.mozilla.org/~lsblakk/production-masters.json), and have moved some MV Mac slaves over to this master: try-mac-slave{20-26,29}, as well as linux-ix-slave08.
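production-masters.json is what lets fabric act on groups of masters. A minimal sketch of selecting the try masters from such a file, assuming a JSON list of objects; the bug does not show the real schema, so the field names ("name", "hostname", "role", "enabled") and the example_* entries are hypothetical. Only try_master1 and its host come from the comment above.

```python
import json
from typing import Any, Dict, List

# Hypothetical excerpt: field names and example_* entries are illustrative only.
SAMPLE_MASTERS_JSON = """
[
  {"name": "try_master1", "hostname": "test-master02.build.mozilla.org",
   "role": "try", "enabled": true},
  {"name": "example_tests_master", "hostname": "example-host-a.build.mozilla.org",
   "role": "tests", "enabled": true},
  {"name": "example_try_master", "hostname": "example-host-b.build.mozilla.org",
   "role": "try", "enabled": false}
]
"""


def masters_for_role(masters: List[Dict[str, Any]], role: str) -> List[Dict[str, Any]]:
    """Return the enabled masters with the given role (e.g. to hand to a fabric task)."""
    return [m for m in masters if m.get("role") == role and m.get("enabled", True)]


if __name__ == "__main__":
    masters = json.loads(SAMPLE_MASTERS_JSON)
    for master in masters_for_role(masters, "try"):
        print(master["name"], master["hostname"])
```

Keeping a disabled flag in the list, rather than deleting entries, is one way to take a master out of fabric runs during maintenance without losing its record; whether the real file does this is not stated here.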
I took try-mac-slave29 offline for now since it kept grabbing leak builds and failing on setting basedir.
err.html: <type 'exceptions.AttributeError'>: LogFileScanner instance has no attribute '_remainingData'
That's an issue caused by Twisted 10.2. You'll want to install Twisted 10.1 manually into the Buildbot virtualenv to fix it.
Twisted 10.1 installed into the virtualenv on all three.
RE: moving MV slaves to the MV try master:
linux-ix-slave06 is having difficulty; the work on it is tracked in bug 624210, where I made a note to send it over to the new master once it's fixed.
linux-ix-08 is in fact an SCL machine, so it has been pointed to try_master2.
Still need to move to try_master1:
linux-ix-slave{07,09,10,11}
mv-moz2-linux-ix-slave{22,23}
try-mac-slave27
try-mac-slave28 (down, tracked in bug 620948)
(In reply to comment #21) Edited the buildbot.tac for linux-ix-slave{06,07,09,10,11}, mv-moz2-linux-ix-slave23, try-mac-slave27 so that on their next reboot they will come up on the new MV master.
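A Buildbot 0.8 slave reads its master's address from the buildmaster_host and port assignments in buildbot.tac when it starts, which is why the edited slaves only switch masters on their next reboot. A minimal sketch of such an edit, assuming a standard slave buildbot.tac; the old host, port numbers, and passwd are placeholders, and repoint_tac is a hypothetical helper, not the procedure actually used in this bug.

```python
import re

# Hypothetical slave buildbot.tac excerpt; old host, ports and passwd are placeholders.
OLD_TAC = """\
from twisted.application import service
from buildslave.bot import BuildSlave

basedir = r'/builds/slave'
buildmaster_host = 'old-master.example.com'
port = 9010
slavename = 'linux-ix-slave07'
passwd = 'placeholder'
"""


def repoint_tac(tac_text: str, new_host: str, new_port: int) -> str:
    """Point a slave's buildbot.tac at a different master by rewriting two assignments."""
    text, n_host = re.subn(
        r"^buildmaster_host\s*=.*$",
        lambda _m: "buildmaster_host = %r" % new_host,
        tac_text,
        flags=re.MULTILINE,
    )
    text, n_port = re.subn(
        r"^port\s*=.*$",
        lambda _m: "port = %d" % new_port,
        text,
        flags=re.MULTILINE,
    )
    # Refuse to guess if the file does not contain exactly one of each line.
    if n_host != 1 or n_port != 1:
        raise ValueError("expected exactly one buildmaster_host and one port line")
    return text
```

For example, repoint_tac(OLD_TAC, "test-master02.build.mozilla.org", 9999) leaves basedir, slavename and passwd untouched; 9999 stands in for whatever slave port the try master actually listens on.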
mv-moz2-linux-ix-slave22 is also down and tracked in bug 620948
Blocks: 628722
Masters are completed, so this bug is done - bug 628722 has been filed to track adding builders across the masters.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Depends on: 629546
Flags: needs-reconfig?
Depends on: 641782
Product: mozilla.org → Release Engineering