Closed
Bug 617321
Opened 14 years ago
Closed 14 years ago
add try buildbot master instances to buildbot-master1,2, and MV
Categories
(Release Engineering :: General, defect, P3)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: bhearsum, Assigned: lsblakk)
References
Details
(Whiteboard: [buildmasters])
Attachments
(2 files, 1 obsolete file)
4.40 KB, patch | bhearsum: review+ | lsblakk: checked-in+
5.94 KB, patch | bhearsum: review+ | lsblakk: checked-in+
Right now, with only one try build master it's impossible to do a rolling upgrade without affecting wait times. Given how long some compile cycles take, we could go up to 3 hours without starting new jobs.
I suggest that we add one more in MPT and one in Santa Clara. The MPT one gives us Mac redundancy; the Santa Clara one gives us access to more fast Linux and Windows machines (which don't appear to be fully utilized in the main build pool) and location redundancy on those platforms.
| Assignee | ||
Updated•14 years ago
|
Assignee: nobody → lsblakk
Priority: -- → P3
| Assignee | ||
Updated•14 years ago
|
Summary: need at least one more try build master → add try buildbot master instance to buildbot-master1,2 and one to MPT build-master
Comment 1•14 years ago
|
||
Morphing summary after irc discussion with bhearsum, lsblakk.
Each buildbot-master1,2 machine is currently running 3 buildbot master instances:
1 builds
2 tests
We want to add another buildbot master instance to each machine, as follows:
1 builds
2 tests
1 try-builds
This makes all master machines identical and treats try masters just like any other part of our production infrastructure. Once we solve bug 607179, we'll also have more granular rolling upgrades for both production and try.
Related, but not blocking, I'll work with zandr+mrz to get the IX machines in 650castro, and the mac builders in MPT, moved to SCL.
Summary: add try buildbot master instance to buildbot-master1,2 and one to MPT build-master → add try buildbot master instance to buildbot-master1,2
| Assignee | ||
Comment 2•14 years ago
|
||
Note to myself so I don't forget over the weekend: on sc01 I have added a try_master2, and on sc02 a try_master3.
Both are virtualenvs with production-0.8 buildbot cloned. I have run setup.py build but not setup.py install yet, because pycrypto.org is not responding. I need to try that again on Monday morning.
I have also cloned buildbotcustom, buildbot-configs, and tools, and copied in buildbot-wrangler.py and the Makefile from builder_master, edited for the correct paths.
I still need to make sure those are on the appropriate production branches, set up the master instance, update production-masters.json for managing with fabric, and then test adding try slaves to the masters. Nagios and any cleanup scripts will also need updating.
Comment 3•14 years ago
|
||
What are sc01 and sc02?
Please don't forget to update
https://intranet.mozilla.org/RelEngWiki/index.php/Masters
and catlee's masters.json.
| Assignee | ||
Comment 4•14 years ago
|
||
Attachment #502836 -
Flags: review?(bhearsum)
| Reporter | ||
Comment 5•14 years ago
|
||
Comment on attachment 502836 [details] [diff] [review]
new config files for try masters, and updated setup-master.py
Looks OK to me.
Attachment #502836 -
Flags: review?(bhearsum) → review+
| Assignee | ||
Comment 6•14 years ago
|
||
Comment on attachment 502836 [details] [diff] [review]
new config files for try masters, and updated setup-master.py
http://hg.mozilla.org/build/buildbot-configs/rev/a389a93d5019 landed on default, will be merged to production tomorrow.
Attachment #502836 -
Flags: checked-in+
Comment 7•14 years ago
|
||
I set the nagios checks back to -C 3:3 on both master boxes, since they were squawking and the try masters were not running.
| Assignee | ||
Comment 8•14 years ago
|
||
Masters are running - have installed mozillapulse, MySQL-python, updated nrpe.cfg to 4:4 and restarted the service.
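The move from 3:3 to 4:4 follows from the instance count per master machine: three before this bug (1 builds, 2 tests), four once the try master is added. Below is a minimal Python sketch of that kind of inclusive min:max count check. It assumes the simple `min:max` range form only; the function names are hypothetical and this is not the actual Nagios plugin code.

```python
def parse_range(spec):
    """Parse a 'min:max' string such as '4:4' into a (low, high) tuple."""
    low, high = spec.split(":")
    return int(low), int(high)


def count_in_range(count, spec):
    """Return True when count lies within the inclusive min:max range."""
    low, high = parse_range(spec)
    return low <= count <= high


# Instances per master machine after this bug, per comment 1.
instances = {"builds": 1, "tests": 2, "try-builds": 1}
total = sum(instances.values())

# With the try master running, four processes satisfy a 4:4 check;
# with it stopped, only a 3:3 check passes.
ok_with_try = count_in_range(total, "4:4")
ok_without_try = count_in_range(total - 1, "4:4")
```

This is why the checks squawked in comment 7: with the try masters not yet running, each box had three master processes against an expected four.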
Next step: add some builders.
| Assignee | ||
Comment 9•14 years ago
|
||
Dustin mentions that these masters need to be added to statusdb - will check on this with Catlee in the morning.
Also - which slaves should be pointed to these masters?
Comment 10•14 years ago
|
||
Check out /etc/cron.d/*master* for exceptions, master cleanup, and statusdb dumping.
| Assignee | ||
Updated•14 years ago
|
Summary: add try buildbot master instance to buildbot-master1,2 → add try buildbot master instances to buildbot-master1,2, and MV
| Assignee | ||
Comment 11•14 years ago
|
||
I'm going to add a master instance to the MTV location as well so that the try slaves in MV can connect to it, and also to re-purpose test-master02 for actual use.
| Assignee | ||
Comment 12•14 years ago
|
||
Attachment #505901 -
Flags: review?(bhearsum)
| Assignee | ||
Comment 13•14 years ago
|
||
That'll teach me for not running test-masters myself first: I missed the list of names in setup-masters.
Attachment #505901 -
Attachment is obsolete: true
Attachment #505902 -
Flags: review?(bhearsum)
Attachment #505901 -
Flags: review?(bhearsum)
| Reporter | ||
Comment 14•14 years ago
|
||
Comment on attachment 505902 [details] [diff] [review]
adds config for try_master1 on buildbot-master3, removes tm02 config from mozilla-tests
We're not zero-padding the new Buildbot masters, so you'll need to adjust the URL in the config with that in mind. Looks fine otherwise. r=me with that changed.
Attachment #505902 -
Flags: review?(bhearsum) → review+
| Assignee | ||
Comment 15•14 years ago
|
||
Comment on attachment 505902 [details] [diff] [review]
adds config for try_master1 on buildbot-master3, removes tm02 config from mozilla-tests
Thanks for catching my mistake. The master itself doesn't have a zero-padded hostname, so I was aware of the new naming. Committed to the default branch: http://hg.mozilla.org/build/buildbot-configs/rev/279deb46c95f
Attachment #505902 -
Flags: checked-in+
| Assignee | ||
Updated•14 years ago
|
Flags: needs-reconfig?
| Assignee | ||
Comment 16•14 years ago
|
||
try_master1 is now up and running on test-master02.build.mozilla.org (waiting to become buildbot-master3.build.mozilla.org)
I've updated cron.d, nagios, the Masters list, the production-masters.json (http://people.mozilla.org/~lsblakk/production-masters.json) and have moved some MV mac slaves over to this master:
try-mac-slave{20-26,29}, as well as linux-ix-slave08
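For anyone scripting against slave lists written in this shorthand, here is a rough Python sketch that expands a `prefix{a-b,c}` pattern into individual hostnames. The helper name `expand_hosts` is hypothetical, and the sketch only handles a single brace group with numeric ranges, which is all the notation in this bug uses.

```python
import re


def expand_hosts(pattern):
    """Expand 'prefix{a-b,c}' shorthand into a list of hostnames."""
    match = re.fullmatch(r"([^{}]*)\{([^{}]*)\}([^{}]*)", pattern)
    if match is None:
        # No brace group: the pattern is already a single hostname.
        return [pattern]
    prefix, body, suffix = match.groups()
    hosts = []
    for part in body.split(","):
        part = part.strip()
        if "-" in part:
            start, end = part.split("-")
            width = len(start)  # preserve zero padding such as '07'
            for n in range(int(start), int(end) + 1):
                hosts.append(f"{prefix}{n:0{width}d}{suffix}")
        else:
            hosts.append(f"{prefix}{part}{suffix}")
    return hosts


# The slaves listed in this comment.
moved = expand_hosts("try-mac-slave{20-26,29}") + expand_hosts("linux-ix-slave08")
```

The same helper also covers the comma-only lists used later in the bug, such as `linux-ix-slave{07,09,10,11}`.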
| Assignee | ||
Comment 17•14 years ago
|
||
I took try-mac-slave29 offline for now since it kept grabbing leak builds and failing on setting basedir.
| Assignee | ||
Comment 18•14 years ago
|
||
err.html:
<type 'exceptions.AttributeError'>: LogFileScanner instance has no attribute '_remainingData'
| Reporter | ||
Comment 19•14 years ago
|
||
That's an issue caused by Twisted 10.2. You'll want to install Twisted 10.1 manually into the Buildbot virtualenv to fix it.
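A small Python sketch for spotting the problematic release before a master hits this error. It only encodes what this comment states (10.2 is affected, 10.1 is the known-good version); the function name is hypothetical.

```python
import re


def is_affected_twisted(version_string):
    """Return True for Twisted 10.2.x, the release blamed for the error above."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognised version string: {version_string!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) == (10, 2)


# Example version strings; in a real virtualenv the string could come
# from twisted.__version__.
needs_downgrade = is_affected_twisted("10.2.0")
known_good = is_affected_twisted("10.1.0")
```

Run from inside each master's virtualenv, a check like this would show which instances still need Twisted 10.1 installed manually.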
| Assignee | ||
Comment 20•14 years ago
|
||
Twisted 10.1 installed into the virtualenv on all three.
| Assignee | ||
Comment 21•14 years ago
|
||
RE: moving MV slaves to MV try master
linux-ix-slave06 is having difficulty; the work on it is tracked in bug 624210, where I made a note to send it over to the new master when it's fixed.
linux-ix-slave08 is in fact an SCL machine, and so has been pointed to try_master2.
Still need to move to try_master1:
linux-ix-slave{07,09,10,11}
mv-moz2-linux-ix-slave{22,23}
try-mac-slave27
try-mac-slave28 -- which is down and tracked in bug 620948
| Assignee | ||
Comment 22•14 years ago
|
||
(In reply to comment #21)
Edited the buildbot.tac for linux-ix-slave{06,07,09,10,11}, mv-moz2-linux-ix-slave23, try-mac-slave27 so that on their next reboot they will come up on the new MV master.
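The edit itself could be scripted along these lines. This is a sketch only: it assumes each slave's buildbot.tac stores the master address in a `buildmaster_host = '...'` assignment, as Buildbot-generated slave tac files of that era generally did. The function name and both hostnames in the sample are placeholders, not the real values used here.

```python
import re


def retarget_tac(tac_text, new_host):
    """Return tac_text with its buildmaster_host assignment set to new_host."""
    pattern = re.compile(
        r"^([ \t]*buildmaster_host[ \t]*=[ \t]*)(['\"]).*?\2",
        re.MULTILINE,
    )
    new_text, count = pattern.subn(
        lambda m: f"{m.group(1)}{m.group(2)}{new_host}{m.group(2)}",
        tac_text,
    )
    if count != 1:
        # Refuse to guess if the file does not look as expected.
        raise ValueError(f"expected one buildmaster_host line, found {count}")
    return new_text


# Placeholder tac contents and hostnames, for illustration only.
sample = (
    "basedir = '/builds/slave'\n"
    "buildmaster_host = 'old-master.example.org'\n"
    "port = 9010\n"
)
updated = retarget_tac(sample, "new-master.example.org")
```

Because the slave only reads buildbot.tac at startup, a change like this takes effect on the next reboot, which matches the behaviour described above.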
| Assignee | ||
Comment 23•14 years ago
|
||
mv-moz2-linux-ix-slave22 is also down and tracked in bug 620948
| Assignee | ||
Comment 24•14 years ago
|
||
Masters are completed, so this bug is done - bug 628722 has been filed to track adding builders across the masters.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
| Assignee | ||
Updated•14 years ago
|
Flags: needs-reconfig?
Updated•12 years ago
|
Product: mozilla.org → Release Engineering