Closed
Bug 616003
Opened 14 years ago
Closed 14 years ago
slave-side slave-alloc support for linux build & test
Categories
(Release Engineering :: General, defect)
Release Engineering
General
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: dustin, Assigned: dustin)
References
Details
Attachments
(3 files, 6 obsolete files)
7.44 KB,
patch
|
dustin
:
review+
dustin
:
checked-in+
|
Details | Diff | Splinter Review |
9.75 KB,
patch
|
bhearsum
:
review+
dustin
:
checked-in+
|
Details | Diff | Splinter Review |
630 bytes,
patch
|
bhearsum
:
review+
dustin
:
checked-in+
|
Details | Diff | Splinter Review |
This puts slave-side support for the slave allocation system in place on linux hosts. It has no immediate behavior change, although adjusting buildbot.tac will require adjusting files in a central location, rather than on the slaves.
1. Collect the buildbot.tac's for all linux slaves and place them in a build-net-accessible directory (fabric?!)
2. land runslave.py and (/etc/init.d/)buildbot in build/tools/buildbot-helpers/startup, and add to puppet-files. Files will have appropriate header comments to describe their version control and deployment
3. adjust linux puppet manifests to distribute these files to linux slaves, and to delete /etc/init.d/buildbot-tac
4. remove build-tools.rpm (bug 615301)
Assignee | ||
Comment 1•14 years ago
|
||
Here are the corresponding changes to buildbot-files. All of the new/changed files are in the build/tools patch I'll push momentarily.
::remove unused $platform/etc/init.d/buildbot.tac
::remove unused $platform/etc/default/buildbot
::change $platform/etc/init.d/buildbot (centos)
/N/staging/centos5-x86_64/build/etc/init.d/buildbot
/N/staging/centos5-i686/build/etc/init.d/buildbot
::change fedora's run-puppet-and-buildbot.sh
/N/staging/fedora12-x86_64/test/home/cltbld/run-puppet-and-buildbot.sh
/N/staging/fedora12-i686/test/home/cltbld/run-puppet-and-buildbot.sh
::add runslave.py
add centos5-i686/user/local/bin/runslave.py
add centos5-x86_64/user/local/bin/runslave.py
add fedora12-i686/test/user/local/bin/runslave.py
add fedora12-x86_64/test/user/local/bin/runslave.py
This has been staged successfully, and slaves start up and download the .tac from my people page. I've even seen runslave.py successfully use the existing buildbot.tac after a failed download.
Attachment #494866 -
Flags: review?(bhearsum)
Assignee | ||
Comment 2•14 years ago
|
||
Let's keep all of these files in tools/buildbot-helpers/startup!
If they exist elsewhere in version control, I haven't found them - please point me there.
Attachment #494867 -
Flags: review?(bhearsum)
Attachment #494867 -
Flags: feedback?(catlee)
Assignee | ||
Comment 3•14 years ago
|
||
Attachment #494867 -
Attachment is obsolete: true
Attachment #494870 -
Flags: review?
Attachment #494867 -
Flags: review?(bhearsum)
Attachment #494867 -
Flags: feedback?(catlee)
Assignee | ||
Comment 4•14 years ago
|
||
Comment on attachment 494870 [details] [diff] [review]
tools-r2.patch
just like -r1, but with some extra MOZILLA DEPLOYMENT NOTES comments.
Attachment #494870 -
Flags: review?(bhearsum)
Attachment #494870 -
Flags: review?
Attachment #494870 -
Flags: feedback?(catlee)
Assignee | ||
Comment 5•14 years ago
|
||
When these are reviewed, I think it will be fine to throw them onto puppet with no further ado - most living slaves will check http://people.mozilla.com/~dmitchell/allocator/SLAVE, not find a .tac, and keep using the existing .tac -- just like they've been doing since they were set up.
We'll lose the ability to disable slaves by moving buildbot.tac to buildbot.off, as runslave.py will just create a fresh new buildbot.tac. But we can fix this by creating DO_NOT_START in the basedir.
It might be nice to finish bug 616344 first, so we don't need to redeploy runslave.py, but honestly it's not that hard to change the URL later.
Comment 6•14 years ago
|
||
A file called DO_NOT_START?
Comment 7•14 years ago
|
||
Comment on attachment 494870 [details] [diff] [review]
tools-r2.patch
>+# Required-Stop: $local_fs
>+PATH=/sbin:/bin:/usr/sbin:/usr/bin
>+DESC="Buildbot"
>+
>+BUILDSLAVE_CMD=/tools/buildbot/bin/buildbot # will change to 'buildslave' in 0.8.1
>+LOCKFILE=/var/lock/subsys/buildbot
>+RUNSLAVE=/builds/runslave.py
>+PYTHON=/tools/python/bin/python
>+
>+test -x ${BUILDSLAVE_CMD} || exit 0
>+
>+. /lib/lsb/init-functions
>+
>+start_buildbot() {
>+ # note - spaces here will break things
>+ su - cltbld sh -c "${PYTHON} ${RUNSLAVE}"
>+ ret=$?
>+ if [ $ret == "0" ]; then
>+ touch ${LOCKFILE}
>+ fi
>+ return $ret
>+}
Is LOCKFILE useful to maintain?
>+ parser.add_option("-a", "--allocator-url", action="store", dest="allocator_url")
>+ parser.add_option("-c", "--buildslave-cmd", action="store", dest="buildslave_cmd")
>+ parser.add_option("-d", "--basedir", action="store", dest="basedir")
>+ parser.add_option("-n", "--slavename", action="store", dest="slavename")
>+ parser.add_option("-v", "--verbose", action="store_true", dest="verbose")
>+ parser.add_option( "--no-start", action="store_true", dest="no_start")
>+
>+ (options, args) = parser.parse_args()
>+
>+ # apply some defaults
>+ if not options.slavename:
>+ options.slavename = socket.gethostname().split('.')[0]
>+ def slave_matches(*parts):
>+ "True if any of PARTS appears in options.slavename"
>+ for part in parts:
>+ if part in options.slavename:
>+ return True
>+ return False
>+
>+ if not options.basedir:
>+ if slave_matches('slave'):
>+ if slave_matches('linux', 'darwin'):
>+ options.basedir = '/builds/slave'
>+ elif slave_matches('win', 'w32'):
>+ options.basedir = r'e:\builds\moz2_slave'
>+ elif slave_matches('talos', '-try', 'r3'):
>+ if slave_matches('linux', 'ubuntu', 'fed'):
>+ options.basedir = '/home/cltbld/talos-slave'
>+ elif slave_matches('tiger', 'leopard', 'snow'):
>+ options.basedir = '/Users/cltbld/talos-slave'
>+ elif slave_matches('xp', 'w7'):
>+ options.basedir = r'c:\talos-slave'
>+
>+ if not options.basedir:
>+ parser.error("could not determine basedir")
>+
>+ if not os.path.exists(options.basedir):
>+ parser.error("basedir '%s' does not exist" % options.basedir)
>+
>+ # check for DO_NOT_START
>+ do_not_start = os.path.join(options.basedir, "DO_NOT_START")
>+ while os.path.exists(do_not_start):
>+ if options.verbose:
>+ print "waiting until '%s' goes away" % do_not_start
>+ time.sleep(10)
>+
>+ if not options.allocator_url:
>+ options.allocator_url = default_allocator_url
better to do parser.set_defaults(allocator_url=default_allocator_url) above?
Attachment #494870 -
Flags: feedback?(catlee) → feedback+
Assignee | ||
Comment 8•14 years ago
|
||
(In reply to comment #7)
> Is LOCKFILE useful to maintain?
I don't know if it's necessary to the CentOS initscript infrastructure, so I figured I'd leave it in. I'll take it out if you know that to be harmless.
> better to do parser.set_defaults(allocator_url=default_allocator_url) above?
Fair enough.
Assignee | ||
Comment 9•14 years ago
|
||
Attachment #494870 -
Attachment is obsolete: true
Attachment #495099 -
Flags: review?(bhearsum)
Attachment #494870 -
Flags: review?(bhearsum)
Comment 10•14 years ago
|
||
Comment on attachment 495099 [details] [diff] [review]
tools-r3.patch
I don't think production build machines should pull config information from people.m.c. Can we at least move this to cruncher if we don't have a permanent home yet?
Is RUNSLAVE correct? In the runslave.py header you claim that it is in /usr/local/bin/runslave.py. The latter makes more sense since /builds doesn't exist everywhere.
I think it would be better to assign all of the things from options that you want, rather than the whole dict. It makes it easier to get an overview of BuildbotTac that way.
If I'm reading this right, verbose mode will actually change the behaviour of the script if there's an error fetching or writing the tac file. If that's really what you want to do, I think it should be controlled if "--abort-on-error" or something like that. verbosity shouldn't change behaviour IMHO.
Other than the above, this seems good to me.
Attachment #495099 -
Flags: review?(bhearsum) → review-
Comment 11•14 years ago
|
||
Comment on attachment 494866 [details] [diff] [review]
puppet-manifests-r1.patch
Can you put all of this in a new file/class? startup.pp maybe? buildslave/staging-buildslave is purposely sparse. Also..it's weird to have Fedora including the buildslave class....
I don't see the launchd startuptype being handled anywhere.
Attachment #494866 -
Flags: review?(bhearsum) → review-
Assignee | ||
Comment 12•14 years ago
|
||
(In reply to comment #10)
> Comment on attachment 495099 [details] [diff] [review]
> tools-r3.patch
>
> I don't think production build machines should pull config information from
> people.m.c. Can we at least move this to cruncher if we don't have a permanent
> home yet?
Well, there's nothing *on* people at the moment, so even a failure there won't affect production, but I'm happy to put it wherever you suggest. It's easy enough to change now or later. I don't really want to hold up landing this while we bikeshed about where to drop these files.
> Is RUNSLAVE correct? In the runslave.py header you claim that it is in
> /usr/local/bin/runslave.py. The latter makes more sense since /builds doesn't
> exist everywhere.
I'm not sure what RUNSLAVE is, but yes, it's in /usr/local/bin/runslave.py
> I think it would be better to assign all of the things from options that you
> want, rather than the whole dict. It makes it easier to get an overview of
> BuildbotTac that way.
This is probably a stylistic issue, but I don't like threading parameters through by repeating the parameter name everywhere (in the constructor invocation, constructor parameter list, and twice in the assignment in the constructor). BuildbotTac could easily just become a function or two at this point, anyway.
> If I'm reading this right, verbose mode will actually change the behaviour of
> the script if there's an error fetching or writing the tac file. If that's
> really what you want to do, I think it should be controlled if
> "--abort-on-error" or something like that. verbosity shouldn't change behaviour
> IMHO.
Oops, that 'raise' shouldn't be in there.
Assignee | ||
Comment 13•14 years ago
|
||
(In reply to comment #11)
> Can you put all of this in a new file/class? startup.pp maybe?
> buildslave/staging-buildslave is purposely sparse. Also..it's weird to have
> Fedora including the buildslave class....
I can put it in a new class. It is weird to have fedora including the buildslave class, but there's a lot of weird stuff going on in the puppet manifests, and this change seems in line with that. The puppet manifests need an overhaul to separate concerns a bit better, but this is not the time or the bug for it..
> I don't see the launchd startuptype being handled anywhere.
Well, right - check the summary of the bug :)
Assignee | ||
Comment 14•14 years ago
|
||
moved to a new class as requested
Attachment #494866 -
Attachment is obsolete: true
Attachment #495157 -
Flags: review?(bhearsum)
Assignee | ||
Comment 15•14 years ago
|
||
I moved all of the logic into the class and renamed it RunSlave, as well as fixing the error handling.
This still points to people, since I don't know where else to put it. Open to suggestions there, but I think that can safely be changed without additional review.
This has been re-staged, with the new manifests, successfully.
Attachment #495099 -
Attachment is obsolete: true
Attachment #495158 -
Flags: review?(bhearsum)
Comment 16•14 years ago
|
||
(In reply to comment #12)
> (In reply to comment #10)
> > Comment on attachment 495099 [details] [diff] [review] [details]
> > tools-r3.patch
> >
> > I don't think production build machines should pull config information from
> > people.m.c. Can we at least move this to cruncher if we don't have a permanent
> > home yet?
>
> Well, there's nothing *on* people at the moment, so even a failure there won't
> affect production, but I'm happy to put it wherever you suggest. It's easy
> enough to change now or later. I don't really want to hold up landing this
> while we bikeshed about where to drop these files.
OK, please do change it. I'm just seeking to avoid having a landed, in production copy that's pulling from there.
> > Is RUNSLAVE correct? In the runslave.py header you claim that it is in
> > /usr/local/bin/runslave.py. The latter makes more sense since /builds doesn't
> > exist everywhere.
>
> I'm not sure what RUNSLAVE is, but yes, it's in /usr/local/bin/runslave.py
I'm talking about the RUNSLAVE definition from linux-initd-buildbot.sh -- it says "RUNSLAVE=/builds/runslave.py"
(In reply to comment #13)
> (In reply to comment #11)
> > Can you put all of this in a new file/class? startup.pp maybe?
> > buildslave/staging-buildslave is purposely sparse. Also..it's weird to have
> > Fedora including the buildslave class....
>
> I can put it in a new class. It is weird to have fedora including the
> buildslave class, but there's a lot of weird stuff going on in the puppet
> manifests, and this change seems in line with that.The puppet manifests need
> an overhaul to separate concerns a bit better, but this is not the time or the
> bug for it..
I don't agree that fedora including "buildslave" is in line with existing weirdness. There's a very clear separation of build and test slave stuff in the manifests, and this breaks that.
> > I don't see the launchd startuptype being handled anywhere.
>
> Well, right - check the summary of the bug :)
Duh, thanks for the reminder.
Comment 17•14 years ago
|
||
Comment on attachment 495157 [details] [diff] [review]
m616003-puppet-manifests-r2.patch
Per my other comment, Fedora should include buildbot-startup directly, not through buildslave.
Can you delete /etc/init.d/buildbot-tac in addition to disabling the inetd service?
Attachment #495157 -
Flags: review?(bhearsum) → review-
Comment 18•14 years ago
|
||
Comment on attachment 495158 [details] [diff] [review]
m616003-tools-r4.patch
Sorry, but I like this even less than the previous one....done like this, it's not possible to re-use the RunSlave class in any non-command-line context.
If we can't agree on the explicit arg passing, can you use the previous version w/ the verbose mode bugfix?
Attachment #495158 -
Flags: review?(bhearsum) → review-
Assignee | ||
Comment 19•14 years ago
|
||
-r2, plus include buildslave-startup from the centos and fedora classes instead. And delete /etc/init.d/buildbot-tac
Attachment #495157 -
Attachment is obsolete: true
Attachment #495911 -
Flags: review?(bhearsum)
Assignee | ||
Comment 20•14 years ago
|
||
linux-initd-buildbot: fix RUNPYTHON value
runpython.py: revert to -r3 form, with 'raise' removed
Staged on darwin, fedora, and centos. No change on darwin, and fedora and centos both did the right thing.
Attachment #495158 -
Attachment is obsolete: true
Attachment #495912 -
Flags: review?(bhearsum)
Updated•14 years ago
|
Attachment #495912 -
Flags: review?(bhearsum) → review+
Comment 21•14 years ago
|
||
Comment on attachment 495911 [details] [diff] [review]
m616003-puppet-manifests-r3.patch
I'm still seeing fedora including "buildslave".
Weren't you going to remove the build-tools RPM, too?
Assignee | ||
Comment 22•14 years ago
|
||
> Weren't you going to remove the build-tools RPM, too?
That's a separate bug - bug 615301.
As for the include in talosslave.pp, just an oversight. I'll take the r+ to mean it's OK with that removed.
Assignee | ||
Comment 23•14 years ago
|
||
Comment on attachment 495911 [details] [diff] [review]
m616003-puppet-manifests-r3.patch
0395b9a64270 - review via IRC
Attachment #495911 -
Flags: review?(bhearsum)
Attachment #495911 -
Flags: review+
Attachment #495911 -
Flags: checked-in+
Assignee | ||
Comment 24•14 years ago
|
||
Comment on attachment 495912 [details] [diff] [review]
m616003-tools-r5.patch
pushed as a1d4cb6cfafd
Attachment #495912 -
Flags: checked-in+
Assignee | ||
Comment 25•14 years ago
|
||
I somehow managed to re-introduce a typo when reverting to the earlier runslave.py. With this fix, staging systems seem to be starting without any trouble.
Attachment #496198 -
Flags: review?(bhearsum)
Updated•14 years ago
|
Attachment #496198 -
Flags: review?(bhearsum) → review+
Assignee | ||
Comment 26•14 years ago
|
||
Comment on attachment 496198 [details] [diff] [review]
m616003fix-r1.patch
59640dd90609
Attachment #496198 -
Flags: checked-in+
Assignee | ||
Comment 27•14 years ago
|
||
All of the staging slaves are humming along perfectly, but we're backing this out to re-land it tomorrow.
puppet-manifests: 8a62a0769b1c
tools: 6dc81ad838ae
puppet-files: not in version control; rolled back by hand
Assignee | ||
Comment 28•14 years ago
|
||
landed again, and into production.
puppet-manifests: 5092d483dec1
tools: 21c07e261e2a
Assignee | ||
Updated•14 years ago
|
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Comment 29•14 years ago
|
||
Might have a regression from this - nagios is reporting some linux-ix-slaveN with 2 buildbot processes running. eg linux-ix-slave19 has an (abridged) 'ps xf' of
PID TTY STAT TIME COMMAND
3170 ? Ss 0:00 /tools/python/bin/python /usr/local/bin/runslave.py
3198 ? S 0:00 \_ /tools/buildbot-0.8.0/bin/python /tools/buildbot/bin/buildbot start /builds/slave
3199 ? Z 0:00 \_ [buildbot] <defunct>
3202 ? Sl 0:00 /tools/buildbot-0.8.0/bin/python /tools/buildbot/bin/buildbot start /builds/slave
Assignee | ||
Comment 30•14 years ago
|
||
I saw that on Darwin in testing bug 616350, but assumed it had to do with un-deactivated plist files.
I'll look into this ASAP in bug 619082
Updated•11 years ago
|
Product: mozilla.org → Release Engineering
You need to log in
before you can comment on or make changes to this bug.
Description
•