Closed Bug 378529 Opened 13 years ago Closed 12 years ago
make buildbot auto-install and update configs from CVS
Our buildbot installs (both buildmaster and buildslaves) should update from our buildbot CVS vendor import automatically. This includes: 1) the buildbot install itself 2) the master.cfg (and any included files) For #1 we'll want to check out from cvs.mozilla.org/cvsroot/mozilla/tools/buildbot and run "python setup.py install". Buildbot should be shut down before this action, and started afterwards. For #2, we'll probably want a directory checked out ("buildbot-configs"?), then symlink/copy the needed files into the main buildbot basedir. A "buildbot reconfig $basedir" needs to be run afterwards (or a SIGHUP). Things to consider: * should be done out of band; e.g. don't want buildbot updating itself because if we break it we have to fix by hand * #1 will interrupt running builds; nice-to-have would be to check if there are any active builds, and wait if so (this may not be realistic on e.g. a master that is constantly busy). For the purposes of bug 355309 I want to do the simplest thing that could possibly work (thinking e.g. a script in a while() loop waiting for CVS checkins and performing above actions). This script could also be responsible for starting the buildbot process.
Assignee: rhelmer → ccooper
Status: ASSIGNED → NEW
rhelmer and I were talking in #build about how to test a master.cfg for validity before putting it into place for a real BuildMaster to use. This script does just that. It can take an argument of the config file to test defaulting to 'master.cfg'. If the file cannot be opened the exit status is 2, if there is a problem with the configuration file the exit status is 1. Otherwise, it exits normally.
(In reply to comment #1) > Created an attachment (id=262781) [details] > buildbot config file tester This looks good; we should get this into mozilla/tools/buildbot/contrib and see if they also want it upstream.
(In reply to comment #2) > (In reply to comment #1) > > Created an attachment (id=262781) [details] [details] > > buildbot config file tester > > > This looks good; we should get this into mozilla/tools/buildbot/contrib and see > if they also want it upstream. > Thanks :). I'll pass it on up.
This does roughly what I've described in comment #0. I think it would be useful for this script to be the way that buildbot is started, which it'll need a little more work for. The idea is, you'd checkout mozilla/tools/buildbot and run this script (contrib/botrunner.py), which would then take care of installing buildbot and keeping it up to date. We'd want this script to start on boot for both master and slaves. I didn't make any allowances in the current script for the "buildbot.tac" file on the slaves, not sure if it's worth tracking those in CVS (we could auto-generate them pretty easily on first-run based on hostname if we wish)
I've been thinking about this some more.. we should copy the way buildbot does command-line handling and do it like this: botrunner master [options] <basedir> botrunner slave [options] <basedir> <master> <name> <password> options would include things like overriding the default cvsroot and checkout directory. If no basedir exists, then botrunner would call "create-master" or "create-slave" as appropriate, put the master.cfg in place, and start buildbot. botrunner would be invoked on boot on the slaves (could auto-generate the name/password) as well as on the master.
(In reply to comment #4) > Created an attachment (id=263057) [details] > simple script to update buildbot from source and optionally master.cfg Looks good overall, a few comments though: - You call check_call in a few places. It looks like that will raise an exception if the exit code is non-zero, I'm thinking that it would be good to have everything in the while loop in a try/except block. - The last line in updateBuildbot() runs 'buildbot stop', I think you meant 'buildbot start' :) - Should the unit tests be run as part of updateBuildbot()? - It might make sense to have the workdir as a parameter with a default...maybe this doesn't make a difference if it's only being run from this file though.
Here's another work-in-progress version, closer to what I describe in comment #5. Ben I took your comments into account, for the unit tests I just put in a TODO comment which I'll fix up shortly :) I started adding exception handling but I found I was just throwing to the console anyway so left it out for now as that's the default behavior. The intent of the current script is that it can be invoked like this on the slaves: botrunner slave /builds/buildbot/slave master:9989 user pass /builds/checkouts email@example.com:/cvsroot and this on the master: botrunner master /builds/buildbot/master /builds/checkouts firstname.lastname@example.org:/cvsroot botrunner checks out the buildbot configs and sources to /builds/checkouts. If the basedir specified does not exist, then buildbot is installed and the basedir is created with "buildbot create-slave" or "create-master". Once that's done, botrunner enters a while loop which checks to see if buildbot or the configs have been updated, and does a stop/install/start or a install/sighup (respectively) if so. Otherwise, sleep for a minute.
(In reply to comment #7) > Created an attachment (id=263212) [details] > changes described in comment #5 > - stop_buildbot() is running 'buildbot start' - master_usage() and slave_usage() don't indicate that 'checkoutbase' and 'cvsroot' are required arguments - In is_running(), why are you using kill() to terminate Buildbot? Also, should is_running() be terminating the process in the first place? Other than that it looks great!
(In reply to comment #8) > (In reply to comment #7) > > Created an attachment (id=263212) [details] [details] > > changes described in comment #5 > > > > - stop_buildbot() is running 'buildbot start' > - master_usage() and slave_usage() don't indicate that 'checkoutbase' and > 'cvsroot' are required arguments > - In is_running(), why are you using kill() to terminate Buildbot? Also, should > is_running() be terminating the process in the first place? Got the other two; the "kill" thing is kind of a dumb trick; I'll put a comment above it. It's a simple way to tell if a process is running or not (notice it is sending "signal 0").
Ok, this one works on my dev box for both master and slave. Ben, I am not so sure that we should run unit tests actually.. it takes quite a while, I wonder if we should do something simpler like a pylint check? Anyway deferring that for right now :) Besides that I think this is pretty complete, the only other major thing I'd like to add is email support. I've added exception handling within the while loop (it's not very clever, just notes that there was an exception and continues.. I'd like this to be replaced with something that emails the exception instead). Note that there is an initial check in the main() method before the while loop that tries to install buildbot and initialize the appropriate basedir. There is no exception handling around this, the assumption being that this is not going to be recoverable (probably permissions or dependency issues).
(In reply to comment #10) > Ok, this one works on my dev box for both master and slave. Ben, I am not so > sure that we should run unit tests actually.. it takes quite a while, I wonder > if we should do something simpler like a pylint check? How long are we talking in each case, i.e. unit tests vs. pylint check? I'm all for running the unit tests if it makes sense to, which I guess depends on how often we figure we'll be pushing changes to this code. We don't currently push tinderbox updates very often, but that's a "mature" system. We'll likely be pushing lots of updates at the start, but isn't that precisely the kind of situation where we'd want unit tests even more?
Comment on attachment 263319 [details] better exception handling, handle custom prefix preed what do you think about this approach? Thinking I'll land it in our mozilla/tools/buildbot/contrib/ The assumption this script makes is that all deps (twistd, python, etc) are already installed, and this script will check out buildbot and master.cfg from our CVS, install buildbot if needed, and create master/slave dirs if needed.
Attachment #263319 - Flags: review?(preed)
The buildbot unit tests are pretty specific to, well, buildbot. I think running them on the development machine before checkin should be standard practice. As for running them on the actual production boxes subscribing to these changes, it might make more sense to have our own set of tests. We're likely more interested in things like the bonsaipoller, tinderbox reporter and various other custom bits we're running. It'd be nice to know if we're breaking any of these features before the machines are updated to new code and fail gracefully if we are.
Attachment #263319 - Attachment mime type: text/x-python → text/plain
Comment on attachment 263319 [details] better exception handling, handle custom prefix Let's settle up front on CamelCaps or under_scores. I'm partial to CamelCaps. >#!/usr/bin/env python ># ># BotRunner - start buildbot, keep it up to date, keep it running. >## > >import sys, getopt >from subprocess import PIPE, Popen >from time import sleep >from shutil import copyfile >from traceback import print_exc >from os.path import isdir, exists >from os import mkdir, kill > >try: > from subprocess import check_call >except ImportError: > # Python 2.4 doesn't have check_call Are we doing this for (old) build machines that don't have Python 2.4? If so, it's time for a MozBuild.py with this code in it (wasn't this stolen from cvs2hg-import.py?) >class BotRunner: > > def __init__(self, isMaster=False, prefix='', cvsroot='', checkoutbase='', > basedir=''): > self.isMaster = isMaster > self.prefix = prefix > self.cvsroot = cvsroot > self.checkoutbase = checkoutbase > self.basedir = basedir > self.buildbot = prefix+'/bin/buildbot' We should be able to come up with some sane defaults for some of these. Also, for all the strings here ("/bin/buildbot", "mozilla/tools/buildbot", etc.), let's constify them. >def usage(): > print "Usage: botrunner <command> [command options]" > print "" > print "Options:" > print " --version" > print " --help Display this help and exit." > print "Commands:" > print " master Manage a buildmaster" > print " slave Manage a buildslave" > sys.exit(0) > >def master_usage(): > print "Usage: botrunner master [options] <basedir> <checkoutbase> <cvsroot>" > print "" > print "Options:" > print " --help Display this help and exit." > print "Commands:" > print " prefix Location of buildbot install" > print " basedir Location of buildbot basedir" > print " checkoutbase Location of checked-out buildbot sources" > print " cvsroot CVSROOT for buildbot sources/configs" > sys.exit(0) > >def slave_usage(): > print "Usage: botrunner slave [options] <basedir> <master> <name> <password> <checkoutbase> <cvsroot>" > print "" > print "Options:" > print " --help Display this help and exit." > print "Commands:" > print " prefix Location of buildbot install" > print " basedir Location of buildbot basedir" > print " master hostname:port of buildmaster" > print " name username for buildmaster" > print " password password for buildmaster" > print " checkoutbase Location of checked-out buildbot sources" > print " cvsroot CVSROOT for buildbot sources/configs" > sys.exit(0) > >def main(): > isMaster = False > > if len(sys.argv) < 2: > usage() > > if sys.argv == '--help': > usage() > > if sys.argv == '--version': > print "Botrunner v0.1" > sys.exit(0) > > if sys.argv == 'master': > if len(sys.argv) < 6 or sys.argv == '--help': > master_usage() > prefix = sys.argv > basedir = sys.argv > checkoutbase = sys.argv > cvsroot = sys.argv > isMaster = True > elif sys.argv == 'slave': > if len(sys.argv) < 7 or sys.argv == '--help': > slave_usage() > prefix = sys.argv > basedir = sys.argv > master = sys.argv > name = sys.argv > password = sys.argv > checkoutbase = sys.argv > cvsroot = sys.argv > else: > usage() Have you considered using OptionParser? See cvs2hg-import.py; it's much cleaner, and does a bunch of nice things for you. >main() This is a bit of a perlism; in python, you usually put your main method under something that looks like: if __name__ == '__main__': *or* if you care about the return value being propagated to the shell you have a main function, and do: if __name__ == '__main__': return main(); These are mostly Python-esque aesthetic comments; let's get together and have you walk me through the approach here. It looks ok, but I mostly skimmed it, so a walkthrough would be useful for me.
Attachment #263319 - Flags: review?(preed) → review-
Thanks preed! Yeah I did the supporting code kind of quick and dirty, all the comments above sound good to me.
(In reply to comment #12) > (From update of attachment 263319 [details]) Everything looks fine to me logic-wise.
rhelmer seems to be making great strides here. Anything I can do to help: review, testing, etc?
(In reply to comment #17) > rhelmer seems to be making great strides here. Anything I can do to help: > review, testing, etc? preed and I went over it yesterday, we're all agreed on the general approach (e.g. comment #4). I'm going to follow up on preed's suggestions from comment #14 and I'll request review from you coop. After that, testing it on OSX and win32 would be especially valuable.
* use CamelCase for classnames, camelCase for methods/variables * use OptionParser (note that the first arg parsing is done as before, but the hard stuff is in OptionParser now) * use more constants and default values * use __main__ not main() The only thing this is missing from preed's previous comments is moving the check_call 2.4 check to a utility class; need to figure out where this file is landing, and where that class should go :) (mozilla/tools ? I was thinking mozilla/tools/buildbot/contrib, not sure if they'll want this upstream).
Comment on attachment 264199 [details] botrunner v0.2 This looks generally fine; for now, though, can we follow the Perl style guidelines, unless there's an obvious reason to change, so: -- the 2 -space indentation -- Same thing for horizontal spacing on lines, i.e.: basedir+'/buildbot.tac') self.buildbot = prefix+BUILDBOT_RUN_SCRIPT etc. Function calling; with Python it's slightly different, but you can still do function(namedArg=arg) type calling, IIRC. Please fix these before checking in. ># TODO - move this until utility class Can we take the extra hit and do this now? > for line in p.stdout.readlines(): > firstChar = line.split(' ') > if (firstChar == 'U') or (firstChar == 'P'): > return True > else: > return False This will only read the first line of p.stdout; is that what you were expecting? > if kill(int(pid), 0) == None: > return True > return False I think this can be written as: return (kill(int(pid), 0) == None); Other than these minor things, looks good.
Attachment #264199 - Flags: review?(preed) → review+
(In reply to comment #20) > (From update of attachment 264199 [details]) > > ># TODO - move this until utility class > > Can we take the extra hit and do this now? Sure, as per my comment I wasn't sure where this should be checked in. I've made up a patch which would add it to "mozilla/tools/botrunner.py", as well as adding a "mozbuild.py" that can be used by e.g. cvs2hg-import.py > > for line in p.stdout.readlines(): > > firstChar = line.split(' ') > > if (firstChar == 'U') or (firstChar == 'P'): > > return True > > else: > > return False > > This will only read the first line of p.stdout; is that what you were > expecting? Oh good catch thanks. I want something more like this actually: for line in ... if (firstChar matches): return True return False Exit on first match found, but look on all lines for matches. > Other than these minor things, looks good. Ok all done, not sure about the CVS location so I'm going to re-request review on the new patch.
Implement preed's suggestions from previous comments. The 2.4-compatible check_call has been moved into mozbuild.py, which provides a wrapper so the caller can simply: from mozbuild import check_call Then use as usual. I fixed a couple omissions and everything seems ok now in my testing, only thing left is adding support for running unit tests before install.
I tested this on win32 and ran into an issue with isRunning(). Python on Windows doesn't support os.kill(). Not only that, but Twisted on Windows does not appear to dump out PIDs. This means that we can't easily check for a process with the win32 python extensions. The only real handle we have on it is the name 'python.exe' -- which isn't going to be unique. There doesn't appear to be a way to force Twisted to spit out a twistd.pid on Windows, but I've posted a message to the Twisted mailing list to inquire about it anyways. I'll have another look at this tomorrow and post an update.
This patch adds partial support for win32. There are still issue to be resolved as noted below. I made changes to checkForUpdates() and isRunning(). checkForUpdates() now uses p.wait() when running on Windows. waitpid() needs a handle instead of a pid on Windows, so p.wait() was much simpler. In isRunning(), the only way I could find to determine if buildbot was running was to parse the output of 'wmic process'. The problem with this is that it will return 1 if *any* buildbot is running. Ideally I'd like a better way to do this. There's also the issue of paths that I haven't resolved yet. There's a lot of '/' + somedir hardcoding which will have to be removed/replaced. It just occured to me that anything that relys on twistd.pid (stop/restart/reconfig) will not work on Windows. I'm going to continue working on resolving these, but I wanted to get an initial patch up.
rhelmer/preed: How do you both feel about installing Buildbot as a service on release machines? I know we originally decided against it but I believe doing so will simplify the botrunner script. We may have to re-install the service whenever Buildbot is upgraded (I haven't tested that yet), but I can't think of any additional burdens. I can't find any way to make 'buildbot [stop|start|reconfig] work as it does on unix. The alternative to installing as a service is killing the process manually, which I don't think is a good option. This is the last big hump to making botrunner run on Windows.
Changes and fixes: * Fixed the hard-coded '+ "/" +'. * In updateConfig() on win32 Buildbot is simply restarted. I'm still looking for a way to reconfig it without a restart. * startBuildbot on win32 uses 'cmd.exe /C start buildbot start [basedir]' to avoid blocking the entire script rhelmer: Did you mean to put a call to checkconfig.py in updateBuildbot() or is that a bug? It seems to me that it should go in updateConfig() instead. I commented it out for now. I haven't fully tested it yet but I've run it a few times and it worked as intended. By the way Rob, I think I may have obsolete'd your patch when I posted mine, I didn't mean to do that.
Attachment #265432 - Attachment is patch: true
(In reply to comment #26) > Created an attachment (id=265432) [details] > botrunner, with win32 support > > Changes and fixes: > * Fixed the hard-coded '+ "/" +'. Great, thanks! I notice that the defaults are all windows-style paths now :P I'll look into a way to make this more graceful (like auto-prepending a drive letter and using platform-neutral separator for these). > * In updateConfig() on win32 Buildbot is simply restarted. I'm still looking > for a way to reconfig it without a restart. Yeah better than nothing, although it'd be really annoying to have the client stop building just because a new config was checked in, Actually, you know what might be more worth your time than getting reconfig working on win32, is to investigate how to schedule restarts for when the next build is finished .. bonus points for keeping any queued requests (although that might require extensive buildbot hackery). e.g. if I check in a code change, I'd rather wait until the master or slave is done with it's current task before killing buildbot to do the update. I'd focus on fixing this for the slave first, because the master can have multiple overlapping things to do and (theoretically) never have a moment's rest, but the client always has a stopping point. Up to you, I don't know if it'd be easier to do that or to get reconfig working on windows. I think that getting the idea of scheduled restarts that don't throw away the queue would be really valuable, though. > * startBuildbot on win32 uses 'cmd.exe /C start buildbot start [basedir]' to > avoid blocking the entire script > > rhelmer: Did you mean to put a call to checkconfig.py in updateBuildbot() or is > that a bug? It seems to me that it should go in updateConfig() instead. I > commented it out for now. You're right, it should go in updateConfig(). > I haven't fully tested it yet but I've run it a few times and it worked as > intended. > > By the way Rob, I think I may have obsolete'd your patch when I posted mine, I > didn't mean to do that. That's the right thing to do in this case, so no worries :)
(In reply to comment #27) > (In reply to comment #26) > > Created an attachment (id=265432) [details] [details] > > botrunner, with win32 support > > > > Changes and fixes: > > * Fixed the hard-coded '+ "/" +'. > > > Great, thanks! I notice that the defaults are all windows-style paths now :P > I'll look into a way to make this more graceful (like auto-prepending a drive > letter and using platform-neutral separator for these). I meant to change these back actually, whoops ;-) > > > * In updateConfig() on win32 Buildbot is simply restarted. I'm still looking > > for a way to reconfig it without a restart. > > > Yeah better than nothing, although it'd be really annoying to have the client > stop building just because a new config was checked in, > I just noticed that there is a way within the debugclient to force a reconfig. I'm going to look into whether or not this code could help make reconfig work on Windows. I'm thinking it might be possible to change 'buildbot reconfig' so it works without twistd.pid. > Actually, you know what might be more worth your time than getting reconfig > working on win32, is to investigate how to schedule restarts for when the next > build is finished .. bonus points for keeping any queued requests (although > that might require extensive buildbot hackery). If the above looks like it will be more trouble than it is worth I'm probably going to go this route. It would probably end up being useful in other areas, and to the Buildbot community at large.
TupdateConfig() has been changed in this version. copyfile() has been replaced with a 'copydir'. I had to write this function to accommodate copying the contents of a directory in a cross-platform way. I've added two new global variables: BUILDBOT_MASTER_PORT and BUILDBOT_DEBUG_PASSWORD. These are both necessary for the botrunner to interact with the debugcmd in bug#381479.
Attachment #265549 - Attachment is patch: false
This patch is pretty much the same as the one before last but fixes a couple bugs and typos in updateConfig().
needed for botrunner
I was able to get botrunner v0.5 running rhelmer's Mac-friendly vars from an earlier version. I don't have any configs to go further with it yet, but the auto-install + update seems to work as advertised.
(In reply to comment #32) > rhelmer's Mac-friendly vars Er, should read: "...by using rhelmer's Mac-friendly vars" (from v0.2, with a required update to the BUILDBOT_CONFIGS var name)
Comment on attachment 265556 [details] botrunner v0.5 looks fine, but can you: * make the default vars more platform-neutral (e.g. os.path) * attach it as a diff The actual code itself looks fine, thanks for working on this :)
Attachment #265556 - Flags: review?(rhelmer) → review-
I made the changes you requested, rhelmer. Seems like there should be a cleaner way to be os-independent with the paths but that's the best I could come up with.
Comment on attachment 266838 [details] [diff] [review] botrunner v0.6 looks good to me
Attachment #266838 - Flags: review+
Attachment #267308 - Attachment mime type: application/octet-stream → text/plain
Landed, thanks everybody! RCS file: /cvsroot/mozilla/tools/botrunner.py,v done Checking in botrunner.py; /cvsroot/mozilla/tools/botrunner.py,v <-- botrunner.py initial revision: 1.1 done RCS file: /cvsroot/mozilla/tools/MozBuild/Util.py,v done Checking in MozBuild/Util.py; /cvsroot/mozilla/tools/MozBuild/Util.py,v <-- Util.py initial revision: 1.1 done RCS file: /cvsroot/mozilla/tools/MozBuild/__init__.py,v done Checking in MozBuild/__init__.py; /cvsroot/mozilla/tools/MozBuild/__init__.py,v <-- __init__.py initial revision: 1.1 done
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Reopening because we forgot to check-in checkconfig.py. It should be put in mozilla/tools/buildbot/contrib.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
RCS file: /cvsroot/mozilla/tools/buildbot/contrib/checkconfig.py,v done Checking in tools/buildbot/contrib/checkconfig.py; /cvsroot/mozilla/tools/buildbot/contrib/checkconfig.py,v <-- checkconfig.py initial revision: 1.1 done
Status: ASSIGNED → RESOLVED
Closed: 13 years ago → 12 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
You need to log in before you can comment on or make changes to this bug.