Currently, when a Mac builder runs puppet, puppet finishes by touching a file, which triggers launchd to start the buildslave service. The problem is that the buildslave service now starts regardless of the presence of a pidfile, so this can easily result in multiple buildbot processes on the same slave. Probably the best solution here is to run puppet from a script like run-puppet-and-buildbot, as used on the Mac talos systems.
I thought we could run puppet as a StartupItem:

  http://developer.apple.com/library/mac/#documentation/MacOSX/Conceptual/BPSystemStartup/Articles/StartupItems.html#//apple_ref/doc/uid/20002132-CJBBHDII

This is similar to an rc script, and we could block in the StartService function until puppet is happy with its lot in life. When SystemStarter runs, launchd has already started, so this would not preclude ssh logins to unscrew puppet, if necessary. Sadly, however, loginwindow does not block until StartService has completed. launchctl is really not into blocking at all.

Then I stumbled across this:

  http://puppet-mw08.googlecode.com/svn/trunk/puppet/master-dev/manifests/classes/loginwindow.pp

It turns out that you can tell launchctl to just not start loginwindow!

  launchctl unload -w /System/Library/LaunchDaemons/com.apple.loginwindow.plist

And then you can later tell it to start loginwindow, when you're ready:

  launchctl load -F /System/Library/LaunchDaemons/com.apple.loginwindow.plist

All this time, SSH logins work, so we can unhork as necessary.

So my proposal is to split up buildbot startup for *all* darwin hosts thusly:

1. Puppet gets started from a shell script that will run it repeatedly until it succeeds, similar to the initscript for CentOS
2. That script runs 'launchctl load -F ..' when puppet runs successfully
3. Puppet itself disables the automatic display of the loginwindow
4. Buildbot runs from a cltbld launchd service, which is only started when cltbld logs in (automatically)

Incidentally, I also found

  http://osxdaily.com/2007/03/25/always-boot-mac-os-x-in-verbose-mode/

which is probably something we should do on all Macs, just to be safe.
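
For illustration, steps 1 and 2 of the proposal might look roughly like the sketch below. This is not the existing run-puppet-and-buildbot script or the CentOS initscript. The function names and the RETRY_DELAY variable are made up for this example, and the exact puppet command line is deliberately left to the caller, since which invocation and exit codes count as "success" would need to be settled separately.

```shell
#!/bin/sh
# Hypothetical sketch of steps 1 and 2: retry puppet until it succeeds,
# then let loginwindow start. Function names and RETRY_DELAY are
# illustrative assumptions, not part of any existing script.

LOGINWINDOW_PLIST=/System/Library/LaunchDaemons/com.apple.loginwindow.plist

# Seconds to wait between failed attempts (assumed default).
RETRY_DELAY=${RETRY_DELAY:-60}

# Run the given command repeatedly until it exits with status 0.
retry_until_success() {
    until "$@"; do
        echo "failed: $*; retrying in ${RETRY_DELAY}s" >&2
        sleep "$RETRY_DELAY"
    done
}

# Block until the given puppet command succeeds, then load loginwindow.
# The launchctl invocation is the one quoted in the comment above.
run_puppet_then_login() {
    retry_until_success "$@"
    launchctl load -F "$LOGINWINDOW_PLIST"
}
```

The script would end with a call such as `run_puppet_then_login <puppet command line>`, with the real puppet invocation filled in. Since SSH stays up the whole time, a host stuck in the retry loop can still be fixed by hand.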
I'd still like to do this, but I'm not sure how high-priority it is.
Seems like a low priority compared to other work.
We're still touching a semaphore file, but that is working quite reliably now - we haven't seen any multiple-starts (that I've heard of).