Once you've successfully hooked a slave up to puppet, it invariably has the nasty habit of blowing away its keys and restarting - leaving you to start again. This doesn't serve much purpose except to make things slower, so I propose to remove the restart step. This is a step along the way to bug 623849 that would remove a lot of day-to-day slaveduty hassle.
Created attachment 513778
m635518-puppet-manifests-r1.patch

I'll give bhearsum a break from reviewing my puppet patches - it's nick's turn! I staged this, made a harmless change to the puppet plist on a mac slave, and ran puppetd by hand. It reverted my harmless change, but did not reboot. Success!
Attachment #513778 - Flags: review?(nrthomas)
Comment on attachment 513778
m635518-puppet-manifests-r1.patch

I think you actually have to inflict this on Ben instead. IIRC that code is there so a machine can be imaged, contact production-puppet on boot, get a new puppet config and blow away the keys, then reboot to talk to the location-specific puppet master.
Attachment #513778 - Flags: review?(nrthomas) → review?(bhearsum)
This reset-ssl step is part of the automatically-connect-to-your-puppet-server feature of our manifests. It's not working so well these days (because the MV puppet server doesn't have the redirection nodes), but I'd rather see that repaired than removed altogether. The idea behind this was that machines could be re-imaged and come right back up into production, provided their host key on the server had been removed, and is still desirable IMHO. (The manual removing of the server side host key could easily be automated, too.)
Attachment #513778 - Flags: review?(bhearsum) → review-
You're right that the current system doesn't work. We already have a bug to do this right - bug 623840. The point of this ticket is just to remove the unnecessary reboots it takes to manually bring a slave up - it ends up taking far longer than it should, with several puppetca invocations on the master. Basically, it's a band-aid for a problem I don't have time to solve yet.
For the record, here's what I had to do to bring linux-ix-slave42 up:
  edit /etc/sysconfig/puppet
  run puppetd --test
  run puppetca --clean on the server
  run puppetd --test
  run puppetca --sign on the server
  run puppetd --test
  (wait for reboot, which blows away the keys I just set up)
  kill running puppetd instance
  run puppetca --clean on the server
  run puppetd --test
  run puppetca --sign on the server
  run puppetd --test
so not having the restart-and-blow-away-keys would halve the time this process takes, and it certainly wouldn't work any worse than it does now.
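To make the "halve the time" point concrete, the manual sequence above can be written out as a small dry-run sketch. This is only an illustration: the function names, the "slave"/"master" labels, and passing the host name to puppetca are my assumptions, and nothing here actually runs puppet.

```python
# Dry-run sketch of the manual bring-up steps listed in the comment above.
# Assumptions: function names, the "slave"/"master" labels, and the host
# argument to puppetca are illustrative. No command is executed.

def clean_and_sign(host):
    """The clean/sign round trip that follows an initial puppetd run."""
    return [
        ("master", f"puppetca --clean {host}"),
        ("slave", "puppetd --test"),
        ("master", f"puppetca --sign {host}"),
        ("slave", "puppetd --test"),
    ]

def bring_up(host, reboot_resets_keys=True):
    """List the steps needed to bring a slave up by hand."""
    steps = [
        ("slave", "edit /etc/sysconfig/puppet"),
        ("slave", "puppetd --test"),
    ]
    steps += clean_and_sign(host)
    if reboot_resets_keys:
        # Current behaviour: the reboot wipes the keys, so the round trip repeats.
        steps += [
            ("slave", "(wait for reboot, which blows away the keys)"),
            ("slave", "kill running puppetd instance"),
        ]
        steps += clean_and_sign(host)
    return steps

for where, command in bring_up("linux-ix-slave42", reboot_resets_keys=False):
    print(f"{where:6} {command}")
```

With `reboot_resets_keys=True` the list is twice as long, which is the overhead the patch is meant to remove.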
(In reply to comment #4)
> You're right that the current system doesn't work. We already have a bug to do
> this right - bug 623840.

What's the timeline for that? If it's far off into the future I'd rather add the necessary site*.pp bits to make master switching work again than remove this altogether.
The timeline is to do that when I have finished work on slavealloc and monitoring for idle slaves, both of which have converged onto getting a new version of buildbot and runslave.py installed on all slaves. We can WONTFIX this if you'd like, but this is a simple fix for a major pain point, especially when on slaveduty, and doesn't break the re-parenting support any more than it's already broken.
OK, well, let's call this WONTFIX. I'll just deal with the current brokenness.
Status: NEW → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → WONTFIX
Product: mozilla.org → Release Engineering