Closed Bug 959404 Opened 10 years ago Closed 10 years ago

puppet agent initscript shouldn't fail

Categories: Infrastructure & Operations :: RelOps: Puppet, task
Hardware: x86_64
OS: Linux
Type: task
Priority: Not set
Severity: normal
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: rail, Assigned: dustin
Attachments: 1 file, 1 obsolete file

When something goes wrong with puppet we usually lose a lot of capacity. We can change the initscript to touch a file (/etc/puppet/last-update) whenever everything goes right (puppet agent exits 0 or 2). If puppetizing doesn't work, the script should retry N times, then check whether /etc/puppet/last-update is older than X hours. If the file is fresh enough, the script should send an email about the failure and exit 0, so the machine boots up properly. If the file is too old, we can either keep retrying or do something else (reboot?).
It would have helped us today, when a simple typo caused tree closures and left 1500 Amazon machines up but not running.
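For illustration only, here is a rough shell sketch of the flow proposed above. It is not the attached patch. The function names (run_puppet, notify_failure, semaphore_is_fresh, puppetize), the defaults for MAX_TRIES and RETRY_DELAY, the four-hour freshness window, and the use of `find -mmin` for the age check are all assumptions made for this example. It also sends the notification in the stale case, which goes slightly beyond the description.

```shell
# Sketch only: every name and default here is hypothetical, not taken from the patch.
SEMAPHORE="${SEMAPHORE:-/etc/puppet/last-update}"            # file proposed in the report
MAX_TRIES="${MAX_TRIES:-5}"                                  # "N" (value assumed)
MAX_SECS_SINCE_GOOD_RUN="${MAX_SECS_SINCE_GOOD_RUN:-14400}"  # "X hours" (value assumed)
RETRY_DELAY="${RETRY_DELAY:-60}"                             # pause between tries (assumed)

# Hypothetical hook: with --detailed-exitcodes, 0 and 2 both mean a good run.
run_puppet() { puppet agent --onetime --no-daemonize --detailed-exitcodes; }

# Hypothetical hook: a real initscript would send email here.
notify_failure() { echo "puppet agent failed on $(hostname)" >&2; }

# True if the semaphore exists and was modified within the allowed window.
semaphore_is_fresh() {
    max_mins=$((MAX_SECS_SINCE_GOOD_RUN / 60))
    [ -n "$(find "$SEMAPHORE" -mmin "-$max_mins" 2>/dev/null)" ]
}

# Returns 0 if the machine may continue booting, 1 if it should not.
puppetize() {
    try=1
    while [ "$try" -le "$MAX_TRIES" ]; do
        rc=0
        run_puppet || rc=$?
        if [ "$rc" -eq 0 ] || [ "$rc" -eq 2 ]; then
            touch "$SEMAPHORE"      # record the last known-good run
            return 0
        fi
        try=$((try + 1))
        if [ "$try" -le "$MAX_TRIES" ]; then
            sleep "$RETRY_DELAY"
        fi
    done
    notify_failure                  # all tries failed: tell someone
    if semaphore_is_fresh; then
        return 0                    # recent good run: boot anyway
    fi
    return 1                        # stale or missing: caller decides (retry, reboot, ...)
}
```

An initscript could call `puppetize` and, on a non-zero return, either loop or reboot. Which of those to do is the open question in the description, so the sketch leaves it to the caller.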
Assignee: relops → dustin
And today too
Do we have something in place to prevent running jobs on minis with incorrect resolutions?  This plan sounds great *except* that those minis will run jobs until the last-update semaphore file is too old -- and I'm assuming "too old" is on the order of hours to a day.
Attached patch bug959404.patch (obsolete) — Splinter Review
Totally untested, aside from the perl snippet, but what do you think about this approach?
Attachment #8375039 - Flags: feedback?(rail)
Comment on attachment 8375039 (bug959404.patch)

Review of attachment 8375039:
-----------------------------------------------------------------

Looks great to me. It might also be good to make it start spamming us once it reaches the MAX_SECS_SINCE_GOOD_RUN point, just in case all the puppet masters are down or something is wrong in between.
Attachment #8375039 - Flags: feedback?(rail) → feedback+
Tested on CentOS, and with sending of email.
Attachment #8375039 - Attachment is obsolete: true
Attachment #8380765 - Flags: review?(rail)
Comment on attachment 8380765 (bug959404-p1.patch)

woot!
Attachment #8380765 - Flags: review?(rail) → review+
Tested fine on Ubuntu, too.
Tested fine on OS X Lion.
I don't see any problems.  I watched a spot host apply this, reboot, and successfully re-run puppet (since the worst-case here was puppet runs not working, leaving machines unmanaged).
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Woot! Thanks a lot for this. No more puppet typos breaking the WORLD! :)
One hopes.. we'll see :)