Closed
Bug 959404
Opened 10 years ago
Closed 10 years ago
puppet agent initscript shouldn't fail
Categories
(Infrastructure & Operations :: RelOps: Puppet, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: rail, Assigned: dustin)
Details
Attachments
(1 file, 1 obsolete file)
3.42 KB, patch | rail: review+
When something goes wrong with puppet we usually lose a lot of capacity. We can change the initscript to touch a file (/etc/puppet/last-update) whenever everything goes right (puppet agent exits 0 or 2). If puppetizing fails, the script should retry N times, then check whether /etc/puppet/last-update is older than X hours. If the file is fresh enough, the script should send an email about the failure and exit 0, so the machine boots up properly. If the file is too old, we can either keep retrying or do something else (reboot?).
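The proposal above can be sketched roughly as follows. This is only an illustration of the retry-then-check-freshness idea, not the patch that was attached to this bug; the function name `puppetize`, the `run_agent` callable, and the return strings are hypothetical, and N and X are left as parameters because the description does not fix them.

```python
import os
import time

# Per the description: puppet agent exiting 0 or 2 counts as success.
GOOD_EXIT_CODES = {0, 2}


def puppetize(run_agent, stamp_path, retries, max_age_secs, now=None):
    """Run the agent up to `retries` times and decide what to do next.

    Returns (hypothetical labels):
      "ok"          - a run succeeded and the stamp file was touched
      "boot-anyway" - all runs failed, but the last good run is recent
                      enough: send mail about the failure and exit 0
      "stale"       - all runs failed and the last good run is too old
                      (or never happened): keep retrying, reboot, etc.
    """
    for _ in range(retries):
        if run_agent() in GOOD_EXIT_CODES:
            # Touch the semaphore file to record the good run.
            with open(stamp_path, "a"):
                pass
            os.utime(stamp_path, None)
            return "ok"

    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(stamp_path)
    except OSError:
        # No stamp file: there has never been a recorded good run.
        return "stale"
    return "boot-anyway" if age <= max_age_secs else "stale"
```

In a real initscript, `run_agent` would wrap the actual puppet agent invocation and `stamp_path` would be /etc/puppet/last-update; both are passed in here so the decision logic can be exercised without puppet installed.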
Comment 1 • 10 years ago (Reporter)
This would have helped us today, when a simple typo caused tree closures and left 1500 Amazon machines up but not running.
Updated • 10 years ago (Assignee)
Assignee: relops → dustin
Comment 2 • 10 years ago (Assignee)
And today too
Comment 3 • 10 years ago (Assignee)
Do we have something in place to prevent running jobs on minis with incorrect resolutions? This plan sounds great *except* that those minis will run jobs until the last-update semaphore file is too old -- and I'm assuming "too old" is on the order of hours to a day.
Comment 4 • 10 years ago (Assignee)
Totally untested, aside from the perl snippet, but what do you think about this approach?
Attachment #8375039 -
Flags: feedback?(rail)
Comment 5 • 10 years ago (Reporter)
Comment on attachment 8375039 (bug959404.patch)

Looks great to me. It might also be worth making it start spamming us whenever it reaches the MAX_SECS_SINCE_GOOD_RUN point, just in case all the puppet masters are down or something is wrong in between.
Attachment #8375039 -
Flags: feedback?(rail) → feedback+
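The suggestion in comment 5 could look something like the sketch below. It is an assumption-laden illustration rather than what the patch does: the value given to `MAX_SECS_SINCE_GOOD_RUN`, the function name `build_alert`, and the recipient address are all placeholders, since the thread does not state them. The function only builds the message; actually sending it is left out.

```python
from email.message import EmailMessage

# Placeholder value: the thread names this setting but not its value.
MAX_SECS_SINCE_GOOD_RUN = 4 * 3600


def build_alert(hostname, secs_since_good_run,
                max_secs=MAX_SECS_SINCE_GOOD_RUN):
    """Return an alert email once the threshold is reached, else None."""
    if secs_since_good_run < max_secs:
        return None

    hours = secs_since_good_run // 3600
    msg = EmailMessage()
    msg["Subject"] = f"puppet: no good run on {hostname} for {hours}h"
    msg["To"] = "alerts@example.com"  # hypothetical recipient
    msg.set_content(
        f"{hostname} has not completed a successful puppet run in "
        f"{secs_since_good_run} seconds (limit: {max_secs})."
    )
    return msg
```

Calling this on every failed attempt after the threshold is what would produce the "spamming" behaviour, which is the point: if every puppet master is down, repeated mail from many hosts is hard to miss.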
Comment 6 • 10 years ago (Assignee)
Tested on CentOS, and with sending of email.
Attachment #8375039 -
Attachment is obsolete: true
Attachment #8380765 -
Flags: review?(rail)
Comment 7 • 10 years ago (Reporter)
Comment on attachment 8380765 (bug959404-p1.patch)

woot!
Attachment #8380765 -
Flags: review?(rail) → review+
Comment 8 • 10 years ago (Assignee)
Tested fine on Ubuntu, too.
Comment 9 • 10 years ago (Assignee)
Tested fine on OS X Lion.
Comment 10 • 10 years ago (Assignee)
https://hg.mozilla.org/build/puppet/rev/97ec0177f2ee
Comment 11 • 10 years ago (Assignee)
I don't see any problems. I watched a spot host apply this, reboot, and successfully re-run puppet (since the worst-case here was puppet runs not working, leaving machines unmanaged).
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Comment 12 • 10 years ago (Reporter)
Woot! Thanks a lot for this. No more puppet typos breaking the WORLD! :)
Comment 13 • 10 years ago (Assignee)
One hopes.. we'll see :)