Closed Bug 842704 Opened 13 years ago Closed 13 years ago

Increase the puppet::atboot retry interval

Categories

(Infrastructure & Operations :: RelOps: General, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dustin, Assigned: dustin)

Attachments

(1 file)

In the vast majority of cases where a puppet run fails, the next one is going to fail too, and the same thing happens on a few hundred machines at once. In the current incarnation, that makes for hundreds of emails per minute. The retry logic is currently to retry every 60s for 10m, then reboot. The reboot is there in case the host doesn't have an IP, but we haven't seen hosts without IPs in a long time. Let's back the delay off a bit: perhaps wait 2 minutes after the first failure, increase exponentially from there, and only reboot after an hour?
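For illustration only, the proposed schedule could look something like the sketch below. It is not the patch: `backoff_delays` is a hypothetical helper, the doubling factor is an assumption (the report only says "increase exponentially"), and "reboot after an hour" is read here as a cap on cumulative waiting time.

```python
def backoff_delays(first_delay_min=2, factor=2, budget_min=60):
    """Return the waits (in minutes) that fit inside the reboot budget.

    Assumptions, not taken from the patch: each wait is `factor` times
    the previous one, and the host reboots as soon as the next wait
    would push the cumulative waiting time past `budget_min`.
    """
    delays = []
    delay = first_delay_min
    while sum(delays) + delay <= budget_min:
        delays.append(delay)
        delay *= factor
    return delays


delays = backoff_delays()
print(delays, sum(delays))
```

Under those assumptions a failing host retries 4 times over its first 30 minutes of waiting, compared with roughly ten retries in ten minutes under the current scheme.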
Attached patch bug842704.patch
Attachment #721308 - Flags: review?(bugspam.Callek)
Comment on attachment 721308
bug842704.patch

Review of attachment 721308:
-----------------------------------------------------------------

::: modules/puppet/templates/puppet-atboot-common.erb
@@ +2,5 @@
> # License, v. 2.0. If a copy of the MPL was not distributed with this
> # file, You can obtain one at http://mozilla.org/MPL/2.0/.
>
> +# number of tries to make between attempts to run puppet. 7 waits about 2h
> +REBOOT_AFTER=7

count is 0-order not 1-order, so we actually try 8 times with this change, adding up to 4.25 hours! drop this to 6 and you have my r+.
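The reviewer's arithmetic can be checked with a short sketch. It assumes the first wait is 2 minutes and that each later wait doubles; the doubling factor is not stated in this bug, but it reproduces both the "about 2h" and the "4.25 hours" figures. `total_wait_minutes` is a hypothetical helper, not part of the patch.

```python
def total_wait_minutes(waits, first_delay_min=2, factor=2):
    """Sum of `waits` exponentially growing delays, in minutes.

    Assumption: delays are first_delay_min * factor**i for i = 0..waits-1.
    """
    return sum(first_delay_min * factor**i for i in range(waits))


for waits in (6, 7):
    total = total_wait_minutes(waits)
    print(f"{waits} waits: {total} min (~{total / 60:.2f} h)")
```

Six doubling waits add up to 126 minutes, which matches the "about 2h" in the patch comment, while seven add up to 254 minutes, close to the reviewer's 4.25 hours. That is consistent with the off-by-one the review points out.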
Attachment #721308 - Flags: review?(bugspam.Callek) → review+
Attachment #721308 - Flags: checked-in+
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations