So after deploying Part 1 of Bug 1048358, we have a VERY fast slaverebooter cycle. It used to take > 6-7 hours on average to run. Now, a run started at ~00:43 finishes by ~00:49!

I suggest we do the following:

* Decrease the nagios lockfile watch thresholds to warn in 30 minutes, crit in 2 hours.
** If my memory is wrong and the watch is on the logfile, reduce to warn in 2 hours, crit in 3.
* Change the cron schedule to run once an hour. The IDLE_* timer in slaverebooter itself will prevent extra actual work from happening, with the benefit that we'll address things faster.
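To illustrate why an hourly cron run should not cause extra work, here is a minimal sketch of an idle-timer guard. This is not the actual slaverebooter code: `IDLE_THRESHOLD`, `needs_reboot` and `run_cycle` are hypothetical names, and the 5 hour value is an assumption picked only for the example.

```python
from datetime import datetime, timedelta

# Hypothetical threshold; the real IDLE_* values in slaverebooter are not shown here.
IDLE_THRESHOLD = timedelta(hours=5)


def needs_reboot(last_job_finished, now, idle_threshold=IDLE_THRESHOLD):
    """Return True only if the slave has been idle for at least idle_threshold."""
    return now - last_job_finished >= idle_threshold


def run_cycle(slaves, now):
    """One cron-triggered pass: return the names of slaves that would be acted on.

    `slaves` maps a slave name to the time its last job finished.
    """
    return [name for name, last in slaves.items() if needs_reboot(last, now)]
```

With a guard like this, running the pass more often only means an idle slave is noticed sooner; a slave below the threshold is skipped no matter how often cron fires.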
Created attachment 8476064
[puppet] run every hour