Closed Bug 520594 Opened 16 years ago Closed 16 years ago

moz2-darwin* and bm-xserve machines having trouble with nagios since puppet rollout

Categories

(Release Engineering :: General, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bhearsum, Assigned: bhearsum)

Attachments

(3 files)

It seems that they take 10-15 minutes to start the nrpe daemon, which sets off quite a few alerts. Puppet does manage the nagios config / daemon, so it's possible that it busted it in some way.
I'll look into this a bit.
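One way to observe the symptom from the monitoring side is to poll the slave until NRPE starts accepting TCP connections. This is a minimal sketch, not something taken from this bug: it assumes NRPE listens on its default port 5666, and the function names are hypothetical.

```python
import socket
import time

NRPE_PORT = 5666  # NRPE's default TCP port; assumed unchanged on these slaves


def nrpe_accepts_connections(host, port=NRPE_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts and unreachable hosts.
        return False


def seconds_until_listening(host, port=NRPE_PORT, poll=5.0, give_up_after=1800.0):
    """Poll until the port accepts a connection.

    Returns the elapsed seconds, or None if give_up_after is exceeded.
    """
    start = time.monotonic()
    while time.monotonic() - start < give_up_after:
        if nrpe_accepts_connections(host, port):
            return time.monotonic() - start
        time.sleep(poll)
    return None
```

Run right after a reboot, seconds_until_listening would give a rough figure for how long the daemon takes to come up. It only checks that the port is open, not that NRPE answers checks correctly.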
Assignee: nobody → bhearsum
Turns out we're not managing this file yet; this patch gets us doing that. A copy of the plist file is incoming.
Attachment #404659 - Flags: review?(catlee)
Comment on attachment 404659: Manage the nrpe plist file with Puppet
>+ source => "${fileroot}darwni9/nrpe.plist",
Looks good, as long as you fix the typo.
Attachment #404659 - Flags: review?(catlee) → review+
Attached patch: nrpe plist
Here's the error I found in the system log:
Oct 5 11:44:05 localhost com.apple.launchd[1] (org.nagios.nrpe): Unknown key: ServiceDescription
Oct 5 11:44:58 moz2-darwin9-slave16 com.apple.launchd[1] (org.nagios.nrpe): Unknown key: ServiceDescription
This is a copy of the plist file without that key, which seems to work right away, rather than 20 minutes later.
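For illustration, dropping a key that launchd rejects can be sketched with Python's standard plistlib module. The attached plist was edited directly, so this is not how the fix was made. The sample job below is hypothetical and is not the real nrpe.plist; only the label and the ServiceDescription key name come from the log above.

```python
import plistlib

# Keys to drop; ServiceDescription is the one launchd rejected in the log.
UNSUPPORTED_KEYS = ("ServiceDescription",)


def strip_unsupported_keys(plist_bytes, keys=UNSUPPORTED_KEYS):
    """Return (cleaned_plist_bytes, removed_keys) for a launchd job plist."""
    job = plistlib.loads(plist_bytes)
    removed = [key for key in keys if key in job]
    for key in removed:
        del job[key]
    return plistlib.dumps(job), removed


if __name__ == "__main__":
    # Hypothetical job definition, NOT the contents of the attachment.
    sample = plistlib.dumps({
        "Label": "org.nagios.nrpe",
        "ServiceDescription": "example description",
        "RunAtLoad": True,
    })
    cleaned, removed = strip_unsupported_keys(sample)
    print("removed:", removed)
    print(cleaned.decode("utf-8"))
```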
Attachment #404660 - Flags: review?(catlee)
Attachment #404660 - Flags: review?(catlee) → review+
Comment on attachment 404659: Manage the nrpe plist file with Puppet
changeset: 59:4e22d906aa0b
Attachment #404659 - Flags: checked-in+
Comment on attachment 404660: nrpe plist
Checking in nrpe.plist;
/mofo/puppet-files/darwin9/nrpe.plist,v <-- nrpe.plist
initial revision: 1.1
done
Attachment #404660 - Flags: checked-in+
I updated the Puppet masters, so these should be OK now. Leaving this bug open for now.
This didn't fix it, apparently. I've disabled the notifications for now, to quiet down #build. I'll come back to this a bit later today.
This patch makes Puppet not exec enablenrpe unless one of these is true:
* /Library/LaunchDaemons/nrpe.plist doesn't exist (this is the creates => part)
* setup-nrpe gets run (this is the first part of the subscribe)
* /usr/local/nagios/etc/nrpe.plist changes (the second part of the subscribe)
I tested a few reboot cycles with and without this patch and the results were 100% consistent:
* Without the patch, the nagios server gets connection refused right after boot.
* With the patch, it is able to properly execute its checks immediately after boot.
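The three trigger conditions listed above can be modelled as a small predicate. This is a plain-Python sketch of the behaviour as described in the comment, not Puppet code and not a statement of how Puppet evaluates creates and subscribe internally; the function and argument names are hypothetical.

```python
def should_run_enablenrpe(launchd_plist_exists, setup_nrpe_ran, nrpe_plist_changed):
    """Model of the patch's intent: run enablenrpe only when a trigger fires.

    launchd_plist_exists: /Library/LaunchDaemons/nrpe.plist is present
    setup_nrpe_ran:       setup-nrpe was executed in this Puppet run
    nrpe_plist_changed:   /usr/local/nagios/etc/nrpe.plist was modified
    """
    return (not launchd_plist_exists) or setup_nrpe_ran or nrpe_plist_changed


if __name__ == "__main__":
    # An ordinary reboot: plist already installed, nothing changed.
    print(should_run_enablenrpe(True, False, False))
    # A fresh machine: the launchd plist has not been installed yet.
    print(should_run_enablenrpe(False, False, False))
```

Under this model an ordinary reboot of an already-configured slave fires none of the triggers, so enablenrpe is skipped.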
Attachment #404882 - Flags: review?(catlee)
Attachment #404882 - Flags: review?(catlee) → review+
Comment on attachment 404882: only run enablenrpe if nrpe.plist has changed
changeset: 62:758b10a407d1
Attachment #404882 - Flags: checked-in+
I haven't seen any more of these fail, and I explicitly saw one work immediately after booting. I think the last patch did it.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering