Bug 520594 (Closed)
Opened 16 years ago, closed 16 years ago
moz2-darwin* and bm-xserve machines having trouble with nagios since puppet rollout
Categories: Release Engineering :: General, defect
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: bhearsum, Assigned: bhearsum
Attachments (3 files)
* 947 bytes, patch (catlee: review+, bhearsum: checked-in+)
* 1006 bytes, patch (catlee: review+, bhearsum: checked-in+)
* 802 bytes, patch (catlee: review+, bhearsum: checked-in+)
It seems that they take 10-15 minutes to start the nrpe daemon, which sets off quite a few alerts. Puppet does manage the nagios config / daemon, so it's possible that it busted it in some way.
Comment 3 • 16 years ago (Assignee)
Turns out we're not managing this file yet; this patch will get us doing that. A copy of the plist file is incoming.
Attachment #404659 - Flags: review?(catlee)
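For readers unfamiliar with Puppet, managing a file like this usually takes a single file resource. The sketch below is only an illustration and is not the contents of the attachment: the destination path, owner, group and mode are assumptions, and it assumes $fileroot is defined elsewhere and ends with a slash.

```puppet
# Hypothetical sketch; not the contents of attachment 404659.
# Assumes $fileroot is defined elsewhere and ends with a slash.
file { "/usr/local/nagios/etc/nrpe.plist":    # assumed destination path
    source => "${fileroot}darwin9/nrpe.plist",
    owner  => "root",     # assumed
    group  => "wheel",    # assumed
    mode   => "0644",     # assumed
}
```

With a resource along these lines, each Puppet run compares the file on the slave with the copy served by the master and replaces it when they differ.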
Comment 4 • 16 years ago
Comment on attachment 404659: Manage the nrpe plist file with Puppet
>+ source => "${fileroot}darwni9/nrpe.plist",
Looks good, as long as you fix the typo.
Attachment #404659 - Flags: review?(catlee) → review+
Comment 5 • 16 years ago (Assignee)
Here's the error I found in the system log:
Oct 5 11:44:05 localhost com.apple.launchd[1] (org.nagios.nrpe): Unknown key: ServiceDescription
Oct 5 11:44:58 moz2-darwin9-slave16 com.apple.launchd[1] (org.nagios.nrpe): Unknown key: ServiceDescription
This is a copy of the plist file without that key. With it, nrpe seems to start right away rather than 20 minutes later.
Attachment #404660 - Flags: review?(catlee)
Updated • 16 years ago
Attachment #404660 - Flags: review?(catlee) → review+
Comment 6 • 16 years ago (Assignee)
Comment on attachment 404659: Manage the nrpe plist file with Puppet
changeset: 59:4e22d906aa0b
Attachment #404659 - Flags: checked-in+
Comment 7 • 16 years ago (Assignee)
Comment on attachment 404660: nrpe plist
Checking in nrpe.plist;
/mofo/puppet-files/darwin9/nrpe.plist,v <-- nrpe.plist
initial revision: 1.1
done
Attachment #404660 - Flags: checked-in+
Comment 8 • 16 years ago (Assignee)
I updated the Puppet masters; these machines should be OK now. Leaving this bug open for now.
Comment 9 • 16 years ago (Assignee)
This didn't fix it, apparently. I've disabled the notifications for now, to quiet down #build. I'll come back to this a bit later today.
Comment 10 • 16 years ago (Assignee)
This patch makes Puppet exec enablenrpe only when one of the following is true:
* /Library/LaunchDaemons/nrpe.plist doesn't exist (this is the creates => part)
* setup-nrpe gets run (this is the first part of the subscribe)
* /usr/local/nagios/etc/nrpe.plist changes (this is the second part of the subscribe)
I tested a few reboot cycles with and without this patch and the results were 100% consistent:
* Without the patch, the nagios server gets connection refused right after boot.
* With the patch, it is able to properly execute its checks immediately after boot.
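The conditions listed above map onto two attributes of a Puppet exec resource. The following is a rough sketch of that shape, not the contents of the attached patch: the command is a placeholder because the real enablenrpe command is not shown in this bug, and it assumes resources named Exec["setup-nrpe"] and File["/usr/local/nagios/etc/nrpe.plist"] are declared elsewhere in the manifests.

```puppet
# Hypothetical sketch; not the contents of attachment 404882.
exec { "enablenrpe":
    # Placeholder: the real enablenrpe command is not shown in this bug.
    command   => "/path/to/enable-nrpe-command",
    # Skip the command when the launchd plist is already installed.
    creates   => "/Library/LaunchDaemons/nrpe.plist",
    # Receive a refresh when setup-nrpe runs or the managed plist changes.
    subscribe => [ Exec["setup-nrpe"], File["/usr/local/nagios/etc/nrpe.plist"] ],
}
```

How creates and a refresh triggered through subscribe interact can differ between Puppet versions, so it is worth checking the exec type documentation for the version in use before relying on this pattern.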
Attachment #404882 - Flags: review?(catlee)
Updated • 16 years ago
Attachment #404882 - Flags: review?(catlee) → review+
Comment 11 • 16 years ago (Assignee)
Comment on attachment 404882: only run enablenrpe if nrpe.plist has changed
changeset: 62:758b10a407d1
Attachment #404882 - Flags: checked-in+
Comment 12 • 16 years ago (Assignee)
I haven't seen any more of these fail, and I explicitly saw one work immediately after booting. I think the last patch did it.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Updated • 12 years ago
Product: mozilla.org → Release Engineering