Closed Bug 625143 Opened 14 years ago Closed 13 years ago

nagios gets failing PINGs that magically come back

Categories

(mozilla.org Graveyard :: Server Operations, task)

x86
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 625867

People

(Reporter: dustin, Assigned: dustin)

References

()

Details

I'm not sure whether these are a nagios failure (nagios isn't really built to ping boxes that restart all the time) or a problem with the slaves.

I'll track the alerts in the Google spreadsheet in the URL to see if I can find a pattern.
I can't tell if these alerts are bogus or not - there's too much other chaos going on.  I'll comment on that in the parent bug, and hopefully work it out in person tomorrow.
Sometimes these look like:

15:08 < nagios> [42] try-linux64-slave09.build:buildbot is CRITICAL: Connection refused by host

and they seem to happen while the slave is restarting - this one did.  I checked the web interface and saw the CRITICAL I expected.  A few moments later I navigated back to the same page and saw "NRPE: Unable to read output" instead.
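
For tracking alerts like the one above, here is a minimal sketch of a parser that splits an IRC-style nagios line into fields that could be pasted into a spreadsheet. It assumes every line has the same shape as the single example quoted here (time, "< nagios>", a bracketed number, host:service, state, message); host-level alerts such as PING may well look different and would simply return None. The names `Alert`, `ALERT_RE`, and `parse_alert` are hypothetical, not part of nagios or any existing tool.

```python
import re
from typing import NamedTuple, Optional


class Alert(NamedTuple):
    """One nagios alert line as relayed to IRC (hypothetical structure)."""
    time: str
    alert_id: int
    host: str
    service: str
    state: str
    message: str


# Assumed layout, based only on the one example line in this bug:
#   HH:MM < nagios> [N] host:service is STATE: message
ALERT_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2})\s+<\s*nagios>\s+"
    r"\[(?P<alert_id>\d+)\]\s+"
    r"(?P<host>[^:\s]+):(?P<service>\S+)\s+is\s+"
    r"(?P<state>[A-Z]+):\s*(?P<message>.*)$"
)


def parse_alert(line: str) -> Optional[Alert]:
    """Return the parsed alert, or None if the line does not match the assumed layout."""
    m = ALERT_RE.match(line.strip())
    if m is None:
        return None
    return Alert(
        time=m["time"],
        alert_id=int(m["alert_id"]),
        host=m["host"],
        service=m["service"],
        state=m["state"],
        message=m["message"],
    )


if __name__ == "__main__":
    sample = ("15:08 < nagios> [42] try-linux64-slave09.build:buildbot "
              "is CRITICAL: Connection refused by host")
    print(parse_alert(sample))
```

Grouping the parsed rows by host and message would make it easier to tell "Connection refused" alerts apart from ping failures when looking for a pattern.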

What I don't understand is that in the web interface this service (indeed, all of the services for this host, and for a few other hosts I've checked) is marked as passive, with active checks disabled.  I didn't think that was possible with NRPE - aren't NRPE checks triggered when the master connects to the slave and requests the check?

There's something here I don't understand that's blocking my ability to diagnose further.
I was mixing up some "Connection refused" errors (which were due to a typo in my puppet deployment of the nrpe.cnf changes) with the ping failures, which are better described in bug 625867.  So, dup'ing.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → DUPLICATE
Product: mozilla.org → mozilla.org Graveyard