Closed Bug 626088 Opened 14 years ago Closed 13 years ago

disable bm-xserveNN nagios NRPE checks

Categories

(Infrastructure & Operations :: RelOps: General, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dustin, Assigned: arich)

References

Details

(Whiteboard: [buildslaves][simple])

bm-xserve03:slave root# /usr/local/nagios/plugins/check_load 
-sh: /usr/local/nagios/plugins/check_load: Bad CPU type in executable

This will need to be recompiled, I assume, or copied from a working xserve.  I'll list the affected machines here.

bm-xserve03
bm-xserve04
(In reply to comment #0)
> bm-xserve03:slave root# /usr/local/nagios/plugins/check_load 
> -sh: /usr/local/nagios/plugins/check_load: Bad CPU type in executable
> 
> This will need to be recompiled, I assume, or copied from a working xserve. 
> I'll list the affected machines here.
> 
> bm-xserve03
> bm-xserve04

This is probably because bm-xserve01-05 are PPC machines that have the standard (Intel) image installed, so the check binaries are built for the wrong CPU.  The options for resolution are to stop running this check on these hosts or to recompile the check binary for PPC.  Because this check measures load on machines that are expected to be overloaded all the time, I would be in favour of morphing this bug to disable the check for bm-xserve01-05.
I picked check_load arbitrarily - all checks fail on these machines.  We need to rebuild for PPC.
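The "Bad CPU type in executable" error is what the shell reports when a binary contains no code for the host's CPU. One way to confirm which architectures a plugin was built for is to read its Mach-O header. The following is a minimal sketch, not something that was run on these hosts: it assumes the plugins are Mach-O files, the function name macho_architectures is hypothetical, and the magic numbers and CPU type values are the ones defined in Apple's Mach-O headers.

```python
import struct
from typing import List

# Mach-O magic numbers, as seen when the first four bytes are read big-endian.
MACHO_BIG_ENDIAN = (0xFEEDFACE, 0xFEEDFACF)     # e.g. PowerPC builds
MACHO_LITTLE_ENDIAN = (0xCEFAEDFE, 0xCFFAEDFE)  # e.g. Intel builds
FAT_MAGIC = 0xCAFEBABE                          # universal (multi-arch) binary

CPU_ARCH_ABI64 = 0x01000000
CPU_NAMES = {
    7: "i386",
    7 | CPU_ARCH_ABI64: "x86_64",
    18: "ppc",
    18 | CPU_ARCH_ABI64: "ppc64",
}


def _cpu_name(cputype: int) -> str:
    return CPU_NAMES.get(cputype, "unknown(%d)" % cputype)


def macho_architectures(header: bytes) -> List[str]:
    """Return the CPU architectures named in the leading bytes of a file.

    Returns an empty list if the bytes do not look like a Mach-O header.
    """
    if len(header) < 8:
        return []
    (magic,) = struct.unpack(">I", header[:4])
    if magic in MACHO_BIG_ENDIAN:
        (cputype,) = struct.unpack(">i", header[4:8])
        return [_cpu_name(cputype)]
    if magic in MACHO_LITTLE_ENDIAN:
        (cputype,) = struct.unpack("<i", header[4:8])
        return [_cpu_name(cputype)]
    if magic == FAT_MAGIC:
        # The fat header is always big-endian: a count, then 20-byte entries
        # whose first field is the CPU type.
        (count,) = struct.unpack(">I", header[4:8])
        archs = []
        for i in range(count):
            start = 8 + 20 * i
            entry = header[start:start + 4]
            if len(entry) < 4:
                break  # ran out of header bytes
            (cputype,) = struct.unpack(">i", entry)
            archs.append(_cpu_name(cputype))
        return archs
    return []


if __name__ == "__main__":
    import sys

    # Usage: python macho_arch.py /usr/local/nagios/plugins/check_load
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            print(path, macho_architectures(fh.read(4096)))
```

On a PPC host, a plugin that reports only i386 or x86_64 would be expected to fail in the way shown in comment 0.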
(In reply to comment #2)
> I picked check_load arbitrarily - all checks fail on these machines.  We need
> to rebuild for PPC.

Aside from the ping check, I see avg_load and root_partition.  Are there others?
No, but that root_partition check is a good one to keep.
Hm, I didn't realize that these were geriatric.  On the grounds that they're very low priority and will probably not be running buildbot soon, I agree that we should disable the checks.

Morphing the bug for that purpose.
Assignee: nobody → server-ops-releng
Component: Release Engineering → Server Operations: RelEng
QA Contact: release → zandr
Summary: bad CPU type in nagios on bm-xserveNN → bm-xserveNN have wrong nagios binaries, and arent
Oops, hit enter early.

I think that we can safely disable all of the NRPE checks for these two machines, but keep the pings.
Summary: bm-xserveNN have wrong nagios binaries, and arent → disable bm-xserveNN nagios NRPE checks
Affected machines are now:

bm-xserve03
bm-xserve04
bm-xserve21
Noting that bm-xserve21 has also been having trouble starting up/shutting down (mentioned in bug 629511).
If this bug is about fixing/removing the nagios checks on PPC boxes, then a sick Intel box (bm-xserve21) doesn't belong here.
Assignee: server-ops-releng → arich
I've removed all but the ping checks for the hosts bm-xserve03 and bm-xserve04.  bm-xserve21 appears to have been listed in this ticket by mistake: the inventory lists it as an Intel machine and its checks do not error.  That host has been left unchanged.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Blocks: 578234
I just checked the inventory, noted that bm-xserve02 is also a PPC machine, and removed the NRPE checks for that one as well.
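The resulting rule (PPC hosts keep only the ping check, Intel hosts keep everything) can be sketched as below. This is only an illustration of the decision made in this bug, not the actual nagios configuration: the INVENTORY mapping is a hypothetical stand-in for the real inventory, filled in with the architectures stated in the comments above, and the check names are the ones mentioned in comment 3.

```python
from typing import Dict, List

# Hypothetical stand-in for the inventory, using the architectures
# reported in this bug.
INVENTORY: Dict[str, str] = {
    "bm-xserve02": "ppc",
    "bm-xserve03": "ppc",
    "bm-xserve04": "ppc",
    "bm-xserve21": "intel",
}

# Checks named in this bug; avg_load and root_partition run through NRPE.
ALL_CHECKS: List[str] = ["ping", "avg_load", "root_partition"]
NRPE_CHECKS = {"avg_load", "root_partition"}


def checks_for(host: str, inventory: Dict[str, str] = INVENTORY) -> List[str]:
    """Return the checks a host should keep under the rule from this bug."""
    if inventory[host] == "ppc":
        # The NRPE plugins cannot execute on PPC, so only ping remains.
        return [check for check in ALL_CHECKS if check not in NRPE_CHECKS]
    return list(ALL_CHECKS)


if __name__ == "__main__":
    for name in sorted(INVENTORY):
        print(name, checks_for(name))
```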
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations