Closed Bug 677888 Opened 13 years ago Closed 13 years ago

some newly re-imaged w32-ix machines appear to not have correctly configured nrpe

Categories

(Release Engineering :: General, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: arich, Assigned: armenzg)

Attachments

(1 file)

Taking a look at nagios, there are a few w32-ix slaves that have been handed over to releng for postconfig/opsification that don't appear to be performing NRPE checks correctly (two of these (26 & 29) are back in service according to slavealloc):

w32-ix-slave03.build.scl1
w32-ix-slave23.build.scl1
w32-ix-slave26.build.scl1
w32-ix-slave29.build.scl1

https://nagios.mozilla.org/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=8&hoststatustypes=3
(In reply to Amy Rich [:arich] from comment #0)
> w32-ix-slave26.build.scl1
> w32-ix-slave29.build.scl1

These two I rebooted yesterday when looking for slaves that were stuck in idle states. 26 was disabled in slavealloc before then; both machines were rehabilitated in bug 673436.

Since being enabled, 26 failed one build where it lost its connection to the master, and 29 has successfully done 18 builds. Disabling both in slavealloc until we confirm their state.
Assignee: nobody → armenzg
Status: NEW → ASSIGNED
w32-ix-slave03 has the wrong keys for try; it's disabled now too.
Even though I don't see discrepancies in hostkeys between OPSI and the slaves, I am 99.9% sure that these slaves have not been set up properly, as I see a different NSC.ini on them (even though OPSI reports the package as installed).
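
One way to make the "different NSC.ini" observation concrete is to diff a suspect slave's file against one from a known-good slave. The following is only a minimal sketch: the helper names (`load_ini`, `diff_ini`) and the sample section and module names are assumptions for illustration, not contents taken from these machines.

```python
import configparser

# Sentinel that distinguishes an absent key from a key that has no value.
MISSING = "<missing>"


def load_ini(text):
    """Parse INI text into {section: {key: value}}.

    allow_no_value accepts bare lines (for example a module name with no '='),
    and strict=False tolerates duplicate sections or keys.
    """
    parser = configparser.ConfigParser(
        allow_no_value=True, strict=False, interpolation=None
    )
    parser.optionxform = str  # keep key case exactly as written in the file
    parser.read_string(text)
    return {section: dict(parser.items(section)) for section in parser.sections()}


def diff_ini(reference, candidate):
    """Return (section, key, reference_value, candidate_value) for each mismatch."""
    diffs = []
    for section in sorted(set(reference) | set(candidate)):
        ref_items = reference.get(section, {})
        cand_items = candidate.get(section, {})
        for key in sorted(set(ref_items) | set(cand_items)):
            ref_value = ref_items.get(key, MISSING)
            cand_value = cand_items.get(key, MISSING)
            if ref_value != cand_value:
                diffs.append((section, key, ref_value, cand_value))
    return diffs


if __name__ == "__main__":
    # Hypothetical file contents, for illustration only.
    good = "[modules]\nNRPEListener.dll\nFileLogger.dll\n[NRPE]\nport=5666\n"
    suspect = "[modules]\nFileLogger.dll\n[NRPE]\nport=5666\n"
    for section, key, ref_value, cand_value in diff_ini(load_ini(good), load_ini(suspect)):
        print(f"[{section}] {key}: good={ref_value!r} suspect={cand_value!r}")
```

A non-empty result only shows that the two files differ; it does not say which install step was skipped.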

Re-installing the nagios package again has cleared the nagios check for slave26.

All of these slaves got set up or half set up in bug 673436:
w32-ix-slave03.build.scl1
w32-ix-slave23.build.scl1
w32-ix-slave26.build.scl1 
w32-ix-slave29.build.scl1 - not disabled in slavealloc but should be - disabling now
w32-ix-slave41.build.scl1 - I can see it is a clear case of wrong hostkeys

I vote for reimaging them and setting them up again, as I have no way of verifying what state they are in.

One thing that could have happened is that the "installation state" column was changed instead of "action request"; I've made that mistake myself.

What do you think?
Depends on: 663025
Depends on: 673415
Most of these were already re-imaged, so I don't think re-imaging them again will help.
Most (all?) of these hosts were ones that were recently reimaged and handed over to releng to do post-configure and opsification.  If those steps are idempotent, I suggest just performing them again, because I don't think another reimage is going to help.  If they aren't idempotent, then we can do another reimage.
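
To illustrate what "idempotent" means here: a step is idempotent if it checks the current state and only acts when something is missing, so running it a second time changes nothing. This is a generic sketch, not the actual post-configure or OPSI steps (which are not shown in this bug); `ensure_line` and the file it edits are hypothetical.

```python
import os


def ensure_line(path, line):
    """Make sure `line` is present in the file at `path`.

    Returns True if the file was changed, False if the line was already there.
    Calling it repeatedly leaves the file in the same final state.
    """
    text = ""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            text = handle.read()
    if line in text.splitlines():
        return False  # already configured, nothing to do
    with open(path, "a", encoding="utf-8") as handle:
        if text and not text.endswith("\n"):
            handle.write("\n")  # avoid gluing the new line onto the last one
        handle.write(line + "\n")
    return True
```

A step written this way is safe to re-run on a half-configured host; a step that blindly appends or overwrites is not, and that is the case where a fresh reimage is the safer route.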
I tried posting an update while bugzilla was down, but...

All of these hosts have been reimaged again.
Blocks: 682408
I will use bug 682408 to set them up.

Thanks Amy!
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
No longer depends on: 663025
Let's make this package always installed so we don't get bitten by this problem again.

See bug 682931 for details.
Attachment #556650 - Flags: review?(bhearsum)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attachment #556650 - Flags: review?(bhearsum) → review+
This bug is FIXED. I determined what happened with the nagios package in bug 683670.
Status: REOPENED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering