some newly re-imaged w32-ix machines appear to not have correctly configured nrpe

RESOLVED FIXED

Status

RESOLVED FIXED
7 years ago
5 years ago

People

(Reporter: arich, Assigned: armenzg)

Details

Attachments

(1 attachment)

(Reporter)

Description

7 years ago
Taking a look at nagios, there are a few w32-ix slaves that have been handed over to releng for postconfig/opsification that don't appear to be performing NRPE checks correctly (two of these (26 & 29) are back in service according to slavealloc):

w32-ix-slave03.build.scl1
w32-ix-slave23.build.scl1
w32-ix-slave26.build.scl1
w32-ix-slave29.build.scl1

https://nagios.mozilla.org/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=8&hoststatustypes=3
(In reply to Amy Rich [:arich] from comment #0)
> w32-ix-slave26.build.scl1
> w32-ix-slave29.build.scl1

These two I rebooted yesterday when looking for slaves that were stuck in idle states. 26 was disabled in slavealloc before then; both machines were rehabilitated in bug 673436.

Since being enabled, 26 has failed one build in which it lost its connection to the master, and 29 has successfully done 18 builds. Disabling both in slavealloc until we confirm their state.
(Assignee)

Updated

7 years ago
Assignee: nobody → armenzg
Status: NEW → ASSIGNED
w32-ix-slave03 has the wrong keys for try; it's disabled now too.
(Assignee)

Comment 3

7 years ago
Even though I don't see discrepancies in hostkeys between OPSI and the slaves, I am 99.9% sure that these slaves have not been set up properly, as I see a different NSC.ini on them (even though OPSI indicates the package is installed).
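One way to confirm an NSC.ini discrepancy like the one described above is to diff each slave's copy against a known-good reference. The following is only a minimal sketch: the function name `diff_config` and the sample file contents are hypothetical, and it assumes the two files have already been fetched as text.

```python
import difflib
from typing import List


def diff_config(reference: str, actual: str, name: str = "NSC.ini") -> List[str]:
    """Return unified-diff lines between a reference config and a slave's copy.

    Trailing whitespace and line-ending differences are ignored so that only
    real content changes show up.
    """
    ref_lines = [line.rstrip() for line in reference.splitlines()]
    act_lines = [line.rstrip() for line in actual.splitlines()]
    return list(
        difflib.unified_diff(
            ref_lines,
            act_lines,
            fromfile=f"reference/{name}",
            tofile=f"slave/{name}",
            lineterm="",
        )
    )


# Hypothetical file contents, for illustration only.
reference = "[Settings]\nallowed_hosts=10.0.0.1\n"
actual = "[Settings]\nallowed_hosts=\n"

for line in diff_config(reference, actual):
    print(line)
```

An empty result would mean the slave's file matches the reference; any `-`/`+` lines point at the settings that differ.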

Re-installing the nagios package again has cleared the nagios check for slave26.

All of these slaves were set up or half set up in bug 673436:
w32-ix-slave03.build.scl1
w32-ix-slave23.build.scl1
w32-ix-slave26.build.scl1 
w32-ix-slave29.build.scl1 - not disabled in slavealloc but should be - disabling now
w32-ix-slave41.build.scl1 - I can see it is a clear case of wrong hostkeys

I vote for reimaging them and setting them up again, as I have no way of verifying what state they are in.

One thing that could have happened is that someone changed the "installation state" column instead of "action request"; I've made that mistake myself.

What do you think?
(Assignee)

Updated

7 years ago
Depends on: 663025
(Assignee)

Updated

7 years ago
Depends on: 673415
Most of these were already re-imaged, so I don't think re-imaging them again will help.
(Reporter)

Comment 5

7 years ago
Most (all?) of these hosts were ones that were recently reimaged and handed over to releng to do post-configure and opsification.  If those steps are idempotent, I suggest just performing them again, because I don't think another reimage is going to help.  If they aren't idempotent, then we can do another reimage.
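To illustrate what "idempotent" means for a setup step here, the sketch below models a step that checks the current state before acting, so running it a second time changes nothing. This is a toy model under stated assumptions: `ensure_installed`, the package name, and the version string are hypothetical and are not the actual post-configure or OPSI tooling.

```python
from typing import Dict


def ensure_installed(installed: Dict[str, str], package: str, version: str) -> bool:
    """Bring `installed` to the desired state; return True only if it changed."""
    if installed.get(package) == version:
        return False  # already in the desired state, nothing to do
    installed[package] = version
    return True


# Hypothetical host state: package name -> installed version.
host: Dict[str, str] = {}
first = ensure_installed(host, "nagios", "1.0")   # performs the install
second = ensure_installed(host, "nagios", "1.0")  # no-op on the second run
print(first, second, host)
```

If the real steps behave this way, re-running them on a half-configured host should be safe; if they do not, a fresh reimage is the safer starting point.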
(Reporter)

Comment 6

7 years ago
I tried posting an update while bugzilla was down, but...

All of these hosts have been reimaged again.
(Assignee)

Updated

7 years ago
Blocks: 682408
(Assignee)

Comment 7

7 years ago
I will use bug 682408 to set them up.

Thanks Amy!
Status: ASSIGNED → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
(Assignee)

Updated

7 years ago
No longer depends on: 663025
(Assignee)

Comment 8

7 years ago
Created attachment 556650
allow marking the nagios package to be always installed

Let's make this package always installed so we won't be bitten by this problem again.

See bug 682931 for details.
Attachment #556650 - Flags: review?(bhearsum)
(Assignee)

Updated

7 years ago
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attachment #556650 - Flags: review?(bhearsum) → review+
(Assignee)

Comment 9

7 years ago
This bug is FIXED. I determined what happened with the nagios package in bug 683670.
Status: REOPENED → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering