Taking a look at nagios, there are a few w32-ix slaves that have been handed over to releng for postconfig/opsification that don't appear to be performing NRPE checks correctly (two of these, 26 and 29, are back in service according to slavealloc):
w32-ix-slave03.build.scl1
w32-ix-slave23.build.scl1
w32-ix-slave26.build.scl1
w32-ix-slave29.build.scl1
https://nagios.mozilla.org/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=8&hoststatustypes=3
(In reply to Amy Rich [:arich] from comment #0)
> w32-ix-slave26.build.scl1
> w32-ix-slave29.build.scl1
I rebooted these two yesterday while looking for slaves that were stuck in idle states. 26 was disabled in slavealloc before then; both machines were rehabilitated in bug 673436. Since being enabled, 26 failed one build where it lost its connection to the master, and 29 has successfully done 18 builds. Disabling both in slavealloc until we confirm their state.
w32-ix-slave03 has the wrong keys for try; it's disabled now too.
Even though I don't see discrepancies in hostkeys between OPSI and the slaves, I am 99.9% sure that these slaves have not been set up properly, as I see a different NSC.ini on them (even though OPSI reports the nagios package as installed). Re-installing the nagios package has cleared the nagios check for slave26. All of these slaves got set up, or half set up, in bug 673436:
w32-ix-slave03.build.scl1
w32-ix-slave23.build.scl1
w32-ix-slave26.build.scl1
w32-ix-slave29.build.scl1 - not disabled in slavealloc but should be - disabling now
w32-ix-slave41.build.scl1 - I can see it is a clear case of wrong hostkeys
I vote for reimaging them and setting them up again, as I have no way of verifying which state they are in. One thing that could have happened is that the "installation state" column was changed instead of "action request"; I've made that mistake myself. What do you think?
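For what it's worth, the NSC.ini comparison could be done mechanically rather than by eye. A minimal sketch, assuming the file contents have already been collected from each slave by some other means (the fetching step is left out, and the function names here are hypothetical, not part of OPSI or nagios): hash each copy and report the hosts that differ from the most common version.

```python
import hashlib
from collections import defaultdict


def group_by_digest(configs):
    """Group host names by the SHA-256 digest of their config contents.

    `configs` maps a host name to the raw bytes of its NSC.ini.
    How those bytes are collected is an assumption left to the caller.
    """
    groups = defaultdict(list)
    for host, content in configs.items():
        digest = hashlib.sha256(content).hexdigest()
        groups[digest].append(host)
    return dict(groups)


def find_outliers(configs):
    """Return the hosts whose config differs from the most common version."""
    groups = group_by_digest(configs)
    if not groups:
        return []
    # The largest group is treated as the reference; this is a heuristic,
    # so a known-good copy would be a better baseline when one is available.
    majority = max(groups.values(), key=len)
    return sorted(
        host
        for hosts in groups.values()
        if hosts is not majority
        for host in hosts
    )
```

This only says which hosts disagree with the majority, not which copy is correct, so it would narrow down what to inspect rather than replace the inspection.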
Most of these were already re-imaged, so I don't think re-imaging them again will help.
Most (all?) of these hosts were ones that were recently reimaged and handed over to releng to do post-configure and opsification. If those steps are idempotent, I suggest just performing them again, because I don't think another reimage is going to help. If they aren't idempotent, then we can do another reimage.
I tried posting an update while bugzilla was down, but... All of these hosts have been reimaged again.
I will use bug 682408 to set them up. Thanks Amy!
Status: ASSIGNED → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
Created attachment 556650
allow marking the nagios package to be always installed
Let's mark this package as always installed so we don't get bitten by this problem again. See bug 682931 for details.
Attachment #556650 - Flags: review?(bhearsum)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attachment #556650 - Flags: review?(bhearsum) → review+
This bug is FIXED. I determined what happened with the nagios package in bug 683670.
Status: REOPENED → RESOLVED
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering