suspect slaveapi rebooting is broken for windows machines

RESOLVED INVALID

Status

Product: Infrastructure & Operations
Component: CIDuty
Reported: a year ago
Modified: 2 months ago

People

(Reporter: arr, Unassigned)

(Reporter)

Description

a year ago
Slaveapi has been filing a lot of bugs for rebooting Windows machines recently. I suspect this is because someone rotated the Windows passwords on slaveapi before the passwords were changed on the hosts themselves. Is there a way to roll that change back for Windows and sync up better, so that nobody has to handle all these machines manually in the meantime?

In the future, it would be good to make sure the work is complete on both sides at the same time.
It looks like the passwords have not changed, or at least have not been updated in the usual place.
I doubt that explanation, since I'm probably responsible for every single one of those reboot bug storms. In the one where I vaguely remember the numbers, I caused something like 8 or 12 Win8 reboot bugs while rebooting 56 Win8 slaves. That was after slave_health got unbroken following a day and a half without updating, a period that apparently crossed one of those events (network trouble, a temporarily broken slavealloc, or whatever else does it) that cause widespread Windows slave death. So unless slaveapi stores the password in multiple places and only one was changed, or some slaves have a new password and some do not, it shouldn't be that, because I'm only causing bugs for 1/5th or 1/6th of the things I'm rebooting.
(Reporter)

Comment 3

a year ago
Okay, then this probably goes back to slaveapi not waiting long enough for a slave to reboot before filing a bug.
Status: NEW → RESOLVED
Last Resolved: a year ago
Resolution: --- → INVALID
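
The timing concern in comment 3 can be illustrated with a small polling loop. This is only a sketch and not slaveapi's actual code: the function name `wait_for_reboot`, the `is_alive` probe, and the timeout values are all assumptions made for illustration. The idea is that a reboot bug would only be filed if the host has still not responded when the deadline passes.

```python
import time
from typing import Callable


def wait_for_reboot(
    is_alive: Callable[[], bool],
    timeout_s: float,
    poll_interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll a host until it responds or the timeout expires.

    `is_alive` is a hypothetical probe (for example a ping or an SSH
    connection attempt); it is an assumption, not a slaveapi function.
    Returns True if the host came back in time, False otherwise.
    """
    deadline = clock() + timeout_s
    while True:
        if is_alive():
            return True
        if clock() >= deadline:
            # Only at this point would filing a reboot bug be justified.
            return False
        sleep(poll_interval_s)
```

If the timeout is shorter than the time a Windows slave typically needs to come back, a loop like this returns False for healthy machines, which would match the pattern of bugs being filed for only a fraction of the rebooted slaves.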
While taking a look at the slaveapi secrets currently in use, I noticed several that had not been updated in hiera:
    - slaveapi_ipmi_password: changed ~ 1 month ago
    - slaveapi_root_passwords, slaveapi_administrator_passwords: in bug 1259491 we decided to set the root password based on the security level assigned to each machine (that includes both Windows AWS + GPO). These passwords can now be found in separate gpg files in our private repo.

I updated hiera with the new values, so this should look better now.
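
A check along the following lines could flag this kind of drift earlier. It is a hedged sketch, not an existing tool: `find_stale_secrets` is a hypothetical helper, and it assumes both the secrets in use and the hiera values have already been loaded into plain dictionaries keyed by secret name (how they are loaded is not covered here).

```python
from typing import Dict, List


def find_stale_secrets(in_use: Dict[str, str], in_hiera: Dict[str, str]) -> List[str]:
    """Return the names of secrets whose hiera value is missing or differs.

    Both arguments are assumed to map secret names (such as
    "slaveapi_ipmi_password") to their current values.
    """
    stale = []
    for name, value in in_use.items():
        # A missing key in hiera counts as stale, same as a mismatch.
        if in_hiera.get(name) != value:
            stale.append(name)
    return sorted(stale)
```

Calling it with the values actually in use and the values stored in hiera would return a sorted list of the names that need updating; only names are returned, so no secret values end up in logs.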

Updated

2 months ago
Product: Release Engineering → Infrastructure & Operations