Suspiciously high number of talos slaves are broken

RESOLVED FIXED

Status

Release Engineering
Buildduty
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: simone, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

3 years ago
A high percentage of both talos-linux32-ix and talos-linux64-ix boxes are reported as broken in slave_health. This is causing a growing backlog of pending jobs (especially on the 64-bit machines).
(Reporter)

Comment 1

3 years ago
slaverebooter is apparently failing to reboot these machines:

[6:03pm] coop: something is causing slaverebooter to hang before it gets to the talos-linux64-ix machines
[6:03pm] coop: since those are last in its list
[6:03pm] coop: i blame xp

Also see: https://bugzilla.mozilla.org/show_bug.cgi?id=971861#c7, which may be related.
(Reporter)

Comment 2

3 years ago
:coop: now the situation seems improved: did you do anything (slaverebooter related steps, manual rebooting, ...)?
Flags: needinfo?(coop)

Comment 3

3 years ago
(In reply to Simone Bruno [:simone] from comment #2)
> :coop: now the situation seems improved: did you do anything (slaverebooter
> related steps, manual rebooting, ...)?

Jordan discovered that slaveapi needed the updated passwords for cltbld and Administrator. On top of that, I manually rebooted all the talos-linux* slaves that weren't still actively taking jobs.

Windows pending jobs are still high today, but I think Linux is fixed for now.
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Flags: needinfo?(coop)
Resolution: --- → FIXED