Closed Bug 1039313 Opened 10 years ago Closed 10 years ago

Suspiciously high number of talos slaves are broken

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

x86
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: sbruno, Unassigned)

Details

A high percentage of both talos-linux32-ix and talos-linux64-ix boxes are reported as broken in slave_health. This is causing a growing backlog of pending jobs (especially on the linux64 machines).
slaverebooter is apparently failing to reboot these machines:

[6:03pm] coop: something is causing slaverebooter to hang before it gets to the talos-linux64-ix machines
[6:03pm] coop: since those are last in its list
[6:03pm] coop: i blame xp
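
To illustrate the failure mode coop describes (a hang on one host keeps every later host in the list from being rebooted), here is a minimal sketch of a reboot loop with a per-host timeout. This is not slaverebooter's actual code: `reboot_all`, the `reboot` callable, the timeout value, and the host names in the usage note are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def reboot_all(hosts, reboot, timeout_s=300.0, workers=8):
    """Call reboot(host) for every host without letting one hang block the rest.

    `reboot` is a hypothetical callable standing in for whatever actually
    reboots a machine. Returns a dict mapping host -> outcome string.
    """
    results = {}
    pool = ThreadPoolExecutor(max_workers=workers)
    # Submit everything up front so hosts late in the list are not stuck
    # behind an earlier host that hangs.
    futures = {host: pool.submit(reboot, host) for host in hosts}
    for host, future in futures.items():
        try:
            # Wait at most timeout_s for this host's result.
            future.result(timeout=timeout_s)
            results[host] = "rebooted"
        except FutureTimeout:
            results[host] = "timed out"
        except Exception as exc:  # reboot() itself raised
            results[host] = f"failed: {exc}"
    # Do not block on calls that are still hung.
    pool.shutdown(wait=False)
    return results
```

Called with a list such as `["slow-host", "talos-linux64-ix-001"]` (made-up names), a hang on the first host is recorded as "timed out" and the second host is still attempted. A hung call keeps occupying a worker thread, so `workers` would need to exceed the number of hosts expected to hang at once.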

See also https://bugzilla.mozilla.org/show_bug.cgi?id=971861#c7, which may be related.
:coop: now the situation seems improved: did you do anything (slaverebooter related steps, manual rebooting, ...)?
Flags: needinfo?(coop)
(In reply to Simone Bruno [:simone] from comment #2)
> :coop: now the situation seems improved: did you do anything (slaverebooter
> related steps, manual rebooting, ...)?

Jordan discovered that slaveapi needed the updated passwords for cltbld and Administrator. On top of that, I manually rebooted all the talos-linux* slaves that weren't still actively taking jobs.

Windows pending jobs are still high today, but I think Linux is fixed for now.
Status: NEW → RESOLVED
Closed: 10 years ago
Flags: needinfo?(coop)
Resolution: --- → FIXED
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard