Closed
Bug 1473589
Opened 7 years ago
Closed 6 years ago
Investigate releng-hardware worker failures
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Infrastructure & Operations Graveyard
CIDuty
Tracking
(Not tracked)
RESOLVED
INCOMPLETE
People
(Reporter: dhouse, Unassigned)
Details
CIDuty/RelOps/DCOps has restarted and reimaged an increased number of releng-hardware workers in the past few months. Please begin investigating to see if there is a systemic or job problem causing this across all hardware types: Windows, Mac, and Linux.
Workers have been found missing in the Taskcluster worker explorer. This happens when a machine's declaration has expired because it stopped taking jobs, or when a machine never started taking jobs (it never declared itself to Taskcluster).
There are a few known causes for workers going missing: loaners, and machines moved to a beta/staging worker type (queue). But those account for only a handful of machines, and we have seen hundreds that needed to be reimaged or restarted before they resumed taking work.
Comment 1•7 years ago
The machines that fail most are Windows and Mac.
Linux is pretty stable; we may have to re-image 1-3 per day.
For Windows, we may have 20 to 30 per day.
For MacOSX, we may have 10 to 20 per day.
So if we start to investigate, the priority could be Windows -> MacOSX -> Linux.
(In reply to Danut Labici [:dlabici] from comment #1)
> Main machines that fail are: Windows and Mac.
> Linux is pretty stable, maybe we have to re-image 1-3/day.
> Windows we may have 20 to 30 per day.
> MacOSX we may have 10 to 20 per day.
>
> So if we start to investigate, the priority could be Windows -> MacOSX ->
> Linux
I didn't know the number was so high for Windows as well. Are there tracker bugs for the individual machines, or is there some other record of which ones were reimaged? Are these all moonshot Windows instances?
Comment 3•6 years ago
Since we are now tracking by machines, issues, and nodes, there is no point in keeping this bug around.
If anyone finds this bug useful, please reopen it.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → INCOMPLETE
Updated•5 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard