Bug 1168389
Opened 10 years ago
Closed 10 years ago
Mark slaves as "idle" instead of "broken" if there is no pending queue for that pool
Categories
(Release Engineering :: General, defect, P2)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: coop, Assigned: coop)
Details
(Whiteboard: [buildduty][slavehealth])
In slave health, the "broken" designation for slaves is often inaccurate. When there are no pending jobs for a given pool, it's expected that machines may not have reported a job status in 6+ hours, e.g. on weekends.
We can make a pretty simple change to increase the accuracy here. Just like we do for AWS slave classes ("stopped"), we can mark the unused capacity for hardware slaves as "idle" instead of "broken" by default. Once we check the pending counts for that pool, we can change that status to "broken" iff that slave class has any pending jobs.
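The proposed rule can be sketched as follows. This is a minimal illustration in Python, not the actual slave_health code: the `Slave` record, the `classify` function, the `pending_by_pool` mapping, the "working" label for recently active machines, and the exact 6-hour cutoff are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Optional

# Assumed cutoff, taken from the "6+ hours" mentioned in the description.
STALE_AFTER = timedelta(hours=6)


@dataclass
class Slave:
    """Hypothetical record for one hardware slave."""
    name: str
    pool: str
    last_job_report: Optional[datetime]  # None if no job status was ever reported


def classify(slave: Slave, pending_by_pool: Dict[str, int], now: datetime) -> str:
    """Return "working", "idle", or "broken" for a hardware slave."""
    recently_active = (
        slave.last_job_report is not None
        and now - slave.last_job_report < STALE_AFTER
    )
    if recently_active:
        # "working" is a placeholder label for a machine that reported recently.
        return "working"

    # A quiet slave is "idle" by default. It is only "broken" if its pool
    # has pending jobs that it could have been taking.
    if pending_by_pool.get(slave.pool, 0) > 0:
        return "broken"
    return "idle"
```

With this shape, a machine that has been quiet all weekend stays "idle" until the pending count for its pool becomes non-zero. Pools missing from the pending lookup are treated as having no pending jobs.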
Comment 1•10 years ago
Some good points brought up in IRC this morning:
* "broken" does not necessarily mean "broken" if all pending jobs are in jacuzzis.
We need to update the slave_health page legend to point to jacuzzis regardless, so that this is discoverable rather than tribal knowledge.
* how does changing hardware machine status from "broken" to "idle" affect batch actions on the slave type page?
We don't currently look up pending job counts for the slavetype.html page, so nothing would change here initially. Assuming we start checking pending counts here too, we could also change the intent of the "Reboot all broken slaves" action to work on "idle" slaves when we update the state.
Updated•10 years ago
Assignee: nobody → coop
Status: NEW → ASSIGNED
Priority: -- → P2
Comment 2•10 years ago
https://hg.mozilla.org/build/slave_health/rev/026ebaf53950
https://hg.mozilla.org/build/slave_health/rev/7aff1ea51d0e
https://hg.mozilla.org/build/slave_health/rev/4a7cbfe69866
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Updated•8 years ago
Component: Tools → General