Bug 1071152
Opened 10 years ago
Closed 8 years ago
Add a 'machine failures' page to treeherder, based on Ouija
Categories
(Tree Management :: Treeherder, defect, P5)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: dminor, Unassigned)
References
Details
Ouija (see bug 909735, running instance at http://54.215.155.53/) has a slave failures page that allows each slave's failure rate to be determined and compared to the "expected" failure rate for a slave that has run that number of jobs. It also displays the number of jobs since last success.
This allows problematic slaves to be identified and removed from the pool. This tool has been useful to the sheriffs in the past and would see more use if it were part of treeherder.
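As a rough illustration of the comparison described above, here is a minimal Python sketch. It is not Ouija's actual code: the names `MachineStats`, `summarize`, and `is_suspect`, the binomial model used for the "expected" failure rate, and the threshold `z` are all assumptions made for this example.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List


@dataclass
class MachineStats:
    """Per-machine summary (hypothetical structure, not Ouija's schema)."""
    jobs: int
    failures: int
    jobs_since_last_success: int

    @property
    def failure_rate(self) -> float:
        # Fraction of this machine's jobs that failed; 0.0 if it ran none.
        return self.failures / self.jobs if self.jobs else 0.0


def summarize(results: List[bool]) -> MachineStats:
    """Summarize chronological job outcomes, where True means success."""
    failures = sum(1 for ok in results if not ok)
    # Count trailing failures, i.e. jobs run since the last success.
    streak = 0
    for ok in reversed(results):
        if ok:
            break
        streak += 1
    return MachineStats(len(results), failures, streak)


def is_suspect(stats: MachineStats, pool_rate: float, z: float = 3.0) -> bool:
    """Flag a machine whose failure count is well above what is expected.

    Assumption: failures follow a binomial distribution with the pool-wide
    failure rate, so a machine with n jobs is expected to fail n * pool_rate
    times, give or take z standard deviations.
    """
    if stats.jobs == 0:
        return False
    expected = stats.jobs * pool_rate
    std = sqrt(stats.jobs * pool_rate * (1.0 - pool_rate))
    return stats.failures > expected + z * std


# Example: one machine failing occasionally, one failing its last 10 jobs.
steady = summarize(([False] + [True] * 19) * 5)
runaway = summarize([True] * 10 + [False] * 10)
pool_rate = (steady.failures + runaway.failures) / (steady.jobs + runaway.jobs)
flagged = [name for name, s in (("steady", steady), ("runaway", runaway))
           if is_suspect(s, pool_rate)]
```

Scaling the allowed spread with the number of jobs is what lets a machine with few jobs and a machine with many jobs be compared against the same pool-wide rate; other models would serve equally well here.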
Updated•10 years ago
Comment 1•10 years ago
Tweaking summary to make this bug more clearly different from bug 1087532 (I misread it a few times).
Since bug 1087532 is more practical short term, I'll make that block the TBPL EOL bug, rather than this one.
No longer blocks: tbpl-eol
Summary: Port Ouija 'slave failures' page to treeherder → Add a 'slave failures' page to treeherder, based on Ouija
Updated•10 years ago
Summary: Add a 'slave failures' page to treeherder, based on Ouija → Add a 'machine failures' page to treeherder, based on Ouija
Comment 2•10 years ago (Reporter)
Ed, I think the backend piece of this is a good candidate for the big/open data project we've been talking about (I'll file a separate bug.)
I was wondering if you thought the reporting bit still belonged in treeherder?
Flags: needinfo?(emorley)
Comment 3•10 years ago
I think it probably still does. Unlike some of the other reports/dashboards (e.g. orangefactor), the "bad machines" report needs to be actioned in near real time, at least for the "runaway machine that's chewing through 100 jobs in N hours" case. Longer-term analysis (does this machine have a higher failure rate over the last 3 weeks, so it might have some bad RAM?) could perhaps be kept elsewhere, though.
Flags: needinfo?(emorley)
Updated•10 years ago
Priority: P4 → P5
Comment 5•8 years ago
In a world of Taskcluster running on AWS spot instances, I don't think the machine failures page really makes much sense.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX