Stand up a dashboard measuring scheduler efficiency
Categories
(Firefox Build System :: Task Configuration, task, P2)
Tracking
(Not tracked)
People
(Reporter: ahal, Assigned: ekyle)
References
(Blocks 1 open bug)
Details
(Whiteboard: [smart-sched])
We have a rudimentary metric (which we'll improve over time) that attempts to measure how efficient a scheduling algorithm is. We also have the ability to run so-called shadow schedulers, and may implement another mechanism to measure scheduling changes in the task generation phase.
We should automate the process of collecting the requisite data and feed it into a dashboard so we can see at a glance how the different scheduling algorithms are performing. Then we can use this information to determine what gets run by default on autoland.
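For context, a minimal sketch of the kind of efficiency metric we have in mind (the field names and the exact formula below are illustrative assumptions, not the metric as implemented in ci-recipes): reward schedulers that would have caught backed-out pushes while requesting as few tasks as possible.

```python
# Illustrative sketch only; the real metric lives in ci-recipes and differs.
# Assumed input: one record per push describing what the scheduler selected.

def scheduler_efficiency(pushes):
    """pushes: list of dicts like
    {"backedout": bool, "scheduled": set of task labels, "failed": set of task labels}
    Returns a higher-is-better score, or None if there is nothing to measure."""
    regressions = [p for p in pushes if p["backedout"]]
    caught = sum(1 for p in regressions if p["scheduled"] & p["failed"])
    total_tasks = sum(len(p["scheduled"]) for p in pushes)
    if not regressions or not total_tasks:
        return None
    # Fraction of regressing pushes caught, normalized by tasks scheduled (per 1000).
    return (caught / len(regressions)) / (total_tasks / 1000)
```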
Reporter
Updated•6 years ago

Comment 1•6 years ago
In such a dashboard, it would also be nice to show the following:
- graph of the evolution of the number (and total duration, and cost if we can) of total tasks that could be run;
- graph of the evolution of the number (and total duration, and cost if we can) of tasks scheduled;
- graph of the evolution of the number of backouts and delay between landings and corresponding backouts.
We could call the dashboard "arewegreenyet", hinting at the amount of carbon dioxide emissions we will prevent.
Updated•6 years ago

Assignee
Updated•6 years ago

Assignee
Comment 2•6 years ago
CO2 Emissions Dashboard!
Assignee
Comment 3•6 years ago
:ahal
I assume the "how efficient a scheduling algorithm is" metric is the one located here: https://github.com/mozilla/ci-recipes/blob/master/recipes/scheduler_analysis.py#L135
That line assumes mach and hg are installed. What other setup is required?
Reporter
Comment 4•6 years ago
Yes, that is the one. You need a mozilla-central clone at the moment; I don't think there are any other dependencies beyond the ones installed by poetry install. It's best if you clone outside of that script and then pass the clone in via the --gecko-path argument.
Though, I think gecko is only necessary if you want to test a custom scheduling algorithm that you've implemented locally. If we only care about the shadow schedulers (which we would for this dashboard), then gecko shouldn't be needed. That script was written 6 months ago and is in need of a rewrite on top of modern mozci.
Assignee
Comment 5•5 years ago
Enable tier 3, search for shadow-scheduler: https://treeherder.mozilla.org/#/jobs?repo=autoland&searchStr=shadow-scheduler
The relevant_tests job generates an optimized_tasks.list artifact: https://firefoxci.taskcluster-artifacts.net/A-I-wYI7SKaRkEkmayvCMQ/0/public/shadow-scheduler/optimized_tasks.list
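For the ETL side, a small sketch of pulling one of these artifacts and turning it into a set of task labels (assuming the artifact is a plain newline-separated list of labels, as the example above appears to be):

```python
# Sketch: download a shadow scheduler's optimized_tasks.list artifact and
# parse it into a set of task labels. Assumes one label per line.
import requests

URL = ("https://firefoxci.taskcluster-artifacts.net/"
       "A-I-wYI7SKaRkEkmayvCMQ/0/public/shadow-scheduler/optimized_tasks.list")

def fetch_optimized_tasks(url=URL):
    resp = requests.get(url)
    resp.raise_for_status()
    return {line.strip() for line in resp.text.splitlines() if line.strip()}

print(len(fetch_optimized_tasks()), "tasks selected by this shadow scheduler")
```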
Assignee
Comment 6•5 years ago
The baseline scheduler output can be found in the decision task, e.g. https://treeherder.mozilla.org/#/jobs?repo=autoland&searchStr=decision&selectedJob=291693194
The task-graph.json is the output with the (currently SETA) decisions applied: https://firefoxci.taskcluster-artifacts.net/fzgPWeChT-mcYbQwduwWzA/0/public/task-graph.json
The task labels are found at data.values().label
mozci has code to pull this information from an artifact: https://github.com/mozilla/mozci/blob/master/mozci/push.py#L277
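Purely as illustration (mozci's push.py, linked above, already handles artifact retrieval), extracting the baseline labels directly from that artifact might look like:

```python
# Sketch: pull the baseline (currently SETA-filtered) task labels out of a
# decision task's task-graph.json, per "data.values().label" above.
import requests

URL = ("https://firefoxci.taskcluster-artifacts.net/"
       "fzgPWeChT-mcYbQwduwWzA/0/public/task-graph.json")

graph = requests.get(URL).json()
baseline_labels = {task["label"] for task in graph.values()}
print(len(baseline_labels), "tasks in the baseline task graph")
```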
Assignee
Comment 7•5 years ago
Example of bugbug-guided task selection: https://treeherder.mozilla.org/#/jobs?repo=try&selectedJob=291692042&tier=1%2C2%2C3&revision=4dfd0a3206974fe31db5766d507d4dc315964d23
and the specific artifact: https://firefoxci.taskcluster-artifacts.net/N-Ef1MM3TuGeTE0MnNJAww/0/public/shadow-scheduler/optimized_tasks.list
Assignee
Comment 8•5 years ago
Steps
- ETL the 'optimized_tasks.list' artifacts into a database (like BigQuery)
  - the problem is where to run this cron job
  - maybe the task can push the data into BigQuery directly?
- ETL the backout information into the same database
  - what's pulled, and how to process it, should live in mozci
- Write the analysis logic (not necessarily complicated; see the sketch after this list)
  - the backout rate, and distance to backout (mozci?)
  - number of tasks requested
  - total test hours requested (will require the average run time per task type, from the Treeherder data)
- Show the dashboard (using Data Studio)
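A rough sketch of the analysis step, assuming a hypothetical row layout (the real schema will depend on what the ETL actually lands in BigQuery):

```python
# Hypothetical per-push rows produced by the ETL; field names are assumptions.
from collections import defaultdict

def summarize(rows):
    """rows: iterable of dicts like
    {"scheduler": str, "push": str, "num_tasks": int,
     "test_hours": float, "backedout": bool}
    Returns per-scheduler totals for the dashboard."""
    out = defaultdict(lambda: {"pushes": 0, "tasks": 0, "hours": 0.0, "backouts": 0})
    for row in rows:
        s = out[row["scheduler"]]
        s["pushes"] += 1
        s["tasks"] += row["num_tasks"]
        s["hours"] += row["test_hours"]
        s["backouts"] += bool(row["backedout"])
    for s in out.values():
        s["backout_rate"] = s["backouts"] / s["pushes"] if s["pushes"] else None
    return dict(out)
```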
Comment 9•5 years ago
Will it be possible to plug Redash into the BigQuery DB?
I would also be interested in getting the end-to-end time (decision task to completion).
Thanks
Comment 10•5 years ago
I think end-to-end time is out of scope for this project, though I do think this project will make a dent in it. Here is a dashboard showing scheduled-to-completion time:
https://datastudio.google.com/reporting/1Xo4joOq1PzlqF7iwq1SNmwbcQlLPHghV/page/Y5Vx
Assignee
Comment 11•5 years ago
:Sylvestre, I added the desire for end-to-end time to the MVP measures. We can discuss it when we decide exactly what the MVP will be. At least it is not lost.
Assignee
Comment 12•5 years ago
Docs for cron-like tasks: http://firefox-source-docs.mozilla.org/taskcluster/cron.html
Reporter
Comment 13•5 years ago
Should we close this bug out, Kyle? Or did you want to use it to track moving to Armen's repo?
Assignee
Comment 14•5 years ago
Please keep this open; I am not done yet.
Assignee
Updated•5 years ago