Stand up a dashboard measuring scheduler efficiency
Categories
(Firefox Build System :: Task Configuration, task, P2)
Tracking
(Not tracked)
People
(Reporter: ahal, Assigned: ekyle)
References
(Blocks 1 open bug)
Details
(Whiteboard: [smart-sched])
We have a rudimentary metric (which we'll improve over time) that attempts to measure how efficient a scheduling algorithm is. We also have the ability to run so-called shadow schedulers, and may implement another mechanism to measure scheduling changes in the task generation phase.
We should automate the process of collecting the requisite data and feed it into a dashboard so we can see at a glance how the different scheduling algorithms are performing. Then we can use this information to determine what gets run by default on autoland.
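For context, a minimal sketch of the kind of efficiency metric we have in mind (the field names and the exact formula below are illustrative assumptions, not the metric as implemented in ci-recipes): reward schedulers that would have caught backed-out pushes while requesting as few tasks as possible.

```python
# Illustrative sketch only; the real metric lives in ci-recipes and differs.
# Assumed input: one record per push describing what the scheduler selected.

def scheduler_efficiency(pushes):
    """pushes: list of dicts like
    {"backedout": bool, "scheduled": set of task labels, "failed": set of task labels}
    Returns a higher-is-better score, or None if there is nothing to measure."""
    regressions = [p for p in pushes if p["backedout"]]
    caught = sum(1 for p in regressions if p["scheduled"] & p["failed"])
    total_tasks = sum(len(p["scheduled"]) for p in pushes)
    if not regressions or not total_tasks:
        return None
    # Fraction of regressing pushes caught, normalized by tasks scheduled (per 1000).
    return (caught / len(regressions)) / (total_tasks / 1000)
```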
Reporter
Updated•6 years ago

Comment 1•6 years ago
In such a dashboard, it would also be nice to show the following:
- graph of the evolution of the number (and total duration, and cost if we can) of total tasks that could be run;
- graph of the evolution of the number (and total duration, and cost if we can) of tasks scheduled;
- graph of the evolution of the number of backouts and delay between landings and corresponding backouts.
We could call the dashboard "arewegreenyet", hinting at the amount of carbon dioxide emissions we will prevent.
Updated•6 years ago

Assignee
Updated•6 years ago

Assignee
Comment 2•6 years ago
CO2 Emissions Dashboard!
Assignee
Comment 3•6 years ago
:ahal
I assume the "how efficient a scheduling algorithm is" metric is the one located here: https://github.com/mozilla/ci-recipes/blob/master/recipes/scheduler_analysis.py#L135
That line assumes mach and hg are installed. What other setup is required?
Reporter
Comment 4•6 years ago
Yes, that is the one. You need a mozilla-central clone at the moment; I don't think there are any other dependencies beyond the ones installed by poetry install. It's best if you clone outside of that script and then pass the clone in via the --gecko-path argument.
Though, I think gecko is only necessary if you want to test a custom scheduling algorithm that you've implemented locally. If we only care about the shadow schedulers (which we would for this dashboard), then gecko shouldn't be needed. That script was written 6 months ago and is in need of a rewrite on top of modern mozci.
Assignee
Comment 5•5 years ago
Enable tier 3, search for shadow-scheduler: https://treeherder.mozilla.org/#/jobs?repo=autoland&searchStr=shadow-scheduler
The relevant_tests job generates an optimized_tasks.list artifact: https://firefoxci.taskcluster-artifacts.net/A-I-wYI7SKaRkEkmayvCMQ/0/public/shadow-scheduler/optimized_tasks.list
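For the ETL side, a small sketch of pulling one of these artifacts and turning it into a set of task labels (assuming the artifact is a plain newline-separated list of labels, as the example above appears to be):

```python
# Sketch: download a shadow scheduler's optimized_tasks.list artifact and
# parse it into a set of task labels. Assumes one label per line.
import requests

URL = ("https://firefoxci.taskcluster-artifacts.net/"
       "A-I-wYI7SKaRkEkmayvCMQ/0/public/shadow-scheduler/optimized_tasks.list")

def fetch_optimized_tasks(url=URL):
    resp = requests.get(url)
    resp.raise_for_status()
    return {line.strip() for line in resp.text.splitlines() if line.strip()}

print(len(fetch_optimized_tasks()), "tasks selected by this shadow scheduler")
```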
Assignee
Comment 6•5 years ago
The baseline scheduler output can be found in the decision task, e.g. https://treeherder.mozilla.org/#/jobs?repo=autoland&searchStr=decision&selectedJob=291693194
The task-graph.json is the output with the (currently SETA) decisions applied: https://firefoxci.taskcluster-artifacts.net/fzgPWeChT-mcYbQwduwWzA/0/public/task-graph.json
The task labels are found at data.values().label
mozci has code to pull this information from an artifact: https://github.com/mozilla/mozci/blob/master/mozci/push.py#L277
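Purely as illustration (mozci's push.py, linked above, already handles artifact retrieval), extracting the baseline labels directly from that artifact might look like:

```python
# Sketch: pull the baseline (currently SETA-filtered) task labels out of a
# decision task's task-graph.json, per "data.values().label" above.
import requests

URL = ("https://firefoxci.taskcluster-artifacts.net/"
       "fzgPWeChT-mcYbQwduwWzA/0/public/task-graph.json")

graph = requests.get(URL).json()
baseline_labels = {task["label"] for task in graph.values()}
print(len(baseline_labels), "tasks in the baseline task graph")
```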
Assignee
Comment 7•5 years ago
Example of bugbug-guided task selection: https://treeherder.mozilla.org/#/jobs?repo=try&selectedJob=291692042&tier=1%2C2%2C3&revision=4dfd0a3206974fe31db5766d507d4dc315964d23
and the specific artifact: https://firefoxci.taskcluster-artifacts.net/N-Ef1MM3TuGeTE0MnNJAww/0/public/shadow-scheduler/optimized_tasks.list
Assignee
Comment 8•5 years ago
Steps
- ETL the 'optimized_tasks.list' artifacts into a database (like BigQuery)
  - the problem is where to run this cron job
  - maybe the task can push the data into BigQuery directly?
- ETL the backout information into the same database
  - what's pulled, and how to process it, should live in mozci
- Write the analysis logic (not necessarily complicated; see the sketch after this list)
  - the backout rate, and distance to backout (mozci?)
  - number of tasks requested
  - total test hours requested (will require the average run time per task type, from the Treeherder data)
- Show the dashboard (using Data Studio)
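A rough sketch of the analysis step, assuming a hypothetical row layout (the real schema will depend on what the ETL actually lands in BigQuery):

```python
# Hypothetical per-push rows produced by the ETL; field names are assumptions.
from collections import defaultdict

def summarize(rows):
    """rows: iterable of dicts like
    {"scheduler": str, "push": str, "num_tasks": int,
     "test_hours": float, "backedout": bool}
    Returns per-scheduler totals for the dashboard."""
    out = defaultdict(lambda: {"pushes": 0, "tasks": 0, "hours": 0.0, "backouts": 0})
    for row in rows:
        s = out[row["scheduler"]]
        s["pushes"] += 1
        s["tasks"] += row["num_tasks"]
        s["hours"] += row["test_hours"]
        s["backouts"] += bool(row["backedout"])
    for s in out.values():
        s["backout_rate"] = s["backouts"] / s["pushes"] if s["pushes"] else None
    return dict(out)
```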
Comment 9•5 years ago
Will it be possible to plug Redash into the BigQuery DB?
I would also be interested in getting the end-to-end time (decision task to completion).
Thanks
Comment 10•5 years ago
I think end-to-end time is out of scope for this project, though I do think this project will make a dent in it. Here is a dashboard showing scheduled-to-completion time:
https://datastudio.google.com/reporting/1Xo4joOq1PzlqF7iwq1SNmwbcQlLPHghV/page/Y5Vx
Assignee
Comment 11•5 years ago
:Sylvestre, I added the desire for end-to-end time to the MVP measures. We can discuss it when we decide exactly what the MVP will be. At least it is not lost.
Assignee
Comment 12•5 years ago
Docs for cron-like tasks: http://firefox-source-docs.mozilla.org/taskcluster/cron.html
Reporter
Comment 13•5 years ago
Should we close this bug out, Kyle? Or did you want to use it to track moving to Armen's repo?
Assignee
Comment 14•5 years ago
Please keep this open; I am not done yet.
Assignee
Updated•5 years ago