Measure machine time of backfilled perf jobs
Categories
(Tree Management :: Perfherder, task, P2)
Tracking
(Not tracked)
People
(Reporter: igoldan, Assigned: sparky)
References
Details
Reporter | ||
Updated•5 years ago
|
Assignee
Comment 1 • 5 years ago
I figured out how to get the total machine time for backfills and you can find the script here: https://github.com/gmierz/moz-current-tests/blob/master/gen_backfill_report.py
You can run it with a command like: python3 gen_backfill_report.py --start-date today-week --branches autoland
That will return the total number of hours spent on backfilled jobs in the last week on autoland. When I ran this command yesterday, the total was 495.31 hours.
Note that it can take some time because we have to download all the to-run and label-to-taskid artifacts produced by the backfill decision tasks to be able to tell which tasks were backfilled. I've made the downloads a bit faster with threads, but it might still need some work to recover from failures. I originally thought I would need to combine activedata with another source of data, but that's not the case, so this script could be worked into ADR.
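As a rough illustration of the threaded download with failure recovery mentioned above, here is a minimal sketch. It is not taken from gen_backfill_report.py: the function names, the retry policy, and the injected `fetch` callable are all assumptions made for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_with_retry(fetch, url, retries=3, delay=0.0):
    """Call fetch(url), retrying on any exception for up to `retries` attempts."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(retries):
        try:
            return fetch(url)
        except Exception as err:  # hypothetical policy: retry on every error
            last_error = err
            time.sleep(delay * (attempt + 1))  # simple linear backoff
    raise last_error


def download_all(fetch, urls, max_workers=8, retries=3, delay=0.0):
    """Download all URLs in parallel.

    Returns (results, failures): two dicts keyed by URL, so one bad
    artifact does not abort the whole run.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            pool.submit(fetch_with_retry, fetch, url, retries, delay): url
            for url in urls
        }
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as err:
                failures[url] = err
    return results, failures
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP library; in practice it would be whatever function downloads one artifact.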
Getting this time can be a lot faster if we get something added to the task to denote that it's a backfilled job (and make sure that addition gets into activedata). There would need to be modifications done here to get that: https://dxr.mozilla.org/mozilla-central/source/taskcluster/taskgraph/actions/backfill.py#119
:igoldan, can we mark this issue as solved or do we want to do something else here as well? The next step is modelling the amount of time that automatic backfills will take.
Reporter
Comment 2 • 5 years ago
We still need some extra optional filters for:
- Talos, Raptor, Browsertime & AWSY perf jobs
- autoland & mozilla-beta repositories
Basically, for the first set of filters, we need all jobs under the T(), Rap(), Btime() & SY() symbols.
As I said, these filters are optional, since the query above is already useful on its own.
They shouldn't be hard-coded into the query, but if provided, the query should apply them.
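A minimal sketch of what "optional, not hard-coded" filters could look like. The job dictionaries and their `group_symbol` and `branch` keys are hypothetical, chosen only for this example; they are not the fields the actual script uses.

```python
def filter_jobs(jobs, symbols=None, branches=None):
    """Keep jobs that match the optional filters.

    A filter left as None is not applied, so calling filter_jobs(jobs)
    returns every job unchanged.
    """
    selected = []
    for job in jobs:
        # `group_symbol` and `branch` are assumed keys for illustration.
        if symbols is not None and job["group_symbol"] not in symbols:
            continue
        if branches is not None and job["branch"] not in branches:
            continue
        selected.append(job)
    return selected


# Example: restrict to the perf symbols and repositories listed above.
PERF_SYMBOLS = {"T", "Rap", "Btime", "SY"}
PERF_BRANCHES = {"autoland", "mozilla-beta"}
```

With this shape, the unfiltered report stays the default and the perf-specific view is obtained by passing `symbols=PERF_SYMBOLS, branches=PERF_BRANCHES`.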
Comment 3 • 5 years ago
:igoldan Can this be reduced to a BigQuery query? We now have all the Treeherder jobs in BigQuery for stage (prod will be ready this weekend).
- Request a user: https://mana.mozilla.org/wiki/pages/viewpage.action?spaceKey=SVCOPS&title=Google+Cloud+Platform+%28GCP%29+For+Firefox+Organization
- Try to access the table: https://console.cloud.google.com/bigquery?project=moz-fx-dev-ekyle-treeherder&_ga=2.266718542.1162090575.1582214471-972438803.1569166742&p=moz-fx-dev-ekyle-treeherder&d=treeherder_2d_stage&t=jobs__9PAQq1nIsN1wx2NBo8H1&page=table
Once you can do that, you can run a query like:
    SELECT
        job__type.name.__s__ AS type_name,
        options.option.__s__ AS option,
        who.__s__ AS `person`,
        COUNT(1) AS num_tasks,
        SUM(TIMESTAMP_DIFF(end__time.__t__, start__time.__t__, SECOND)) AS seconds_duration
    FROM treeherder_2d_stage.jobs j
    WHERE start__time.__t__ > '2020-01-01'
        AND who.__s__ = 'igoldan@mozilla.com'
    GROUP BY job__type.name.__s__, options.option.__s__, who.__s__
The above query is not correct, just an example of what can be done.
Assignee
Comment 4 • 5 years ago
:ekyle, the problem is that there's no way to differentiate a normal task, or a retrigger, from a backfill request using the data we have in the databases. We would need to make some changes here to be able to use only database queries to measure this timing: https://dxr.mozilla.org/mozilla-central/source/taskcluster/taskgraph/actions/backfill.py#119
:igoldan, the tool has the optional filters available now: https://github.com/gmierz/moz-current-tests/blob/master/gen_backfill_report.py
Reporter
Comment 5 • 5 years ago
(In reply to Greg Mierzwinski [:sparky] from comment #4)
> [...]
> :igoldan, the tool has the optional filters available now: https://github.com/gmierz/moz-current-tests/blob/master/gen_backfill_report.py
We can close this ticket as resolved and move on to bug 1620551.
Comment 6 • 5 years ago
Can we give a brief summary of what was measured so we have a reference point? Maybe the month of February 2020?
Reporter
Comment 7 • 5 years ago
Not sure how to set the query for February specifically.
I can give a brief summary for the last month, ending today, March 16.
On autoland & mozilla-beta, on talos, raptor, browsertime & awsy, on all platforms, the perf sheriffs (fstrugariu, mraiciof, aionescu) backfilled 1276 jobs.
This totaled 265.5 hours of machine time, with an average job duration of around 12.5 minutes (min duration 15 seconds, max duration 40.5 minutes).
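As a sanity check, 265.5 hours spread over 1276 jobs is 265.5 × 60 / 1276 ≈ 12.5 minutes per job, which agrees with the reported average. A summary like this could be derived from a list of per-job durations roughly as follows; the `summarize` helper and its output keys are assumptions for illustration, not part of the actual report script.

```python
def summarize(durations_seconds):
    """Summarize job durations given in seconds.

    Returns the job count, total hours, mean minutes, and the shortest
    and longest job, in the units used in the summary above.
    """
    if not durations_seconds:
        raise ValueError("no job durations to summarize")
    count = len(durations_seconds)
    total = sum(durations_seconds)
    return {
        "jobs": count,
        "total_hours": total / 3600,
        "mean_minutes": total / count / 60,
        "min_seconds": min(durations_seconds),
        "max_minutes": max(durations_seconds) / 60,
    }


# Cross-check of the reported figures: total hours and job count
# imply the average job duration in minutes.
reported_mean_minutes = 265.5 * 60 / 1276
```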