1569166 - Measure machine time of backfilled perf jobs

You can run it with a command like: python3 gen_backfill_report.py --start-date today-week --branches autoland
Yesterday when I ran this command I found the total time spent on backfilled jobs was 495.31 hours.

That command returns the total number of hours spent on backfilled jobs in the last week on autoland. Note that it can take some time to run: to tell which tasks were backfilled, we have to download all the to-run and label-to-taskid artifacts produced by the backfill decision tasks. I've made the downloads a bit faster with threads, but the script might still need some work to recover from download failures. I originally thought I would need to combine activedata with another source of data, but that's not the case, so this script could be worked into ADR.
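As a rough illustration of the threaded download with failure recovery mentioned above, here is a minimal sketch. It is not the code in gen_backfill_report.py: the `fetch` callable, the function names, and the retry policy are all assumptions made for this example, and `fetch` stands in for whatever actually downloads one artifact.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_with_retry(fetch, url, retries=3, delay=0.0):
    """Call fetch(url), retrying on any exception for up to `retries` attempts."""
    attempts = max(1, retries)
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(url)
        except Exception as exc:  # assumption: a real script would catch network errors only
            last_error = exc
            time.sleep(delay * (attempt + 1))
    raise last_error


def download_all(fetch, urls, workers=8, retries=3):
    """Download every URL on a thread pool; return (results, failures) dicts keyed by URL."""
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fetch_with_retry, fetch, u, retries): u for u in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                # Keep going: one bad artifact should not abort the whole report.
                failures[url] = exc
    return results, failures
```

Collecting failures separately, instead of raising on the first one, would let the report be generated from whatever was downloaded and the failed artifacts be retried later.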

Getting this time can be a lot faster if we get something added to the task to denote that it's a backfilled job (and make sure that addition gets into activedata). There would need to be modifications done here to get that: https://dxr.mozilla.org/mozilla-central/source/taskcluster/taskgraph/actions/backfill.py#119

:igoldan, can we mark this issue as solved or do we want to do something else here as well? The next step is modelling the amount of time that automatic backfills will take.

Flags: needinfo?(igoldan)

Ionuț Goldan [:igoldan] (Reporter), Updated 5 years ago

Blocks: 1618829

Ionuț Goldan [:igoldan] (Reporter), Comment 2, 5 years ago (edited)

We still need some extra optional filters for:

Talos, Raptor, Browsertime & AWSY perf jobs
autoland & mozilla-beta repositories

Basically, for the 1st set of filters, we need all jobs under the T(), Rap(), Btime() & SY() symbols.

As I said, these filters are optional, as the query above already has its own advantage.
They shouldn't be hard-coded into the query, but when they are provided, the query should restrict its results to them.
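One way to picture optional filters of this kind is sketched below. This is only an illustration, not the tool's implementation: the `group_symbol` and `branch` field names and the `filter_jobs` helper are hypothetical.

```python
def filter_jobs(jobs, symbols=None, branches=None):
    """Keep jobs that match the optional filters; None means 'do not filter'."""
    selected = []
    for job in jobs:
        # Hypothetical field names; the real job records may be shaped differently.
        if symbols is not None and job["group_symbol"] not in symbols:
            continue
        if branches is not None and job["branch"] not in branches:
            continue
        selected.append(job)
    return selected


# With no filters every job is kept; with filters only matching jobs are.
perf_symbols = {"T", "Rap", "Btime", "SY"}
perf_branches = {"autoland", "mozilla-beta"}
```

Calling `filter_jobs(jobs)` returns everything, while `filter_jobs(jobs, perf_symbols, perf_branches)` narrows the result to the perf jobs on the two repositories listed above.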

Flags: needinfo?(igoldan)

Kyle Lahnakoski [:ekyle], Comment 3, 5 years ago

:igoldan Can this be reduced to a BigQuery query? We now have all the Treeherder jobs in BigQuery for stage (prod will be ready this weekend).

Once you can do that, you can run a query like this:

select
    job__type.name.__s__ as type_name,
    options.option.__s__ as option,
    who.__s__ as `person`,
    count(1) as num_tasks,
    sum(timestamp_diff(end__time.__t__, start__time.__t__, second)) as seconds_duration
from treeherder_2d_stage.jobs j
where start__time.__t__ > '2020-01-01'
    and who.__s__ = 'igoldan@mozilla.com'
group by job__type.name.__s__, options.option.__s__, who.__s__

The above query is not correct, just an example of what can be done.

Greg Mierzwinski [:sparky] (Assignee), Comment 4, 5 years ago

:ekyle, the problem is that there's no way to differentiate a normal task, or a retrigger, from a backfill request using the data we have in the databases. We would need to make some changes here to be able to use only database queries to measure this timing: https://dxr.mozilla.org/mozilla-central/source/taskcluster/taskgraph/actions/backfill.py#119

:igoldan, the tool has the optional filters available now: https://github.com/gmierz/moz-current-tests/blob/master/gen_backfill_report.py

Ionuț Goldan [:igoldan] (Reporter), Updated 5 years ago

Blocks: 1620551

Ionuț Goldan [:igoldan] (Reporter), Comment 5, 5 years ago (edited)

(In reply to Greg Mierzwinski [:sparky] from comment #4)

[...]
:igoldan, the tool has the optional filters available now: https://github.com/gmierz/moz-current-tests/blob/master/gen_backfill_report.py

We can close this ticket as resolved and move on to bug 1620551.

Ionuț Goldan [:igoldan] (Reporter), Updated 5 years ago

Status: NEW → RESOLVED

Closed: 5 years ago

Resolution: --- → FIXED

Joel Maher ( :jmaher ) (UTC -8), Comment 6, 5 years ago

Can we give a brief summary of what was measured so we have a reference point? Maybe the month of February 2020?

Ionuț Goldan [:igoldan] (Reporter), Updated 5 years ago

Flags: needinfo?(igoldan)

Ionuț Goldan [:igoldan] (Reporter), Comment 7, 5 years ago (edited)

I'm not sure how to set the query for February specifically.
I can give a brief summary for the last month, ending today, March 16.

On autoland & mozilla-beta, for talos, raptor, browsertime & awsy, on all platforms, the perf sheriffs (fstrugariu, mraiciof, aionescu) backfilled 1276 jobs.
These jobs totaled 265.5 hours of machine time. Job duration averaged around 12.5 minutes, with a minimum of 15 seconds and a maximum of 40.5 minutes.
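As a quick sanity check on these figures, the reported average follows directly from the total and the job count. The variable names below are just for this example.

```python
# Figures reported in the summary above.
total_hours = 265.5
job_count = 1276

# Average duration per backfilled job, converted from hours to minutes.
average_minutes = total_hours * 60 / job_count
print(round(average_minutes, 2))  # prints 12.48
```

That is consistent with the "around 12.5 minutes" average quoted above.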

Flags: needinfo?(igoldan)


Categories

(Tree Management :: Perfherder, task, P2)

Tracking

(Not tracked)

People

(Reporter: igoldan, Assigned: sparky)


Security

(public)
