Closed Bug 1569166 Opened 5 years ago Closed 4 years ago

Measure machine time of backfilled perf jobs

Categories

(Tree Management :: Perfherder, task, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: igoldan, Assigned: sparky)

References

Details

No description provided.
Priority: -- → P2
Summary: Measure current machine time of backfilled perf jobs → Measure machine time of backfilled perf jobs
Blocks: 1570944
Blocks: 1570956
No longer blocks: 1570944
Priority: P2 → P3
Whiteboard: milestone-2
Priority: P3 → P2
Assignee: igoldan → nobody
Assignee: nobody → gmierz2
Whiteboard: milestone-2

I figured out how to get the total machine time for backfills and you can find the script here: https://github.com/gmierz/moz-current-tests/blob/master/gen_backfill_report.py

You can run it with a command like: python3 gen_backfill_report.py --start-date today-week --branches autoland

That returns the total number of hours spent backfilling on autoland over the last week. When I ran this command yesterday, the total time spent on backfilled jobs was 495.31 hours.

Note that it can take some time, because we have to download all the to-run and label-to-taskid artifacts produced by the backfill decision tasks to be able to tell which tasks were backfilled. I've made the downloads a bit faster with threads, but it might still need some work to recover from failures. I originally thought I would need to combine ActiveData with another source of data, but that's not the case, so this script could be worked into ADR.
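
For reference, the approach described above (using each backfill decision task's to-run and label-to-taskid artifacts to work out which tasks were backfilled, then summing their run time) could be sketched roughly as follows. This is not the code in gen_backfill_report.py. It assumes that to-run is a list of task labels and label-to-taskid is a label-to-task-ID mapping, and fetch_artifact and durations_seconds are hypothetical inputs made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def backfilled_task_ids(decision_task_ids, fetch_artifact, max_workers=8):
    """Collect the task IDs scheduled by the given backfill decision tasks.

    fetch_artifact(task_id, name) is a hypothetical callable returning the
    parsed JSON artifact; it is an assumption, not part of the real script.
    """

    def ids_for(decision_task_id):
        # Assumed shapes: to-run is a list of labels, label-to-taskid a dict.
        to_run = fetch_artifact(decision_task_id, "to-run.json")
        label_to_taskid = fetch_artifact(decision_task_id, "label-to-taskid.json")
        return {label_to_taskid[label] for label in to_run if label in label_to_taskid}

    task_ids = set()
    # Threads help because the work is dominated by artifact downloads.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for ids in pool.map(ids_for, decision_task_ids):
            task_ids |= ids
    return task_ids


def total_hours(task_ids, durations_seconds):
    """Sum run time, in hours, over the backfilled tasks with a known duration."""
    return sum(durations_seconds.get(task_id, 0) for task_id in task_ids) / 3600
```

Passing the fetch function in as a parameter keeps retry or failure-recovery logic in one place, which is the part noted above as still needing work.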

Getting this time could be a lot faster if we added something to the task to denote that it's a backfilled job (and made sure that addition gets into ActiveData). Getting that would require modifications here: https://dxr.mozilla.org/mozilla-central/source/taskcluster/taskgraph/actions/backfill.py#119

:igoldan, can we mark this issue as solved or do we want to do something else here as well? The next step is modelling the amount of time that automatic backfills will take.

Flags: needinfo?(igoldan)
Blocks: 1618829

We still need some extra optional filters for:

  • Talos, Raptor, Browsertime & AWSY perf jobs
  • autoland & mozilla-beta repositories

Basically, for the first set of filters, we need all jobs under the T(), Rap(), Btime() & SY() symbols.

As I said, these filters are optional, since the query above is already useful as it is.
They shouldn't be hard-coded into the query, but if they are provided, the query should select only the matching jobs.
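
To illustrate what "optional, not hard-coded" could look like, here is a minimal sketch in which a filter left as None is not applied at all. The job fields group_symbol and branch are hypothetical names chosen for the example, not fields confirmed by the script or by Treeherder.

```python
def filter_jobs(jobs, symbols=None, branches=None):
    """Keep jobs matching the optional filters; None means "do not filter".

    The "group_symbol" and "branch" keys are assumed field names.
    """
    return [
        job
        for job in jobs
        if (symbols is None or job["group_symbol"] in symbols)
        and (branches is None or job["branch"] in branches)
    ]


# The filter values requested in this comment.
PERF_SYMBOLS = {"T", "Rap", "Btime", "SY"}
PERF_BRANCHES = {"autoland", "mozilla-beta"}
```

Calling filter_jobs(jobs) returns every job unchanged, while filter_jobs(jobs, PERF_SYMBOLS, PERF_BRANCHES) narrows the result to the perf jobs on the two repositories.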

Flags: needinfo?(igoldan)

:igoldan Can this be reduced to a BigQuery query? We now have all the Treeherder jobs in BigQuery for stage (prod will be ready this weekend).

  1. Request a user: https://mana.mozilla.org/wiki/pages/viewpage.action?spaceKey=SVCOPS&title=Google+Cloud+Platform+%28GCP%29+For+Firefox+Organization
  2. Try to access the table: https://console.cloud.google.com/bigquery?project=moz-fx-dev-ekyle-treeherder&_ga=2.266718542.1162090575.1582214471-972438803.1569166742&p=moz-fx-dev-ekyle-treeherder&d=treeherder_2d_stage&t=jobs__9PAQq1nIsN1wx2NBo8H1&page=table

Once you can do that, you can

SELECT
    job__type.name.__s__ AS type_name,
    options.option.__s__ AS option,
    who.__s__ AS `person`,
    COUNT(1) AS num_tasks,
    SUM(TIMESTAMP_DIFF(end__time.__t__, start__time.__t__, SECOND)) AS seconds_duration
FROM treeherder_2d_stage.jobs j
WHERE start__time.__t__ > '2020-01-01'
    AND who.__s__ = 'igoldan@mozilla.com'
GROUP BY job__type.name.__s__, options.option.__s__, who.__s__

The above query is not correct, just an example of what can be done.

:ekyle, the problem is that there's no way to differentiate a normal task, or a retrigger, from a backfill request using the data we have in the databases. We would need to make some changes here to be able to use only database queries to measure this timing: https://dxr.mozilla.org/mozilla-central/source/taskcluster/taskgraph/actions/backfill.py#119

:igoldan, the tool has the optional filters available now: https://github.com/gmierz/moz-current-tests/blob/master/gen_backfill_report.py

Blocks: 1620551

(In reply to Greg Mierzwinski [:sparky] from comment #4)

[...]
:igoldan, the tool has the optional filters available now: https://github.com/gmierz/moz-current-tests/blob/master/gen_backfill_report.py

We can close this ticket as resolved and move on to bug 1620551.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED

Can we give a brief summary of what was measured so we have a reference point? Maybe the month of February 2020?

Flags: needinfo?(igoldan)

I'm not sure how to set the query for February specifically.
I can give a brief summary for the last month, ending today, March 16.

On autoland & mozilla-beta, for talos, raptor, browsertime & awsy, on all platforms, the perf sheriffs (fstrugariu, mraiciof, aionescu) backfilled 1276 jobs.
This totaled 265.5 hours, with an average job duration of around 12.5 minutes, a minimum of 15 seconds, and a maximum of 40.5 minutes.
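
As a reference for reproducing these figures, the summary could be computed from per-job durations along these lines. This is only a sketch: summarize is a hypothetical helper and assumes durations are available in seconds. The last line checks that the reported total and job count are consistent with the reported average.

```python
def summarize(durations_seconds):
    """Return job count, total hours, and mean/min/max duration in minutes."""
    count = len(durations_seconds)
    total = sum(durations_seconds)
    return {
        "jobs": count,
        "total_hours": total / 3600,
        "mean_minutes": total / count / 60,
        "min_minutes": min(durations_seconds) / 60,
        "max_minutes": max(durations_seconds) / 60,
    }


# Consistency check on the reported figures: 265.5 hours over 1276 jobs
# is 15930 minutes / 1276, which is roughly 12.5 minutes per job.
reported_mean_minutes = 265.5 * 60 / 1276
```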

Flags: needinfo?(igoldan)