1241535 - Create an action task that allows retriggering tests on an existing Taskcluster build to generate a geckoprofile for talos

Assignee

Description

•

8 years ago

Right now, if we add "mozharness: --spsProfile" to the end of our try syntax, we can generate sps profile files for the duration of the test run. This has proven useful in many cases!

There is a need to compare profiles between a revision with a patch and a previous base revision, which means we need to generate profile data for both.

Right now the method is a fresh try push, which means new builds and then new tests. This would need to be done with 2 pushes.

Given that we can retrigger jobs, I would like to pass a flag to the jobs, or toggle some bit, that makes the retriggered run generate the profile data. It might not be possible to add custom mozharness/try syntax to an existing try push. I do wonder if something with push-extender could trigger a job, but with certain properties set.

One other way to simplify this is to push to try and use existing builds from the offending changeset and base changeset and just run the talos job.  I know we can hack this in taskcluster, but I am not sure if some magic with buildbot bridge could allow us to take an arbitrary build and run a given test job on it with the right flags.

If we restrict this to try runs, then we don't have to worry about data being posted from the talos run while it is being profiled (those are useless test results).

We should outline a couple approaches (or the one that is realistic) and turn this into an actionable bug that can be picked up in due time.

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 1

•

8 years ago

:armenzg, could you weigh in on any approaches to this problem that come to your mind?

Flags: needinfo?(armenzg)

Armen [:armenzg]

Comment 2

•

8 years ago

jmaher: would you be taking a given push on try and wanting some builders with those parameters?
Or a push from other repos than try? Where would you want those new scheduled jobs to appear? Under the same revision you're retriggering from?

We can create new jobs with new properties [1]. This is not a retrigger request, even though in effect it will be the same.

We will need to change Mozharness to check both places for the parameters it needs [2]
> self.buildbot_config['sourcestamp']['changes'][-1]['comments'].partition('mozharness:')
to also support:
> self.buildbot_config['properties'].get('mozharness_extra_parameters')

We can very easily write a script that would do this by having the right credentials and hacking Mozharness.
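
A minimal sketch of the two-place lookup described above, assuming the `mozharness_extra_parameters` property holds a single shell-style string of flags. The helper name `get_extra_mozharness_args` is hypothetical, and letting the property take precedence over the try comment is an assumption, not something this bug specifies:

```python
import shlex


def get_extra_mozharness_args(buildbot_config):
    """Return extra mozharness flags from buildbot properties or the try comment.

    Hypothetical helper: the precedence order (property first, then the
    commit comment) is an assumption made for illustration.
    """
    # 1. New source: a property set when the job is scheduled.
    props = buildbot_config.get("properties", {})
    extra = props.get("mozharness_extra_parameters")
    if extra:
        return shlex.split(extra)

    # 2. Existing source: text after "mozharness:" in the last change comment.
    changes = buildbot_config.get("sourcestamp", {}).get("changes", [])
    if not changes:
        return []
    comments = changes[-1].get("comments", "")
    _, sep, tail = comments.partition("mozharness:")
    return shlex.split(tail) if sep else []


if __name__ == "__main__":
    from_property = {"properties": {"mozharness_extra_parameters": "--spsProfile"}}
    from_comment = {
        "sourcestamp": {
            "changes": [{"comments": "try: -b o -p linux64 -t chromez mozharness: --spsProfile"}]
        }
    }
    print(get_extra_mozharness_args(from_property))
    print(get_extra_mozharness_args(from_comment))
```

With this shape, a job scheduled with the property and a job pushed with try syntax would both resolve to the same list of flags.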

---------

A more sophisticated approach would be to have a web app to help us do this:

You pass the web app two variables:
* repo_name
* revision

The web app loads the various available builds on Treeherder.
The user is prompted to select the builds that we want to base our tests on.

In the next step the user will be able to select from all the test jobs (the ones those builds could trigger) and choose the subset it cares about.

In the next step the user will be able to add extra parameters to be passed to the job.
Perhaps we can have a list to pick from.

On try, we will assume that we want to add jobs to that same revision.
If we *don't* want the jobs showing up on the same revision as the builds are taken from, we can either receive another revision to show the jobs under or 

On any other repository, we will not assume that we want to see those jobs running on that same revision. Instead 
the user will be redirected to a task graph. If we want treeherder pushes we will need to discuss this a bit further.


[1] https://github.com/mozilla/build-buildapi/blob/a19f3d79dd78e221a763b341f7c2d8281bea94a7/buildapi/controllers/selfserve.py#L511
[2] https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/testing/talos.py#165

Flags: needinfo?(armenzg)

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Updated

•

8 years ago

Depends on: 1241644

Armen [:armenzg]

Comment 3

•

8 years ago

<armenzg> let me see if I get it right this time
<armenzg> you trigger 6 jobs of a specific test (rather than all tests for a specific platform)
<armenzg> I assume you do this across various revisions
<armenzg> once you know which revision has the regression
<armenzg> you would go ahead and schedule a 7th job for it
<armenzg> that revision and the previous one

Armen [:armenzg]

Comment 4

•

8 years ago

<jmaher> well, the only other difference is I schedule 6 retriggers for ALL talos tests on revision and base rev, which sometimes finds a few other regressions we didn't detect on initially running the test

Armen [:armenzg]

Comment 5

•

8 years ago

Joel, do you use "trigger all talos" for this?

What is your exact flow when you want spsProfile runs? Do you get perf alerts, trigger "all talos" and determine later which pushes need spsProfile?

Is running "all talos" from Treeherder not sufficient to spot which revision caused the regression?

####################

Some notes for myself (do not read this section until jmaher and I have clarified exactly the flow we need):

Currently pulse_actions listens to "trigger all talos" requests [1] which calls trigger_all_talos_jobs() [2]
We will need to make pulse_actions pass an extra_properties 'mozharness_extra_properties' set to --spsProfile
We will need mozharness to look for a 'mozharness_extra_properties' property [3]
Side bug, in trigger_all_talos_jobs() we need to change from trigger_range() to trigger_arbitrary_job() since we're not triggering jobs across revisions.
[1] https://github.com/mozilla/pulse_actions/blob/master/pulse_actions/handlers/treeherder_resultset.py#L57
[2] https://github.com/mozilla/mozilla_ci_tools/blob/master/mozci/mozci.py#L560
[3] https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/testing/talos.py#165

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 6

•

8 years ago

for sps profiles the flow would be for specific tests which have regressed, so not all.  In many cases it would be the same test on many platforms, for example 'tp'.  We would manually retrigger these, probably via treeherder interface, and typically on try server, but it could be on an integration branch.

those are my wishes :)

Armen [:armenzg]

Comment 7

•

8 years ago

When you say 'specific tests' do you mean 'specific talos jobs'?

From what you say, "trigger all talos" from Treeherder does not help you as-is when you're dealing with a specific talos job (instead of *all* talos jobs).

Would you want another action on Treeherder that would look like this? (assuming I'm starting to understand what you need)
* Select a specific talos suite (e.g. tp)
* Choose a new action on Treeherder (Create baseline + sps profile)
* pulse_actions determines what are all the 'tp' builders for every platform that can be scheduled on that push
* We schedule 5 normal runs + an sps profile run
* Schedule missing builds if necessary
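
The scheduling step in the list above can be sketched roughly as follows. This is only an illustration: the builder names are made up, the function `plan_baseline_and_profile` is hypothetical, and the `mozharness_extra_parameters` property is the one proposed in comment 2, not an existing interface:

```python
def plan_baseline_and_profile(builders, suite, normal_runs=5):
    """Plan N normal runs plus one profiled run for every builder of a suite.

    `builders` is a list of builder-name strings; a builder is taken to match
    when the suite name appears as a whole word in it (an assumption).
    """
    plan = []
    for name in builders:
        if suite not in name.split():
            continue
        # Baseline runs: no extra properties.
        for _ in range(normal_runs):
            plan.append({"builder": name, "properties": {}})
        # One extra run with profiling turned on.
        plan.append(
            {
                "builder": name,
                "properties": {"mozharness_extra_parameters": "--spsProfile"},
            }
        )
    return plan


if __name__ == "__main__":
    # Illustrative builder names only.
    builders = ["linux64 talos tp", "win7 talos tp", "linux64 talos svgr"]
    for job in plan_baseline_and_profile(builders, "tp"):
        print(job["builder"], job["properties"])
```

Keeping the profiled run as a separate plan entry with its own properties is what would later let a UI mark it differently from the baseline runs.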

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 8

•

8 years ago

I think that would be good. We normally want extra data points for each job, so doing that and then the additional sps version would be nice. How would we differentiate the spsprofile run? Would the user have to iterate through all the other 6 jobs before finding the one with the artifact?

Armen [:armenzg]

Comment 9

•

8 years ago

I think so.
Once bug 1218537 is fixed they would *not* need to click on "inspect task" to determine which of the tasks has the artifact.

Another option would be for pulse_actions to send an email with direct links to where the artifacts would be found.

Armen [:armenzg]

Comment 10

•

8 years ago

Maybe give the sps profile job a different colour so developers would not retrigger an sps profile job by mistake (if they were expecting a normal run).
Developers put the profiles into a web app.

TODO: find developers to talk about this process with them (mstange, mconley and BenWa).

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 11

•

7 years ago

possibly a dup of bug 1322433

Blocks: 1307197

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 12

•

7 years ago

wlach, this is similar to what you are working on

Robert Wood [:rwood]

Comment 13

•

7 years ago

Looks like this isn't a priority anymore; also it looks more like a treeherder-based feature and not a talos framework issue, so closing it out.

Status: NEW → RESOLVED

Closed: 7 years ago

Resolution: --- → WONTFIX

Mike Conley (:mconley) (:⚙️)

Comment 14

•

7 years ago

This would still be super interesting to have. I agree this is probably more of a Treeherder thing.

Status: RESOLVED → REOPENED

Resolution: WONTFIX → ---

Mike Conley (:mconley) (:⚙️)

Updated

•

7 years ago

Component: Talos → Treeherder

Product: Testing → Tree Management

Version: unspecified → ---

Ed Morley [:emorley]

Comment 15

•

7 years ago

Hi! Reading the bug subject in the bugmail before opening the bug made me think this was not a Treeherder bug, but reading the comments I see that the bug has changed directions a few times, so that may no longer be the base. However I'm still a little unclear on what the use-case / ask is here? It sounds like new jobs need to be scheduled with an additional parameter passed in the try syntax? If so, that seems like something that try fuzzy or similar tools should be doing instead of Treeherder?

Could someone summarise the bug?

Ed Morley [:emorley]

Comment 16

•

7 years ago

Eugh, s/base/case/

Mike Conley (:mconley) (:⚙️)

Comment 17

•

7 years ago

(In reply to Ed Morley [:emorley] from comment #15)
> 
> Could someone summarise the bug?

Yeah, sure, I can try.

Sometimes we get reports of a Talos performance regression on one of our patches. One of our key diagnostic tools for a performance regression is getting Gecko Profiler profiles from the Talos machines for the test runs.

In order to get those profiles, it's necessary for us to push to try with the mozharness: --geckoProfile argument to the try syntax. This means, in the worst case, spinning up a new build, which can take a while.

In actuality, re-running the Talos test with profiling enabled shouldn't require a re-build. This bug is a feature request that allows us to look at a regressing revision in Treeherder, and go "Oh, okay, please re-run that suite of Talos tests, but give me profiles for them".

Is that sufficient?

Flags: needinfo?(emorley)

Ed Morley [:emorley]

Comment 18

•

7 years ago

Ah yes that's helpful - I think I was missing the part about not needing a rebuild to enable profiles.

So the steps for this are:
- Devise a way to schedule new test jobs on an existing push, but that have additional parameters set. (Reading back the comments it seems there's bug 1322433 and friends for something similar? I really don't know much about that)
- Create tooling that:
 * Makes it easy to select the correct subset of jobs
 * Reschedules those jobs using the above functionality with the appropriate sps profile options set
 * (Optionally) Makes it easy to find the URLs for the uploaded profiles and compare them

I agree this definitely sounds like a workflow that should be improved. 

My first concern is just that we try to avoid hardcoding too much test suite related business logic into Treeherder or anything else that isn't in mozilla-central. ie If Treeherder could schedule a second decision task (or something similar) that ran these steps based on a config in mozilla-central that would be ideal (this would make it easy to add mach support for the feature too). I'm presuming this isn't the only use-case that would follow this pattern so it seems worth having discussions between the taskcluster, treeherder and test automation teams to decide the best way forwards?

Flags: needinfo?(emorley)

Ed Morley [:emorley]

Updated

•

7 years ago

Summary: find a way to quickly "retrigger" a job to generate a sps profile for talos → Find a way to quickly "retrigger" tests on an existing build to generate an sps profile for talos

Mike Conley (:mconley) (:⚙️)

Updated

•

7 years ago

Flags: needinfo?(mconley)

Cameron Dawson [:camd]

Comment 19

•

7 years ago

Bstack just added a way to initiate jobs that sounds like it may help here.  Or perhaps we can augment it to handle the case you want.  Mike: would you check out the drop-down on a given push in Treeherder and select "Custom push action..." and see if that's sufficient?  Or if it looks like a mod there may get you where you want to go?

Mike Conley (:mconley) (:⚙️)

Comment 20

•

7 years ago

Hi emorley, camd,

This "Custom push action..." business sounds like it might do what I want, but it seems to want some kind of job syntax that I don't have. There appears to be a dropdown that presumably lets me choose job parameters from a template or something... would it be possible to add a template that re-runs one or more talos test suites with mozharness: --geckoProfile ?

Flags: needinfo?(mconley) → needinfo?(cdawson)

Stuart Philp :sphilp

Comment 21

•

7 years ago

Greg, what do you think would be required in taskcluster to accomplish this?

wlach, it sounds like you tried something similar to this, can you summarize a bit what you found?

Flags: needinfo?(wlachance)

Flags: needinfo?(garndt)

William Lachance (:wlach)

Comment 22

•

7 years ago

(In reply to Stuart Philp :sphilp from comment #21)
> Greg, what do you think would be required in taskcluster to accomplish this?
> 
> wlach, it sounds like you tried something similar to this, can you summarize
> a bit what you found?

Yeah, action tasks would be the way to implement this, assuming talos is running on taskcluster. I was working on this in the early part of the year:

https://wlach.github.io/blog/2017/04/easier-reproduction-of-intermittent-test-failures-in-automation/

I'm not up-to-date on where the action task stuff is these days, but I know Brian Stack and others on the tc team have been pushing this forward. Greg can no doubt give more detail.

Flags: needinfo?(wlachance)

Cameron Dawson [:camd]

Comment 23

•

7 years ago

It looks like Talos is a mixture of BuildBot and Taskcluster initiated jobs.

Adding bstack since he wrote the original "Custom push action..." impl.  Is there a template that could be added for what's described in comment 20?

Flags: needinfo?(cdawson)

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 24

•

7 years ago

osx jobs are 100% taskcluster; linux/win7/win10 are scheduled via taskcluster and run via buildbot-bridge on buildbot. When we switch to the new hardware in the coming months everything will be on taskcluster, but that could take a few months, maybe even 6.

Greg Arndt [:garndt]

Comment 25

•

7 years ago

(In reply to Stuart Philp :sphilp from comment #21)
> Greg, what do you think would be required in taskcluster to accomplish this?
> 
> wlach, it sounds like you tried something similar to this, can you summarize
> a bit what you found?

Action tasks can now be defined in tree.  Here is some documentation about them:
http://firefox-source-docs.mozilla.org/taskcluster/taskcluster/actions.html

Here is an existing action task for inspiration:
https://dxr.mozilla.org/mozilla-central/source/taskcluster/taskgraph/actions/retrigger.py#20

I believe something like this might be a "retrigger talos with parameters" type action, such as the mochitest action:
https://dxr.mozilla.org/mozilla-central/source/taskcluster/taskgraph/actions/mochitest_retrigger.py#33

These will then show up in the custom action menu within TH.
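
As a rough illustration of what a "retrigger talos with parameters" action would do to a task before resubmitting it, here is a self-contained sketch on plain dictionaries. It does not use the in-tree action API. The payload shape (a flat `payload.command` list), the `-p` symbol suffix, and the function name `with_gecko_profile` are all assumptions made for the example:

```python
import copy


def with_gecko_profile(task_def, flag="--geckoProfile"):
    """Return a copy of a task definition with profiling enabled.

    Assumes `payload.command` is a flat list of strings; real task payloads
    may be shaped differently depending on the worker.
    """
    new_task = copy.deepcopy(task_def)  # never mutate the original task

    command = new_task["payload"]["command"]
    if flag not in command:
        command.append(flag)

    # Give the profiled job a distinct Treeherder symbol so it is easy to
    # tell apart from normal runs (the "-p" suffix is an assumption).
    treeherder = new_task.setdefault("extra", {}).setdefault("treeherder", {})
    treeherder["symbol"] = treeherder.get("symbol", "") + "-p"
    return new_task


if __name__ == "__main__":
    original = {
        "payload": {"command": ["run-talos", "--suite", "tp"]},
        "extra": {"treeherder": {"symbol": "tp"}},
    }
    profiled = with_gecko_profile(original)
    print(profiled["payload"]["command"])
    print(profiled["extra"]["treeherder"]["symbol"])
```

Working on a deep copy keeps the original task definition intact, so the same input can be used for both the normal retriggers and the profiled one.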

Flags: needinfo?(garndt)

Ed Morley [:emorley]

Updated

•

7 years ago

Component: Treeherder → Treeherder: Job Triggering & Cancellation

Ed Morley [:emorley]

Comment 26

•

7 years ago

What Greg suggested sounds great - most of the business logic living in-repo (where it's more easily maintained), with Treeherder only needing to know enough to trigger that custom task.

Comment 27

•

7 years ago

Is --spsProfile related to --gecko-profile?

If so, bug 1412009 can be duped to this one.

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 28

•

7 years ago

spsProfile was renamed to geckoProfile a year or so ago

Ed Morley [:emorley]

Updated

•

7 years ago

Updated

•

7 years ago

Blocks: 1411995

Ed Morley [:emorley]

Updated

•

7 years ago

Summary: Find a way to quickly "retrigger" tests on an existing build to generate an sps profile for talos → Find a way to quickly "retrigger" tests on an existing build to generate a geckoprofile for talos

Ed Morley [:emorley]

Comment 30

•

6 years ago

Going by comment 25, can (/needs to) be implemented in-tree, so this belongs in the Talos component instead.

Status: REOPENED → NEW

Component: Treeherder: Job Triggering & Cancellation → Talos

Product: Tree Management → Testing

Summary: Find a way to quickly "retrigger" tests on an existing build to generate a geckoprofile for talos → Create an action task that allows retriggering tests on an existing Taskcluster build to generate a geckoprofile for talos

Version: --- → unspecified

Ed Morley [:emorley]

Comment 31

•

6 years ago

Is this a dupe of bug 1465117?

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 32

•

6 years ago

99% a duplicate. I would like to make this more of a one-click action than "custom action, <edit a bunch>, click ok". I think we can avoid the <edit a bunch> step and make it a hardcoded custom action.

Ed Morley [:emorley]

Comment 33

•

6 years ago

Yeah that definitely sounds like a good idea (making it a one-click). I meant more that in concept at least, this bug is a dupe of the two-parter "add action task in bug 1465117 + add treeherder UI parts in bug <TODO>"? :-)

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 34

•

6 years ago

Attached file Bug 1241535 - add support for 'geckoprofile' action task in-tree. r=bstack — Details

Add support for 'geckoprofile' action task in-tree.

GitHub Bugzilla PR Linker

Comment 35

•

6 years ago

Attached file Link to GitHub pull-request: https://github.com/mozilla/treeherder/pull/4128 — Details

Pulsebot

Comment 36

•

6 years ago

Pushed by jmaher@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ab32d2dc93e2
add support for 'geckoprofile' action task in-tree. r=bstack

Narcis Beleuzu [:NarcisB]

Comment 37

•

6 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/ab32d2dc93e2

Status: NEW → RESOLVED

Closed: 7 years ago → 6 years ago

status-firefox64: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla64

Sylvestre Ledru [:Sylvestre]

Updated

•

6 years ago

Assignee: nobody → jmaher

Treeherder GitHub Bugbot

Comment 38

•

6 years ago

Commit pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/fd20a61c8e3216dd0a5db244e86c71aafa6ea4ce
Bug 1241535 - Add support to job actions for collecting gecko profiles of performance tests. r=camd (#4128)
