Closed Bug 1465117 Opened Last year Closed Last year

Update the backfill action task to support performance sheriffing/test-verify workflows

Categories

(Taskcluster :: General, defect)

defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jmaher, Assigned: bstack)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

there are 2 specific use cases here:
1) a performance regression and we want to edit/retrigger a specific job and add '--geckoProfile' to the mozharness options, and change the treeherder symbol to add a -p to indicate this is a profile run (ex: 'tp' -> 'tp-p').  We want to initiate this on push X and have the same change run on push X-1
2) test-verify: when a test fails inside of test-verify, on a push X we want to run the same test on push X-1.  We can do force the list of tests to run by edit+retrigger a job to have an extra environment variable: |MOZHARNESS_TEST_PATHS: '<testname>'|.  Ideally this should be run with an different symbol (ex: 'TV' -> 'TV-b').

What is in common is:
1) add a value to an existing job via edit/retrigger
2) adjust the job name on treeherder to distinguish it from a normal run
3) initiate it on push X and have it run on push X-1

I do not know the magic needed here, possibly there is some easy work we can do with action tasks to make this less error prone and closer to a "one click" operation.
:bstack- would you have some thoughts on how we could achieve this?  If there are other ways to simplify this workflow from our current way of doing things, then I would be happy to edit our other tooling to work in that fashion.
Flags: needinfo?(bstack)
as a note, using our existing 'backfill' feature doesn't retain any of the customizations that I do (such as treeherder symbol, mozharness options, env vars)
This sounds like it would make a good action task.  The "X-1" sounds like backfilling, while the adding of options is something retriggering already does.  So this is just combining some existing behaviors.
Yep, I agree with dustin!
Flags: needinfo?(bstack)
how can we move forward with this?  My use of retriggering and backfilling doesn't work for this- but that is the interface built into treeherder.  Is this something I could configure with existing interfaces to action tasks between treeherder/taskcluster?  

I don't have ideas of starting points and to be honest I don't understand the terminology, even when looking online for action tasks.
:bstack, can you help me find some places to start here- I would be happy to hack around for a couple days with a few places to start.
Flags: needinfo?(bstack)
I'm going to hack up the action tasks myself and then put them up for review. After that we can try to bug some treeherder folks to wire them into the UI like backfilling is now!
Flags: needinfo?(bstack)
Assignee: nobody → bstack
Status: NEW → ASSIGNED
Attachment #8982744 - Flags: review?(jmaher)
Attachment #8982744 - Flags: review?(dustin)
Ok, I've created a patch that should allow backfilling to do what you wanted if I've understood what you want!

Once we land this, people can use this the hard way by using the "custom action..." button on treeherder and filling in the fields the way they want. It should be a couple hours of work at most for someone to make dedicated buttons in treeherder for this after that!
Comment on attachment 8982744 [details]
Bug 1465117 - Take two at updating backfill task. Use modifier this time.

https://reviewboard.mozilla.org/r/248638/#review255108

Joel, hopefully this helps identify how actions like this are implemented, and how other similar options might be added?

::: taskcluster/mach_commands.py
(Diff revision 3)
>              root = options['root']
>  
>              return taskgraph.actions.trigger_action_callback(
>                      task_group_id=options['task_group_id'],
>                      task_id=task_id,
> -                    task=task,

Whoops - I should have removed this a few weeks ago.  Thanks for the fix!

::: taskcluster/taskgraph/actions/backfill.py:116
(Diff revision 3)
> -            create_tasks(
> -                    [label], full_task_graph, label_to_taskid,
> -                    push_params, push_decision_task_id, push)
> +            task_def = fix_task_dependencies(full_task_graph.tasks[label], label_to_taskid)
> +            task_def['taskGroupId'] = push_decision_task_id
> +
> +            if input.get('addGeckoProfile'):
> +                mh_options = task_def['payload'].setdefault('env', {}) \
> +                                                .get('MOZHARNESS_OPTIONS', '')

I suspect the python lint will not like this indentation - try running `./mach lint -l flake8 -l py2 -l py3 taskcluster/`
Attachment #8982744 - Flags: review?(dustin) → review+
Comment on attachment 8982744 [details]
Bug 1465117 - Take two at updating backfill task. Use modifier this time.

https://reviewboard.mozilla.org/r/248638/#review255160

::: taskcluster/taskgraph/actions/backfill.py:116
(Diff revision 3)
> -            create_tasks(
> -                    [label], full_task_graph, label_to_taskid,
> -                    push_params, push_decision_task_id, push)
> +            task_def = fix_task_dependencies(full_task_graph.tasks[label], label_to_taskid)
> +            task_def['taskGroupId'] = push_decision_task_id
> +
> +            if input.get('addGeckoProfile'):
> +                mh_options = task_def['payload'].setdefault('env', {}) \
> +                                                .get('MOZHARNESS_OPTIONS', '')

It's funny because it looks weird to me too but that actually seems to be want flake8 wants. I had a few previous tries where it was complaining with what I did with that line. Running that now gets:

✖ 0 problems (0 errors, 0 warnings)
Comment on attachment 8982744 [details]
Bug 1465117 - Take two at updating backfill task. Use modifier this time.

https://reviewboard.mozilla.org/r/248638/#review255224

this looks really easy and very hackable.  Thanks for putting this together!
Attachment #8982744 - Flags: review?(jmaher) → review+
:bstack, is it ok to land this so I can play with it live as well as work with the treeherder team to build a "one click" solution?
Flags: needinfo?(bstack)
Ah yes! Please feel free to land. If you can promptly try a normal backfill afterward to see if it works, I would be happy.

I don’t _think_ this will conflict with dustin’s most recent changes, but if it does I’ll fix.
Flags: needinfo?(bstack)
Pushed by jmaher@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/3d3fe54d0cb6
Add additional options to backfilling action task r=dustin,jmaher
leave-open to verify and get treeherder UI working.
Keywords: leave-open
Backed out 1 changesets (bug 1465117) for fix_task_dependencies not working as expected

backed out revision: https://hg.mozilla.org/mozilla-central/rev/3d3fe54d0cb6

backout: https://hg.mozilla.org/mozilla-central/rev/e16a3d8427b8bb6ceeb0b970702fc2eaccd997b8
Flags: needinfo?(bstack)
thanks for backing this out.  As a note, the MOZHARNESS_TEST_PATH worked well, the --geckoProfile didn't, but when this lands again I can fiddle and figure out what is wrong.  I also need to adjust the MOZHARNESS_TEST_PATH to actually launch --test-verify instead, or consider adding an additional cli param; details I can sort out and fine tune.
Flags: needinfo?(bstack)
Attachment #8982744 - Flags: review+ → review?
Attachment #8982744 - Flags: review? → review?(dustin)
Attachment #8982744 - Flags: review?(dustin)
Attachment #8982744 - Flags: feedback?(dustin)
Comment on attachment 8982744 [details]
Bug 1465117 - Take two at updating backfill task. Use modifier this time.

https://reviewboard.mozilla.org/r/248638/#review258230

One relatively small fix..

::: taskcluster/taskgraph/actions/util.py:122
(Diff revisions 3 - 4)
>      this function to create new tasks,
>      allowing easy debugging with `mach taskgraph action-callback --test`.
>      This builds up all required tasks to run in order to run the tasks requested.
>  
> +    Optionally this function takes a `modifier` function that is passed in each
> +    task before it is put into a new graph. It should return a valid task.

I like this!

::: taskcluster/taskgraph/actions/util.py:135
(Diff revisions 3 - 4)
>      full_task_graph = copy.deepcopy(full_task_graph)
>      label_to_taskid = label_to_taskid.copy()
>  
>      target_graph = full_task_graph.graph.transitive_closure(to_run)
>      target_task_graph = TaskGraph(
> -        {l: full_task_graph[l] for l in target_graph.nodes},
> +        {l: modifier(full_task_graph[l]) for l in target_graph.nodes},

This will apply to *every* task in the transitive closure, where I think the caller expects it to only apply to tasks in `to_run`.  I suppose it might be useful in some cases to modify all tasks, so perhaps the right fix here is to document the set of tasks passed to modifier, and to add some filtering in the backfilling modifier function that limits its effects to `to_run`.
Attachment #8982744 - Flags: review-
Attachment #8982744 - Flags: feedback?(dustin)
Comment on attachment 8982744 [details]
Bug 1465117 - Take two at updating backfill task. Use modifier this time.

https://reviewboard.mozilla.org/r/248638/#review258318
Attachment #8982744 - Flags: review?(dustin) → review+
Keywords: checkin-needed
Pushed by csabou@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2731368a39e5
Take two at updating backfill task. Use modifier this time. r=dustin,jmaher
Keywords: checkin-needed
as a note, this works great for --geckoProfile, I am working on tweaking the parameters I didn't think through for the backfill/retrigger task in bug 1471227.  Once that is done (solved, reviewed, landed, tested) I will move this to the treeherder team to make this more of a "one click" solution for the two scenarios.
now that bug 1471227 is completed, I want to use it for a few days before recommending we make the treeherder changes.

From my testing I have high confidence we are ready.

I see a few things here:
1) for talos jobs adding the --geckoProfile will be straightforward, a few default to set on the existing backfill job and ensuring this is matched on talos jobs (will be raptor as well in the near future)
2) for test-verify stuff, this gets trickier, in general this pattern works well for mochitest*/web-platform/xpcshell/reftest/crashtest/jsreftest (not gtest, junit, robocop, cppunittest, marionette, firefox-ui, talos, etc.):
* change |depth: 1|
* change |inclusive: true|
* add |testPath: <path>|
3) using <path>, that will be difficult as we need a full test path and this doesn't work on all test types.  In the failure_line.test field, we can find the <path> we are looking for.  There are simple rules and we need to adjust a few paths:
* if path starts with /, add testing/web-platform/tests to the beginning of the path
* if path is like file:///Users/cltbld/tasks/task_1520511336/build/tests/jsreftest/tests/jsreftest.html?test=non262/Array/regress-488989.js, replace |*jsreftest.html?test=| with |/js/src/tests/|
* there will be quirks on android tests and other reftests, but I think we can get 95% of cases with the above.
here is an example of android reftest failure message:
REFTEST TEST-UNEXPECTED-FAIL | http://10.0.2.2:8854/tests/layout/reftests/bidi/dirAuto/dynamicDirAuto-setLTR-NoDir3.html == http://10.0.2.2:8854/tests/layout/reftests/bidi/dirAuto/dynamicDirAuto-refLTR-LTR.html | image comparison, max difference: 1, number of differing pixels: 1
another exception:
testing/web-platform/tests/WebCryptoAPI/generateKey/failures_RSA-OAEP.https.any.html

needs this file:
testing/web-platform/tests/WebCryptoAPI/generateKey/failures_RSA-OAEP.https.any.js

that is rare though
See Also: → 1443867
Brian: is there more work for you/the TC team here? It's unclear to me from the last few comments.
Flags: needinfo?(bstack)
I don't believe there's anything for tc team to be doing at this time.
Flags: needinfo?(bstack)
Putting this on the Treeherder teams radar, per comment #18.
Assignee: bstack → nobody
Status: ASSIGNED → NEW
Component: General → Treeherder: Job Triggering & Cancellation
Product: Taskcluster → Tree Management
Version: unspecified → ---
Could someone file a new bug in the Treeherder component with a description of:
* what changes you'd like to see to the UI (ie how best to expose these features?)
* what the backfill actions payload should look like for them

My preference would be for us to move away from sharing bugs for changes to different systems, since:
* in-tree vs out of tree have different workflows (eg it's likely we'll need this backported to older branches, which will get even more confusing once there are treeherder GitHub PRs in here)
* it's easier to follow status/next steps/what's being requested when there are not multiple attachments/50 comments etc
* bugs are cheap

Many thanks :-)
Assignee: nobody → bstack
Status: NEW → RESOLVED
Closed: Last year
Component: Treeherder: Job Triggering & Cancellation → General
Product: Tree Management → Taskcluster
Resolution: --- → FIXED
Summary: create a method which is easily initiated in treeherder by the sheriffs to retrigger a job with extra options on 2 revisions → Update the backfill action task to support performance sheriffing/test-verify workflows
Version: --- → unspecified
It's worth noting, I think the best "flow" for creating new actions is to initially develop them in-tree, using the "Custom Action.." option in treeherder to test and experiment with them.  Once that's fully landed and if necessary uplifted to the relevant trees, only then would a new bug be filed to add some UI bindings for the action to Treeherder.
You need to log in before you can comment on or make changes to this bug.