Closed Bug 1496768 Opened 7 years ago Closed 6 years ago

Retriggered tasks don't apply try_task_config.json templates (e.g. when passing --env to mach try fuzzy)

Categories

(Firefox Build System :: Task Configuration, task, P2)

task

Tracking

(firefox72 fixed)

RESOLVED FIXED
mozilla72
Tracking Status
firefox72 --- fixed

People

(Reporter: sparky, Assigned: ahal)

References

Details

Attachments

(3 files, 1 obsolete file)

This bug is for the issue where a test that is scheduled with |mach try fuzzy <TEST/PATHS>| runs fine on the first test run [1]. But if the test run is retriggered, the MOZHARNESS_TEST_PATHS environment variable is no longer defined and results in no tests being run [2]. [1]: https://treeherder.mozilla.org/#/jobs?repo=try&selectedJob=201089393&revision=f90b8cdb2cefb9e39f37db54415645df0055ae69 [2]: https://treeherder.mozilla.org/#/jobs?repo=try&selectedJob=203191228&revision=f90b8cdb2cefb9e39f37db54415645df0055ae69
I have seen this when retriggering TV-bf jobs which uses MOZHARNESS_TEST_PATHS.
Just noting that this env gets added in the 'morph' stage of taskgraph generation. So my guess is that the retrigger.py action task is downloading an artifact from the decision task where this hasn't been added in yet.
Blocks: 1496593
Ah, so a case where "morph" is actually changing the meaning of the taskgraph. It sounds like we should instead set that variable during the full task-graph generation.
See Also: → 1501213
See Also: 1501213
Trying to make this more findable...
Summary: Retriggered test chunks do not run tests due to missing MOZHARNESS_TEST_PATHS environment variable → Retriggered test chunks do not run tests due to missing MOZHARNESS_TEST_PATHS environment variable (e.g. when passing a path to mach try fuzzy)
Btw, you can pass in |mach try fuzzy --rebuild <num>| to retrigger tasks up front. I know this isn't an acceptable solution, but just mentioning it as it be a good workaround for some use cases. (In reply to Dustin J. Mitchell [:dustin] pronoun: he from comment #3) > Ah, so a case where "morph" is actually changing the meaning of the > taskgraph. It sounds like we should instead set that variable during the > full task-graph generation. Sure, we'd need to invent some kind of new step that runs after the full task graph stage. This is that json-e templates thing we landed awhile back btw: https://searchfox.org/mozilla-central/source/taskcluster/taskgraph/morph.py#162
Actually, the problem with applying the templates after the full taskgraph is that we needlessly modify ~16000 tasks as opposed to just the set of tasks that the user specifies. So it kind of makes sense to me that this is a morph. The full taskgraph is a constant that depends on the in-tree taskgraph. The morphed taskgraph can be changed by user input. (I know this doesn't match up with your definition of a morph dustin :)) Another way to solve this bug would be to modify the retrigger action task to look at the morphed taskgraph instead of the full (I assume this is what it's doing, but haven't investigated deeply).
The problem is, things assume that the definition of morph is accurate and that tasks can be re-created from the full task graph. For retriggers, yes, you could look at the morphed graph (or just at the task in the Queue API), but that won't work for things that add new jobs such as backfilling. I like the idea of adding a new phase makes sense, although maybe it needs a different formulation than as json-e templates. There's already a filtering stage, where the name suggests it doesn't modify the tasks -- but in fact legacy try syntax *does* modify tasks at that stage. So maybe we should just call that something other than "filter". If efficiency is an issue (16000 json-e invocations is a lot!), perhaps the templates could specify a regexp for kinds / tasks they apply to?
Looking at what we use templates for, I wonder if they are a better solution for what they are trying to accomplish, than adding logic into the transforms that does what they currently want to do. If they are, it might be better to move the handling of them into a transform, rather than a separate step.
The problem with using a transform is the same as putting it after the full taskgraph (it needlessly invokes json-e for every task). And it's actually worse than one invocation per task, it's N invocations per task (where N is the number of templates being used). A better idea might be to put it at the end of the "target_tasks_try" method. This way it'll only apply to the target_task set. This is still more tasks than the optimized set, but probably not by a lot, especially on try where SETA doesn't run.
While there are some exceptions (which do filtering within the template itself), the templates should generally apply to all target tasks (or better yet optimized tasks). I don't think it'll be possible to figure this out with a regex.
> A better idea might be to put it at the end of the "target_tasks_try" method. iirc this is where the try-syntax modifications take place, so it makes sense
Summary: Retriggered test chunks do not run tests due to missing MOZHARNESS_TEST_PATHS environment variable (e.g. when passing a path to mach try fuzzy) → Retriggered tasks don't apply try_task_config.json templates (e.g. when passing --env to mach try fuzzy)

In addition to Bug 1541424, which Gijs just pinged on, this bug is also causing a lot of confusion. With the combination of the two, I probably sank half a week's work into trying to fix failing tests (on Bug 1552565) when it wasn't broken.

Flags: needinfo?(ahal)

I agree this is important. But my answer has to be the same as the other bug, I won't have much time to look at it until H2. Feel free to ping me again if there hasn't been any activity after the all-hands.

Flags: needinfo?(ahal)
Priority: -- → P2

Hey Andrew, now is H2 and some months after the all hands. Do you have more visibility on when you or somebody else could work on this?
Thanks!

Flags: needinfo?(ahal)

Thanks for the reminder!

I'm still pretty busy, but I'll leave the needinfo for now and maybe find some time to tackle this in the next few weeks.

Ftr, I'm not saying this isn't important.. but it's tricky and not super straightforward to solve. I don't think I'll be getting to this anytime this quarter. If anyone else wants to take a stab at it I'd be happy to provide guidance.

Flags: needinfo?(ahal)

Ok, saw a few more complaints about this and my current project is blocked on reviews right now.. I'll see if I can get something together today or tomorrow.

Assignee: nobody → ahal
Status: NEW → ASSIGNED

Creates a 'transforms' parameter which is a list of module paths pointing to
TransformSequences. These transforms run after the normal kind-defined ones.

The intent is to have the ability to apply global transforms depending on some
condition. For example, transforms that only apply on 'try'.

Also converts the 'artifact' from a Template to a TryConfig. So this changes the config from:

{
  'templates': {
    'artifact': {
      'enabled': 1
    }
  }
}

to:

{
  'use-artifact-builds': True
}

Depends on D51415

Handles 'env' and 'chemspill-prio' configs in the transforms. The 'rebuild'
task config is purposefully excluded from the full_task_graph and instead
handled at the target_tasks stage. Otherwise if a user ran '--rebuild 20' then
retriggered one of those tasks, they'd instead get another 20 which is almost
certainly not what we want.

Depends on D51416

The name 'templates' originally came from the fact that JSON-e templates were
used to apply the configuration. Now that these no longer exist, the name
doesn't make any sense.

I'm not sure 'task_configs' is much better, but it at least makes sense given
that these are values that populate 'try_task_config.json'.

Depends on D51417

Attachment #9105810 - Attachment is obsolete: true

Hey Tom, any chance you can take a look at these soon? Would be nice to get this fixed.

Flags: needinfo?(mozilla)
Flags: needinfo?(mozilla)
Pushed by ahalberstadt@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/94f4185fce60 [taskgraph] Rename 'artifact' try_config to 'use-artifact-builds', r=tomprince https://hg.mozilla.org/integration/autoland/rev/640dd0cc1d6d [taskgraph] Move remaining try configs from the morph to full_task_graph generation, r=tomprince https://hg.mozilla.org/integration/autoland/rev/f2900e96e837 [tryselect] Rename 'templates' to 'task_configs', r=tomprince

Thank you for fixing this :ahal!

Regressions: 1869449
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: