(In reply to Dustin J. Mitchell [:dustin] pronoun: he from comment #1)
It looks like the retriggers were done individually (there are a lot of action tasks on that push!). That basically runs them in parallel, so they don't "know" about each other and end up independently scheduling dependent tasks.
Hmm… it seems like the scheduling could be done in serial (build in parallel) so the 2nd request would have seen that a build was scheduled already. That's how I assumed it worked.
I see that https://tools.taskcluster.net/groups/Aweo9v9dTg2N-drrs3EeOg/tasks/ESs5_LYfRuao6_u9HX4gIg/details (retriggering linux talos) ran after the linux build (https://tools.taskcluster.net/groups/Aweo9v9dTg2N-drrs3EeOg/tasks/QExa0HvBTri2GY_Yrat4yw/runs/0) was complete, and indeed that linux build was not duplicated.
But the mac builds were duplicated. The distinction seems to be that this mac build was not included in the original decision task, so every action task saw that there was no mac build and created one.
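To make that distinction concrete, here is a toy Python model of the behaviour described above. It is only a sketch under stated assumptions, not the real taskgraph code: the function names, task labels, and task IDs are all hypothetical. It assumes each action gets the same snapshot of already-known tasks and only creates a build when none is visible.

```python
# Toy model only; not the real taskgraph code. All names, labels and
# task IDs below are hypothetical placeholders.

def run_action(known_tasks, wanted_test, build_label, action_id):
    """Return the tasks one action would create, given what it can see."""
    created = {}
    if build_label not in known_tasks:
        # No existing build is visible, so this action schedules its own.
        created[build_label] = f"build-from-action-{action_id}"
    created[wanted_test] = f"test-from-action-{action_id}"
    return created


def count_new_builds(known_tasks, build_label, n_actions):
    """Count builds created when n_actions run without seeing each other."""
    # Every action works from the same snapshot: independent actions
    # never learn what their siblings scheduled.
    snapshot = dict(known_tasks)
    results = [
        run_action(snapshot, f"talos-{i}", build_label, i)
        for i in range(n_actions)
    ]
    return sum(build_label in created for created in results)


# Only the linux build was part of the original decision task.
decision_tasks = {"build-linux64/opt": "linux-build-task-id"}

linux_builds = count_new_builds(decision_tasks, "build-linux64/opt", 3)
mac_builds = count_new_builds(decision_tasks, "build-macosx64/opt", 3)
print(linux_builds, mac_builds)
```

In this model the three actions create no new linux builds, because the decision task already lists one, but each of them creates its own mac build.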
In general, two things will lead to a better experience:
- where possible, schedule what you need up-front in the decision task, rather than addressing it later with retriggers.
mach try fuzzy has a --rebuild option that can do what you need in this case.
That was my intention, and locally ./mach try fuzzy showed the correct output, but the JSON file it generated only had Linux. I realize now that the issue was that I forgot to hit ctrl-a before hitting <enter> in the curses UI. The curses UI also doesn't show the --rebuild or --no-artifact options, which is quite annoying and is why I forgot about them. That's why I liked the trychooser webpage much better: I could see the options in front of me.
- when you must use action tasks, use fewer action tasks with more configuration. The add-new-jobs action allows retriggering multiple jobs (just select them all first) and has a "times" parameter that can give the number of times you'd like to retrigger the selected jobs. That is useful for common cases where you didn't know up-front that the try job would need additional talos runs.
I see… I didn't think that the "Custom Push Action…" dialog would know anything about the jobs that were selected on the push.
There are bugs filed to improve the treeherder UX around the second point -- for example, batching multiple presses of the 'r' key into a single action. I can't find the bugs right now :/
Looking at the logs in the add-new-jobs action for OSX talos, https://tools.taskcluster.net/groups/Aweo9v9dTg2N-drrs3EeOg/tasks/I9gZpQAmR6qqjPcyJhHK3A/details, which ran quite a bit later than the linux action -- it seems to have failed to find label-to-taskid.json for all of the previous actions. So even if the second add-new-jobs action had run after the build had completed, it likely would not have "realized" this and would have still scheduled an extra build. I suspect that's because the actions seem to have only written label-to-taskid-0.json: https://tools.taskcluster.net/groups/Aweo9v9dTg2N-drrs3EeOg/tasks/ESs5_LYfRuao6_u9HX4gIg/runs/0/artifacts
That issue didn't cause the particular problem you're seeing here, but is something we should fix all the same.
Well, I believe that some of my retriggers on other platforms were done after builds had completed, though maybe that was on the other push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=abe22f60f322347af4bba49830c2440a31432387
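For what it's worth, the label-to-taskid lookup failure described above can be pictured with another small Python sketch. Only the two artifact file names come from this bug. The helper, the action names, and the labels are hypothetical, and I am assuming a later action merges whatever label-to-task-ID maps it can fetch from earlier ones.

```python
# Toy model; the two artifact names come from the comment above,
# everything else (helper, action names, labels) is hypothetical.

def collect_labels(artifacts_by_task, filename):
    """Merge label -> taskId maps from every task that published `filename`.

    Returns the merged map plus the tasks where the file was not found.
    """
    merged, missing = {}, []
    for task_id, artifacts in artifacts_by_task.items():
        if filename in artifacts:
            merged.update(artifacts[filename])
        else:
            missing.append(task_id)
    return merged, missing


# Earlier actions that only wrote the "-0" suffixed artifact.
previous_actions = {
    "action-A": {"label-to-taskid-0.json": {"build-macosx64/opt": "mac-build-1"}},
    "action-B": {"label-to-taskid-0.json": {"test-talos": "talos-1"}},
}

# Looking for the unsuffixed name finds nothing, so the build looks absent.
found, missing = collect_labels(previous_actions, "label-to-taskid.json")

# Looking for the name that was actually written finds the existing build.
found_suffixed, missing_suffixed = collect_labels(
    previous_actions, "label-to-taskid-0.json"
)
```

If the lookup really works this way, a later action would miss the existing mac build no matter when it ran, which would match what the logs suggest.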