Closed Bug 1519599 Opened 2 years ago Closed 2 years ago

"Retrigger all" in the pinboard should minimize the number of action tasks spawned.

Categories

(Tree Management :: Treeherder: Job Triggering & Cancellation, enhancement)

Type: enhancement
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: KWierso, Assigned: KWierso)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

Treeherder has an option to retrigger all jobs currently pinned to the pinboard with a single click[1].

It does this by looping through each pinned job, and triggering the "retrigger" action/hook for each one.

This works, but it means that the number of action tasks spawned is equal to the number of pinned jobs, which can quickly exhaust the pool of decision task workers able to handle things like this. It'd be better if Treeherder could add all of the jobs to a single action task, which then can be processed on a single decision task worker and still trigger all of the requested tasks.

:bstack mentioned[3] that it wouldn't be too hard to do something like this, though I'm not sure if it's something that would require a new taskcluster hook ("retrigger-multiple" or something), or if the "retrigger" hook could be adapted to accept and process multiple tasks.

As the chatlog said, pinned jobs from multiple pushes could be handled either with a single action task for everything, which spawns the fewest action tasks, or with one action task per push, which makes it clearer that things are being done for each push.
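To make the trade-off concrete, here is a minimal sketch that counts how many action tasks each approach would spawn for a set of pinned jobs. The helper name `countActionTasks`, the strategy names, and the `pushId` field are all hypothetical, chosen only for illustration; they are not taken from Treeherder.

```javascript
// Hypothetical helper: counts the action tasks each strategy would spawn.
// `pushId` is an assumed field name for the push a pinned job belongs to.
function countActionTasks(pinnedJobs, strategy) {
  switch (strategy) {
    case 'per-job':
      // Current behaviour: one action task per pinned job.
      return pinnedJobs.length;
    case 'per-push':
      // One action task for each distinct push among the pinned jobs.
      return new Set(pinnedJobs.map((job) => job.pushId)).size;
    case 'single':
      // One action task for everything (none if nothing is pinned).
      return pinnedJobs.length > 0 ? 1 : 0;
    default:
      throw new Error(`Unknown strategy: ${strategy}`);
  }
}

const examplePinned = [
  { id: 1, pushId: 'A' },
  { id: 2, pushId: 'A' },
  { id: 3, pushId: 'B' },
];
console.log(countActionTasks(examplePinned, 'per-job')); // 3
console.log(countActionTasks(examplePinned, 'per-push')); // 2
console.log(countActionTasks(examplePinned, 'single')); // 1
```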

I'm wondering if this would be as easy as changing [4] so it includes multiple objects? Something like:

  {
    action: retriggerTask,
    decisionTaskId,
    taskId: results.originalTaskId,
    input: {},
    staticActionVariables: results.staticActionVariables,
  },
  {
    action: retriggerTask2,
    decisionTaskId2,
    taskId: results2.originalTaskId,
    input: {},
    staticActionVariables: results2.staticActionVariables,
  },
  {etc},{etc}...
);

... and then handling that on the Taskcluster side as a single action task?

  1. https://github.com/mozilla/treeherder/blob/6eeff4bafe3f87829304d36926ff4ddfd943235c/ui/job-view/details/PinBoard.jsx#L358-L367
  2. https://github.com/mozilla/treeherder/blob/6eeff4bafe3f87829304d36926ff4ddfd943235c/ui/models/job.js#L114-L152
  3. https://mozilla.logbot.info/taskcluster/20190112#c15821461-c15821533
  4. https://github.com/mozilla/treeherder/blob/6eeff4bafe3f87829304d36926ff4ddfd943235c/ui/models/job.js#L133-L139

I may not be understanding the "as easy as" but there's no reason to change the actions API to support this. The retrigger-multiple action would be a taskgroup-level action that takes a list of taskIds to retrigger. That's similar to the cancel-all action that already exists.

I guess my questions are really:

  1. Can retriggering a batch of tasks possibly spanning multiple vcs pushes via a single submission to Taskcluster actions/hooks be done without changes to TC?
  2. If so, what would that submission look like?
  3. If not currently possible, how hard would it be to make it possible?

Instead of going through the list of tasks and submitting each of them to TC individually, loop through the list of tasks and build up an object/array containing each of the tasks' metadata before then submitting that one object/array to TC.
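A rough sketch of that idea, under stated assumptions: the payload shape (`{ tasks: [...] }`) and the function names are made up for illustration, and `submit` is a stand-in callback rather than any real Taskcluster or Treeherder call. The point is only that the list is built first and then submitted once.

```javascript
// Sketch only. The payload shape and all names here are assumptions.
function buildBatchSubmission(tasks) {
  // Collect each task's metadata into one array.
  return {
    tasks: tasks.map(({ taskId, decisionTaskId }) => ({ taskId, decisionTaskId })),
  };
}

// `submit` stands in for whatever actually sends the request; it is passed in
// so the example runs without a network connection.
function retriggerAll(tasks, submit) {
  if (tasks.length === 0) return undefined;
  // One submission for the whole batch, instead of one per task.
  return submit(buildBatchSubmission(tasks));
}

const sentPayloads = [];
retriggerAll(
  [
    { taskId: 'task-1', decisionTaskId: 'decision-A' },
    { taskId: 'task-2', decisionTaskId: 'decision-B' },
  ],
  (payload) => sentPayloads.push(payload),
);
console.log(sentPayloads.length); // 1
console.log(sentPayloads[0].tasks.length); // 2
```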

  1. Yes
  2. It'd look a lot like the backfill action

Could treeherder group the tasks by push, and call the hook once per push?

While we're at it, might as well make sure we do something similar for canceling multiple jobs at once, since it has a very similar implementation: https://github.com/mozilla/treeherder/blob/6eeff4bafe3f87829304d36926ff4ddfd943235c/ui/models/job.js#L177-L219

Summary: Add (or adapt) a hook to retrigger multiple jobs with fewer action tasks being created. → Add (or adapt) a hook to retrigger/cancel multiple jobs with fewer action tasks being created.

I can do this without TC changes, at least for the retrigger case, so I'm going to do that. I think we can drop the "cancel pinned" part, since those action tasks seem to finish very quickly.

I'm going to batch up the pinned jobs by push and then send a single request per push to the add-new-jobs hook, rather than editing the retrigger hook to accept multiple tasks or writing a new "retrigger-multiple" hook that specifically handles this case. That should have the same results as the old "retrigger all" did, with significantly fewer action tasks spawned. Previously it was one action task per requested job; now it will be one action task per push that has requested jobs.
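The grouping step might look roughly like the sketch below. It is not the code from the eventual PR: the field names `push_id` and `job_type_name`, the `tasks` key in the action input, and both function names are assumptions made for illustration.

```javascript
// Assumed field names: `push_id` and `job_type_name` on each pinned job, and
// `tasks` as the list of task labels in the action input. None are confirmed.
function groupLabelsByPush(pinnedJobs) {
  const byPush = new Map();
  for (const job of pinnedJobs) {
    if (!byPush.has(job.push_id)) byPush.set(job.push_id, []);
    byPush.get(job.push_id).push(job.job_type_name);
  }
  return byPush;
}

// Builds one request description per push; each would become one action task.
function buildPerPushRequests(pinnedJobs) {
  return [...groupLabelsByPush(pinnedJobs)].map(([pushId, labels]) => ({
    pushId,
    input: { tasks: labels },
  }));
}

const perPushRequests = buildPerPushRequests([
  { push_id: 101, job_type_name: 'test-linux64/opt-mochitest-1' },
  { push_id: 101, job_type_name: 'test-linux64/opt-xpcshell-2' },
  { push_id: 102, job_type_name: 'build-win64/debug' },
]);
console.log(perPushRequests.length); // 2
```

Three pinned jobs across two pushes yield two requests here, where the per-job loop would have made three.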

PR to come at some point.

Assignee: nobody → wkocher
Component: Hooks → Treeherder: Job Triggering & Cancellation
Product: Taskcluster → Tree Management
Summary: Add (or adapt) a hook to retrigger/cancel multiple jobs with fewer action tasks being created. → "Retrigger all" in the pinboard should minimize the number of action tasks spawned.
Version: unspecified → ---

Just filed bug 1520101, which would speed this up a bit further by cutting the number of frontend-to-backend roundtrips needed to fetch the details of all requested jobs down to one.

See Also: → 1520101

Comment on attachment 9036502
Link to GitHub pull-request: https://github.com/mozilla/treeherder/pull/4460

Left some comments :-)

Attachment #9036502 - Flags: review?(emorley)

Commit pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/4206d626713763e8bc827a72aaa56b469652f149
Bug 1519599 - Make 'retrigger all pinned jobs' create action tasks per push involved rather than per requested job (#4460)

Retrieve job info in one batch rather than loop through them all individually

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED

This is still an issue -- for example, jmaher ran 600+ add-tasks actions on
https://treeherder.mozilla.org/#/jobs?repo=try&revision=fb11f80287df2c50f82d140a5d5f6e739b02f939
many of which failed because actions just don't scale that high -- and this also would have blocked execution of new pushes for an hour or two.

It probably makes sense to check in with Joel regarding what massive-scale actions he needs to run and make sure we can support those.
