Batch together successive retrigger requests into a single (or minimal) action task submission.
Component: Tree Management :: Treeherder (enhancement)
Reporter: KWierso
Assignee: Unassigned
Bug 1519599 is about to land, which will change how retriggers happen for jobs. To support retriggering all jobs actively pinned to the pinboard without spawning an action task per pinned job, it will use the add-new-jobs action to create all of the jobs for each push in a single "add-new" action task.
That's fine for that use case, but this also ends up being what happens for individual jobs retriggered via the 'r' keyboard shortcut or the "retrigger" button for a particular job. Prior to bug 1519599 landing, each retrigger via these methods creates a new "rt" action task. With bug 1519599, each one will create an "add-new" action task. Either way, one action task per retrigger request can quickly overload the pool of workers handling action tasks, causing backups in jobs starting for everyone.
I'd like to go back to using the retrigger action for standalone job retriggers, but make it a bit better.
My plan goes something like:
- When "retrigger()" is called and no more than one job is in the pinboard, start a timer (something like five seconds seems good).
- Pop up a notify() message (with a spinner to show it's doing things, maybe?)
- Store the requested job, with a "times" count of 1.
- If "retrigger()" is called again while the timer's still active and the selected job hasn't changed, add 1 to that job's "times" count and restart the timer.
- If any of the following occur, submit the requested job as a "retrigger" action task with the "times" count included in the request:
** the timer reaches the end
** the selected job changes
** the "times" count reaches the retrigger task's maximum "times" value of 6
For individual jobs being retriggered en masse, this should significantly reduce the number of action tasks spawned, and the action task will be labeled with the "rt" symbol in Treeherder's display rather than "add-new".
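The plan above can be sketched roughly as follows. This is only an illustration in Python, not Treeherder's actual UI code: the class name, the `submit` callback, and the explicit `now` argument (used instead of a real timer so the logic is easy to follow) are all assumptions. The five-second window and the limit of 6 come from the plan.

```python
from typing import Callable, Optional

WINDOW_SECONDS = 5.0  # "something like five seconds" from the plan
MAX_TIMES = 6         # the retrigger action's current "times" limit


class RetriggerBatcher:
    """Collapse repeated retriggers of one job into a single submission."""

    def __init__(self, submit: Callable[[str, int], None],
                 window: float = WINDOW_SECONDS,
                 max_times: int = MAX_TIMES) -> None:
        self._submit = submit          # hypothetical: submit(job_id, times)
        self._window = window
        self._max_times = max_times
        self._job_id: Optional[str] = None
        self._times = 0
        self._deadline: Optional[float] = None

    def retrigger(self, job_id: str, now: float) -> None:
        # Submit an expired batch first so it is not merged with this request.
        self.tick(now)
        # A different job ends the current batch.
        if self._job_id is not None and job_id != self._job_id:
            self.flush()
        self._job_id = job_id
        self._times += 1
        self._deadline = now + self._window  # restart the timer
        # Reaching the schema limit submits immediately.
        if self._times >= self._max_times:
            self.flush()

    def tick(self, now: float) -> None:
        """Submit the pending batch if its timer has run out."""
        if self._deadline is not None and now >= self._deadline:
            self.flush()

    def flush(self) -> None:
        """Submit whatever is pending and reset the state."""
        if self._job_id is not None:
            self._submit(self._job_id, self._times)
        self._job_id = None
        self._times = 0
        self._deadline = None
```

In a real UI, `tick()` would be driven by a timer callback and `flush()` would also be called when the selected job changes.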
Comment 1•7 years ago
Does this overlap with bug 1510002?
Comment 2•7 years ago (Reporter)
Overlap or supplement, I think.
I like the shift-r to pop up a "times?" prompt.
Tracking how many action tasks the user has submitted over the last X minutes, and warning if that gets excessive, might be nice too.
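One possible shape for that kind of tracking is a sliding-window counter. The sketch below is only illustrative: the class name, the five-minute window, and the threshold of 20 are assumptions, since no concrete values have been proposed here.

```python
from collections import deque


class SubmissionMonitor:
    """Count action-task submissions inside a sliding time window."""

    def __init__(self, window_seconds: float = 300.0,
                 threshold: int = 20) -> None:
        # Both defaults are placeholder values, not agreed limits.
        self._window = window_seconds
        self._threshold = threshold
        self._stamps = deque()

    def record(self, now: float) -> bool:
        """Record one submission; return True if a warning is warranted."""
        self._stamps.append(now)
        cutoff = now - self._window
        # Drop submissions that have fallen out of the window.
        while self._stamps and self._stamps[0] <= cutoff:
            self._stamps.popleft()
        return len(self._stamps) > self._threshold
```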
Neither really detracts from what I filed this bug to accomplish. This would work more behind the scenes to make sure the user doesn't do bad things unintentionally.
Dustin: can the retrigger task's max times value be exceeded? It claims to max out at six, but what happens if I were to request "times: 25"?
Comment 3•7 years ago
If you submit with times: 25, it will be rejected as not satisfying the schema. But the schema is defined in-tree, so it can be adjusted. I think we set it to something low just to prevent "times: 500", which tends to cause a backlog and a try tree closure.
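To make the rejection concrete, here is a hand-rolled sketch of the range check such a schema implies. It is not the in-tree schema or the validator actually used: the dictionary shape and the minimum of 1 are assumptions, and only the maximum of 6 comes from this bug.

```python
# Assumed shape of the "times" property; only the maximum of 6 is from the bug.
TIMES_SCHEMA = {"type": "integer", "minimum": 1, "maximum": 6}


def check_times(value, schema=TIMES_SCHEMA):
    """Return a list of violations; an empty list means the input is valid."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(value, int) or isinstance(value, bool):
        return ["times must be an integer"]
    errors = []
    if value < schema["minimum"]:
        errors.append("times is below the minimum of %d" % schema["minimum"])
    if value > schema["maximum"]:
        errors.append("times is above the maximum of %d" % schema["maximum"])
    return errors
```

With the limit at 6, `check_times(25)` reports a violation. Passing a schema whose maximum has been raised, for example `dict(TIMES_SCHEMA, maximum=100)`, makes the same input valid.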
I like the suggestions here! You can feel free to modify things in-tree (e.g., adding "times" to add-jobs, modifying the symbol for add-jobs, or even adding a new action).
One note that may or may not be important: by convention Taskcluster treats the action name "retrigger" as an action that will, given no input, re-create a new version of the targeted task. So for example the Taskcluster UIs look for an action by that name, and taskcluster-github will likely do so soon if you hit "rerun" in the GitHub UI. The convention is written down here.
Comment 4•7 years ago (Reporter)
(via bug 1519599 comment 11)
(In reply to Dustin J. Mitchell [:dustin] pronoun: he from comment #11)
This is still an issue -- for example, jmaher ran 600+ add-tasks actions on https://treeherder.mozilla.org/#/jobs?repo=try&revision=fb11f80287df2c50f82d140a5d5f6e739b02f939 many of which failed because actions just don't scale that high -- and this also would have blocked execution of new pushes for an hour or two.
It probably makes sense to check in with Joel regarding what massive-scale actions he needs to run and make sure we can support those.
Joel, how are you triggering these jobs, and what's the end goal for them? It looks (from the add-new task's to-run.json) that each action task is only triggering a single job, which isn't the most efficient way to do things.
Comment 5•7 years ago
I want to get 100+ instances of specific jobs on pre/post pushes to find intermittents.
There is an action for retrigger, but "times" is limited to 6. I would like an option to retrigger a job 50 times.
Comment 6•7 years ago (Reporter)
Joel: How are you retriggering these jobs? Select in Treeherder and mash "r" 100+ times?
Dustin: I haven't looked too closely at the actions schema. Can we restrict certain actions (or definitions or parts within an action, say "times") to specific accounts or scopes?
I'm all in favor of raising "times" to something like 25 or 50 for everyone, but maybe we can add another action or redefine "times" within the retrigger action to 100 or more for certain people or for people with certain scopes?
Comment 7•7 years ago
Yes, it's a JSON schema so you can apply whatever range restriction you'd like. So changing the range for retrigger is as easy as changing "6" to "100". Adding a "times" option to add-jobs would be little more than adding a for loop to a Python function.
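A rough sketch of what that for loop could look like is below. The function name, its parameters, and the `create_task` callback are hypothetical stand-ins, not the in-tree add-jobs implementation.

```python
def add_new_jobs(labels, times, create_task):
    """Create every requested job label `times` times.

    `create_task(label, attempt)` is a hypothetical callback standing in
    for whatever the action uses to schedule one task.
    """
    created = []
    for label in labels:
        # The extra loop is the only change needed to honor "times".
        for attempt in range(times):
            created.append(create_task(label, attempt))
    return created
```

A single action task would then cover all copies of all requested jobs, rather than one action task per job.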
We could limit this with scopes, but my hunch is that's something we should do only if/when we see abuse.
Comment 8•7 years ago
I hit the 'r' key hundreds of times; it is much faster than doing custom actions.
Comment 9•7 years ago (Reporter)
Did some bug reorganizing today.
Bug 1510002 tracks all of the planned changes.
This bug will just be batching successive requests into single action task submissions via "times".
Bug 1524895 will add a keyboard shortcut to prompt for how many times you want a job to be retriggered via a single action task submission.
Bug 1524905 will hopefully catch suboptimal retrigger requests and direct users to better alternatives.
Other things still to file:
Raise the "times" upper limit in action task definition, uplift it as far as we can.
Add the "times" property to the add-new-jobs action task.
Comment 13•6 years ago (Reporter)
Unassigning myself because I don't have time to push this past the finish line, but if anyone wants to pick up https://github.com/mozilla/treeherder/pull/4851, be my guest.