Add the ability to retrigger all pinned jobs

RESOLVED FIXED

Status

P3
enhancement
RESOLVED FIXED
4 years ago
3 years ago

People

(Reporter: RyanVM, Assigned: vaibhav1994)

Tracking

(Blocks: 1 bug)

Details

Attachments

(2 attachments)

46 bytes, text/x-github-pull-request
mdoglio
: review+
emorley
: feedback+
Details | Review | Splinter Review
46 bytes, text/x-github-pull-request
mdoglio
: review+
Details | Review | Splinter Review
(Reporter)

Description

4 years ago
In the case of mass-failures from infra issues, we are able to quickly pin all affected jobs thanks to TH's wonderful filtering capabilities. However, much of that speed-up is lost if we need to retrigger all those failed jobs since we still need to go through them one-by-one and manually retrigger. It would be great if we had an option exposed in the UI to retrigger all pinned jobs.

Comment 1

4 years ago
+1!
Perhaps a UI implementation here, would be to simply add a "retrigger all" icon to the right of the existing pinboard clear menu button.

So we'd have something like:

"clear all | ['retrigger all']" <- we'd re-use the icon from the job control bar, here
"save all  | [save menu]" <- our existing button dropdown for save would remain unchanged

And perhaps we'd want a confirm box in this case, in the event of accidental retrigger for dozens or hundreds of pinned jobs.
(Reporter)

Comment 3

4 years ago
Very much agreed that this isn't something we want to be easily-accessible/discoverable. I think it should be in a secondary UI rather than staring the user in the face. Maybe we just add a dropdown to the 'clear all' button and put it there?

On an unrelated note, I can't recall a time I've ever actually used the dropdown on the 'save all' button. Didn't even realize those options were there.
Priority: -- → P3
Duplicate of this bug: 1137951
(general musings)
We may also need to provide a graceful way of handling the retrigger messages. Currently we issue a single "Retriggered job (type)" for each retrigger, and will cascade those notifications, up to 5 at a time, on/off screen as quickly as they arrive.

If the we can generate mass retriggers really quickly, we may cascade those messages off screen almost immediately. We may need to coalesce and provide a summary of successful retriggers, and total count of failed retriggers? Not sure.

On the other hand, we also currently expect each job to be loaded, before a retrigger is issued. This might instead mean "go get a coffee" or, if we retrigger in some sort of background process we'd need to ensure the user can't simultaneously do anything to the same jobs through the UI that could corrupt their world.
Duplicate of this bug: 1108100

Comment 7

4 years ago
Once bug 1168148 is live, we can simply send the pulse actions and mozci+pulse_actions will deal with it.

On that note, we want to make mozci send messages through pulse so tools like TH can determine what happened with that cancellation or retrigger request.
(Assignee)

Comment 8

4 years ago
Created attachment 8635764 [details] [review]
pr #787

Ed, could you please review this patch?

I have made a comment for the approach( https://github.com/mozilla/treeherder/pull/787#discussion_r34955020 ), if you know a better way to do it, let me know. Thanks!
Attachment #8635764 - Flags: review?(emorley)
(Assignee)

Updated

4 years ago
Blocks: 1178522
(Assignee)

Updated

4 years ago
Assignee: nobody → vaibhavmagarwal
(In reply to Armen (back on Monday July 20th) from comment #7)
> Once bug 1168148 is live, we can simply send the pulse actions and
> mozci+pulse_actions will deal with it.
> 
> On that note, we want to make mozci send messages through pulse so tools
> like TH can determine what happened with that cancellation or retrigger
> request.

Cool!

Just a couple of questions.

o If the user has hundreds of jobs pinned and they retrigger all, will we thNotify after the request is sent how many jobs have been successfully retriggered vs. those not? (eg. some network issue or something during the process of all being sent) if that can occur? Maybe it never can, I defer to you guys.

ie.
thNotify.send("Retriggered " + count + " jobs" + count, 'success');
And follow on sticky thNotify 'error' if some fail to retrigger, or all fail.

o We assume the retrigger is not tying up Treeherder's UI during the process, which sounds like we avoid?

o We want to test for a) logged in b) LDAP, and any other conditions we currently do with retrigger. It looks like the patch needs those added?

o We probably want a cancel/yes confirmation alert when the number of jobs retriggered is > some number. Perhaps 5? Whatever sheriffs think might be a good safety valve.
Status: NEW → ASSIGNED
(Assignee)

Comment 10

4 years ago
:jfrench, nice questions!

> o If the user has hundreds of jobs pinned and they retrigger all, will we
> thNotify after the request is sent how many jobs have been successfully
> retriggered vs. those not? (eg. some network issue or something during the
> process of all being sent) if that can occur? Maybe it never can, I defer to
> you guys.
> 
> ie.
> thNotify.send("Retriggered " + count + " jobs" + count, 'success');
> And follow on sticky thNotify 'error' if some fail to retrigger, or all fail.

We are currently giving notification for each job in retriggerJob() function, so I am not sure if this is necessary.

> o We assume the retrigger is not tying up Treeherder's UI during the
> process, which sounds like we avoid?

I don't think we are tying Treeherder's UI (the "retrigger all" is in a dropdown of pinboard, so not easily discoverable. Also, the rest of TH is responsive when I give "retrigger all" command).

> o We want to test for a) logged in b) LDAP, and any other conditions we
> currently do with retrigger. It looks like the patch needs those added?

I have added the condition to see if a user in logged in now.

> o We probably want a cancel/yes confirmation alert when the number of jobs
> retriggered is > some number. Perhaps 5? Whatever sheriffs think might be a
> good safety valve.

Added now.
Comment on attachment 8635764 [details] [review]
pr #787

Have left some comments on the PR :-)
Attachment #8635764 - Flags: review?(emorley) → feedback+
(Assignee)

Comment 12

4 years ago
Comment on attachment 8635764 [details] [review]
pr #787

:camd, can you review this?
Attachment #8635764 - Flags: review?(cdawson)
(Assignee)

Comment 13

4 years ago
Comment on attachment 8635764 [details] [review]
pr #787

Passing on the review to :mdoglio
Attachment #8635764 - Flags: review?(cdawson) → review?(mdoglio)
Attachment #8635764 - Flags: review?(mdoglio) → review-
(Assignee)

Comment 14

3 years ago
Comment on attachment 8635764 [details] [review]
pr #787

I have changed the patch to send one call via @list_route.
Attachment #8635764 - Flags: review- → review?(mdoglio)
+10, thanks Vaibhav and everybody here for working on this!
Commits pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/e3b276268ce7112ceb6b585fe45291e8f5c7afd0
Bug 1121998 - Add the ability to retrigger all pinned jobs

https://github.com/mozilla/treeherder/commit/b239e9ef5840ee44fd5b3413a6f5c238bb6421e4
Merge pull request #787 from vaibhavmagarwal/retrigge-pinjobs

Bug 1121998 - Add the ability to retrigger all pinned jobs
Attachment #8635764 - Flags: review?(mdoglio) → review+
(Assignee)

Comment 17

3 years ago
Created attachment 8644530 [details] [review]
pr #840

Mauro, I saw a regression in the previous patch in that retriggers for all the pinned jobs were not getting published to pulse. This patch fixes it.
Attachment #8644530 - Flags: review?(mdoglio)
Attachment #8644530 - Flags: review?(mdoglio) → review+
Commits pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/6b14228ccd7051ab99c49072a517659e34636a0b
Bug 1121998-Fix publishing retrigger action for multiple jobs to pulse

https://github.com/mozilla/treeherder/commit/12658478fa10f4cbe4ef3bfd59a0ca89d1f4f965
Merge pull request #840 from vaibhavmagarwal/fix-pinjobs

Bug 1121998-Fix publishing retrigger action for multiple jobs to pulse
(Assignee)

Comment 19

3 years ago
This has been deployed on the treeherder, and I verified that it is executing properly. Closing the bug.
Status: ASSIGNED → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.