Open Bug 1322433 Opened 8 years ago Updated 1 year ago

Make it easier to retrigger a job with failing test with extra logging and debugging options

Categories

(Testing :: General, defect, P3)

defect

Tracking

(Not tracked)

REOPENED

People

(Reporter: wlach, Unassigned)

References

(Depends on 2 open bugs, Blocks 1 open bug)

Details

Attachments

(2 files, 1 obsolete file)

In the Stockwell discussion, :jesup suggested that we should make it easier to (re)trigger a test job with specific options:

* Extra logging (higher mozlog levels)
* Run just one specific test (probably the failing test) with specific options

Right now, it's possible to do this by formulating a try push with the right set of options, but probably only a very small number of people know how to do this. A one-click loaner is a possible alternative, but it still requires a fair amount of babysitting and setup of the loaner.

Not exactly sure what the solution here is, but it should be more accessible than that. Ideally, if possible, there would be some kind of button or option in Treeherder to perform this type of action, i.e. "This test failed in a suspicious way; I want to click a button and get useful debugging information."

Hopefully that's an accurate summary of the discussion/request, let me know if I'm missing something.
I really like this bug, there is another use case we discussed earlier this year in bug 1241535.

Even if we can only do this for taskcluster, that is fine. I see the ideal solution as a button in treeherder that pops up a small form prefilled with the suggested debug flags for logging and/or other mozharness options to make the harness behave properly.

As discussed in bug 1241535, we would need to indicate in the UI that this is not a typical job and ensure that the data from its log is not accumulated in other tools (e.g. perfherder summaries, sheriff portals/autoclassification, mozreview summary). While I am not a fan of tier-3 or hidden jobs, this might fit well there.
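The tier-3 idea could look roughly like the following sketch. It assumes gecko-style task definitions that carry Treeherder metadata under `extra.treeherder` with `tier` and `symbol` keys; the `-dbg` symbol suffix is purely a hypothetical convention for making the job visibly different.

```python
import copy


def mark_as_debug_job(task: dict, symbol_suffix: str = "-dbg") -> dict:
    """Return a copy of `task` flagged as a tier-3 debugging job.

    Assumes Treeherder metadata lives under task["extra"]["treeherder"];
    the "-dbg" suffix is a hypothetical naming convention.
    """
    new_task = copy.deepcopy(task)  # never mutate the original definition
    treeherder = new_task.setdefault("extra", {}).setdefault("treeherder", {})
    treeherder["tier"] = 3  # tier 3 is the least prominent Treeherder tier
    symbol = treeherder.get("symbol", "")
    if not symbol.endswith(symbol_suffix):
        treeherder["symbol"] = symbol + symbol_suffix  # avoid double suffixes
    return new_task
```

Working on a deep copy keeps the original task untouched, so the same definition can be reused for normal retriggers.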
Blocks: 1307197
There are a few ways this can be implemented (ignoring Buildbot for now).

Once the developer has selected the job they want to reschedule with modified options, we can proceed in various ways.

1 - The developer has logged in to TH with TC credentials, and we schedule tasks for them by hitting the TC task creation APIs

2 - The developer is redirected to the task creator [1] and modifies the task there

3 - A pulse message is sent with some extra information (not the whole task) that we can pass to Mozharness, plus the selected task ID; either:
  * Pulse_actions handles the scheduling
  * A TC pulse listener handles the scheduling
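For option 1, the client-side work before calling the task creation API is mostly copying the original task definition with fresh timestamps and the extra options. A minimal sketch, assuming a docker-worker-style payload with an `env` mapping; `new_task_id` only mimics the shape of a Taskcluster slugid and is not the real slugid library.

```python
import base64
import copy
import uuid
from datetime import datetime, timedelta, timezone


def new_task_id() -> str:
    # 22-character URL-safe base64 of a random UUID; same shape as a
    # Taskcluster slugid, without the real library's extra constraints.
    return base64.urlsafe_b64encode(uuid.uuid4().bytes).decode("ascii").rstrip("=")


def _iso(ts: datetime) -> str:
    # ISO 8601 in UTC with millisecond precision and a trailing "Z".
    return ts.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def clone_for_retrigger(task, extra_env, now=None, deadline_hours=24, expires_days=14):
    """Copy a task definition with fresh timestamps and extra env vars.

    Routes (including any Treeherder routes) are copied unchanged so the
    new task would report to the same place as the original.
    """
    now = now or datetime.now(timezone.utc)
    new_task = copy.deepcopy(task)
    new_task["created"] = _iso(now)
    new_task["deadline"] = _iso(now + timedelta(hours=deadline_hours))
    new_task["expires"] = _iso(now + timedelta(days=expires_days))
    env = new_task.setdefault("payload", {}).setdefault("env", {})
    env.update(extra_env)  # e.g. extra logging variables
    return new_task_id(), new_task
```

Submitting the result would be a separate step: passing the returned ID and definition to the Taskcluster Queue's `createTask` call using the developer's credentials.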

Adding dustin to check if I'm missing something.
I would like to see #1 happen. It should not be too difficult.

[1] https://tools.taskcluster.net/task-creator/
I don't like #2: it involves many steps with many chances to get things wrong, and it is not discoverable via treeherder, which makes it harder to share unless you have the exact link. #1 sounds good, especially if task creation APIs already exist :)
Hmm, wait: if #1 is just an automated #2, then the result is hard to discover without the exact taskcluster link. Unless there is a better way to find this information from a push on treeherder, I would prefer another method.
In all of the options I listed above, I expect a TH UI to help the developer select certain options.
There are no discoverability issues beyond finding the UI element on TH that leads to the selection page.

We then decide whether to:
1 - Take that information and schedule the task directly via TC APIs
2 - Prefill the task-creator with the right information (the dev can still make further modifications before hitting 'create task')
3 - Send enough info over Pulse for some tool to fulfill the scheduling
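For the Pulse route, the message only needs the original task ID plus the extra options, not the whole task. A sketch of a producer and a validating consumer; the field names (`taskId`, `mozharnessArgs`, `requestedBy`) are a hypothetical schema, not an existing one.

```python
import json
from typing import List, Tuple

REQUIRED_KEYS = {"taskId", "mozharnessArgs"}


def build_retrigger_message(task_id: str, mozharness_args: List[str], requested_by: str) -> str:
    # Only the original task ID and the extra options travel over Pulse.
    return json.dumps(
        {"taskId": task_id, "mozharnessArgs": list(mozharness_args), "requestedBy": requested_by},
        sort_keys=True,
    )


def parse_retrigger_message(body: str) -> Tuple[str, List[str]]:
    # A listener (pulse_actions or a TC listener) validates before scheduling.
    msg = json.loads(body)
    if not isinstance(msg, dict):
        raise ValueError("message must be a JSON object")
    missing = REQUIRED_KEYS - set(msg)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    args = msg["mozharnessArgs"]
    if not isinstance(args, list) or not all(isinstance(a, str) for a in args):
        raise ValueError("mozharnessArgs must be a list of strings")
    return msg["taskId"], args
```

The listener would then look up the original task by ID and schedule a copy with the extra arguments, so the message stays small and independent of the task format.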
The issue I have is: by looking at a push on treeherder, how will I find the log and artifacts of the created task? Whenever I edit and create a task, the job does not show up on treeherder, and I have not found a way to get the information back once I lose the link. I imagine the default method will be "run task with preset options", but it would be nice to have more options for advanced task editing; there is a lot of value there, although almost all developers will find it overwhelming.
With TH routes/scopes you will be able to see the task directly on TH and won't need to keep track of a link to it.

IIRC, at the moment developers don't get TH scopes.
My initial thought is that we should use the task-creator for this. I agree that, as it is now, the task-creator makes it too confusing to figure out what to do, but we should be able to improve its UX a bit and fix bugs such as adding an option to make created tasks visible on treeherder (if that isn't already possible).

I suspect building a new system in treeherder will be a lot more work than iterating on the task-creator. My vote would be option #2, with the caveat that we spend time improving the task-creator.
Bearing in mind that task-creator is not gecko-specific, that might be a bit tricky.  That said, task-creator is a few hundred lines of React, so it shouldn't be too hard to create a similar thing elsewhere, be that in releng web or treeherder.
I think I agree with Dustin that the task creator isn't the right tool for this, as it exposes too many Taskcluster internals that aren't relevant here. Instead, I think a custom dialog inside Treeherder would be a better and easier user interface.

Tentative plan:

1. Have taskcluster jobs upload some kind of JSON file listing extra configurable options that may be passed to them
2. Expose a GUI in Treeherder that reads from that list, lets the user pick/enter the ones they want, and then retriggers the job using them.
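The two-step plan could be sketched as a manifest format plus a translation step. The manifest layout, option names, and flags below are hypothetical placeholders, not real mozharness options.

```python
import json
from typing import Dict, List, Union

# Hypothetical manifest a task could upload as an artifact.
EXAMPLE_MANIFEST = json.loads("""
{
  "options": [
    {"name": "verbose", "label": "Enable verbose logging", "type": "boolean", "arg": "--verbose-logging"},
    {"name": "test_path", "label": "Run just tests in this path", "type": "string", "arg": "--test-path"}
  ]
}
""")


def options_to_args(manifest: dict, selections: Dict[str, Union[bool, str]]) -> List[str]:
    """Translate the user's choices from the Treeherder dialog into harness arguments."""
    known = {opt["name"] for opt in manifest["options"]}
    unknown = set(selections) - known
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    args: List[str] = []
    for opt in manifest["options"]:  # manifest order keeps output stable
        value = selections.get(opt["name"])
        if opt["type"] == "boolean":
            if value is True:
                args.append(opt["arg"])
        elif opt["type"] == "string":
            if value:  # skip empty text fields
                args.append(f"{opt['arg']}={value}")
        else:
            raise ValueError(f"unsupported option type: {opt['type']}")
    return args
```

Keeping the option list in the task (rather than in Treeherder) means each harness can advertise its own knobs without Treeherder changes.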

Going to add a loose dependency on bug 1285007; the new menu there would be a great place to expose this option.
Assignee: nobody → wlachance
Depends on: 1285007
This sounds like an action task, actually.  So far we don't have support for parameterizing action tasks, but that's probably relatively straightforward to add (if laborious, requiring specifying a list of fields and their types).
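Parameterizing an action task by "a list of fields and their types" might look roughly like this: the dialog submits strings, and a field spec converts them into typed values. The field names in the usage example are hypothetical.

```python
from typing import Dict, Union

FieldValue = Union[bool, int, str]


def coerce_inputs(fields: Dict[str, type], raw: Dict[str, str]) -> Dict[str, FieldValue]:
    """Convert raw form strings into typed values according to a field spec."""
    result: Dict[str, FieldValue] = {}
    for name, value in raw.items():
        if name not in fields:
            raise ValueError(f"unknown field: {name}")
        kind = fields[name]
        if kind is bool:
            lowered = value.strip().lower()
            if lowered in ("true", "1", "yes", "on"):
                result[name] = True
            elif lowered in ("false", "0", "no", "off", ""):
                result[name] = False
            else:
                raise ValueError(f"{name}: not a boolean: {value!r}")
        elif kind is int:
            result[name] = int(value)  # int() raises ValueError on bad input
        elif kind is str:
            result[name] = value
        else:
            raise TypeError(f"{name}: unsupported field type {kind!r}")
    return result


# Hypothetical spec for a "retrigger with options" action.
RETRIGGER_FIELDS = {"verbose": bool, "repeat": int, "test_path": str}
```

For example, `coerce_inputs(RETRIGGER_FIELDS, {"verbose": "on", "repeat": "10"})` yields `{"verbose": True, "repeat": 10}`; rejecting unknown fields up front keeps typos from silently doing nothing.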
This is a great idea, and I just want to say that I agree that we should have some sort of specialized UI for it. What precisely that looks like, I dunno. The simplest thing I could imagine would be an entry in Treeherder's "Retrigger" button menu that says "Retrigger with extra logging", which would only cover about half of jesup's request. If we let tasks specify the knobs that could be fiddled then we could have that menu item pop up a little dialog to choose specific options, like:
```
[x] Enable verbose logging
[x] Enable WebRTC logs
...
[x] Run just tests in this path: [dom/media/webrtc   ]
```
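A checkbox such as "Enable WebRTC logs" would most likely end up as a `MOZ_LOG` value, which takes comma-separated `module:level` entries (5 being the most verbose) plus optional flags such as `timestamp`. The checkbox-to-module mapping below is a hypothetical placeholder, not a vetted list of Gecko log modules.

```python
from typing import Dict, Iterable, List

# Hypothetical mapping from dialog checkboxes to Gecko log modules.
CHECKBOX_MODULES: Dict[str, List[str]] = {
    "webrtc": ["webrtc_trace", "signaling"],
    "media": ["MediaDecoder"],
}


def build_moz_log(selected: Iterable[str], level: int = 5, timestamp: bool = True) -> str:
    """Build a MOZ_LOG value ("module:level,...") from the ticked checkboxes.

    Unknown checkbox names are ignored.
    """
    modules: List[str] = []
    for box in selected:
        for module in CHECKBOX_MODULES.get(box, []):
            if module not in modules:  # de-duplicate while keeping order
                modules.append(module)
    parts = [f"{module}:{level}" for module in modules]
    if parts and timestamp:
        parts.append("timestamp")  # prefix each log line with a timestamp
    return ",".join(parts)
```

The resulting string would be set as the `MOZ_LOG` environment variable of the retriggered task; an empty string means no extra logging was requested.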
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #12)
> This is a great idea, and I just want to say that I agree that we should
> have some sort of specialized UI for it. What precisely that looks like, I
> dunno. The simplest thing I could imagine would be an entry in Treeherder's
> "Retrigger" button menu that says "Retrigger with extra logging", which
> would only cover about half of jesup's request. If we let tasks specify the
> knobs that could be fiddled then we could have that menu item pop up a
> little dialog to choose specific options, like:
> ```
> [x] Enable verbose logging
> [x] Enable WebRTC logs
> ...
> [x] Run just tests in this path: [dom/media/webrtc   ]
> ```

Yes, I chatted with :dustin and a few others on #taskcluster about something like exactly this. I should have a more concrete proposal soon. :)
Jonas just proposed a detailed design for retriggerable tasks, which we could use as the basis for this:

https://groups.google.com/d/msg/mozilla.tools/VeyYYCVuzak/ehaF_ScqBAAJ

I imagine we will file separate bugs for all the taskcluster and/or treeherder components that would go into this, but we can continue to use this bug to track the overall status of this feature.
Depends on: 1332506
Attached file bug submitting
Ok, so an update: this is coming along. I have an in-tree implementation of mochitest retriggering set up, and I'm testing it via a try push. I think I have just about everything wired up, though I'm hitting a snag when actually submitting the job to taskcluster: I get the error shown in the attachment above in a notification (after a fresh login in a private browsing window).

Brian, do you know what might be up here? I based my code to submit the modified task on your work. The code is here:

https://github.com/mozilla/treeherder/blob/custom-job-actions/ui/js/controllers/tcjobactions.js#L30
Flags: needinfo?(bstack)
(In reply to William Lachance (:wlach) (use needinfo!) from comment #15)
> Created attachment 8838691 [details]
> bug submitting
> 
> Ok, so update, this is coming along. I have an in-tree implementation of
> mochitest retriggering set up, and I'm testing it via a try push. I think I
> have just about everything wired up, though I'm hitting a snag when actually
> submitting the job to taskcluster, getting the above error in a notification
> (after a fresh login on a private browsing instance).
> 
> Brian, do you know what might be up here? I based my code to submit the
> modified task on your work. The code is here:
> 
> https://github.com/mozilla/treeherder/blob/custom-job-actions/ui/js/
> controllers/tcjobactions.js#L30

Brian was kind enough to work through this with me on irc. We've filed bug 1340668.
Depends on: 1340668
Flags: needinfo?(bstack)
Depends on: 1341727
I'm filing a review request for some work to add tags to mochitest jobs. The rest of this work is close, but not quite there, and I didn't want to let the tag work bitrot anymore (it had already broken once).
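Taskcluster task definitions carry free-form string `tags`, so a tag on mochitest jobs lets the UI decide which tasks to offer the custom retrigger for. A sketch; the tag key and value are hypothetical, since the bug does not spell out the exact tag that landed.

```python
from typing import Iterable, List

# Hypothetical tag marking tasks that support the custom retrigger.
RETRIGGER_TAG = ("test-type", "mochitest")


def has_tag(task: dict, key: str, value: str) -> bool:
    # Missing "tags" is treated the same as an empty tag set.
    return task.get("tags", {}).get(key) == value


def retriggerable_tasks(tasks: Iterable[dict]) -> List[dict]:
    """Keep only the task definitions carrying the retrigger tag."""
    key, value = RETRIGGER_TAG
    return [t for t in tasks if has_tag(t, key, value)]
```

Filtering on a tag rather than on task names or symbols avoids fragile string matching when job naming changes.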
Comment on attachment 8840631 [details]
Bug 1322433 - Make it possible to add tags + add a mochitest tag to mochitest jobs

https://reviewboard.mozilla.org/r/115084/#review116600
Attachment #8840631 - Flags: review?(jopsen) → review+
Pushed by wlachance@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9a85f428c314
Make it possible to add tags + add a mochitest tag to mochitest jobs r=jonasfj
https://hg.mozilla.org/mozilla-central/rev/9a85f428c314
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
Sorry, should have marked this bug as leave-open. We're not done here yet.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla54 → ---
Ok, so making some more progress. Thanks to :ahal, I got the idea of running the mach command to execute the mochitest *after* a "setup only" mozharness run. This simplifies things considerably, though it also raises the problem of figuring out which arguments mach needs to run the tests here, which include (at least):

* Whether to enable e10s
* mochitest flavor

I could make these environment variables, tags, or something else entirely in the taskcluster configs. Thoughts?
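If the environment-variable route were chosen, the post-setup step could assemble the mach command like this. `MOCHITEST_FLAVOR` and `MOCHITEST_E10S` are hypothetical variable names, and `--flavor` / `--disable-e10s` are assumed `mach mochitest` flags that should be checked against `./mach mochitest --help`.

```python
from typing import List, Mapping, Sequence


def mach_mochitest_command(env: Mapping[str, str], test_paths: Sequence[str]) -> List[str]:
    """Build the mach invocation to run after a setup-only mozharness run.

    MOCHITEST_FLAVOR / MOCHITEST_E10S are hypothetical variables set by the
    task config; the flag names are assumptions, not verified here.
    """
    flavor = env.get("MOCHITEST_FLAVOR", "plain")
    e10s_enabled = env.get("MOCHITEST_E10S", "1") == "1"
    cmd = ["./mach", "mochitest", f"--flavor={flavor}"]
    if not e10s_enabled:
        cmd.append("--disable-e10s")
    cmd.extend(test_paths)  # restrict the run to the failing test(s)
    return cmd
```

Passing `os.environ` as `env` inside the task would pick up whatever the task config set; returning a list rather than a shell string avoids quoting problems with test paths.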
Depends on: 1343327
reviewboard seems to be getting angry with me for having multiple review requests attached to this bug, so let's just split off the mochitest parts of this (which I wanted feedback on) into bug 1343327
Depends on: 1347696
Depends on: 1347698
Depends on: 1347732
Comment on attachment 8847814 [details] [review]
[treeherder] wlach:1322433 > mozilla:master

mislabeled patch, sorry for the noise
Attachment #8847814 - Attachment is obsolete: true
Depends on: 1348833
Depends on: 1347654
Assignee: wlachance → nobody
Not actively working on this at the moment; the current state is described here:

https://wlach.github.io/blog/2017/04/easier-reproduction-of-intermittent-test-failures-in-automation/
Priority: -- → P3
Severity: normal → S3