Open Bug 1443867 Opened 6 years ago Updated 2 years ago

Create tool to bisect intermittent test failure regressions

Categories

(Tree Management :: Treeherder, defect, P3)

Tracking

(Not tracked)

People

(Reporter: Gijs, Unassigned)

Sometimes a test either starts failing intermittently, or an existing intermittent failure becomes much more frequent. In both cases it would be helpful to be able to isolate which changeset caused the increase in intermittent failures.

We can currently kind of do this manually: root around the Treeherder web interface to find out which chunk of a test framework a failing test runs in, possibly backfill to even make the test run on a given push, retrigger that chunk lots of times, and then compare the results on different pushes to check what's going on. This is time-consuming, and almost all of these steps could be automated based on the information OrangeFactor already has.

One idea I suggested was a web tool that you just hand an OrangeFactor reference. It would then bisect to find when the failures started, working from the first occurrence and using the failure frequency over the 24 hours after that point, automatically finding and selecting the right chunks across pushes.
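The bisection idea above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a design for the tool: `first_regressed_push` and the `failure_rate` callback are hypothetical names, the callback stands in for "find the right chunk on this push, retrigger it, and measure how often the test fails", and the sketch assumes that once the failure rate rises above the threshold it stays above it on later pushes.

```python
from typing import Callable, Sequence


def first_regressed_push(
    pushes: Sequence[str],
    failure_rate: Callable[[str], float],
    threshold: float,
) -> int:
    """Return the index of the earliest push whose failure rate exceeds threshold.

    Assumptions (hypothetical, for illustration only):
    - pushes are ordered oldest first,
    - the oldest push is "good" and the newest push is "bad",
    - failure_rate(push) hides chunk selection, backfilling and retriggering.
    """
    if len(pushes) < 2:
        raise ValueError("need at least one good and one bad push")
    lo, hi = 0, len(pushes) - 1  # lo is known good, hi is known bad
    if failure_rate(pushes[lo]) > threshold:
        raise ValueError("oldest push already exceeds the threshold")
    if failure_rate(pushes[hi]) <= threshold:
        raise ValueError("newest push does not exceed the threshold")
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if failure_rate(pushes[mid]) > threshold:
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return hi


# Made-up failure rates for six made-up pushes, purely to exercise the function.
example_pushes = ["a", "b", "c", "d", "e", "f"]
example_rates = {"a": 0.0, "b": 0.01, "c": 0.0, "d": 0.2, "e": 0.25, "f": 0.3}
regressed_index = first_regressed_push(example_pushes, example_rates.__getitem__, 0.1)
print(example_pushes[regressed_index])
```

Each bisection step only needs measurements on one more push, so the number of pushes that have to be backfilled and retriggered grows with the logarithm of the range rather than its length.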

Geoff had another suggestion:

(In reply to Geoff Brown [:gbrown] from bug 1443364 comment #4)
> I think intelligent-backfilling&retriggering is a great idea...but it does
> have a lot of moving parts. I have some fear that such a project might end
> up creating a feature that works like magic one day of the year but
> otherwise has us chasing a million bugs.
> 
> I have aspirations of combining test-verify with backfilling to provide a
> similar feature: In treeherder, request regression tracking for a named test
> and changeset; then run test-verify on that test and work backward in time
> until the test runs reliably. However, this is just an idea -- I haven't
> even thought about coding this yet, there's no bug on file or anything.

So I'm filing a bug to get the ball rolling. :-)

The benefits here are:
- engineers don't have to spend a lot of time manually backfilling tests, searching test output logs for which chunk a test ran in, and then retriggering loads of stuff
- automation could more easily make sure the retriggers run at lower priority, and select exactly the number of retriggers needed to be statistically confident in isolating a change in frequency, without going overboard by just hammering 'retrigger' (which is a thoroughly human thing to do). It could also keep the number of retriggers to a minimum by doing strict bisection, and/or reduce the time spent waiting by retriggering more csets in parallel, etc.
- it becomes more feasible to investigate more changes in intermittently failing tests, because there's less overhead in finding out what csets regressed which tests
- we get a greener tree. :-)
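As a rough illustration of "the right amount to be statistically confident" from the list above, here is a minimal sketch for the simplest case only: a test that never failed before the regression and fails independently with probability `failure_rate` per run after it. In that case the chance of seeing `n` green runs in a row on a bad push is `(1 - failure_rate) ** n`, so we can solve for the smallest `n` that pushes this below a chosen `alpha`. The function name and the default `alpha` are assumptions for the example; comparing two nonzero failure rates would need a proper two-sample test instead.

```python
import math


def runs_needed(failure_rate: float, alpha: float = 0.05) -> int:
    """Smallest n such that (1 - failure_rate) ** n <= alpha.

    Assumes runs are independent and that a good push never fails,
    so n consecutive green runs make a bad push unlikely (chance <= alpha).
    """
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    # log1p(-p) computes log(1 - p) accurately for small p.
    return math.ceil(math.log(alpha) / math.log1p(-failure_rate))


print(runs_needed(0.1))  # a test failing 10% of the time
print(runs_needed(0.5))  # a test failing half of the time
```

Under these assumptions a test that fails 10% of the time needs 29 green runs per push, while one that fails half of the time needs only 5, which is why a fixed "retrigger it a bunch" habit tends to either waste runs or stop too early.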
Component: Treeherder → Intermittent Failures View
There is now an awkward way of running test-verify on a specified test path, on a revision and earlier pushes ("backfilling"). :jmaher and I are trying to make those TV-bf runs more reliable. If that works out, it would be nice to clean up the UI: perhaps auto-filling the test path from the treeherder Failure Summary?
See Also: → 1465117
Priority: -- → P3
Component: Intermittent Failures View → TreeHerder