Open Bug 1448114 Opened 6 years ago Updated 2 years ago

Implement functions for mostly_is() and mostly_ok() for intermittent oranges

Categories

(Testing :: Mochitest, enhancement)

Version 3
enhancement

Tracking

(Not tracked)

People

(Reporter: jaws, Unassigned)

Details

Currently, when we have intermittent oranges that don't require a quick fix or deeper investigation, we either disable the test or change the failing condition from an ok() to a todo().

We should introduce a set of mostly_*() functions that behave the same as the function they wrap (mostly_ok() as ok(), mostly_is() as is()) but flag a failure specially in the logs.

With this, if orangebot notices that the failure rate has spiked, it can trigger an alert.
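A minimal sketch of the proposal, written in Python rather than the JavaScript that Mochitest actually uses. The `Harness` class, the `is_()` spelling (`is` is a Python keyword) and the `TEST-MOSTLY-FAIL` log marker are all hypothetical; the point is only that a mostly_*() assertion runs the same check as its counterpart but logs a failure under a distinct marker instead of counting it as a real failure.

```python
from dataclasses import dataclass, field


@dataclass
class Harness:
    """Hypothetical stand-in for a Mochitest-style assertion harness."""
    log: list[str] = field(default_factory=list)
    failures: int = 0  # real failures that would turn the job orange

    def ok(self, condition: bool, name: str) -> None:
        # Regular assertion: a false condition counts as a real failure.
        if condition:
            self.log.append(f"TEST-PASS | {name}")
        else:
            self.failures += 1
            self.log.append(f"TEST-UNEXPECTED-FAIL | {name}")

    def is_(self, actual, expected, name: str) -> None:
        # Equality assertion built on ok().
        self.ok(actual == expected, f"{name} - got {actual!r}, expected {expected!r}")

    def mostly_ok(self, condition: bool, name: str) -> None:
        # Same check as ok(), but a failure is logged under a distinct
        # (hypothetical) marker and is NOT added to self.failures.
        if condition:
            self.log.append(f"TEST-PASS | {name}")
        else:
            self.log.append(f"TEST-MOSTLY-FAIL | {name}")

    def mostly_is(self, actual, expected, name: str) -> None:
        # Equality assertion built on mostly_ok().
        self.mostly_ok(actual == expected, f"{name} - got {actual!r}, expected {expected!r}")
```

A bot scanning logs could then count `TEST-MOSTLY-FAIL` lines per run and alert when that count climbs, without the job itself turning orange.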
I really like this idea- the biggest question is how do we know if something is mostly() green- maybe metadata in the tree or something in manifests- and if it fails we can retrigger X times to prove it meets the existing rate of failure.
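One way to make "retrigger X times to prove it meets the existing rate of failure" concrete is a one-sided binomial test. This sketch assumes the known failure rate is available from somewhere (the manifest metadata mentioned above is one option) and that retriggers fail independently; `alpha` is an arbitrary illustrative threshold.

```python
from math import comb


def upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def consistent_with_known_rate(
    failures: int, retriggers: int, known_rate: float, alpha: float = 0.05
) -> bool:
    """True if seeing this many failures is plausible at the known rate.

    A small upper-tail probability means the observed failures are unlikely
    under the recorded rate, i.e. the test has probably gotten worse.
    """
    return upper_tail(failures, retriggers, known_rate) >= alpha
```

For a known rate of 10% over 10 retriggers, a single failure is unsurprising, while 5 failures would have under a 1% chance and suggest the rate has gone up.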
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #1)
> I really like this idea- the biggest question is how do we know if something
> is mostly() green- maybe metadata in the tree or something in manifests- and
> if it fails we can retrigger X times to prove it meets the existing rate of
> failure.

How does orangebot currently work? Couldn't we leave a similar trace to whatever orangebot is looking for in the logs/artifacts that indicates which assertions todo-failed and which todo-succeeded?

We could theoretically even do that without adding new mostly_ functions and just by amending todo()...
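A sketch of the "amend todo()" variant: todo() never fails the run but writes a structured line saying whether the assertion todo-passed or todo-failed, and a consumer aggregates those lines across runs into per-assertion failure rates. The JSON field names and the function signatures are made up for illustration and are not the actual Mochitest or orangebot formats.

```python
import json
from collections import defaultdict


def todo(condition: bool, name: str, out: list[str]) -> None:
    # Hypothetical amended todo(): never fails the job, but records whether
    # the known-flaky assertion passed or failed as a structured log line.
    status = "todo-pass" if condition else "todo-fail"
    out.append(json.dumps({"action": "todo", "assertion": name, "status": status}))


def failure_rates(lines: list[str]) -> dict[str, float]:
    """Aggregate todo records (possibly from many runs) into failure rates."""
    counts = defaultdict(lambda: [0, 0])  # assertion name -> [failures, total]
    for line in lines:
        record = json.loads(line)
        if record.get("action") != "todo":
            continue  # ignore unrelated structured lines
        entry = counts[record["assertion"]]
        entry[1] += 1
        if record["status"] == "todo-fail":
            entry[0] += 1
    return {name: fails / total for name, (fails, total) in counts.items()}
```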
the current system works like this:
* job turns orange and sheriffs pick a bug summary that looks like the right match and manually stars it
** auto classify works on about 10% of the bugs in place of a sheriff
* if a job is failing too often, a sheriff might realize that it is mostly a perma-fail and mention it
* all starring that is done on the tree goes to a database (orangefactor)
* we run a cronjob on the database to analyze failures and report in bugs
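A rough sketch of the cronjob step, assuming each starred failure is stored as a (job, bug, day) row; the `Classification` type, the 7-day window and the threshold are hypothetical parameters, not orangefactor's real schema or settings.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Classification:
    job_id: int  # the starred job
    bug_id: int  # the bug the sheriff (or autoclassify) picked
    day: date    # when the failure happened


def bugs_to_report(
    rows: list[Classification], today: date, window_days: int = 7, threshold: int = 30
) -> list[tuple[int, int]]:
    """Return (bug_id, count) for bugs starred at least `threshold` times
    in the last `window_days` days, most frequent first."""
    start = today - timedelta(days=window_days)
    counts = Counter(r.bug_id for r in rows if start < r.day <= today)
    frequent = [(bug, n) for bug, n in counts.items() if n >= threshold]
    return sorted(frequent, key=lambda item: (-item[1], item[0]))
```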

the failures are based on the error messages printed to the log, not necessarily on specific asserts or test cases. We have considered building a Firefox web extension that reads Treeherder and orangefactor and changes the color of a job if all of its failures belong to a known set of failures that are not in a resolved state.
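The web-extension idea reduces to a small decision rule. This Python sketch assumes failure lines are matched to bugs by exact (whitespace-trimmed) message and treats the Bugzilla statuses NEW, ASSIGNED and REOPENED as unresolved; a real extension would be written in JavaScript and would fetch this data from Treeherder and orangefactor rather than take it as arguments.

```python
# Assumption: these bug statuses count as "not in a resolved state".
UNRESOLVED = {"NEW", "ASSIGNED", "REOPENED"}


def effective_color(
    failure_lines: list[str],
    known_failures: dict[str, int],  # error message -> bug number (hypothetical mapping)
    bug_status: dict[int, str],      # bug number -> Bugzilla status
) -> str:
    """Return "green" only if every failure line maps to an open known bug."""

    def is_known_open(line: str) -> bool:
        bug = known_failures.get(line.strip())
        return bug is not None and bug_status.get(bug) in UNRESOLVED

    if failure_lines and all(is_known_open(line) for line in failure_lines):
        return "green"
    return "orange"
```

Jobs with no parsed failure lines stay orange, since there is nothing to match against.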
:ekyle- you had some thoughts on the "web extension" idea- do you have other things that could help add to the context here or suggestions of ways to help solve the "orange" problem in most cases?
Flags: needinfo?(klahnakoski)
In general, I would advocate for some way to mark *tests* as "intermittent" so that Treeherder can ignore them when displaying *job* results. It seems :jaws is advocating for the same.

The big problem is that Treeherder's data model is limited to "jobs", "failure_log_lines" and "bugs"; it has no model for a "test". Getting TH to perform test-based actions or provide test-based data will be messy and error prone.

My solution was to track all tests and their failure rates with ActiveData. This would allow third parties to build further tools: show the highest-failing tests, show tests with recent failure spikes, make web extensions that turn oranges green on try runs, etc. There was no interest in my solution at the time, so my work did not get past the prototype stage.
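A sketch of the per-test tracking described above, working on in-memory (test, day, passed) records instead of querying ActiveData; the recent-window length, `spike_factor` and `min_runs` are illustrative assumptions. It ranks tests by overall failure rate and flags a spike when the recent rate is more than `spike_factor` times the older baseline.

```python
from collections import defaultdict
from datetime import date, timedelta


def failure_rate(results: list[bool]) -> float:
    """Fraction of runs that failed (results hold True for a pass)."""
    return sum(1 for passed in results if not passed) / len(results) if results else 0.0


def summarize(
    records: list[tuple[str, date, bool]],
    today: date,
    recent_days: int = 7,
    spike_factor: float = 2.0,
    min_runs: int = 10,
) -> tuple[list[str], list[str]]:
    """Return (tests ordered by overall failure rate, tests with a recent spike)."""
    recent_start = today - timedelta(days=recent_days)
    recent: dict[str, list[bool]] = defaultdict(list)
    baseline: dict[str, list[bool]] = defaultdict(list)
    for test, day, passed in records:
        (recent if day > recent_start else baseline)[test].append(passed)

    tests = set(recent) | set(baseline)
    overall = {t: failure_rate(recent[t] + baseline[t]) for t in tests}
    highest = sorted(tests, key=lambda t: (-overall[t], t))
    # Require enough recent runs so one unlucky failure is not a "spike".
    spikes = sorted(
        t for t in tests
        if len(recent[t]) >= min_runs
        and failure_rate(recent[t]) > spike_factor * failure_rate(baseline[t])
    )
    return highest, spikes
```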
Flags: needinfo?(klahnakoski)
Severity: normal → S3