Make a Test Failure Dashboard

Status: NEW
Product/Component: Testing :: ActiveData
Reported: 2 years ago; Last modified: a year ago
People: (Reporter: ekyle, Assigned: ekyle)
Firefox Tracking Flags: (Not tracked)
Attachments: (1 attachment)

Description (Assignee, 2 years ago)

Make a dashboard, using ActiveData, detailing failures down to the individual test.

Here is a super rough initial version, just to give a taste.  Click on a test failure.

http://activedata.allizom.org/tools/failures2.html

Comment 1 (Assignee, 2 years ago)

Created attachment 8671573 [details]
2015-10-08 16-06-51.png

The revisions are simply in push_date order; the y-axis is the test duration.

Comment 2 (Assignee, 2 years ago)

I hope you drink coffee. You should go get one. This page is really slow; it will download thousands of errors from the past day. If there are too many errors in one day, it will crash your browser.

Comment 3 (Assignee, 2 years ago)

I had some time to work on this.  It is in debug mode so it is faster.  The new location will be my people page [1].  The code [2] is now separate from the ActiveData project.

Since it is in debug mode, only a sample of the failures from today is showing right now. If you exclude some categories and reload, you will get another sample. Due to the size of the `unittest` table, and given the small number of machines we are limited to, we must either build a cache for the full error set or optimize ActiveData to handle the query.

[1] http://people.mozilla.org/~klahnakoski/testfailures/failures.html
[2] https://github.com/klahnakoski/TestFailures
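
For reference, pulling a day of failures is one query against the `unittest` table. Here is a minimal sketch in Python; the endpoint URL and the field names (result.ok, result.test, run.timestamp, build.revision12) are assumptions about the schema, and the `limit` is what keeps this a sample rather than the full error set.

```python
import json
import time
import urllib.request

# Sketch only: pull a sample of the past day's test failures from the public
# ActiveData endpoint.  The endpoint URL and the field names are assumptions
# about the unittest schema; adjust them to whatever the real schema exposes.
ACTIVEDATA_URL = "http://activedata.allizom.org/query"

query = {
    "from": "unittest",
    "select": ["result.test", "run.suite", "build.revision12", "run.timestamp"],
    "where": {"and": [
        {"eq": {"result.ok": False}},                        # failures only
        {"gte": {"run.timestamp": time.time() - 24 * 3600}}  # past day
    ]},
    "limit": 1000,      # a sample, not the full error set
    "format": "list"
}

request = urllib.request.Request(
    ACTIVEDATA_URL,
    data=json.dumps(query).encode("utf8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    failures = json.loads(response.read())["data"]

print(len(failures), "failures in the sample")
```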

Comment 4 (Assignee, a year ago)

To make this useful, it must point out the most egregious failures (e.g. tests that fail a high percentage of the time) or recent increases in intermittents. Neither of these is hard to detect; the hard part is sorting the approximately one million combinations, most of which are uninteresting, and making it fast.

A million aggregates are too large for memory or the network, so they require a container to hold and query them. I believe the solution is a materialized view over the whole dataset, with a script keeping that view up to date. Implementing materialized views is too much work for this objective, but defining the API for materialized views, and faking the implementation for this use case, should be in scope.
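
A minimal sketch of the "define the API, fake the implementation" idea; every name below is hypothetical and only meant to show the shape of the interface, not the actual design.

```python
from abc import ABC, abstractmethod

class MaterializedView(ABC):
    """Hypothetical API for a pre-aggregated view over the unittest data."""

    @abstractmethod
    def refresh(self):
        """Bring the view up to date with the source data."""

    @abstractmethod
    def query(self, where=None, sort=None, limit=100):
        """Return aggregate rows, optionally filtered and sorted."""


class FakeFailureRateView(MaterializedView):
    """Fake implementation for this use case: a script periodically calls
    refresh(), which recomputes per-test aggregates and keeps them in memory."""

    def __init__(self, fetch_aggregates):
        self._fetch = fetch_aggregates   # callable returning a list of dicts
        self._rows = []

    def refresh(self):
        self._rows = self._fetch()       # e.g. one big groupby against ActiveData

    def query(self, where=None, sort=None, limit=100):
        rows = [r for r in self._rows if where is None or where(r)]
        if sort is not None:
            rows.sort(key=sort, reverse=True)
        return rows[:limit]


# Usage: surface the most egregious failures first.
# view = FakeFailureRateView(fetch_aggregates=pull_aggregates_from_activedata)
# view.refresh()
# worst = view.query(sort=lambda r: r["failure_rate"], limit=20)
```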

Comment 5 (Assignee, a year ago)

Maybe prioritizing all test failures by "interestingness" can be pushed outside the scope of this bug.  If we have a simple text search, then we can view any test over time.  We can push the problem of highlighting "interesting" to the regression-detection module.

A store of alerts could then be used in this dashboard at a later time.
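
A minimal sketch of how the simple text search could chart one test over time, reusing failure records like those pulled in the earlier sketch; the record keys (result.test, run.timestamp) are assumptions.

```python
from collections import Counter
from datetime import datetime, timezone

def failures_over_time(failures, search_text):
    """Count failures per day for tests whose name contains search_text."""
    per_day = Counter()
    for row in failures:
        test = row.get("result.test") or ""
        if search_text in test:
            day = datetime.fromtimestamp(row["run.timestamp"], tz=timezone.utc).date()
            per_day[day] += 1
    return sorted(per_day.items())   # [(date, failure_count), ...] for charting
```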