Make a dashboard, using ActiveData, that details failures down to the individual test. Here is a super rough initial version, just to give a taste: click on a test failure. http://activedata.allizom.org/tools/failures2.html
Created attachment 8671573 [details] 2015-10-08 16-06-51.png

The revisions are simply in push_date order; the y-axis is the test duration.
I hope you drink coffee. You should go get one. This page is really slow: it downloads thousands of errors from the past day. If there are too many errors in one day, it will crash your browser.
I had some time to work on this. It is in debug mode, so it is faster. The new location is my people page; the code is now separate from the ActiveData project. Since it is in debug mode, only a sample of today's failures is shown right now; if you exclude some categories and reload, you will get another sample. Due to the size of the `unittest` table, and the small number of machines we are limited to, we must either build a cache for the full error set or optimize ActiveData to deal with the query.

http://people.mozilla.org/~klahnakoski/testfailures/failures.html
https://github.com/klahnakoski/TestFailures
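For the curious, a minimal sketch of the kind of JSON query the dashboard sends to ActiveData to pull a sample of failures from the `unittest` table. The field names (`result.ok`, `result.duration`, `repo.push.date`, etc.) and the date expression are illustrative assumptions, not the confirmed schema:

```python
import json

def build_failure_query(suite=None, limit=1000):
    """Sketch a query for recent failures from the `unittest` table.

    Field names are hypothetical; the `limit` plays the role of the
    debug-mode sampling described above.
    """
    where = [
        {"eq": {"result.ok": False}},               # failures only
        {"gte": {"run.timestamp": {"date": "today-day"}}},  # past day
    ]
    if suite:
        where.append({"eq": {"run.suite": suite}})
    return {
        "from": "unittest",
        "select": ["result.test", "result.duration", "repo.push.date"],
        "where": {"and": where},
        "limit": limit,
    }

query = build_failure_query(suite="mochitest")
print(json.dumps(query, indent=2))
```

Posting this to the ActiveData query endpoint (and re-sampling on reload) is the part that gets expensive at full scale, which is what motivates the cache discussed below.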
To make this useful, it must point out the most egregious failures (e.g. those that fail a high percentage of the time) or point out recent increases in intermittents. Neither is hard to detect; the hard part is sorting the approximately one million combinations, most of which are uninteresting, and making it fast. A million aggregates is too large for memory or the network, so it requires a container to hold them and query them. I believe the solution is a materialized view over the whole set, with a script keeping that view up to date. Implementing materialized views is too much work for this objective, but defining the API for materialized views, and faking the implementation for this use case, should be in scope.
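The proposed shape of that API might look like the sketch below: a view that exposes `query()` over pre-aggregated failure rates, with `update()` called by the refresh script. This is a hedged, in-memory fake standing in for the real container; the record fields and method names are assumptions for illustration:

```python
from collections import defaultdict

class FakeMaterializedView:
    """Fake materialized view over per-(suite, test) failure counts.

    A real implementation would persist the aggregates; this in-memory
    dict only demonstrates the proposed API surface.
    """

    def __init__(self):
        self.totals = defaultdict(lambda: {"pass": 0, "fail": 0})

    def update(self, records):
        # Incremental refresh: fold new raw results into the aggregates,
        # as the "script keeping that view up to date" would.
        for r in records:
            key = (r["suite"], r["test"])
            self.totals[key]["pass" if r["ok"] else "fail"] += 1

    def query(self, min_fail_rate=0.0, top=10):
        # Surface the most egregious failures first; everything below
        # the threshold is "uninteresting" and never leaves the view.
        rows = []
        for (suite, test), t in self.totals.items():
            total = t["pass"] + t["fail"]
            rate = t["fail"] / total
            if rate >= min_fail_rate:
                rows.append({"suite": suite, "test": test, "fail_rate": rate})
        return sorted(rows, key=lambda r: -r["fail_rate"])[:top]

view = FakeMaterializedView()
view.update([
    {"suite": "mochitest", "test": "test_a.html", "ok": False},
    {"suite": "mochitest", "test": "test_a.html", "ok": True},
    {"suite": "xpcshell", "test": "test_b.js", "ok": True},
])
print(view.query(min_fail_rate=0.25))  # only test_a.html (fail_rate 0.5)
```

The point of the fake is that the dashboard codes against `update()`/`query()` now, and the backing store can be swapped in later without touching callers.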
Maybe prioritizing all test failures by "interestingness" can be pushed outside the scope of this bug. If we have a simple text search, then we can view any test over time, and we can push the problem of highlighting "interesting" tests to the regression-detection module. A store of alerts could then feed this dashboard at a later time.
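The text-search fallback is cheap to sketch: substring-match test names, then count the matching failures per push to get the over-time view. Record fields (`test`, `push_date`, `ok`) are illustrative assumptions:

```python
from collections import Counter

def failures_over_time(records, search):
    """Count failures per push date for tests whose name matches `search`."""
    counts = Counter(
        r["push_date"]
        for r in records
        if search in r["test"] and not r["ok"]
    )
    return sorted(counts.items())

records = [
    {"test": "test_worker.html", "push_date": "2015-10-07", "ok": False},
    {"test": "test_worker.html", "push_date": "2015-10-08", "ok": False},
    {"test": "test_worker.html", "push_date": "2015-10-08", "ok": True},
    {"test": "test_other.html", "push_date": "2015-10-08", "ok": False},
]
print(failures_over_time(records, "worker"))
# → [('2015-10-07', 1), ('2015-10-08', 1)]
```

This keeps the dashboard dumb on purpose: anything smarter than "show me this test over time" belongs to the regression-detection module and its alert store.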