[TestGroup UI] Explore if ActiveData can be used to get pass/fail ratios on a per-test basis

Status: NEW
Assignee: Unassigned
Priority: P3
Severity: normal
Reported: a year ago
Last modified: 9 months ago
Reporter: camd

(Reporter)
Description

a year ago
For a failing test, it would be good to know how many times the same test also passed on the same push.  Treeherder cannot hold this amount of data, so we are hopeful that ActiveData could be leveraged for this.
(Reporter)

Updated

a year ago
Assignee: nobody → cdawson
(Reporter)

Updated

a year ago
Blocks: 1337488
Comment 1

a year ago
I guess you are concerned about retriggers?  This usually happens on try.  Do you have an example?
(Reporter)

Comment 2

a year ago
Yeah, this came out of a conversation with Joel Maher (cc'd) about searching for intermittent ratios in the TestGroup UI.  He'd like to be able to see, for a given test, how many times it passed and failed in that push.  As I recall, being able to see this ratio not just for the push in question but also for a certain number of prior pushes would be ideal.

Is this something we could query for in ActiveData, given a revision and a test name (path)?  It would be great if I could just build a link to your UI to get this data.
Flags: needinfo?(klahnakoski)
Comment 3

a year ago
Yes, I can give you an example query.  The problem is that most pushes run a test just once, if at all.  Test results from try pushes are not included in ActiveData right now, so you may need some historical data instead, which I can also provide given a branch, a push date, and a test name.
Flags: needinfo?(klahnakoski)
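
To make comment 3 concrete, here is a rough sketch of what such a query might look like when posted to the public ActiveData endpoint.  The endpoint URL, the unittest table, the field names (result.test, result.ok, repo.changeset.id12), and the response shape are assumptions based on ActiveData's schema; treat this as an illustration, not a query taken from this bug:

# Hypothetical sketch: per-test pass/fail counts on a single push via
# ActiveData.  Endpoint, table, and field names are assumptions to verify.
import requests

ACTIVEDATA_URL = "https://activedata.allizom.org/query"  # assumed endpoint

def pass_fail_counts(revision, test_path):
    """Return (passed, failed) counts for one test on one revision."""
    query = {
        "from": "unittest",
        "select": {"aggregate": "count"},
        "groupby": ["result.ok"],        # True = pass, False = fail
        "where": {"and": [
            {"eq": {"repo.changeset.id12": revision[:12]}},
            {"eq": {"result.test": test_path}},
        ]},
        "format": "list",
    }
    response = requests.post(ACTIVEDATA_URL, json=query, timeout=60)
    response.raise_for_status()
    # NOTE: the row shape assumed here ({"result": {"ok": true}, "count": 3})
    # is a guess at the list-format response; verify against the live service.
    counts = {}
    for row in response.json().get("data", []):
        counts[bool(row["result"]["ok"])] = row["count"]
    return counts.get(True, 0), counts.get(False, 0)

if __name__ == "__main__":
    passed, failed = pass_fail_counts("a8e6a7f6f1c3", "dom/tests/test_it.js")
    print("passed=%d failed=%d" % (passed, failed))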
(Reporter)

Comment 4

a year ago
OK, cool.  I think that's fine.  It will just be useful in the event there ARE multiple runs of the same test.  I'll get back to you when I start tackling this.  Thanks!
(Reporter)

Updated

a year ago
Component: Treeherder → Treeherder: Test-based View
(Reporter)

Comment 5

a year ago
Bug 1399923 has an example query that is similar to what we might want to do here, so I'm noting it down.
See Also: → bug 1399923
Comment 6

a year ago
The Neglected Oranges dashboard might also help:
https://charts.mozilla.org/NeglectedOranges/index.html

Keep configurations in mind: most of our tests run on 30+ configurations, so querying for failures in aggregate could be misleading.  I am leaning towards expanding our definition of a test to test == testcase + configuration (where a configuration could be something like "windows 7 debug stylo-disabled").

Either we ignore configurations, treat every configuration as unique, or account for platforms (linux, osx, win7, win10, android).

I would personally vote for breaking it down by platform, though there are arguments for doing more than that.  Breaking it down by platform could give us:

# bugs   testname     #failed   #runs   %failed   expected_%failed
   2     test_it.js      3        4       75             5

^ that is the version that ignores configurations


# bugs   testname     Linux         OSX          Win           Android
   2     test_it.js   1/2, 5/100    0/0, 0/51    2/2, 10/100   0/0, 0/49

^ this is per platform, where each cell is failed/runs on this push followed by failed/runs over the last week on trunk: e.g. Windows has 2 failures out of 2 runs on this push, and 10 failures out of 100 runs in the last week.

As you can see, the information gets harder to represent, but we can be smart about it.  Say we do have per-platform data: we could color-code the cells green, yellow, or orange, so that the above example would be:
linux = yellow
osx = green
windows = yellow
android = green

Ideally you would be able to click a button and retrigger the failing jobs so you can get everything to green/orange (say you need a minimum of 3 data points).
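
As a toy illustration of the color coding and the minimum-data-points rule just described, here is a sketch.  The thresholds below are invented for illustration (they happen to reproduce the example colors above) and were never agreed on in this bug:

# Hypothetical color-coding for the per-platform cells described above.
# The thresholds and the 3-data-point minimum are illustrative assumptions.
MIN_RUNS = 3  # minimum data points before we trust the ratio

def cell_color(push_failed, push_runs, week_failed, week_runs):
    """Map a platform cell like '2/2, 10/100' to green/yellow/orange."""
    total_runs = push_runs + week_runs
    if total_runs < MIN_RUNS:
        return "orange"  # not enough data; candidate for retriggering
    failure_rate = (push_failed + week_failed) / total_runs
    if failure_rate == 0:
        return "green"
    if failure_rate < 0.25:
        return "yellow"
    return "orange"

# The example row from the table above yields yellow/green/yellow/green:
cells = {
    "Linux":   (1, 2, 5, 100),
    "OSX":     (0, 0, 0, 51),
    "Win":     (2, 2, 10, 100),
    "Android": (0, 0, 0, 49),
}
for platform, counts in cells.items():
    print(platform, "=", cell_color(*counts))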


As for what the user sees, it would be a table with either all this data or a higher-level summary that can be digested in 3 seconds:
Failures needing more data:
...

Failures that are failing too frequently or not seen in bugs:
...

[collapsed] Failures that are well known and can be ignored
Comment 7

a year ago
Recall that |mach test-info <test-name>| uses ActiveData to report this type of information for a given time period (not restricted to a push).  From my experience with that, I have two concerns:
 * Possible confusion over platforms and time periods, perhaps to be mitigated by a per-platform approach like the one described in comment 6.
 * In its current state, ActiveData has too much latency to use in a UI.
Comment 8

a year ago
One thought, if we determine that ActiveData is not responsive enough, is to have a temporary store (even a .json file, or a simple table in a db) that is populated every other hour from ActiveData with the most recent set of failures we have seen.  While this isn't perfect, it adds additional data to our workflow.
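
A minimal sketch of that temporary store, assuming a flat .json file refreshed from cron; the cache path and the fetch_recent_failure_stats callable are hypothetical placeholders for whatever ActiveData query we settle on:

# Hypothetical cron-driven cache: refresh a JSON file from ActiveData
# every run, and let the UI read the file instead of querying live.
import json
import time
from pathlib import Path

CACHE_FILE = Path("/var/cache/testgroup/failure_stats.json")  # assumed location

def refresh_cache(fetch_recent_failure_stats):
    """Run from cron every other hour; writes atomically via a temp file."""
    stats = fetch_recent_failure_stats()     # the slow ActiveData round trip
    payload = {"generated": time.time(), "stats": stats}
    tmp = CACHE_FILE.with_suffix(".tmp")
    tmp.write_text(json.dumps(payload))
    tmp.replace(CACHE_FILE)                  # atomic rename on POSIX

def read_cache(max_age_seconds=3 * 3600):
    """Fast path for the UI; returns None if the cache is stale or missing."""
    try:
        payload = json.loads(CACHE_FILE.read_text())
    except FileNotFoundError:
        return None
    if time.time() - payload["generated"] > max_age_seconds:
        return None
    return payload["stats"]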
Comment 9

a year ago
davehunt's dashboard would benefit from the exact same "temporary store", or cache.

I would like to see the cache implemented on the Elasticsearch side of the ActiveData web service, shunting queries that match some pattern to pre-computed caches.  This strategy is common in large data systems: the caching generalizes well to other tables and dimensions, it keeps the client simple, and it can run faster since it is closer to the data.
Comment 10

a year ago
:ekyle, how could we implement the cache on the Elasticsearch side?  Is that something you can do in the new ES 5.0?  How much work is this, and are there prior-art examples or docs to learn more about what you are thinking?
Flags: needinfo?(klahnakoski)
Comment 11

a year ago
We can implement this in stages:

1. Primitive - We set up a cron job to fill a table with the aggregates we desire, and the client uses that table.  We do this now for code coverage and test failures.  This is only hard now because ES 1.7 has bugs.
2. Materialized views [1] - We write a module that lets you register a query; the module is responsible for running the cron and keeping the view up to date.  Ingest routines can also keep the materialized view up to date.  Clients still use materialized views like they would any table.
3. Shunting - A module that inspects the incoming query and shunts it to the materialized view that can answer it fastest (a toy sketch follows this comment).  Shunting is easier here than in a relational database because the data warehouse's hierarchical data model is simpler, and the query model is simpler too.
4. Prediction - Let the machine decide which materialized views should exist based on query volume.

[1] Definition - https://en.wikipedia.org/wiki/Materialized_view
[2] A MySQL implementation - https://github.com/greenlion/swanhart-tools/tree/master/flexviews
[3] Why - https://www.compose.com/articles/its-a-view-its-a-table-no-its-a-materialized-view/ (intro only)
Flags: needinfo?(klahnakoski)
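
For stage 3, a toy sketch of what the shunting module might look like.  The matching rule (exact group-by fields) and the table/view names are illustrative assumptions, not a design from this bug:

# Hypothetical shunting layer: inspect an incoming ActiveData-style query
# and, when it matches a registered pattern, answer it from a pre-computed
# materialized view instead of the raw unittest table.
# Registered materialized views: a query "matches" when it groups by
# exactly these fields (an assumed, deliberately simple rule).
MATERIALIZED_VIEWS = {
    frozenset(["result.test", "result.ok"]): "unittest_pass_fail_daily",
}

def shunt(query):
    """Rewrite query['from'] to a materialized view when one can answer it."""
    groupby = frozenset(query.get("groupby", []))
    view = MATERIALIZED_VIEWS.get(groupby)
    if view and query.get("from") == "unittest":
        shunted = dict(query)
        shunted["from"] = view   # same query shape, much smaller table
        return shunted
    return query                 # fall through to the raw table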
(Reporter)

Updated

a year ago
Priority: -- → P2

Updated

a year ago
No longer blocks: 1337488

Updated

9 months ago
Assignee: cdawson → nobody
Priority: P2 → P3