Closed Bug 1337243 Opened 7 years ago Closed 3 years ago

Code Coverage Metadata Requirements

Categories

(Testing :: Code Coverage, defect)


Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: ekyle, Unassigned)

References

Details

This bug is for planning the metadata and schema required to answer the questions of bug 1337241, specifically:

> What do we need to know to solve these problems?  What is the schema?

There are a variety of ways our coverage will be incomplete, and each will require an action, or a mitigation strategy. The schema should have all the information required to carry out any and all of those actions.

Here are a few questions to start this bug:

* Where is the data? - This bug discusses a schema as if it were in a central database, but some data is locally available to the agent that will be acting.  Although the local data is disjoint, and might never enter a central store, it still needs to be considered in the event peripheral data is required from elsewhere.
* Where is the data? - For the data that does go into a central store, what does the store look like (ActiveData?)?
* How is the data accessed? - Are we building a reliable system, or are clients designed to handle data access problems?
* For each action, what query provides the information required to perform that action?
* What is the data shape? - Shape can be dictated by the queries and the sources it comes from.  We may also consider this metadata to be useful beyond our limited objective.

The metadata must apply to the particular revision we are collecting Code Coverage for.  This means the metadata will have multiple versions, annotated by time and revision.
Assignee: nobody → mitch9654
Mitch, 

DXR [1] shows the *.ini files that control test execution. I believe that between these lists and what we see in Treeherder as scheduled jobs, we know everything that should have run. The C/C++ tests do not need the detail in these *.ini files yet, but the JS code coverage we are collecting now probably does.
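
For illustration, a manifest entry of the kind that DXR query turns up looks roughly like this (the file names and the skip-if condition here are made up):

	[DEFAULT]
	support-files = head.js

	[test_example.js]
	# hypothetical condition; lines like this are what the skip-if search above finds
	skip-if = os == 'android'

	[test_other.js]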

I do not know if all test suites use these *.ini files.

Greg, 

What tests have code coverage enabled? Where are the run configs located in the tree?

Thank you

[1] https://dxr.mozilla.org/mozilla-central/search?q=skip-if&=mozilla-central
Flags: needinfo?(gmierz2)
Joel,

Is the nightly code coverage scheduled now?  Please add a pointer to where that task is defined.
Flags: needinfo?(jmaher)
The list of test suites that currently have code coverage enabled is here: https://dxr.mozilla.org/mozilla-central/source/taskcluster/ci/test/test-sets.yml#86,98 . Soon, ccov will also have the entire list of common-tests implemented.

It is true that not every test suite uses ".ini" files. For example, in the ccov test-set, gtest uses a "moz.build" file to describe which tests should be run: https://dxr.mozilla.org/mozilla-central/source/gfx/tests/gtest/moz.build

Also, Reftest uses "reftest.list", and jittest (coming soon) is another suite which finds tests automatically when they are specified in a given folder.

For jsdcov, however, they all use ".ini" files so far. Here's a list of all the .ini files in the source tree: https://dxr.mozilla.org/mozilla-central/search?q=path%3Aini&redirect=false . As you can see, there are many, and they have differing names depending on which test suite uses them.

For mochitest at least, if you want to know what manifests were used in a given chunk, you can look at "manifests.list" in the job details: https://treeherder.mozilla.org/#/jobs?repo=try&author=gmierz2@outlook.com&selectedJob=79408849 .
Flags: needinfo?(gmierz2)
I'm assuming that the "client" referred to is an end-user client, like Eric's Code Coverage UI (https://github.com/ericdesj/moz-codecover-ui), and that the client isn't the actual infrastructure (TaskCluster).

* Where is the data?
There will be an external client accessing the data. What options do we have for storing the data in a centralized way other than ActiveData?
Also, what is the alternative to storing data in a centralized manner? The instances that run the code coverage tasks could be long demolished before the client requests the data, so data must be collected ... right?

* How is the data accessed?
So far, I'm leaning toward having the system be reliable. Pushing data access concerns to the client seems unnecessary. The answer to this question heavily depends on ^, though, because I'm missing infrastructure knowledge.

* For each action, what query gets the necessary data?
Depends on our centralized/decentralized data options.

* What is the data shape?
So, the primary information that we need is "Is the code coverage for build $x accurate?" This question can be fully answered by two pieces of information:
* The status of each test - PASS/FAIL/SKIPPED. All test tasks that I have checked so far produce a report of test statuses.
* Whether the taskcluster task that ran the code-coverage build finished executing correctly. If it failed partway through, then not all tests will have executed, and the "test report" above will be incomplete

The test report structure of each task I've investigated so far:
	crashtest: crashtest_raw.log (structured JSON)
	firefox-ui-functional-local: remote.xml
	firefox-ui-functional-remote: remote.xml
	xpcshell: xpcshell_raw.log (structured JSON)
	... (need to check all others in https://dxr.mozilla.org/mozilla-central/source/taskcluster/ci/test/test-sets.yml#94-116)

The code coverage itself is in either gcc's format, or in LCOV's text-based format:
	CPP: gcc's GCDA/GCNO
	JS: LCOV format: https://bugzilla.mozilla.org/show_bug.cgi?id=1191289
	RUST: (open issue) gcc's GCDA/GCNO https://github.com/rust-lang/rfcs/issues/646
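
For reference, a single file's record in the LCOV text format looks roughly like this (the path and hit counts are made up):

	SF:toolkit/components/example/Example.jsm
	DA:10,3
	DA:11,0
	DA:12,3
	LF:3
	LH:2
	end_of_record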

I'll continue investigating the "test report" data for each test task.
I have two questions in the meantime:
* What centralized data-storing alternatives do we have to ActiveData (Something completely independent, like a separate database)? What decentralized alternatives do we have?
* What clients will access this data? If I understand correctly, Eric's Code Coverage UI is one of them, right? (https://github.com/ericdesj/moz-codecover-ui)
Flags: needinfo?(klahnakoski)
### Where is the data?

First, in considering how to store the data, I don't believe that ActiveData is a great fit:
* There's a lot of data being entered into the data store, and the result of each individual test needs to be tracked to truly compare code coverage over time - different failures will affect code coverage differently. Unfortunately, this high data-input rate isn't great for ActiveData, which is limited by low add speeds (https://wiki.mozilla.org/Auto-tools/Projects/ActiveData#Limitations)
* Additionally, ActiveData doesn't support non-structured logs, such as those created by cppunit tests, and JS-based Gaia suites, like Gij. See https://wiki.mozilla.org/Auto-tools/Projects/ActiveData#Limitations

Considering the high insertion rate, I believe that using PostgreSQL to store the data will work efficiently and effectively. Structuring the data correctly will allow convenient and performant querying.

For designing the structure of data, I've compiled this list of concerns:
* Tests aren't ordered, and can be added, removed, or changed in any order. Therefore, tests cannot be "indexed" per-suite
* The test identifiers/names won't be unique per-suite
* There are different chunks for each test suite
* Not all chunks will be executed, and some will be executed more than once
* We'll need the results of each individual test run. Limiting ourselves to just the total passed tests, total failed tests, and total ignored tests won't allow us to deterministically track progress - different test failures will affect code coverage in separate ways. Therefore, when handling this data for reporting and prediction, knowing the state of each test will be critical.
* Due to heavy load for querying total passes/failures/ignores, these metrics should be denormalized for each suite.

I've built a relational model for this structure. See http://i.imgur.com/6rq8QSa.jpg (https://drive.google.com/file/d/0B_6Z-WVm6ivHOFJ0TXNXLVhYS1E/view?usp=sharing).
This model allows very flexible querying, from simple (how many chunks were executed, and how many should have been?) to complex (between two chunk executions of the same revision, which tests had different pass/fail/ignore results?).
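
As a rough sketch, the tables implied by that diagram and by the example queries below might look like this in PostgreSQL (column names are taken from the queries; exact types and constraints are still open):

	CREATE TABLE suite (
	    revision      bytea   NOT NULL,   -- hg changeset id, stored as raw bytes
	    suite_name    text    NOT NULL,
	    total_chunks  integer NOT NULL,   -- chunks the decision task scheduled
	    PRIMARY KEY (revision, suite_name)
	);

	CREATE TABLE suite_chunk (
	    key           bigserial PRIMARY KEY,  -- one row per chunk *execution*; re-runs get new keys
	    revision      bytea   NOT NULL,
	    suite_name    text    NOT NULL,
	    chunk_index   integer NOT NULL,
	    tests_passed  integer,                -- denormalized totals, per the requirements above
	    tests_failed  integer,
	    tests_skipped integer,
	    FOREIGN KEY (revision, suite_name) REFERENCES suite (revision, suite_name)
	);

	CREATE TABLE test (
	    suite_chunk_key bigint REFERENCES suite_chunk (key),
	    name            text NOT NULL,   -- test names are not unique per suite, so no uniqueness constraint
	    result          text NOT NULL    -- PASS / FAIL / SKIPPED
	);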

### How is the data accessed?

We will build a reasonably reliable system to report on code coverage metadata. Clients will not need to manage and persist data locally; they will depend on this system to be available. However, when downtime (however minimal) does occur, the client should inform the user of the disconnect.

### For each action, what query provides the information required to perform that action?

* Number of passed tests for revision 123ABC on suite "cppunit", chunk 1
|SELECT tests_passed FROM suite_chunk WHERE revision = decode('123ABC', 'hex') AND suite_name = 'cppunit' AND chunk_index = 1 LIMIT 1|
* # of chunks expected for suite "cppunit", revision 123ABC
| SELECT total_chunks FROM suite WHERE revision = decode('123ABC', 'hex') AND suite_name = 'cppunit'|
* # of chunks executed of suite "cppunit" for revision 123ABC
|SELECT COUNT(DISTINCT chunk_index) FROM suite_chunk WHERE revision = decode('123ABC', 'hex') AND suite_name = 'cppunit'|
(DISTINCT is needed because a chunk can be re-run multiple times)
* Total suites executed for revision 123ABC
|SELECT COUNT(suite_name) FROM suite WHERE revision = decode('123ABC', 'hex')|
* View detected flaky tests for revision 123ABC and suite "cppunit", chunk 1
|SELECT test.name FROM test RIGHT JOIN suite_chunk ON suite_chunk.key = test.suite_chunk_key WHERE suite_chunk.revision = decode('123ABC', 'hex') AND suite_chunk.suite_name = 'cppunit' AND suite_chunk.chunk_index = 1 GROUP BY test.name HAVING COUNT(DISTINCT test.result) > 1|
* View detected flaky tests for revision 123ABC and suite "cppunit", all chunks
|SELECT test.name FROM test RIGHT JOIN suite_chunk ON suite_chunk.key = test.suite_chunk_key WHERE suite_chunk.revision = decode('123ABC', 'hex') AND suite_chunk.suite_name = 'cppunit' GROUP BY test.name HAVING COUNT(DISTINCT test.result) > 1|
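* Which suites are missing chunks for revision 123ABC (i.e., is coverage complete?) - a sketch along these lines, assuming the same schema as above:

	SELECT suite.suite_name,
	       suite.total_chunks,
	       COUNT(DISTINCT suite_chunk.chunk_index) AS chunks_executed
	FROM suite
	LEFT JOIN suite_chunk
	       ON suite_chunk.revision = suite.revision
	      AND suite_chunk.suite_name = suite.suite_name
	WHERE suite.revision = decode('123ABC', 'hex')
	GROUP BY suite.suite_name, suite.total_chunks
	HAVING COUNT(DISTINCT suite_chunk.chunk_index) < suite.total_chunks;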

### What is the data shape?
I've built a relational model for this structure. See http://i.imgur.com/6rq8QSa.jpg (https://drive.google.com/file/d/0B_6Z-WVm6ivHOFJ0TXNXLVhYS1E/view?usp=sharing).

### Walkthrough of a revision's lifecycle
In the same way that Treeherder subscribes to TaskCluster pulse events, the code coverage system can subscribe to the state of task executions:

1. The decision task finishes executing. An ETL pipeline parses the task-graph and, for each scheduled suite, inserts a row into the "suite" table (including the total chunks expected for the suite). When the decision task runs multiple times, only rows with a unique (revision, suite_name) pair are inserted into the "suite" table, enforced by database constraints (see the sketch after this list).
2. When each suite chunk finishes executing, it inserts a row into "suite_chunk", keeping track of the new row's key. It then loads each test's result into the "test" table, referencing that "suite_chunk" key.
3. Re-running a suite's chunk is handled gracefully by the database. There will be multiple "suite_chunk" records with the same revision and suite_name, but different identifiers (keys).
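
A minimal SQL sketch of steps 1 and 2, assuming the table layout sketched earlier and a unique constraint on (revision, suite_name) (PostgreSQL's ON CONFLICT is one way to get the "insert only once" behaviour; the values are made up):

	-- step 1: the decision task records each scheduled suite; re-runs become no-ops
	INSERT INTO suite (revision, suite_name, total_chunks)
	VALUES (decode('123ABC', 'hex'), 'cppunit', 8)
	ON CONFLICT (revision, suite_name) DO NOTHING;

	-- step 2: a finished chunk records itself, then its per-test results
	INSERT INTO suite_chunk (revision, suite_name, chunk_index, tests_passed, tests_failed, tests_skipped)
	VALUES (decode('123ABC', 'hex'), 'cppunit', 1, 240, 1, 3)
	RETURNING key;

	INSERT INTO test (suite_chunk_key, name, result)
	VALUES (42, 'TestExample.testFoo', 'PASS');  -- 42 = the key returned above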
TL;DR:

Requirements for code coverage metadata:
* Tests aren't ordered, and can be added, removed, or changed in any order. Therefore, tests cannot be "indexed" per-suite
* The test identifiers/names won't be unique per-suite
* There are different chunks for each test suite
* Not all chunks will be executed, and some will be executed more than once
* We'll need the results of each individual test run. Limiting ourselves to just the total passed tests, total failed tests, and total ignored tests won't allow us to deterministically track progress - different test failures will affect code coverage in separate ways. Therefore, when handling this data for reporting and prediction, knowing the state of each test will be critical.
* There's a lot of data being entered into the data store, and the result of each individual test needs to be tracked to truly compare code coverage over time - different failures will affect code coverage differently. This might cause a performance bottleneck when used with ActiveData, due to potentially low add speeds (https://wiki.mozilla.org/Auto-tools/Projects/ActiveData#Limitations)
* ActiveData isn't currently collecting all code coverage data, such as that created by cppunit tests, and JS-based Gaia suites, like Gij. See https://wiki.mozilla.org/Auto-tools/Projects/ActiveData#Limitations

This issue is being left open until bug 1337241 is complete.
Flags: needinfo?(klahnakoski)
Assignee: mitch9654 → nobody
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WONTFIX