Closed Bug 890116 Opened 11 years ago Closed 7 years ago

[Tracking] Stand up Code Coverage build and test runs

Categories

(Release Engineering :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1278402

People

(Reporter: cmtalbert, Unassigned)

References

(Depends on 3 open bugs)

Details

We would like to turn on code coverage [1] for our tests on an intermittent basis.

When Decoder and Cranmer did this on Try, they dumped coverage data into the logs directly. That's not scalable or easy to handle on an ongoing basis. We need to upload these coverage logs somewhere they can be picked up and analyzed.

That's the first step in turning these on automatically. The upload work is going to be handled in bug 749421.

Next we'll need a buildbot build configured and scheduled to run the coverage builds and tests. Note that the coverage build should *NOT* run performance tests, and that it should run on a once-a-week schedule at first, potentially as often as once a day; I'm open to letting capacity decide how often we can support these and on which branches. My first ask is once a week on mozilla-central.

I don't think a dashboard for viewing results should be part of this at this time. Simply having the data will allow interested parties and the community to build dashboards and we should encourage that.

[1] https://developer.mozilla.org/en-US/docs/Measuring_Code_Coverage_on_Firefox
Component: Release Engineering → Release Engineering: Automation (General)
QA Contact: catlee
Product: mozilla.org → Release Engineering
Some brief summaries of how to collect code coverage:

Linux/Linux64:
1. CFLAGS, CXXFLAGS need -fprofile-arcs -ftest-coverage
2. LDFLAGS needs -fprofile-arcs -ftest-coverage -lgcov
3. Package up and upload all .gcno files from the build dir somehow; the directory structure needs to be preserved.
4. Use GCOV_PREFIX when executing tests to put the gcda files in a specific directory.
5. The scripts to recombine the data need to make the directory structures of the .gcno and .gcda agree with each other. Not too hard.
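A minimal shell sketch of steps 1-4, using a toy foo.c rather than the real build (the real thing would get these flags via mozconfig/mozharness; $OBJDIR and the paths are placeholders):

  # steps 1-2: instrument at compile time and link against libgcov
  gcc -fprofile-arcs -ftest-coverage -c foo.c -o foo.o
  gcc -fprofile-arcs -ftest-coverage -lgcov foo.o -o foo

  # step 3: package the .gcno notes files, preserving directory structure
  find "$OBJDIR" -name '*.gcno' | tar -czf gcno.tar.gz -T -

  # step 4: run with the .gcda counter files redirected to a known prefix
  GCOV_PREFIX=/tmp/gcda ./foo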

OS X:
1. Same as Linux
2. LDFLAGS needs -fprofile-arcs -ftest-coverage -lprofile_rt [need compiler-rt when you build Clang/LLVM]
3. Same as Linux
4. Same as Linux
5. Other note: gcov on Linux will barf on LLVM's inputs. OS X's gcov may be able to rationalize this information better; I don't have an OS X system to test.

Windows:
Good documentation for this doesn't exist. Also, every version of MSVC seems to like doing this somewhat differently :-(. Reports from the interwebs suggest that we need to run every exe and dll through vsinstr /coverage (and they need to be linked with -profile, according to my short test), and that we need to start/stop vsperfmon before and after the tests. The output .coverage file from vsperfmon is apparently not easily portable, and there is no command-line tool to turn it into a more-readable .xml file :-(
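Roughly, based on those reports (untested here, so treat the exact switches and file names as assumptions), the sequence would look like:

  rem instrument each binary in place (binaries must be linked with /PROFILE)
  vsinstr /coverage xul.dll
  vsinstr /coverage firefox.exe

  rem start the monitor, run the tests, then shut it down to flush the .coverage file
  vsperfcmd /start:coverage /output:run.coverage
  rem ... run the test suite ...
  vsperfcmd /shutdown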

Gonk/Android:
In theory, this should be the same as Linux. I've never attempted these platforms, so cross-compiling may need some more tweaking to get working output.
I am running coverage for GNU/Linux 64-bit. Results are here:
http://people.mozilla.org/~sledru/reports/coverage/

and are managed by a Jenkins instance.

I am running these tests:
./mach mochitest-plain
./mach mochitest-a11y
./mach mochitest-browser
./mach mochitest-chrome
./mach mochitest-devtools
./mach reftest
./mach check-spidermonkey
./mach xpcshell-test
./mach cppunittest
./mach crashtest
./mach crashtest-ipc
./mach gtest
./mach jetpack-test
./mach python-test
./mach reftest-ipc
./mach talos-test
./mach webidl-parser-test

I also strip out the coverage of the test files themselves.
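(For anyone reproducing this: assuming lcov-format tracefiles, the test directories can be dropped with something like the following; the exclusion patterns are just examples.)

  lcov --remove coverage.info '*/test/*' '*/tests/*' -o coverage-notests.info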
I am trying to run coverage on Mac OS X.
--coverage replaces -lprofile_rt.
However, with Xcode 5.1.1 (the latest release), .gcda files are not generated (something to do with the gcov flush):
http://qualitycoding.org/ios-7-code-coverage/
Joshua has produced a similar report: http://www.tjhsst.edu/~jcranmer/m-ccov/
The tools I've been using for my code coverage-on-try experiments are found here:
<https://github.com/jcranmer/m-c-tools-code-coverage>
<https://github.com/jcranmer/mozilla-coverage> (in lieu of LCOV, because LCOV has some massive speed issues, and I find the softer color palettes in these output files easier on the eyes anyway).

There are some issues with this approach (test timeouts are a persistent problem, and I don't have all the test suites hooked up yet as of this comment).
To carry over some of the description from bug 1035464, we want two things:

1 - run coverage on a periodic (nightly?) basis and report the results somewhere people can view them
2 - add the ability to trigger coverage for try runs
(In reply to Jonathan Griffin (:jgriffin) from comment #7)
> 1 - run coverage on a periodic (nightly?) basis and report the results
> somewhere people can view them
If we want to run the whole test suite, we will need big server(s). On my workstation, it takes about 26 hours.
I think having this on a weekly basis would already be great. If someone wants to check if their own tests cover something, then they could do this on their own workstation as well. But a general coverage "health" report on a weekly basis seems like a good thing to me.
Do the coverage reports need to be generated on the same builder? If not, then you could run this on adapted versions of the existing builders (similar to asan or pgo builds), and combine the results.

There would be some step necessary to combine all the results, which could probably be automated as well (though I'm not sure if releng has an "all builds complete" flag yet).

The frequency then becomes a trade-off between builder time and developer requirements, and try runs would also be possible.
Ideally, we could handle this as just another build type and set of tests in buildbot, and spread the load over X slaves.  It depends on the requirements for running the tests (are there any, other than having the correct build?) and whether or not we can recombine the data (comment #1 seems to imply we can).
Depends on: 750347
See Also: → 740277
(In reply to Jonathan Griffin (:jgriffin) from comment #11)
> Ideally, we could handle this as just another build type and set of tests in
> buildbot, and spread the load over X slaves.  It depends on the requirements
> for running the tests (are there any, other than having the correct build?)
> and whether or not we can recombine the data (comment #1 seems to imply we
> can).

The answer depends on the platform, and I only have Linux builds working successfully.

The requirements you definitely need are:
a) Produce a build with a different set of compiler/linker flags
b) Upload some extra data in addition to the build
c) Run a test under slightly different conditions [i.e., you need to set an environment variable on Linux]
d) Upload the results after running a test

The first step is trivial to do as a mozconfig. For my run-on-try experiments, I modify the testsuites slightly to add the necessary environment variable and do the subsequent upload. It's probable that mozharness could be adjusted to do the requisite steps without having to modify mozilla-central code. Certainly, mozharness is much saner because there's probably a single place where it would need to be added, rather than the several different testsuites that would each need modification.
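For a), a minimal mozconfig sketch using the flags from comment #1 (just the coverage bits; the rest of a normal mozconfig is assumed around it):

  export CFLAGS="-fprofile-arcs -ftest-coverage"
  export CXXFLAGS="-fprofile-arcs -ftest-coverage"
  export LDFLAGS="-fprofile-arcs -ftest-coverage -lgcov"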

Given the results of b) and d), everything else is just a matter of crunching data using some tool (e.g., LCOV or my ccov).

Thinking about non-Linux builds:
1. Clang (i.e., OS X) apparently doesn't schedule __gcov_flush calls, necessitating some small code modifications anyways.
2. Android and b2g are effectively cross-compiled. Building doesn't appear to be a problem, but getting steps c and particularly d implemented may be more annoying, since it looks like the machine that mozharness is executed on isn't the one that has the data in the first place (see the sketch after this list).
3. Windows... is just not set up to do this easily. Doing clang-cl builds for this is probably the best option [see earlier caveat, though.]
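For 2., a hedged sketch of what step d) might look like for a device build (the on-device path is a guess, and it assumes GCOV_PREFIX pointed the instrumented code there):

  # pull the counter files off the device after the test run
  adb pull /data/local/tmp/gcov-data ./gcov-data
  tar -czf gcda-android.tar.gz -C gcov-data .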
> b) Upload some extra data in addition to the build

This extra upload is the coverage report generated by running the tests, right?  Or is there some build product other than the usual that is needed on the test slaves to run the tests?
(In reply to Jonathan Griffin (:jgriffin) from comment #13)
> > b) Upload some extra data in addition to the build
> 
> This extra upload is the coverage report generated by running the tests,
> right?  Or is there some build product other than the usual that is needed
> on the test slaves to run the tests?

The build needs to produce the executable, and it also needs to produce a "notes" file for coverage data. The test runs need to produce the data. Effectively, the coverage data is a gigantic list of counters, and the notes file is a mapping of counters to source locations. Notes aren't needed to run tests (which is why I produce them in a separate upload; no sense in having slaves download an extra ~50MB of data), but they are necessary for building readable coverage results.
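To make that concrete: once the .gcno notes (from the build) and the .gcda counters (from the test run) are laid out in the same tree, a readable report can be produced with something like the following (lcov/genhtml assumed; paths hypothetical):

  lcov --capture --directory combined-tree -o run.info
  genhtml run.info -o coverage-report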
Got it.  In that case, I agree, it wouldn't be too hard to update the mozharness scripts to run in "coverage mode".
So I think the concrete steps needed to get this running periodically in c-i are:

1) Generate new builds that run periodically with a mozconfig that enables code-coverage (either will need to be owned by releng, or I'll need a lot of hand-holding by releng to get it done).
2) Get build job to upload notes file alongside the executable.
3) Add mozharness class for setting up environment for tests to run with code coverage.
4) Upload code coverage data to blobber (trivial).
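For 3) and 4), the test-side plumbing boils down to something like this (a hand-wavy shell equivalent of what the mozharness code would do; the directory names are made up):

  # point the instrumented binaries at a scratch directory for .gcda output
  export GCOV_PREFIX="$PWD/gcov-data"
  ./mach mochitest-plain

  # bundle the counters so blobber can pick them up
  tar -czf gcda.tar.gz -C gcov-data .
  cp gcda.tar.gz "$UPLOAD_DIR/"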

So once you have a notes file as well as the coverage data, how do you produce human-consumable data? Wouldn't it be better if mozharness also downloaded the notes file and did this step for us, and uploaded the consumable result?
(In reply to Andrew Halberstadt [:ahal] from comment #16)
> 
> So once you have a notes file as well as the coverage data, how do you
> produce human-consumable data? Wouldn't it be better if mozharness also
> downloaded the notes file and did this step for us, and uploaded the
> consumable result?

I guess it depends on whether we want human-consumable data per test job or not.  We definitely want an aggregate, which will have to be processed outside of mozharness somewhere.
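If the per-job output ends up as lcov tracefiles, that aggregation step could be as simple as merging them with lcov (file names hypothetical):

  lcov -a mochitest.info -a xpcshell.info -a reftest.info -o aggregate.info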
Ah, if there's already something outside of mozharness consuming the raw data, then yes, it makes sense to just upload that.

Another question, based on:

> The answer depends on the platform, and I only have Linux builds working
> successfully.

I guess I should focus on Linux only for now? Or should I get code coverage builds going across all platforms all at once?
Assignee: nobody → ahalberstadt
Status: NEW → ASSIGNED
Depends on: 1049798
(In reply to Andrew Halberstadt [:ahal] from comment #18)
> 
> I guess I should focus on Linux only for now? Or should I get code coverage
> builds going across all platforms all at once?

Let's just target linux64 initially; once that's rolled out we can consider additional platforms.
Depends on: 1051809
Depends on: 1054275
One problem we'll likely run into is test runs that crash or time out halfway through. We'll need to retrigger them and be smart enough about throwing out results from the crashed/timed-out run so we don't count them twice.

We'll also need to be smart enough to discard data from retriggers where the original run *didn't* crash (e.g. from sheriffs trying to hunt down an intermittent), again so we don't count the data twice.
I'm not sure this is important; merging data from redundant retriggers should not result in inaccurate code coverage in aggregate, AFAIK, even if some of the runs crashed.
(In reply to Jonathan Griffin (:jgriffin) from comment #21)
> I'm not sure this is important; merging data from redundant retriggers
> should not result in inaccurate code coverage in aggregate, AFAIK, even if
> some of the runs crashed.

If you only care about covered vs. not covered, that is true, but it can certainly mess up the numbers, which are also important.

Furthermore, crashes result in incomplete coverage data being written out. gcov unfortunately writes out its data only via atexit handlers.
Depends on: 1054641
Depends on: 1056236
Depends on: 1059951
Depends on: 1108638
Depends on: 1110465
Depends on: 1110769
See Also: → 989242
Depends on: 1112314
Depends on: 1164967
Linux64 can now be scheduled on try with the syntax:

try: -b o -p linux64-cc -u all -t none

I haven't gotten much feedback, so I'm going to postpone work on enabling additional platforms until there's a customer willing to use it.
Assignee: ahalberstadt → nobody
Status: ASSIGNED → NEW
There has been recent work to get this build working again in bug 1278402. I'm not sure whether this bug should be duped to that one, or whether that bug should be a dependency of this one.
See Also: → 1278402
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → DUPLICATE
Component: General Automation → General