Closed Bug 890116 Opened 11 years ago Closed 7 years ago

[Tracking] Stand up Code Coverage build and test runs

Categories

(Release Engineering :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1278402

People

(Reporter: cmtalbert, Unassigned)

References

(Depends on 3 open bugs)

Details

We would like to turn on code coverage [1] for our tests on an intermittent basis.

When Decoder and Cranmer did this on Try, they dumped coverage data into the logs directly. That's not scalable or easy to handle on an ongoing basis. We need to upload these coverage logs somewhere they can be picked up and analyzed.

That's the first step in turning these on automatically. The upload work is going to be handled in bug 749421.

Next we'll need a buildbot build configured and scheduled to run the coverage builds and tests. Note that the coverage build should *NOT* run performance tests, and that it should run on a once-a-week schedule at first, potentially as often as once a day; I'm open to letting capacity decide how often we can support these and on which branches. My first ask is once a week on mozilla-central.

I don't think a dashboard for viewing results should be part of this at this time. Simply having the data will allow interested parties and the community to build dashboards and we should encourage that.

[1] https://developer.mozilla.org/en-US/docs/Measuring_Code_Coverage_on_Firefox
Component: Release Engineering → Release Engineering: Automation (General)
QA Contact: catlee
Product: mozilla.org → Release Engineering
Some brief summaries of how to collect code coverage:

Linux/Linux64:
1. CFLAGS, CXXFLAGS need -fprofile-arcs -ftest-coverage
2. LDFLAGS needs -fprofile-arcs -ftest-coverage -lgcov
3. Package up and upload all .gcno files from the build dir somehow; the directory structure needs to be preserved.
4. Use GCOV_PREFIX when executing tests to put the gcda files in a specific directory.
5. The scripts to recombine the data need to make the directory structures of the .gcno and .gcda agree with each other. Not too hard.
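A minimal shell sketch of steps 1-4, using a toy foo.c rather than the real build (the real thing would get these flags via mozconfig/mozharness; $OBJDIR and the paths are placeholders):

  # steps 1-2: instrument at compile time and link against libgcov
  gcc -fprofile-arcs -ftest-coverage -c foo.c -o foo.o
  gcc -fprofile-arcs -ftest-coverage -lgcov foo.o -o foo

  # step 3: package the .gcno notes files, preserving directory structure
  find "$OBJDIR" -name '*.gcno' | tar -czf gcno.tar.gz -T -

  # step 4: run with the .gcda counter files redirected to a known prefix
  GCOV_PREFIX=/tmp/gcda ./foo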

OS X:
1. Same as Linux
2. LDFLAGS needs -fprofile-arcs -ftest-coverage -lprofile_rt [need compiler-rt when you build Clang/LLVM]
3. Same as Linux
4. Same as Linux
5. Other note: gcov on Linux will barf on LLVM's inputs. OS X's gcov may be able to rationalize this information better; I don't have an OS X system to test.

Windows:
Good documentation for this doesn't exist. Also, every version of MSVC seems to like doing this somewhat differently :-(. Reports from the interwebs suggest that we need to run every exe and dll through vsinstr /coverage (and they need to be linked with -profile, according to my short test), and that we need to start/stop vsperfmon before and after the tests. The output .coverage file from vsperfmon is apparently not easily portable, and there is no command-line tool to turn it into a more-readable .xml file :-(
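Roughly, based on those reports (untested here, so treat the exact switches and file names as assumptions), the sequence would look like:

  rem instrument each binary in place (binaries must be linked with /PROFILE)
  vsinstr /coverage xul.dll
  vsinstr /coverage firefox.exe

  rem start the monitor, run the tests, then shut it down to flush the .coverage file
  vsperfcmd /start:coverage /output:run.coverage
  rem ... run the test suite ...
  vsperfcmd /shutdown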

Gonk/Android:
In theory, this should be the same as Linux. I've never attempted these platforms, so cross-compiling may need some more tweaking to get working output.
I am running coverage for GNU/Linux 64-bit. Results are here:
http://people.mozilla.org/~sledru/reports/coverage/

and are managed by a Jenkins instance.

I am running these tests:
./mach mochitest-plain
./mach mochitest-a11y
./mach mochitest-browser
./mach mochitest-chrome
./mach mochitest-devtools
./mach reftest
./mach check-spidermonkey
./mach xpcshell-test
./mach cppunittest
./mach crashtest
./mach crashtest-ipc
./mach gtest
./mach jetpack-test
./mach python-test
./mach reftest-ipc
./mach talos-test
./mach webidl-parser-test

I also strip out the coverage of the test files themselves.
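(For anyone reproducing this: assuming lcov-format tracefiles, the test directories can be dropped with something like the following; the exclusion patterns are just examples.)

  lcov --remove coverage.info '*/test/*' '*/tests/*' -o coverage-notests.info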
I am trying to run coverage on Mac OS X.
--coverage replaces -lprofile_rt.
However, with Xcode 5.1.1 (the latest release), .gcda files are not generated (something to do with the gcov flush):
http://qualitycoding.org/ios-7-code-coverage/
Joshua has produced a similar report: http://www.tjhsst.edu/~jcranmer/m-ccov/
The tools I've been using for my code coverage-on-try experiments are found here:
<https://github.com/jcranmer/m-c-tools-code-coverage>
<https://github.com/jcranmer/mozilla-coverage> (in lieu of LCOV, because LCOV has some massive speed issues, and I find the softer color palettes in these output files easier on the eyes anyway).

There are some issues with this approach (test timeouts are a persistent problem, and I don't have all the test suites hooked up yet as of this comment).
To carry over some of the description from bug 1035464, we want two things:

1 - run coverage on a periodic (nightly?) basis and report the results somewhere people can view them
2 - add the ability to trigger coverage for try runs
(In reply to Jonathan Griffin (:jgriffin) from comment #7)
> 1 - run coverage on a periodic (nightly?) basis and report the results
> somewhere people can view them
If we want to run the whole test suite, we will need big server(s). On my workstation, it takes about 26 hours.
I think having this on a weekly basis would already be great. If someone wants to check if their own tests cover something, then they could do this on their own workstation as well. But a general coverage "health" report on a weekly basis seems like a good thing to me.
Do the coverage reports need to be generated on the same builder? If not, then you could run this on adapted versions of the existing builders (similar to asan or pgo builds), and combine the results.

There would be some step necessary to combine all the results, which could probably be automated as well (though I'm not sure if releng has an "all builds complete" flag yet).

The frequency then becomes a trade-off between builder time and developer requirements, and try runs would also be possible.
Ideally, we could handle this as just another build type and set of tests in buildbot, and spread the load over X slaves.  It depends on the requirements for running the tests (are there any, other than having the correct build?) and whether or not we can recombine the data (comment #1 seems to imply we can).
Depends on: 750347
See Also: → 740277
(In reply to Jonathan Griffin (:jgriffin) from comment #11)
> Ideally, we could handle this as just another build type and set of tests in
> buildbot, and spread the load over X slaves.  It depends on the requirements
> for running the tests (are there any, other than having the correct build?)
> and whether or not we can recombine the data (comment #1 seems to imply we
> can).

The answer depends on the platform, and I only have Linux builds working successfully.

The requirements you definitely need are:
a) Produce a build with a different set of compiler/linker flags
b) Upload some extra data in addition to the build
c) Run a test under slightly different conditions [i.e., you need to set an environment variable on Linux]
d) Upload the results after running a test

The first step is trivial to do as a mozconfig. For my run-on-try experiments, I modify the testsuites slightly to add the necessary environment variable and do the subsequent upload. It's probable that mozharness could be adjusted to do the requisite steps without having to modify mozilla-central code. Certainly, mozharness is much saner because there's probably a single place where it would need to be added, rather than the several different testsuites that would each need modification.
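For a), a minimal mozconfig sketch using the flags from comment #1 (just the coverage bits; the rest of a normal mozconfig is assumed around it):

  export CFLAGS="-fprofile-arcs -ftest-coverage"
  export CXXFLAGS="-fprofile-arcs -ftest-coverage"
  export LDFLAGS="-fprofile-arcs -ftest-coverage -lgcov"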

Given the results of b) and d), everything else is just a matter of crunching data using some tool (e.g., LCOV or my ccov).

Thinking about non-Linux builds:
1. Clang (i.e., OS X) apparently doesn't schedule __gcov_flush calls, necessitating some small code modifications anyways.
2. Android and b2g are effectively cross-compiled. Building doesn't appear to be a problem, but getting steps c and particularly d implemented may be more annoying, since it looks like the machine that mozharness is executed on isn't the one that has the data in the first place (see the sketch after this list).
3. Windows... is just not set up to do this easily. Doing clang-cl builds for this is probably the best option [see earlier caveat, though.]
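For 2., a hedged sketch of what step d) might look like for a device build (the on-device path is a guess, and it assumes GCOV_PREFIX pointed the instrumented code there):

  # pull the counter files off the device after the test run
  adb pull /data/local/tmp/gcov-data ./gcov-data
  tar -czf gcda-android.tar.gz -C gcov-data .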
> b) Upload some extra data in addition to the build

This extra upload is the coverage report generated by running the tests, right?  Or is there some build product other than the usual that is needed on the test slaves to run the tests?
(In reply to Jonathan Griffin (:jgriffin) from comment #13)
> > b) Upload some extra data in addition to the build
> 
> This extra upload is the coverage report generated by running the tests,
> right?  Or is there some build product other than the usual that is needed
> on the test slaves to run the tests?

The build needs to produce the executable, and it also needs to produce a "notes" file for coverage data. The test runs need to produce the data. Effectively, the coverage data is a gigantic list of counters, and the notes file is a mapping of counters to source locations. Notes aren't needed to run tests (which is why I produce them in a separate upload; no sense in having slaves download an extra ~50MB of data), but they are necessary for building readable coverage results.
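To make that concrete: once the .gcno notes (from the build) and the .gcda counters (from the test run) are laid out in the same tree, a readable report can be produced with something like the following (lcov/genhtml assumed; paths hypothetical):

  lcov --capture --directory combined-tree -o run.info
  genhtml run.info -o coverage-report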
Got it.  In that case, I agree, it wouldn't be too hard to update the mozharness scripts to run in "coverage mode".
So I think the concrete steps needed to get this running periodically in c-i are:

1) Generate new builds that run periodically with a mozconfig that enables code-coverage (either will need to be owned by releng, or I'll need a lot of hand-holding by releng to get it done).
2) Get build job to upload notes file alongside the executable.
3) Add mozharness class for setting up environment for tests to run with code coverage.
4) Upload code coverage data to blobber (trivial).
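For 3) and 4), the test-side plumbing boils down to something like this (a hand-wavy shell equivalent of what the mozharness code would do; the directory names are made up):

  # point the instrumented binaries at a scratch directory for .gcda output
  export GCOV_PREFIX="$PWD/gcov-data"
  ./mach mochitest-plain

  # bundle the counters so blobber can pick them up
  tar -czf gcda.tar.gz -C gcov-data .
  cp gcda.tar.gz "$UPLOAD_DIR/"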

So once you have a notes file as well as the coverage data, how do you produce human-consumable data? Wouldn't it be better if mozharness also downloaded the notes file and did this step for us, and uploaded the consumable result?
(In reply to Andrew Halberstadt [:ahal] from comment #16)
> 
> So once you have a notes file as well as the coverage data, how do you
> produce human-consumable data? Wouldn't it be better if mozharness also
> downloaded the notes file and did this step for us, and uploaded the
> consumable result?

I guess it depends on whether we want human-consumable data per test job or not.  We definitely want an aggregate, which will have to be processed outside of mozharness somewhere.
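If the per-job output ends up as lcov tracefiles, that aggregation step could be as simple as merging them with lcov (file names hypothetical):

  lcov -a mochitest.info -a xpcshell.info -a reftest.info -o aggregate.info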
Ah, if there's already something outside of mozharness consuming the raw data, then yes, it makes sense to just upload that.

Another question, based on:

> The answer depends on the platform, and I only have Linux builds working
> successfully.

I guess I should focus on Linux only for now? Or should I get code coverage builds going across all platforms all at once?
Assignee: nobody → ahalberstadt
Status: NEW → ASSIGNED
Depends on: 1049798
(In reply to Andrew Halberstadt [:ahal] from comment #18)
> 
> I guess I should focus on Linux only for now? Or should I get code coverage
> builds going across all platforms all at once?

Let's just target linux64 initially; once that's rolled out we can consider additional platforms.
Depends on: 1051809
Depends on: 1054275
One problem we'll likely run into is test runs that crash or time out halfway through. We'll need to retrigger them and be smart enough about throwing out results from the crashed/timed-out run so we don't count them twice.

We'll also need to be smart enough to discard data from retriggers where the original run *didn't* crash (e.g. from sheriffs trying to hunt down an intermittent), again so we don't count the data twice.
I'm not sure this is important; merging data from redundant retriggers should not result in inaccurate code coverage in aggregate, AFAIK, even if some of the runs crashed.
(In reply to Jonathan Griffin (:jgriffin) from comment #21)
> I'm not sure this is important; merging data from redundant retriggers
> should not result in inaccurate code coverage in aggregate, AFAIK, even if
> some of the runs crashed.

If you only care about covered vs. not covered, that is true, but it can certainly mess up the numbers, which are also important.

Furthermore, crashes result in incomplete coverage data being written out. gcov unfortunately writes out its data only via atexit handlers.
Depends on: 1054641
Depends on: 1056236
Depends on: 1059951
Depends on: 1108638
Depends on: 1110465
Depends on: 1110769
See Also: → 989242
Depends on: 1112314
Depends on: 1164967
Linux64 can now be scheduled on try with the syntax:

try: -b o -p linux64-cc -u all -t none

I haven't gotten much feedback, so I'm going to postpone work on enabling additional platforms until there's a customer willing to use it.
Assignee: ahalberstadt → nobody
Status: ASSIGNED → NEW
There has been recent work to get this build working again in bug 1278402. I'm not sure whether this bug should be duped to that one, or whether that bug should be a dependency of this one.
See Also: → 1278402
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → DUPLICATE
Component: General Automation → General