Closed Bug 1332917 Opened 7 years ago Closed 7 years ago

Update gcov code coverage flags for the linux64-ccov build.

Categories

(Testing :: Code Coverage, defect)

defect
Not set
normal

Tracking

(firefox55 fixed)

RESOLVED FIXED
mozilla55
Tracking Status
firefox55 --- fixed

People

(Reporter: sparky, Assigned: u587052)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

Recently, some changes have been made to gcov that allow us to use the '--coverage' flag instead. The changes should be made to the code-coverage mozconfig for linux64-ccov.
Blocks: 1301170
Attachment #8829251 - Flags: review?(gmierz2)
Comment on attachment 8829251 [details]
Bug 1332917: Replace linux64-ccov compile/link gcov flags with the more-consistent '--coverage' +560562

https://reviewboard.mozilla.org/r/106368/#review107306

Good stuff! Before we have this landed, let's see if a try run on linux64-ccov with all the test suites currently available still runs as expected.
Attachment #8829251 - Flags: review?(gmierz2) → review+
Interesting...the size of the gcno file dropped from 200kB to 72kB. Maybe this is a side effect of using the --coverage flag because it may be better or it could be we are missing certain parts. I think the next step should be to see if the coverage data is actually good/unchanged. So as one possible step we can check the difference between info files within a single test that uses c++.

Another thing to look at would be the flags themselves. I am finding conflicting statements as to what --coverage replaces. Does it replace "-fprofile-arcs -ftest-coverage" or "-fprofile-arcs -ftest-coverage -lgcov" or just "-lgcov" at times? See [1], & [2].

[1]: http://stackoverflow.com/questions/28840261/gcov-what-is-the-difference-between-coverage-and-ftest-coverage-when-buildi
[2]: https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html#Instrumentation-Options
Flags: needinfo?(mitch9654)
Regarding the flags: "--coverage" implies the following, according to the gcc docs [1]:
During compile: "-fprofile-arcs" and "-ftest-coverage"
During link   : "-lgcov"

Now, the "mozconfig" used to have "-fprofile-arcs" and "-ftest-coverage" for its linking flags (LDFLAGS). According to [2], these have no effect: only "-lgcov" affects linking.

Prior to these changes, we've recognized other builds where the ".gcno" file archive is incredibly small. I'm becoming confident that it's a "flaky" symptom. Besides, when re-running this build with the /exact same/ mozconfig, the ".gcno" archive dropped from 200kB to 72kB? If the build system configuration didn't change, but the archive size did, then perhaps the configuration didn't cause the initial drop from ~70mB to 200kB

I'll build locally and investigate what difference there is between the proper ~70mB ".gcno" archive, and the small archive being produced.

[1]: https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html#Instrumentation-Options
[2]: http://stackoverflow.com/a/23511191
See this build, by Joel, which doesn't have my changes: https://treeherder.mozilla.org/#/jobs?repo=try&revision=8002f88014f819ce34055db8ecf72c0e1ceb2b4a&selectedJob=68153184
The "target.code-coverage-gcno.zip" is 232kB.

I looked into the 72kB ".gcno" file archive from my build, and it has the following root-level folders:
- browser
- gfx
- ipc
- js
- modules
- security
- toolkit

Joel's 232kB archive has:
- browser
- dom
- ipc
- js
- media
- modules
- security
- toolkit

I think that there's a larger underlying problem with the build system, and I don't believe that my changes are affecting it.
Flags: needinfo?(mitch9654)
this is interesting that you are not getting dom/ or media/ coverage data- are we comparing the same tests?  Do you have a try push that we can examine coverage for identical bits?  Ideally a push with the same commit using |try: -b o -p linux64-ccov -u none -t none|, then apply your changes and try again.

The coverage you get here will be for the make check tests, possibly we could try for cppunit tests or gtests as well?

Another avenue to investigate is looking in more detail at coverage data in a certain root directory, say toolkit.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Just pushed a new build with my changes to "mozbuild".

|try: -b o -p linux64-ccov -u all -t none|
Joel's build (old mozconfig), 232kB: https://treeherder.mozilla.org/#/jobs?repo=try&revision=8002f88014f819ce34055db8ecf72c0e1ceb2b4a&selectedJob=68153184
My build (new mozconfig): https://treeherder.mozilla.org/#/jobs?repo=try&revision=91831e3c22f7ba843c52f33ed9105c1a2fffe335

|try: -b o -p linux64-ccov -u none -t none|
Greg's build (old mozconfig), 159kB: https://treeherder.mozilla.org/#/jobs?repo=try&revision=bf323b25dd5e75ef87c9d60fbb9c07c3f1e0ffa9&selectedJob=71529031
My build (new mozconfig): https://treeherder.mozilla.org/#/jobs?repo=try&revision=9bf452937f870f959ee20082f3c48df628559ed8
Ok, so the Treeherder "Similar Jobs" tool's "Same options" flag doesn't quite work as expected, Greg's build from the last comment isn't "-u none".

My build (--coverage): https://treeherder.mozilla.org/#/jobs?repo=try&revision=9bf452937f870f959ee20082f3c48df628559ed8
My build (--fprofile-arcs, etc): https://treeherder.mozilla.org/#/jobs?repo=try&revision=aaba4c16298a67df1d449eb9a774abce1fab64ad
this looks like good progress
The above build that _doesn't_ have my "--coverage" changes also produced a ".gcno" archive that is undersized: 189kB.
Excellent progress so far Mitch!

Another thing that is interesting is that the gcno's have no consistency in size, even for just a build with no tests. One thing to do would be to see if a one-click loaner would give us a gcno with the same size more than once.

Do we know if these small gcno's can still be used to generate good c++ code coverage data or is it bad data?
1.
> One thing to do would be to see if a one-click loaner would give us a gcno with the same size more than once.
Though I do think that this is a worthwhile exercise to figure out why we aren't getting all the data, I don't believe that this problem should block this bug, or stop the patch from landing. Perhaps a new bugzilla ticket should be reported?

2.
> Do we know if these small gcno's can still be used to generate good c++ code coverage data or is it bad data?
Small gcno's will not generate good c++ coverage data, because each gcno file has critical information required for processing coverage data: "It contains information to reconstruct the basic block graphs and assign source line numbers to blocks" [1]. If "lcov" is executed on code coverage data where the gcno archive is incomplete, it will throw an error (like it did on Friday in Calgary, when I was initially running "lcov" to understand how it worked, but the gcno archive we used was incomplete)

[1] https://gcc.gnu.org/onlinedocs/gcc/Gcov-Data-Files.html
Unfortunately, this problem does block this bug from landing because I have yet to see these changes produce gcnos that are actually usable.

With the old mozconfig, at least we have good gcno's being produced even though it's inconsistent. We should certainly investigate this more using this new mozconfig since it consistently breaks unlike the old mozconfig, maybe gcov itself is causing these problems since it was only a change of flags that caused a problem. I think opening a new bug - as a blocker for this bug - would be a good idea.
this is excellent data, I know in the integration branches we do not do clobber builds, so you can think of them as partial builds- I would really like to see this on a test suite (like gtest or cppunittests), not just the build job to determine if the .gcna files are identical.  On try we do full clobber builds, that is the confusing part.

I would like to stress that ensuring we have the exact same base revision is important.  If source files change between comparisons we can have a much different set of data.

One other thought is that the build is done in parallel, possibly this is causing us some confusion or files to be overwritten.

:chris, could you think of ways that when compiling a build with --coverage (or other relate flags) would yield different .gcno files from build to build?  Any ideas on how to validate this?
Flags: needinfo?(cmanchester)
I can see from one of Mitch's try builds with retriggers in comment 10 that our current set of options will produce differently sized gcno archives if retriggered, so I think this is unrelated to that patch at hand, and that this patch should land.

I think I see why we're getting non-deterministic gcno archive sizes, and I think it's due to a race between packaging the archive and compiling some test binaries. I'll push a prospective fix to try.
Flags: needinfo?(cmanchester)
(In reply to Chris Manchester (:chmanchester) from comment #17)
> I can see from one of Mitch's try builds with retriggers in comment 10 that
> our current set of options will produce differently sized gcno archives if
> retriggered, so I think this is unrelated to that patch at hand, and that
> this patch should land.
> 
> I think I see why we're getting non-deterministic gcno archive sizes, and I
> think it's due to a race between packaging the archive and compiling some
> test binaries. I'll push a prospective fix to try.

Aha, so this wasn't due to a race with compiling test binaries, it looks like it was due to a bad interaction with sccache, our distributed compiler cache we use on try. When we pass the coverage flags to gcc it produces an extra output file (gcno) that sccache doesn't known about, so it doesn't become part of the cached result of that compilation call. The larger gcno bundles on try corresponded to cases we were getting more cache misses, so these calls were generating the gcno files locally.

If I add `no_sccache=1` to the top of the coverage mozconfig I see consistent gcno archive sizes.
Thanks for your investigation Chris!

I started a new build with |--coverage| and |no_sccache=1| https://hg.mozilla.org/try/rev/26e504484a5e0a7e692863d23c257ad6f6b7d4df

|try: -b o -p linux64-ccov -u all -t none|
https://treeherder.mozilla.org/#/jobs?repo=try&revision=26e504484a5e0a7e692863d23c257ad6f6b7d4df
Hmm, that build's "gcno" archive is 189kB. I might be missing something. By the way, the diff I linked doesn't show the sccache changes, see it's parent: https://hg.mozilla.org/try/rev/936a235842a9
(The first time I pushed to try with |no_sccache=1|, I improperly specified the "o" in the "try command")
(In reply to Mitchell Hentges [:mitchhentges] from comment #20)
> Hmm, that build's "gcno" archive is 189kB. I might be missing something. By
> the way, the diff I linked doesn't show the sccache changes, see it's
> parent: https://hg.mozilla.org/try/rev/936a235842a9
> (The first time I pushed to try with |no_sccache=1|, I improperly specified
> the "o" in the "try command")

The `no_sccache=1` may need to go on the first line of the mozconfig, it influences things that are included early in the file. At any rate, this investigation should probably be moved to a separate bug. Thanks for looking into this!
20:01 <@ted> mitchhentges: are the code coverage builds in taskcluster?
20:02 <@ted> if so you can just set an attribute on the task to disable sccache
20:02 < jmaher> ted: interesting- they are in taskcluster
20:02 <@ted> we don't currently use it anywhere, but the build task transform sets it by default:
             https://dxr.mozilla.org/mozilla-central/source/taskcluster/taskgraph/transforms/build.py#22
20:03 < jmaher> linux64-ccov
20:03 < jmaher> ted: would you prefer task hacking vs mozconfig hacking?
20:03 <@ted> jmaher: i think that'd be nicer, yeah
20:03 < jmaher> cool
20:04 <@ted> https://dxr.mozilla.org/mozilla-central/source/taskcluster/taskgraph/transforms/task.py#135
20:04 <@ted> that's where it lives in the task schema
I'll report the relevant bug and submit a patch tonight
As Chris suggests in comment 17, we can land this change whenever we want now. Bug 1334402 fixes the problem with the gcno sizes and is also ready to go.
Comment on attachment 8829251 [details]
Bug 1332917: Replace linux64-ccov compile/link gcov flags with the more-consistent '--coverage' +560562

https://reviewboard.mozilla.org/r/106368/#review125158
Attachment #8829251 - Flags: review+
did a brief test and found a couple failures:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=dd0cdc4741d03a936dd96cb02301b8be6861ff06

the dt1 failure I believe is fixed (not sure if my source tree has the fix or not), but the other two failures are new.  I am not sure why this is the case, but the bc6 and bc7 failures are consistent.

Looking at the base revision used, this is from when the patch was originally created, I have a new patch from the tip of the tree, lets see if this helps out:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=9a65eee90c6b270acafd8a674f95e4d39b2d1e04
Assignee: nobody → mitch9654
Pushed by jmaher@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f9eed66241c3
Replace linux64-ccov compile/link gcov flags with the more-consistent '--coverage' r=gmierz,jmaher+560562
https://hg.mozilla.org/mozilla-central/rev/f9eed66241c3
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: