Closed Bug 1381069 Opened 2 years ago Closed 2 years ago

Remove ccov gcda files after talos browser initialization (production only)

Categories

(Testing :: Code Coverage, enhancement)

Version 3
enhancement
Not set

Tracking

(firefox59 fixed)

RESOLVED FIXED
mozilla59
Tracking Status
firefox59 --- fixed

People

(Reporter: marco, Assigned: rwood)

References

Details

(Whiteboard: [PI:January])

Attachments

(1 file)

Before running the tests, we want to clear the gcda files that are generated by the initial profile setup run.
Whiteboard: [PI:July]
there are two use cases here:
1) automation
2) locally via mach

in automation we run the mozharness command for talos with --code-coverage.  This then invokes code coverage module which sets environment variables [1]:
        os.environ['GCOV_PREFIX'] = self.gcov_dir
        os.environ['JS_CODE_COVERAGE_OUTPUT_DIR'] = self.jsvm_dir

we should be able to query those environment variables to get the proper directories for discovering and accessing gcda files.

the question becomes what do we do when running via mach.  Mach actually runs mozharness, but it doesn't run mozharness with the --code-coverage flag.  We invoke mozharness with a hand crafted config file [2]

We set extra options to talos via talos_extra_options:
https://dxr.mozilla.org/mozilla-central/source/testing/talos/mach_commands.py#71

Instead of that we want extra options or config variables of code_coverage=1 for mozharness.


[1] https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/testing/codecoverage.py#61
[2] https://dxr.mozilla.org/mozilla-central/source/testing/talos/mach_commands.py
Whiteboard: [PI:July] → [PI:August]
Whiteboard: [PI:August] → [PI:September]
Whiteboard: [PI:September] → [PI:October]
Assignee: nobody → rwood
Status: NEW → ASSIGNED
Update: I have a local patch that allows the --code-coverage flag to be passed into mozharness via ./mach talos-test, so that code coverage can be enabled on talos runs initiated locally by mach.

I'm trying to get a local code coverage build to work, so I can test out my patch and then proceed to find and figure out the solution to get rid of the .gcda files.

I'm stuck with my local ccov build on osx. I have enabled ccov in my moz.config via the steps in [1], and have disabled artifact builds - so it does try to build with ccov now, but I am getting this error:

 0:13.29 checking for gcc... /usr/bin/clang -std=gnu99
 0:13.44 checking whether the C compiler (/usr/bin/clang -std=gnu99 --coverage --coverage -lgcov) works... no
 0:13.44 configure: error: installation or configuration problem: C compiler cannot create executables.
 0:13.44 DEBUG: <truncated - see config.log for full output>
 0:13.44 DEBUG:
 0:13.44 DEBUG: configure:867: checking host system type
 0:13.44 DEBUG: configure:888: checking target system type
 0:13.44 DEBUG: configure:906: checking build system type
 0:13.44 DEBUG: configure:2077: checking for gcc
 0:13.44 DEBUG: configure:2190: checking whether the C compiler (/usr/bin/clang -std=gnu99 --coverage --coverage -lgcov) works
 0:13.44 DEBUG: configure:2206: /usr/bin/clang -std=gnu99 -o conftest --coverage  --coverage -lgcov conftest.c  1>&5
 0:13.44 DEBUG: configure:2203:1: warning: type specifier missing, defaults to 'int' [-Wimplicit-int]
 0:13.44 DEBUG: main(){return(0);}
 0:13.44 DEBUG: ^
 0:13.45 DEBUG: 1 warning generated.
 0:13.45 DEBUG: ld: library not found for -lgcov
 0:13.45 DEBUG: clang: error: linker command failed with exit code 1 (use -v to see invocation)
 0:13.45 DEBUG: configure: failed program was:
 0:13.45 DEBUG:
 0:13.45 DEBUG: #line 2201 "configure"
 0:13.45 DEBUG: #include "confdefs.h"
 0:13.45 DEBUG:
 0:13.45 DEBUG: main(){return(0);}
 0:13.45 DEBUG: configure: error: installation or configuration problem: C compiler cannot create executables.
 0:13.45 ERROR: old-configure failed
 0:13.51 *** Fix above errors and then restart

:gmierz, any ideas? I've googled but can't seem to find a solution. Somehow I'm guessing I need to install a lgcov library? Any ideas would be appreciated, thanks!

[1] https://developer.mozilla.org/en-US/docs/Mozilla/Testing/Measuring_Code_Coverage_on_Firefox
Flags: needinfo?(gmierz2)
So, we actually haven't gotten OSX to work with code coverage yet and the instructions in [1] are for linux (which I don't think is specified). The difference here is that OSX is running a clang build and linux is running a gcc/g++ (IIRC - I know it's not the same compiler at the least). We were working on getting it to work properly half way through september and this is more or less as far as we got. The last thing we were trying was adding these two commands to the mozconfig:

    export CC=clang

    export CXX=clang++

The other thing we were playing with was removing the -lgcov flag and seeing if firefox built and produced code coverage after a test. So, I suggest trying to remove -lgcov and seeing if you can get coverage. :marco has done a lot of work with clang recently so he may have some better ideas for how to proceed. 

One other thing you can do is get a virtual box setup with a linux distro. Although building and testing on this would be slower. :(
Flags: needinfo?(gmierz2)
I've never tried building a coverage build on OS X, so I don't know whether it works or not. We can build a coverage build with Clang on Linux though (and almost on Windows too), so I would expect it to be possible on OS X.

You can simply remove the "-lgcov" flag, as it is not needed for Clang.
Thank you :marco and :gmierz, I'm still trying to get a ccov build working on osx with your suggestions above, no luck yet. Will update again soon...
So with the latest inbound, I went back to an artifact build locally (with no ccov settings in my mozconfig) an it built successfully. Then took out the artifact build setting and a full build also worked fine.

Then added the following mozconfig options for ccov [1] and now the build fails with [2]. I'm trying to reduce it down to see which ccov option causes the failure.

[1] https://pastebin.mozilla.org/9070335
[2] https://pastebin.mozilla.org/9070334
Flags: needinfo?(mcastelluccio)
marco> rwood: could you try adding #include "mozilla/dom/UnionTypes.h" in TestBindingHeader.h?

Added, and got a bit further, now failing with:

32:35.02 libdom_media_gmp_widevine-adapter.a.desc
32:37.21 In file included from /Users/rwood/mozilla-unified/obj-x86_64-apple-darwin16.7.0/dom/media/gmp/Unified_cpp_dom_media_gmp0.cpp:56:
32:37.21 /Users/rwood/mozilla-unified/dom/media/gmp/GMPChild.cpp:304:26: error: use of undeclared identifier 'GetNativeTarget'
32:37.21   nsCString targetPath = GetNativeTarget(aPath);

Trying to find GetNativeTarget source...
Depends on: 1409747
Update: With the fix in Bug 1409747 (thanks :marco) applied locally to latest inbound, my osx ccov build finishes now - but when I startup Firefox it errors out with:

Roberts-MacBook-Pro-1903:mozilla-unified rwood$ ./mach run
 0:00.69 /Users/rwood/mozilla-unified/obj-x86_64-apple-darwin16.7.0/dist/NightlyDebug.app/Contents/MacOS/firefox -no-remote -foreground -profile /Users/rwood/mozilla-unified/obj-x86_64-apple-darwin16.7.0/tmp/scratch_user
Assertion failure: Storage<T>::initialized(), at /Users/rwood/mozilla-unified/obj-x86_64-apple-darwin16.7.0/dist/include/mozilla/ThreadLocal.h:225
(In reply to Robert Wood [:rwood] from comment #8)
> Update: With the fix in Bug 1409747 (thanks :marco) applied locally to
> latest inbound, my osx ccov build finishes now - but when I startup Firefox
> it errors out with:
> 
> Roberts-MacBook-Pro-1903:mozilla-unified rwood$ ./mach run
>  0:00.69
> /Users/rwood/mozilla-unified/obj-x86_64-apple-darwin16.7.0/dist/NightlyDebug.
> app/Contents/MacOS/firefox -no-remote -foreground -profile
> /Users/rwood/mozilla-unified/obj-x86_64-apple-darwin16.7.0/tmp/scratch_user
> Assertion failure: Storage<T>::initialized(), at
> /Users/rwood/mozilla-unified/obj-x86_64-apple-darwin16.7.0/dist/include/
> mozilla/ThreadLocal.h:225

Maybe it would be easier for you to just use a Linux VM for now, at least until Mac coverage builds are actually supported (you are the first one trying it on Mac :D).
Tomorrow or Friday I'll try building one myself to see if I can reproduce the same assertion failure.
Flags: needinfo?(mcastelluccio)
(In reply to Marco Castelluccio [:marco] from comment #9)
...
> Maybe it would be easier for you to just use a Linux VM for now, at least
> until Mac coverage builds are actually supported (you are the first one
> trying it on Mac :D).
> Tomorrow or Friday I'll try building one myself to see if I can reproduce
> the same assertion failure.

Thanks Marco, yes that's probably the best route at this point. When you get there, if you want me to try anything out with building on osx just let me know.
(In reply to Robert Wood [:rwood] from comment #10)
> (In reply to Marco Castelluccio [:marco] from comment #9)
> ...
> > Maybe it would be easier for you to just use a Linux VM for now, at least
> > until Mac coverage builds are actually supported (you are the first one
> > trying it on Mac :D).
> > Tomorrow or Friday I'll try building one myself to see if I can reproduce
> > the same assertion failure.
> 
> Thanks Marco, yes that's probably the best route at this point. When you get
> there, if you want me to try anything out with building on osx just let me
> know.

Hey Robert, it looks like the build on my machine is working, see bug 1399394 comment 2 for the details.
(In reply to Marco Castelluccio [:marco] from comment #11)
...
> Hey Robert, it looks like the build on my machine is working, see bug
> 1399394 comment 2 for the details.

Thank you Marco, I pulled the latest central and used those mozconfig settings and my ccov build works now yay!
Ok, so I understand the problem in automation, we need to remove the *.gcda files that were created from the talos browser initialization, before the actual talos test runs start. So I can get the path in automation to the *.gcda files from env vars, as noted in comment 1.

However to answer the question in comment 1 about mach:

It looks to me like code coverage isn't supported to run locally anyway. 

To test out a patch to remove the *.gcda files in automation as noted above, I made a local patch to allow --code-coverage to be enabled locally via ./mach talos-test --code-coverage... however then I notice that /mozharness/testing/codecoverage.py is expecting to install on a test machine, not locally. i.e. Line [1] causes talos to fail locally with "Can't figure out build directory urls without an installer_url or test_packages_url!".

So looks to me like code-coverage isn't supported locally anyway, which is a completely different issue. Therefore this bug is only meant to cleanup the *.gcda files in production (and I won't be including a patch to turn on --code-coverage locally via mach talos-test).

[1] http://searchfox.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/testing/codecoverage.py#74
(In reply to Robert Wood [:rwood] from comment #13)
> Ok, so I understand the problem in automation, we need to remove the *.gcda
> files that were created from the talos browser initialization, before the
> actual talos test runs start. So I can get the path in automation to the
> *.gcda files from env vars, as noted in comment 1.
> 
> However to answer the question in comment 1 about mach:
> 
> It looks to me like code coverage isn't supported to run locally anyway. 
> 
> To test out a patch to remove the *.gcda files in automation as noted above,
> I made a local patch to allow --code-coverage to be enabled locally via
> ./mach talos-test --code-coverage... however then I notice that
> /mozharness/testing/codecoverage.py is expecting to install on a test
> machine, not locally. i.e. Line [1] causes talos to fail locally with "Can't
> figure out build directory urls without an installer_url or
> test_packages_url!".
> 
> So looks to me like code-coverage isn't supported locally anyway, which is a
> completely different issue. Therefore this bug is only meant to cleanup the
> *.gcda files in production (and I won't be including a patch to turn on
> --code-coverage locally via mach talos-test).
> 
> [1]
> http://searchfox.org/mozilla-central/source/testing/mozharness/mozharness/
> mozilla/testing/codecoverage.py#74

Hey Robert, code coverage in general is supported locally. It's just the script in mozharness that isn't, as it was only written to generate the artifacts that are uploaded.
We need a way to remove (or better, to get the path to) the gcda directory when we run tests locally too.
(In reply to Marco Castelluccio [:marco] from comment #14)
...
> Hey Robert, code coverage in general is supported locally. It's just the
> script in mozharness that isn't, as it was only written to generate the
> artifacts that are uploaded.
> We need a way to remove (or better, to get the path to) the gcda directory
> when we run tests locally too.

Thanks Marco.

So code coverage running automatically during talos tests is supported locally?

Alright so obviously I'm not understanding this. To run talos locally, via mach / mozharness, talos calls mozharness/mozilla/testing/codecoverage.py. Which is expecting to only run in automation and not locally.

How is it invoked locally when running talos then? There must be another entry point I am missing? Because it's not supported either in talos/run_tests.py....
(In reply to Robert Wood [:rwood] from comment #15)
> (In reply to Marco Castelluccio [:marco] from comment #14)
> ...
> > Hey Robert, code coverage in general is supported locally. It's just the
> > script in mozharness that isn't, as it was only written to generate the
> > artifacts that are uploaded.
> > We need a way to remove (or better, to get the path to) the gcda directory
> > when we run tests locally too.
> 
> Thanks Marco.
> 
> So code coverage running automatically during talos tests is supported
> locally?
> 
> Alright so obviously I'm not understanding this. To run talos locally, via
> mach / mozharness, talos calls mozharness/mozilla/testing/codecoverage.py.
> Which is expecting to only run in automation and not locally.
> 
> How is it invoked locally when running talos then? There must be another
> entry point I am missing? Because it's not supported either in
> talos/run_tests.py....

Let me rephrase better.

Code coverage is just an instrumentation done by the compiler, so it is independent on which test you run, it works automatically just by executing the browser.

A Firefox build, instrumented with code coverage, will generate gcda files in a directory (by default, the obj directory, otherwise in the path stored in the GCOV_PREFIX environment variable) on exit and/or on demand.

In order to collect meaningful coverage information for talos, since talos runs Firefox once to get some information before actually running the tests, we need to cleanup the gcda files that are generated by Firefox the first time it runs and before it runs the tests. So, we need a way to get the path to the gcda directory in the talos suite (and in the future in other test suites too) both when we run on automation or when we run via mach locally. Note in the future we might need to do other things with the gcda files, and not only remove them, so we need a way to get the path to the directory that contains them.

Now, codecoverage.py is the script which is used to upload coverage artifacts. Maybe it is part of the solution only for the automation case, or maybe making it work locally is part of the solution.
(In reply to Marco Castelluccio [:marco] from comment #16)
...
Ah, thanks for the info, appreciated! Yes I misunderstood, it looked to me like providing the '--code-coverage' flag to mozharness actually meant 'turning on' code coverage. (The help [1] says "help": "Whether gcov c++ code coverage should be run." which in my opinion is misleading and maybe should be changed - really as you noted, the --code-coverage flag just looks for and packages up the gcda data files). However it looks like that mozharness flag also results in the creation of a temp folder which is set to GCOV_PEFIX so the .gcda files are output and expected there.

> "in the future in other test suites too) both when we run on automation or when we run via mach locally. Note in the future we might need to do other things with the gcda files, and not only remove them, so we need a way to get the path to the directory that contains them."

Ok so because of ^, I really don't think this is a talos-specific problem/solution.

I really don't know much about how code coverage works, but one idea I can think of: have the build process itself (when building with the ccov flags set) be what creates and designates/sets the GCOV_PREFIX, and the JS_CODE_COVERAGE_OUTPUT_DIR. Then have the build process write out those values to a .json file. Then in talos, after browser initialization (and before the first test starts), if that .json exists and contains valid folders, talos can search there and delete any found .gcda files.

[1] http://searchfox.org/mozilla-central/rev/d30462037ffea383e74c42542c820cf65b2b144e/testing/mozharness/mozharness/mozilla/testing/codecoverage.py#21
Or perhaps it could all be in mozharness, whether running locally or in automation, but instead of one generic codecoverage.py have one per framework like talos_codecoverage.py etc.
(In reply to Robert Wood [:rwood] from comment #20)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=25c33d57c1a7677cf29bfb7c20914ee84ff5c10d

In this try run on linux x64 ccov opt, my patch successfully found the gcda files (2508 of them) after talos browser initialization but right before the tests start.

In the test job log file [1], search for "Found 2508 gcda files" to see the list of files to be deleted. Next step, I'll update the patch to actually delete the files.

Note: This solution only works on production because of how ccov is implemented in talos on production is not yet reflected for local talos runs. I suggest landing this for production first would be fine. If we want this locally, work has to be added to codecoverage.py in mozharness to create and setup the ccov output dirs, env vars, etc. locally as is done in production. I suggest that should be a separate bug.

:jmaher, :marco, in my try patch I list the name of all the *.gcda files to be deleted. I'd prefer to take that out of the logs and just have one log entry indicating how many were deleted, if that's alright - unless you see the need to have a record in the log of each gcda file name.

[1] https://public-artifacts.taskcluster.net/MHYyMXLJTqGIKIT3J_Tazw/0/public/logs/live_backing.log
Flags: needinfo?(mcastelluccio)
Flags: needinfo?(jmaher)
I think just knowing how many were deleted is adequate!
Flags: needinfo?(jmaher)
Comment on attachment 8922009 [details]
Bug 1381069 - Remove ccov gcda files after talos browser initialization (production only);

https://reviewboard.mozilla.org/r/192990/#review198256

I cannot find any nits, good stuff!
Attachment #8922009 - Flags: review?(jmaher) → review+
Summary: Find a way to access the directory where we store gcda files in the talos test suite → Remove ccov gcda files after talos browser initialization (production only)
:marco, do we really need support for a similar patch locally?

One concern I have is talos deleting all local *.gcda files (from specified ccov folders) after browser initialization locally is that there's potential to mistakenly delete ccov data. If someone runs a bunch of stuff on a ccov instrumented build and then runs a talos test then all previous gcda data would be lost (at least any data written to the same ccov folders).

If it is absolutely required locally i.e. if the data from local talos runs on ccov builds will be used, and needs to have the browser init ccov data removed then I will file a new bug to go ahead with this work. Thanks for your input!
:marco,

Instead of deleting, can we switch to a new directory and fill it up with the net coverage?

I would imagine being able to generate multiple directories of coverage could prove useful for per-test coverage collection.
(In reply to Robert Wood [:rwood] from comment #29)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=705d9ecb0c7bafa9b4dd24b88fdda2289bf8a5c6

Example of what the log looks like when running on production:

[task 2017-10-26T15:37:32.990Z] 15:37:32     INFO -  Browser initialized.
[task 2017-10-26T15:37:32.990Z] 15:37:32     INFO -  Cleaning ccov gcda file before starting the talos tests
[task 2017-10-26T15:37:32.991Z] 15:37:32     INFO -  GCOV_PREFIX: /tmp/tmp6iL7GR
[task 2017-10-26T15:37:32.991Z] 15:37:32     INFO -  JS_CODE_COVERAGE_OUTPUT_DIR: /tmp/tmpXyLjqB
[task 2017-10-26T15:37:32.992Z] 15:37:32     INFO -  Recursive search for gcda files in: /tmp/tmp6iL7GR
[task 2017-10-26T15:37:33.068Z] 15:37:33     INFO -  Recursive search for gcda files in: /tmp/tmpXyLjqB
[task 2017-10-26T15:37:33.069Z] 15:37:33     INFO -  Found 2508 gcda files to be removed. Deleting...
[task 2017-10-26T15:37:33.368Z] 15:37:33     INFO -  Finished cleaning ccov gcda files
[task 2017-10-26T15:37:33.368Z] 15:37:33     INFO -  Running cycle 1/1 for perf_reftest_singletons test...

I'm going to go ahead and land this as the production solution, when the try run finishes.
Hang on, looking over try logs I realized a problem. The code coverage data is packaged up [1] via mozharness at the very end of the entire talos suite run... not in between talos tests. So in jobs like talos 'other' [2] where multiple talos tests are run one after eachother, the existing ccov gcda data from the last test is blown away after each browser initialization. Therefore the packaged ccov data will only have data for the last talos test in that suite.

Is removing the ccov data from talos browser initialization really essential? If so, looks like a different solution will be needed even in production.

As :ekyle mentioned in comment 27, maybe something like copying existing ccov data files to 'archive' folders named after each talos test, and then maybe package that all up at the very end of the test suite/job? If we do that after each test however, that would be alot of files. In this example for the talos 'other' suite, that would be 8 x 2508 (20,064) gcda files total to package up just from this one test job.

[1] http://searchfox.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/testing/codecoverage.py
[2] https://public-artifacts.taskcluster.net/ElPOwQpaT5aMKm1Z2TqitA/0/public/logs/live_backing.log
(In reply to Robert Wood [:rwood] from comment #26)
> :marco, do we really need support for a similar patch locally?
> 
> One concern I have is talos deleting all local *.gcda files (from specified
> ccov folders) after browser initialization locally is that there's potential
> to mistakenly delete ccov data. If someone runs a bunch of stuff on a ccov
> instrumented build and then runs a talos test then all previous gcda data
> would be lost (at least any data written to the same ccov folders).
> 
> If it is absolutely required locally i.e. if the data from local talos runs
> on ccov builds will be used, and needs to have the browser init ccov data
> removed then I will file a new bug to go ahead with this work. Thanks for
> your input!

Yeah, I think we want to support running the tests with code coverage locally too. We don't want to -always- remove the gcda files, we could add another flag for talos (like "--cleanup-coverage-after-initialization", or something like this).

(In reply to Robert Wood [:rwood] from comment #32)
> Is removing the ccov data from talos browser initialization really essential? If so, looks like a different solution will be needed even in production.

It is needed in some cases, e.g. if you want to collect coverage data for Firefox startup. If you don't clear the old data, you will be collecting coverage data for Firefox startup (with hit counts multiplied by 2) AND shutdown.

(In reply to Kyle Lahnakoski [:ekyle] from comment #27)
> :marco,
> 
> Instead of deleting, can we switch to a new directory and fill it up with
> the net coverage?
> 
> I would imagine being able to generate multiple directories of coverage
> could prove useful for per-test coverage collection.

Sure! My idea for this bug was really just to have a way to get the path to the gcda files, then every test can do what it needs (either delete the files, or copy them somewhere, or nothing).
Flags: needinfo?(mcastelluccio)
(In reply to Marco Castelluccio [:marco] from comment #33)
...
Thanks for the feedback Marco. Having two command line arguments (one for --code-coverage to setup ccov output dirs and package up results, and another something like --clean-ccov-after-init) would work however that leaves open the door for a bit of confusion and also the ability to still have talos production uploading 20,000+ gcda files on every single talos 'other' job run, which I'm concerned is way too many files... We could restrict the '--clean-ccov-after-init' option but adding it still leaves the possibility that it might be turned on down the road inadvertently, etc. Joel, any insight here?

Is there another solution here? Can it be done in reverse? I.e. what if we package up ccov files *after* browser initialization, then only once more after the full suite is done - then could the ccov data from initialization be somehow removed from the full suite ccov data? Just wondering if there are any more solutions, thanks!
Flags: needinfo?(mcastelluccio)
Flags: needinfo?(jmaher)
My initial thoughts here are that we have a default mode when using --code-coverage (which probably does --clean-ccov-after-init); If we are trying to do something else, would it be a one off thing for local testing or random tests on try?  If that is the case, then hacking talos harness seems realistic.

One scenario I see is when a developer wants code coverage on their specific test while debugging items- But I see this as only scheduling the job that is important, not really hacking the harness.

On the topic of what we could be generating, I see something like the talos other job generating artifact here:
1) initialization of profile
2) test run
3) initialization of profile
4) test run
5) initialization of profile
6) test run
7) initialization of profile
8) test run
9) initialization of profile
10) test run
11) initialization of profile
12) test run
13) initialization of profile
14) test run
15) initialization of profile
16) test run

we don't want the odd number data points, we want the even numbers in the list above- ideally a clean artifact for each of the tests (even if we combine them in the future).

could we make the default mode with no cli options to delete the initialization data?

What I don't know is that if each of the tests we run in the job will successfully append to the profile?  I assume we will need to have a set of profile data after each test.
Flags: needinfo?(jmaher)
Whiteboard: [PI:October] → [PI:November]
(In reply to Robert Wood [:rwood] from comment #34)
> Is there another solution here? Can it be done in reverse? I.e. what if we
> package up ccov files *after* browser initialization, then only once more
> after the full suite is done - then could the ccov data from initialization
> be somehow removed from the full suite ccov data? Just wondering if there
> are any more solutions, thanks!

Yes, this is feasible. We would need to parse the initialization data and then subtract it to the full data. It's a little more expensive but in theory feasible.
It would also work for the per-test coverage.
Flags: needinfo?(mcastelluccio)
(In reply to Robert Wood [:rwood] from comment #37)
...
> Review request updated; see interdiff:
> https://reviewboard.mozilla.org/r/192990/diff/3-4/

This update was just a rebase
Ok here's the plan (for production):

1) use the --code-coverage flag as it already is - which causes ccov file folder creation and env vars set; and the upload of ccov data after the entire talos suite is done;

2) by default, remove all found gcda files after talos browser initialization (at the start of each talos test in the suite);

3) by default, when --code-coverage is set, after EACH test in the suite has finished, copy all found gcda files to a ccov subfolder named for the test name;

4) add a new talos flag, set on a per-test basis, "--throw-away-ccov-data" or something like that; if that is set (and --code-coverage is also set) then instead of copying over the resulting gcda files after the test (step 3), delete them; this will be false by default

5) at the end of the entire suite, all found ccov data will be uploaded (as works now via --code-coverage flag) which will include the subfolders for each talos test where the ccov data was saved

For local talos runs against ccov instrumented builds:

- Same behaviour as above, except the uploading of the ccov data won't happen - it will just be found in a local folder, where the user can retrieve it

Sound good? I'll start with my existing patch and expand it.
Flags: needinfo?(mcastelluccio)
Flags: needinfo?(jmaher)
I think this is right, as for:
5) at the end of the entire suite, all found ccov data will be uploaded (as works now via --code-coverage flag) which will include the subfolders for each talos test where the ccov data was saved

would we create a single .zip file with all the subdirs together, or would we:
foreach test in testdirnames:
    os.system('zip -r %s.zip %s/' % (test, test))
    os.move('%s.zip <blobber_upload_dir>' % test)

then we would have a list of .zip files that are gdca- i.e. per test.

Alternatively we could:
zip -r gcda.zip *.zip

which would create a single .zip file- I think moving forward you can work off your plan, and lets see what :marco has to say.
Flags: needinfo?(jmaher)
Thanks :jmaher. Also note, this bug will be for a solution for in production only. When running talos locally on a ccov instrumented build, the steps are not automated yet (i.e. mozharness only supports _set_gcov_prefix, _package_coverage_data, etc. when running in production). It's a separate job to get that working locally first, before having to cleanup the gcda files after init, etc.
The solution looks good to me, except that we probably don't want to generate gcda packages for each test by default. By default, we want to collect gcda files together for all tests.
So I would make "3" optional with another flag. And "4" is probably unneeded, right? Packaging them separately achieves the same result (developers can just download the gcda package they want and disregard the others).
Flags: needinfo?(mcastelluccio)
(In reply to Marco Castelluccio [:marco] from comment #44)
> The solution looks good to me, except that we probably don't want to
> generate gcda packages for each test by default. By default, we want to
> collect gcda files together for all tests.
> So I would make "3" optional with another flag. And "4" is probably
> unneeded, right? Packaging them separately achieves the same result
> (developers can just download the gcda package they want and disregard the
> others).

Thanks for the reply :marco. Let's get this nailed down for certain so I can get going on the code.

It's best to use as *few* flags as possible as we don't want to make it harder for people adding new tests, or for us to figure out how this works a year from now. :) I either want to package all the data together -or- in separate test packages, but I don't want the extra complexity of supporting both / or a combination of those options.

I believe the way to proceed is to choose one of these solutions:

Solution 1

- test suite starts, i.e. suite has test A and test B
- test A starts
- browser init, then talos deletes all existing gcda files
- test A runs and completes
- if '--ignore-ccov-data' is NOT set for test A, then all existing gcda files are copied to "ccovdatafolder"

- test B starts
- browser init, then talos deletes all existing gcda files
- test B runs and completes
- if '--ignore-ccov-data' is NOT set for test B, then all existing gcda files are copied to "ccovdatafolder"

- test suite is finished
- mozharness packages up the 'ccovdatafolder' into a single zip and uploads it

Question: When test B copies the gcda files into the same "ccovdatafolder", won't that overwrite some gcda files collected previously for test A? Is that a concern?


Solution 2

- test suite starts, i.e. suite has test A and test B
- test A starts
- browser init, then talos deletes all existing gcda files
- test A runs and completes
- all existing gcda files are copied to "ccovdatafolder/testA"

- test B starts
- browser init, then talos deletes all existing gcda files
- test B runs and completes
- all existing gcda files are copied to "ccovdatafolder/testB"

- test suite is finished
- mozharness packages up "ccovdatafolder/testA" into one zip, and "ccovdatafolder/testB" into a separate zip
- mozharness uploads both of the zips

Will one of these solutions meet your immediate needs on production? Thanks! :)
Flags: needinfo?(mcastelluccio)
(In reply to Robert Wood [:rwood] from comment #45)
> Thanks for the reply :marco. Let's get this nailed down for certain so I can
> get going on the code.
> 
> It's best to use as *few* flags as possible as we don't want to make it
> harder for people adding new tests, or for us to figure out how this works a
> year from now. :) I either want to package all the data together -or- in
> separate test packages, but I don't want the extra complexity of supporting
> both / or a combination of those options.

The issue is that we are currently using (both on https://github.com/mozilla-releng/services and on ActiveData) the full package.
For most purposes, we need the full package (as we want to calculate the overall coverage).
For some purposes (to gather per-test code coverage), we need per-test packages. This is way more expensive though, so we wouldn't do it on all test runs.
We'd need a single flag to control this behavior.

About throwing away the initialization data, we don't need the option as we always want to throw it away.

> I believe the way to proceed is to choose one of these solutions:
> 
> Solution 1
> 
> - test suite starts, i.e. suite has test A and test B
> - test A starts
> - browser init, then talos deletes all existing gcda files
> - test A runs and completes
> - if '--ignore-ccov-data' is NOT set for test A, then all existing gcda
> files are copied to "ccovdatafolder"
> 
> - test B starts
> - browser init, then talos deletes all existing gcda files
> - test B runs and completes
> - if '--ignore-ccov-data' is NOT set for test B, then all existing gcda
> files are copied to "ccovdatafolder"
> 
> - test suite is finished
> - mozharness packages up the 'ccovdatafolder' into a single zip and uploads
> it
> 
> Question: When test B copies the gcda files into the same "ccovdatafolder",
> won't that overwrite some gcda files collected previously for test A? Is
> that a concern?

Yes, this option isn't feasible because the files would be overwritten.

> Solution 2
> 
> - test suite starts, i.e. suite has test A and test B
> - test A starts
> - browser init, then talos deletes all existing gcda files
> - test A runs and completes
> - all existing gcda files are copied to "ccovdatafolder/testA"
> 
> - test B starts
> - browser init, then talos deletes all existing gcda files
> - test B runs and completes
> - all existing gcda files are copied to "ccovdatafolder/testB"
> 
> - test suite is finished
> - mozharness packages up "ccovdatafolder/testA" into one zip, and
> "ccovdatafolder/testB" into a separate zip
> - mozharness uploads both of the zips
> 
> Will one of these solutions meet your immediate needs on production? Thanks!
> :)

Option 2 sounds better, but I think we will need the full package too (and the full package should be the default).
Flags: needinfo?(mcastelluccio)
Talked with Kyle about this in the office yesterday. One possible solution is to have an option at the JOB level, not an individual test level, to either package up the resulting gcda files in one zip file (mainfolder/testsubfolders) or in multiple zip files (individual testfolders). Either way files will be blown away after each browser init, and copied into the appropriate folder after each test completion. The difference will just be a flag that indicates how the results are packaged up.

By default one package would be created and uploaded, but if the flag is set (i.e. on try) then ccov data would be packaged and uploaded in multiple zip files instead (one per test).

Having this option for the overall suite/job run is fine, but IMO it's too much overhead/complexity to have this option at each individual test level.
(In reply to Robert Wood [:rwood] from comment #47)
...
> Having this option for the overall suite/job run is fine, but IMO it's too
> much overhead/complexity to have this option at each individual test level.

Just to clarify this last sentence, I just mean in a single job/suite run, one test cannot indicate it wants it's ccov data in a separate zip, while another test in the same run/suite indicates it wants a full package for all tests together, resulting in having both options at once. The packaging flag will be at the entire job/suite level so only one option is used for that job run.
Whiteboard: [PI:November] → [PI:December]
:marco, quick question, in between talos tests when gathering the ccov data files, should files be copied into the upload dir (so they can potentially be appended by follwing test(s), or should the ccov files be *moved* over so none exist at the start of the next test(s)?
Flags: needinfo?(mcastelluccio)
(In reply to Robert Wood [:rwood] from comment #49)
> :marco, quick question, in between talos tests when gathering the ccov data
> files, should files be copied into the upload dir (so they can potentially
> be appended by follwing test(s), or should the ccov files be *moved* over so
> none exist at the start of the next test(s)?

They should be moved, otherwise the following test run will not start from counters at 0 but will increase the already existing counters.
The other option is to copy them and send a signal to the Firefox process to reset the counters to 0, but I guess this is harder than the first option.
Flags: needinfo?(mcastelluccio)
(In reply to Marco Castelluccio [:marco] from comment #52)
...
> They should be moved, otherwise the following test run will not start from
> counters at 0 but will increase the already existing counters.

Awesome, move it is, thanks
Hi Marco,

After the ccov files are zipped up they are processed via gcov, by running this command [1].

My question is, when you have multiple zip files (gcda files from each separate test) to process, can I just add the files to end of the command [1]? Or do I need to call the entire command each time for each file? From my brief googling it looks like I can provide all the file names on the one command line.

[1] 
https://searchfox.org/mozilla-central/rev/f0ab0b072ae090f11d6b02b7cf1e0829b4f08882/testing/mozharness/mozharness/mozilla/testing/codecoverage.py#174
Flags: needinfo?(mcastelluccio)
:marco, another question, when packaging up the gcda files do they need to stay in their current directory structure? I am currently just recursively searching for all *.gcda files and copying them into a talos-upload/test-name folder for packaging. Is that sufficient? Or do I need to copy the entire directory structure each time so the structure is replicated under talos-upload/test-name/ with all .gcda files within that? Does it matter?
(In reply to Robert Wood [:rwood] from comment #91)
> Hi Marco,
> 
> After the ccov files are zipped up they are processed via gcov, by running
> this command [1].
> 
> My question is, when you have multiple zip files (gcda files from each
> separate test) to process, can I just add the files to end of the command
> [1]? Or do I need to call the entire command each time for each file? From
> my brief googling it looks like I can provide all the file names on the one
> command line.
> 
> [1] 
> https://searchfox.org/mozilla-central/rev/
> f0ab0b072ae090f11d6b02b7cf1e0829b4f08882/testing/mozharness/mozharness/
> mozilla/testing/codecoverage.py#174

grcov does accept multiple ZIP files, but then it would merge all the coverage information into a single report. So, we'd need to run it once for each gcda package.

(In reply to Robert Wood [:rwood] from comment #95)
> :marco, another question, when packaging up the gcda files do they need to
> stay in their current directory structure? I am currently just recursively
> searching for all *.gcda files and copying them into a
> talos-upload/test-name folder for packaging. Is that sufficient? Or do I
> need to copy the entire directory structure each time so the structure is
> replicated under talos-upload/test-name/ with all .gcda files within that?
> Does it matter?

It shouldn't matter, unless there are two gcda files with the same name (that is, two C/C++ source files with the same name). Maybe to be extra-safe you can copy the directory structure too, or when you move a gcda file add an assertion to check that you didn't already move another file with the same name.
Flags: needinfo?(mcastelluccio)
(In reply to Marco Castelluccio [:marco] from comment #98)
> (In reply to Robert Wood [:rwood] from comment #91)
> > Hi Marco,
> > 
> > After the ccov files are zipped up they are processed via gcov, by running
> > this command [1].
> > 
> > My question is, when you have multiple zip files (gcda files from each
> > separate test) to process, can I just add the files to end of the command
> > [1]? Or do I need to call the entire command each time for each file? From
> > my brief googling it looks like I can provide all the file names on the one
> > command line.
> > 
> > [1] 
> > https://searchfox.org/mozilla-central/rev/
> > f0ab0b072ae090f11d6b02b7cf1e0829b4f08882/testing/mozharness/mozharness/
> > mozilla/testing/codecoverage.py#174
> 
> grcov does accept multiple ZIP files, but then it would merge all the
> coverage information into a single report. So, we'd need to run it once for
> each gcda package.
> 
> (In reply to Robert Wood [:rwood] from comment #95)
> > :marco, another question, when packaging up the gcda files do they need to
> > stay in their current directory structure? I am currently just recursively
> > searching for all *.gcda files and copying them into a
> > talos-upload/test-name folder for packaging. Is that sufficient? Or do I
> > need to copy the entire directory structure each time so the structure is
> > replicated under talos-upload/test-name/ with all .gcda files within that?
> > Does it matter?
> 
> It shouldn't matter, unless there are two gcda files with the same name
> (that is, two C/C++ source files with the same name). Maybe to be extra-safe
> you can copy the directory structure too, or when you move a gcda file add
> an assertion to check that you didn't already move another file with the
> same name.

Thanks for the info! Ok so a bit more work to go, but getting there...
Alright, I've come to the realization that I've gone down a huge code coverage rabbit hole here and this patch has gotten too large. It's crept into adding new code coverage features.

To summarize:

The initial requirement was to delete the gcda ccov files that were generated by talos browser initialization.

Because browser initialization is at the beginning of each talos test, and there are multiple tests per suite, any gcda files from previous tests would be blown away by the browser init in the following test. Therefore, at the end of every talos test, gcda files had to be moved to safe location where they wouldn't be blown away by the next test init.

So now we have one full set of gcda files for each talos test that was run. When packaging those up (mozharness codecoverage.py) by default all gcda data is packaged up into one single zip. However it was requested to have the option to package up the ccov data into multiple zip files (one per talos test).

A couple of problems with packaging up into multiple files based on test:

1) Currently in mozharness, grcov post-processing is performed against the gcda zip file, which in turn makes another zip file for the grcov output. If we have multiple gcda zip files (one per talos test) to start with, then the processing of the gcda files also has to be altered to support multiple gcda zips. That in turn requires support to handle multiple resulting grcov zip files (one per test gcda).

2) Having a flag on try to choose to package up in multiple files gcda's (one per test) would only work with talos. Because of the currently implementation - the packaging of ccov files is done at the end of the suite by mozharness codecoverage.py. However, moving the gcda files (and deleting them after browser init) has to be done in talos itself (ffsetup). Which means any such flag would work for talos, but it wouldn't work for all the other test harnesses that use ccov; because the other test harnesses don't output ccov data separately per test.

To complicate matters, code coverage gathering and packaging up etc. is only done in mozharnness in production, and not supported locally (and not at all on my dev env, OSX). So this patch can't be tested locally, everything has to be tested on try.

I am going to cut my losses here and restrict this patch for talos only, and have it:

1. Delete all gcda files created after talos browser initialization

2. At the end of each talos test, copy all the gcda files into a subfolder in the existing gcov folder, called 'gcda-archive/talos-test-name'

This patch won't alter the packaging or processing of the resulting data (currently single gcda and grcov zips).

Packaging up the ccov data and processing it via grcov is a separate issue and is done in mozharness, and effects every test harness that uses ccov, not just talos. If feature improvements are to be done with packaging and processing the ccov data (i.e. packaging it up and processing it based on test) then additional bugs should be filed for that outside of talos.
Attachment #8922009 - Attachment is obsolete: true
Attachment #8922009 - Flags: review+
Whiteboard: [PI:December] → [PI:January]
Comment on attachment 8922009 [details]
Bug 1381069 - Remove ccov gcda files after talos browser initialization (production only);

Alright patch is ready, as noted in comment 106, this patch will:

1. Delete all gcda files created after talos browser initialization

2. At the end of each talos test, copy all the gcda files into a subfolder in the existing gcov folder, called 'gcda-archive/talos-test-name'

This applies to ccov running in production only, not locally. On the latest try run link [1] you'll see in the ccov gcda zip artifact, all the gcdas were collected after each talos test was finished and can be found in the /gcda-archive/tresize and /gcda-archive/tcanvasmark in that example.

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=d4eb1fe552743aefd2331e56906531ad58c6cd19&filter-tier=1&filter-tier=2&filter-tier=3&selectedJob=154191651
Attachment #8922009 - Flags: review?(jmaher)
Comment on attachment 8922009 [details]
Bug 1381069 - Remove ccov gcda files after talos browser initialization (production only);

https://reviewboard.mozilla.org/r/192990/#review215968

a few small nits, overall this is looking good:)

::: testing/talos/talos/ffsetup.py:207
(Diff revision 28)
> +                LOG.info("Collecting ccov files that were generated during the talos test")
> +            gcov_prefix = os.getenv('GCOV_PREFIX', None)
> +            js_ccov_dir = os.getenv('JS_CODE_COVERAGE_OUTPUT_DIR', None)
> +            gcda_archive_folder_name = 'gcda-archive'
> +            LOG.info("GCOV_PREFIX: %s" % gcov_prefix)
> +            LOG.info("JS_CODE_COVERAGE_OUTPUT_DIR: %s" % js_ccov_dir)

I am not sure these two log.info lines are useful for production, although do leave them in if you find them useful

::: testing/talos/talos/ffsetup.py:224
(Diff revision 28)
> +                                if next_file.endswith(".gcda"):
> +                                    # don't want to move or delete files in our 'gcda-archive'
> +                                    if root.find(gcda_archive_folder_name) == -1:
> +                                        _gcda_files_found.append(os.path.join(root, next_file))
> +                    else:
> +                        LOG.info("Path specified by ccov env var doesn't exist")

I would add the path to this LOG.info() statement

::: testing/talos/talos/ffsetup.py:269
(Diff revision 28)
> +                    # now copy the file there
> +                    try:
> +                        shutil.copy(_gcda, gcda_archive_dest)
> +                    except Exception as e:
> +                        LOG.info("Error copying %s to %s" % (str(_gcda), str(gcda_archive_dest)))
> +                        LOG.info(e)

if we have an error copying, would that invalidate the rest of the data?  Probably not, but we could have misleading data when only looking at one data set.
Attachment #8922009 - Flags: review?(jmaher) → review+
Comment on attachment 8922009 [details]
Bug 1381069 - Remove ccov gcda files after talos browser initialization (production only);

https://reviewboard.mozilla.org/r/192990/#review215968

Thanks for the review! Nits addressed

> if we have an error copying, would that invalidate the rest of the data?  Probably not, but we could have misleading data when only looking at one data set.

Yes true, not really sure how to handle that, I don't think it's worth failing out of the job if one or more ccov data file copies fails. I think we can leave it for now but if this becomes an issue we can revisit.
thanks, land away when you are ready!
(In reply to Robert Wood [:rwood] from comment #127)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=5579cb1bbcda9403fe14ef647ff329ed28e75562

There are some job failures there but I don't believe they're related to this patch. Here's a try run of same parent rev but without my changes:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=37ce70f2ddb23714fd97cbcd9da16a37083df301
Pushed by rwood@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5c67e70c3a3e
Remove ccov gcda files after talos browser initialization (production only); r=jmaher
https://hg.mozilla.org/mozilla-central/rev/5c67e70c3a3e
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
You need to log in before you can comment on or make changes to this bug.