Bug 1487277 (opened last year, closed last year)

Prevent telemetry from running during tests

Categories

(Testing :: Code Coverage, enhancement)

Version 3

Tracking

(firefox64 fixed)

RESOLVED FIXED
Target Milestone: mozilla64
Tracking Status: firefox64 --- fixed

People

(Reporter: sparky, Assigned: sparky, NeedInfo)

References

(Blocks 1 open bug)

Details

Attachments

(2 files)

This bug is to discuss preventing telemetry from running during tests, as it can be a source of intermittent failures and code coverage variability. See bug 1473626 for more details.
The attached JSON contains URLs for each of the lines that differ between various per-test coverage reports. It holds all the leftover variability after running test_coverage_specialpowers.html with a wait time of 3 minutes and with baseline correction. Using coco-tools, I converted a JSON produced with `--differences` in the `pertestcoverage_variability.py` tool into URLs with the following command:

`py pertestcoverage_json2urls.py ~/line_level_differences_1535772622.json -r 937a9f858a97 -b try --differences`
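The URL conversion this command performs can be sketched roughly as follows. The input shape here (a dict mapping file paths to lists of differing lines) is an assumption on my part; the real JSON format produced by `pertestcoverage_variability.py` may differ.

```python
# Sketch: map per-file line differences to hg.mozilla.org annotate URLs.
# The `differences` input shape is assumed, not the tool's actual format.

def differences_to_urls(differences, rev, repo="try"):
    """Build an annotate URL for each (file, line) difference."""
    base = "https://hg.mozilla.org/{repo}/annotate/{rev}/{path}#l{line}"
    urls = []
    for path, lines in differences.items():
        for line in lines:
            urls.append(base.format(repo=repo, rev=rev, path=path, line=line))
    return urls

# Example with a made-up differences dict:
diffs = {"js/src/gc/Statistics.cpp": [1124, 1132]}
for url in differences_to_urls(diffs, "937a9f858a97"):
    print(url)
```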

Here's an interesting entry I found:
https://hg.mozilla.org/try/annotate/937a9f858a97/js/src/gc/Statistics.cpp#l1124

It looks like it's related to telemetry as well, but I'm not sure if we could disable this one - it looks possible but maybe it's required for GC. Who could we ask about it?

Overview of attachment:
-----------------------
All `.h` files in the attached JSON should be ignored, as their coverage data is very buggy; the upgrade to GCC 7 might fix them.

There are lots of things left that are outside of our control:
1. https://hg.mozilla.org/try/annotate/937a9f858a97/js/src/gc/GC.cpp#l5939
2. https://hg.mozilla.org/try/annotate/937a9f858a97/js/src/vm/JSAtom.cpp#l544
3. https://hg.mozilla.org/try/annotate/937a9f858a97/js/src/vm/TypedArrayObject.cpp#l196
4. https://hg.mozilla.org/try/annotate/937a9f858a97/startupcache/StartupCache.cpp#l494
5. https://hg.mozilla.org/try/annotate/937a9f858a97/xpcom/base/nsCycleCollector.cpp#l3284
6. https://hg.mozilla.org/try/annotate/937a9f858a97/dom/workers/WorkerPrivate.cpp#l2743 (destructor call)
7. https://hg.mozilla.org/try/annotate/937a9f858a97/toolkit/components/telemetry/TelemetryScalar.cpp#l2855

Debug-only variability which cannot be controlled:
1. https://hg.mozilla.org/try/annotate/937a9f858a97/startupcache/StartupCache.cpp#l326
------------------------

The entries mentioned above are all that is left in the variability for this test. So, based on this, I think we've eliminated most, if not all, of the most problematic variability by simply disabling telemetry.

I've now started looking into file-level variability as well, because it seems to have increased; so far, though, it looks like most of the files are things we could ignore.

Here's the try run I am using for this analysis: https://treeherder.mozilla.org/#/jobs?repo=try&revision=02755346e2d6932972eba840416a5f569f962850
(In reply to Greg Mierzwinski [:sparky] from comment #1)
> Here's an interesting entry I found:
> https://hg.mozilla.org/try/annotate/937a9f858a97/js/src/gc/Statistics.cpp#l1124
> 
> It looks like it's related to telemetry as well, but I'm not sure if we
> could disable this one - it looks possible but maybe it's required for GC.
> Who could we ask about it?

Maybe :jonco knows if we can (and want) to disable the gathering of statistics during tests.
Flags: needinfo?(jcoppeard)
(In reply to Greg Mierzwinski [:sparky] from comment #0)
> This bug is to discuss preventing telemetry from running during tests as it
> can be a source of intermittent failures, and code coverage variability. See
> bug 1473626 for more details.

Out of interest, why does this cause failures?  

I think we need to make sure the telemetry gathering code is exercised during tests, but I don't see the point of actually recording telemetry from tests.

I suspect that what's happening here though is that GC is being triggered by a timer and then generating telemetry data.  Are you saying that we should disable timer-based GCs in coverage builds?  That *might* expose problems but should probably be OK.  Or are you saying we should not record GC telemetry in test builds?
Flags: needinfo?(jcoppeard)
:jonco, we are trying to collect code coverage that is accurate and relevant to a specific test, instead of getting random code showing up (or, worse, very different sets of code for the same test case), which makes comparison across revisions nearly impossible.
(In reply to Jon Coppeard (:jonco) from comment #3)
> (In reply to Greg Mierzwinski [:sparky] from comment #0)
> > This bug is to discuss preventing telemetry from running during tests as it
> > can be a source of intermittent failures, and code coverage variability. See
> > bug 1473626 for more details.
> 
> Out of interest, why does this cause failures?

The Telemetry code can be triggered randomly at any point during a test, causing timeouts or other sorts of issues.
E.g. while running a test for the opening of the awesomebar, we shouldn't attempt to collect Telemetry on WebGL and attempt to send it.

> I think we need to make sure the telemetry gathering code is exercised
> during tests, but I don't see the point of actually recording telemetry from
> tests.
> 
> I suspect that what's happening here though is that GC is being triggered by
> a timer and then generating telemetry data.  Are you saying that we should
> disable timer-based GCs in coverage builds?  That *might* expose problems
> but should probably be OK.  Or are you saying we should not record GC
> telemetry in test builds?

My thinking is that the Telemetry code should be tested explicitly with proper tests, and not by chance by other unrelated tests.
I agree the Telemetry gathering code should not be disabled as it is part of the code under test.
What we want to disable is periodic Telemetry gathering and reporting which is unrelated to the tests that are being executed.
(In reply to Jon Coppeard (:jonco) from comment #3)
> (In reply to Greg Mierzwinski [:sparky] from comment #0)
> > This bug is to discuss preventing telemetry from running during tests as it
> > can be a source of intermittent failures, and code coverage variability. See
> > bug 1473626 for more details.
> 
> Out of interest, why does this cause failures?  

Comment 5 by :marco explains why this can be problematic.

> I suspect that what's happening here though is that GC is being triggered by
> a timer and then generating telemetry data.  Are you saying that we should
> disable timer-based GCs in coverage builds?  That *might* expose problems
> but should probably be OK.  Or are you saying we should not record GC
> telemetry in test builds?

As :marco states in comment 5, we want to disable periodic Telemetry gathering and reporting (anything that gathers and/or reports telemetry or statistics using a timer). The line of code at [1] (that was hit in testing) is what we are wondering about as it seems like it could be related to telemetry but through a different timer than the ones we've tested.

Do you think we should disable gathering these GC statistics during coverage builds or would it cause problems if we try? And if we should, do you know if there is already a simple way of skipping these GC statistics/telemetry measurements?


[1]: https://hg.mozilla.org/try/annotate/937a9f858a97/js/src/gc/Statistics.cpp#l1124
[2]: https://hg.mozilla.org/try/annotate/937a9f858a97/js/src/gc/Statistics.cpp#l1132
Flags: needinfo?(jcoppeard)
I've spoken with jonco. GC is similar to Telemetry in the sense that it can run at any point during tests, making tests not entirely deterministic and possibly causing intermittent failures. Unlike Telemetry, though, the intermittent failures caused by GC are actually real problems in GC that have to be fixed.

We could in theory disable running GC on timers for coverage builds, but it would mean making the coverage build different from other builds. I'd like to keep them as similar as possible, so I would prefer not to disable anything that we can't disable in all builds.

Let's focus on disabling Telemetry, which we can do in all builds, for now, and then we'll re-evaluate GC later on if needed.
Flags: needinfo?(jcoppeard)
Can we identify which code is executed by GC, or is that sort of random based on which components GC affects in the current pass? Per-test coverage will not be useful if we are telling developers one day their test is covering 7000 lines of code and the next day only 4000 lines of code with no code changed.

We could do...
* start browser
* force gc/cc
* reset counters
* run test
* save counters
* shutdown browser

That would reduce the chance of a GC happening. And if we can query the results and look for specific areas of code that indicate a GC ran during the test, we could flag it as garbage (oh, the irony).
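One hedged sketch of the "flag it as garbage" idea: compare a run's covered lines against a clean baseline run, and mark the run as suspect if files known to belong to GC gained coverage. The file list, data shapes, and function names here are illustrative, not part of the real harness.

```python
# Sketch: flag a per-test coverage run as possibly polluted by an
# unscheduled GC. Coverage is modeled as a dict mapping file path to a
# set of covered line numbers; the GC file list is illustrative.

GC_FILES = {"js/src/gc/GC.cpp", "js/src/gc/Statistics.cpp"}

def looks_gc_polluted(run_coverage, baseline_coverage):
    """Return True if any known GC file gained coverage vs. the baseline."""
    for path in GC_FILES:
        extra = run_coverage.get(path, set()) - baseline_coverage.get(path, set())
        if extra:
            return True
    return False

baseline = {"js/src/gc/GC.cpp": {10, 11}}
noisy = {"js/src/gc/GC.cpp": {10, 11, 5939}}
print(looks_gc_polluted(noisy, baseline))      # True: line 5939 appeared
print(looks_gc_polluted(baseline, baseline))   # False: no extra GC lines
```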
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #8)
> can we identify which code is executed by GC, or is that sort of random
> based on which components GC affects in the current pass?  per test coverage
> will not be useful if we are telling developers one day their test is
> covering 7000 lines of code and the next day only 4000 lines of code with no
> code changed.
> 
> We could do...
> * start browser
> * force gc/cc
> * reset counters
> * run test
> * save counters
> * shutdown browser
> 
> that would reduce the chance of a gc happening and if we can query the
> results and look for specific areas of code that indicate a GC happened
> during the test we could flag it as garbage (oh the irony)

Let's file a separate bug for investigating this, as it doesn't have to block landing the disablement of Telemetry.
If we actually find GC to be a real problem blocking per-test coverage, we can disable running it on timers for coverage builds.
Here's a try run for all linux and windows tests with telemetry disabled: https://treeherder.mozilla.org/#/jobs?repo=try&revision=ffd6c93dce436903500becd3eb43040404a5fca3

I am also testing ccov here (it doesn't look like the mac failure is caused by this patch): https://treeherder.mozilla.org/#/jobs?repo=try&revision=62c4c58af9cfaebd1d8820d7be3f97093436903f

In the linux and windows tests, I've retriggered those that look like they are new bugs:
1. https://treeherder.mozilla.org/#/jobs?repo=try&revision=ffd6c93dce436903500becd3eb43040404a5fca3&selectedJob=198195740&filter-searchStr=mch
2. https://treeherder.mozilla.org/#/jobs?repo=try&revision=ffd6c93dce436903500becd3eb43040404a5fca3&filter-searchStr=Wd
3. (some are new) https://treeherder.mozilla.org/#/jobs?repo=try&revision=ffd6c93dce436903500becd3eb43040404a5fca3&filter-searchStr=mda&selectedJob=197929106
4. https://treeherder.mozilla.org/#/jobs?repo=try&revision=ffd6c93dce436903500becd3eb43040404a5fca3&filter-searchStr=en-US
5. https://treeherder.mozilla.org/#/jobs?repo=try&revision=ffd6c93dce436903500becd3eb43040404a5fca3&selectedJob=197929913&filter-searchStr=windows7%20bc2
6. https://treeherder.mozilla.org/#/jobs?repo=try&revision=ffd6c93dce436903500becd3eb43040404a5fca3&filter-searchStr=dt3
7. https://treeherder.mozilla.org/#/jobs?repo=try&revision=ffd6c93dce436903500becd3eb43040404a5fca3&filter-searchStr=R4
8. https://treeherder.mozilla.org/#/jobs?repo=try&revision=ffd6c93dce436903500becd3eb43040404a5fca3&filter-searchStr=bc6

:marco, what do you think of these? Are those the failures you were referring to when we spoke on IRC? I think some of them may be related to the changes I made.
(In reply to Greg Mierzwinski [:sparky] from comment #10)
> :marco, what do you think of these? Are those the failures you were
> referring to when we spoke on IRC? I think some of them may be related to
> the changes I made.

I see they all passed when you retriggered, so maybe they are simply intermittents?
The Mac issue is bug 1490247.
(In reply to Marco Castelluccio [:marco] from comment #11)
> (In reply to Greg Mierzwinski [:sparky] from comment #10)
> > :marco, what do you think of these? Are those the failures you were
> > referring to when we spoke on IRC? I think some of them may be related to
> > the changes I made.
> 
> I see they all passed when you retriggered, so maybe they are simply
> intermittents?

Yes, they are mostly new intermittents though. Would they cause this patch to be backed out?
This patch sets the preference 'toolkit.telemetry.initDelay' to a very high value in an attempt to prevent telemetry from intermittently gathering and reporting data.
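For reference, the pref change amounts to something like the following user.js fragment. The delay value shown is illustrative only; the actual patch may use a different number.

```js
// Push Telemetry initialization far beyond any test's lifetime so that
// scheduled gathering and sending never kick in during the test.
// NOTE: the value below is illustrative, not necessarily the patch's value.
user_pref("toolkit.telemetry.initDelay", 460800);
```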
Comment on attachment 9008052 [details]
Bug 1487277 - Prevent intermittent telemetry gathering and reporting. r?marco

Marco Castelluccio [:marco] has approved the revision.
Attachment #9008052 - Flags: review+
Comment on attachment 9008052 [details]
Bug 1487277 - Prevent intermittent telemetry gathering and reporting. r?marco

Andrew Halberstadt [:ahal] has approved the revision.
Attachment #9008052 - Flags: review+
Hi :chutten, we spoke in the past about trying to find a way to prevent as much telemetry as possible from running while testing, and we found that by changing the 'toolkit.telemetry.initDelay' preference to a very high value we were able to stop most of it.

We'd like to add this preference change to all tests (everything but raptor). What are your thoughts on this? Is it fine with you to do the switch?

Please include anyone you think should be involved in this discussion.
Flags: needinfo?(chutten)
I don't know what raptor is, but I have only mild concerns about such a change. Did you find that tests which test data collection (like, for instance [1]) still pass successfully even without Telemetry init?

Does this satisfy the goal of reducing intermittent failures due to Telemetry deciding to perform work sometimes?

ni?gfritzsche and Dexter for historical knowledge. ni?janerik for modern knowledge.

[1]: https://searchfox.org/mozilla-central/rev/de7676288a78b70d2b9927c79493adbf294faad5/browser/base/content/test/about/browser_aboutHome_search_telemetry.js#30
Flags: needinfo?(jrediger)
Flags: needinfo?(gfritzsche)
Flags: needinfo?(chutten)
Flags: needinfo?(alessio.placitelli)
Yes, that test passes in this chunk on linux64-opt: https://treeherder.mozilla.org/#/jobs?repo=try&revision=ffd6c93dce436903500becd3eb43040404a5fca3&selectedJob=197923101&searchStr=linux64%2Fopt-mochitest-browser-chrome-e10s-6

Comment 10 has the try runs we did, along with all the failures that looked new to me - I didn't notice any telemetry specific failures there.

We have no evidence yet that it will definitely reduce intermittent failure rates. But we think it will: this code shows up intermittently in code coverage, so it's plausible that it also causes some intermittent failures.
(In reply to Greg Mierzwinski [:sparky] from comment #17)
> We'd like to add this preference change to all tests (everything but
> raptor), and we are wondering what your thoughts are on this? Is it fine
> with you to do the switch?

What do you mean by "everything but raptor"? Do you also plan to disable Telemetry in mochitests, Talos, etc.?

While Telemetry has unit tests and some integration tests, the rationale of having Telemetry enabled in all test suites (without sending pings) is to have tests running in an environment as close as possible to production, where Telemetry is always on (but not sending pings).

With that said, I think it's ok to disable/delay Telemetry in some specific test suites, if that's causing issues.
Flags: needinfo?(alessio.placitelli) → needinfo?(gmierz2)
(In reply to Greg Mierzwinski [:sparky] from comment #17)
> Hi :chutten, we spoke in the past about trying to find a way to prevent as
> much telemetry as possible from running while testing and we found that by
> changing the 'toolkit.telemetry.initDelay' preference to a very high value,
> we were able to stop most of it.
> 
> We'd like to add this preference change to all tests (everything but
> raptor), and we are wondering what your thoughts are on this? Is it fine
> with you to do the switch?

The criteria that were mentioned are ok to me:
- Telemetry instrumentation code is generally still active (as this is effectively part of the exercised code).
- But Telemetry sending and scheduled activity is disabled in most tests.

Adjusting that pref sounds like a good solution to me.
We definitely can't have that pref apply to everything inside toolkit/components/telemetry/tests.

There were some talks about generally collecting telemetry from test runs, but I'm not sure if that ever got anywhere. If it did, we could investigate separately where this is feasible.
Flags: needinfo?(gfritzsche)
Raphael is generally looking into Telemetry and other Data testing.
Raphael, does that sound reasonable to you here and align with your plans?
Flags: needinfo?(rpierzina)
(In reply to Alessio Placitelli [:Dexter] from comment #20)
> What do you mean by "everything but raptor"? Do you also plan to disable
> Telemetry in mochitests, Talos, etc.?

Yes, it would be for all test harnesses (mochitest, xpcshell, etc.), including Talos.

(In reply to Georg Fritzsche [:gfritzsche] from comment #21)
> The criteria that were mentioned are ok to me:
> - Telemetry instrumentation code is generally still active (as this is
> effectively part of the exercised code).
> - But Telemetry sending and scheduled activity is disabled in most tests.
> 
> Adjusting that pref sounds like a good solution to me.
> We definitely can't have that pref apply to everything inside
> toolkit/components/telemetry/tests.

Looking at this chunk, it seems like the tests from that folder pass with this preference on (gtest also has passing telemetry tests): https://treeherder.mozilla.org/#/jobs?repo=try&revision=ffd6c93dce436903500becd3eb43040404a5fca3&searchStr=linux64%2Fopt-xpcshell-8

Is this unexpected?

Comment 10 has all the failures which looked to me like they were new, none of them are explicitly related to telemetry or within the folder you mentioned.
Flags: needinfo?(gmierz2) → needinfo?(gfritzsche)
Assignee: nobody → gmierz2
Stuff in `toolkit/components/telemetry/tests` is probably fine, because they explicitly call `testSetup`, which ignores the `toolkit.telemetry.initDelay` pref (and instead uses a constant value of 1 for the delay).
Actual recording in other parts of the code should also be fine, as they happen anyway and don't depend on the controller to be initialized.

As this does not completely disable Telemetry, but only the "collecting external ping data & sending part", which we should already cover in our own tests, I'm fine with adding this for other tests to improve their coverage reports.
Flags: needinfo?(jrediger)
Pushed by gmierz2@outlook.com:
https://hg.mozilla.org/integration/autoland/rev/f54cd369acc0
Prevent intermittent telemetry gathering and reporting. r=ahal,marco
https://hg.mozilla.org/mozilla-central/rev/f54cd369acc0
Status: NEW → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → mozilla64
(In reply to Jan-Erik Rediger [:janerik] from comment #24)
> As this does not completely disable Telemetry, but only the "collecting
> external ping data & sending part", which we should already cover in our own
> tests, I'm fine with adding this for other tests to improve their coverage
> reports.

I second this!
Flags: needinfo?(rpierzina)