Closed Bug 1648183 Opened 3 months ago Closed 3 months ago

Disable live site testing for 1 day, or otherwise disable telemetry pings

Categories

(Testing :: Performance, task)

Tracking

(firefox79 fixed)

RESOLVED FIXED
mozilla79

People

(Reporter: acreskey, Assigned: acreskey)

References

Details

Attachments

(1 file)

It has been reported in this Slack thread that large spikes of new users are being seen via telemetry pings:
https://mozilla.slack.com/archives/CG2FGG1ST/p1592925909200400?thread_ts=1591216689.369800&cid=CG2FGG1ST

This is problematic for data science and other teams that make use of the telemetry data.

It is very possible, even likely, that this is coming from the Fenix live site testing that runs in Bitbar (USA).
This bug is to determine if that is the case.

Two options:
1 - disable the live site testing for a day to see if there is a drop in new users
2 - run Fenix with tagged telemetry pings, i.e.
adb shell am start -n org.mozilla.fenix/mozilla.telemetry.glean.debug.GleanDebugActivity --es tagPings perf-ci-test

Note that I did try option 2 locally and the browsertime framework was not able to initialize.
This is likely because with the GleanDebugActivity the gecko session is not created.

I think that since the live site performance results are not sheriffed, the impact of this test should be minimal.
Unless I'm missing something?

From #riot discussion:

Dave Hunt [he/him]

I’ve seen earlier mention of this. We have Alex looking into disabling telemetry in Fennec, as that was suspected of sending pings in the past. Regarding temporarily disabling the live site tests, this would have no impact on sheriffing, as you suggest.
Is there a reason we can’t simply disable telemetry for all browsers by setting a pref? I’m sure our desktop tests do this, and even set the telemetry service endpoint to a null value via a pref.

alexandrui: what have you found so far in the Fennec telemetry investigation?

acreskey

It turns out that Fenix (possibly Android Components) pushes the telemetry settings, so the prefs are all overridden.
Since we already do turn off some Fenix features (like the Tracking Protection pop-up) for performance tests, we should be able to also disable telemetry via that method.

Note that, in general, we want the telemetry systems to be enabled (see the gigantic Slack thread from comment 0); otherwise performance measurements will not reflect the environment our users actually run in.

We have bug 1646706 inbound that will enable toggling debug features on/off using an environment variable. This won't be available for at least two weeks, assuming CI can make use of environment variables at all (we actually need this for other platforms).

I just had another idea that won't require turning off CI: what if Fenix added a new, temporary metric that reads some environment variable and sends a value? For example, if "FENIX_CI_TEST" is set, Fenix sets a Glean metric called "ci.running_in_raptor". Would you be able to set an environment variable on the devices running the Fenix perf tests?
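
A toy model of that idea, with plain Python standing in for the Fenix/Kotlin side; the environment variable and metric name are the hypotheticals proposed above, not anything that exists in Fenix:

```python
import os

def ci_metric_value(environ=os.environ):
    """Return the value a hypothetical `ci.running_in_raptor` metric would record.

    The metric is simply "was the FENIX_CI_TEST environment variable set?",
    letting data science filter CI-generated pings out of new-user counts.
    """
    return environ.get("FENIX_CI_TEST") is not None
```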

Flags: needinfo?(acreskey)

So I still don't think the live site tests (or CI in general) are the culprit; there's no evidence yet showing that it's all coming from there.
(1) Your spikes are much larger than what we run for our testing (I've asked how the counting is done but have received no answer on this).
(2) There is only a coincidental overlap with the frequency of our tests; it's not the perfect overlap it should be (since this is an exact science) if it were coming from CI live site tests.
(3) Live site tests happen once a day on Mon/Wed/Fri/Sun. Can you simply correlate the time of the ping spike onset with the time the tests were run to determine if it is CI?

(In reply to Alessio Placitelli [:Dexter] from comment #3)

Note that, in general, we want the telemetry systems to be enabled (see the gigantic Slack thread from comment 0); otherwise performance measurements will not reflect the environment our users actually run in.

Right, I agree, it is better to also be performance testing the telemetry system.

We have bug 1646706 inbound that will enable toggling debug features on/off using an environment variable. This won't be available for at least two weeks, assuming CI can make use of environment variables at all (we actually need this for other platforms).

I just had another idea that won't require turning off CI: what if Fenix added a new, temporary metric that reads some environment variable and sends a value? For example, if "FENIX_CI_TEST" is set, Fenix sets a Glean metric called "ci.running_in_raptor". Would you be able to set an environment variable on the devices running the Fenix perf tests?

As far as I can tell, it won't be possible for us to set an environment variable that can be read on the Android device.
However, we do have a hook for setting performance options in Fenix (e.g. disabling certain pop-ups which interfere with visual metrics).
We could in theory add the additional Glean metric here.

Flags: needinfo?(acreskey)

(In reply to Andrew Creskey [:acreskey] [he/him] from comment #5)

As far as I can tell, it won't be possible for us to set an environment variable that can be read on the Android device.
However, we do have a hook for setting performance options in Fenix (e.g. disabling certain pop-ups which interfere with visual metrics).
We could in theory add the additional Glean metric here.

@Andrew, Ah. How does that work? Can you provide me a link to that piece of code?

@Marissa, any comment with respect to comment 4?

Flags: needinfo?(mgorlick)
Flags: needinfo?(acreskey)

(In reply to Alessio Placitelli [:Dexter] from comment #6)

(In reply to Andrew Creskey [:acreskey] [he/him] from comment #5)

As far as I can tell, it won't be possible for us to set an environment variable that can be read on the Android device.
However, we do have a hook for setting performance options in Fenix (e.g. disabling certain pop-ups which interfere with visual metrics).
We could in theory add the additional Glean metric here.

@Andrew, Ah. How does that work? Can you provide me a link to that piece of code?

:Dexter, this is the function in Fenix that prepares it for performance tests:
https://github.com/mozilla-mobile/fenix/blob/82facbfa00b4a701b6919d618df3ee6459bc74b7/app/src/main/java/org/mozilla/fenix/perf/Performance.kt#L26-L33

This is enabled by an additional intent argument; an example use in a new test is here:
https://searchfox.org/mozilla-central/rev/a87a1c3b543475276e6d57a7a80cb02f3e42b6ed/testing/performance/hooks_home_activity.py#15-17

I don't actually see this option being passed in for the general CI performance tests, but it shouldn't be hard to add it.
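
For illustration, a hook along those lines could build the intent arguments that put Fenix into perf-test mode. The extra name and helper below are assumptions sketched from the Performance.kt and hooks_home_activity.py links above, not the real Raptor API:

```python
# Hypothetical sketch: constructing the adb `am start`-style arguments that
# enable Fenix's perf-test setup (the code path Performance.kt guards).
PERF_TEST_EXTRA = "performancetest"  # boolean extra checked by Fenix (assumed name)

def build_intent_args(extra_name=PERF_TEST_EXTRA, enabled=True):
    """Return intent arguments enabling Fenix's perf-test mode."""
    # `--ez` passes a boolean extra to the launched activity.
    return ["--ez", extra_name, "true" if enabled else "false"]
```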

Flags: needinfo?(acreskey)

(In reply to Andrew Creskey [:acreskey] [he/him] from comment #7)

(In reply to Alessio Placitelli [:Dexter] from comment #6)

(In reply to Andrew Creskey [:acreskey] [he/him] from comment #5)

As far as I can tell, it won't be possible for us to set an environment variable that can be read on the Android device.
However, we do have a hook for setting performance options in Fenix (e.g. disabling certain pop-ups which interfere with visual metrics).
We could in theory add the additional Glean metric here.

@Andrew, Ah. How does that work? Can you provide me a link to that piece of code?

:Dexter, this is the function in Fenix that prepares it for performance tests:
https://github.com/mozilla-mobile/fenix/blob/82facbfa00b4a701b6919d618df3ee6459bc74b7/app/src/main/java/org/mozilla/fenix/perf/Performance.kt#L26-L33

This is enabled by an additional intent argument; an example use in a new test is here:
https://searchfox.org/mozilla-central/rev/a87a1c3b543475276e6d57a7a80cb02f3e42b6ed/testing/performance/hooks_home_activity.py#15-17

I don't actually see this option being passed in for the general CI performance tests, but it shouldn't be hard to add it.

OK, then it seems possible for Fenix to detect this, which means a metric can be added to understand whether these tests are generating telemetry that's skewing things. This won't be a good general solution, though.

@Andrew, are these tests running on Task Cluster?

Flags: needinfo?(acreskey)

:Dexter, yes, they are running on Taskcluster (physical devices in the Bitbar labs).

:Sparky - Although we didn't see a perfect correlation with the CI tests, I think we saw a possible correlation.
And perhaps there is something else going on? Or the tests take longer to run than we think?

Because there is no dashboard for these results and they are not sheriffed, I'm still in favor of disabling them for a bit to see if that removes the pings.

To me, this seems like the simplest solution.

I believe this is the taskcluster config that enabled them.
https://phabricator.services.mozilla.com/D69053

Flags: needinfo?(acreskey)

Dave, as we were discussing, would it be possible to have alexandrui or someone help with this test?
It's a bit beyond my Taskcluster skills to safely disable these.

Flags: needinfo?(dave.hunt)

(In reply to Andrew Creskey [:acreskey] [he/him] from comment #9)

:Sparky - Although we didn't see a perfect correlation with the CI tests, I think we saw a possible correlation.
And perhaps there is something else going on? Or the tests take longer to run than we think?

I think there is something else going on here, that's why I'm curious to know about the methodology of how those daily counts were produced. I'm also wondering why our data can't already answer this - do we only have a daily level of granularity or can we look at hourly/minute data?

Because there is no dashboard for these results and they are not sheriffed, I'm still in favor of disabling them for a bit to see if that removes the pings.

I agree with disabling them temporarily - the cron tasks would need to be removed for this. It would be great if we could disable this telemetry eventually to get rid of some noise.

Flags: needinfo?(dave.hunt)

(In reply to Greg Mierzwinski [:sparky] | PTO 24-29 June from comment #11)

I agree with disabling them temporarily - the cron tasks would need to be removed for this. It would be great if we could disable this telemetry eventually to get rid of some noise.

I believe telemetry should not be disabled. It should rather be marked appropriately. We have bug 1634064 to enable this.

Disabling these unsheriffed tests to determine if they are causing the spikes in new users that telemetry reveals.

Assignee: nobody → acreskey
Status: NEW → ASSIGNED

(In reply to Dave Hunt [:davehunt] [he/him] ⌚BST from comment #12)

If we just want to disable the live site testing you can change the cron entry's when to [] so it does not trigger: https://searchfox.org/mozilla-central/rev/cfaa250d14e344834932de4c2eed0061701654da/.cron.yml#110-119 and return False here: https://searchfox.org/mozilla-central/rev/cfaa250d14e344834932de4c2eed0061701654da/taskcluster/taskgraph/target_tasks.py#670

Thank you Dave, I put that up for review.

Pushed by rmaries@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f721690fef8f
Disable live site testing for 1 day, or otherwise disable telemetry pings r=davehunt

(In reply to Greg Mierzwinski [:sparky] | PTO 24-29 June from comment #4)

So I still don't think the live site tests (or CI in general) are the culprit; there's no evidence yet showing that it's all coming from there.
(1) Your spikes are much larger than what we run for our testing (I've asked how the counting is done but have received no answer on this).
(2) There is only a coincidental overlap with the frequency of our tests; it's not the perfect overlap it should be (since this is an exact science) if it were coming from CI live site tests.
(3) Live site tests happen once a day on Mon/Wed/Fri/Sun. Can you simply correlate the time of the ping spike onset with the time the tests were run to determine if it is CI?

Not a perfect match, but there seems to be some signal that this is it. There is a trend over time by day of week (Monday = 1). Suspicious users are generally higher on Mon, Wed, Thu, Fri and down on Tue, Sat, Sun (sometimes up a bit).

https://docs.google.com/document/d/1bUotTjis45XoS2eyMExTo2VkcK5HwajHuJGtdDSz1wY/edit#heading=h.77lwp2w0wyw5

Flags: needinfo?(mgorlick)
Status: ASSIGNED → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla79

Hey Marissa,

looks like this was merged. This means, as far as I can tell, that tests won't run on CI today. Any chance you could verify whether you can see a difference?

Flags: needinfo?(mgorlick)

Thanks all. I will validate first thing tomorrow.

Following up, I checked the counts of new users with the two methodologies and the results were consistent with CI testing.

We can see that on Firefox Preview Nightly there is no volatility in our understanding of new users. The new users seen likely reflect the true new-user baseline. In addition, we see a spike in testing on Fenix Release. Fenix Release isn’t available anywhere outside of internal channels. We see this activity stop during the weekend. Together this provides compelling evidence that CI testing and/or internal work is driving artifacts in our understanding of new users that need to be filtered out of our data.

https://docs.google.com/document/d/1bUotTjis45XoS2eyMExTo2VkcK5HwajHuJGtdDSz1wY/edit#heading=h.enmbbof72nzy

Flags: needinfo?(mgorlick)

Now that :mgorlick has confirmed that it's CI, it sounds like there are some issues with the Glean pings, because the counts aren't matching up between what we actually run and what's being measured (it sounds like there's an over-estimation going on somewhere).

(In reply to Alessio Placitelli [:Dexter] from comment #13)

(In reply to Greg Mierzwinski [:sparky] | PTO 24-29 June from comment #11)

I agree with disabling them temporarily - the cron tasks would need to be removed for this. It would be great if we could disable this telemetry eventually to get rid of some noise.

I believe telemetry should not be disabled. It should rather be marked appropriately. We have bug 1634064 to enable this.

Can you elaborate on why? Telemetry adds noise to our performance tests and this is very well known (we disable it on FF). Having it enabled means that the test variance increases, which diminishes our ability to detect smaller regressions and increases our chances of missing regressions, which in turn increases the likelihood of slow regressions occurring. Could we run a bi-weekly test with telemetry enabled instead?

Flags: needinfo?(alessio.placitelli)

(In reply to Greg Mierzwinski [:sparky] from comment #22)

Can you elaborate on why? Telemetry adds noise to our performance tests and this is very well known (we disable it on FF). Having it enabled means that the test variance increases, which diminishes our ability to detect smaller regressions and increases our chances of missing regressions, which in turn increases the likelihood of slow regressions occurring. Could we run a bi-weekly test with telemetry enabled instead?

Because you will be testing the performance of a product that's different from the one you'll be releasing. Telemetry is a fundamental subsystem and might affect performance as well. The product might be lightning fast in tests because of that and be somewhat slower (I'm not saying it is!) in the wild.

Moreover, we'd be testing and measuring metrics that are different than the ones that telemetry is collecting in the wild, while having telemetry on will allow us to capture telemetry from performance tests as well, to have a baseline for comparing CI vs real builds in the wild. Some teams are actually pursuing this.

With that said, of course, while I believe that disabling telemetry would not be the ideal solution for this, I don't own performance telemetry :-)

Flags: needinfo?(alessio.placitelli)

I am working on disabling telemetry on Fennec, and the telemetry docs say that toolkit.telemetry.unified should be false on Fennec only. I tested this on a GeckoView test locally, in --debug mode, and discovered that it is also false there. Am I missing something? Is there a way of finding out whether a specific job has telemetry/unified enabled or not? This finding might change the assumptions for this problem.

If unified is off, toolkit.telemetry.enabled controls whether the Telemetry module is enabled. If unified is on, toolkit.telemetry.enabled is locked to true if MOZ_UPDATE_CHANNEL is nightly or aurora or beta or default. Otherwise it is locked to false.
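
The pref interaction described above can be modeled as a small truth function. This is my paraphrase of the Telemetry docs, not actual Firefox code:

```python
def telemetry_module_enabled(unified, enabled_pref, update_channel):
    """Return whether the Telemetry module ends up enabled.

    unified        -- value of toolkit.telemetry.unified
    enabled_pref   -- value of toolkit.telemetry.enabled
    update_channel -- value of MOZ_UPDATE_CHANNEL
    """
    if not unified:
        # Non-unified builds (e.g. Fennec): toolkit.telemetry.enabled is honored.
        return enabled_pref
    # Unified builds: the pref is locked based on the build's update channel.
    return update_channel in ("nightly", "aurora", "beta", "default")
```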

Flags: needinfo?(alessio.placitelli)

(In reply to Alexandru Ionescu (needinfo me) [:alexandrui] from comment #24)

I am working on disabling telemetry on Fennec, and the telemetry docs say that toolkit.telemetry.unified should be false on Fennec only. I tested this on a GeckoView test locally, in --debug mode, and discovered that it is also false there. Am I missing something? Is there a way of finding out whether a specific job has telemetry/unified enabled or not? This finding might change the assumptions for this problem.

If unified is off, toolkit.telemetry.enabled controls whether the Telemetry module is enabled. If unified is on, toolkit.telemetry.enabled is locked to true if MOZ_UPDATE_CHANNEL is nightly or aurora or beta or default. Otherwise it is locked to false.

Hi,

this bug is specifically about Fenix, not Fennec. Fennec uses a legacy telemetry system that's different from, but similar to, the one used in Firefox. This telemetry system is not owned by the Telemetry team (even though the name suggests that!).

With that said, as mentioned over Riot, you should try setting datareporting.healthreport.uploadEnabled to false, see here.
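
As a sketch, a test profile could set that pref, plus null out the endpoint as comment 1 says desktop tests do. The pref names come from the comments above; rendering them as user.js lines is illustrative, not how Raptor actually applies prefs:

```python
# Prefs a test profile could set to stop legacy telemetry uploads on Fennec.
FENNEC_TELEMETRY_PREFS = {
    "datareporting.healthreport.uploadEnabled": False,
    "toolkit.telemetry.server": "",  # empty endpoint so nothing is sent out
}

def to_user_js(prefs):
    """Render a prefs dict as user.js lines."""
    lines = []
    for name, value in prefs.items():
        if isinstance(value, bool):
            rendered = "true" if value else "false"
        elif isinstance(value, str):
            rendered = '"%s"' % value
        else:
            rendered = str(value)
        lines.append('user_pref("%s", %s);' % (name, rendered))
    return "\n".join(lines)
```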

Flags: needinfo?(alessio.placitelli)

(In reply to Alessio Placitelli [:Dexter] from comment #23)

(In reply to Greg Mierzwinski [:sparky] from comment #22)

Can you elaborate on why? Telemetry adds noise to our performance tests and this is very well known (we disable it on FF). Having it enabled means that the test variance increases, which diminishes our ability to detect smaller regressions and increases our chances of missing regressions, which in turn increases the likelihood of slow regressions occurring. Could we run a bi-weekly test with telemetry enabled instead?

Because you will be testing the performance of a product that's different from the one you'll be releasing. Telemetry is a fundamental subsystem and might affect performance as well. The product might be lightning fast in tests because of that and be somewhat slower (I'm not saying it is!) in the wild.

Moreover, we'd be testing and measuring metrics that are different than the ones that telemetry is collecting in the wild, while having telemetry on will allow us to capture telemetry from performance tests as well, to have a baseline for comparing CI vs real builds in the wild. Some teams are actually pursuing this.

With that said, of course, while I believe that disabling telemetry would not be the ideal solution for this, I don't own performance telemetry :-)

I fully agree that without telemetry we aren't testing the same thing we are releasing, but we also catch fewer regressions when it's on. The biggest issue with the variance from telemetry stems from the extra external network requests. Would it be possible to disable these, at the least, in the tests that are used for regression detection?

Regarding being able to compare telemetry between CI and in the wild, that sounds like a good idea to help us nail down our lab-testing gaps. But I wonder if we could have a telemetry-enabled variant running weekly/bi-weekly for this? We are quite limited in the amount of mobile testing we can do, so we need to balance realism with our ability to catch regressions.

EDIT: Let me know if you think I should direct these questions at someone else.

Flags: needinfo?(alessio.placitelli)

(In reply to Greg Mierzwinski [:sparky] from comment #26)

With that said, of course, while I believe that disabling telemetry would not be the ideal solution for this, I don't own performance telemetry :-)

I fully agree that without telemetry we aren't testing the same thing we are releasing, but we also catch fewer regressions when it's on. The biggest issue with the variance from telemetry stems from the extra external network requests. Would it be possible to disable these, at the least, in the tests that are used for regression detection?

Do we need to block all network requests? Even to localhost? In the future we might consider adding some way to tag uploads and make them go to localhost in automation.

Moreover, are you sure Glean ping uploads add variance? All the collection is async and off the main thread, and so is the ping upload.

Regarding being able to compare telemetry between CI and in the wild, that sounds like a good idea to help us nail down our lab-testing gaps. But I wonder if we could have a telemetry-enabled variant running weekly/bi-weekly for this? We are quite limited in terms of the amount of mobile testing we can do so we need to balance realism with our ability to catch regressions.

EDIT: Let me know if you think I should direct these questions at someone else.

Having a separate variant for this was an explicit anti-goal. I'm not the best person to talk about this or to make decisions about it; I can only provide perspective on telemetry and Glean. This is really a Fenix product question that Fenix people/Fenix perf teams should answer. Michael Comella/Eric Smyth might be better people to talk to. I'd love to be kept in the loop, though :)

Flags: needinfo?(alessio.placitelli)
See Also: → 1650798

:dexter, I've opened bug 1650798 for further discussion (also replied to your comments there).
