Disable live site testing for 1 day, or otherwise disable telemetry pings
Categories
(Testing :: Performance, task)
Tracking
(firefox79 fixed)
Tracking | Status
---|---
firefox79 | fixed
People
(Reporter: acreskey, Assigned: acreskey)
References
Details
Attachments
(1 file)
It has been reported in this Slack thread that large spikes of new users are being seen via telemetry pings:
https://mozilla.slack.com/archives/CG2FGG1ST/p1592925909200400?thread_ts=1591216689.369800&cid=CG2FGG1ST
This is problematic for data science and other teams that make use of the telemetry data.
It is very possible, even likely, that this is coming from the Fenix live site testing that runs in Bitbar (USA).
This bug is to determine if that is the case.
Two options:
1 - disable the live site testing for a day to see if there is a drop in new users
2 - run Fenix with tagged telemetry pings, i.e.
adb shell am start -n org.mozilla.fenix/mozilla.telemetry.glean.debug.GleanDebugActivity --es tagPings perf-ci-test
Note that I did try option 2 locally and the browsertime framework was not able to initialize.
This is likely because with the GleanDebugActivity the Gecko session is not created.
Assignee
Comment 1 • 4 years ago
I think that since the live site performance results are not sheriffed, the impact of this test should be minimal.
Unless I'm missing something?
Assignee
Comment 2 • 4 years ago
From #riot discussion:
Dave Hunt [he/him]
I’ve seen earlier mention of this. We have Alex looking into disabling telemetry in Fennec, as that was suspected to be sending pings in the past. Regarding temporarily disabling the live site tests, this would have no impact on sheriffing, as you suggest.
Is there a reason we can’t simply disable telemetry for all browsers by setting a pref? I’m sure our desktop tests do this, and even set the telemetry service endpoint to a null value via a pref.
alexandrui: what have you found so far in the telemetry Fennec investigation?
acreskey
But it turns out that Fenix (possibly Android Components) pushes the telemetry settings, so the prefs are all overridden.
Since we already do turn off some Fenix features (like the Tracking Protection pop-up) for performance tests, we should be able to also disable telemetry via that method.
Comment 3 • 4 years ago
Note that, in general, we want the telemetry systems to be enabled (see the gigantic Slack thread from comment 0), otherwise performance measurements will not reflect the environment our users actually run in.
We have bug 1646706 inbound that will enable toggling debug features on/off using an environment variable. This won't be available for another two weeks, assuming CI can make use of environment variables at all (we actually need this for other platforms).
I just had another idea that won't require turning off CI: what if Fenix added a new, temporary metric that reads some environment variable and sends a value? For example, if "FENIX_CI_TEST" is set, Fenix sets a Glean metric called "ci.running_in_raptor". Would you be able to set an environment variable on the devices running the Fenix perf tests?
Comment 4 • 4 years ago
So I still don't think the live site tests (or CI in general) are the culprit; there's no evidence yet that shows it's all coming from there.
(1) Your spikes are much larger than what we run for our testing (I've asked how the counting is done but haven't received an answer yet).
(2) There is only a coincidental overlap with our frequency of tests; it's not the perfect overlap it should be (since this is an exact science) if it were coming from CI live site tests.
(3) Live site tests happen once a day on M/W/F/Sun. Can you simply correlate the time of the ping spike onset with the time when the tests were run to determine if it is CI?
Assignee
Comment 5 • 4 years ago
(In reply to Alessio Placitelli [:Dexter] from comment #3)
> Note that, in general, we want the telemetry systems to be enabled (see the gigantic Slack thread from comment 0), otherwise performance measurements will not reflect the environment our users actually run in.

Right, I agree, it is better to also be performance testing the telemetry system.

> We have bug 1646706 inbound that will enable toggling debug features on/off using an environment variable. This won't be available for another two weeks, assuming CI can make use of environment variables at all (we actually need this for other platforms).
> I just had another idea that won't require turning off CI: what if Fenix added a new, temporary metric that reads some environment variable and sends a value? For example, if "FENIX_CI_TEST" is set, Fenix sets a Glean metric called "ci.running_in_raptor". Would you be able to set an environment variable on the devices running the Fenix perf tests?

As far as I can tell, it won't be possible for us to set an environment variable that can be read on the Android device.
However, we do have a hook for setting performance options in Fenix (e.g. disabling certain pop-ups which interfere with visual metrics).
We could in theory add the additional Glean metric here.
Comment 6 • 4 years ago
(In reply to Andrew Creskey [:acreskey] [he/him] from comment #5)
> As far as I can tell, it won't be possible for us to set an environment variable that can be read on the Android device.
> However, we do have a hook for setting performance options in Fenix (e.g. disabling certain pop-ups which interfere with visual metrics).
> We could in theory add the additional Glean metric here.

@Andrew, Ah. How does that work? Can you provide me a link to that piece of code?
@Marissa, any comment with respect to comment 4?
Assignee
Comment 7 • 4 years ago
(In reply to Alessio Placitelli [:Dexter] from comment #6)
> (In reply to Andrew Creskey [:acreskey] [he/him] from comment #5)
> > As far as I can tell, it won't be possible for us to set an environment variable that can be read on the Android device.
> > However, we do have a hook for setting performance options in Fenix (e.g. disabling certain pop-ups which interfere with visual metrics).
> > We could in theory add the additional Glean metric here.
>
> @Andrew, Ah. How does that work? Can you provide me a link to that piece of code?

:Dexter, this is the function in Fenix that prepares it for performance tests:
https://github.com/mozilla-mobile/fenix/blob/82facbfa00b4a701b6919d618df3ee6459bc74b7/app/src/main/java/org/mozilla/fenix/perf/Performance.kt#L26-L33
This is enabled by an additional intent argument; an example use in a new test is here:
https://searchfox.org/mozilla-central/rev/a87a1c3b543475276e6d57a7a80cb02f3e42b6ed/testing/performance/hooks_home_activity.py#15-17
I don't actually see this option being passed in for the general CI performance tests, but it shouldn't be hard to add it.
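To illustrate how the pieces fit together, here is a rough sketch of such a hook. This is not the contents of the linked files; the add_options helper, its module path, and the "performancetest" extra name are assumptions to verify against Performance.kt and hooks_home_activity.py.

# Illustrative mozperftest hook (names are assumptions, not copied from the linked file).
# Browsertime's firefox.android.intentArgument options are forwarded to "am start",
# so Fenix is launched with a boolean intent extra that Performance.kt can check
# to switch into its performance-test configuration.
from mozperftest.test.browsertime import add_options  # module path is an assumption

options = [
    ("firefox.android.intentArgument", "'--ez'"),
    ("firefox.android.intentArgument", "'performancetest'"),
    ("firefox.android.intentArgument", "'true'"),
]


def before_runs(env, **kw):
    # Called by the harness before the browsertime iterations start.
    add_options(env, options)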
Comment 8 • 4 years ago
(In reply to Andrew Creskey [:acreskey] [he/him] from comment #7)
> (In reply to Alessio Placitelli [:Dexter] from comment #6)
> > (In reply to Andrew Creskey [:acreskey] [he/him] from comment #5)
> > > As far as I can tell, it won't be possible for us to set an environment variable that can be read on the Android device.
> > > However, we do have a hook for setting performance options in Fenix (e.g. disabling certain pop-ups which interfere with visual metrics).
> > > We could in theory add the additional Glean metric here.
> >
> > @Andrew, Ah. How does that work? Can you provide me a link to that piece of code?
>
> :Dexter, this is the function in Fenix that prepares it for performance tests:
> https://github.com/mozilla-mobile/fenix/blob/82facbfa00b4a701b6919d618df3ee6459bc74b7/app/src/main/java/org/mozilla/fenix/perf/Performance.kt#L26-L33
> This is enabled by an additional intent argument; an example use in a new test is here:
> https://searchfox.org/mozilla-central/rev/a87a1c3b543475276e6d57a7a80cb02f3e42b6ed/testing/performance/hooks_home_activity.py#15-17
> I don't actually see this option being passed in for the general CI performance tests, but it shouldn't be hard to add it.

Ok, then it seems possible for Fenix to detect this. Which means a metric can be added to understand if these tests are generating telemetry that's skewing things. This won't be a good general solution, though.
@Andrew, are these tests running on Taskcluster?
Assignee
Comment 9 • 4 years ago
:Dexter, yes, they are running on Taskcluster (physical devices in the Bitbar labs).
:Sparky - Although we didn't see a perfect correlation with the CI tests, I think we saw a possible correlation.
And perhaps there is something else going on? Or the tests take longer to run than we think?
Because there is no dashboard for these results and they are not sheriffed, I'm still in favor of disabling them for a bit to see if that removes the pings.
To me, this seems like the simplest solution.
I believe this is the taskcluster config that enabled them.
https://phabricator.services.mozilla.com/D69053
Assignee
Comment 10 • 4 years ago
Dave, as we were discussing, would it be possible to have alexandrui or someone else help with this test?
It's a bit beyond my Taskcluster skills to safely disable these.
Comment 11 • 4 years ago
(In reply to Andrew Creskey [:acreskey] [he/him] from comment #9)
> :Sparky - Although we didn't see a perfect correlation with the CI tests, I think we saw a possible correlation.
> And perhaps there is something else going on? Or the tests take longer to run than we think?

I think there is something else going on here, that's why I'm curious to know about the methodology of how those daily counts were produced. I'm also wondering why our data can't already answer this - do we only have a daily level of granularity or can we look at hourly/minute data?

> Because there is no dashboard for these results and they are not sheriffed, I'm still in favor of disabling them for a bit to see if that removes the pings.

I agree with disabling them temporarily - the cron tasks would need to be removed for this. It would be great if we could disable this telemetry eventually to get rid of some noise.
Comment 12 • 4 years ago
If we just want to disable the live site testing you can change the cron entry's when to [] so it does not trigger: https://searchfox.org/mozilla-central/rev/cfaa250d14e344834932de4c2eed0061701654da/.cron.yml#110-119 and return False here: https://searchfox.org/mozilla-central/rev/cfaa250d14e344834932de4c2eed0061701654da/taskcluster/taskgraph/target_tasks.py#670
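For illustration, the target_tasks.py half of this would end up looking roughly like the sketch below (the cron.yml half is just setting the entry's when to []). This is a sketch only: the registered filter name and function name are assumptions, the linked line is authoritative, and it returns an empty list rather than False since these filters conventionally return a list of task labels.

# Sketch of the change in taskcluster/taskgraph/target_tasks.py (names assumed).
@_target_task("live_site_perf_testing")
def target_tasks_live_site_perf_testing(full_task_graph, parameters, graph_config):
    # Temporarily select no tasks, so the cron entry triggers nothing while we
    # check whether these runs explain the telemetry new-user spikes.
    return []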
Comment 13 • 4 years ago
(In reply to Greg Mierzwinski [:sparky] | PTO 24-29 June from comment #11)
> I agree with disabling them temporarily - the cron tasks would need to be removed for this. It would be great if we could disable this telemetry eventually to get rid of some noise.

I believe telemetry should not be disabled. It should rather be marked appropriately. We have bug 1634064 to enable this.
Assignee
Comment 14 • 4 years ago
Disabling these unsheriffed tests to determine if they are causing the spikes in new users that telemetry reveals.
Updated • 4 years ago
Assignee
Comment 15 • 4 years ago
(In reply to Dave Hunt [:davehunt] [he/him] ⌚BST from comment #12)
> If we just want to disable the live site testing you can change the cron entry's when to [] so it does not trigger: https://searchfox.org/mozilla-central/rev/cfaa250d14e344834932de4c2eed0061701654da/.cron.yml#110-119 and return False here: https://searchfox.org/mozilla-central/rev/cfaa250d14e344834932de4c2eed0061701654da/taskcluster/taskgraph/target_tasks.py#670

Thank you Dave, I put that up for review.
Comment 16 • 4 years ago
Comment 17 • 4 years ago
(In reply to Greg Mierzwinski [:sparky] | PTO 24-29 June from comment #4)
> So I still don't think the live site tests (or CI in general) are the culprit; there's no evidence yet that shows it's all coming from there.
> (1) Your spikes are much larger than what we run for our testing (I've asked how the counting is done but haven't received an answer yet).
> (2) There is only a coincidental overlap with our frequency of tests; it's not the perfect overlap it should be (since this is an exact science) if it were coming from CI live site tests.
> (3) Live site tests happen once a day on M/W/F/Sun. Can you simply correlate the time of the ping spike onset with the time when the tests were run to determine if it is CI?

Not a perfect match, but there seems to be some signal that this is it. There is a trend over time by day of week (Monday = 1). Suspicious users are generally higher on Mon, Wed, Thu and Fri, and lower on Tue, Sat and Sun (sometimes up a bit).
Comment 18 • 4 years ago
bugherder
Comment 19 • 4 years ago
Hey Marissa,
looks like this was merged. This means, as far as I can tell, that the tests won't run on CI today. Any chance you could verify whether you can see a difference?
Comment 20 • 4 years ago
Thanks all. I will validate first thing tomorrow.
Comment 21 • 4 years ago
Following up, I checked the counts of new users with the two methodologies and the results were consistent with CI testing.
We can see that on Firefox Preview Nightly there is no volatility in our understanding of new users; the new users seen likely reflect the true new-user baseline. In addition, we see a spike in new testing on Fenix Release. Fenix Release isn't available anywhere outside of internal channels, and we see this activity stop during the weekend. Together this provides compelling evidence that CI testing and/or internal work is driving artifacts in our understanding of new users that need to be filtered out of our data.
Comment 22 • 4 years ago
Now that :mgorlick has confirmed that it's CI, it sounds like there are some issues with the Glean pings, because the counts aren't matching up between what we actually run and what's being measured (it sounds like there's an over-estimation going on somewhere).
(In reply to Alessio Placitelli [:Dexter] from comment #13)
> (In reply to Greg Mierzwinski [:sparky] | PTO 24-29 June from comment #11)
> > I agree with disabling them temporarily - the cron tasks would need to be removed for this. It would be great if we could disable this telemetry eventually to get rid of some noise.
>
> I believe telemetry should not be disabled. It should rather be marked appropriately. We have bug 1634064 to enable this.

Can you elaborate on why? Telemetry adds noise to our performance tests, and this is well known (we disable it on desktop Firefox). Having it enabled means the test variance increases, which diminishes our ability to detect smaller regressions and increases our chances of missing regressions, which in turn makes slow regressions more likely to slip through. Could we run a bi-weekly test with telemetry enabled instead?
Comment 23 • 4 years ago
(In reply to Greg Mierzwinski [:sparky] from comment #22)
> Can you elaborate on why? Telemetry adds noise to our performance tests, and this is well known (we disable it on desktop Firefox). Having it enabled means the test variance increases, which diminishes our ability to detect smaller regressions and increases our chances of missing regressions, which in turn makes slow regressions more likely to slip through. Could we run a bi-weekly test with telemetry enabled instead?

Because you would be testing the performance of a product that's different from the one you'll be releasing. Telemetry is a fundamental subsystem and might affect performance as well. The product might be lightning fast in tests because of that and somewhat slower (I'm not saying it is!) in the wild.
Moreover, we'd be testing and measuring metrics that are different from the ones telemetry is collecting in the wild, whereas having telemetry on allows us to capture telemetry from performance tests as well, giving us a baseline for comparing CI against real builds in the wild. Some teams are actually pursuing this.
With that said, of course, while I believe that disabling telemetry would not be the ideal solution for this, I don't own performance telemetry :-)
Comment 24 • 4 years ago
I am working on disabling telemetry on Fennec, and the telemetry docs say that toolkit.telemetry.unified should be false on Fennec only. I tested this on a geckoview test locally, in --debug mode, and discovered that it is also false. Am I missing something? Is there a way of finding out whether a specific job has telemetry/unified enabled or not? This finding might change the assumptions for this problem.

> If unified is off, toolkit.telemetry.enabled controls whether the Telemetry module is enabled. If unified is on, toolkit.telemetry.enabled is locked to true if MOZ_UPDATE_CHANNEL is nightly or aurora or beta or default. Otherwise it is locked to false.
Comment 25 • 4 years ago
(In reply to Alexandru Ionescu (needinfo me) [:alexandrui] from comment #24)
> I am working on disabling telemetry on Fennec, and the telemetry docs say that toolkit.telemetry.unified should be false on Fennec only. I tested this on a geckoview test locally, in --debug mode, and discovered that it is also false. Am I missing something? Is there a way of finding out whether a specific job has telemetry/unified enabled or not? This finding might change the assumptions for this problem.
>
> > If unified is off, toolkit.telemetry.enabled controls whether the Telemetry module is enabled. If unified is on, toolkit.telemetry.enabled is locked to true if MOZ_UPDATE_CHANNEL is nightly or aurora or beta or default. Otherwise it is locked to false.

Hi,
this bug is specifically about Fenix, not Fennec. Fennec uses a legacy telemetry system that's different from, but similar to, the one used in Firefox. This telemetry system is not owned by the Telemetry team (even though the name suggests that!).
With that said, as mentioned over Riot, you should try setting datareporting.healthreport.uploadEnabled to false, see here.
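For illustration, here is one way that pref could be baked into a generated test profile. This is a minimal sketch assuming mozprofile is used to build the Fennec profile; where Raptor actually constructs and pushes the profile is not confirmed here.

# Minimal sketch: build a Gecko profile with legacy-telemetry upload disabled.
from mozprofile import Profile

profile = Profile(
    preferences={
        # Stops the legacy (Fennec-style) telemetry client from uploading pings.
        "datareporting.healthreport.uploadEnabled": False,
    }
)
print(profile.profile)  # path of the generated profile directory, ready to push to the device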
Comment 26 • 4 years ago
(In reply to Alessio Placitelli [:Dexter] from comment #23)
> (In reply to Greg Mierzwinski [:sparky] from comment #22)
> > Can you elaborate on why? Telemetry adds noise to our performance tests, and this is well known (we disable it on desktop Firefox). Having it enabled means the test variance increases, which diminishes our ability to detect smaller regressions and increases our chances of missing regressions, which in turn makes slow regressions more likely to slip through. Could we run a bi-weekly test with telemetry enabled instead?
>
> Because you would be testing the performance of a product that's different from the one you'll be releasing. Telemetry is a fundamental subsystem and might affect performance as well. The product might be lightning fast in tests because of that and somewhat slower (I'm not saying it is!) in the wild.
> Moreover, we'd be testing and measuring metrics that are different from the ones telemetry is collecting in the wild, whereas having telemetry on allows us to capture telemetry from performance tests as well, giving us a baseline for comparing CI against real builds in the wild. Some teams are actually pursuing this.
> With that said, of course, while I believe that disabling telemetry would not be the ideal solution for this, I don't own performance telemetry :-)

I fully agree that without telemetry we aren't testing the same thing we are releasing, but we also catch fewer regressions when it's on. The biggest issue with the variance from telemetry stems from the extra external network requests. Would it be possible to disable these, at the least, in the tests that are used for regression detection?
Regarding being able to compare telemetry between CI and in the wild, that sounds like a good idea to help us nail down our lab-testing gaps. But I wonder if we could have a telemetry-enabled variant running weekly/bi-weekly for this? We are quite limited in terms of the amount of mobile testing we can do, so we need to balance realism with our ability to catch regressions.
EDIT: Let me know if you think I should direct these questions at someone else.
Comment 27 • 4 years ago
(In reply to Greg Mierzwinski [:sparky] from comment #26)
> > With that said, of course, while I believe that disabling telemetry would not be the ideal solution for this, I don't own performance telemetry :-)
>
> I fully agree that without telemetry we aren't testing the same thing we are releasing, but we also catch fewer regressions when it's on. The biggest issue with the variance from telemetry stems from the extra external network requests. Would it be possible to disable these, at the least, in the tests that are used for regression detection?

Do we need to block all network requests? Even to localhost? In the future we might consider adding some way to tag uploads and make them go to localhost in automation.
Moreover, are you sure Glean ping uploads add variance? All the collection is async and off the main thread, and the ping upload is as well.

> Regarding being able to compare telemetry between CI and in the wild, that sounds like a good idea to help us nail down our lab-testing gaps. But I wonder if we could have a telemetry-enabled variant running weekly/bi-weekly for this? We are quite limited in terms of the amount of mobile testing we can do, so we need to balance realism with our ability to catch regressions.
> EDIT: Let me know if you think I should direct these questions at someone else.

Having a separate variant for this was an explicit anti-goal. I'm not the best person to talk about this or to make decisions about it; I can only provide perspectives about telemetry and Glean. This is really a Fenix product question that the Fenix product and perf teams should answer. Michael Comella / Eric Smyth might be better people to talk to. I'd love to be kept in the loop, though :)
Comment 28 • 4 years ago
:dexter, I've opened bug 1650798 for further discussions (also replied to your comments there).