Implement a way to make Telemetry opt-out for a random sample of users

VERIFIED FIXED in Firefox 41

Status

()

Toolkit
Telemetry
VERIFIED FIXED
2 years ago
2 years ago

People

(Reporter: gfritzsche, Assigned: gfritzsche)

Tracking

Trunk
mozilla43
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(firefox40 unaffected, firefox41 fixed, firefox42 fixed, firefox43 fixed)

Details

(Whiteboard: [unifiedTelemetry] [uplift4])

Attachments

(4 attachments, 1 obsolete attachment)

If we can't get Unified Telemetry shipping for 41 then we want to enable it for a random sample (5% being suggested) on release.

This allows us to
a) unblock projects waiting on the data (game initiative, release management, ...)
b) judge the release data quality
c) don't break critical dashboards yet as FHR is still fully running
status-firefox40: --- → unaffected
status-firefox41: --- → affected
Iteration: --- → 42.3 - Aug 10
Are we ok with the stated 5% sampling for now?
Flags: needinfo?(sguha)
Flags: needinfo?(benjamin)
Roberto, i remember you implemented sampling based on client id already somewhere.
Can you point me to the code?
Did that turn out to be reliable and good for random sampling?
Flags: needinfo?(rvitillo)
Assignee: nobody → gfritzsche

Comment 3

2 years ago
I assume we should take the same strategy we do on the server (seems like a good idea to make sure we're collecting the same clientIds that fall in the 1% sample).

Adding needsinfo to mreid to give pointers to that code/strategy...
Flags: needinfo?(mreid)
(In reply to Georg Fritzsche [:gfritzsche] from comment #2)
> Roberto, i remember you implemented sampling based on client id already
> somewhere.
> Can you point me to the code?
> Did that turn out to be reliable and good for random sampling?

You can find the code at [1]. I haven't used the data after enabling this on Beta though.  

[1] https://dxr.mozilla.org/mozilla-central/source/xpcom/threads/BackgroundHangMonitor.cpp?from=BackgroundHangMonitor.cpp#485
Flags: needinfo?(rvitillo)
I agree on using the same procedure as we do on the server side: https://github.com/vitillo/data-pipeline/blob/master/heka/sandbox/decoders/extract_telemetry_dimensions.lua#L95
So it all depends. If you need some estimate from(say a proportion) representative of the population 5% ought to be well enough. if one starts looking for small subpopulations e.g. Hello users (very very small), specific addon users (also very small) then 5% might result in tiny numbers. For example some addons have <2000 users - a 5% sample of these is ~100 which might be really small to detect differences in usage (low power).

For lack of any particular needs, i would go with 5% but quote everything with confidence intervals or at least report a measure of sampling error - so that we don't just compare means without taking into account the sampling error in the measurement.

HTH
Flags: needinfo?(sguha)

Comment 7

2 years ago
Roberto already pointed at the code that takes the sampleId on the server, but to repeat here, we take the crc32 of the clientId modulo 100 to bucket things into groups of approximately 1%. The current 1% sample on the server takes sampleId value of 42, though implementing sampling on the client we would presumably want to take values <= X or >= (100-X) to get an X% sample.

Also, to be clear, sampling should only be done on the release channel, pre-release channels should continue to submit 100%.
Flags: needinfo?(mreid)
Yes 5%.
Created attachment 8645720 [details] [diff] [review]
Part 1 - Move client id caching to ClientID.jsm
Attachment #8645720 - Flags: review?(rvitillo)
Created attachment 8645721 [details] [diff] [review]
Part 2 - Enable opt-out Telemetry for a 5% sample of release users
Attachment #8645721 - Flags: review?(rvitillo)
Created attachment 8645722 [details] [diff] [review]
Bonus: Remove left-over clientId getter from TelemetrySession
Attachment #8645722 - Flags: review?(rvitillo)
Status: NEW → ASSIGNED
Flags: needinfo?(benjamin)
Attachment #8645720 - Flags: review?(rvitillo) → review+
Comment on attachment 8645721 [details] [diff] [review]
Part 2 - Enable opt-out Telemetry for a 5% sample of release users

Review of attachment 8645721 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/TelemetryController.jsm
@@ +111,5 @@
> +        crc = (crc >>> 8) ^ gCrcTable[(crc ^ str.charCodeAt(i)) & 0xFF];
> +    }
> +
> +    return (crc ^ (-1)) >>> 0;
> +}

mreid should confirm that this matches the server side implementation as depending on the algorithm used the implementation might yield different results.
Attachment #8645721 - Flags: review?(rvitillo)
Attachment #8645721 - Flags: review+
Attachment #8645721 - Flags: feedback?(mreid)
Attachment #8645722 - Flags: review?(rvitillo) → review+
Created attachment 8645748 [details] [diff] [review]
Part 3 - Enable opt-out Telemetry sampling
Attachment #8645748 - Flags: review?(rvitillo)
Attachment #8645748 - Flags: review?(rvitillo) → review+
Blocks: 1192906
Comment on attachment 8645721 [details] [diff] [review]
Part 2 - Enable opt-out Telemetry for a 5% sample of release users

Trink, maybe you can take a look in the meantime?
Attachment #8645721 - Flags: feedback?(mtrinkala)

Comment 15

2 years ago
https://hg.mozilla.org/integration/fx-team/rev/58b1351b3538
https://hg.mozilla.org/integration/fx-team/rev/2dc8148c0245
https://hg.mozilla.org/integration/fx-team/rev/ed247945355f
https://hg.mozilla.org/integration/fx-team/rev/82e177714649

Comment 16

2 years ago
Comment on attachment 8645721 [details] [diff] [review]
Part 2 - Enable opt-out Telemetry for a 5% sample of release users

Review of attachment 8645721 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/TelemetryController.jsm
@@ +111,5 @@
> +        crc = (crc >>> 8) ^ gCrcTable[(crc ^ str.charCodeAt(i)) & 0xFF];
> +    }
> +
> +    return (crc ^ (-1)) >>> 0;
> +}

The server implementation uses the standard zlib crc32. I tested ~1300 clientIds and the implementation in this bug matches the server-assigned values.

@@ +640,5 @@
> +    // This mimics the server-side 1% sampling, so that we can get matching populations.
> +    // The server samples on ((crc32(clientId) % 100) == 42), we match 42+X here to get
> +    // a bigger sample.
> +    const sample = crc32(clientId) % 100;
> +    const offset = 42;

I don't think we should encode the server's magic "42" value in the client. Rather, I think we should modify the server to select a different range of sampleIds and remove the "offset".

Instead, we could simply use "sample < range" on the client.
Attachment #8645721 - Flags: feedback?(mreid) → feedback-
(In reply to Mark Reid [:mreid] from comment #16)
> @@ +640,5 @@
> > +    // This mimics the server-side 1% sampling, so that we can get matching populations.
> > +    // The server samples on ((crc32(clientId) % 100) == 42), we match 42+X here to get
> > +    // a bigger sample.
> > +    const sample = crc32(clientId) % 100;
> > +    const offset = 42;
> 
> I don't think we should encode the server's magic "42" value in the client.
> Rather, I think we should modify the server to select a different range of
> sampleIds and remove the "offset".
> 
> Instead, we could simply use "sample < range" on the client.

Per IRC we are not planning to address this now to not add any delays to the 41 timeline.
+1 to Comment 16.  

Although I am still unclear how throwing away the bulk of the data will help in our data validation efforts ;)  Hopefully it will be easy to compare with an equivalent (available?) v2 data sample set.

Comment 19

2 years ago
(In reply to Mike Trinkala [:trink] from comment #18)
> +1 to Comment 16.  
> 
> Although I am still unclear how throwing away the bulk of the data will help
> in our data validation efforts ;)  Hopefully it will be easy to compare with
> an equivalent (available?) v2 data sample set.

We're not sampling for pre-release populations. We won't need to sample if we're ready to turn of FHRv2.

This option is to be used if we aren't willing to turn off FHRv2, so the choice would be between 5% and 0%, not between 5% and 100%.
Yeah just saying 5% allows you to play with the v4 data but not actually use it as a reliable reporting mechanism and if it is not aligned with a v2 sample (while collecting both) it would also make it hard to validate.
https://hg.mozilla.org/mozilla-central/rev/58b1351b3538
https://hg.mozilla.org/mozilla-central/rev/2dc8148c0245
https://hg.mozilla.org/mozilla-central/rev/ed247945355f
https://hg.mozilla.org/mozilla-central/rev/82e177714649
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
status-firefox43: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla43
Comment on attachment 8645720 [details] [diff] [review]
Part 1 - Move client id caching to ClientID.jsm

Approval Request Comment
[Feature/regressing bug #]:
Unified Telemetry
[User impact if declined]:
If we don't get sign-off to have Unified Telemetry replace FHR in 41, then we want to enable it in addition to FHR for a 5% sample of the release population.
This patch takes care of a prerequisite.
[Describe test coverage new/current, TreeHerder]:
Automated test-coverage, manually checked the behavior, validated it matches the server-sampling.
[Risks and why]:
Low-risk, this is a rather small and contained change.
[String/UUID change made/needed]:
None.
Attachment #8645720 - Flags: approval-mozilla-beta?
Attachment #8645720 - Flags: approval-mozilla-aurora?
Comment on attachment 8645721 [details] [diff] [review]
Part 2 - Enable opt-out Telemetry for a 5% sample of release users

Approval Request Comment
[Feature/regressing bug #]:
Unified Telemetry
[User impact if declined]:
If we don't get sign-off to have Unified Telemetry replace FHR in 41, then we want to enable it in addition to FHR for a 5% sample of the release population.
This implements the actual sampling.
[Describe test coverage new/current, TreeHerder]:
Automated test-coverage, manually checked the behavior, validated it matches the server-sampling.
[Risks and why]:
Low-risk, this is a rather small and contained change.
[String/UUID change made/needed]:
None.
Attachment #8645721 - Flags: feedback?(mtrinkala)
Comment on attachment 8645721 [details] [diff] [review]
Part 2 - Enable opt-out Telemetry for a 5% sample of release users


Approval Request Comment
[Feature/regressing bug #]:
Unified Telemetry
[User impact if declined]:
If we don't get sign-off to have Unified Telemetry replace FHR in 41, then we want to enable it in addition to FHR for a 5% sample of the release population.
This implements the actual sampling.
[Describe test coverage new/current, TreeHerder]:
Automated test-coverage, manually checked the behavior, validated it matches the server-sampling.
[Risks and why]:
Low-risk, this is a rather small and contained change.
[String/UUID change made/needed]:
None.
Attachment #8645721 - Flags: approval-mozilla-beta?
Attachment #8645721 - Flags: approval-mozilla-aurora?
Whiteboard: [unifiedTelemetry] [uplift5] → [unifiedTelemetry] [uplift4]
Comment on attachment 8645720 [details] [diff] [review]
Part 1 - Move client id caching to ClientID.jsm

Approving for both Aurora and Beta. Telemetry related changes should be safe and this is a must-have for FF41 based on what I understand of the situation.
Attachment #8645720 - Flags: approval-mozilla-beta?
Attachment #8645720 - Flags: approval-mozilla-beta+
Attachment #8645720 - Flags: approval-mozilla-aurora?
Attachment #8645720 - Flags: approval-mozilla-aurora+

Updated

2 years ago
Attachment #8645721 - Flags: approval-mozilla-beta?
Attachment #8645721 - Flags: approval-mozilla-beta+
Attachment #8645721 - Flags: approval-mozilla-aurora?
Attachment #8645721 - Flags: approval-mozilla-aurora+
https://hg.mozilla.org/releases/mozilla-aurora/rev/463fe048a778
https://hg.mozilla.org/releases/mozilla-aurora/rev/86a5a7e54c8f
status-firefox42: affected → fixed
https://hg.mozilla.org/releases/mozilla-beta/rev/f949fd1f62f8
https://hg.mozilla.org/releases/mozilla-beta/rev/23ae04733450
status-firefox41: affected → fixed
Flags: in-testsuite+
Flags: qe-verify+
Created attachment 8652271 [details] [diff] [review]
Part 3 (beta) - Enable opt-out Telemetry sampling
Comment on attachment 8652271 [details] [diff] [review]
Part 3 (beta) - Enable opt-out Telemetry sampling

Moving this to bug 1192906 to track the pref change uplifts in one bug.
Attachment #8652271 - Attachment is obsolete: true

Comment 30

2 years ago
Removing [qe-verify +] flag since this has already been covered by the testing performed in bug 1192906.
Flags: qe-verify+
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.