Closed Bug 1191912 Opened 9 years ago Closed 9 years ago

Implement a way to make Telemetry opt-out for a random sample of users

Categories

(Toolkit :: Telemetry, defect)

defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla43
Iteration:
42.3 - Aug 10
Tracking Status
firefox40 --- unaffected
firefox41 --- fixed
firefox42 --- fixed
firefox43 --- fixed

People

(Reporter: gfritzsche, Assigned: gfritzsche)

References

Details

(Whiteboard: [unifiedTelemetry] [uplift4])

Attachments

(4 files, 1 obsolete file)

If we can't get Unified Telemetry shipping for 41 then we want to enable it for a random sample (5% being suggested) on release.

This allows us to
a) unblock projects waiting on the data (game initiative, release management, ...)
b) judge the release data quality
c) don't break critical dashboards yet as FHR is still fully running
Iteration: --- → 42.3 - Aug 10
Are we ok with the stated 5% sampling for now?
Flags: needinfo?(sguha)
Flags: needinfo?(benjamin)
Roberto, i remember you implemented sampling based on client id already somewhere.
Can you point me to the code?
Did that turn out to be reliable and good for random sampling?
Flags: needinfo?(rvitillo)
Assignee: nobody → gfritzsche
I assume we should take the same strategy we do on the server (seems like a good idea to make sure we're collecting the same clientIds that fall in the 1% sample).

Adding needsinfo to mreid to give pointers to that code/strategy...
Flags: needinfo?(mreid)
(In reply to Georg Fritzsche [:gfritzsche] from comment #2)
> Roberto, i remember you implemented sampling based on client id already
> somewhere.
> Can you point me to the code?
> Did that turn out to be reliable and good for random sampling?

You can find the code at [1]. I haven't used the data after enabling this on Beta though.  

[1] https://dxr.mozilla.org/mozilla-central/source/xpcom/threads/BackgroundHangMonitor.cpp?from=BackgroundHangMonitor.cpp#485
Flags: needinfo?(rvitillo)
So it all depends. If you need some estimate from(say a proportion) representative of the population 5% ought to be well enough. if one starts looking for small subpopulations e.g. Hello users (very very small), specific addon users (also very small) then 5% might result in tiny numbers. For example some addons have <2000 users - a 5% sample of these is ~100 which might be really small to detect differences in usage (low power).

For lack of any particular needs, i would go with 5% but quote everything with confidence intervals or at least report a measure of sampling error - so that we don't just compare means without taking into account the sampling error in the measurement.

HTH
Flags: needinfo?(sguha)
Roberto already pointed at the code that takes the sampleId on the server, but to repeat here, we take the crc32 of the clientId modulo 100 to bucket things into groups of approximately 1%. The current 1% sample on the server takes sampleId value of 42, though implementing sampling on the client we would presumably want to take values <= X or >= (100-X) to get an X% sample.

Also, to be clear, sampling should only be done on the release channel, pre-release channels should continue to submit 100%.
Flags: needinfo?(mreid)
Yes 5%.
Status: NEW → ASSIGNED
Flags: needinfo?(benjamin)
Attachment #8645720 - Flags: review?(rvitillo) → review+
Comment on attachment 8645721 [details] [diff] [review]
Part 2 - Enable opt-out Telemetry for a 5% sample of release users

Review of attachment 8645721 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/TelemetryController.jsm
@@ +111,5 @@
> +        crc = (crc >>> 8) ^ gCrcTable[(crc ^ str.charCodeAt(i)) & 0xFF];
> +    }
> +
> +    return (crc ^ (-1)) >>> 0;
> +}

mreid should confirm that this matches the server side implementation as depending on the algorithm used the implementation might yield different results.
Attachment #8645721 - Flags: review?(rvitillo)
Attachment #8645721 - Flags: review+
Attachment #8645721 - Flags: feedback?(mreid)
Attachment #8645722 - Flags: review?(rvitillo) → review+
Attachment #8645748 - Flags: review?(rvitillo) → review+
Blocks: 1192906
Comment on attachment 8645721 [details] [diff] [review]
Part 2 - Enable opt-out Telemetry for a 5% sample of release users

Trink, maybe you can take a look in the meantime?
Attachment #8645721 - Flags: feedback?(mtrinkala)
Comment on attachment 8645721 [details] [diff] [review]
Part 2 - Enable opt-out Telemetry for a 5% sample of release users

Review of attachment 8645721 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/TelemetryController.jsm
@@ +111,5 @@
> +        crc = (crc >>> 8) ^ gCrcTable[(crc ^ str.charCodeAt(i)) & 0xFF];
> +    }
> +
> +    return (crc ^ (-1)) >>> 0;
> +}

The server implementation uses the standard zlib crc32. I tested ~1300 clientIds and the implementation in this bug matches the server-assigned values.

@@ +640,5 @@
> +    // This mimics the server-side 1% sampling, so that we can get matching populations.
> +    // The server samples on ((crc32(clientId) % 100) == 42), we match 42+X here to get
> +    // a bigger sample.
> +    const sample = crc32(clientId) % 100;
> +    const offset = 42;

I don't think we should encode the server's magic "42" value in the client. Rather, I think we should modify the server to select a different range of sampleIds and remove the "offset".

Instead, we could simply use "sample < range" on the client.
Attachment #8645721 - Flags: feedback?(mreid) → feedback-
(In reply to Mark Reid [:mreid] from comment #16)
> @@ +640,5 @@
> > +    // This mimics the server-side 1% sampling, so that we can get matching populations.
> > +    // The server samples on ((crc32(clientId) % 100) == 42), we match 42+X here to get
> > +    // a bigger sample.
> > +    const sample = crc32(clientId) % 100;
> > +    const offset = 42;
> 
> I don't think we should encode the server's magic "42" value in the client.
> Rather, I think we should modify the server to select a different range of
> sampleIds and remove the "offset".
> 
> Instead, we could simply use "sample < range" on the client.

Per IRC we are not planning to address this now to not add any delays to the 41 timeline.
+1 to Comment 16.  

Although I am still unclear how throwing away the bulk of the data will help in our data validation efforts ;)  Hopefully it will be easy to compare with an equivalent (available?) v2 data sample set.
(In reply to Mike Trinkala [:trink] from comment #18)
> +1 to Comment 16.  
> 
> Although I am still unclear how throwing away the bulk of the data will help
> in our data validation efforts ;)  Hopefully it will be easy to compare with
> an equivalent (available?) v2 data sample set.

We're not sampling for pre-release populations. We won't need to sample if we're ready to turn of FHRv2.

This option is to be used if we aren't willing to turn off FHRv2, so the choice would be between 5% and 0%, not between 5% and 100%.
Yeah just saying 5% allows you to play with the v4 data but not actually use it as a reliable reporting mechanism and if it is not aligned with a v2 sample (while collecting both) it would also make it hard to validate.
Comment on attachment 8645720 [details] [diff] [review]
Part 1 - Move client id caching to ClientID.jsm

Approval Request Comment
[Feature/regressing bug #]:
Unified Telemetry
[User impact if declined]:
If we don't get sign-off to have Unified Telemetry replace FHR in 41, then we want to enable it in addition to FHR for a 5% sample of the release population.
This patch takes care of a prerequisite.
[Describe test coverage new/current, TreeHerder]:
Automated test-coverage, manually checked the behavior, validated it matches the server-sampling.
[Risks and why]:
Low-risk, this is a rather small and contained change.
[String/UUID change made/needed]:
None.
Attachment #8645720 - Flags: approval-mozilla-beta?
Attachment #8645720 - Flags: approval-mozilla-aurora?
Comment on attachment 8645721 [details] [diff] [review]
Part 2 - Enable opt-out Telemetry for a 5% sample of release users

Approval Request Comment
[Feature/regressing bug #]:
Unified Telemetry
[User impact if declined]:
If we don't get sign-off to have Unified Telemetry replace FHR in 41, then we want to enable it in addition to FHR for a 5% sample of the release population.
This implements the actual sampling.
[Describe test coverage new/current, TreeHerder]:
Automated test-coverage, manually checked the behavior, validated it matches the server-sampling.
[Risks and why]:
Low-risk, this is a rather small and contained change.
[String/UUID change made/needed]:
None.
Attachment #8645721 - Flags: feedback?(mtrinkala)
Comment on attachment 8645721 [details] [diff] [review]
Part 2 - Enable opt-out Telemetry for a 5% sample of release users


Approval Request Comment
[Feature/regressing bug #]:
Unified Telemetry
[User impact if declined]:
If we don't get sign-off to have Unified Telemetry replace FHR in 41, then we want to enable it in addition to FHR for a 5% sample of the release population.
This implements the actual sampling.
[Describe test coverage new/current, TreeHerder]:
Automated test-coverage, manually checked the behavior, validated it matches the server-sampling.
[Risks and why]:
Low-risk, this is a rather small and contained change.
[String/UUID change made/needed]:
None.
Attachment #8645721 - Flags: approval-mozilla-beta?
Attachment #8645721 - Flags: approval-mozilla-aurora?
Whiteboard: [unifiedTelemetry] [uplift5] → [unifiedTelemetry] [uplift4]
Comment on attachment 8645720 [details] [diff] [review]
Part 1 - Move client id caching to ClientID.jsm

Approving for both Aurora and Beta. Telemetry related changes should be safe and this is a must-have for FF41 based on what I understand of the situation.
Attachment #8645720 - Flags: approval-mozilla-beta?
Attachment #8645720 - Flags: approval-mozilla-beta+
Attachment #8645720 - Flags: approval-mozilla-aurora?
Attachment #8645720 - Flags: approval-mozilla-aurora+
Attachment #8645721 - Flags: approval-mozilla-beta?
Attachment #8645721 - Flags: approval-mozilla-beta+
Attachment #8645721 - Flags: approval-mozilla-aurora?
Attachment #8645721 - Flags: approval-mozilla-aurora+
Flags: qe-verify+
Comment on attachment 8652271 [details] [diff] [review]
Part 3 (beta) - Enable opt-out Telemetry sampling

Moving this to bug 1192906 to track the pref change uplifts in one bug.
Attachment #8652271 - Attachment is obsolete: true
Removing [qe-verify +] flag since this has already been covered by the testing performed in bug 1192906.
Flags: qe-verify+
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: