Closed Bug 1313151 Opened 8 years ago Closed 8 years ago

Remove opt-in histograms from longitudinal dataset

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: rvitillo, Assigned: harter)

References

Details

User Story

Given that the longitudinal dataset is used as a representative dataset of our user population and that it's only a 1% sample, I think we should remove the opt-in measurements as they don't add value.

Attachments

(1 file)

https://github.com/mozilla/telemetry-batch-view/pull/148 (8 years ago, Ryan Harter [:harter]; 56 bytes, text/x-github-pull-request; rvitillo: review+)

Roberto Agostino Vitillo (:rvitillo)

Reporter

Description

•

8 years ago

      No description provided.

Ryan Harter [:harter]

Assignee

Updated

•

8 years ago

Assignee: nobody → rharter

Points: --- → 3

Priority: -- → P2

Ryan Harter [:harter]

Assignee

Updated

•

8 years ago

Priority: P2 → P1

Ryan Harter [:harter]

Assignee

Comment 1

•

8 years ago

Hey Alessio,

Looking at Histograms.json [1], I only see two histograms explicitly marked as opt-in [2]. Both of these appear to be for testing. If a histogram is not explicitly marked as opt-out, is it opt-in?

[1] https://dxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/Histograms.json
[2] https://dxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/Histograms.json#5796

Flags: needinfo?(alessio.placitelli)

Alessio Placitelli [:Dexter]

Comment 2

•

8 years ago

(In reply to Ryan Harter [:harter] from comment #1)
> [...] If a histogram is not explicitly marked as opt-out is it opt-in?

Hey Ryan! Yes, if not specified, you can safely assume a histogram is "opt-in" (see [1]).

[1] - https://dxr.mozilla.org/mozilla-central/rev/86f702229e32c6119d092e86431afee576f033a1/toolkit/components/telemetry/histogram_tools.py#130
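As a minimal sketch of the rule described in this comment (treat a histogram as opt-in unless it is explicitly marked opt-out), the filter could look like the following. The probe names and the inline JSON are made up for illustration, and the `releaseChannelCollection` field name is an assumption about the definition format rather than something stated in this bug; this is not the code from the attached PR.

```python
import json

# Hypothetical excerpt in the shape of a histogram definition file.
# The probe names are invented; the "releaseChannelCollection" field
# name is an assumption about the format.
DEFINITIONS = json.loads("""
{
  "EXAMPLE_OPT_OUT_PROBE": {"kind": "boolean", "releaseChannelCollection": "opt-out"},
  "EXAMPLE_EXPLICIT_OPT_IN_PROBE": {"kind": "count", "releaseChannelCollection": "opt-in"},
  "EXAMPLE_UNMARKED_PROBE": {"kind": "linear"}
}
""")


def collection_policy(definition):
    """Return the collection policy, defaulting to opt-in when unspecified."""
    return definition.get("releaseChannelCollection", "opt-in")


def opt_out_only(definitions):
    """Keep only the histograms that are explicitly marked opt-out."""
    return {
        name: definition
        for name, definition in definitions.items()
        if collection_policy(definition) == "opt-out"
    }


kept = opt_out_only(DEFINITIONS)
dropped = sorted(set(DEFINITIONS) - set(kept))
print("kept:", sorted(kept))
print("dropped:", dropped)
```

Note that the unmarked probe is dropped along with the explicitly opt-in one, which is why far more than two histograms are affected.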

Flags: needinfo?(alessio.placitelli)

Georg Fritzsche [:gfritzsche]

Comment 3

•

8 years ago

(Commenting on User Story)
> Given that the longitudinal dataset is used as a representative dataset of
> our user population and that it's only a 1% sample, I think we should remove
> the opt-in measurements as they don't add value.

This seems like a potentially disruptive change; shouldn't reasonable advance notice at least be given for this?
On the pre-release channels (where the opt-in measurements are collected from everyone by default) I would expect them to be used.

Roberto Agostino Vitillo (:rvitillo)

Reporter

Comment 4

•

8 years ago

(In reply to Georg Fritzsche [:gfritzsche] from comment #3)
> (Commenting on User Story)
> > Given that the longitudinal dataset is used as a representative dataset of
> > our user population and that it's only a 1% sample, I think we should remove
> > the opt-in measurements as they don't add value.
> 
> This seems like a potentially disruptive change, there should probably at
> least reasonable advance notice be given for this? 

We should announce the intent of doing this on fhr-dev and fx-data-platform to see if there are any objections.

> On the pre-release channels (where the opt-in measurements are collected
> from everyone by default) i would expect them to be used.

In my experience we can't generally make statistically meaningful claims using only 1% of pre-release.

Ryan Harter [:harter]

Assignee

Comment 5

•

8 years ago

> > This seems like a potentially disruptive change, there should probably at
> > least reasonable advance notice be given for this? 
> 
> We should announce the intent of doing this on fhr-dev and fx-data-platform
> to see if there are any objections.

I'll send out an email this afternoon.

> > On the pre-release channels (where the opt-in measurements are collected
> > from everyone by default) i would expect them to be used.
> 
> In my experience we can't generally make statistically meaningful claims
> using only 1% of pre-release.

Looks like the current dataset has ~250k clients in pre-release. What types of claims do we try to make with these data? It seems like we should be able to answer some questions with that number of users.

Flags: needinfo?(rvitillo)

Roberto Agostino Vitillo (:rvitillo)

Reporter

Comment 6

•

8 years ago

(In reply to Ryan Harter [:harter] from comment #5)
> Looks like the current dataset has ~250k clients in pre-release. What types
> of claims do we try to make with these data? It seems like we should be able
> to answer some questions with that number of users.

Right, but our users usually apply some filtering on top of that which can bring the number of eligible users quickly down to something which isn't very interesting. 
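To make the filtering concern concrete, here is a rough sketch of how the uncertainty of an estimated proportion grows as filters shrink the ~250k pre-release clients. It uses the standard normal-approximation margin of error with a worst-case proportion of 0.5; the filter ratios are hypothetical and not taken from any real query.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of the normal-approximation 95% interval for a proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    return z * math.sqrt(p * (1.0 - p) / n)


# Roughly the number of pre-release clients mentioned in comment 5.
clients = 250_000

# Hypothetical filters that keep 1 in k of those clients.
for keep_one_in in (1, 10, 100, 1000):
    n = clients // keep_one_in
    points = 100 * margin_of_error(n)
    print(f"{n:>7} clients -> +/- {points:.2f} percentage points")
```

Under these assumptions the full 250k clients give about ±0.2 percentage points, while a filter that leaves 250 clients gives about ±6 points.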

Furthermore, mixing opt-in and opt-out measurements makes self-served analysis more error-prone: one could easily run a query based on an opt-in measure (without knowing it's opt-in) and mistakenly think that the result applies to our population as a whole.

Flags: needinfo?(rvitillo)

Ryan Harter [:harter]

Assignee

Comment 7

•

8 years ago

Keeping this bug updated: we've identified all users/queries that depend on these histograms. I've emailed these users and am waiting on a response.

Bryan Clark (DevTools PM) [:clarkbw]

Comment 8

•

8 years ago

Jumping in to say that I agree with this choice in the general case. I think it would likely be safer for those of us creating queries not to mistakenly use opt-in probes.

I do have certain probes which are only useful to me in Nightly or Aurora because I'm tracking developer tools there.  But I believe I'm going to be the outsider here and I can look for another solution.  Hopefully someone can help me. :)  Given the size of the Nightly and Aurora population it would actually be nice to have a larger sample set anyway.

Benjamin Smedberg

Comment 9

•

8 years ago

Note that the original plan for longitudinal was to have separate longitudinals for certain subgroups, so I'd like us to continue to explore having a nightly-longitudinal with 100% of nightly, beta-longitudinal with 10% of beta, etc. I agree we should make this change for the current longitudinal because it's a footgun.
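One way the per-channel longitudinals suggested here could be sampled is sketched below. It assumes clients are assigned to a stable bucket by hashing their id with CRC32 modulo 100; that bucketing scheme, the client id format, and the `SAMPLE_PERCENT` table are illustrative assumptions, not the pipeline's actual implementation.

```python
import zlib

# Hypothetical per-channel rates echoing the comment above; the release
# rate mirrors the 1% sample mentioned in the user story.
SAMPLE_PERCENT = {"nightly": 100, "beta": 10, "release": 1}


def sample_bucket(client_id):
    """Map a client id to a stable bucket in [0, 100)."""
    return zlib.crc32(client_id.encode("utf-8")) % 100


def in_sample(client_id, channel):
    """True if this client falls inside the channel's sample."""
    percent = SAMPLE_PERCENT.get(channel, 0)  # unknown channels are excluded
    return sample_bucket(client_id) < percent


# Made-up client ids, just to show the effect of each rate.
clients = [f"client-{i}" for i in range(10_000)]
for channel in ("nightly", "beta", "release"):
    kept = sum(in_sample(c, channel) for c in clients)
    print(f"{channel}: kept {kept} of {len(clients)}")
```

Because the bucket depends only on the client id, a client that is in the 1% sample is also in the 10% sample, so the smaller samples nest inside the larger ones.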

Ryan Harter [:harter]

Assignee

Comment 10

•

8 years ago

Attached file: https://github.com/mozilla/telemetry-batch-view/pull/148

Sounds like we want to move forward with this change and pursue the 100% pre-release longitudinal set. 

PR attached.

Attachment #8814944 - Flags: review?(rvitillo)

Roberto Agostino Vitillo (:rvitillo)

Reporter

Updated

•

8 years ago

Attachment #8814944 - Flags: review?(rvitillo) → review+

Roberto Agostino Vitillo (:rvitillo)

Reporter

Updated

•

8 years ago

Status: NEW → RESOLVED

Closed: 8 years ago

Resolution: --- → FIXED

Frank Bertsch [:frank]

Comment 11

•

7 years ago

(In reply to Roberto Agostino Vitillo (:rvitillo) from comment #4)
> In my experience we can't generally make statistically meaningful claims
> using only 1% of pre-release.

Any particular reason we kept opt-in scalars? This argument applies to those as well.

Roberto Agostino Vitillo (:rvitillo)

Reporter

Comment 12

•

7 years ago

(In reply to Frank Bertsch [:frank] from comment #11)
> (In reply to Roberto Agostino Vitillo (:rvitillo) from comment #4)
> > In my experience we can't generally make statistically meaningful claims
> > using only 1% of pre-release.
> 
> Any particular reason we kept opt-in scalars? This argument applies to those
> as well.

Opt-in scalars should be removed as well.

BMO Automation

Updated

•

6 years ago

Product: Cloud Services → Cloud Services Graveyard

Categories

(Cloud Services Graveyard :: Metrics: Pipeline, defect, P1)
