Closed Bug 1120369 Opened 5 years ago Closed 5 years ago

Make telemetry data aware of different data sets

Categories

(Firefox Health Report Graveyard :: Client: Desktop, defect)

defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED
Firefox 38

People

(Reporter: gfritzsche, Assigned: gfritzsche)

References

Details

(Whiteboard: [ready])

Attachments

(1 file, 2 obsolete files)

As we move FHR & Telemetry together, we need to be able to specify which data set telemetry probes go into.

For this (and potentially other different use-cases), we should allow specifying this as a field in Histograms.json and the relevant nsITelemetry methods.
Is it worth mentioning that this affects python scripts & data types for generated headers, histograms collection implementation.

Also the bucket specification should be optional. When no bucket is specified, we should default to "Telemetry" bucket.
Assignee: nobody → gfritzsche
Note that i moved the KeyedHistogram implementation around a little in Telemetry.cpp due to visibilities of gHistograms and TelemetryImpl.
Attachment #8553139 - Attachment is obsolete: true
Attachment #8554788 - Flags: review?(vdjeric)
Comment on attachment 8554788 [details] [diff] [review]
Part 1: Add datasets for histograms

Review of attachment 8554788 [details] [diff] [review]:
-----------------------------------------------------------------

I don't like the "dataSet" name for the Histograms.json syntax, it's way too vague. How about "fhrMeasurement": true/false? I wouldn't want people to add opt-out probes without truly understanding the difference between "base" and "extended" data sets and the relationship to FHR/Telemetry.

Don't forget to document this new field in the wiki.
Also please file a bug to visually distinguish base histograms from extended histograms in about:telemetry

::: toolkit/components/telemetry/Telemetry.cpp
@@ +1355,5 @@
> +    return false;
> +  }
> +
> +  uint32_t dataset = nsITelemetry::DATASET_EXTENDED;
> +  nsresult rv = keyed->GetDataset(&dataset);;

Hmm, we don't have to worry about anyone querying the dataset of a sub-histograms in a keyed histogram, right?

::: toolkit/components/telemetry/TelemetryPing.jsm
@@ +761,5 @@
>    /**
>     * Make a copy of interesting histograms at startup.
>     */
>    gatherStartupHistograms: function gatherStartupHistograms() {
> +    let info = Telemetry.registeredHistograms(Ci.nsITelemetry.DATASET_EXTENDED, []);

Do we really want FHR probes here? I'm not opposed, just asking. I guess Benjamin wants FHR histograms to be be stored in the same section as Telemetry histograms?

::: toolkit/components/telemetry/nsITelemetry.idl
@@ +33,5 @@
> +   * Dataset types:
> +   * DATASET_BASE - the basic dataset that is on-by-default on all channels
> +   * DATASET_EXTENDED - the extended dataset that defaults to off on release,
> +   *                    on on pre-release channels.
> +   */

So how do we query for the non-base extended-only histogram IDs?
Attachment #8554788 - Flags: review?(vdjeric) → review+
(In reply to Vladan Djeric (:vladan) from comment #4)
> I don't like the "dataSet" name for the Histograms.json syntax, it's way too
> vague. How about "fhrMeasurement": true/false? I wouldn't want people to add
> opt-out probes without truly understanding the difference between "base" and
> "extended" data sets and the relationship to FHR/Telemetry.

Right, sorry i missed adding some context here.
My problem is that i want it to be something descriptive.
* "fhr" is a data-collection mechanism that we move away from, naming a data (sub-)set after that will be strange later
* "telemetry" is now overloaded as a data submission mechanism, a specific data-collection mechanism... and a data sub-set in the latter

While we are changing things, i would like to end up with some naming that makes it clearer to people what the effect of this is.
I couldn't think of a better scheme yet.
 
> Don't forget to document this new field in the wiki.
> ::: toolkit/components/telemetry/Telemetry.cpp
> @@ +1355,5 @@
> > +    return false;
> > +  }
> > +
> > +  uint32_t dataset = nsITelemetry::DATASET_EXTENDED;
> > +  nsresult rv = keyed->GetDataset(&dataset);;
> 
> Hmm, we don't have to worry about anyone querying the dataset of a
> sub-histograms in a keyed histogram, right?

No, those shouldn't be accessed directly, but rather through the wrapper we have.

> ::: toolkit/components/telemetry/TelemetryPing.jsm
> @@ +761,5 @@
> >    /**
> >     * Make a copy of interesting histograms at startup.
> >     */
> >    gatherStartupHistograms: function gatherStartupHistograms() {
> > +    let info = Telemetry.registeredHistograms(Ci.nsITelemetry.DATASET_EXTENDED, []);
> 
> Do we really want FHR probes here? I'm not opposed, just asking. I guess
> Benjamin wants FHR histograms to be be stored in the same section as
> Telemetry histograms?

Given that we want to unify the two systems, we shouldn't distinguish between them really (e.g. some of what FHR submits now is already in Telemetry).
All that we do have to distinguish between (which is for an upcoming patch), is which sets of data we should submit in the payload.

> ::: toolkit/components/telemetry/nsITelemetry.idl
> @@ +33,5 @@
> > +   * Dataset types:
> > +   * DATASET_BASE - the basic dataset that is on-by-default on all channels
> > +   * DATASET_EXTENDED - the extended dataset that defaults to off on release,
> > +   *                    on on pre-release channels.
> > +   */
> 
> So how do we query for the non-base extended-only histogram IDs?

Given the above explanation, i don't think we need to do so.
We submit/collect either base or extended (a super-set of base).
ni?vladan for the data-set naming.
Flags: needinfo?(vdjeric)
Some other naming possibilities we were throwing around:
* measurementType - too close to "histogram type"
* population - as in "testing or user population", seems too abstract
* extended - too abstract?

Roberto, maybe you have input here?
Flags: needinfo?(rvitillo)
I think i like Robertos suggestion on IRC of "enabledOnRelease" or variations of that (collectOnRelease, ...).
This would be an optional boolean for the Histograms.json definitions and defaults to false.

While it is not technically correct (it will mean that the data will be collected if users explicitly opted in on release and if they didn't opt-out on prerelease), but it seems clear what to expect for collection.
We will usually get no data for those histograms on release and usually get them submitted on pre-release.

Sound reasonable?
how about "releaseChannelCollectionPolicy": "opt-out"/"opt-in"? wordy but we can only use these terms in histograms.json and most histograms won't have this field
Flags: needinfo?(vdjeric)
Vladans suggestion seems best so far, i shortened it a bit to:
"releaseChannelCollection": "opt-out"/"opt-in"
... which still seems to convey the same thing.

The constants on nsITelemetry are now DATASET_RELEASE_CHANNEL_OPTOUT/_OPTIN.
Choosing that over a flag means that its rather trivial to add other datasets if we need to.
E.g. i could imagine that
* for selfsupport we may collect data for local use only (due to privacy or legal concerns)
* we may want to add data that should go to different pings due to privacy concerns, like CertViolation pings
Attachment #8554788 - Attachment is obsolete: true
Attachment #8556461 - Flags: review+
Flags: needinfo?(rvitillo)
Another options is "releaseChannelPrivacy": "opt-out"/"opt-in"

It seems a little misleading that DATASET_RELEASE_CHANNEL_OPTIN returns DATASET_RELEASE_CHANNEL_OPTOUT histograms as well, but it's not important, they're just code constants.
(In reply to Vladan Djeric (:vladan) from comment #12)
> It seems a little misleading that DATASET_RELEASE_CHANNEL_OPTIN returns
> DATASET_RELEASE_CHANNEL_OPTOUT histograms as well, but it's not important,
> they're just code constants.

Hm, if we care we could make it DATASET_RELEASE_CHANNEL_OPTIN_AND_OPTOUT.
The remaining work here would be to not submit or record the opt-in dataset per the settings.
I'll be handling this in other bugs and call this one done.
Whiteboard: [ready]
Attachment #8556461 - Attachment description: Part 1: Add datasets for histograms → Add datasets for histograms
No semantic changes for the telemetry data here yet, so landing this:
https://hg.mozilla.org/integration/mozilla-inbound/rev/72940b27aeaa
And a touch of CLOBBER in https://hg.mozilla.org/integration/mozilla-inbound/rev/877c67c8ced4 since for some incomprehensible reason you failed xpcshell and browser-chrome extension tests without a clobber.
https://hg.mozilla.org/mozilla-central/rev/72940b27aeaa
https://hg.mozilla.org/mozilla-central/rev/877c67c8ced4
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 38
Depends on: 1128405
Depends on: 1131069
Product: Firefox Health Report → Firefox Health Report Graveyard
You need to log in before you can comment on or make changes to this bug.