Closed
Bug 1274008
Opened 8 years ago
Closed 8 years ago
In longitudinal, there are probably-incorrect results in settings.user_prefs
Categories
(Cloud Services Graveyard :: Metrics: Pipeline, defect, P1)
Cloud Services Graveyard
Metrics: Pipeline
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: benjamin, Assigned: harter)
References
Details
https://sql.telemetry.mozilla.org/queries/386/source
This is a probably-incorrect result. In particular it's very unlikely that many people have customized browser.startup.homepage but absolutely nobody has customized browser.startup.page.
Similarly, dougt was trying to look for the frequency of privacy.trackingprotection.enabled and that showed zero users either, which is not reasonable.
I've verified that my pings in about:telemetry include browser.startup.page customizations in the settings.userPrefs block.
Updated•8 years ago
|
Flags: needinfo?(rvitillo)
Priority: -- → P2
Updated•8 years ago
|
Flags: needinfo?(rvitillo)
Comment 2•8 years ago
|
||
In previous versions, avro-parquet didn't support unions in map values, so the derived streams only kept track of settings that have string values. `browser.startup.page` is generally an integer, and therefore doesn't show up.
Since we upgraded it recently, here's a PR that adds support for non-string pref values: https://github.com/mozilla/telemetry-batch-view/pull/72
Flags: needinfo?(azhang9)
Comment 3•8 years ago
|
||
Ryan, can you please take this? It's coming up as a recurring footgun for developers.
Flags: needinfo?(rharter)
Assignee | ||
Updated•8 years ago
|
Assignee: nobody → rharter
Flags: needinfo?(rharter)
Assignee | ||
Comment 4•8 years ago
|
||
Sure, let me take a look.
Assignee | ||
Updated•8 years ago
|
Priority: P2 → P1
Assignee | ||
Updated•8 years ago
|
Points: --- → 3
Assignee | ||
Comment 5•8 years ago
|
||
I got a chance to review this issue and I have a path forward. However, I will have to whitelist each userPref field with a type. Identifying the types will take some analysis. I'll keep this thread updated with my progress, expect an update within a week.
Reporter | ||
Comment 6•8 years ago
|
||
Where will your list be kept? Because as we add/change the list of prefs that are sent, we really should avoid having people update another file somewhere else. Would it be worth the complexity to make an EnvironmentPrefs.yaml file to drive both TelemetryEnvironment collection and validation in longitudinal?
Assignee | ||
Comment 7•8 years ago
|
||
> Where will your list be kept?
I'll create a struct in the LongitudianlView code[0]. The struct will
enumerate a subset of all userPref fields, roughly similar to this
default_search_engine_data struct[1].
> Because as we add/change the list of prefs that
> are sent, we really should avoid having people update another file somewhere
> else.
If we want to dedicate some resources to a more elegant solution, we should go
through the parquet-avro upgrade blocking this PR [2]. This would allow us to
import all fields automagically, without the need for a whitelist. This will
probably take some time, but it would save us from some boilerplate code.
> Would it be worth the complexity to make an EnvironmentPrefs.yaml file to
> drive both TelemetryEnvironment collection and validation in longitudinal?
Where would that yaml file be kept? Adding a dependency on an external yaml
file makes me uneasy. For example, a developer making changes to that
EnvironmentPrefs.yaml file now needs to be aware that they can break the
telemetry-batch-view build.
[0] https://github.com/mozilla/telemetry-batch-view/blob/master/src/main/scala/com/mozilla/telemetry/views/Longitudinal.scala#L259
[1] https://github.com/mozilla/telemetry-batch-view/blob/master/src/main/scala/com/mozilla/telemetry/views/Longitudinal.scala#L243
[2] https://github.com/mozilla/telemetry-batch-view/pull/72
Flags: needinfo?(benjamin)
Reporter | ||
Comment 8•8 years ago
|
||
Presumably in mozilla-central next to Histograms.json and Scalars.yaml. Those are also imported and used by telemetry-batch-view, right?
Flags: needinfo?(benjamin)
Assignee | ||
Comment 9•8 years ago
|
||
Yes, you are correct. Let me take a look and see how that would work.
Comment 10•8 years ago
|
||
On the client-side, longer-term, i would like to record the user prefs (and most of the environment) in Scalars using the Scalars API.
A EnvironmentPrefs.yaml or so would be a good step in that direction, i'd be happy to help with that.
Comment 11•8 years ago
|
||
Alternatively we could treat all values as strings, and use casts at query time, while this data moves under Scalars and/or Spark starts using a recent parquet-avro library.
Reporter | ||
Comment 12•8 years ago
|
||
I don't mind storing these as scalars, but I do think that many of them should continue to trigger a subsession split, hence the reason they are in the environment block in the first place.
Comment 13•8 years ago
|
||
The plan is to not move anything to a different part of the ping. This is just about having consistent semantics for the data.
The first step here would be to use parse_scalars.py [0] (with possibly more restrictive settings) from another script to parse a EnvironmentPrefs.yaml and generate a JS or JSON file from that.
0: https://dxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/parse_scalars.py
Assignee | ||
Comment 14•8 years ago
|
||
I have an immediate fix available here[0]. Currently, the longitudinal table expects all user pref fields to be strings and discards all non-string values. This change converts numerics and bools to strings which should allow the missing fields to flow through.
[0] https://github.com/mozilla/telemetry-batch-view/pull/142
Assignee | ||
Comment 15•8 years ago
|
||
This should now be fixed. Let's open a separate bug if we want to pursue moving these measures into the Scalars API.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Updated•6 years ago
|
Product: Cloud Services → Cloud Services Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•