1274008 - In longitudinal, there are probably-incorrect results in settings.user_prefs

Reporter

Description

•

8 years ago

https://sql.telemetry.mozilla.org/queries/386/source This is a probably-incorrect result. In particular it's very unlikely that many people have customized browser.startup.homepage but absolutely nobody has customized browser.startup.page. Similarly, dougt was trying to look for the frequency of privacy.trackingprotection.enabled and that showed zero users either, which is not reasonable. I've verified that my pings in about:telemetry include browser.startup.page customizations in the settings.userPrefs block.

Ryan VanderMeulen [:RyanVM]

Updated

•

8 years ago

Flags: needinfo?(rvitillo)

Priority: -- → P2

Roberto Agostino Vitillo (:rvitillo)

Updated

•

8 years ago

Flags: needinfo?(rvitillo)

Roberto Agostino Vitillo (:rvitillo)

Comment 1

•

8 years ago

Anthony, could you please look into this?

Flags: needinfo?(azhang9)

Anthony Zhang

Comment 2

•

8 years ago

In previous versions, avro-parquet didn't support unions in map values, so the derived streams only kept track of settings that have string values. `browser.startup.page` is generally an integer, and therefore doesn't show up. Since we upgraded it recently, here's a PR that adds support for non-string pref values: https://github.com/mozilla/telemetry-batch-view/pull/72

Flags: needinfo?(azhang9)

Ryan VanderMeulen [:RyanVM]

Comment 3

•

8 years ago

Ryan, can you please take this? It's coming up as a recurring footgun for developers.

Flags: needinfo?(rharter)

Roberto Agostino Vitillo (:rvitillo)

Updated

•

8 years ago

Blocks: 1255755

Ryan Harter [:harter]

Assignee

Updated

•

8 years ago

Assignee: nobody → rharter

Flags: needinfo?(rharter)

Ryan Harter [:harter]

Assignee

Comment 4

•

8 years ago

Sure, let me take a look.

Ryan Harter [:harter]

Assignee

Updated

•

8 years ago

Priority: P2 → P1

Ryan Harter [:harter]

Assignee

Updated

•

8 years ago

Points: --- → 3

Ryan Harter [:harter]

Assignee

Comment 5

•

8 years ago

I got a chance to review this issue and I have a path forward. However, I will have to whitelist each userPref field with a type. Identifying the types will take some analysis. I'll keep this thread updated with my progress, expect an update within a week.

Benjamin Smedberg

Reporter

Comment 6

•

8 years ago

Where will your list be kept? Because as we add/change the list of prefs that are sent, we really should avoid having people update another file somewhere else. Would it be worth the complexity to make an EnvironmentPrefs.yaml file to drive both TelemetryEnvironment collection and validation in longitudinal?

Ryan Harter [:harter]

Assignee

Comment 7

•

8 years ago

> Where will your list be kept? I'll create a struct in the LongitudianlView code[0]. The struct will enumerate a subset of all userPref fields, roughly similar to this default_search_engine_data struct[1]. > Because as we add/change the list of prefs that > are sent, we really should avoid having people update another file somewhere > else. If we want to dedicate some resources to a more elegant solution, we should go through the parquet-avro upgrade blocking this PR [2]. This would allow us to import all fields automagically, without the need for a whitelist. This will probably take some time, but it would save us from some boilerplate code. > Would it be worth the complexity to make an EnvironmentPrefs.yaml file to > drive both TelemetryEnvironment collection and validation in longitudinal? Where would that yaml file be kept? Adding a dependency on an external yaml file makes me uneasy. For example, a developer making changes to that EnvironmentPrefs.yaml file now needs to be aware that they can break the telemetry-batch-view build. [0] https://github.com/mozilla/telemetry-batch-view/blob/master/src/main/scala/com/mozilla/telemetry/views/Longitudinal.scala#L259 [1] https://github.com/mozilla/telemetry-batch-view/blob/master/src/main/scala/com/mozilla/telemetry/views/Longitudinal.scala#L243 [2] https://github.com/mozilla/telemetry-batch-view/pull/72

Flags: needinfo?(benjamin)

Benjamin Smedberg

Reporter

Comment 8

•

8 years ago

Presumably in mozilla-central next to Histograms.json and Scalars.yaml. Those are also imported and used by telemetry-batch-view, right?

Flags: needinfo?(benjamin)

Ryan Harter [:harter]

Assignee

Comment 9

•

8 years ago

Yes, you are correct. Let me take a look and see how that would work.

Georg Fritzsche [:gfritzsche]

Comment 10

•

8 years ago

On the client-side, longer-term, i would like to record the user prefs (and most of the environment) in Scalars using the Scalars API. A EnvironmentPrefs.yaml or so would be a good step in that direction, i'd be happy to help with that.

Roberto Agostino Vitillo (:rvitillo)

Comment 11

•

8 years ago

Alternatively we could treat all values as strings, and use casts at query time, while this data moves under Scalars and/or Spark starts using a recent parquet-avro library.

Benjamin Smedberg

Reporter

Comment 12

•

8 years ago

I don't mind storing these as scalars, but I do think that many of them should continue to trigger a subsession split, hence the reason they are in the environment block in the first place.

Georg Fritzsche [:gfritzsche]

Comment 13

•

8 years ago

The plan is to not move anything to a different part of the ping. This is just about having consistent semantics for the data. The first step here would be to use parse_scalars.py [0] (with possibly more restrictive settings) from another script to parse a EnvironmentPrefs.yaml and generate a JS or JSON file from that. 0: https://dxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/parse_scalars.py

Ryan Harter [:harter]

Assignee

Comment 14

•

8 years ago

I have an immediate fix available here[0]. Currently, the longitudinal table expects all user pref fields to be strings and discards all non-string values. This change converts numerics and bools to strings which should allow the missing fields to flow through. [0] https://github.com/mozilla/telemetry-batch-view/pull/142

Ryan Harter [:harter]

Assignee

Comment 15

•

8 years ago

This should now be fixed. Let's open a separate bug if we want to pursue moving these measures into the Scalars API.

Status: NEW → RESOLVED

Closed: 8 years ago

Resolution: --- → FIXED

Georg Fritzsche [:gfritzsche]

Updated

•

8 years ago

Updated

•

6 years ago

Product: Cloud Services → Cloud Services Graveyard

Bugzilla

Quick Search

In longitudinal, there are probably-incorrect results in settings.user_prefs

Categories

(Cloud Services Graveyard :: Metrics: Pipeline, defect, P1)

Tracking

(Not tracked)

People

(Reporter: benjamin, Assigned: harter)

References

Details

Crash Data

Security

(public)

User Story

Description

Updated

Updated

Comment 1

Comment 2

Comment 3

Updated

Updated

Comment 4

Updated

Updated

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Comment 11

Comment 12

Comment 13

Comment 14

Comment 15

Updated

Updated