Open Bug 1336977 Opened 7 years ago Updated 2 years ago

Make environment fields scalars

Categories

(Toolkit :: Telemetry, task, P4)

People

(Reporter: rvitillo, Unassigned)

Details

(Whiteboard: [measurement:client])

User Story

Currently several ETL jobs have to be changed whenever a new field is added to the environment section. Even though there is a (sadly incomplete) JSON schema describing the environment section which could be used to automate this process, the fact that fields are nested in a non-uniform way means that one can't easily write generic code, e.g. for alerting or aggregation, that works for any kind of scalar measurement. Similar considerations apply to simpleMeasurements.

Since the environment section ultimately contains mostly scalar values, it would be convenient to store those attributes within the scalar section of the ping. That would make it trivial to create generic tooling capable of adapting to schema changes, just like we do with histograms.
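
To illustrate why a flat layout helps generic tooling, here is a minimal Python sketch that walks a nested environment-like object and produces one flat mapping of dotted names to scalar values. The field names and the `flatten_scalars` helper are hypothetical and only loosely modelled on the environment section; this is not the actual ping format or any existing ETL code.

```python
from typing import Any, Dict


def flatten_scalars(node: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Walk a nested dict and return {dotted.path: scalar value}."""
    flat: Dict[str, Any] = {}
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested sections, extending the dotted path.
            flat.update(flatten_scalars(value, path))
        elif isinstance(value, (bool, int, float, str)):
            flat[path] = value
        # Lists and other non-scalar values (e.g. add-on lists) are skipped here.
    return flat


# Hypothetical, heavily trimmed environment section; field names are illustrative.
environment = {
    "build": {"version": "52.0", "architecture": "x86-64"},
    "system": {"memoryMB": 8192, "cpu": {"cores": 4}},
    "settings": {"e10sEnabled": True},
}

flat = flatten_scalars(environment)
```

Once every measurement is a `name -> value` pair, a job can iterate over `flat.items()` without knowing the nesting of each field in advance, so a newly added field needs no job change.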
User Story: (updated)
There's some prior art in bug 1278920.
(Commenting on User Story)
> Currently several ETL jobs have to be changed when a new field to the
> environment section is added. Even though there is a (sadly incomplete) JSON
> schema describing the environment section which could be used to automate
> this process, the fact that fields are nested in a non uniform way means
> that one can't easily write generic code, e.g. for alerting or aggregation,
> that works for any kind of scalar measurement. Similar considerations apply
> to simpleMeasurements.

Another aspect here is discoverability. Once we document the data in a structured format we could integrate it into tooling like the "data explorer".

> Since ultimately the environment section contains mostly scalar values, it
> would be convenient to store  those attributes within the scalar section of
> the ping. That would make it trivial to create generic tooling capable of
> adapting to schema changes, just like we do with histograms.

This is a medium- to long-term goal of ours, mostly blocked on finding the time to prioritize it.
I want to do this at some point for all the "scalar" data in the main ping, as part of the "main ping cleanup".

To keep things simpler, I'll make this bug about the "environment" data specifically.

Questions that need to be answered first:
(1) How do we deal with environment parts that are not scalars (add-ons, etc.)?
(2) How do we deal with the existing environment data format and its consumers?

Let's assume, for example, that we track environment data in a separate file, Environment.yaml.
Then we could solve (1) by allowing special "object" values or similar.
Or we could see if we can flatten all of them into keyed scalars.
User prefs are probably best tracked in a separate file (see bug 1330856).

For (2), would we try to keep the existing format, building environment scalars into a nested JSON object? (This requires some awkward tree walking in the jobs.)
Or would we serialize environment data into the flat scalar format we use for payload/processes/*/scalars? (This would require a lot of job updates.)
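
As a rough illustration of the first option in (2), the Python sketch below rebuilds a nested object from flat, dotted scalar names. The names and the `nest_scalars` helper are hypothetical, not existing scalar definitions, and the sketch assumes no name is both a leaf and a prefix of another name.

```python
from typing import Any, Dict


def nest_scalars(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Turn {"a.b.c": value} into {"a": {"b": {"c": value}}}."""
    nested: Dict[str, Any] = {}
    for name, value in flat.items():
        *parents, leaf = name.split(".")
        node = nested
        for part in parents:
            # Create intermediate objects on demand while walking down.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


# Hypothetical flat scalar names; not real probe definitions.
flat_scalars = {
    "environment.system.memory_mb": 8192,
    "environment.system.cpu.cores": 4,
    "environment.settings.e10s_enabled": True,
}

nested = nest_scalars(flat_scalars)
```

With the second option this step disappears, since jobs would read the flat names directly, at the cost of updating every consumer of the current nested format.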
Summary: Move environment fields and simpleMeasurements to the scalar section → Make environment fields scalars
Priority: -- → P4
Whiteboard: [measurement:client]
Type: defect → task
Severity: normal → S3