Closed Bug 1353105 Opened 8 years ago Closed 7 years ago

Automatically Add All Scalars to main_summary

Categories

(Data Platform and Tools :: General, enhancement, P1)

Points: 3

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: frank, Assigned: frank)

The heavy user analysis needs all engagement scalars. We might as well add *all* scalars automatically, to reduce data engineering workload down the line.
Scalars are per-process and per-namespace. The current scalar columns in main_summary use just the scalar name, e.g. "max_concurrent_tab_count". I propose naming them as in longitudinal: <scalar_prefix>_<process>_<namespace>_<name>, e.g. "scalars_parent_browser_engagement_max_concurrent_tab_count". The downside is that the previous names won't be available, so we'd probably have to version main_summary so that people can still access them in the historical data (I'm assuming backfill is out of the question). Thoughts, Mark?
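For illustration, the proposed naming scheme could be sketched roughly as follows. This is not the actual main_summary code: the function name `scalar_column_name` is hypothetical, and it assumes scalars arrive as dotted names such as "browser.engagement.max_concurrent_tab_count", with everything before the last dot treated as the namespace.

```python
import re


def scalar_column_name(process, scalar, prefix="scalars"):
    """Build a column name of the form <prefix>_<process>_<namespace>_<name>.

    Hypothetical helper: assumes `scalar` is a dotted name whose last
    component is the scalar name and whose leading part is the namespace.
    """
    namespace, _, name = scalar.rpartition(".")
    parts = [prefix, process, namespace, name]
    # Lowercase each part and replace anything that is not a letter,
    # digit, or underscore (e.g. the dots in the namespace) with "_".
    cleaned = [re.sub(r"[^a-z0-9_]", "_", p.lower()) for p in parts if p]
    return "_".join(cleaned)


print(scalar_column_name("parent", "browser.engagement.max_concurrent_tab_count"))
```

Including the process in the name keeps the same scalar recorded in different processes from colliding in a single column.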
Flags: needinfo?(mreid)
I think this is a good idea. We'll likely want to keep adding all the new scalars as they arrive, so making the process as easy as possible makes sense. When we version to v4, we could rewrite the v3 main_summary data into v4 using the new column names (rather than backfilling from the raw data, which is quite time-consuming). Users of main_summary would then not have to query both tables.
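A minimal sketch of what such a rewrite might look like, assuming it can be expressed as a column-renaming query over the existing table. The function name `v3_to_v4_select`, the table name `main_summary_v3`, and the example column list are all hypothetical placeholders, not the real schema.

```python
def v3_to_v4_select(columns, renames, source="main_summary_v3"):
    """Return a SELECT statement that copies `columns` from `source`,
    aliasing any column found in `renames` to its new name.

    Hypothetical helper: the table and column names are placeholders.
    """
    select_list = [
        f"{col} AS {renames[col]}" if col in renames else col
        for col in columns
    ]
    return "SELECT " + ", ".join(select_list) + " FROM " + source


# Example: one unchanged column and one scalar column that gets the new name.
query = v3_to_v4_select(
    ["client_id", "max_concurrent_tab_count"],
    {"max_concurrent_tab_count":
     "scalars_parent_browser_engagement_max_concurrent_tab_count"},
)
print(query)
```

Because this only reads the already-derived v3 rows, it avoids reprocessing the raw data.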
Flags: needinfo?(mreid)
Component: Metrics: Pipeline → Datasets: Main Summary
Product: Cloud Services → Data Platform and Tools
Blocks: 1359041
Will be backfilled with bug 1362161
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Component: Datasets: Main Summary → General