Closed Bug 1329844 Opened 7 years ago Closed 7 years ago

Productionize Topline report

Categories

(Data Platform and Tools :: General, defect, P1)

Points: 3

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: amiyaguchi, Assigned: amiyaguchi)

The Topline (executive) report now uses the main_summary dataset to generate its aggregates.

Productionizing it involves scheduling an Airflow job for the dataset and uploading the job's results to the production location.
Assignee: nobody → amiyaguchi
Blocks: 1329842
Points: --- → 1
Priority: -- → P1
Blocks: 1309574
No longer blocks: 1329842
Depends on: 1329842
Priority: P1 → P2
Depends on: 1357875
Depends on: 1357877
Points: 1 → 3
Component: Metrics: Pipeline → Datasets: General
Priority: P2 → P1
Product: Cloud Services → Data Platform and Tools
While it's still fresh in my mind, here are the steps we discussed:

- Deploy the topline job to Airflow to generate weekly and monthly output in Parquet form. Backfill as far back as makes sense.
- Write code to output a CSV view of "all time" based on the Parquet data, with a new naming standard ("topline-weekly.csv" and "topline-monthly.csv" seem reasonable).
- Update Diagnostic Dashboard[1] to display the new CSV data.
- Backfill Parquet data from historic CSV files[2], dropping columns that are no longer being generated (such as five-of-seven, inactives).
- Compare new and old datasets for consistency via Diagnostic Dashboard.
- Update Firefox Dashboard[3] to use new naming standard.
- Run old and new code in parallel for some period of time in case there are problems.
- Stop old code.
- Celebrate!

[1] https://github.com/mozilla/diagnostic-data-viewer
[2] Using data from s3://telemetry-private-analysis-2/executive-report-<period>/data/executive_report.<period>.yyyymmdd.csv
[3] https://github.com/mozilla/firefox-dashboard
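
A minimal sketch of the "drop old columns, then write one all-time CSV" steps above, using only the Python standard library. It is not the code used for the actual job. The header spellings `five_of_seven` and `inactives`, the `date` and `actives` columns, and both function names are assumptions for illustration; the real files may differ.

```python
import csv
import io

# Hypothetical column names: the steps above only say that "five-of-seven"
# and "inactives" are no longer generated; the real headers may be spelled
# differently.
DROPPED_COLUMNS = {"five_of_seven", "inactives"}


def read_without_dropped(csv_text, dropped=DROPPED_COLUMNS):
    """Parse one historic CSV and discard the columns that are no longer generated."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {name: value for name, value in row.items() if name not in dropped}
        for row in reader
    ]


def write_all_time(reports):
    """Concatenate per-period rows into a single "all time" CSV string."""
    rows = [row for report in reports for row in report]
    if not rows:
        return ""
    out = io.StringIO()
    # The header is taken from the first row, so every report is assumed
    # to share the same remaining columns.
    writer = csv.DictWriter(out, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Each weekly (or monthly) file would be passed through `read_without_dropped`, and the combined list handed to `write_all_time` to produce something like "topline-weekly.csv".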
Depends on: 1359193
The backfilling process and dashboard output look pretty consistent. [1] There is a little bit of noise due to floating point arithmetic, but it adds up to a total of only 12 minutes across the whole dataset.

I'm going to go ahead and copy over the historical backfill to the primary location.

[1] https://gist.github.com/acmiyaguchi/a8f18830a4d8ba3fbae0790ba4503658
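
For anyone repeating this kind of check, here is a rough sketch of a row-by-row comparison that tolerates floating point noise. It is not the code behind the gist above. The `date` key, the `hours` column used in the usage note, the tolerance values, and the function name are all assumptions.

```python
import math


def compare_rows(old_rows, new_rows, key="date", rel_tol=1e-9, abs_tol=1e-6):
    """Return (key, column, old, new) for each numeric cell that differs beyond tolerance.

    Rows are dicts of strings, as produced by csv.DictReader. Every column
    other than `key` is assumed to be numeric. A row that is missing from
    new_rows is reported as (key, None, None, None).
    """
    new_by_key = {row[key]: row for row in new_rows}
    mismatches = []
    for old in old_rows:
        new = new_by_key.get(old[key])
        if new is None:
            mismatches.append((old[key], None, None, None))
            continue
        for column, old_value in old.items():
            # Skip the join key and any column the new dataset no longer has.
            if column == key or column not in new:
                continue
            a, b = float(old_value), float(new[column])
            if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
                mismatches.append((old[key], column, a, b))
    return mismatches
```

With these tolerances, an `hours` value of 100.0000001 in the old data and 100.0 in the new data counts as equal, while 100.0 versus 101.0 is reported as a mismatch.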
Blocks: 1320702
Depends on: 1375725
Depends on: 1377730
Depends on: 1378879
Blocks: 1378977
Depends on: 1379614
Depends on: 1380050
This is done! The Firefox Dashboard[1] is now using the topline data source. The previous data files, "v4-weekly.csv" and "v4-monthly.csv", are still available for comparison purposes, but will likely disappear in the next few months.

[1] https://metrics.services.mozilla.com/firefox-dashboard/
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Component: Datasets: General → General