Backfill: Update derived datasets and reports

RESOLVED FIXED

Status

P2
normal
2 years ago
2 years ago

People

(Reporter: mreid, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

2 years ago
Once the raw data has been updated per bug 1286220, we need to recompute any derived datasets and reports.

This includes:

Derived Datasets:
- main_summary v2 and v3
- longitudinal
- client counts
- Telemetry aggregates
- redshift data for Firefox Desktop Report
- Fennec dashboard
- churn dataset
- others

Reports:
- Firefox desktop report (v4-weekly)
- Retention csv
- other a.t.m.o reports
- sql.t.m.o scheduled queries
- others
(Reporter)

Updated

2 years ago
Blocks: 1285621
Depends on: 1286220
Depends on: 1286227

Updated

2 years ago
Points: --- → 3
Priority: -- → P2

Comment 1

2 years ago
Needinfo'ing people who can help with this. Most of this should be in Airflow now, and we should be able to launch 600 c3.4xlarge instances, so parallelizing across multiple datasets should not be a problem.
Flags: needinfo?(rvitillo)
Flags: needinfo?(mreid)
Flags: needinfo?(mdoglio)
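
For illustration only, here is a minimal sketch of splitting a fixed instance budget (such as the 600 instances mentioned above) evenly across several backfill jobs. The function name `split_instances` is hypothetical, and this is not how Airflow or the actual backfill allocated clusters; it only shows the arithmetic.

```python
def split_instances(total, jobs):
    """Divide `total` instances as evenly as possible across `jobs`.

    Hypothetical helper: earlier jobs receive one extra instance
    when the budget does not divide evenly.
    """
    if not jobs:
        raise ValueError("at least one job is required")
    base, extra = divmod(total, len(jobs))
    return {
        job: base + (1 if index < extra else 0)
        for index, job in enumerate(jobs)
    }


# Dataset names taken from the bug description; the even split is an assumption.
allocation = split_instances(600, ["main_summary", "longitudinal", "churn"])
print(allocation)
```

With three jobs and 600 instances, each job gets 200. In practice the split would likely be weighted by dataset size rather than even.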

Comment 2

:whd do you have a time range for this?

Comment 3

Telemetry aggregates (t.m.o.) don't need to be back-filled, as it's OK to have a few days with less data considering the use cases.

The longitudinal dataset has been back-filled and the client_count one will reflect reality once main_summary is back-filled.

Re:dash dashboards based on the longitudinal and client_count datasets are regenerated automatically every week, so no action is required there either.
Flags: needinfo?(rvitillo)
(Reporter)

Comment 4

2 years ago
(In reply to Mauro Doglio [:mdoglio] from comment #2)
> :whd do you have a time range for this?

Affected dates include July 4 to July 9.
Flags: needinfo?(mreid)
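
A minimal sketch of enumerating the affected dates for a per-day backfill, assuming the range above is inclusive. The year is not stated in this bug, so it is passed in as a parameter; the function name `affected_dates` is hypothetical.

```python
from datetime import date, timedelta


def affected_dates(year, start=(7, 4), end=(7, 9)):
    """Return every date from July 4 to July 9 (inclusive) of `year`.

    `start` and `end` are (month, day) tuples; the defaults mirror
    the range given in this comment.
    """
    first = date(year, *start)
    last = date(year, *end)
    span = (last - first).days
    return [first + timedelta(days=offset) for offset in range(span + 1)]
```

Each returned date can then be handed to whatever job recomputes a single day of a derived dataset.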
(Reporter)

Updated

2 years ago
Depends on: 1275889
(Reporter)

Comment 5

2 years ago
The main_summary v2 backfill is currently running and should complete in about 3 hours. main_summary v3 will be handled in bug 1275889.
(Reporter)

Comment 6

2 years ago
main_summary v2 has been backfilled for July 4, 6, 7, 8 and 9. July 5th appears to have some data errors (bad JSON values in the histograms / keyedHistograms fields).
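
As a rough sketch of how such errors could be detected, the check below flags records whose histograms or keyedHistograms values fail to parse as JSON. It assumes each record is a dict holding those fields as JSON strings, which is an assumption about the data layout rather than the actual main_summary schema; `bad_json_fields` is a hypothetical helper.

```python
import json

# Field names from the comment above; storing them as JSON strings is assumed.
JSON_FIELDS = ("histograms", "keyedHistograms")


def bad_json_fields(record):
    """Return the names of fields in `record` that do not parse as JSON."""
    bad = []
    for name in JSON_FIELDS:
        value = record.get(name)
        if value is None:
            continue  # a missing field is not treated as a parse error here
        try:
            json.loads(value)
        except (TypeError, ValueError):
            # json.JSONDecodeError is a subclass of ValueError
            bad.append(name)
    return bad
```

Running this over a day's records and counting the non-empty results would give a quick estimate of how much of that day is affected.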
(Reporter)

Updated

2 years ago
Depends on: 1287585

Comment 7

The crash_aggregates dataset backfill is done.
Flags: needinfo?(mdoglio)
(Reporter)

Comment 8

2 years ago
The churn dataset has been backfilled.
(Reporter)

Comment 9

2 years ago
(In reply to Mark Reid [:mreid] from comment #6)
> main_summary v2 has been backfilled for July 4, 6, 7, 8 and 9. July 5th
> appears to have some data errors (bad JSON values in the histograms /
> keyedHistograms fields).

The data for July 5th was also added a few days ago.
(Reporter)

Comment 10

2 years ago
The retention CSV has also been updated.
(Reporter)

Comment 11

2 years ago
(In reply to Roberto Agostino Vitillo (:rvitillo) from comment #3)
> The longitudinal dataset has been back-filled and the client_count one will
> reflect reality once main_summary is back-filled. 
main_summary has been backfilled, does something need to be triggered to update client_count?
Flags: needinfo?(rvitillo)
(Reporter)

Updated

2 years ago
Depends on: 1290458

Comment 12

(In reply to Mark Reid [:mreid] from comment #11)
> (In reply to Roberto Agostino Vitillo (:rvitillo) from comment #3)
> > The longitudinal dataset has been back-filled and the client_count one will
> > reflect reality once main_summary is back-filled. 
> main_summary has been backfilled, does something need to be triggered to
> update client_count?

No
Flags: needinfo?(rvitillo)
(Reporter)

Updated

2 years ago
Depends on: 1290540
(Reporter)

Updated

2 years ago
Depends on: 1299153
(Reporter)

Comment 13

2 years ago
This is finished, as far as I know.
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED