Bug 1286226
Opened 8 years ago
Closed 8 years ago
Backfill: Update derived datasets and reports
Categories
(Cloud Services Graveyard :: Metrics: Pipeline, defect, P2)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: mreid, Unassigned)
Details
Once the raw data has been updated per bug 1286220, we need to recompute any derived datasets and reports.
This includes:
Derived Datasets:
- main_summary v2 and v3
- longitudinal
- client counts
- Telemetry aggregates
- redshift data for Firefox Desktop Report
- Fennec dashboard
- churn dataset
- others
Reports:
- Firefox desktop report (v4-weekly)
- Retention csv
- other a.t.m.o reports
- sql.t.m.o scheduled queries
- others
Updated • 8 years ago (Reporter)
Updated • 8 years ago
Points: --- → 3
Priority: -- → P2
Comment 1 • 8 years ago
Setting needinfo on people who can help with this. Most of this should now be in Airflow, and we should be able to launch 600 c3.4xlarge instances, so parallelizing across multiple datasets should not be a problem.
Flags: needinfo?(rvitillo)
Flags: needinfo?(mreid)
Flags: needinfo?(mdoglio)
Comment 2 • 8 years ago
:whd, do you have a time range for this?
Comment 3 • 8 years ago
Telemetry aggregates (t.m.o) don't need to be backfilled, since it's OK to have a few days with less data given the use cases.
The longitudinal dataset has been backfilled, and the client_count dataset will reflect reality once main_summary is backfilled.
Re:dash dashboards based on the longitudinal and client_count datasets are regenerated automatically every week, so no action is required there either.
Flags: needinfo?(rvitillo)
Comment 4 • 8 years ago (Reporter)
(In reply to Mauro Doglio [:mdoglio] from comment #2)
> :whd do you have a time range for this?
The affected dates are July 4 through July 9.
Flags: needinfo?(mreid)
Comment 5 • 8 years ago (Reporter)
The main_summary v2 backfill is currently running and should complete in about 3 hours. main_summary v3 will be handled in bug 1275889.
Comment 6 • 8 years ago (Reporter)
main_summary v2 has been backfilled for July 4, 6, 7, 8 and 9. July 5th appears to have some data errors (bad JSON values in the histograms / keyedHistograms fields).
Comment 8 • 8 years ago (Reporter)
The churn dataset has been backfilled.
Comment 9 • 8 years ago (Reporter)
(In reply to Mark Reid [:mreid] from comment #6)
> main_summary v2 has been backfilled for July 4, 6, 7, 8 and 9. July 5th
> appears to have some data errors (bad JSON values in the histograms /
> keyedHistograms fields).
The data for July 5th was also added a few days ago.
Comment 10 • 8 years ago (Reporter)
The retention CSV has also been updated.
Comment 11 • 8 years ago (Reporter)
(In reply to Roberto Agostino Vitillo (:rvitillo) from comment #3)
> The longitudinal dataset has been back-filled and the client_count one will
> reflect reality once main_summary is back-filled.
main_summary has been backfilled; does something need to be triggered to update client_count?
Flags: needinfo?(rvitillo)
Comment 12 • 8 years ago
(In reply to Mark Reid [:mreid] from comment #11)
> (In reply to Roberto Agostino Vitillo (:rvitillo) from comment #3)
> > The longitudinal dataset has been back-filled and the client_count one will
> > reflect reality once main_summary is back-filled.
> main_summary has been backfilled; does something need to be triggered to
> update client_count?
No.
Flags: needinfo?(rvitillo)
Comment 13 • 8 years ago (Reporter)
This is finished, as far as I know.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Updated • 6 years ago
Product: Cloud Services → Cloud Services Graveyard