Closed
Bug 1251189
Opened 8 years ago
Closed 8 years ago
Build Spark Job to export CSV summary data for the fennec-dashboard
Categories
(Cloud Services Graveyard :: Metrics: Pipeline, defect, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: gfritzsche, Assigned: Dexter)
References
(Blocks 2 open bugs)
Details
(Whiteboard: [measurement:client])
To power the fennec-dashboard, we need to build CSV data exports from the "core" ping, following this format:
https://metrics.services.mozilla.com/fennec-dashboard/data/fennec_weekly_data.csv
https://metrics.services.mozilla.com/fennec-dashboard/data/fennec_monthly_data.csv

This currently contains these columns:
os_version,geo,channel,date,actives,abnormals,new_records,d1,d7,d30,hours,google,yahoo,bing,other

The abnormals column will be cut, and search counts will also not be available (at least initially), so depending on the plans we can either drop those columns or fill them with 0s.
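As a rough illustration of the "fill them with 0s" option, the sketch below serializes rows in the existing column order and zero-fills the columns the "core" ping cannot supply. The names COLUMNS, UNAVAILABLE and to_csv are hypothetical helpers, not part of the actual job, and dropping the columns instead remains an equally valid choice.

```python
import csv
import io

# Column order copied from the existing fennec-dashboard CSV files.
COLUMNS = [
    "os_version", "geo", "channel", "date", "actives", "abnormals",
    "new_records", "d1", "d7", "d30", "hours",
    "google", "yahoo", "bing", "other",
]

# Columns not available from the "core" ping (at least initially).
# Assumption: we keep them in the file and fill them with 0.
UNAVAILABLE = ("abnormals", "google", "yahoo", "bing", "other")


def to_csv(rows):
    """Serialize row dicts to CSV text, zero-filling unavailable columns."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        filled = dict.fromkeys(UNAVAILABLE, 0)
        filled.update(row)  # values present in the row take precedence
        writer.writerow(filled)
    return out.getvalue()
```

Keeping the header identical to the current files would let the dashboard read the new exports without changes to its column handling, at the cost of carrying columns that are always 0.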
Reporter
Comment 1•8 years ago
The exports can go into:
s3://net-mozaws-prod-metrics-data/fennec-dashboard

To keep the convention established by the Desktop v4 dashboard update, we should name them:
fennec-v4-weekly.csv
fennec-v4-monthly.csv
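A small sketch of how the target locations could be derived from the naming convention above. It assumes the files sit directly under the fennec-dashboard prefix; export_uri is a hypothetical helper name.

```python
# Prefix and file naming taken from this comment; the layout directly
# under the prefix is an assumption.
S3_PREFIX = "s3://net-mozaws-prod-metrics-data/fennec-dashboard"


def export_uri(period):
    """Return the S3 URI for the weekly or monthly export."""
    if period not in ("weekly", "monthly"):
        raise ValueError("unknown period: {!r}".format(period))
    return "{}/fennec-v4-{}.csv".format(S3_PREFIX, period)
```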
Reporter
Updated•8 years ago
Priority: -- → P2
Whiteboard: [measurement:client]
Reporter
Updated•8 years ago
Priority: P2 → P1
Assignee
Updated•8 years ago
Assignee: nobody → alessio.placitelli
Assignee
Comment 2•8 years ago
Hamilton, what do you think about storing the Spark script used to generate the CSV data in the dashboard repository?

[1] - https://mail.mozilla.org/pipermail/fhr-dev/2016-March/000884.html
Flags: needinfo?(hulmer)
Reporter
Comment 3•8 years ago
Talking to mreid, we decided to let this live in the pipeline repository for now:
* repo: https://github.com/mozilla-services/data-pipeline/
* path: reports/fennec_dashboard

That way we can find it easily in case we make any bigger changes. In the medium to longer term, we'd want to move away from this Spark job and power the dashboard from a longitudinal, client-oriented or other more appropriate derived stream.
Flags: needinfo?(hulmer)
Reporter
Comment 4•8 years ago
We will also need to support 3 modes of operation here:
* weekly & monthly for incremental updates of the CSV files
* backfill for the whole time period we are looking at

Ideally we'd want to power that from the same notebook just by looking at the submission arguments or the job name.

Roberto, do you have an idea on how we can do that properly?
Can we see the "Spark submission args" there?
Or maybe get the job name and look for a "-weekly"/"-monthly" suffix?
Flags: needinfo?(rvitillo)
Comment 5•8 years ago
(In reply to Georg Fritzsche [:gfritzsche] from comment #4)
> Roberto, do you have an idea on how we can do that properly?
> Can we see the "Spark submission args" there?
> Or maybe get the job name and look for a "-weekly"/"-monthly" suffix?

The job name suffix will work, but it's a hack. I filed bug 1258685.
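The suffix approach could look roughly like the sketch below. The function name mode_from_job_name and the choice to treat any job without a recognized suffix as a backfill are assumptions made for illustration, not what the merged job necessarily does.

```python
def mode_from_job_name(job_name):
    """Pick the mode of operation from a "-weekly"/"-monthly" suffix."""
    for mode in ("weekly", "monthly"):
        if job_name.endswith("-" + mode):
            return mode
    # Assumption: a job name without a known suffix means a full backfill.
    return "backfill"
```

For example, a job scheduled as "fennec-dashboard-weekly" (a made-up name) would select the weekly mode, while "fennec-dashboard" would fall through to the backfill.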
Flags: needinfo?(rvitillo)
Assignee
Comment 6•8 years ago
This is being reviewed on Github: https://github.com/mozilla-services/data-pipeline/pull/195
Assignee
Comment 7•8 years ago
Roberto, any suggestion about how to fetch the job name from a Spark notebook?
Flags: needinfo?(rvitillo)
Comment 8•8 years ago
You could try to read the filename of the notebook (e.g. YOURJOB.ipynb) from the current working directory.
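A minimal sketch of that suggestion, assuming the scheduled job runs with exactly one .ipynb file in its working directory. The helper name notebook_job_name is hypothetical, and the single-notebook assumption should be verified against how the job is actually launched.

```python
import glob
import os


def notebook_job_name(directory="."):
    """Return the notebook filename (without ".ipynb") found in directory."""
    notebooks = sorted(glob.glob(os.path.join(directory, "*.ipynb")))
    # Assumption: exactly one notebook is present; otherwise the job
    # name is ambiguous and we refuse to guess.
    if len(notebooks) != 1:
        raise RuntimeError(
            "expected exactly one notebook, found {}".format(len(notebooks))
        )
    return os.path.splitext(os.path.basename(notebooks[0]))[0]
```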
Flags: needinfo?(rvitillo)
Assignee
Comment 9•8 years ago
I checked whether the active users computed by the script in comment 6 for the week starting on the 6th of March ("beta" population) roughly match the ones from this query: https://sql.telemetry.mozilla.org/queries/85/source#table . They do, so the Spark job should be producing sane data.
Assignee
Updated•8 years ago
Status: NEW → ASSIGNED
Reporter
Comment 10•8 years ago
This was merged: https://github.com/mozilla-services/data-pipeline/commit/ddd255e8b2c5440ad94819fcea88678f894bcce3

We can't power the fennec-dashboard yet due to bug 1257589. We will look into scheduling this for Fennec 46 in bug 1260715.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Updated•6 years ago
Product: Cloud Services → Cloud Services Graveyard