Build Spark Job to export CSV summary data for the fennec-dashboard

RESOLVED FIXED

Status

Priority: P1
Severity: normal
Status: RESOLVED FIXED
Opened: 3 years ago
Updated: 2 years ago

People

(Reporter: gfritzsche, Assigned: Dexter)

Tracking

(Blocks: 2 bugs)

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [measurement:client])

(Reporter)

Description

3 years ago
To power the fennec-dashboard, we need to build CSV data exports from the "core" ping, following this format:
https://metrics.services.mozilla.com/fennec-dashboard/data/fennec_weekly_data.csv
https://metrics.services.mozilla.com/fennec-dashboard/data/fennec_monthly_data.csv

This currently contains these columns:
os_version,geo,channel,date,actives,abnormals,new_records,d1,d7,d30,hours,google,yahoo,bing,other

abnormals will be cut. Search counts will also not be available (at least initially), so depending on the plans we can either drop those columns or fill them with 0s.
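
A minimal sketch of the column handling described above, using only the Python standard library. The column list comes from the description; the function names (`output_columns`, `write_summary`) and the choice to zero-fill via `restval` are assumptions for illustration, not what the Spark job actually does.

```python
import csv
import io

# Column order taken from the existing dashboard CSV files.
LEGACY_COLUMNS = ["os_version", "geo", "channel", "date", "actives", "abnormals",
                  "new_records", "d1", "d7", "d30", "hours",
                  "google", "yahoo", "bing", "other"]
SEARCH_COLUMNS = ["google", "yahoo", "bing", "other"]


def output_columns(drop_abnormals=True, drop_search=False):
    """Return the export header, leaving out columns we cannot fill."""
    dropped = set()
    if drop_abnormals:
        dropped.add("abnormals")
    if drop_search:
        dropped.update(SEARCH_COLUMNS)
    return [c for c in LEGACY_COLUMNS if c not in dropped]


def write_summary(rows, columns):
    """Serialize aggregate rows (dicts) to CSV text.

    Columns missing from a row (e.g. search counts) are written as 0.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, restval=0,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

With the defaults, `abnormals` is dropped and the four search columns are kept but zero-filled; passing `drop_search=True` removes them from the header instead.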
(Reporter)

Comment 1

3 years ago
The exports can go into: s3://net-mozaws-prod-metrics-data/fennec-dashboard

To keep the convention established by the Desktop v4 dashboard update, we should name them:
fennec-v4-weekly.csv
fennec-v4-monthly.csv
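
For illustration, the destination could be derived from the export mode roughly like this. The bucket, prefix and file names are the ones given in this comment; the helper names `export_location` and `export_url` are hypothetical, and the actual upload step is left out.

```python
BUCKET = "net-mozaws-prod-metrics-data"
PREFIX = "fennec-dashboard"


def export_location(mode):
    """Return (bucket, key) for the weekly or monthly export."""
    if mode not in ("weekly", "monthly"):
        raise ValueError("no export file is defined for mode {!r}".format(mode))
    filename = "fennec-v4-{}.csv".format(mode)
    return BUCKET, "{}/{}".format(PREFIX, filename)


def export_url(mode):
    """Return the s3:// URL for the export, e.g. for logging."""
    bucket, key = export_location(mode)
    return "s3://{}/{}".format(bucket, key)
```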
(Reporter)

Updated

3 years ago
Priority: -- → P2
Whiteboard: [measurement:client]
(Reporter)

Updated

3 years ago
Depends on: 1253392
(Reporter)

Updated

3 years ago
Priority: P2 → P1
(Assignee)

Updated

2 years ago
Assignee: nobody → alessio.placitelli
(Reporter)

Updated

2 years ago
Blocks: 1251192
(Assignee)

Comment 2

2 years ago
Hamilton, what do you think about storing the Spark script used to generate the CSV data in the dashboard repository?

[1] - https://mail.mozilla.org/pipermail/fhr-dev/2016-March/000884.html
Flags: needinfo?(hulmer)
(Reporter)

Comment 3

2 years ago
Talking to mreid, we decided to let this live in the pipeline repository for now:
* repo: https://github.com/mozilla-services/data-pipeline/
* path: reports/fennec_dashboard

That way we can find it easily in case we make any bigger changes.
In the medium to longer term we'd want to move away from this Spark job and power this from a longitudinal, client-oriented or other more appropriate derived stream.
Flags: needinfo?(hulmer)
(Reporter)

Comment 4

2 years ago
We will also need to support 3 modes of operation here:
* weekly & monthly for incremental updates of the csv files
* backfill for the whole time period we are looking at

Ideally we'd want to power that from the same notebook just by looking at the submission arguments or the job name.
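
As a rough sketch of how one notebook could serve all three modes, the date range to process could be computed from the mode alone. Everything here is an assumption for illustration: the function name `reporting_window`, weeks running Sunday to Saturday, and each incremental run covering the most recent complete week or month.

```python
from datetime import date, timedelta


def reporting_window(mode, today, backfill_start=None):
    """Return the inclusive (start, end) dates a run in `mode` should cover."""
    if mode == "weekly":
        # Assumption: weeks run Sunday..Saturday; cover the last complete one.
        days_since_sunday = (today.weekday() + 1) % 7
        this_week_start = today - timedelta(days=days_since_sunday)
        return (this_week_start - timedelta(days=7),
                this_week_start - timedelta(days=1))
    if mode == "monthly":
        # Last day of the previous month, then back to its first day.
        end = today.replace(day=1) - timedelta(days=1)
        return end.replace(day=1), end
    if mode == "backfill":
        if backfill_start is None:
            raise ValueError("backfill needs an explicit start date")
        return backfill_start, today - timedelta(days=1)
    raise ValueError("unknown mode: {!r}".format(mode))
```

For example, a weekly run on 2016-03-15 would cover 2016-03-06 through 2016-03-12 under these assumptions.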

Roberto, do you have an idea on how we can do that properly?
Can we see the "Spark submission args" there?
Or maybe get the job name and look for a "-weekly"/"-monthly" suffix?
Flags: needinfo?(rvitillo)
Comment 5

2 years ago
(In reply to Georg Fritzsche [:gfritzsche] from comment #4)

> Roberto, do you have an idea on how we can do that properly?
> Can we see the "Spark submission args" there?
> Or maybe get the job name and look for a "-weekly"/"-monthly" suffix?

The job name suffix will work, but it's a hack. I filed bug 1258685.
Flags: needinfo?(rvitillo)
(Assignee)

Comment 6

2 years ago
This is being reviewed on GitHub: https://github.com/mozilla-services/data-pipeline/pull/195
(Assignee)

Comment 7

2 years ago
Roberto, any suggestions on how to fetch the job name from a Spark notebook?
Flags: needinfo?(rvitillo)
Comment 8

2 years ago
You could try to read the filename of the notebook (e.g. YOURJOB.ipynb) from the current working directory.
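
A minimal sketch of that approach, combined with the "-weekly"/"-monthly" suffix idea from comment 4. It assumes exactly one `.ipynb` file sits in the working directory; the "-backfill" suffix, the default mode and the helper names are hypothetical.

```python
import glob
import os

KNOWN_MODES = ("weekly", "monthly", "backfill")


def mode_from_name(job_name, default="backfill"):
    """Map a job or notebook name such as 'foo-weekly.ipynb' to a mode."""
    base = os.path.basename(job_name)
    if base.endswith(".ipynb"):
        base = base[:-len(".ipynb")]
    for mode in KNOWN_MODES:
        if base.endswith("-" + mode):
            return mode
    # Assumption: a name without a recognized suffix falls back to `default`.
    return default


def detect_mode(directory="."):
    """Infer the mode from the single notebook found in `directory`."""
    notebooks = sorted(glob.glob(os.path.join(directory, "*.ipynb")))
    if len(notebooks) != 1:
        raise RuntimeError("expected exactly one notebook, found {}".format(len(notebooks)))
    return mode_from_name(notebooks[0])
```

Raising when several notebooks are present avoids silently picking the wrong job name.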
Flags: needinfo?(rvitillo)
(Assignee)

Comment 9

2 years ago
I checked whether the active user counts computed by the script in comment 6 for the week starting on the 6th of March ("beta" population) roughly match the ones from this query: https://sql.telemetry.mozilla.org/queries/85/source#table . They do, so the Spark job should be producing sane data.
(Assignee)

Updated

2 years ago
Status: NEW → ASSIGNED
(Assignee)

Updated

2 years ago
Blocks: 1259505
(Reporter)

Updated

2 years ago
Blocks: 1260715
(Reporter)

Comment 10

2 years ago
This was merged:
https://github.com/mozilla-services/data-pipeline/commit/ddd255e8b2c5440ad94819fcea88678f894bcce3

We can't power the fennec-dashboard yet due to bug 1257589; we will look into scheduling this for Fennec 46 in bug 1260715.
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED
(Reporter)

Updated

2 years ago
No longer blocks: 1259505