Closed Bug 1251189 Opened 8 years ago Closed 8 years ago

Build Spark Job to export CSV summary data for the fennec-dashboard

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: gfritzsche, Assigned: Dexter)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [measurement:client])

Georg Fritzsche [:gfritzsche]

Reporter

Description

•

8 years ago

To power the fennec-dashboard, we need to build CSV data exports from the "core" ping, following this format:
https://metrics.services.mozilla.com/fennec-dashboard/data/fennec_weekly_data.csv
https://metrics.services.mozilla.com/fennec-dashboard/data/fennec_monthly_data.csv

This currently contains these columns:
os_version,geo,channel,date,actives,abnormals,new_records,d1,d7,d30,hours,google,yahoo,bing,other

The abnormals column will be cut, and search counts will not be available either (at least initially), so depending on the plans we can drop those columns or fill them with 0s.
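As a rough illustration of the target format, here is a sketch that writes aggregated rows in the column order above using Python's standard `csv` module. It assumes the Spark job has already reduced the "core" ping data to one dict per output row. The helper name `write_dashboard_csv` is hypothetical, and dropping `abnormals` while filling the search columns with 0s is just one of the two options mentioned above, not a decision recorded in this bug.

```python
import csv
import io

# Column order of the existing fennec-dashboard CSVs, with "abnormals" removed
# because that measure will be cut.
COLUMNS = [
    "os_version", "geo", "channel", "date", "actives", "new_records",
    "d1", "d7", "d30", "hours", "google", "yahoo", "bing", "other",
]

# Search counts are not available initially; this sketch fills them with 0s.
SEARCH_COLUMNS = ["google", "yahoo", "bing", "other"]


def write_dashboard_csv(rows, out):
    """Write aggregated rows (dicts) to the file-like object `out`.

    Hypothetical helper: missing search counts default to 0, and keys that
    are not dashboard columns (e.g. a leftover "abnormals") are ignored.
    When writing to a real file, open it with newline="" as the csv docs advise.
    """
    writer = csv.DictWriter(out, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        filled = dict(row)
        for col in SEARCH_COLUMNS:
            filled.setdefault(col, 0)
        writer.writerow(filled)


if __name__ == "__main__":
    # Placeholder values, only to show the output shape.
    buf = io.StringIO()
    write_dashboard_csv([{"os_version": "5.0", "geo": "US", "channel": "beta",
                          "date": "2016-03-06", "actives": 0, "new_records": 0,
                          "d1": 0, "d7": 0, "d30": 0, "hours": 0.0}], buf)
    print(buf.getvalue())
```

If the search columns are dropped instead, removing them from `COLUMNS` is enough; `extrasaction="ignore"` keeps any leftover keys out of the file.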

Georg Fritzsche [:gfritzsche]

Reporter

Comment 1

•

8 years ago

The exports can go into: s3://net-mozaws-prod-metrics-data/fennec-dashboard

To keep the convention established by the Desktop v4 dashboard update, we should name them:
fennec-v4-weekly.csv
fennec-v4-monthly.csv
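A small sketch of how a job might compose the destination from the bucket, prefix, and naming convention above. The helper names are hypothetical. The `(bucket, key)` pair is the shape an S3 client such as boto3's `upload_file(Filename, Bucket, Key)` expects, but the upload step itself is left out here.

```python
# Destination from this bug: s3://net-mozaws-prod-metrics-data/fennec-dashboard
BUCKET = "net-mozaws-prod-metrics-data"
PREFIX = "fennec-dashboard"
# Naming follows the Desktop v4 dashboard convention: fennec-v4-<mode>.csv
DASHBOARD_VERSION = "v4"


def export_location(mode):
    """Return the (bucket, key) pair for the "weekly" or "monthly" export."""
    if mode not in ("weekly", "monthly"):
        raise ValueError(f"unknown export mode: {mode!r}")
    return BUCKET, f"{PREFIX}/fennec-{DASHBOARD_VERSION}-{mode}.csv"


def export_url(mode):
    """Return the full s3:// URL of the export, e.g. for logging."""
    bucket, key = export_location(mode)
    return f"s3://{bucket}/{key}"
```

Keeping bucket and key separate avoids re-parsing the URL when handing the location to an S3 client.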

Georg Fritzsche [:gfritzsche]

Reporter

Updated

•

8 years ago

Priority: -- → P2

Whiteboard: [measurement:client]

Georg Fritzsche [:gfritzsche]

Reporter

Updated

•

8 years ago

Depends on: 1253392

Georg Fritzsche [:gfritzsche]

Reporter

Updated

•

8 years ago

Priority: P2 → P1

Alessio Placitelli [:Dexter]

Assignee

Updated

•

8 years ago

Assignee: nobody → alessio.placitelli

Georg Fritzsche [:gfritzsche]

Reporter

Updated

•

8 years ago

Blocks: 1251192

Alessio Placitelli [:Dexter]

Assignee

Comment 2

•

8 years ago

Hamilton, what do you think about storing the Spark script used to generate the CSV data in the dashboard repository?

[1] - https://mail.mozilla.org/pipermail/fhr-dev/2016-March/000884.html

Flags: needinfo?(hulmer)

Georg Fritzsche [:gfritzsche]

Reporter

Comment 3

•

8 years ago

After talking to mreid, we decided to let this live in the pipeline repository for now:
* repo: https://github.com/mozilla-services/data-pipeline/
* path: reports/fennec_dashboard 

That way we can find it easily in case we make any bigger changes.
In the medium- to longer-term we'd want to move away from this spark job and power this from a longitudinal, client-oriented or other more appropriate derived stream.

Flags: needinfo?(hulmer)

Georg Fritzsche [:gfritzsche]

Reporter

Comment 4

•

8 years ago

We will also need to support 3 modes of operation here:
* weekly & monthly for incremental updates of the csv files
* backfill for the whole time period we are looking at

Ideally we'd want to power that from the same notebook just by looking at the submission arguments or the job name.

Roberto, do you have an idea on how we can do that properly?
Can we see the "Spark submission args" there?
Or maybe get the job name and look for a "-weekly"/"-monthly" suffix?
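One way to map the three modes onto reporting periods, as a sketch: incremental runs compute only the most recent complete week or month, while a backfill computes every complete week and month from a start date. The Sunday-to-Saturday week boundary is an assumption (the beta check in comment 9 uses a week starting on Sunday, March 6, 2016), and `report_periods` and its helpers are hypothetical names.

```python
from datetime import date, timedelta

WEEK = timedelta(days=7)


def week_start(day):
    """Sunday on or before `day` (date.weekday(): Monday=0 ... Sunday=6)."""
    return day - timedelta(days=(day.weekday() + 1) % 7)


def last_complete_week(today):
    """(start, end) of the most recent Sunday-to-Saturday week ending before `today`."""
    start = week_start(today) - WEEK
    return start, start + timedelta(days=6)


def next_month_start(day):
    """First day of the month after `day`'s month."""
    # Day 28 plus 4 days always lands in the following month.
    return (day.replace(day=28) + timedelta(days=4)).replace(day=1)


def last_complete_month(today):
    """(start, end) of the calendar month before `today`'s month."""
    end = today.replace(day=1) - timedelta(days=1)
    return end.replace(day=1), end


def report_periods(mode, today, backfill_start=None):
    """Map a run mode to the periods to compute, keyed by output file."""
    if mode == "weekly":
        return {"weekly": [last_complete_week(today)]}
    if mode == "monthly":
        return {"monthly": [last_complete_month(today)]}
    if mode == "backfill":
        if backfill_start is None:
            raise ValueError("backfill needs a start date")
        # Weeks are aligned to the Sunday on or before backfill_start.
        weeks, start = [], week_start(backfill_start)
        last_week, _ = last_complete_week(today)
        while start <= last_week:
            weeks.append((start, start + timedelta(days=6)))
            start += WEEK
        # Months are aligned to the first day of backfill_start's month.
        months, start = [], backfill_start.replace(day=1)
        last_month, _ = last_complete_month(today)
        while start <= last_month:
            nxt = next_month_start(start)
            months.append((start, nxt - timedelta(days=1)))
            start = nxt
        return {"weekly": weeks, "monthly": months}
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, `report_periods("weekly", date(2016, 3, 16))` yields the single week of March 6-12, 2016. Returning explicit date ranges keeps the same notebook code path for all three modes; only the list of periods changes.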

Flags: needinfo?(rvitillo)

Roberto Agostino Vitillo (:rvitillo)

Comment 5

•

8 years ago

(In reply to Georg Fritzsche [:gfritzsche] from comment #4)

> Roberto, do you have an idea on how we can do that properly?
> Can we see the "Spark submission args" there?
> Or maybe get the job name and look for a "-weekly"/"-monthly" suffix?

The job name suffix will work, but it's a hack. I filed bug 1258685.

Flags: needinfo?(rvitillo)

Alessio Placitelli [:Dexter]

Assignee

Comment 6

•

8 years ago

This is being reviewed on Github: https://github.com/mozilla-services/data-pipeline/pull/195

Alessio Placitelli [:Dexter]

Assignee

Comment 7

•

8 years ago

Roberto, any suggestion about how to fetch the job name from a Spark notebook?

Flags: needinfo?(rvitillo)

Roberto Agostino Vitillo (:rvitillo)

Comment 8

•

8 years ago

You could try to read the filename of the notebook (e.g. YOURJOB.ipynb) from the current working directory.
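Following that suggestion, a minimal sketch that looks for `.ipynb` files in the current working directory and extracts a `-weekly`/`-monthly` suffix from the filename. It assumes the scheduled notebook file is named after the job. The `-backfill` suffix and the function names are hypothetical, and, as noted in comment 5, relying on the job name is a hack rather than a proper way to pass arguments.

```python
import glob
import os

# "-weekly"/"-monthly" come from comment 4; "-backfill" is an assumption.
MODES = ("weekly", "monthly", "backfill")


def mode_from_name(filename):
    """Return the mode encoded as a '-<mode>' suffix in a notebook filename, or None."""
    stem, _ = os.path.splitext(os.path.basename(filename))
    for mode in MODES:
        if stem.endswith("-" + mode):
            return mode
    return None


def detect_mode(directory=None):
    """Scan *.ipynb files in `directory` (default: the current working
    directory) and return the first mode found in their names."""
    directory = directory or os.getcwd()
    for path in sorted(glob.glob(os.path.join(directory, "*.ipynb"))):
        mode = mode_from_name(path)
        if mode is not None:
            return mode
    raise RuntimeError(f"no notebook with a -weekly/-monthly/-backfill suffix in {directory}")
```

Raising instead of falling back to a default mode avoids silently overwriting the wrong CSV when a job is misnamed.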

Flags: needinfo?(rvitillo)

Alessio Placitelli [:Dexter]

Assignee

Comment 9

•

8 years ago

I checked whether the active users computed by the script in comment 6 for the week starting on the 6th of March ("beta" population) roughly match the ones from this query: https://sql.telemetry.mozilla.org/queries/85/source#table . They do, so we should be producing sane data from the Spark job.
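A check like this can be sketched as a per-week comparison between the Spark output and the reference query results. The 5% relative tolerance is a hypothetical threshold (this bug only says the numbers "roughly match"), and the input dicts mapping a week's start date to an active-user count are assumed shapes, not the actual output format of either source.

```python
# Hypothetical threshold: the bug only says the numbers should "roughly match".
REL_TOL = 0.05


def compare_weekly_actives(spark, reference, rel_tol=REL_TOL):
    """Return {week: (spark_count, reference_count)} for weeks that disagree.

    `spark` and `reference` map a sortable week key (e.g. an ISO date string
    for the week's start) to an active-user count. A week disagrees if it is
    missing from one side or if the counts differ by more than `rel_tol`
    relative to the reference count.
    """
    mismatches = {}
    for week in sorted(set(spark) | set(reference)):
        s, r = spark.get(week), reference.get(week)
        if s is None or r is None:
            mismatches[week] = (s, r)
        elif r == 0:
            if s != 0:
                mismatches[week] = (s, r)
        elif abs(s - r) / r > rel_tol:
            mismatches[week] = (s, r)
    return mismatches
```

An empty result means every week present on either side is within tolerance.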

Alessio Placitelli [:Dexter]

Assignee

Updated

•

8 years ago

Status: NEW → ASSIGNED

Alessio Placitelli [:Dexter]

Assignee

Updated

•

8 years ago

Blocks: 1259505

Georg Fritzsche [:gfritzsche]

Reporter

Updated

•

8 years ago

Blocks: 1260715

Georg Fritzsche [:gfritzsche]

Reporter

Comment 10

•

8 years ago

This was merged:
https://github.com/mozilla-services/data-pipeline/commit/ddd255e8b2c5440ad94819fcea88678f894bcce3

We can't power the fennec-dashboard yet due to bug 1257589; we will look into scheduling this for Fennec 46 in bug 1260715.

Status: ASSIGNED → RESOLVED

Closed: 8 years ago

Resolution: --- → FIXED

Georg Fritzsche [:gfritzsche]

Reporter

Updated

•

8 years ago

No longer blocks: 1259505

BMO Automation

Updated

•

6 years ago

Product: Cloud Services → Cloud Services Graveyard


Categories

(Cloud Services Graveyard :: Metrics: Pipeline, defect, P1)


Security

(public)
