1323598 - Add additional fields for search retention to churn

Shraddha Patil [:Shraddha Patil]

Reporter

Description

•

9 years ago

Hi,

There is a report in Tableau dataviz for cohort retention data: https://dataviz.mozilla.org/views/FirefoxDesktopCohortAnalysis-UT_0/ByCountry

The BizDev team would like to use the report with additional filters:
a) distribution_id
b) default_search_engine

Please let me know if any other information is needed for this request intake.

Thanks!

Roberto Agostino Vitillo (:rvitillo)

Updated

•

9 years ago

Priority: -- → P3

Mark Reid [:mreid]

Updated

•

9 years ago

Points: --- → 2

Shraddha Patil [:Shraddha Patil]

Reporter

Comment 1

•

9 years ago

Hi All,

Adding more context on what this bug needs. The BD team (Joanne/Amit) is treating search retention data as a priority for 1Q2017. Search data is also important from the business end, so we would like to know the next steps to move this forward (for example, meetings with the concerned team).

Thanks for helping out.

Anthony Miyaguchi [:amiyaguchi]

Assignee

Updated

•

8 years ago

Assignee: nobody → amiyaguchi

Anthony Miyaguchi [:amiyaguchi]

Assignee

Updated

•

8 years ago

Priority: P3 → P2

Anthony Miyaguchi [:amiyaguchi]

Assignee

Updated

•

8 years ago

Blocks: 1337044

Anthony Miyaguchi [:amiyaguchi]

Assignee

Updated

•

8 years ago

Priority: P2 → P1

Anthony Miyaguchi [:amiyaguchi]

Assignee

Comment 2

•

8 years ago

Below is an updated list of additional filters that will be added to the churn/retention dataset located at [1].

a) distribution_id
b) default_search_engine
c) locale

[1] https://github.com/mozilla/mozilla-reports/blob/master/etl/churn.kp/orig_src/Churn.ipynb

Anthony Miyaguchi [:amiyaguchi]

Assignee

Comment 3

•

8 years ago

Retention data currently lives in the `telemetry-parquet` bucket under `churn/v2` [1]. The data is stored as parquet and is partitioned by `week_start`, the start of the retention period.

Scripts that have implicit assumptions about the granularity of the data may be affected by these changes. Scripts should be explicit about aggregating over the necessary set of columns for analysis/visualizations, like below:

    SELECT channel, distribution_id, SUM(n_profiles)
    FROM churn
    GROUP BY channel, distribution_id;

The staging location for this data will be located in a private bucket at [2]. I plan to fill this location with 1-3 months worth of data within the next week.

[1] s3://telemetry-parquet/churn/v2/
[2] s3://net-mozaws-prod-us-west-2-pipeline-analysis/amiyaguchi/churn-staging/
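To illustrate why explicit aggregation matters once new columns are added, here is a minimal sketch using Python's built-in `sqlite3` module with a small, made-up stand-in for the churn table. The rows, the values, and the `build_churn` and `profiles_by` helpers are hypothetical; only the column names `channel`, `distribution_id`, `default_search_engine`, `locale`, and `n_profiles` come from this bug. The real dataset is parquet on S3, not SQLite.

```python
import sqlite3

# Hypothetical rows:
# (channel, distribution_id, default_search_engine, locale, n_profiles)
ROWS = [
    ("release", "dist-a", "engine-x", "en-US", 100),
    ("release", "dist-a", "engine-y", "en-US", 40),
    ("release", "dist-b", "engine-x", "de", 60),
    ("beta", "dist-a", "engine-x", "en-US", 10),
]


def build_churn(rows):
    """Create an in-memory table that mimics the finer-grained dataset."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE churn ("
        "channel TEXT, distribution_id TEXT, "
        "default_search_engine TEXT, locale TEXT, n_profiles INTEGER)"
    )
    conn.executemany("INSERT INTO churn VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def profiles_by(conn, columns):
    """Sum n_profiles grouped by exactly the given columns.

    The column names are trusted identifiers in this sketch.
    """
    cols = ", ".join(columns)
    query = (
        f"SELECT {cols}, SUM(n_profiles) FROM churn "
        f"GROUP BY {cols} ORDER BY {cols}"
    )
    return conn.execute(query).fetchall()


conn = build_churn(ROWS)

# Grouping by channel alone collapses the new dimensions.
by_channel = profiles_by(conn, ["channel"])

# Grouping by channel and distribution_id matches the query above.
by_channel_dist = profiles_by(conn, ["channel", "distribution_id"])
```

A script that read one row per channel without a `GROUP BY` would now see several rows per channel, whereas the explicit aggregation returns the same totals regardless of how many dimensions the table carries.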

Anthony Miyaguchi [:amiyaguchi]

Assignee

Comment 4

•

8 years ago

I've updated the churn notebook to include the requested fields, along with some other changes. I've verified that the updated dataset is equivalent to the older dataset through this notebook [1]. I will backfill the job for the past few months next week.

On a tangential note, it would also be nice to have unit tests that can automatically verify changes between versions of the churn notebook, but that is not a blocking issue.

[1] https://gist.github.com/acmiyaguchi/f21a92b2980e177ab7fc4468c0c55074
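One way to sanity-check that adding dimensions leaves the totals unchanged is to roll the new rows back up to the old granularity and compare. The sketch below is an assumption about how such a check could look, not a description of what the linked notebook does; the `rollup` and `equivalent` helpers and the sample rows are hypothetical.

```python
from collections import defaultdict


def rollup(rows, keys):
    """Sum n_profiles over every column that is not listed in keys."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row["n_profiles"]
    return dict(totals)


def equivalent(old_rows, new_rows, old_keys):
    """True when both datasets agree after aggregating to old_keys."""
    return rollup(old_rows, old_keys) == rollup(new_rows, old_keys)


# Hypothetical old dataset: one row per channel.
OLD = [
    {"channel": "release", "n_profiles": 200},
    {"channel": "beta", "n_profiles": 10},
]

# Hypothetical new dataset: the same profiles split by distribution_id.
NEW = [
    {"channel": "release", "distribution_id": "dist-a", "n_profiles": 140},
    {"channel": "release", "distribution_id": "dist-b", "n_profiles": 60},
    {"channel": "beta", "distribution_id": "dist-a", "n_profiles": 10},
]

datasets_match = equivalent(OLD, NEW, ["channel"])
```

A check like this could also serve as the basis for the unit tests mentioned above, since it only needs two small fixtures and a list of the columns shared by both versions.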

Anthony Miyaguchi [:amiyaguchi]

Assignee

Comment 5

•

8 years ago

I have updated the private bucket location [1] and backfilled it with data since 01-01-2017. You can access the data through redash [2] for exploration.

[1] s3://net-mozaws-prod-us-west-2-pipeline-analysis/amiyaguchi/churn_testing/
[2] https://sql.telemetry.mozilla.org/queries/3382/source

Shraddha Patil [:Shraddha Patil]

Reporter

Comment 6

•

8 years ago

(In reply to Anthony Miyaguchi [:amiyaguchi] from comment #5)
> I have updated the private bucket location [1] and backfilled it with data
> since 01-01-2017. You can access the data through redash [2] for exploration.
>
> [1]
> s3://net-mozaws-prod-us-west-2-pipeline-analysis/amiyaguchi/churn_testing/
> [2] https://sql.telemetry.mozilla.org/queries/3382/source

Thanks Anthony!

Anthony Miyaguchi [:amiyaguchi]

Assignee

Comment 7

•

8 years ago

Attached file: Bug 1337037/1323598 - Add additional attributes to churn #33

Heather

Comment 8

•

8 years ago

(In reply to Anthony Miyaguchi [:amiyaguchi] from comment #5)
> I have updated the private bucket location [1] and backfilled it with data
> since 01-01-2017. You can access the data through redash [2] for exploration.
>
> [1]
> s3://net-mozaws-prod-us-west-2-pipeline-analysis/amiyaguchi/churn_testing/
> [2] https://sql.telemetry.mozilla.org/queries/3382/source

Hey Anthony,

Thanks for the great work here. Regarding the redash report, Joanne has asked that this not be made available via redash, as it could expose us to disclosing usage data that compares one public partner with another, which is something we need to be extremely careful with. Could you please remove the redash report? We will have this data available via Tableau, under credentials that are trackable; Tableau is the current location for Desktop Retention today.

If you have questions, please let me know. Thanks

Anthony Miyaguchi [:amiyaguchi]

Assignee

Comment 9

•

8 years ago

(In reply to Heather from comment #8)
> Regarding the redash report, Joanne has
> asked that this not be made available via redash as this data could expose
> us to a disclosure of usage data comparing one public partner over another,
> which is something we need to be extremely careful with.

There are other consumers of this dataset. Are the values of concern contained within distribution_id and/or default_search_engine? If so, everything in this processed dataset is accessible to users with Mozilla credentials via the `main_summary`. Or is this more of an issue of intent rather than accessibility of data within our ecosystem?

I am curious about the nature of this issue, since it might require either a fork of the dataset or extra processing to make other fields available to other users.

Flags: needinfo?(hcrince)

Anthony Miyaguchi [:amiyaguchi]

Assignee

Updated

•

8 years ago

Depends on: 1345555

Anthony Miyaguchi [:amiyaguchi]

Assignee

Comment 10

•

8 years ago

The data is now available from 20160306 in s3://telemetry-parquet/churn/v2.

Status: NEW → RESOLVED

Closed: 8 years ago

Resolution: --- → FIXED

Anthony Miyaguchi [:amiyaguchi]

Assignee

Updated

•

8 years ago

Blocks: 1381806

Anthony Miyaguchi [:amiyaguchi]

Assignee

Updated

•

8 years ago

Flags: needinfo?(hcrince)

Summary: Cohort Retention data with additional filters → Add additional fields for search retention to churn

BMO Automation

Updated

•

7 years ago

Product: Cloud Services → Cloud Services Graveyard