Closed
Bug 1323598
Opened 8 years ago
Closed 8 years ago
Add additional fields for search retention to churn
Categories
(Cloud Services Graveyard :: Metrics: Pipeline, defect, P1)
Cloud Services Graveyard
Metrics: Pipeline
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: Shraddha, Assigned: amiyaguchi)
References
Details
Attachments
(1 file)
Description
Hi,
There is a report in Tableau (dataviz) for cohort retention data:
https://dataviz.mozilla.org/views/FirefoxDesktopCohortAnalysis-UT_0/ByCountry
The BizDev team would like to extend the report with additional filters:
a) distribution_id
b) default_search_engine
Please let me know what other information is needed for this request intake. Thanks!
Updated•8 years ago
Priority: -- → P3
Updated•8 years ago
Points: --- → 2
Reporter
Comment 1•8 years ago
Hi All,
Adding more context on what this bug needs: the BD team (Joanne/Amit) is prioritizing search retention data for Q1 2017. Search data is also important from the business side, so we would like to know the next steps to move this forward (e.g., meetings with the relevant team).
Thanks for helping out!
Assignee
Updated•8 years ago
Assignee: nobody → amiyaguchi
Assignee
Updated•8 years ago
Priority: P3 → P2
Assignee
Updated•8 years ago
Priority: P2 → P1
Assignee
Comment 2•8 years ago
Below is an updated list of the additional filters that will be added to the churn/retention dataset, which is generated by the notebook at [1].
a) distribution_id
b) default_search_engine
c) locale
[1] https://github.com/mozilla/mozilla-reports/blob/master/etl/churn.kp/orig_src/Churn.ipynb
Assignee
Comment 3•8 years ago
Retention data currently lives in the `telemetry-parquet` bucket under `churn/v2` [1]. The data is stored as Parquet and is partitioned by `week_start`, the start of the retention period.
Adding new fields increases the granularity of the dataset, so scripts that make implicit assumptions about that granularity may be affected by these changes. Scripts should aggregate explicitly over the set of columns needed for their analysis/visualization, as in the example below:
> SELECT channel, distribution_id, SUM(n_profiles)
> FROM churn
> GROUP BY channel, distribution_id;
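For scripts that read the Parquet output directly rather than going through a SQL endpoint, a minimal PySpark sketch of the same explicit aggregation is shown below. This is an illustration only, not part of the scheduled job; the S3 scheme and the example `week_start` value are assumptions about the reader's environment.
> # Illustration only: read the partitioned dataset and aggregate explicitly
> # over the columns of interest, mirroring the SQL above.
> from pyspark.sql import SparkSession, functions as F
>
> spark = SparkSession.builder.getOrCreate()
>
> # week_start is the partition column; it appears as a regular column after the read.
> churn = spark.read.parquet("s3://telemetry-parquet/churn/v2/")
>
> counts = (
>     churn
>     .where(F.col("week_start") == "20170101")  # example partition value (format assumed)
>     .groupBy("channel", "distribution_id")
>     .agg(F.sum("n_profiles").alias("n_profiles"))
> )
> counts.show()
Aggregating with an explicit groupBy like this keeps a script correct even as new columns, and therefore rows at a finer granularity, are added to the dataset.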
The staging location for this data will be a private bucket at [2]. I plan to fill this location with 1-3 months' worth of data within the next week.
[1] s3://telemetry-parquet/churn/v2/
[2] s3://net-mozaws-prod-us-west-2-pipeline-analysis/amiyaguchi/churn-staging/
Assignee
Comment 4•8 years ago
I've updated the churn notebook to include the requested fields, among some other changes. I've verified that the updated dataset is equivalent to the older dataset using the notebook at [1]. Next week I will backfill the job a few months back.
On a tangential note, it would also be nice to have unit tests that can automatically verify changes between versions of the churn notebook, but it is not a blocking issue.
[1] https://gist.github.com/acmiyaguchi/f21a92b2980e177ab7fc4468c0c55074
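For reference, below is a rough sketch of the kind of equivalence check involved; the actual comparison lives in the gist above, and the staging path is the one from comment 3. This is a sanity check at a shared granularity, not an exhaustive row-by-row comparison.
> # Rough sketch of an equivalence check between the existing dataset and the
> # staged output of the updated notebook (the real check is in the gist above).
> from pyspark.sql import SparkSession, functions as F
>
> spark = SparkSession.builder.getOrCreate()
>
> old = spark.read.parquet("s3://telemetry-parquet/churn/v2/")
> new = spark.read.parquet(
>     "s3://net-mozaws-prod-us-west-2-pipeline-analysis/amiyaguchi/churn-staging/")
>
> def totals(df):
>     # Collapse both versions to a shared granularity before comparing, since the
>     # new version adds columns (distribution_id, default_search_engine, locale).
>     return df.groupBy("channel", "week_start").agg(F.sum("n_profiles").alias("n_profiles"))
>
> old_t = totals(old)
> new_t = totals(new).withColumnRenamed("n_profiles", "n_profiles_new")
>
> mismatches = (
>     old_t.join(new_t, ["channel", "week_start"])
>     .where(F.col("n_profiles") != F.col("n_profiles_new"))
> )
>
> assert old_t.count() == new_t.count(), "row counts differ at the shared granularity"
> assert mismatches.count() == 0, "aggregated profile counts differ between versions"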
Assignee
Comment 5•8 years ago
I have updated the private bucket location [1] and backfilled it with data since 01-01-2017. You can access the data through redash [2] for exploration.
[1] s3://net-mozaws-prod-us-west-2-pipeline-analysis/amiyaguchi/churn_testing/
[2] https://sql.telemetry.mozilla.org/queries/3382/source
Reporter
Comment 6•8 years ago
(In reply to Anthony Miyaguchi [:amiyaguchi] from comment #5)
> I have updated the private bucket location [1] and backfilled it with data
> since 01-01-2017. You can access the data through redash [2] for exploration.
>
> [1]
> s3://net-mozaws-prod-us-west-2-pipeline-analysis/amiyaguchi/churn_testing/
> [2] https://sql.telemetry.mozilla.org/queries/3382/source
Thanks Anthony!
Assignee
Comment 7•8 years ago
(In reply to Anthony Miyaguchi [:amiyaguchi] from comment #5)
> I have updated the private bucket location [1] and backfilled it with data
> since 01-01-2017. You can access the data through redash [2] for exploration.
>
> [1]
> s3://net-mozaws-prod-us-west-2-pipeline-analysis/amiyaguchi/churn_testing/
> [2] https://sql.telemetry.mozilla.org/queries/3382/source
Hey Anthony.
Thanks for the great work here. Regarding the redash report, Joanne has asked that this not be made available via redash as this data could expose us to a disclosure of usage data comparing one public partner over another, which is something we need to be extremely careful with.
Could you please remove the redash report?
We will make this data available via Tableau, under credentials that are trackable; Tableau is also the current location for Desktop Retention today.
If you have questions, please let me know.
Thanks
Assignee
Comment 9•8 years ago
(In reply to Heather from comment #8)
> Regarding the redash report, Joanne has
> asked that this not be made available via redash as this data could expose
> us to a disclosure of usage data comparing one public partner over another,
> which is something we need to be extremely careful with.
There are other consumers of this dataset.
Are the values of concern contained within distribution_id and/or default_search_engine? If so, everything in this processed dataset is already accessible to users with Mozilla credentials via the `main_summary` dataset. Or is this more an issue of intent rather than of accessibility of data within our ecosystem?
I am curious about the nature of this concern, since addressing it might require either forking the dataset or extra processing to make the other fields available to other users.
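For context on the accessibility point, the sketch below shows how the same fields can already be pulled out of `main_summary` by anyone with telemetry data-lake access. The path, version, and column names here are assumptions, so treat this as illustrative only.
> # Illustrative only: the fields in question are already visible in main_summary.
> # The dataset path/version and column names are assumptions.
> from pyspark.sql import SparkSession, functions as F
>
> spark = SparkSession.builder.getOrCreate()
>
> main_summary = spark.read.parquet("s3://telemetry-parquet/main_summary/v4/")
>
> per_partner = (
>     main_summary
>     .where(F.col("submission_date_s3") == "20170101")  # example partition value
>     .groupBy("distribution_id", "default_search_engine")
>     .agg(F.countDistinct("client_id").alias("n_clients"))
> )
> per_partner.show()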
Flags: needinfo?(hcrince)
Assignee
Comment 10•8 years ago
The data is now available from 2016-03-06 onward in s3://telemetry-parquet/churn/v2/.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Assignee
Updated•7 years ago
Flags: needinfo?(hcrince)
Summary: Cohort Retention data with additional filters → Add additional fields for search retention to churn
Updated•6 years ago
Product: Cloud Services → Cloud Services Graveyard