Closed
Bug 1345929
Opened 7 years ago
Closed 7 years ago
Create documentation for churn dataset
Categories
(Data Platform and Tools :: Documentation and Knowledge Repo (RTMO), enhancement, P1)
Data Platform and Tools
Documentation and Knowledge Repo (RTMO)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: amiyaguchi, Assigned: amiyaguchi)
References
Details
Attachments
(1 file)
60 bytes,
text/plain
There should be some supporting documentation describing the churn dataset's purpose, creation, and location. This documentation will probably live in the new data docs[1]. [1] https://github.com/harterrt/firefox-data-docs
Comment 1•7 years ago
Please add points for effort and roll to a P1 when you start work.
Priority: -- → P2
Updated•7 years ago
Component: Metrics: Pipeline → Documentation and Knowledge Repo (RTMO)
Product: Cloud Services → Data Platform and Tools
Updated•7 years ago
Points: --- → 2
Comment 2•7 years ago
These are the sections I currently have written. I also want a simple sample query on re:dash that is usable.

# Churn dataset

The churn dataset tracks the 7-day churn rate of telemetry profiles. This dataset is generally used for analysing cohort churn across segments and time.

## Content

The columns are broken down into attributes and metrics. An attribute describes a property of a particular group of profiles, such as the country they originate from or the channel they are part of. The metrics are the measurements across these attributes, such as the group size or the total usage length.

## Background and Caveats

Each row in this dataset describes a unique segment of users. The size of the dataset grows exponentially with the number of descriptive dimensions, so attributes should be added only when necessary. To prevent breakage between different minor versions of this dataset, you should be proactive in aggregating over the attributes of concern. For example, if you only care about the number of clients broken down by their location and locale, you would run the following SQL statement.

```
SELECT geo, locale, sum(n_profiles)
FROM churn
GROUP BY geo, locale
```

## Accessing the Data

The churn dataset is available in re:dash. The data is stored as a parquet table in S3 at the following address. See this cookbook to get started working with the data in Spark.

```
s3://telemetry-parquet/churn/v2
```
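To make the aggregation guidance above concrete, here is a minimal self-contained sketch of the same GROUP BY shape using SQLite with fabricated toy rows. The real table lives in S3/re:dash, and the rows and the `channel` values here are invented purely for illustration; only the column names `geo`, `locale`, `channel`, and `n_profiles` come from the draft docs.

```python
import sqlite3

# Toy stand-in for the churn table: each row is a unique segment of
# profiles. These rows are fabricated for illustration only.
rows = [
    ("US", "en-US", "release", 100),
    ("US", "en-US", "beta", 20),
    ("DE", "de", "release", 50),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE churn (geo TEXT, locale TEXT, channel TEXT, n_profiles INTEGER)"
)
conn.executemany("INSERT INTO churn VALUES (?, ?, ?, ?)", rows)

# Aggregate over only the attributes of interest (geo, locale), summing
# the n_profiles metric -- the two US segments collapse into one row.
result = conn.execute(
    "SELECT geo, locale, sum(n_profiles) FROM churn "
    "GROUP BY geo, locale ORDER BY geo"
).fetchall()
print(result)
```

The point of the example is the caveat above: aggregating over the attributes you care about insulates a query from new dimensions being added to the table in later versions.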
Updated•7 years ago
Priority: P2 → P1
Comment 3•7 years ago
Updated•7 years ago
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED