Closed Bug 1345929 Opened 7 years ago Closed 7 years ago

Create documentation for churn dataset

Categories

(Data Platform and Tools :: Documentation and Knowledge Repo (RTMO), enhancement, P1)

Points:
2

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: amiyaguchi, Assigned: amiyaguchi)

References

Details

Attachments

(1 file)

There should be some supporting documentation describing the churn dataset's purpose, creation, and location. This documentation will probably live in the new data docs[1].

[1] https://github.com/harterrt/firefox-data-docs
Please add points for effort and roll to a P1 when you start work.
Priority: -- → P2
Component: Metrics: Pipeline → Documentation and Knowledge Repo (RTMO)
Product: Cloud Services → Data Platform and Tools
Points: --- → 2
Here are the sections I have written so far. I also want to include a simple, usable sample query on re:dash.

# Churn dataset
The churn dataset tracks the 7-day churn rate of telemetry profiles. This dataset is generally used for analyzing cohort churn across segments and time.

## Content
The columns are broken down into attributes and metrics. An attribute describes a property of a particular group of profiles, such as the country they originate from or the channel they are part of. The metrics are the measurements across these attributes, such as the group size or the total usage length.
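As a quick illustration of this layout, the hedged query below groups by attribute columns and sums metric columns. `channel` and `usage_hours` are assumed column names used only for illustration; `geo` and `n_profiles` appear elsewhere in this document.

```
-- Attributes (channel, geo) describe a group of profiles;
-- metrics (n_profiles, usage_hours) are summed across that group.
-- `channel` and `usage_hours` are illustrative column names, not a schema guarantee.
SELECT channel, geo, sum(n_profiles) AS n_profiles, sum(usage_hours) AS usage_hours
FROM churn
GROUP BY channel, geo
```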


## Background and Caveats
Each row in this dataset describes a unique segment of users. The size of the dataset grows exponentially with the number of descriptive dimensions, so attributes should be added only when necessary.

To prevent breakage between different minor versions of this dataset, you
should proactively aggregate over only the attributes you care about. For example, if
you only care about the number of profiles broken down by geo and locale, you would run the following SQL statement.

```
SELECT geo, locale, sum(n_profiles) AS n_profiles
FROM churn
GROUP BY geo, locale
```

## Accessing the Data
The churn dataset is available in re:dash.

The data is stored as a parquet table in S3 at the following address. See this cookbook to get started working with the data in Spark.

```
s3://telemetry-parquet/churn/v2
```
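As a minimal sketch of working with the table in Spark, the data at this location can be registered as a temporary view with Spark SQL and then queried directly. This assumes a Spark session with read access to the telemetry-parquet bucket; it is not necessarily the approach used in the cookbook referenced above.

```
-- Minimal Spark SQL sketch, assuming read access to the bucket above.
CREATE TEMPORARY VIEW churn
USING parquet
OPTIONS (path "s3://telemetry-parquet/churn/v2");

-- Sanity check: count the number of rows in the table.
SELECT count(*) FROM churn;
```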
Blocks: 1381806
Priority: P2 → P1
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
