Closed Bug 1345929 Opened 7 years ago Closed 7 years ago

Create documentation for churn dataset

Categories

(Data Platform and Tools :: Documentation and Knowledge Repo (RTMO), enhancement, P1)

Points:
2

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: amiyaguchi, Assigned: amiyaguchi)

References

Details

Attachments

(1 file)

There should be some supporting documentation describing the churn dataset's purpose, creation, and location. This documentation will probably live in the new data docs[1].

[1] https://github.com/harterrt/firefox-data-docs
Please add points for effort and roll to a P1 when you start work.
Priority: -- → P2
Component: Metrics: Pipeline → Documentation and Knowledge Repo (RTMO)
Product: Cloud Services → Data Platform and Tools
Points: --- → 2
Here are the sections I have written so far. I also want to include a simple, usable sample query on re:dash.

# Churn dataset
The churn dataset tracks the 7-day churn rate of telemetry profiles. This dataset is generally used for analyzing cohort churn across segments and time.

## Content
The columns are broken down into attributes and metrics. An attribute describes a property of a particular group of profiles, such as the country they originate from or the channel they are part of. The metrics are the measurements across these attributes, such as the group size or the total usage length.
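As a quick illustration of this layout, the hedged query below groups by attribute columns and sums metric columns. `channel` and `usage_hours` are assumed column names used only for illustration; `geo` and `n_profiles` appear elsewhere in this document.

```
-- Attributes (channel, geo) describe a group of profiles;
-- metrics (n_profiles, usage_hours) are summed across that group.
-- `channel` and `usage_hours` are illustrative column names, not a schema guarantee.
SELECT channel, geo, sum(n_profiles) AS n_profiles, sum(usage_hours) AS usage_hours
FROM churn
GROUP BY channel, geo
```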


## Background and Caveats
Each row in this dataset describes a unique segment of users. The size of the dataset grows exponentially with the number of descriptive dimensions, so attributes should be added only when necessary.

To prevent breakage between different minor versions of this dataset, you
should proactively aggregate over only the attributes you care about. For example, if
you only care about the number of profiles broken down by geo and locale, you would run the following SQL statement.

```
SELECT geo, locale, sum(n_profiles) AS n_profiles
FROM churn
GROUP BY geo, locale
```

## Accessing the Data
The churn dataset is available in re:dash.

The data is stored as a parquet table in S3 at the following address. See this cookbook to get started working with the data in Spark.

```
s3://telemetry-parquet/churn/v2
```
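As a minimal sketch of working with the table in Spark, the data at this location can be registered as a temporary view with Spark SQL and then queried directly. This assumes a Spark session with read access to the telemetry-parquet bucket; it is not necessarily the approach used in the cookbook referenced above.

```
-- Minimal Spark SQL sketch, assuming read access to the bucket above.
CREATE TEMPORARY VIEW churn
USING parquet
OPTIONS (path "s3://telemetry-parquet/churn/v2");

-- Sanity check: count the number of rows in the table.
SELECT count(*) FROM churn;
```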
Blocks: 1381806
Priority: P2 → P1
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
