Closed Bug 1407410 Opened 8 years ago Closed 8 years ago

Determine initial slack for computing 1 day retention

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: amiyaguchi, Assigned: amiyaguchi)

Attachments

(1 file)

Bug 1407410 - Set initial slack to capture 99% of activity #188 8 years ago Anthony Miyaguchi [:amiyaguchi] 53 bytes, text/x-github-pull-request

Anthony Miyaguchi [:amiyaguchi]

Assignee

Description

•

8 years ago

The 1-day retention dataset describes the count of users seen on a particular activity date. Submission latency affects what portion of activity we observe after a certain amount of waiting. In the `mozetl.engagement.retention` job, there is a configurable `--slack` argument that will compute the retention dataset by an offset of `n` days.

The churn dataset is sensitive to this value and has been set to 10 days. However, this means that there are up to 17 days of lag. The 1-day retention dataset is interested in Firefox 55+ and should benefit from lower-latency submissions. Currently, this option defaults to 2 days, which should be enough to capture 95% of the data.

From the telemetry-health dashboard, the 95th percentile graph for the latest nightly seems to vary wildly (up to 447 hours for nightly 57). [1]

1. Looking at Firefox Beta 57, is it safe to assume the following?
   - ~1 day captures about 95% of activity
   - ~4 days captures about 99% of activity
2. From Firefox 55+, are these assumptions reasonable?
   - For 95% of clients, wait 2 days
   - For 99% of clients, wait 5 days
3. What value of slack should be set for 1-day retention?

[1] https://sql.telemetry.mozilla.org/dashboard/telemetry-health
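To make the effect of the slack offset concrete, here is a minimal sketch of the date arithmetic. It assumes slack is applied as a plain offset in whole days from the run date; `activity_date_for_run` is a hypothetical helper for illustration and is not taken from `mozetl.engagement.retention`.

```python
from datetime import date, timedelta


def activity_date_for_run(run_date: date, slack_days: int) -> date:
    """Return the activity date a job run on `run_date` would process.

    Hypothetical helper: assumes slack is a simple whole-day offset.
    """
    if slack_days < 0:
        raise ValueError("slack_days must be non-negative")
    return run_date - timedelta(days=slack_days)


run = date(2017, 10, 20)
# Slack of 2 days (the current default mentioned above).
print(activity_date_for_run(run, 2))
# Slack of 10 days (the value used for the churn dataset).
print(activity_date_for_run(run, 10))
```

A larger slack gives late submissions more time to arrive, at the cost of the dataset trailing the run date by that many days.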

Anthony Miyaguchi [:amiyaguchi]

Assignee

Updated

•

8 years ago

Blocks: 1381840

Anthony Miyaguchi [:amiyaguchi]

Assignee

Comment 1

•

8 years ago

Are points 1 and 2 in comment #1 reasonable assumptions to make based on the data in the telemetry health dashboard?

Flags: needinfo?(chutten)

Chris H-C :chutten

Comment 2

•

8 years ago

The wild variations in Nightly 57 are because we're now on Nightly 58. The maximum delay we've seen on a current nightly for the 95%ile has only twice been more than 30 hours over the past three months. I don't know the particular qualities of the 1-day retention dataset. Can you direct me to some documentation?
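Since the dashboard reports latency in hours while slack is configured in days, a rough conversion may help when reading these figures. The sketch below simply rounds a latency percentile up to whole days; `slack_days_for_latency` is a hypothetical name, and rounding up is an assumption about how one would choose slack, not a documented rule.

```python
import math

HOURS_PER_DAY = 24


def slack_days_for_latency(latency_hours: float) -> int:
    """Round a submission-latency percentile (in hours) up to whole days.

    Hypothetical helper: assumes slack must cover the full latency window.
    """
    if latency_hours < 0:
        raise ValueError("latency_hours must be non-negative")
    return math.ceil(latency_hours / HOURS_PER_DAY)


# 95th percentile ceiling observed on a current nightly (from this comment).
print(slack_days_for_latency(30))
# Outlier reported for nightly 57 in the description.
print(slack_days_for_latency(447))
```

Under this reading, a 30-hour 95th percentile is consistent with the current 2-day default.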

Flags: needinfo?(chutten) → needinfo?(amiyaguchi)

Anthony Miyaguchi [:amiyaguchi]

Assignee

Comment 3

•

8 years ago

Thanks, that sheds more light on the bottommost plot. Bug 1381840 Comment 3 is probably the closest thing to documentation besides the docstring in the retention module at the moment. [1] I'll be adding a doc page to DTMO before it's deployed on Airflow.

[1] https://github.com/mozilla/python_mozetl/blob/master/mozetl/engagement/retention/job.py#L1-L32

Flags: needinfo?(amiyaguchi)

Anthony Miyaguchi [:amiyaguchi]

Assignee

Comment 4

•

8 years ago

The slack value is one of the last configurations to tune before deploying the 1 day retention dataset. Is 95% of activity of clients per day an adequate figure for 1 day retention?

Flags: needinfo?(pdolanjski)

Peter Dolanjski [:pdol]

Comment 5

•

8 years ago

(In reply to Anthony Miyaguchi [:amiyaguchi] from comment #4)
> The slack value is one of the last configurations to tune before deploying
> the 1 day retention dataset. Is 95% of activity of clients per day an
> adequate figure for 1 day retention?

Is it safe to assume that 95% gives us a really good sample of the whole data set, such that if we see statistically significant variation in retention when only 95% of data is present, we can assume that'll usually hold true with 100%?

Flags: needinfo?(pdolanjski) → needinfo?(amiyaguchi)

Anthony Miyaguchi [:amiyaguchi]

Assignee

Comment 6

•

8 years ago

Probably not; it would depend on the size of the cohort and the type of bias introduced by latency. In addition, the use of HLL will introduce standard error that will most likely compound. I can think of a way of performing validation to quantify the error with 95% of activity, but waiting an extra 2-3 days for 99% of the activity is a safe route.

As I understand it, significance testing will require going back to the raw data to calculate the standard deviation. I imagine this to be straightforward to automate once relevant subpopulations are identified, by hand or other means.

In any case, it sounds like it's better to err on the side of caution and increase the slack to accommodate 99% of activity for a day.
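As a rough illustration of how HLL error might compound, the sketch below uses the standard HyperLogLog relative-error estimate of about 1.04/sqrt(m) for m registers, then combines two such errors for a ratio using first-order propagation. Several things here are assumptions rather than facts about the retention job: the precision of 12 bits is a placeholder, retention is treated as a ratio of two HLL-estimated counts, and the two estimates are treated as independent (which is unlikely to hold exactly, since retained users are a subset of the cohort).

```python
import math


def hll_standard_error(precision_bits: int) -> float:
    """Approximate relative standard error of a HyperLogLog estimate."""
    registers = 2 ** precision_bits
    return 1.04 / math.sqrt(registers)


def ratio_relative_error(numerator_error: float, denominator_error: float) -> float:
    """First-order relative error of a ratio of two independent estimates."""
    return math.hypot(numerator_error, denominator_error)


PRECISION_BITS = 12  # hypothetical value, not taken from the job
single = hll_standard_error(PRECISION_BITS)
combined = ratio_relative_error(single, single)
print(f"single estimate: {single:.4%}, ratio of two: {combined:.4%}")
```

Under these assumptions the error on the ratio is larger than on either count by a factor of sqrt(2), which is separate from any bias introduced by missing late submissions.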

Flags: needinfo?(amiyaguchi)

Anthony Miyaguchi [:amiyaguchi]

Assignee

Comment 7

•

8 years ago

Attached file Bug 1407410 - Set initial slack to capture 99% of activity #188

Anthony Miyaguchi [:amiyaguchi]

Assignee

Updated

•

8 years ago

Assignee: nobody → amiyaguchi

Ryan Harter [:harter]

Updated

•

8 years ago

Points: --- → 2

Priority: -- → P1

Anthony Miyaguchi [:amiyaguchi]

Assignee

Updated

•

8 years ago

Status: NEW → RESOLVED

Closed: 8 years ago

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Updated

•

3 years ago

Component: Datasets: General → General

Categories

(Data Platform and Tools :: General, enhancement, P1)
