Closed Bug 1592114 Opened 5 years ago Closed 5 years ago

investigate possible repeat churners

Categories

(Data Science :: Investigation, task)

x86_64
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rscholl, Assigned: flawrence)

Details

Brief Description of the request (required):

The pull survey turned up some users who were targeted as new via telemetry but who told us in the survey they are not actually new, raising the possibility that some of our new users who fail to retain are actually old users cycling through profiles/client_ids. Specifically, we discussed these diagnostics:
-Only 1% have telemetry enabled?
-Profile creation date- are the HB targeted client IDs really new profiles?
-Break down hardware of the “neither” responses (because installer and updated differ)
-Attribution of installers
-Kamyar’s 32: the followup survey coded 32 responses where the "neither" user told us they were not a new user. Do these client_ids have old profiles? Did they recently update?

Business purpose for this request (required):

We want to understand retention patterns.

Requested timelines for the request or how this fits into roadmaps or critical decisions (required):

Felix requested I file this bug and assign it to him after a discussion occasioned by dcamp's questions to mgrimes.

Links to any assets (e.g Start of a PHD, BRD; any document that helps describe the project):

https://docs.google.com/document/d/1_DOew5bSfjKXd_nNgsJpLqoh7U-7SQmxZD3ahbZJF4w/edit?usp=sharing

Name of Data Scientist (If Applicable):

Felix Lawrence

Jgaunt's notebook for context on the 1% enabled surprise: https://dbc-caf9527b-e073.cloud.databricks.com/#notebook/213525/command/213539

Status: NEW → ASSIGNED

-Only 1% have telemetry enabled?

Per tdsmith, the telemetry_enabled probe is a trap: it refers to extended telemetry, a setting that sends us lots of extra data and is typically only enabled on beta. https://searchfox.org/mozilla-central/source/modules/libpref/Preferences.cpp#3340

-Profile creation date- are the HB targeted client IDs really new profiles?

Nearly 2.4 million profiles reported via heartbeat telemetry ('telemetry_heartbeat_parquet') that they were exposed to 'hb-pull-survey-2019Q3' between 20190821 and 20190914. Of these, I was able to link 2.2 million to 'clients_daily' rows so that I could extract their profile creation date (I'm happy to ignore the rest). Of these 2.2 million profiles, roughly 31% were 'old' profiles (created before 2019/07/01). Does this make sense given the heartbeat targeting or does it expose a bug?
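For readers who want to reproduce the shape of this calculation, here is a minimal sketch of the linkage and the old-profile share. It is not the query that was actually run: the function name, the in-memory inputs (a list of exposed client IDs and a client_id-to-creation-date mapping standing in for 'telemetry_heartbeat_parquet' and 'clients_daily'), and the toy data in the note below are all assumptions for illustration. Only the 2019/07/01 cutoff comes from the comment above.

```python
from datetime import date

# Cutoff from the comment above: profiles created before 2019-07-01 count as 'old'.
OLD_PROFILE_CUTOFF = date(2019, 7, 1)


def share_of_old_profiles(exposed_client_ids, profile_creation_dates,
                          cutoff=OLD_PROFILE_CUTOFF):
    """Return (n_exposed, n_linked, old_share).

    exposed_client_ids: iterable of client IDs that reported survey exposure
        (hypothetical stand-in for the heartbeat telemetry rows).
    profile_creation_dates: dict mapping client ID -> datetime.date
        (hypothetical stand-in for a lookup built from clients_daily).
    """
    exposed = set(exposed_client_ids)  # de-duplicate repeated exposure pings
    # Keep only the exposed profiles that can be linked to a creation date.
    linked = {cid: profile_creation_dates[cid]
              for cid in exposed if cid in profile_creation_dates}
    n_old = sum(1 for created in linked.values() if created < cutoff)
    # The share is computed over linked profiles only; unlinked ones are ignored.
    old_share = n_old / len(linked) if linked else float("nan")
    return len(exposed), len(linked), old_share
```

Called with four exposed IDs of which three can be linked and one was created in 2017, the function reports 4 exposed, 3 linked, and an old share of one third. The unlinked profile is dropped from the denominator, mirroring the decision above to ignore profiles that could not be joined.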

We used the absence of archived main ping telemetry as a proxy for "new" here, and I think these data indicate that was an imperfect assumption for targeting new profiles.

It isn't clear to me why so many old profiles have no archived main pings, but I don't suspect a bug in heartbeat/normandy.

Of these 2.2 million profiles, roughly 31% were 'old' profiles (created before 2019/07/01). Does this make sense given the heartbeat targeting or does it expose a bug?

We know that the heartbeat targeting will oversample users that, for whatever reason, don't have a main telemetry ping when the recipe is evaluated. It was assumed that this number of users would be fairly small. 31% is a much larger portion than I expected.

I think this represents a bug in either Normandy's context or in the Telemetry system. I haven't thought of a way that a user with otherwise functional telemetry (we have pings from them) wouldn't have a telemetry ping. The telemetry archive system wasn't meant to be used this way, so maybe we've just asked it to do something that isn't reasonable.

As well as investigating what's going on with Telemetry, I think we should investigate other ways to target this heartbeat recipe. That would likely involve changes to Normandy to let us more precisely identify what we are looking for.

Does this mean that Mozilla's retention numbers are underestimated, in that they count as "new" (i.e., not retained) a lot of users that are in fact old?

(In reply to Rosanne Scholl from comment #5)

Does this mean that Mozilla's retention numbers are underestimated, in that they count as "new" (i.e., not retained) a lot of users that are in fact old?

This doesn't imply that. This implies that something with Normandy's targeting doesn't work how we expected, or something to do with how the client side telemetry system stores old pings is not as we expected.

But this doesn't involve old/existing users showing up as new profiles as measured by profile creation date, or (I would assume) the other ways we count new profiles.

The qsurvey CSV link for the pre-skyline pull survey is no longer working. I was going to look at the "neither" users and confirm that they specifically are mostly old profiles. But given that we now know that there was some mismatch with our targeting (i.e. it included extra users), I don't think this exercise is particularly useful - it's unlikely to lead to anything actionable.

I propose that we keep the same targeting for the rest of this series of surveys for consistency, and in the next analysis (bug 1571895) I check the "neither" users from that survey to verify that most of them do indeed have older profiles. I'll also check that older profiles comprise a relatively small proportion of responses overall, and that excluding the old profiles does not qualitatively change the results. If people express interest in getting to the bottom of why the telemetry archive system isn't working in the anticipated way, then at that point I'll check the old profiles' platforms to see if some platforms are more affected than others.
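The sensitivity check proposed here could look roughly like the following sketch. Everything about the data layout is an assumption: the response records, the "answer" and "profile_created" keys, and any answer labels other than "neither" are hypothetical, and the 2019-07-01 cutoff is simply carried over from the earlier comment. It only illustrates comparing answer shares with and without old profiles; it says nothing about how the follow-up analysis was actually implemented.

```python
from collections import Counter
from datetime import date

# Assumed cutoff, carried over from the earlier comment on 'old' profiles.
OLD_PROFILE_CUTOFF = date(2019, 7, 1)


def answer_shares(responses):
    """Return {answer: share of responses} for a list of response dicts."""
    counts = Counter(r["answer"] for r in responses)
    total = sum(counts.values())
    return {answer: n / total for answer, n in counts.items()} if total else {}


def compare_with_and_without_old(responses, cutoff=OLD_PROFILE_CUTOFF):
    """Compare answer shares over all responses vs. new profiles only.

    Each response is a dict with hypothetical keys:
      "answer": the coded survey answer (e.g. "neither")
      "profile_created": datetime.date of profile creation
    """
    new_only = [r for r in responses if r["profile_created"] >= cutoff]
    old_share = (1 - len(new_only) / len(responses)) if responses else float("nan")
    return {
        "old_share": old_share,            # proportion of responses from old profiles
        "all": answer_shares(responses),   # results including old profiles
        "new_only": answer_shares(new_only),  # results after excluding old profiles
    }
```

If the "all" and "new_only" distributions tell the same story and "old_share" is small, the old profiles are unlikely to be driving the results. The same grouping could be repeated by platform if the archive behaviour needs a closer look.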

For future surveys outside this series, we should work out better ways to do this targeting. This might involve understanding precisely what went wrong with the targeting here and fixing it, or it might involve jumping ship to a different strategy (as mythmon suggested).

I think this plan addresses all the parts of this investigation that would yield actionable outcomes?

Thanks, Felix. We included this info in the follow-up study and are closing this bug.

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED