Closed Bug 1507073 Opened 6 years ago Closed 4 years ago

Understand profiles that are only active for a single day.

Categories

(Data Science :: Investigation, task)

task
Not set
normal
Points: 3

Tracking

(Not tracked)

RESOLVED INACTIVE

People

(Reporter: jmccrosky, Assigned: jmccrosky)

Brief description of the request:

We suspect that many of our new profiles are created by computer labs or internet cafes that wipe and re-image their machines after every login or every day. This assumes that the image does not already contain a profile, which is likely true only some of the time.

We'd like to attempt to identify such profiles and, more generally, attempt to develop a notion of "real profiles" and better understand the relationships between profiles, users, and computers.

Link to any assets:

Felix left a comment on one of Jesse's docs:

---
So far I've heard a lot about internet cafes and schools that look like fresh profiles every day, and I could imagine that they might dominate this number (do they?)

Should we aspire to measure the number of new ongoing users? A few ways we could do this:
- Don't count chickens until they're hatched: count "new >1 week old profiles" so that the wiped-every-day profiles never get counted
- Do something with IPs (e.g. count distinct IPs) to reduce the impact of this effect
- Use geo and machine data to predict for each new client whether they're going to be around for another week (could be as simple as binning on country and taking an average with a prior - basically we'd use the fact that internet cafes are more popular in certain places)
---
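As a rough illustration of the last suggestion in the comment above (binning on country and averaging with a prior), here is a minimal sketch. It assumes each new profile has already been labelled with a country and a boolean saying whether it was still active a week later. The function name `retention_by_country`, the `prior_strength` parameter, and the input shape are hypothetical, not anything taken from the telemetry pipeline.

```python
from collections import defaultdict


def retention_by_country(profiles, prior_strength=20.0):
    """Estimate per-country one-week retention, shrunk toward the global rate.

    profiles: iterable of (country, retained) pairs, where retained is a bool.
    prior_strength: hypothetical tuning knob; it acts like that many
        pseudo-profiles retained at the global rate (a Beta prior).
    """
    counts = defaultdict(lambda: [0, 0])  # country -> [retained, total]
    for country, retained in profiles:
        counts[country][0] += int(retained)
        counts[country][1] += 1

    total = sum(n for _, n in counts.values())
    total_retained = sum(r for r, _ in counts.values())
    global_rate = total_retained / total if total else 0.0

    # Posterior mean: small countries stay close to the global rate,
    # large countries are dominated by their own data.
    return {
        country: (r + prior_strength * global_rate) / (n + prior_strength)
        for country, (r, n) in counts.items()
    }
```

With this kind of estimate, a country with only a handful of new profiles is not assigned an extreme retention rate on thin evidence. How strong the prior should be is a judgment call that would need checking against real data.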

Is there a specific data scientist you would like or someone who has helped to triage this request:

Jesse plans to work on this, or Felix may take over if he's interested.

_____ UPDATE _______
I've broadened the scope of this bug to understanding, in general, our profiles that are active for only a single day. This includes "internet cafe" profiles, organic churn after a single day, and telemetry opt-outs.

Rosanne Scholl suggests: "Some of those probably gave us their email addresses while creating an account, right? Could we email them a survey or an invite for a paid interview?" This is a very good idea.

Romain Testard also suggests looking at the proportion of active profiles that were created recently (perhaps last year) as this will help shed light on the issues around acquisition and retention.
Assignee: nobody → jmccrosky
There may be some duplicate effort with the Human or Bot Challenge. We should investigate before starting work on this:

If you have any questions regarding the Submission Process or the Challenge itself, leave a comment below. Alternatively, send a message to Eugene Ivanov (eivanov@mozilla.com or @eivanov on Slack). In addition, a Slack channel (#humanorbot) has been created to talk about this Challenge.
Status: NEW → ASSIGNED
Points: --- → 3
Summary: Identify "internet cafe or computer lab" wipe-and-regenerate profiles → Understand profiles that are only active fora single day.
Summary: Understand profiles that are only active fora single day. → Understand profiles that are only active for a single day.

I did some analysis for another project and found that about 5% of Firefox MAU are only "ever" active on a single day.
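For reference, one way a figure like this could be computed is sketched below. It assumes activity is available as (client_id, date) pairs, one per active day. The function names and the input shape are hypothetical and this is not the analysis that produced the 5% number. Note that "ever" only means "within the observed data", so recently created profiles will look single-day simply because they have not had time to return.

```python
from collections import defaultdict
from datetime import date


def single_day_clients(activity):
    """Return the client_ids seen on exactly one distinct day in `activity`."""
    days_seen = defaultdict(set)
    for client_id, day in activity:
        days_seen[client_id].add(day)
    return {cid for cid, days in days_seen.items() if len(days) == 1}


def single_day_share(activity, month_start, month_end):
    """Share of clients active in [month_start, month_end] that are
    single-day clients over the whole of `activity`."""
    activity = list(activity)  # allow generators; we iterate twice
    sdp = single_day_clients(activity)
    mau = {cid for cid, day in activity if month_start <= day <= month_end}
    if not mau:
        return 0.0
    return len(mau & sdp) / len(mau)


# Tiny made-up example: "a" is active on two days, "b" on one day only.
example = [
    ("a", date(2019, 1, 1)),
    ("a", date(2019, 1, 2)),
    ("b", date(2019, 1, 5)),
]
share = single_day_share(example, date(2019, 1, 1), date(2019, 1, 31))
```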

Excited to see the results of this analysis. Can you please also take a look at UTM parameters? I'm mostly interested in whether there's an acquisition source combination that is more or less likely to lead to these types of users.

Thanks!

Can we also look at an FxA cut?

Hi Jesse, the 'ko' locale issue in your doc reminded me of something odd we saw on the stub installer, where an old_version of '10.0.2' was seen in large numbers for 'ko' users. It turns out 10.0.2 shipped in February 2012, which seems to match the profile creation date of 2012-04-13 for the 'ko' single-day profiles (SDPs).
Could it be that profiles with re-installs somehow show up as SDPs on that day? (I'm assuming here that these 'ko' installs involve a large number of profiles created on 2012-04-13 on which re-installs happen regularly.)
https://sql.telemetry.mozilla.org/queries/4917/source#10063

Chris found the 'ko' locale issues. Can you look into this, Chris? No rush.

Flags: needinfo?(wbeard)

Romain and I discussed this. He suggested that I look into whether the profiles have a wide distribution of activity (URIs, etc.). If they are temporary profiles created from the same image, they might show very brief and uniform activity in addition to being SDPs. This is something I can look into.
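A minimal sketch of the uniformity check described above, assuming each SDP comes with a grouping key (for example locale plus profile creation date) and a URI count. The function name `activity_spread`, the choice of grouping key, and the use of the coefficient of variation as the spread measure are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def activity_spread(records):
    """Summarise how varied the activity is within each group of profiles.

    records: iterable of (group_key, uri_count) pairs, one per profile.
    Returns group_key -> {"n", "mean", "cv"}, where cv is the coefficient
    of variation (population std dev / mean). A cv near 0 means the
    profiles in that group behaved almost identically.
    """
    groups = defaultdict(list)
    for key, uri_count in records:
        groups[key].append(uri_count)

    summary = {}
    for key, counts in groups.items():
        m = mean(counts)
        cv = pstdev(counts) / m if m > 0 else 0.0
        summary[key] = {"n": len(counts), "mean": m, "cv": cv}
    return summary
```

Groups with many profiles, a low mean, and a cv close to zero would be the candidates for "same image, wiped daily". What counts as "low" here would have to be calibrated against groups known to be ordinary users.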

Flags: needinfo?(wbeard)

Merging in https://bugzilla.mozilla.org/show_bug.cgi?id=1534542, which is about determining the proportion of SDPs that come from telemetry opt-outs. A useful note, since this came up in the triage meeting: counting opt-out pings could be interesting. Those still lack a client_id, but we receive them on a continuous basis instead of as the single point-in-time coverage measurement.

Work for the DS team is now tracked in Jira. You can search within the Data Science Jira project for the corresponding ticket.

Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → INACTIVE