What percentage of users get disconnected from their account for N period vs N-1?
Categories
(Data Science :: Investigation, task)
Tracking
(Not tracked)
People
(Reporter: adavis, Assigned: flawrence)
References
Details
Attachments
(1 file)
23.93 KB, image/png
Brief Description of the request (required):
In May, we released the Accounts Toolbar menu with the assumption that it would help drive more registrations. To our surprise, it drives 4x more logins than registrations. We now assume that it is making it more obvious to users that they are not signed in to their accounts.
But why are so many existing account users not signed in? Is there a bug where users get disconnected? We have had users report that they got disconnected but we don't know how regularly it happens.
In our main telemetry, there is a flag to say if a profile is connected to an account, or not.
Where N=week, I'd like to look at what % active profiles that were connected to an account at N-1 that have remained active desktop users during N but are no longer connected to an account.
Sample data could look something like:
| Period (week) | Active Profiles Connected to Account | Returning profile from last period | Disconnected from Account since last period | Account Attrition Rate |
|---|---|---|---|---|
| Week N-5 | 100,000 | - | - | - |
| Week N-4 | 105,000 | 80,000 | 74,400 | -7% |
| Week N-3 | 107,000 | 82,000 | 73,800 | -10% |
| Week N-2 | 110,000 | 85,000 | 80,750 | -5% |
| Week N-1 | 115,000 | 90,000 | 76,500 | -15% |
| Week N | 122,000 | 94,000 | 78,020 | -18% |
In the above table, we could potentially identify that a non-negligible number of users are disconnecting from their accounts and that it may have gotten worse during N-1.
From here, we could investigate if these are intentional disconnections or if they are the result of a bug.
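As a minimal sketch of the week-over-week metric described above (the profile IDs and sets here are made up for illustration; the real computation would run over the telemetry tables, not Python sets):

```python
# Sketch of the requested metric: of the profiles connected to an account in
# week N-1, what fraction stayed active in week N but lost the connection?
# Profile IDs are hypothetical.

def attrition(connected_prev: set, active_now: set, connected_now: set) -> float:
    """Fraction of week N-1 connected profiles that remain active in week N
    but are no longer connected to an account."""
    still_active = connected_prev & active_now   # retained desktop users
    disconnected = still_active - connected_now  # lost the account connection
    return len(disconnected) / len(still_active)

connected_prev = {"a", "b", "c", "d"}  # connected in week N-1
active_now     = {"a", "b", "c", "x"}  # active in week N ("d" churned entirely)
connected_now  = {"a", "x"}            # connected in week N

print(attrition(connected_prev, active_now, connected_now))  # b and c disconnected: 2/3
```

Note that profiles which churn out of Firefox entirely (like "d" above) are excluded from both numerator and denominator, so the metric isolates account disconnection from general product churn.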
Business purpose for this request (required):
We are trying to grow relationships. We do that through acquisition, activation, and retention. If we have a retention problem caused by a bug, we should identify that quickly and fix it.
Requested timelines for the request or how this fits into roadmaps or critical decisions (required):
ASAP. Depending on results, it may impact our Q4 priorities. We may also use this chart to track the introduction of new bugs in the Sync and Accounts decoupling happening in Firefox 71.
Links to any assets (e.g. start of a PHD, BRD; any document that helps describe the project):
None at this time. This investigation would help create one. Any observations made here will educate future hypotheses.
Name of Data Scientist (If Applicable):
Leif or Kimmy
Comment 1•6 years ago
We see similar behavior in Fennec Android.
We just added prominent new buttons for account registration and login to Fennec's home screen (the "Awesomescreen"). We saw a nice 3x increase in account registrations, but like desktop, we also saw a surprising 5.5x increase in account logins. We don't know if these are Fennec users who had an account but never logged in to Fennec until now, or if these users had been logged in but were inadvertently disconnected and reminded that they should log in again.
- Fennec 68.2 Account Logins and Registrations: https://analytics.amplitude.com/mozilla-corp/chart/4l97wcs
- Fennec 68.2 Account Authentications by Entrypoint (Logins + Registrations): https://analytics.amplitude.com/mozilla-corp/chart/pdqvkko
Reporter
Comment 2•6 years ago
There have been a lot of discussions about this in meetings recently because we're trying to hit our EOY goal for relationships.
If we can prioritize this soon, I would really appreciate it.
Reporter
Updated•6 years ago
Comment 3•6 years ago
Here's my first cut at this. I don't expect it to be the last word. I also did it quickly, in about an hour. We should have another data scientist (or myself when I'm back from PTO) take a more comprehensive look. But, I wanted to at least get us started on this before I go out.
https://sql.telemetry.mozilla.org/queries/65789/source#167081
Method:
- I looked at periods of calendar weeks for a 1% sample of clients starting in August. I ignored data where `fxa_configured = null` (this seemed infrequent enough to ignore for now).
- I extracted the client's last submitted value of `fxa_configured` within each week (so this would end up being the most recent value from the previous week, when comparing WoW).
- For the following, I only counted clients when they had values set for `fxa_configured` on consecutive weeks, i.e. I did not consider a client "disconnected" if they did not submit any values for `fxa_configured` in the previous calendar week (to count as disconnected, their previous value must have been `true` AND the last week in which they submitted telemetry must have been the current week minus 1). This constraint could easily be relaxed. As it is, I think it makes interpretation easier, but it might also bias things toward the patterns of users who are more likely to be active at least once a week. Maybe we only really care about those users, though.
- I then defined "disconnected" as clients that had `fxa_configured = true` the previous week but `fxa_configured = false` for the current week. I divided this by the total number of clients that were connected in the previous week (`prop_of_last_week_connected_that_disconnected_this_week`) and by the total number of clients that were connected in the current week (`prop_of_this_week_connected_that_disconnected_this_week`). The latter proportion facilitates comparison to "new" connections, i.e. it can be thought of as the additional % of clients that would have been active in the current week had they not disconnected.
- I also defined "continuing clients" as those who had `fxa_configured = true` in both the previous and the current week. I divided this by the total number of clients connected in the previous week (`prop_of_last_week_connected_that_are_continuing`) and by the total number connected in the current week (`prop_of_this_week_connected_that_are_continuing`).
- Finally, I defined "connected or reconnected" as clients who had `fxa_configured = false` the previous week but `fxa_configured = true` for the current week. These could be new accounts among existing Firefox users or re-connections of existing accounts. I divided this by the total number of clients connected in the current week (`prop_of_this_week_connected_that_not_connected_last_week`).
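The consecutive-week classification above can be sketched as follows (client IDs and values are hypothetical; the real query runs over the telemetry tables, not Python dicts):

```python
# Classify clients week-over-week from their last fxa_configured value per week.
# Mirrors the method above: only clients observed in consecutive weeks count.

def classify(prev_week: dict, curr_week: dict) -> dict:
    """prev_week/curr_week map client_id -> last fxa_configured value that week."""
    counts = {"disconnected": 0, "continuing": 0, "connected_or_reconnected": 0}
    for client, now in curr_week.items():
        if client not in prev_week:
            continue  # no observation in the previous week; excluded per the method
        before = prev_week[client]
        if before and not now:
            counts["disconnected"] += 1
        elif before and now:
            counts["continuing"] += 1
        elif not before and now:
            counts["connected_or_reconnected"] += 1
    return counts

prev = {"a": True, "b": True, "c": False, "d": True}
curr = {"a": True, "b": False, "c": True, "e": True}  # "d" sent no pings this week
print(classify(prev, curr))
# {'disconnected': 1, 'continuing': 1, 'connected_or_reconnected': 1}
```

Note that "d" (connected last week, silent this week) and "e" (no previous-week observation) both drop out, which is exactly the consecutive-weeks constraint described above.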
What I've found so far, given the above:
- In a given week, 16-20% of clients who were connected in the previous week (`fxa_configured = true`) show up as disconnected the next week (`fxa_configured = false`).
- However, in the long run, week-to-week "attrition" looks to be largely compensated for by connections from clients that were not connected in the previous week (but were active in Firefox). Put another way, the loss of clients due to "disconnection" seems to be roughly offset by the same number of new connections occurring each week (again, "new" here just means they had `fxa_configured = false` in the previous week).
- In a typical week, 35-40% of users who have `fxa_configured = true` also had `true` the previous week, i.e. are "continuing" users.
More to be done here and to check:
- Is `fxa_configured` basically stable within clients within a week? That is, if it changes value, does it tend to stay that way ping-to-ping in the short run?
- Maybe it would be better to use sliding windows of 7 days rather than calendar weeks.
- The ~17% figure lines up pretty closely with this view from the server metrics, but the continuing number from the query above seems low. It's not an apples-to-apples comparison, though, as a user can be continuing at the user level (e.g. on another device) but not at the device level.
- Assuming I've been going about this right and the 17% number is roughly in the right ballpark, the next step would be to use the client telemetry, e.g. interactions with the FxA menu(s), to give our best estimate of how many of the `true -> false` users might plausibly be user-initiated.
Assignee
Comment 4•6 years ago
I was asked to take a look at this but didn't realise Leif was still working on it.
I took a similar but different approach - I started by looking at users with fxa_configured in the first week of July, and considered as their starting point the first day in that week in which they reported fxa_configured. Then I followed those users for the five following weeks and checked in each week to see whether they reported fxa_configured as true, as false, or null/no ping.
The fraction of users who report fxa_configured = true in each week is plotted in blue; the fraction who report fxa_configured = false is plotted in orange. The x axis is the days since the "starting point": x=0 represents the first week, x=7 represents the second week, etc.
The two series don't add to 1: if a client doesn't report a non-null value for fxa_configured in a given week, they won't appear in either series but do count in the denominator of the fraction. And if a client reports fxa_configured=true on one day and fxa_configured=false on a different day in the same week, then they count in both series.
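The denominator bookkeeping described above can be sketched like this (hypothetical clients and per-week ping values; not the actual analysis code):

```python
# Per-week fractions for a fixed cohort: clients reporting any True that week
# count in the "configured" series, any False in the "not configured" series,
# and clients with no non-null report count only in the denominator.

def weekly_fractions(cohort: set, reports: dict) -> tuple:
    """reports maps client_id -> list of fxa_configured values seen that week."""
    n = len(cohort)
    frac_true  = sum(True  in reports.get(c, []) for c in cohort) / n
    frac_false = sum(False in reports.get(c, []) for c in cohort) / n
    return frac_true, frac_false  # may sum to more or less than 1

cohort = {"a", "b", "c", "d"}
week = {"a": [True], "b": [True, False], "c": [False]}  # "d" sent no pings
print(weekly_fractions(cohort, week))                   # (0.5, 0.5)
```

Here "b" appears in both series and "d" in neither, so the two fractions sum to exactly 1 only by coincidence.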
The num_fxa_configured series makes a lot of sense to me: it starts at 100% in the first week (we selected people with fxa_configured=True on day 0, so 100% of these report having fxa_configured=True in week 0). In the second week it drops to a bit below 80%; this is likely due to one-off or occasional users. In subsequent weeks it drops much less from week to week; this later drop likely reflects users churning or, possibly, FxA losing its configuration.
Intriguingly, num_fxa_not_configured starts at around 40% and wiggles around a bit. 40% is really high and indicates that something is wrong - with my query, with the telemetry, or with the functionality. If we saw it steadily increase from week to week, then that would be evidence that fxa configuration was getting lost in large enough numbers to overcome the underlying week-to-week churn. I don't think we see this.
I'm going to break things down by platform and by profile age to see if I can learn anything more about the 40%.
Assignee
Comment 5•6 years ago
tdsmith points out that this suggests the presence of an ETL bug that is treating null values of `fxa_configured` as False (he said something about None perhaps being interpreted as False rather than NULL when coming out of a histogram). Additionally, in clients_daily, `fxa_configured` is aggregated using F.first(), so if any pings from an fxa-configured client that day do not contain `fxa_configured`, then it may be non-deterministic whether `fxa_configured` is recorded as True or False in that row of clients_daily.
Here are the implications for our telemetry system:
- We suspect that clients with FxA configured may intermittently report `fxa_configured` as False; this affects main_summary as well as clients_daily.
- clients_daily does not gracefully handle inconsistent `fxa_configured` values for a client.
The implication for our analyses is that fxa_configured being true is a sufficient but not necessary condition for the client having fxa configured.
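The hazard can be illustrated with a toy example (the ping lists are made up, and `first_non_null`/`any_true` are stand-ins for the aggregation behaviour, not the actual ETL code):

```python
# Toy illustration of the clients_daily hazard: a "first value" aggregation
# over a day's pings is order-sensitive when some pings omit fxa_configured,
# whereas treating any observed True as "configured" is stable.

def first_non_null(values):
    # stand-in for a first-value aggregation that skips nulls
    return next((v for v in values if v is not None), None)

def any_true(values):
    # safer reading given the bug: True is trustworthy, False may not be
    return any(v is True for v in values)

pings_order_a = [None, False, True]  # same client, same day,
pings_order_b = [True, False, None]  # pings seen in a different order

print(first_non_null(pings_order_a))  # False
print(first_non_null(pings_order_b))  # True
print(any_true(pings_order_a), any_true(pings_order_b))  # True True
```

The same client, on the same day, flips between False and True under the first-value reading depending on ping order, while the any-True reading is order-independent. This is one way to interpret "sufficient but not necessary" in practice.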
I've filed bug 1593384 to fix this.
In the meantime, I think I'll repeat my analysis using main_summary to avoid the clients_daily issues, and I'll try to work out for each platform/profile age whether we're losing `fxa_configured` clients faster than generic clients for that platform. (This is spurred by the observation that only 68% of Mac clients had fxa_configured=True in the second week, compared to 80% of Windows clients; this may well be due to the clients_daily issue, since Macs allegedly send more subsession pings, which might have fxa_configured=NULL.)
Updated•6 years ago
Assignee
Comment 6•6 years ago
Per discussion with adavis, let's wait for bug 1593773 to be resolved so that we have sufficient data to do this analysis with confidence. The hope is that the functionality from Bug 1238810 goes out in v71, then bug 1593773 might be resolved within 4 weeks of that, so that once we have 4 weeks of data from v71 we can do this kind of analysis.