Closed Bug 1559432 Opened 5 years ago Closed 5 years ago

Measure share of AVG users that moved from having passwords to not having passwords

Categories

(Data Science :: Investigation, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: RT, Assigned: tdsmith)

Brief description of the request:
We hit an issue that caused AVG users to lose access to their password DB: bug 1558765
In order to understand if these users should be fixed through a dot release we need to understand the number of users impacted.
Request: Count AVG (antivirus) users daily that moved from having passwords in their password DB to not having passwords at all (PWMGR_NUM_PASSWORDS_PER_HOSTNAME) in the last 2 weeks.

Link to any assets:
Incident doc: https://docs.google.com/document/d/1Q74sLPUoq30llfWNbBxamK0DUPY_xBn57jL2rSQIyzc/edit

Is there a specific data scientist you would like or someone who has helped to triage this request:
no

PWMGR_NUM_SAVED_PASSWORDS will be much easier to work with since it's a total count, not a histogram counting number of logins for a given hostname.
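Because PWMGR_NUM_SAVED_PASSWORDS is a single total per client, the "had passwords, then had none" transition can be found by comparing consecutive observations for each client. A minimal sketch of that idea in plain Python follows. It is not the query used for this bug; the row layout `(client_id, date, num_saved_passwords)`, the function name, and the sample data are all hypothetical, and it assumes one record per client per day with the AVG filter already applied upstream.

```python
from collections import defaultdict


def count_lost_passwords(rows):
    """Count, per day, the clients whose saved-password total dropped to zero.

    rows: iterable of (client_id, iso_date, num_saved_passwords) tuples.
    This layout is an assumption for illustration, not the main_summary schema.
    """
    by_client = defaultdict(list)
    for client_id, day, count in rows:
        by_client[client_id].append((day, count))

    lost_per_day = defaultdict(int)
    for observations in by_client.values():
        observations.sort()  # ISO date strings sort chronologically
        # Compare each observation with the one immediately before it.
        for (_, prev_count), (day, count) in zip(observations, observations[1:]):
            if prev_count > 0 and count == 0:
                lost_per_day[day] += 1
    return dict(lost_per_day)


# Hypothetical sample data, for illustration only.
sample_rows = [
    ("a", "2019-06-10", 12), ("a", "2019-06-11", 0),   # lost passwords on the 11th
    ("b", "2019-06-10", 3),  ("b", "2019-06-11", 3),   # unaffected
    ("c", "2019-06-10", 0),  ("c", "2019-06-11", 0),   # never had passwords
    ("d", "2019-06-11", 5),  ("d", "2019-06-12", 0),   # lost passwords on the 12th
]
print(count_lost_passwords(sample_rows))
```

Clients that never reported a non-zero total are excluded by construction, which matches the request's wording of "moved from having passwords ... to not having passwords at all".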

Blocks: 1559458
Assignee: nobody → tdsmith

:wbeard got us started here; thanks Chris!

Count AVG (antivirus) users daily that moved from having passwords in their password DB to not having passwords at all (PWMGR_NUM_PASSWORDS_PER_HOSTNAME) in the last 2 weeks.

About 20%, or 50k users.

Note that we can't count AVG use among Windows 7 users, who are about 40% of our userbase; if we assume they use AVG at the same rate, that implies a number closer to about 83k.
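The Windows 7 adjustment above is a simple scale-up: if the observed count covers only the 60% of users for whom AVG use is visible, dividing by that share gives the whole-population estimate. A small sketch of the arithmetic, with a hypothetical function name and under the same assumption stated above (unobserved users run AVG and are affected at the same rate as observed users):

```python
def extrapolate_affected(observed_users, unobserved_share):
    """Scale an observed count up to the full population.

    Assumes the unobserved group is affected at the same rate as the
    observed group, which is the assumption made in the comment above.
    """
    if not 0 <= unobserved_share < 1:
        raise ValueError("unobserved_share must be in [0, 1)")
    return observed_users / (1 - unobserved_share)


# 50k observed users, with about 40% of the userbase (Windows 7) unobservable.
estimate = extrapolate_affected(50_000, 0.40)
print(f"{estimate:,.0f}")  # prints 83,333
```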

Notebook and summary: https://dbc-caf9527b-e073.cloud.databricks.com/#notebook/132863/dashboard/132972/present

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED

Just a note that re-running that query today, it looks like it's closer to 206k users (124k observed, plus an extrapolated estimate for unobserved Windows 7 users). We can probably expect that to be the final number, since as I understand it active AVG users should have received a definition update by now (albeit with unknown-to-us penetration).

My query used main_summary as a data source, which is updated on a daily cadence early in the morning GMT based on the pings received the previous day. The numbers on the 14th were current as of midnight GMT the morning of Jun 14; it appears that we heard from additional users after that point.

I don't think that difference is large enough that it would have changed our response, though it might have been useful to do additional work to either a) rely on main pings using the Dataset API for low-latency access instead of waiting for main_summary to update the next day, or b) do additional forecasting work instead of just reporting the number of users affected to date.
