Closed Bug 1003187 — Opened 11 years ago, Closed 9 years ago
Work out a metric for overall updater effectiveness
Categories: Toolkit :: Application Update, defect
Status: RESOLVED WONTFIX
People: (Reporter: mconnor, Assigned: mconnor)
cc-ing some quant-like people who've worked with this data
Despite a lot of excellent work, we're still seeing a significant proportion of users lag behind mainline trunk by a couple of releases (or more), meaning they're six weeks or more out of date. There are various theories about what's behind this, but ultimately what we need to measure is the overall effectiveness of the system and our past design choices. Once we have that metric, we can make a better call on prioritization of updater improvements.
Robert and I have had a very long thread (and Vidyo chat) on the subject, and we're in agreement that it's important to build a reliable product metric for update. There's also a lot of work to be done on engineering/debugging metrics, assuming the numbers are real. That's a subject for a different bug, however.
(As an explicit carve-out, we have a considerable number of users on releases too old to have FHR. I'd like to exclude those users from how we calculate this metric, and treat remediation of those users as a separate project from assessing the current updater.)
As a starting point, I'm going to propose the following metric set as a first step (using FHR data):
* For all users with software update available [1] on a given pingDate (or range):
** % of users on the current or N-1 major release (meaning they updated in the last six weeks)
** % of the remainder who had more than two hours of activity in a previous session within the last six weeks (bug 803181 means they should have downloaded an update in that time) [2]
** % of the remainder who weren't online enough in the past six weeks to get updates
[1] This includes users with update prefs changed, but excludes users of Linux distro builds. We're evaluating the overall system's effectiveness, not just "is it working as designed?"
[2] This framing explicitly ignores the current behaviour of completing an existing update download before re-checking, on the grounds that updating to out of date builds cannot be considered "success"
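The three buckets above can be sketched as a toy Python function over hypothetical per-profile records. The field names (`version`, `active_2h_within`, `update_available`) are illustrative placeholders, not the real FHR payload schema:

```python
from datetime import date, timedelta

SIX_WEEKS = timedelta(weeks=6)

def bucket_users(records, current_version, ping_date):
    """Split users with software update available into the three proposed buckets.

    Each record is a dict with illustrative (not real FHR) fields:
      version          - int major version of the build
      active_2h_within - date of the last session with >2h of activity, or None
      update_available - whether the software updater is available at all
    """
    eligible = [r for r in records if r["update_available"]]
    up_to_date = stale_but_active = offline = 0
    for r in eligible:
        if r["version"] >= current_version - 1:
            # On current or N-1: updated within the last six-week cycle.
            up_to_date += 1
        elif r["active_2h_within"] and ping_date - r["active_2h_within"] <= SIX_WEEKS:
            # Had enough session time to download an update, but didn't take it.
            stale_but_active += 1
        else:
            # Not online enough in the past six weeks to receive updates.
            offline += 1
    total = len(eligible) or 1  # avoid division by zero on an empty population
    return {
        "pct_current_or_n1": 100 * up_to_date / total,
        "pct_stale_but_active": 100 * stale_but_active / total,
        "pct_offline": 100 * offline / total,
    }
```

Note that, per [1], the `update_available` filter keeps users who changed update prefs and drops only builds without the updater (e.g. Linux distro packages).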
If someone has any ideas on how this should change, let me know. Otherwise I'll be hacking on a jydoop query over the next little while to prototype this metric. (Eventually we can make it a dashboard, but I don't think that's the immediate ask.)
Comment 1•11 years ago
Oh, how I love designing stats!
Sounds like you thought through the technical issues around this well.
mconnor, general nit: can you put an example row of your proposed metric? I am having a bit of a hard time grokking it.
1. In designing any stat, what should this a) reward and b) punish?
2. Is 100% of users being on current (or current - 1) really the goal? If not, what is the goal? To me, the update goal should be "No STUPID reasons for being on old versions". For me, that might suggest "of people with 'update' on, get 100% to current, if they have been around recently".
3. Going right to Jydoop makes sense after you get that settled :) If you want help there, turns out some of us have jydoop jobs laying around that already take ONEPERCENT, SAMPLEDATE and things into account, and could wire this up in two hours :)
4. Nits: should this have 'by country' or other breakouts?
5. Do we record / should we record "failed upgrade attempts" in the packet? Is a failed upgrade like a bug/crash?
6. Seriously old profiles (pre-fhr) are probably super busted or opted out of upgrades, yay excluding them.
From your proposal, I am getting this vibe as the %.
On a given DAY:
%ON_RIGHT_VERSION(DAY) = 100 * N_IS_CURRENT_IF_SHOULD / N_SHOULD_HAVE_CURRENT
Where:
- N_IS_CURRENT_IF_SHOULD = version is current, or current - 1, and SHOULD_HAVE_CURRENT, given DAY
- N_SHOULD_HAVE_CURRENT is:
has_update_pref
has_been_online_2_hours_in_a_session_in_last_6_weeks
BUT NOT: pre-fhr
Maybe I am misreading though :) I might pick a score that isn't just that percentage, and has some global nastiness value. This score seems to not distinguish well between 'current' and 'current-1'. I might give 'half-credit' to the 'release - 1' cases.
This score as defined will also have a cyclical rhythm in the release cycle, and I would want to account for that with a score like "days to 80%", "days to 90%", etc.
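The half-credit variant mentioned above could be sketched like this. The weighting (1.0 for current, 0.5 for current-1) is arbitrary and purely illustrative:

```python
def update_score(n_current, n_prev, n_should_have_current):
    """Score update effectiveness on a 0-100 scale.

    Full credit for profiles on the current release, half credit for
    current-1, relative to the population that should be current.
    (Illustrative weights; nothing here is an agreed-upon definition.)
    """
    if n_should_have_current == 0:
        return 0.0
    return 100.0 * (n_current + 0.5 * n_prev) / n_should_have_current
```

Unlike the plain percentage, this distinguishes a population sitting on current-1 from one that is fully current.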
(Also, small complications... as usual with FHR, it takes a while for packets to start arriving, so judging update effectiveness can be tangled with "uploads to FHR correctly". Hopefully this will be a small problem.)
Comment 3•11 years ago (Assignee)
(In reply to Gregg Lind (User Research - Test Pilot) from comment #1)
> Oh, how I love designing stats!
>
> Sounds like you thought through the technical issues around this well.
>
> mconnor, general nit: can you put an example row of your proposed metric?
> I am having a bit of a hard time grokking it.
>
> 1. In designing any stat, what should this a) reward b) punish.
a) Keeping users moving forward at the same rate as the trains.
b) leaving users well behind.
> 2. Is 100% of users being on current (or current - 1) really the goal? If
> not, what is the goal? To me, the update goal should be "No STUPID reasons
> for being on old versions". For me, that might suggest "of people with
> 'update' on, get 100% to current, if they have been around recently".
The BHAG is that everyone's always current. That said, the goal in defining a metric like this is being able to answer "is the updater good enough?" in a clear yes/no fashion.
> 3. Going right to Jydoop makes sense after you get that settled :) If you
> want help there, turns out some of us have jydoop jobs laying around that
> already take ONEPERCENT, SAMPLEDATE and things into account, and could wire
> this up in two hours :)
Yuuuup.
> 4. Nits: should this have 'by country' or other breakouts?
For a product metric, no. We'll probably want that data to understand and attack it, but largely this should be a high-level metric.
> 5. Do we record / should we record "failed upgrade attempts" in the packet?
> Is a failed upgrade like a bug/crash?
Out of scope for this as a product metric, but this is pretty important for an engineering/debugging metric.
> 6. Seriously old profiles (pre-fhr) are probably super busted or opted out
> of upgrades, yay excluding them.
>
>
> From your proposal, I am getting this vibe as the %.
>
> On a given DAY:
> %ON_RIGHT_VERSION DAY = 100 * N_IS_CURRENT_IF_SHOULD /
> N_SHOULD_HAVE_CURRENT
>
> Where:
> - N_IS_CURRENT_IF_SHOULD = version is current, or current - 1, and
> SHOULD_HAVE_CURRENT, given DAY
> - N_SHOULD_HAVE_CURRENT is:
> has_update_pref
> has_been_online_2_hours_in_a_session_in_last_6_weeks
> BUT NOT: pre-fhr
Not "has update pref" but "has the updater available", yeah.
> Maybe I am misreading though :) I might pick a score that isn't just that
> percentage, and has some global nastiness value. This score seems to not
> distinguish well between 'current' and 'current-1'. I might give
> 'half-credit' to the 'release - 1' cases.
>
> This score as defined will also have a cyclical rhythm in the release cycle,
> and I would want to reward that with a score like "days to 80%", "days to
> 90%, etc.
This is the hard question: how do we minimize the cyclical nature of this metric?
Comment 4•11 years ago
> Not "has update pref" but "has the updater available", yeah.
I don't know exactly what that means technically, but presumably someone else does :)
> This is the hard question: how do we minimize the cyclical nature of this metric?
Maybe if it's conditioned on "days after release", that should adjust it correctly. I assume this is less a perfect 'real-time' stat than a post-mortem / ongoing health thing, and the answer will be clearer after seeing a few cycles.
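The "days to N%" reading, conditioned on days after release, might be computed like this over a hypothetical daily adoption series (index 0 = release day); the input series itself is assumed, not real data:

```python
def days_to_threshold(daily_pct, threshold):
    """Return the first day index (0 = release day) on which the adoption
    percentage reaches `threshold`, or None if it never does within the
    observed window."""
    for day, pct in enumerate(daily_pct):
        if pct >= threshold:
            return day
    return None
```

Comparing "days to 80%" across release cycles sidesteps the sawtooth that a raw daily percentage would show.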
Comment 5•9 years ago
There is telemetry work being done and this bug won't provide anything useful, so closing.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX