Bug 856315 (Closed): [FHR] Outlier startup time destroys chart usability
Opened 12 years ago · Closed 12 years ago
Categories: Firefox Health Report Graveyard :: Web: Health Report (defect, P1)
Tracking: Not tracked
Status: VERIFIED FIXED
People: Reporter: dre; Assigned: espressive
Attachments (10 files)
- 32.90 KB, application/json
- 32.74 KB, application/json
- 12.90 KB, text/plain
- 13.96 KB, application/json
- 57.85 KB, application/json
- 13.43 KB, text/plain
- 2.38 KB, text/plain
- 3.10 KB, text/plain
- 2.75 KB, text/plain
- 893.10 KB, image/png
See this screenshot:
http://www.screencast.com/t/kOAqjmpmlha
On the 25th, I have an odd startup time recorded which is greater than 24 hours. :/
"2013-03-25": {
"org.mozilla.searches.counts": {
"_v": 1,
"google.searchbar": 2
},
"org.mozilla.appSessions.previous": {
"_v": 3,
"cleanActiveTicks": [
6,
2008
],
"cleanTotalTime": [
63,
322923
],
"main": [
6231,
1908,
88916237
],
"firstPaint": [
10001,
4863,
88920119
],
"sessionRestored": [
10122,
4917,
88920340
],
"abortedActiveTicks": [
1716
],
"abortedTotalTime": [
88817
]
}
},
I don't know how that happened, but we need to think of some ways to prevent it from rendering the chart useless by blowing out the Y axis.
Updated • 12 years ago
Summary: Outlier startup time destroys chart usability → [FHR] Outlier startup time destroys chart usability
Assignee
Comment 1 • 12 years ago
So, this seems to have more to do with the data than with the graph itself. I cannot see how one would ever have a startup time of 24 hours ;)
Reporter
Comment 2 • 12 years ago
It is true that the data is bad, but if it happened to me, it is likely it could happen to others.
I think it would be worthwhile to try to preserve the usefulness of the chart by locking the Y scale, controlling the outliers, or notifying the user that they have bad data.
Comment 3 • 12 years ago
Suggestion: make the maximum of the Y scale the 99th percentile of the data. Apart from using a log scale (which I doubt will happen), I don't see how one can fix the values.
Comment 4 • 12 years ago
What little I know about user interaction design says that we shouldn't notify the user of a problem they have no control over. Therefore, we should silently throw away the data. Please don't accept this opinion as a substitute for asking a real UX expert.
Comment 5 • 12 years ago
Yes, the 99th percentile is robust against outliers. Use this limit and silently omit these outliers. This is probably too late, but one could have a tooltip indicating that there are points beyond the Y limit.
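A minimal sketch of that percentile cap, assuming the chart code can be handed an explicit Y maximum (the helper names here are hypothetical, not the dashboard's actual API):

function percentile(values, p) {
  // Nearest-rank percentile over a copy of the data.
  var sorted = values.slice().sort(function (a, b) { return a - b; });
  var idx = Math.max(0, Math.ceil(p * sorted.length) - 1);
  return sorted[idx];
}

function yAxisMax(startupTimes) {
  // Cap the Y axis at the 99th percentile so a single bad point
  // cannot blow out the scale for everyone else.
  return percentile(startupTimes, 0.99);
}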
Comment 6 • 12 years ago
(In reply to Saptarshi Guha from comment #5)
> Yes, the 99% is robust against outliers. Use this limit and silently not
> show these outliers. This is probably too late, but one can have a tool tip
> indicating that there are points beyond the y-limit.
I think for right now, I'd be OK with not telling the user that we're controlling the data. I don't know how a tooltip would add value for the user.
At some other point in time, we could add a note in the Raw Data page about how we're displaying the graph, but I don't think it should be in the main user-facing dashboard.
Reporter
Comment 7 • 12 years ago
Providing initial triage, but I don't have the final say.
Not marking this as a blocker because it is likely that most people will be starting with fresh payloads as this rolls into beta.
That said, I believe it is critical to get some form of data filtering in place soon, so that we don't end up with so many people hitting this and getting a useless report that it causes a PR issue.
Severity: normal → critical
Priority: -- → P1
Assignee
Updated • 12 years ago
Assignee: nobody → sneethling
Updated • 12 years ago
Component: General → about:healthreport
Product: Webtools → Firefox Health Report
Comment 8 • 12 years ago
Here is the healthreport payload from my profile. The data point from 2013-04-08 contains a bizarrely high startup time.
Comment 9 • 12 years ago
I too have a startup time of 78 hours. See data$days$2013-04-08.
Comment 10 • 12 years ago
For the web pieces, we should discard any session where the startup time is > 5 minutes, and discard the whole session, not just the startup measurement. I've yet to see such a case where the entire session wasn't completely broken.
Comment 11 • 12 years ago
I also had a high startup time on April 8.
I believe it was a day that I opened Firefox on the release version.
Comment 12 • 12 years ago
Reporter
Comment 13 • 12 years ago
This document contains a bad startup time on April 11. The odd thing is, I did not see this bad time between the 14th and the 16th. I can't be sure exactly when it appeared, but I feel pretty confident it arrived sometime after the 16th.
Comment 14 • 12 years ago
Looks like I'm the latest casualty, with a firstPaint time of 56037046 ms on April 18. My main startup is 56033795 ms.
Did anyone else notice whether it's the main time that is driving the huge values, or does it ever occur in the delta between two of the three startup times?
I'm wondering if there can be a session start that isn't completed properly and isn't registered in the payload until some event happens a few days later, so it looks like it has been running the whole time.
Comment 15 • 12 years ago
Comment 16 • 12 years ago
We should have proper detection of aborted sessions.
Comment 17 • 12 years ago
I ran a job to count startup outliers (which I arbitrarily defined as "firstPaint longer than ten minutes"). Attached is the per-day count. The counts were calculated by iterating through each day in each payload.
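For illustration, a sketch of the per-payload counting described above, written in JavaScript against the payload shape shown earlier in this bug (the actual job was presumably run server-side and not necessarily in JS):

var TEN_MINUTES_MS = 10 * 60 * 1000;

function countOutlierDays(payload) {
  var count = 0;
  for (var day in payload.days) {
    var prev = payload.days[day]["org.mozilla.appSessions.previous"];
    if (!prev || !prev.firstPaint) {
      continue;
    }
    // Count the day if any recorded firstPaint exceeds ten minutes.
    var hasOutlier = prev.firstPaint.some(function (t) {
      return t > TEN_MINUTES_MS;
    });
    if (hasOutlier) {
      count += 1;
    }
  }
  return count;
}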
Comment 18 • 12 years ago
This is another view into the firstPaint outliers, this time by firstPaint time (in hours).
Reporter
Comment 19 • 12 years ago
(In reply to Mark Reid [:mreid] from comment #17)
> Created attachment 740289 [details]
> Count of payloads with startup outliers by day
>
> I ran a job to count startup outliers (which I arbitrarily defined as
> "firstPaint longer than ten minutes"). Attached is the per-day count. The
> counts were calculated by iterating through each day in each payload.
It would be best to partition this data out by channel and/or build number.
Saptarshi, is this something that you currently have in your aggregates?
Flags: needinfo?(sguha)
Flags: needinfo?(mreid)
Comment 20 • 12 years ago
Actually, it would be best to use dzeber's outlier detection rule to determine how many documents in a cohort actually trigger it. That tells us how many documents would have had the chart mangled by outliers in the absence of the rule.
So I would compute this proportion by channel.
Flags: needinfo?(sguha) → needinfo?(dzeber)
Comment 23 • 12 years ago
As Saptarshi suggested, I ran my outlier detection algorithm and computed proportions of hits, for all documents whose current build is March 1 or later. The table shows the proportions of installations whose "Average" plot would show outliers, broken down by channel/os.
channel  os       outlier    total        prop
aurora   Darwin       991    13556  0.07310416
aurora   Linux        104     4211  0.02469722
aurora   WINNT      15470   377099  0.04102371
beta     Darwin      4953   153432  0.03228140
beta     Linux        117    11281  0.01037142
beta     SunOS          0        1  0.00000000
beta     WINNT     231464  7032527  0.03291335
nightly  Darwin       790     6959  0.11352206
nightly  Linux        336     5013  0.06702573
nightly  WINNT      12978   260571  0.04980600
The proportions for nightly all seem to be higher than their counterparts in other channels, which would make sense as nightly is the least stable channel. We also see that across all channels, Mac is the worst offender, suggesting that the problem is OS-dependent.
One comment is that these outliers are not necessarily all "weird" measurements - they are cases where one or a few points are far removed from the bulk of the values. For example, a case with 1 time of 20s and the rest all below 3s would be flagged under this rule. But it does indicate instability in the startup times.
Flags: needinfo?(dzeber)
Comment 24 • 12 years ago
Here are some more detailed outlier numbers.
For each channel/os pair, the documents are split according to the size of their largest daily median (maxtime). For a given channel/os pair, facet.prop gives the proportion falling in each maxtime bucket. Then, for each channel/os/maxtime combination, we have the proportion flagged as outliers.
Interestingly, for beta with maxtime > 1 hr, quite low proportions are flagged as having outliers (in the 30-50% range). This suggests that for most such documents, no startup times can be considered "unusual" compared to the others. In other words, installations with a very long startup time tend to have several of them. On the other hand, for nightly with maxtime > 1 hr, the percentages with outliers are all around 80%, meaning that in most cases we have a very small number of unusually large times. Aurora numbers seem to fall in between.
This suggests that nightly builds still suffer from problems with weird values, but issues causing systematic long startups are getting fixed.
Updated • 12 years ago
Flags: needinfo?(mreid)
Assignee
Comment 26 • 12 years ago
Embedding mail from Gregory Szorc on the mailing list.
Is everyone fine with implementing it as described?
"In versions 1 and 2 of the sessions provider, total session time was reported in milliseconds, not seconds. When version 3 landed in bug 837238 on Feb 5 in Firefox 21, we didn't add code to munge existing sessions from milliseconds to seconds because we figured the problem would only impact Nightly users (since FHR is first enabled in 21 and 21 was in Nightly at the time).
What I just realized today is that part of session recording is in 20. 20 is recording up to 7 days of sessions in preferences, in milliseconds. When these users upgrade to 21, FHR will import these sessions from preferences and record them under v3 sessions (reported in seconds). And thus we have excessively long session times.
The simplest solution is to convert unreasonable large session times when you see them. For reference:
60,000 milliseconds in a minute
86,400 seconds in a day
300,000 milliseconds in 5 minutes
600,000 milliseconds in 10 minutes
604,800 seconds in a week
864,000 seconds in 10 days
1,800,000 milliseconds in 30 minutes
2,592,000 seconds in 30 days
3,600,000 milliseconds in 1 hour
7,776,000 seconds in 90 days
31,536,000 seconds in 365 days
86,400,000 milliseconds in a day
Say anything above 2.592M is automatically divided by 1000.
A better solution is to look in the version history in the record. We should only perform the division if 20 was seen or if the start day of the session was before 21 was installed.
With the 2.592M threshold, I re-ran a job analyzing median session times. Before, 13% of median session lengths on the beta channel were 3+ days. After, it's ~1%. The after number is more in line with what others have said is reality.
FAQ
Q. Can we fix the client?
A. We could, sure. However, there are millions of users on Beta/21 that have already reported sessions in milliseconds. And there are millions more on 20 that will soon report sessions in milliseconds (at least for a few days). Uplifting to Beta/21 at this juncture would be difficult. So I fear we'll just have to deal with it in all consumers of the data. Sad panda."
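A minimal sketch of the "2.592M threshold" heuristic from the mail above, assuming session total times arrive as plain numbers (the function name is hypothetical):

// 2,592,000 seconds in 30 days: any "seconds" value above this is
// assumed to actually be milliseconds left over from v1/v2 sessions.
var THIRTY_DAYS_IN_SECONDS = 2592000;

function normalizeSessionTime(totalTime) {
  return totalTime > THIRTY_DAYS_IN_SECONDS
    ? Math.round(totalTime / 1000)
    : totalTime;
}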
Assignee
Comment 27 • 12 years ago
Looking at the JSON payload under days, I see that the .previous section contains a property _v (almost all 'sections' have this, with different values) with a value of, say, 3.
Looking at the current documentation, it has the line 'Version 3' at the top of the section on 'org.mozilla.appSessions.previous'. This got me wondering: does this correlate with the versions of the sessions provider that Gregory mentions in my comment above?
If so, then I can simply use this as my indicator to determine whether the paintTime for a session would have been recorded in seconds or milliseconds, and divide by 1000 as appropriate.
Also, looking at the current code for getAllStartupTimes as well as calculateMedianStartupTime, all paintTimes are currently treated as milliseconds and the results of these calculations are always divided by 1000 before being returned, which is definitely leading to incorrect data being displayed on the graph at the moment.
So the work on this bug will then also resolve any related issues.
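A sketch of what that version-based check could look like, mirroring the speculation above (the threshold approach in comment 28 was ultimately chosen instead; the helper name is hypothetical, and the _v field follows the payload in this bug):

function toSeconds(section, value) {
  // Speculative: if _v tracks the sessions-provider version, values
  // recorded under _v < 3 would be milliseconds and need dividing.
  return section._v < 3 ? value / 1000 : value;
}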
Assignee
Comment 28 • 12 years ago
Following on from IRC, it has been decided that we are going to use the 2.592M threshold to handle outliers, as this is the most reliable approach.
Comment 29 • 12 years ago
Dumping an IRC discussion into the bug:
There are two distinct cases for outliers that must not be conflated:
* sessions where totalTime is recorded in milliseconds instead of seconds, detailed in comment 26
* sessions where startup times are wildly inflated. In the cases where we've inspected actual measurements, all of the startup metrics are wildly long (the longest I've seen was six days to firstPaint), which implies the recorded session start is incorrect for an unknown reason and leads to the other measurements (all measured as offsets from it) being incredibly high.
For the second case, these extreme outliers are clearly something we should discard. I'd suggest that if either main or firstPaint is higher than 300000 milliseconds, it is appropriate to exclude that session as an extreme outlier.
Assignee
Comment 30 • 12 years ago
Thanks everyone for working on this. To summarize:
1) For total session time, we are using a top threshold of 2.592M and dividing everything above that by 1000
2) For startup times, we are excluding any negative values (already implemented) and then discarding anything that is above 300,000 milliseconds from the graph.
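Putting both rules together, a minimal sketch of the cleanup, assuming a per-session object view of the data (the helper names are hypothetical, not the dashboard's actual code):

var MS_THRESHOLD_S = 2592000;  // 30 days in seconds; above this, treat as ms
var MAX_STARTUP_MS = 300000;   // 5 minutes; above this, drop the session

function cleanSessions(sessions) {
  return sessions
    // Rule 2: discard extreme startup outliers (negative values are
    // already excluded elsewhere).
    .filter(function (s) {
      return s.main <= MAX_STARTUP_MS && s.firstPaint <= MAX_STARTUP_MS;
    })
    // Rule 1: total times above the threshold were recorded in
    // milliseconds, so convert them to seconds.
    .map(function (s) {
      if (s.totalTime > MS_THRESHOLD_S) {
        s.totalTime = Math.round(s.totalTime / 1000);
      }
      return s;
    });
}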
Assignee
Comment 31 • 12 years ago
A couple of pull requests related to this have been merged into master. If someone can take this for a test run, that would be awesome.
Comment 32 • 12 years ago
It wfm, fwiw.
Let's get it staged and QA can have at it.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Comment 33 • 12 years ago
QA verified on dev - the outlier times that were present (> 3.0 seconds) on the graphs are no longer present.
Both the 'Average' and 'All' graphs look reasonable.
Assignee
Comment 34 • 12 years ago
(In reply to Matt Brandt [:mbrandt] from comment #33)
> Created attachment 744192 [details]
> qa - verified on dev
>
> QA verified on dev - the outlier times that were present (> 3.0 seconds) on
> the graphs are no longer present.
>
> Both the 'Average' and 'All' graphs look reasonable.
Thanks Matt, can we bump this to verified?
Comment 35 • 12 years ago
Thanks Schalk, I missed bumping the status :)
Bumping to verified.
Status: RESOLVED → VERIFIED
Updated • 7 years ago
Product: Firefox Health Report → Firefox Health Report Graveyard