Bug 856315 (Closed): [FHR] Outlier startup time destroys chart usability
Opened 12 years ago · Closed 12 years ago
Categories: Firefox Health Report Graveyard :: Web: Health Report (defect, P1)
Tracking: Not tracked
Status: VERIFIED FIXED
People: Reporter: dre; Assigned: espressive
Attachments (10 files)
- 32.90 KB, application/json
- 32.74 KB, application/json
- 12.90 KB, text/plain
- 13.96 KB, application/json
- 57.85 KB, application/json
- 13.43 KB, text/plain
- 2.38 KB, text/plain
- 3.10 KB, text/plain
- 2.75 KB, text/plain
- 893.10 KB, image/png
See this screenshot:
http://www.screencast.com/t/kOAqjmpmlha
On the 25th, I have an odd startup time recorded which is greater than 24 hours. :/
"2013-03-25": {
"org.mozilla.searches.counts": {
"_v": 1,
"google.searchbar": 2
},
"org.mozilla.appSessions.previous": {
"_v": 3,
"cleanActiveTicks": [
6,
2008
],
"cleanTotalTime": [
63,
322923
],
"main": [
6231,
1908,
88916237
],
"firstPaint": [
10001,
4863,
88920119
],
"sessionRestored": [
10122,
4917,
88920340
],
"abortedActiveTicks": [
1716
],
"abortedTotalTime": [
88817
]
}
},
I don't know how that happened, but we need to think of some ways to prevent it from rendering the chart useless by blowing out the Y axis.
Updated • 12 years ago
Summary: Outlier startup time destroys chart usability → [FHR] Outlier startup time destroys chart usability
Assignee
Comment 1 • 12 years ago
So, this seems to have more to do with the data than with the graph itself. I cannot see how one would ever have a startup time of 24 hours ;)
Reporter
Comment 2 • 12 years ago
It is true that the data is bad, but if it happened to me, it is likely it could happen to others.
I think it would be worthwhile to try to preserve the usefulness of the chart by locking the Y scale, controlling the outliers, or notifying the user that they have bad data.
Comment 3 • 12 years ago
Suggestion: make the maximum of the Y scale the 99th percentile of the data. Apart from using a log scale (which I doubt will happen), I don't see how one can fix the values.
Comment 4 • 12 years ago
What little I know about user interaction design says that we shouldn't notify the user of a problem they have no control over. Therefore, we should silently throw away the data. Please don't accept this opinion as a substitute for asking a real UX expert.
Comment 5 • 12 years ago
Yes, the 99th percentile is robust against outliers. Use this limit and silently omit these outliers. This is probably too late, but one could have a tooltip indicating that there are points beyond the Y limit.
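A minimal sketch of that percentile cap, assuming the chart code can be handed an explicit Y maximum (the helper names here are hypothetical, not the dashboard's actual API):

function percentile(values, p) {
  // Nearest-rank percentile over a copy of the data.
  var sorted = values.slice().sort(function (a, b) { return a - b; });
  var idx = Math.max(0, Math.ceil(p * sorted.length) - 1);
  return sorted[idx];
}

function yAxisMax(startupTimes) {
  // Cap the Y axis at the 99th percentile so a single bad point
  // cannot blow out the scale for everyone else.
  return percentile(startupTimes, 0.99);
}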
Comment 6 • 12 years ago
(In reply to Saptarshi Guha from comment #5)
> Yes, the 99% is robust against outliers. Use this limit and silently not
> show these outliers. This is probably too late, but one can have a tool tip
> indicating that there are points beyond the y-limit.
I think for right now, I'd be OK with not telling the user that we're controlling the data. I don't know how a tooltip would add value for the user.
At some other point in time, we could add a note in the Raw Data page about how we're displaying the graph, but I don't think it should be in the main user-facing dashboard.
Reporter
Comment 7 • 12 years ago
Providing initial triage, but I don't have the final say.
Not marking this as a blocker because it is likely that most people will be starting with fresh payloads as this rolls into beta.
That said, I believe it is critical to get some form of data filtering in place soon, so that we don't end up with so many people hitting this and getting a useless report that it causes a PR issue.
Severity: normal → critical
Priority: -- → P1
Assignee
Updated • 12 years ago
Assignee: nobody → sneethling
Updated • 12 years ago
Component: General → about:healthreport
Product: Webtools → Firefox Health Report
Comment 8 • 12 years ago
Here is the healthreport payload from my profile. The data point from 2013-04-08 contains a bizarrely high startup time.
Comment 9 • 12 years ago
I too have a startup time of 78 hours. See data$days$2013-04-08.
Comment 10 • 12 years ago
For the web pieces, we should discard any session where the startup time is > 5 minutes, and discard the whole session, not just the startup measurement. I've yet to see such a case where the entire session wasn't completely broken.
Comment 11 • 12 years ago
I also had a high startup time on April 8.
I believe it was a day that I opened Firefox on the release version.
Comment 12 • 12 years ago
Reporter
Comment 13 • 12 years ago
This document contains a bad startup time on April 11. The odd thing is, I did not see this bad time between the 14th and the 16th. I can't be sure exactly when it appeared, but I feel pretty confident it arrived sometime after the 16th.
Comment 14 • 12 years ago
Looks like I'm the latest casualty, with a firstPaint time of 56037046 ms on April 18. My main startup is 56033795 ms.
Did anyone else notice whether it's the main time that is driving the huge values, or does it ever occur in the delta between two of the three startup times?
I'm wondering if there can be a session start that isn't completed properly and isn't registered in the payload until some event happens a few days later, so it looks like it has been running the whole time.
Comment 15 • 12 years ago
Comment 16 • 12 years ago
We should have proper detection of aborted sessions.
Comment 17 • 12 years ago
I ran a job to count startup outliers (which I arbitrarily defined as "firstPaint longer than ten minutes"). Attached is the per-day count. The counts were calculated by iterating through each day in each payload.
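For illustration, a sketch of the per-payload counting described above, written in JavaScript against the payload shape shown earlier in this bug (the actual job was presumably run server-side and not necessarily in JS):

var TEN_MINUTES_MS = 10 * 60 * 1000;

function countOutlierDays(payload) {
  var count = 0;
  for (var day in payload.days) {
    var prev = payload.days[day]["org.mozilla.appSessions.previous"];
    if (!prev || !prev.firstPaint) {
      continue;
    }
    // Count the day if any recorded firstPaint exceeds ten minutes.
    var hasOutlier = prev.firstPaint.some(function (t) {
      return t > TEN_MINUTES_MS;
    });
    if (hasOutlier) {
      count += 1;
    }
  }
  return count;
}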
Comment 18 • 12 years ago
This is another view into the firstPaint outliers, this time by firstPaint time (in hours).
Reporter
Comment 19 • 12 years ago
(In reply to Mark Reid [:mreid] from comment #17)
> Created attachment 740289 [details]
> Count of payloads with startup outliers by day
>
> I ran a job to count startup outliers (which I arbitrarily defined as
> "firstPaint longer than ten minutes"). Attached is the per-day count. The
> counts were calculated by iterating through each day in each payload.
It would be best to partition this data out by channel and/or build number.
Saptarshi, is this something that you currently have in your aggregates?
Flags: needinfo?(sguha)
Flags: needinfo?(mreid)
Comment 20 • 12 years ago
Actually, it would be best to use dzeber's outlier detection rule to determine how many documents in a cohort actually trigger it. That tells us how many documents would have had the chart mangled by outliers in the absence of the rule.
So I would compute this proportion by channel.
Flags: needinfo?(sguha) → needinfo?(dzeber)
Comment 23 • 12 years ago
As Saptarshi suggested, I ran my outlier detection algorithm and computed proportions of hits, for all documents whose current build is March 1 or later. The table shows the proportions of installations whose "Average" plot would show outliers, broken down by channel/os.
channel  os       outlier    total        prop
aurora   Darwin       991    13556  0.07310416
aurora   Linux        104     4211  0.02469722
aurora   WINNT      15470   377099  0.04102371
beta     Darwin      4953   153432  0.03228140
beta     Linux        117    11281  0.01037142
beta     SunOS          0        1  0.00000000
beta     WINNT     231464  7032527  0.03291335
nightly  Darwin       790     6959  0.11352206
nightly  Linux        336     5013  0.06702573
nightly  WINNT      12978   260571  0.04980600
The proportions for nightly all seem to be higher than their counterparts in other channels, which would make sense as nightly is the least stable channel. We also see that across all channels, Mac is the worst offender, suggesting that the problem is OS-dependent.
One comment is that these outliers are not necessarily all "weird" measurements - they are cases where one or a few points are far removed from the bulk of the values. For example, a case with 1 time of 20s and the rest all below 3s would be flagged under this rule. But it does indicate instability in the startup times.
Flags: needinfo?(dzeber)
Comment 24 • 12 years ago
Here are some more detailed outlier numbers.
For each channel/os pair, the documents are split according to the size of their largest daily median (maxtime). For a given channel/os pair, facet.prop gives the proportion falling in each maxtime bucket. Then, for each channel/os/maxtime combination, we have the proportion flagged as outliers.
Interestingly, for beta with maxtime > 1 hr, quite low proportions are flagged as having outliers (in the 30-50% range). This suggests that for most such documents, no startup times can be considered "unusual" compared to the others. In other words, installations with a very long startup time tend to have several of them. On the other hand, for nightly with maxtime > 1 hr, the percentages with outliers are all around 80%, meaning that in most cases we have a very small number of unusually large times. Aurora numbers seem to fall in between.
This suggests that nightly builds still suffer from problems with weird values, but issues causing systematic long startups are getting fixed.
Updated • 12 years ago
Flags: needinfo?(mreid)
Assignee
Comment 26 • 12 years ago
Embedding mail from Gregory Szorc on the mailing list.
Is everyone fine with implementing it as described?
"In versions 1 and 2 of the sessions provider, total session time was reported in milliseconds, not seconds. When version 3 landed in bug 837238 on Feb 5 in Firefox 21, we didn't add code to munge existing sessions from milliseconds to seconds because we figured the problem would only impact Nightly users (since FHR is first enabled in 21 and 21 was in Nightly at the time).
What I just realized today is that part of session recording is in 20. 20 is recording up to 7 days of sessions in preferences, in milliseconds. When these users upgrade to 21, FHR will import these sessions from preferences and record them under v3 sessions (reported in seconds). And thus we have excessively long session times.
The simplest solution is to convert unreasonable large session times when you see them. For reference:
60,000 milliseconds in a minute
86,400 seconds in a day
300,000 milliseconds in 5 minutes
600,000 milliseconds in 10 minutes
604,800 seconds in a week
864,000 seconds in 10 days
1,800,000 milliseconds in 30 minutes
2,592,000 seconds in 30 days
3,600,000 milliseconds in 1 hour
7,776,000 seconds in 90 days
31,536,000 seconds in 365 days
86,400,000 milliseconds in a day
Say anything above 2.592M is automatically divided by 1000.
A better solution is to look in the version history in the record. We should only perform the division if 20 was seen or if the start day of the session was before 21 was installed.
With the 2.592M threshold, I re-ran a job analyzing median session times. Before, 13% of median session lengths on the beta channel were 3+ days. After, it's ~1%. The after number is more in line with what others have said is reality.
FAQ
Q. Can we fix the client?
A. We could, sure. However, there are millions of users on Beta/21 that have already reported sessions in milliseconds. And there are millions more on 20 that will soon report sessions in milliseconds (at least for a few days). Uplifting to Beta/21 at this juncture would be difficult. So I fear we'll just have to deal with it in all consumers of the data. Sad panda."
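A minimal sketch of the "2.592M threshold" heuristic from the mail above, assuming session total times arrive as plain numbers (the function name is hypothetical):

// 2,592,000 seconds in 30 days: any "seconds" value above this is
// assumed to actually be milliseconds left over from v1/v2 sessions.
var THIRTY_DAYS_IN_SECONDS = 2592000;

function normalizeSessionTime(totalTime) {
  return totalTime > THIRTY_DAYS_IN_SECONDS
    ? Math.round(totalTime / 1000)
    : totalTime;
}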
Assignee
Comment 27 • 12 years ago
Looking at the JSON payload under days, I see that the .previous section contains a property _v (almost all 'sections' have this, with different values) with a value of, say, 3.
Looking at the current documentation, it has the line 'Version 3' at the top of the section on 'org.mozilla.appSessions.previous'. This got me wondering: does this correlate with the versions of the sessions provider that Gregory mentions in my comment above?
If so, then I can simply use this as my indicator to determine whether the paintTime for a session would have been recorded in seconds or milliseconds, and divide by 1000 as appropriate.
Also, looking at the current code for getAllStartupTimes as well as calculateMedianStartupTime, all paintTimes are currently treated as milliseconds and the results of these calculations are always divided by 1000 before being returned, which is definitely leading to incorrect data being displayed on the graph at the moment.
So the work on this bug will then also resolve any related issues.
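A sketch of what that version-based check could look like, mirroring the speculation above (the threshold approach in comment 28 was ultimately chosen instead; the helper name is hypothetical, and the _v field follows the payload in this bug):

function toSeconds(section, value) {
  // Speculative: if _v tracks the sessions-provider version, values
  // recorded under _v < 3 would be milliseconds and need dividing.
  return section._v < 3 ? value / 1000 : value;
}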
Assignee
Comment 28 • 12 years ago
Following on from IRC, it has been decided that we are going to use the 2.592M threshold to handle outliers, as this is the most reliable approach.
Comment 29 • 12 years ago
Dumping an IRC discussion into the bug:
There are two distinct cases for outliers that must not be conflated:
* sessions where totalTime is recorded in milliseconds instead of seconds, detailed in comment 26
* sessions where startup times are wildly inflated. In the cases where we've inspected actual measurements, all of the startup metrics are wildly long (the longest I've seen was six days to firstPaint), which implies the recorded session start is incorrect for an unknown reason and leads to the other measurements (all measured as offsets from it) being incredibly high.
For the second case, these extreme outliers are clearly something we should discard. I'd suggest that if either main or firstPaint is higher than 300000 milliseconds, it is appropriate to exclude that session as an extreme outlier.
Assignee
Comment 30 • 12 years ago
Thanks everyone for working on this. To summarize:
1) For total session time, we are using a top threshold of 2.592M and dividing everything above that by 1000
2) For startup times, we are excluding any negative values (already implemented) and then discarding anything that is above 300,000 milliseconds from the graph.
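Putting both rules together, a minimal sketch of the cleanup, assuming a per-session object view of the data (the helper names are hypothetical, not the dashboard's actual code):

var MS_THRESHOLD_S = 2592000;  // 30 days in seconds; above this, treat as ms
var MAX_STARTUP_MS = 300000;   // 5 minutes; above this, drop the session

function cleanSessions(sessions) {
  return sessions
    // Rule 2: discard extreme startup outliers (negative values are
    // already excluded elsewhere).
    .filter(function (s) {
      return s.main <= MAX_STARTUP_MS && s.firstPaint <= MAX_STARTUP_MS;
    })
    // Rule 1: total times above the threshold were recorded in
    // milliseconds, so convert them to seconds.
    .map(function (s) {
      if (s.totalTime > MS_THRESHOLD_S) {
        s.totalTime = Math.round(s.totalTime / 1000);
      }
      return s;
    });
}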
Assignee
Comment 31 • 12 years ago
A couple of pull requests related to this have been merged into master. If someone can take this for a test run, that would be awesome.
Comment 32 • 12 years ago
It wfm, fwiw.
Let's get it staged and QA can have at it.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Comment 33 • 12 years ago
QA verified on dev - the outlier times that were present (> 3.0 seconds) on the graphs are no longer present.
Both the 'Average' and 'All' graphs look reasonable.
Assignee
Comment 34 • 12 years ago
(In reply to Matt Brandt [:mbrandt] from comment #33)
> Created attachment 744192 [details]
> qa - verified on dev
>
> QA verified on dev - the outlier times that were present (> 3.0 seconds) on
> the graphs are no longer present.
>
> Both the 'Average' and 'All' graphs look reasonable.
Thanks Matt, can we bump this to verified?
Comment 35 • 12 years ago
Thanks Schalk, I missed bumping the status :)
Bumping to verified.
Status: RESOLVED → VERIFIED
Updated • 7 years ago
Product: Firefox Health Report → Firefox Health Report Graveyard