765850 - Compare startup(firstPaint, firstloaduri) for different values of STARTUP_USING_PRELOAD_TRIAL

Reporter

Description

12 years ago

We've set up a randomized trial of different Firefox loading mechanisms in bug 764905. Builds should be reporting STARTUP_USING_PRELOAD_TRIAL starting on 2012-06-17.

Brian R. Bondy [:bbondy]

Comment 1

12 years ago

Any update on this? We're waiting on this data to decide whether we should disable the feature on Aurora before the next migration. If we are going to disable it on Aurora, we should probably do so soon.

"Saptarshi Guha[:joy]"

Comment 2

12 years ago

Is the channel to look at Aurora or Nightly?

"Saptarshi Guha[:joy]"

Comment 3

12 years ago

Also:
1. What is firstloaduri?
2. What is STARTUP_USING_PRELOAD_TRIAL (and what do the values mean)?
3. What is the hypothesis?

It would be helpful if these were mentioned up front.

Brian R. Bondy [:bbondy]

Comment 4

12 years ago

> is the channel to look at Aurora or Nightly?

Nightly

> 1. what is firstloaduri

I think this is the time it takes to load the first URI.

> what is  STARTUP_USING_PRELOAD_TRIAL (and what do the values mean?)

It's a randomly flipped value. There are 4 possible values:
0 - prefetch is enabled and preload is enabled
1 - prefetch is enabled and preload is disabled
2 - prefetch is disabled and preload is enabled
3 - prefetch is disabled and preload is disabled

> 3. What is the hypothesis?

We believe that preload is not helping in any case. We also believe that disabling prefetch is a net good.

Annie Elliott

Updated

12 years ago

Status: NEW → ASSIGNED

"Saptarshi Guha[:joy]"

Comment 5

12 years ago

Hello,
(FALSE is disabled, TRUE is enabled)

                      preload
prefetch     FALSE      TRUE
   FALSE 0.2158907 0.2174653
   TRUE  0.2836624 0.2829815

The observation here is that (a) preload is split 50/50 between enabled and disabled, but (b) prefetch is biased towards enabled:
prefetch
   FALSE     TRUE 
0.433356 0.566644 

Q: Is this by design? It could well have been part of the design plan, but if not, you might want to ask why.

Also, if Z := STARTUP_USING_PRELOAD_TRIAL, then
prefetch = 1 if Z is 0 or 1, else prefetch = 0
preload = 1 if Z is 0 or 2, else preload = 0
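
The mapping above can be sketched in code. This is only an illustration in Python, not the code used for the analysis: the function name `decode_trial` and the dictionary `share_by_z` are hypothetical, and the shares are the four proportions from the table above, re-keyed by trial value.

```python
def decode_trial(z):
    """Map a STARTUP_USING_PRELOAD_TRIAL value (0-3) to (prefetch, preload) flags."""
    if z not in (0, 1, 2, 3):
        raise ValueError(f"unexpected trial value: {z!r}")
    prefetch = z in (0, 1)  # prefetch is enabled for values 0 and 1
    preload = z in (0, 2)   # preload is enabled for values 0 and 2
    return prefetch, preload


# Share of submissions per trial value, copied from the 2x2 table above.
share_by_z = {0: 0.2829815, 1: 0.2836624, 2: 0.2174653, 3: 0.2158907}

# Sum the joint shares into one marginal split per factor.
prefetch_share = {True: 0.0, False: 0.0}
preload_share = {True: 0.0, False: 0.0}
for z, share in share_by_z.items():
    prefetch, preload = decode_trial(z)
    prefetch_share[prefetch] += share
    preload_share[preload] += share

print(prefetch_share)
print(preload_share)
```

Summing this way gives roughly 0.567 enabled versus 0.433 disabled for prefetch, and close to an even split for preload, which matches the marginals quoted above.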

Brian R. Bondy [:bbondy]

Comment 6

12 years ago

Are these numbers correlated with the startup time, or are you just giving the distribution of the values? The distribution of the true/false values does not matter, and we can already see it on the histogram.

Brian R. Bondy [:bbondy]

Comment 7

12 years ago

What we're looking for is a comparison of the startup time in each of the 4 groups (prefetch? x preload?) to see which group gives the best startup time.
Also, could you only consider values for builds after June 16, 2012?

"Saptarshi Guha[:joy]"

Comment 8

12 years ago

I've completed this and will update the bug with the final results tomorrow morning (EDT).

Brian R. Bondy [:bbondy]

Comment 9

12 years ago

Sounds great, thanks!

"Saptarshi Guha[:joy]"

Comment 10

12 years ago

Attached file: Summary of Bug

Data Used:
    if(json$info$appName == 'Firefox'
     && json$info$OS   == 'WINNT'
     && json$info$appUpdateChannel == "nightly")

where json is the Telemetry Data packet.
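
For readers who do not use R, the same filter can be written in Python. This is a sketch under the assumption that the packet has been parsed into a nested dict with the same field names as in the condition above; the function name `is_in_sample` is hypothetical.

```python
def is_in_sample(packet):
    """Return True for Firefox submissions on Windows from the nightly channel."""
    info = packet.get("info", {})
    return (
        info.get("appName") == "Firefox"
        and info.get("OS") == "WINNT"
        and info.get("appUpdateChannel") == "nightly"
    )


# Hypothetical example packets, reduced to the fields the filter reads.
packets = [
    {"info": {"appName": "Firefox", "OS": "WINNT", "appUpdateChannel": "nightly"}},
    {"info": {"appName": "Firefox", "OS": "Linux", "appUpdateChannel": "nightly"}},
    {"info": {"appName": "Firefox", "OS": "WINNT", "appUpdateChannel": "aurora"}},
]
selected = [p for p in packets if is_in_sample(p)]
print(len(selected))
```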

Dates: "20120617" ... "20120623"


uri := firstLoadURI
paint := firstPaint
(both in seconds)

prefetch:= STARTUP_USING_PRELOAD_TRIAL in (0,1)
preload:=  STARTUP_USING_PRELOAD_TRIAL in (0,2)

After dropping rows with missing values, 47,005 rows remain (1,757 were dropped because of missing variables).
See the summary at the end.

1. uri and paint are highly correlated (see page 1 of the attached PDF), so I only looked at paint.

2. The distribution of log(paint) is skewed. Its percentiles:

        0%         1%        10%        20%        30%        40%        50%        60%        70%        80%        90%        99%       100% 
-1.7092582 -0.7571525 -0.1098149  0.3625576  0.7309245  1.1079023  1.4747630  1.8528859  2.2452314  2.6762981  3.3448598  4.9689482 12.9452073 

To see the dependence of paint on preload and prefetch, we did a piecewise ANOVA, which avoids having to drop the top and bottom 1%.


3. There is little to no interaction between prefetch and preload (see page 2), so the model used leaves out that interaction term. It also has the lowest BIC (Bayesian Information Criterion, used for choosing simpler models from a collection):

log(paint) ~ preload + prefetch + pwhere:preload + pwhere:prefetch

where pwhere = 0 if log(paint) is below its 1st percentile, 1 if log(paint) is between its 1st and 99th percentiles, and 2 if log(paint) is above its 99th percentile,
and fpaint is defined as in (2) above.
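
A minimal sketch of how the pwhere indicator could be derived, in Python. The quantile method of the original analysis is not stated, so linear interpolation between order statistics is an assumption here, and the names `percentile` and `pwhere_labels` are hypothetical.

```python
import math


def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 1]) of an already sorted list."""
    if not sorted_values:
        raise ValueError("need at least one value")
    pos = q * (len(sorted_values) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def pwhere_labels(paint_seconds):
    """Label each observation 0 (<1%), 1 (1-99%) or 2 (>99%) by its log(paint)."""
    log_paint = [math.log(p) for p in paint_seconds]
    ordered = sorted(log_paint)
    low_cut = percentile(ordered, 0.01)
    high_cut = percentile(ordered, 0.99)
    labels = []
    for value in log_paint:
        if value < low_cut:
            labels.append(0)
        elif value > high_cut:
            labels.append(2)
        else:
            labels.append(1)
    return labels
```

Each observation keeps its row in the model. The label only lets the prefetch and preload effects differ between the tails and the middle of the distribution.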

This method estimates the effect of prefetch and preload while controlling for the different behavior in the tails of log(paint).

Output of ANOVA (having removed 401 points with high influence on coefficients)

Coefficients (base is <1% for pwhere and preload=OFF and prefetch=OFF):

                                    Estimate Std. Error t value Pr(>|t|)
     (Intercept)                   -0.952187   0.099186  -9.600   <2e-16 ***
(1)  preloadpreloadON               0.003097   0.096171   0.032   0.9743    
(2a) prefetchprefetchON            -0.033319   0.108449  -0.307   0.7587    
     pwhere1-99                     2.414770   0.099530  24.262   <2e-16 ***
     pwhere>99                      4.978823   0.161488  30.831   <2e-16 ***
(1)  preloadpreloadON:pwhere1-99   -0.024909   0.096600  -0.258   0.7965    
(1)  preloadpreloadON:pwhere>99    -0.021531   0.175782  -0.122   0.9025    
(2)  prefetchprefetchON:pwhere1-99 -0.223112   0.108836  -2.050   0.0404 *  
(2b) prefetchprefetchON:pwhere>99   0.071636   0.182834   0.392   0.6952  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0.9757 on 46677 degrees of freedom
Multiple R-squared: 0.08732,	Adjusted R-squared: 0.08716 
(The fit is not very good)

* Interpretation
Here the base for comparison is the 1-99% region

(1) Preload=ON has little to no effect on log(paint), whatever the value of log(paint).
(2) Prefetch=ON has a small effect where log(paint) lies in the 1-99% range of its distribution.
The overall main effect (2a) is not significant because it is 'crossed': see (2b), where the effect is positive, i.e. when prefetch is ON, log(paint) is larger.

- Pages 3 and 4 of the attached PDF are boxplots of the data, each panel a combination of preload, prefetch and the <1%, 1-99% and >99% regions. They support (1) and (2).
- Pages 5 and 6 have QQ plots supporting (1) and (2). Not seen in the above table, but visible on page 6 of the PDF, is that prefetch=ON causes paint to drop (leftmost panel, panel 1, on page 6). This is not statistically significant.


* Summary
Prefetch decreases startup time, but the decrease is statistically significant only when log(paint) is not in the left or right tail (bottom 1% or top 1%). The effect is to cause first paint to drop by 20%.
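
The 20% figure is consistent with exponentiating the prefetchprefetchON:pwhere1-99 coefficient from the table above, since an additive effect on log(paint) is a multiplicative effect on paint. The sketch below, in Python, assumes that reading. Whether the main effect (2a) should also be added is not stated, and the function name `percent_change` is hypothetical.

```python
import math

# Coefficient copied from the ANOVA table: prefetchprefetchON:pwhere1-99.
PREFETCH_MIDDLE_COEF = -0.223112


def percent_change(log_coef):
    """Convert an additive effect on log(paint) to a percent change in paint."""
    return (math.exp(log_coef) - 1.0) * 100.0


change = percent_change(PREFETCH_MIDDLE_COEF)
print(f"{change:.1f}%")
```

The result is a drop of roughly 20% in first paint for startups in the 1-99% range.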

Also, inspecting the bw (box-and-whisker) plots (page 4), though prefetch causes paint to decrease, the variance remains unchanged.

Brian R. Bondy [:bbondy]

Comment 11

12 years ago

Thanks Saptarshi!

So I think based on this data, we should stop doing preload always on Windows (since it makes no difference), and stop disabling the prefetch since having prefetch on is a benefit in most cases 1%-99%.  Can you confirm that I understood the data correctly Saptarshi?

(dormant account)

Reporter

Comment 12

12 years ago

I have one more theory. Saptarshi, can you filter the data by SIMPLE_MEASURES_START?
According to https://metrics.mozilla.com/data/content/pentaho-cdf-dd/Render?solution=metrics2&path=%2Ftelemetry&file=telemetryHistogram.wcdf&bookmarkState={%22impl%22%3A%22client%22%2C%22params%22%3A{%22startDate%22%3A%222012-05-29%22%2C%22endDate%22%3A%222012-06-27%22%2C%22appVersion%22%3A%22%22%2C%22appName%22%3A%22Firefox%22%2C%22arch%22%3A%22%22%2C%22OS%22%3A%22%22%2C%22version%22%3A%22%22%2C%22channel%22%3A%22nightly%22%2C%22reason%22%3A%22idle-daily%22%2C%22appBuildID%22%3A%22%22%2C%22fromPlatformBuildID%22%3A%22%22%2C%22toPlatformBuildID%22%3A%22%22%2C%22excludeParam%22%3A%22%22%2C%22measure%22%3A%22SIMPLE_MEASURES_START%22%2C%22histogramCompareParam%22%3A%22appVersion%22%2C%22histogramVariablesParam%22%3A%22%22%2C%22platformBuildIDMode%22%3A%22LATEST%22%2C%22platformBuildIDTopCount%22%3A%2230%22%2C%22conditionsStatistic%22%3A%22%23conditionsStatistic%22%2C%22submissionsParameter%22%3A[[59459]]}}

23% of our users take >= 1 second before Firefox code executes. Those could be the 23% that do cold startups. Could you redo the analysis, but filter on simpleMeasures.start >= 1 second? I'm wondering if the fact that most startups are warm is skewing the data.

(dormant account)

Reporter

Comment 13

12 years ago

And if the numbers do not change, let's do what Brian suggested.

"Saptarshi Guha[:joy]"

Comment 14

12 years ago

Hi Brian,

Yes, the understanding is correct.

But do keep in mind that, though this does decrease the median, the variance appears to be unchanged (see page 4, middle 2 panels).

(In reply to Brian R. Bondy [:bbondy] from comment #11)
> Thanks Saptarshi!
> 
> So I think based on this data, we should stop doing preload always on
> Windows (since it makes no difference), and stop disabling the prefetch
> since having prefetch on is a benefit in most cases 1%-99%.  Can you confirm
> that I understood the data correctly Saptarshi?

Annie Elliott

Comment 15

12 years ago

(In reply to Taras Glek (:taras) from comment #12)
> I have one more theory, Saptashi, can you filter the data by
...
> 
> Could you redo the analysis, but filter on
> simpleMeasures.start >=1second? I'm wondering if the fact that most startups
> are warm is skewing the data.

Annie Elliott

Comment 16

12 years ago

Sorry - Given my last comment, the further asks are actually requests for new analysis work.

Please file a new request for new work. Closing this one out.

Status: ASSIGNED → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

(dormant account)

Reporter

Comment 17

12 years ago

(In reply to Annie Elliott from comment #16)
> Sorry - Given my last comment, the further asks are actually requests for
> new analysis work.
> 
> Please file a new request for new work. CLosing this one out.

No, we are asking to rerun the analysis.

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Lawrence Mandel [:lmandel] (use needinfo)

Updated

12 years ago

Whiteboard: [Telemetry:P1]

Annie Elliott

Comment 18

12 years ago

As above, I am closing this out. 

Going forward, when the original ask is completed, new asks, even if based on an old ask, require a new ticket, as they require further resourcing.

This is metrics workflow - there are no open-ended tickets. 

The additional work is in https://metrics.mozilla.com/projects/browse/METRICS-826.

Assignee: sguha → nobody

Status: REOPENED → RESOLVED

Closed: 12 years ago → 12 years ago

Resolution: --- → FIXED

(dormant account)

Reporter

Updated

12 years ago

Blocks: 771745