Bug 757215 - SIMPLE_MEASURES_MAIN should be faster when EARLY_GLUESTARTUP_READ_OPS.sum = 0
Status: RESOLVED WORKSFORME
Product: Mozilla Metrics
Classification: Other
Component: Data/Backend Reports
Version: unspecified
Platform: x86_64 Windows 7
Importance: -- normal
Target Milestone: Unreviewed
Assigned To: "Saptarshi Guha[:joy]"
Blocks: 762123 764019
Reported: 2012-05-21 14:30 PDT by (dormant account)
Modified: 2012-06-12 10:05 PDT


Attachments
Log of Fpaint for different types of glue (146.65 KB, application/pdf)
2012-06-07 16:14 PDT, "Saptarshi Guha[:joy]"

Description (dormant account) 2012-05-21 14:30:38 PDT
We need an analysis to show that startup is faster under certain conditions. EARLY_GLUESTARTUP_READ_OPS.sum is derived from a heuristic that detects when Windows prefetch is broken, so that we can run our own prefetch. Our theory is that our own prefetch is faster than the general-purpose Windows prefetch.

We will need a similar analysis when 692255 lands.
Comment 1 (dormant account) 2012-05-22 12:13:46 PDT
12:12 < joy> taras: some glue values are not 1 or 0 eg.  23, 28, 79 - is that okay?
12:12 < joy> is it 1 vs not 1 or 0 vs not 0?
12:12 < joy> or 0,1, not (0 or 1)


That's correct, it's a tri-state.
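
A minimal sketch of the tri-state grouping discussed above, assuming the raw EARLY_GLUESTARTUP_READ_OPS.sum value is available as an integer. The function name, the state labels and the sample values are hypothetical and only illustrate the 0 / 1 / anything-else split.

```python
from collections import Counter

def glue_state(read_ops_sum):
    """Map a raw glue value to one of three states: 0, 1, or anything else."""
    if read_ops_sum == 0:
        return "zero"
    if read_ops_sum == 1:
        return "one"
    return "other"  # e.g. 23, 28, 79 from the IRC excerpt

# Hypothetical sample values, not real telemetry.
samples = [0, 1, 23, 28, 79, 0, 1, 1]
counts = Counter(glue_state(v) for v in samples)
print(counts)
```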
Comment 2 (dormant account) 2012-05-22 12:15:22 PDT
If you get bored, it would be interesting to see whether there is a cliff, where startup speed is ruined after a certain value of GLUESTARTUP.
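
One rough way to look for such a cliff is to take the median startup time per glue value and find the largest jump between consecutive observed values. This is only a sketch on made-up numbers; `find_cliff` and the record layout are assumptions, and a real analysis would also need to account for how few sessions some glue values have.

```python
from collections import defaultdict
from statistics import median

def find_cliff(records):
    """records: iterable of (glue, startup_ms) pairs.

    Returns (glue_value, jump_ms) for the largest increase in median
    startup time between consecutive observed glue values, or None
    if fewer than two distinct glue values are present.
    """
    by_glue = defaultdict(list)
    for glue, startup_ms in records:
        by_glue[glue].append(startup_ms)

    values = sorted(by_glue)
    medians = [median(by_glue[g]) for g in values]

    best = None
    for i in range(1, len(values)):
        jump = medians[i] - medians[i - 1]
        if best is None or jump > best[1]:
            best = (values[i], jump)
    return best

# Hypothetical data: startup degrades sharply once glue reaches 3.
data = [(1, 100), (1, 110), (2, 105), (2, 115), (3, 400), (3, 420)]
print(find_cliff(data))
```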
Comment 3 "Saptarshi Guha[:joy]" 2012-05-24 12:09:08 PDT
Please see http://people.mozilla.org/~sguha/757215.html .
Short answer, yes and no.
Comment 4 (dormant account) 2012-05-24 15:13:32 PDT
(In reply to Saptarshi Guha from comment #3)
> Please see http://people.mozilla.org/~sguha/757215.html .
> Short answer, yes and no.

Thanks, what about SIMPLE_MEASURES_FIRST_PAINT instead of _MAIN? It could be that MAIN is taking a hit so FIRST_PAINT can be reached sooner
Comment 5 Brian R. Bondy [:bbondy] 2012-05-24 18:34:42 PDT
Taras, I think preload is enabled when this value is 0, in which case we think there is no prefetch.

But in which cases do we get "1" vs "not_0&&not_1"?
I would have thought that values of 0 gave the best startup time.
Comment 6 (dormant account) 2012-05-24 18:54:13 PDT
(In reply to Brian R. Bondy [:bbondy] from comment #5)
> Taras I think preload is enabled when this value is 0, in which cases we
> think there is no prefetch. 
> 
> But in which cases do we get "1" vs "not_0&&not_1"?
> I would have thought that values of 0 gave the best startup time.

I think we need to do proper A/B testing for this, ie don't always enable preload if no prefetch is present.

I think this mainly shows that read_ops = 0 and read_ops = 1 are not directly comparable scenarios (the 0s could be caused by faulty AV software).
Comment 7 Brian R. Bondy [:bbondy] 2012-05-24 18:56:03 PDT
So always clear prefetch but only sometimes enable preload?
What about not clearing prefetch at all sometimes as well?
Comment 8 (dormant account) 2012-05-24 18:59:30 PDT
(In reply to Brian R. Bondy [:bbondy] from comment #7)
> So always clear prefetch but only sometimes enable preload?
> What about not clearing prefetch at all sometimes as well?

From my testing, having prefetch AND our own preloading is a consistent loss.
Comment 9 Brian R. Bondy [:bbondy] 2012-05-24 19:11:09 PDT
I meant for that case:
What about not clearing prefetch at all AND not preloading sometimes as well?
Comment 10 (dormant account) 2012-05-24 19:16:24 PDT
(In reply to Brian R. Bondy [:bbondy] from comment #9)
> I meant for that case:
> What about not clearing prefetch at all AND not preloading sometimes as well?
Yes

Let's land the current stuff as is; if the improvement is not clear after a couple of days, we can refine it with A/B testing.
Comment 11 Brian R. Bondy [:bbondy] 2012-05-24 19:17:31 PDT
OK good, will do.
Comment 12 "Saptarshi Guha[:joy]" 2012-05-24 19:31:20 PDT
So how will the A/B testing be done? Will some packets have it and others not? I would have low expectations of this sort of testing, given the variability in installations.

I know this is a todo, but a proper A/B test would be within-user, i.e. before/after.

Otherwise, do it as planned (some users get it and some do not), but control for everything else, so that any difference (or lack of one) is not due to other factors.
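
The within-user (before/after) comparison suggested above could be sketched as follows, assuming one startup time per user before the change and one after. The function name and the data are hypothetical; users seen in only one period are dropped so that each user serves as their own control.

```python
from statistics import median

def paired_change(before, after):
    """before/after: dicts mapping user_id -> startup time in ms.

    Returns the median per-user difference (after - before) over users
    present in both periods, or None if there is no overlap.
    """
    common = before.keys() & after.keys()
    diffs = [after[u] - before[u] for u in common]
    return median(diffs) if diffs else None

# Hypothetical data: user "d" only appears after the change and is ignored.
before = {"a": 1000, "b": 2000, "c": 1500}
after = {"a": 900, "b": 1900, "c": 1500, "d": 500}
print(paired_change(before, after))
```

A negative result means startup got faster for the typical user; because each difference is taken within one installation, variability between installations largely cancels out.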
Comment 13 (dormant account) 2012-06-04 09:25:37 PDT
(In reply to Taras Glek (:taras) from comment #4)
> (In reply to Saptarshi Guha from comment #3)
> > Please see http://people.mozilla.org/~sguha/757215.html .
> > Short answer, yes and no.
> 
> Thanks, what about SIMPLE_MEASURES_FIRST_PAINT instead of _MAIN? It could be
> that MAIN is taking a hit so FIRST_PAINT can be reached sooner

Saptarshi ^ can you rerun this with FIRST_PAINT as your benchmark?
Comment 14 "Saptarshi Guha[:joy]" 2012-06-04 12:16:17 PDT
Will do. Be on this soon.
Comment 15 "Saptarshi Guha[:joy]" 2012-06-05 15:50:37 PDT
Taras, could you explain (in some detail)

" It could be that MAIN is taking a hit so FIRST_PAINT can be reached sooner "

What do you expect to be different?
Comment 16 (dormant account) 2012-06-05 16:54:19 PDT
(In reply to Saptarshi Guha from comment #15)
> Taras, could you explain (in some detail)
> 
> " It could be that MAIN is taking a hit so FIRST_PAINT can be reached sooner
> "
> 
> What do you expect to be different?

I mean that it could be firstpaint is where the effect of the heuristic can be observed.
Comment 17 "Saptarshi Guha[:joy]" 2012-06-06 08:53:57 PDT
But why not main? We saw in the previous graphs that main /is/ affected, hence firstpaint would be too (since firstpaint = main + something).
Comment 18 (dormant account) 2012-06-06 10:14:36 PDT
(In reply to Saptarshi Guha from comment #17)
> But why not main? We saw in the previous graphs main /is/ affected, hence
> firstpaint would be do (since firstpaint = main + something)

There is a bunch more stuff that needs to happen for firstpaint. If this works properly, then by the time main is called with glue=0, everything should already be in memory, so less needs to be read in for firstpaint.
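
If that reasoning holds, the time spent between main and firstpaint should be smaller when glue is 0. A minimal sketch of that check, on made-up numbers; the session tuple layout and the group labels are assumptions.

```python
from collections import defaultdict
from statistics import median

def post_main_cost(sessions):
    """sessions: iterable of (glue, main_ms, first_paint_ms).

    Returns the median of (first_paint - main) for sessions with
    glue == 0 and for all other sessions.
    """
    groups = defaultdict(list)
    for glue, main_ms, first_paint_ms in sessions:
        key = "glue=0" if glue == 0 else "glue!=0"
        groups[key].append(first_paint_ms - main_ms)
    return {key: median(values) for key, values in groups.items()}

# Hypothetical sessions, not real telemetry.
sessions = [
    (0, 500, 900), (0, 600, 1100), (0, 550, 950),
    (1, 400, 1200), (23, 450, 1300), (1, 420, 1220),
]
print(post_main_cost(sessions))
```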
Comment 19 "Saptarshi Guha[:joy]" 2012-06-06 14:06:57 PDT
Okay! Makes sense. I have the results; I'll just run a few diagnostics and then submit them.
Comment 20 "Saptarshi Guha[:joy]" 2012-06-07 16:13:06 PDT
Hello,

I have attached a PDF, which is explained as follows.

1. Glue is binned into four groups: "no_prefetch" if no glue is
   present, then 1, 2-8 and 8+.
2. We modelled the log (base 10) of fpaint (first paint), controlling
   for the log (base 10) of main and for glue (incorporating
   interaction effects).
3. The attached PDF is 3 pages. Page 1 plots the 100 percentiles of the
   *predicted* log of fpaint for the case when glue == "no_prefetch" vs
   percentiles of log of fpaint when glue is 1, 2-8 and 8+ (moving
   left to right).
4. The red line is the line y=x.
5. Page 1 is for the case when main <= 0.41 seconds, Page 2 is for
   main <= 1.7 seconds and Page 3 is for main <= 93.53 seconds.
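
The binning and the percentile-vs-percentile comparison described in the list above can be sketched roughly as follows. This is a simplification under stated assumptions: it uses raw log10(fpaint) values rather than model predictions controlled for main, it treats "no glue present" as a value of 0, and it puts a value of exactly 8 in the 2-8 bin. All names and data are hypothetical.

```python
import math
from statistics import quantiles

def glue_bin(glue):
    """Assign a raw glue value to one of the four bins."""
    if glue == 0:
        return "no_prefetch"  # assumption: "no glue present" means 0
    if glue == 1:
        return "1"
    if glue <= 8:
        return "2-8"          # assumption: exactly 8 falls in this bin
    return "8+"

def log_fpaint_percentiles(sessions):
    """sessions: iterable of (glue, fpaint_ms).

    Returns bin -> 99 percentile cut points of log10(fpaint).
    Bins with fewer than two sessions are skipped.
    """
    by_bin = {}
    for glue, fpaint_ms in sessions:
        by_bin.setdefault(glue_bin(glue), []).append(math.log10(fpaint_ms))
    return {b: quantiles(v, n=100) for b, v in by_bin.items() if len(v) >= 2}

def fraction_above_diagonal(other_pcts, no_prefetch_pcts):
    """Share of percentile pairs lying above y = x, with no_prefetch on y."""
    pairs = list(zip(other_pcts, no_prefetch_pcts))
    return sum(y > x for x, y in pairs) / len(pairs)

# Made-up data in which no_prefetch sessions paint ten times slower.
sessions = [(0, 1000 * (i + 1)) for i in range(10)]
sessions += [(1, 100 * (i + 1)) for i in range(10)]
pcts = log_fpaint_percentiles(sessions)
print(fraction_above_diagonal(pcts["1"], pcts["no_prefetch"]))
```

A fraction close to 1 corresponds to the black symbols sitting above the red line in a panel.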

Summary:

In almost all cases the fpaint for the case "no_prefetch"  is greater
than or equal to fpaint for when prefetch is present.

The effects are greater when main is larger (see Page 3), but this
effect is reduced when both main and glue are large (glue == 8+). See
Page 3 and look at the panels from left to right: the black symbols
lie above the red line, indicating that fpaint for no prefetch is larger
than for the cases 1 (left panel), 2-8 and 8+. But the departure from
the straight line diminishes for glue == 8+.
Comment 21 "Saptarshi Guha[:joy]" 2012-06-07 16:14:03 PDT
Created attachment 631200 [details]
Log of Fpaint for different types of glue
Comment 22 (dormant account) 2012-06-07 16:33:25 PDT

Thanks, sounds like we should get rid of this optimization.
Comment 23 "Saptarshi Guha[:joy]" 2012-06-07 16:45:59 PDT
Not true. As you can see, except for one case (page 1, extreme right panel), the points are always on or above the red line, which implies that having prefetch doesn't adversely affect the startup time, and in fact improves it in cases when the time is larger.
Comment 24 Brian R. Bondy [:bbondy] 2012-06-07 17:01:29 PDT
taras wrote:
> thanks, sounds we should get rid of this optimization

Saptarshi Guha wrote:
> Not true...

The optimization was to disable the prefetch.
Comment 25 Brian R. Bondy [:bbondy] 2012-06-07 20:25:26 PDT
I ran a local test and profiled a small test app that loads 30 different xul_i.dll files.

func1: Loads 30 DLLs with LoadLibrary without the preload() function in nsGlueLinkingWin.cpp.
This takes ~1s 10ms.

func2: Loads 30 DLLs with LoadLibrary with the preload() function in nsGlueLinkingWin.cpp.
This takes ~1s 378ms.

I'm not sure if our preload code is effective.
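
For a comparison like this, repeating each variant several times and comparing medians makes the result less sensitive to one-off noise. The harness below is a generic sketch, not the test app used above: `load_without_preload` and `load_with_preload` are hypothetical stand-ins with dummy workloads, and on Windows the file cache would need to be in the same state before each run for the numbers to be comparable.

```python
import time
from statistics import median

def time_call(fn, repeats=5):
    """Return the median wall-clock time in seconds over several runs of fn()."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return median(timings)

# Hypothetical stand-ins for func1 and func2; replace with the real loaders.
def load_without_preload():
    sum(range(10_000))

def load_with_preload():
    sum(range(20_000))

print("without preload:", time_call(load_without_preload))
print("with preload:   ", time_call(load_with_preload))
```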
Comment 26 (dormant account) 2012-06-07 21:57:36 PDT
(In reply to Brian R. Bondy [:bbondy] from comment #25)
> I ran a local test and profiled a small test app that loads 30 different
> xul_i.dll files.  
> 
> func1: Loads 30dlls with LoadLibrary w/o the preload() function in
> nsGlueLinkingWin.cpp
> This performs at  ~1s 10ms
> 
> func2: Loads 30dlls with LoadLibrary with the preload() function in
> nsGlueLinkingWin.cpp
> This performs at  ~1s 378ms
> 
> I'm not sure if our preload code is effective.
I wonder if something regressed.

The thing to do is to check xperf to see how much paging a bare LoadLibrary call causes. It could very well be that it's faster to run the static initializers by demand paging (since MSVC packs them correctly), because that leaves most of the library to be paged in later.
