SIMPLE_MEASURES_MAIN should be faster when EARLY_GLUESTARTUP_READ_OPS.sum = 0

RESOLVED WORKSFORME

Status

Product: Mozilla Metrics
Component: Data/Backend Reports
Status: RESOLVED WORKSFORME
Reported: 5 years ago
Modified: 5 years ago

People

(Reporter: (dormant account), Assigned: joy)

Tracking

unspecified
Unreviewed
x86_64
Windows 7

Details

Attachments

(1 attachment)

(Reporter)

Description

5 years ago
We need an analysis to show that startup is faster under certain conditions. EARLY_GLUESTARTUP_READ_OPS.sum is derived from a heuristic that detects when Windows prefetch is broken, so that we can run our own prefetch. Our theory is that our own prefetch is faster than the general-purpose Windows prefetch.

We will need a similar analysis when 692255 lands.
(Reporter)

Comment 1

5 years ago
12:12 < joy> taras: some glue values are not 1 or 0 eg.  23, 28, 79 - is that okay?
12:12 < joy> is it 1 vs not 1 or 0 vs not 0?
12:12 < joy> or 0,1, not (0 or 1)


That's correct, it's a tri-state.
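
The tri-state reading confirmed above can be written down as a small classifier. This is a minimal Python sketch; the function name `glue_state` and the state labels are illustrative assumptions, not names from the telemetry code.

```python
def glue_state(read_ops_sum: int) -> str:
    """Classify an EARLY_GLUESTARTUP_READ_OPS.sum value as 0, 1, or anything else."""
    if read_ops_sum < 0:
        raise ValueError("read-op counts cannot be negative")
    if read_ops_sum == 0:
        return "zero"
    if read_ops_sum == 1:
        return "one"
    return "other"  # e.g. 23, 28, 79 from the IRC excerpt


print([glue_state(v) for v in (0, 1, 23, 28, 79)])
# prints ['zero', 'one', 'other', 'other', 'other']
```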
(Reporter)

Comment 2

5 years ago
If you get bored, it would be interesting to see if there is a cliff, where after a certain value of GLUESTARTUP startup speed is ruined.
(Assignee)

Comment 3

5 years ago
Please see http://people.mozilla.org/~sguha/757215.html .
Short answer, yes and no.
(Reporter)

Comment 4

5 years ago
(In reply to Saptarshi Guha from comment #3)
> Please see http://people.mozilla.org/~sguha/757215.html .
> Short answer, yes and no.

Thanks, what about SIMPLE_MEASURES_FIRST_PAINT instead of _MAIN? It could be that MAIN is taking a hit so FIRST_PAINT can be reached sooner

Comment 5

5 years ago
Taras, I think preload is enabled when this value is 0, in which case we think there is no prefetch.

But in which cases do we get "1" vs "not_0&&not_1"?
I would have thought that values of 0 gave the best startup time.
(Reporter)

Comment 6

5 years ago
(In reply to Brian R. Bondy [:bbondy] from comment #5)
> Taras I think preload is enabled when this value is 0, in which cases we
> think there is no prefetch. 
> 
> But in which cases do we get "1" vs "not_0&&not_1"?
> I would have thought that values of 0 gave the best startup time.

I think we need to do proper A/B testing for this, i.e. don't always enable preload if no prefetch is present.

I think this mainly shows that read_ops = 0 and read_ops = 1 are not directly comparable scenarios (it could be that the 0s are caused by faulty AV software).

Comment 7

5 years ago
So always clear prefetch but only sometimes enable preload?
What about not clearing prefetch at all sometimes as well?
(Reporter)

Comment 8

5 years ago
(In reply to Brian R. Bondy [:bbondy] from comment #7)
> So always clear prefetch but only sometimes enable preload?
> What about not clearing prefetch at all sometimes as well?

From my testing, having prefetch AND our own preloading is a consistent loss.

Comment 9

5 years ago
I meant for that case:
What about not clearing prefetch at all AND not preloading sometimes as well?
(Reporter)

Comment 10

5 years ago
(In reply to Brian R. Bondy [:bbondy] from comment #9)
> I meant for that case:
> What about not clearing prefetch at all AND not preloading sometimes as well?
Yes

Let's land the current stuff as is. If the improvement is not clear after a couple of days, we can refine it with A/B testing.

Comment 11

5 years ago
OK good, will do.
(Assignee)

Comment 12

5 years ago
So how will the A/B testing be done? Some packets will have it and others not? I would have low expectations from this sort of testing given the variability in installations.

I know this is a todo, but a proper A/B test would be within-user, i.e. before/after.

Run it as planned (some users get the change and some do not), but control for everything else, so that any difference (or lack of one) is not due to other factors.
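
One way the "some users get it and some do not" design could be set up is stable random assignment, which balances other factors across arms on average. The following is a minimal Python sketch under stated assumptions: the arm names, the `client_id` string, and the `salt` are hypothetical, and nothing here reflects how Firefox actually assigns experiments. The combination of prefetch plus preload is left out because it was reported above as a consistent loss.

```python
import hashlib

# Hypothetical arm names, loosely based on the combinations discussed above.
ARMS = (
    "clear_prefetch_and_preload",
    "clear_prefetch_no_preload",
    "keep_prefetch_no_preload",
)


def assign_arm(client_id: str, salt: str = "preload-ab") -> str:
    """Map a client id to one arm, stably across runs and machines."""
    digest = hashlib.sha256(f"{salt}:{client_id}".encode("utf-8")).digest()
    return ARMS[int.from_bytes(digest[:8], "big") % len(ARMS)]


# Check that made-up client ids spread roughly evenly over the arms.
counts = {arm: 0 for arm in ARMS}
for i in range(3000):
    counts[assign_arm(f"client-{i}")] += 1
print(counts)
```

Hashing keeps a given client in the same arm on every startup, so its measurements are not mixed across arms. It does not replace the within-user before/after comparison suggested above.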
(Reporter)

Comment 13

5 years ago
(In reply to Taras Glek (:taras) from comment #4)
> (In reply to Saptarshi Guha from comment #3)
> > Please see http://people.mozilla.org/~sguha/757215.html .
> > Short answer, yes and no.
> 
> Thanks, what about SIMPLE_MEASURES_FIRST_PAINT instead of _MAIN? It could be
> that MAIN is taking a hit so FIRST_PAINT can be reached sooner

Saptarshi ^ can you rerun this with FIRST_PAINT as your benchmark?
(Assignee)

Comment 14

5 years ago
Will do. Be on this soon.
(Assignee)

Comment 15

5 years ago
Taras, could you explain (in some detail)

" It could be that MAIN is taking a hit so FIRST_PAINT can be reached sooner "

What do you expect to be different?
(Reporter)

Comment 16

5 years ago
(In reply to Saptarshi Guha from comment #15)
> Taras, could you explain (in some detail)
> 
> " It could be that MAIN is taking a hit so FIRST_PAINT can be reached sooner
> "
> 
> What do you expect to be different?

I mean that it could be that firstpaint is where the effect of the heuristic can be observed.
(Assignee)

Comment 17

5 years ago
But why not main? We saw in the previous graphs that main /is/ affected, hence firstpaint would be too (since firstpaint = main + something).
(Reporter)

Comment 18

5 years ago
(In reply to Saptarshi Guha from comment #17)
> But why not main? We saw in the previous graphs main /is/ affected, hence
> firstpaint would be too (since firstpaint = main + something)

There is a bunch more stuff that needs to happen for firstpaint. If this works properly, then by the time main is called with glue=0, everything should already be in memory, so less stuff needs to be read in for firstpaint.
(Reporter)

Updated

5 years ago
Blocks: 762123
(Assignee)

Comment 19

5 years ago
Okay! Makes sense. I have the results; I'll just run a few diagnostics and then submit them.
(Assignee)

Comment 20

5 years ago
Hello,

I have attached an image, which is explained as follows.

1. Glue is binned as "no prefetch" (if no glue is present), 1, 2-8,
   and 8+.
2. We modelled the log (base 10) of fpaint (first paint), controlling
   for the log (base 10) of main and for glue (incorporating
   interaction effects).
3. The attached PDF is 3 pages. Page 1 plots the 100 percentiles of the
   *predicted* log of fpaint for the case when glue == "no_prefetch" vs
   the percentiles of log of fpaint when glue is 1, 2-8 and 8+ (moving
   left to right).
4. The red line is the line y=x.
5. Page 1 is for the case when main <= 0.41 seconds, Page 2 is for 1.7
   seconds, and Page 3 is for main <= 93.53 seconds.

Summary:

In almost all cases the fpaint for the case "no_prefetch" is greater
than or equal to the fpaint when prefetch is present.

The effects are greater when main is larger (see Page 3), but this
effect is reduced when both main and glue are large (glue == 8+). See
Page 3 and look at the panels from left to right: the black symbols
lie above the red line, indicating that fpaint for no prefetch is larger
than for the cases 1 (left panel), 2-8 and 8+. But the departure from
the straight line diminishes for glue == 8+.
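
The percentile-against-percentile comparison described above can be sketched in a few lines of Python. This is a simplified illustration, not the analysis that produced the attachment: it compares raw log10 first-paint values rather than model predictions, the function names are made up, and the input data is synthetic.

```python
import math
import random


def percentile(sorted_vals, p):
    """Linear-interpolation percentile (p in 0..100) of an ascending list."""
    k = (len(sorted_vals) - 1) * p / 100.0
    lo, hi = math.floor(k), math.ceil(k)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)


def percentile_pairs(no_prefetch_ms, other_ms):
    """Return 100 (x, y) points of log10 first paint, one per percentile.

    x comes from the comparison glue bin and y from the "no_prefetch" bin,
    so a point above the line y = x means first paint is slower without prefetch.
    """
    xs = sorted(math.log10(v) for v in other_ms)
    ys = sorted(math.log10(v) for v in no_prefetch_ms)
    return [(percentile(xs, p), percentile(ys, p)) for p in range(1, 101)]


def share_on_or_above_diagonal(pairs):
    """Fraction of percentile points with y >= x."""
    return sum(1 for x, y in pairs if y >= x) / len(pairs)


# Synthetic timings, invented only to exercise the functions.
rng = random.Random(0)
other = [rng.lognormvariate(7.0, 0.5) for _ in range(5000)]
no_prefetch = [1.2 * v for v in other]  # hypothetical slowdown, not a measured effect
pairs = percentile_pairs(no_prefetch, other)
print(share_on_or_above_diagonal(pairs))
```

Plotting `pairs` with the line y = x drawn on top would give one panel of the kind described; repeating it per glue bin and per range of main would give the full layout.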
(Assignee)

Comment 21

5 years ago
Created attachment 631200 [details]
Log of Fpaint for different types of glue
(Reporter)

Comment 22

5 years ago

Thanks, sounds like we should get rid of this optimization.
(Assignee)

Comment 23

5 years ago
Not true. As you can see, except for one case (page 1, extreme right panel), the points are always on or above the red line, which implies that having prefetch doesn't adversely affect startup time, and in fact improves it in cases where the time is larger.

Comment 24

5 years ago
taras wrote:
> thanks, sounds we should get rid of this optimization

Saptarshi Guha wrote:
> Not true...

The optimization was to disable the prefetch.

Comment 25

5 years ago
I ran a local test and profiled a small test app that loads 30 different xul_i.dll files.

func1: Loads 30 DLLs with LoadLibrary without the preload() function in nsGlueLinkingWin.cpp.
This performs at ~1s 10ms.

func2: Loads 30 DLLs with LoadLibrary with the preload() function in nsGlueLinkingWin.cpp.
This performs at ~1s 378ms.

I'm not sure if our preload code is effective.
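
A comparison of this shape can be sketched as a small timing harness. The Python below is only an illustration under stated assumptions: it assumes preloading amounts to reading each library file sequentially before loading it (the thread does not spell out what preload() does), it uses stand-in files and a stand-in loader so that it runs on any OS, and `preload` and `time_loads` are made-up names rather than the functions in nsGlueLinkingWin.cpp.

```python
import os
import tempfile
import time


def preload(path, chunk_size=1 << 20):
    """Read the file front to back to warm the OS file cache; return bytes read."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return total
            total += len(chunk)


def time_loads(paths, load, use_preload):
    """Return elapsed milliseconds for loading every path, optionally preloading first."""
    start = time.perf_counter()
    for path in paths:
        if use_preload:
            preload(path)
        load(path)
    return (time.perf_counter() - start) * 1000.0


# Stand-in files and loader so the sketch runs anywhere. On Windows, `load`
# could be ctypes.WinDLL applied to real DLL paths.
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i in range(3):
        p = os.path.join(tmp, f"lib_{i}.bin")
        with open(p, "wb") as f:
            f.write(os.urandom(1 << 16))
        paths.append(p)
    plain_ms = time_loads(paths, os.path.getsize, use_preload=False)
    preload_ms = time_loads(paths, os.path.getsize, use_preload=True)
    print(f"without preload: {plain_ms:.3f} ms, with preload: {preload_ms:.3f} ms")
```

Timings from a harness like this only mean something when the file cache is cold for both runs; otherwise whichever variant runs second benefits from the first.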
(Reporter)

Comment 26

5 years ago
(In reply to Brian R. Bondy [:bbondy] from comment #25)
> I ran a local test and profiled a small test app that loads 30 different
> xul_i.dll files.  
> 
> func1: Loads 30dlls with LoadLibrary w/o the preload() function in
> nsGlueLinkingWin.cpp
> This performs at  ~1s 10ms
> 
> func2: Loads 30dlls with LoadLibrary with the preload() function in
> nsGlueLinkingWin.cpp
> This performs at  ~1s 378ms
> 
> I'm not sure if our preload code is effective.
I wonder if something regressed.

The thing to do is to check xperf to see how much paging a bare LoadLibrary call causes. It could very well be that it's faster to run the static initializers (since MSVC packs them correctly) by demand-paging, because it leaves most of the library for paging in later.
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → WORKSFORME
(Reporter)

Updated

5 years ago
Blocks: 764019