Figure out how much of a win PGO builds actually are

Status: RESOLVED FIXED
Product: Core :: General
Version: Trunk
Platform: x86, Windows 7
Reporter: Ehsan
Assignee: dmandelin
Opened: 5 years ago
Last modified: 5 years ago
(Reporter)

Description

5 years ago
We need somebody to do measurements on a bunch of benchmarks that we consider interesting to see how much of a win PGO builds are on Windows.  Dave, word on the street is that you've done that measurement recently!  If that's the case, would you mind sharing the results here?  If not, can you please ask someone on your team to do the measurement?  I'm aiming to have numbers on this by the next Engineering call.

Thanks!
No longer blocks: 833881
Comment 1

5 years ago
Dave's original post:
https://groups.google.com/d/msg/mozilla.dev.platform/a1ua8-Y29ls/WwxLeaOmo9sJ

In that thread I linked to (for example) Dromaeo (DOM) results for PGO vs. non-PGO Windows builds:
http://graphs.mozilla.org/graph.html#tests=[[73,94,12],[73,1,12]]&sel=none&displayrange=365&datatype=running

You can do this for any Talos test and compare. We could probably write some scripts to find the perf difference on all of our Talos benchmarks between the two types of builds.
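The comparison script suggested here could be as simple as taking the mean of each build type's runs and reporting the ratio. A minimal Python sketch (the function name and the example numbers are made up, and fetching real data from graphs.mozilla.org is left out):

```python
# Hypothetical sketch of the comparison suggested above: given two series
# of results for one Talos test (one from PGO builds, one from non-PGO
# builds), report the relative difference in their means.

def mean(xs):
    return sum(xs) / len(xs)

def pgo_win(pgo_results, non_pgo_results):
    """Return the non-PGO/PGO ratio; > 1.0 means PGO is faster
    (assuming lower scores are better, as for timing-based tests)."""
    return mean(non_pgo_results) / mean(pgo_results)

# Example with made-up numbers (ms per run):
ratio = pgo_win([500.0, 510.0, 505.0], [560.0, 555.0, 565.0])
print(f"non-PGO is {ratio:.2f}x slower")  # non-PGO is 1.11x slower
```

Running this over every Talos suite and both build branches would give the full picture in one table.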
(Reporter)

Comment 2

5 years ago
Thanks for the link, Ted!  Do we need to do this experiment on more benchmarks?  Or is Dromaeo all that we care about?  (/me suspects the answer is that it's not!)
(Reporter)

Comment 3

5 years ago
Another important factor is that we are not yet in a position to consider turning off PGO for the JS engine, so _maybe_ Dromaeo is the only important benchmark here?
Comment 4

5 years ago
There's no reason to reduce the investigation just to Talos; other vendors have benchmarks that, even if some are built for other browsers and have bugs, could be usable to test DOM, canvas, etc.
(Reporter)

Comment 5

5 years ago
(In reply to Marco Bonardo [:mak] from comment #4)
> There's no reason to reduce the investigation just to Talos, other vendors
> have benchmarks that, even if in some cases are built for other browsers and
> have bugs, could be usable to test dom, canvas...

Yeah, I was not talking about Talos at all!
(Assignee)

Comment 7

5 years ago
Got some first tests done.

Cold startup, 3 trials:

  1.23 pgo     3979, 5368, 4772
  1.24 non-pgo 4540, 4620, 3685

I don't really see any difference. I'd need many more trials to get statistical significance, and that would take a long time with all the reboots. I didn't really expect to see a difference, given that cold startup is mostly I/O.
OS: Mac OS X → Windows 7
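For reference, the spread within each cold-startup sample is larger than the gap between the sample means, which supports the no-difference read. A quick stdlib sketch over the three trials above:

```python
import statistics

# The three cold-startup trials from comment 7, in milliseconds.
pgo = [3979, 5368, 4772]
non_pgo = [4540, 4620, 3685]

for name, xs in (("pgo", pgo), ("non-pgo", non_pgo)):
    # Sample mean and sample standard deviation of each build type.
    print(f"{name}: mean={statistics.mean(xs):.0f} ms, "
          f"stdev={statistics.stdev(xs):.0f} ms")
# pgo: mean=4706 ms, stdev=697 ms
# non-pgo: mean=4282 ms, stdev=518 ms
```

With standard deviations of 500-700 ms against a ~425 ms difference in means, three trials per build cannot distinguish the two.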
(Assignee)

Comment 8

5 years ago
Warm startup, 2 successive trials:

  1.23 pgo      984, 735
  1.24 non-pgo 1579, 741

I think this also shows no difference. 750 ms is about what a warm startup generally takes on this machine. The first trial may not have been fully warmed up.
(Assignee)

Comment 9

5 years ago
SunSpider, 10 trials:

  1.23 pgo     260.0, 266.6, 284.7, 285.1, 264.8, 288.7, 269.7, 271.8, 269.4, 266.1
       mean =  272.7

  1.24 non-pgo 286.0, 278.1, 277.3, 276.6, 274.7, 278.1, 279.3, 282.1, 279.7, 281.0
       mean =  279.3

The t-test p-value for the difference in the means is 0.0596. That's close to statistical significance at the usual 0.05 level. I'm not entirely sure what to make of it, but I note that the lowest non-pgo score was 274.7, and pgo had 7 scores lower than that value, so I'm inclined to think it is a real difference.

The difference I measured in these tests was 6.6 ms or 2.5%.
(Assignee)

Comment 10

5 years ago
Dromaeo DOM runs:

  1.23 pgo     http://dromaeo.com/?id=189193  total score 2139
  1.24 non-pgo http://dromaeo.com/?id=189187  total score 1903

The pgo version scored 1.15x better. The difference broadly held up over the different subtests. The main problem is that I don't know what significance a 1.15x Dromaeo DOM difference would have in actual usage. I suppose various people out there are comparing us based on Dromaeo, though.
(Assignee)

Comment 11

5 years ago
Is there anything else we should be testing, or is the above enough to go on for decision-making?
Comment 12

5 years ago
> I suppose various people out there are comparing us based on Dromaeo, though.

Yep.  In some ways, that's the biggest issue here.

That said, it's interesting to look at the breakdown of the numbers.  For example, the "setAttribute" subtest of the "DOM Attributes" test is almost 40% faster with PGO than without.  On the other hand, the "element.expando" tests, which are pure jitcode, are of course unaffected.  That pattern holds across the board: tests which involve running lots of C++ code are much faster with PGO, while tests that are largely gated on the JIT or bottlenecked on a single simple loop or library call in C++ (e.g. createTextNode) don't win nearly as much.

As far as comments earlier in this bug about the JS engine... I thought we already had PGO off for the JS engine on Windows.  Is that not the case?

What I think is really worth measuring, and what I don't think we have good numbers for, is layout performance.  I'm not talking about pageload (we measure that with Tp); I'm talking about "click the reply button in Gmail and see how long that takes".
Comment 13

5 years ago
(In reply to Boris Zbarsky (:bz) from comment #12)
> As far as comments earlier in this bug about the JS engine... I thought we
> already had PGO off for the JS engine on Windows.  Is that not the case?
> 

It was turned back on a while ago, but the bug number is escaping me at the moment.
(Reporter)

Comment 14

5 years ago
(In reply to comment #12)
> What I think is really worth measuring that I don't think we have good numbers
> for are layout performance.  I'm not talking pageload (we measure that with
> Tp); I'm talking "click the reply button in gmail and see how long that takes".

Do we have a good benchmark for this kind of thing?  Something that we can use to get some numbers?

(Microbenchmarks _could_ be useful here, since we're measuring differences in how well the compiler optimizes code, etc.)
Comment 15

5 years ago
I don't know of a good layout performance benchmark offhand, sadly...

Comment 16

5 years ago
But yes, we could try to write a microbenchmark for it.
(Reporter)

Comment 17

5 years ago
roc, dbaron, do you know of a good layout microbenchmark?
(Assignee)

Comment 18

5 years ago
(In reply to Boris Zbarsky (:bz) from comment #12)
> What I think is really worth measuring that I don't think we have good
> numbers for are layout performance.  I'm not talking pageload (we measure
> that with Tp); I'm talking "click the reply button in gmail and see how long
> that takes".

Videocameras/screen captures work well for that sort of thing. I can do it but probably won't be able to until at least Wednesday.
(Reporter)

Comment 19

5 years ago
(In reply to comment #18)
> (In reply to Boris Zbarsky (:bz) from comment #12)
> > What I think is really worth measuring that I don't think we have good
> > numbers for are layout performance.  I'm not talking pageload (we measure
> > that with Tp); I'm talking "click the reply button in gmail and see how long
> > that takes".
> 
> Videocameras/screen captures work well for that sort of thing. I can do it but
> probably won't be able to until at least Wednesday.

In that case, I think that's going to be valuable enough for us to delay the final conversation until then.  Thanks, Dave!
Comment 20

5 years ago
(In reply to :Ehsan Akhgari from comment #17)
> roc, dbaron, do you know of a good layout microbenchmark?

This one maybe? http://www.craftymind.com/factory/guimark2/HTML5TextTest.html
(Assignee)

Comment 21

5 years ago
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #20)
> (In reply to :Ehsan Akhgari from comment #17)
> > roc, dbaron, do you know of a good layout microbenchmark?
> 
> This one maybe? http://www.craftymind.com/factory/guimark2/HTML5TextTest.html

 pgo      25.33
 non-pgo  22.46  = 1.13x slower
(Assignee)

Comment 22

5 years ago
I tried some Gmail interactions. Clicking reply didn't work because there was no visual effect for the click, so I couldn't get a start time. I successfully measured the time between clicking the login button and the password dialog appearing, the time between clicking the login button and seeing the emails, and the time between clicking an email and seeing it.

I did 2 pgo trials under the assumption that non-pgo trial 1 was warming up some caches. The camera is 30 fps so there is an inherent imprecision of something like +/- 0.033 seconds. Times are in seconds.

                          pwd dialog            show list          show email
 non-pgo trial 1             0.20                  2.83               0.20
 pgo                         0.13                  2.67               0.23
 non-pgo trial 2             0.17                  2.70               0.17

It's hard to see what this means. I suspect I/O latency and animation timers are in play, so there could be CPU differences, but they are too small to observe against those latencies.

So far, in summary, it seems that we have seen:

 - a clear difference on direct tests of DOM and layout speed (Dromaeo, the GUIMark2 test) of about 1.1-1.2x in most cases (1.4x in one case)
 - a possibly real small difference on SunSpider of about 3%
 - no clear difference in startup time
 - no clear difference in Gmail interaction

I'm going to close this bug but feel free to ask more questions and reopen.
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
(Reporter)

Comment 23

5 years ago
Thanks a lot, David.  This is very helpful.