Closed Bug 833890 Opened 13 years ago Closed 13 years ago

Figure out how much of a win PGO builds actually are

Categories: Core :: General, defect
Platform: x86, Windows 7
Priority: Not set
Severity: normal
Status: RESOLVED FIXED
People: Reporter: ehsan.akhgari; Assigned: dmandelin
We need somebody to take measurements on a bunch of benchmarks that we consider interesting, to see how much of a win PGO builds are on Windows. Dave, word on the street is that you've done that measurement recently! If that's the case, would you mind sharing the results here? If not, can you please ask someone on your team to do the measurement? I'm aiming to have numbers on this by the next Engineering call. Thanks!
No longer blocks: 833881
Dave's original post: https://groups.google.com/d/msg/mozilla.dev.platform/a1ua8-Y29ls/WwxLeaOmo9sJ

In that thread I linked to (for example) Dromaeo (DOM) results for PGO vs. non-PGO Windows builds: http://graphs.mozilla.org/graph.html#tests=[[73,94,12],[73,1,12]]&sel=none&displayrange=365&datatype=running

You can do this for any Talos test and compare. We could probably write some scripts to find the perf difference on all of our Talos benchmarks between the two types of builds.
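[Editor's note: once per-build means are in hand, such a script could be as simple as computing per-test ratios. A minimal JavaScript sketch; the test names and numbers below are made up for illustration and this is not the actual graph server API. Note that for some Talos tests lower is better, for Dromaeo higher is better.]

// Hypothetical per-test mean scores for the two build types; in practice
// these would be pulled from the graph server for each Talos suite.
// All names and values here are invented placeholders.
const pgoMeans    = { dromaeo_dom: 2100, tsvg: 410, tp5: 320 };
const nonPgoMeans = { dromaeo_dom: 1900, tsvg: 455, tp5: 335 };

for (const test of Object.keys(pgoMeans)) {
  const ratio = nonPgoMeans[test] / pgoMeans[test];
  console.log(`${test}: non-PGO/PGO = ${ratio.toFixed(2)}`);
}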
Thanks for the link, Ted! Do we need to do this experiment on more benchmarks? Or is Dromaeo all that we care about? (/me suspects the answer is that it's not!)
Another important factor is that we are not yet in a position to consider turning off PGO for the JS engine, so _maybe_ Dromaeo is the only important benchmark here?
There's no reason to limit the investigation to Talos; other vendors have benchmarks that, even if some are built for other browsers and have bugs, could be usable to test DOM, canvas, etc.
(In reply to Marco Bonardo [:mak] from comment #4)
> There's no reason to limit the investigation to Talos; other vendors have
> benchmarks that, even if some are built for other browsers and have bugs,
> could be usable to test DOM, canvas, etc.

Yeah, I was not talking about Talos at all!
Got some first tests done. Cold startup, 3 trials (times in ms):

1.23 pgo      3979, 5368, 4772
1.24 non-pgo  4540, 4620, 3685

I don't really see any difference. I'd need to do many more trials to get statistical significance, and that would take a long time with all the reboots. I didn't really expect to see a difference, given that cold startup is mostly I/O-bound.
OS: Mac OS X → Windows 7
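[Editor's note: to put a rough number on "many more trials", here is a sketch of a standard normal-approximation sample-size calculation, seeded with the scatter of the three PGO trials above. The 5% target difference, alpha = 0.05, and 80% power are assumed values, not anything stated in the thread.]

// Rough per-group trial count needed to detect a given cold-startup
// difference, via the approximation n ≈ 2 * sigma^2 * (z_a + z_b)^2 / delta^2.
const pgoTrials = [3979, 5368, 4772];
const mean = xs => xs.reduce((a, b) => a + b, 0) / xs.length;
const m = mean(pgoTrials);
const variance =
  pgoTrials.reduce((a, x) => a + (x - m) ** 2, 0) / (pgoTrials.length - 1);

const delta = 0.05 * m;  // try to detect a 5% difference (~235 ms)
const z = 1.96 + 0.84;   // z for alpha = 0.05 (two-sided) plus z for 80% power
const n = Math.ceil(2 * variance * z ** 2 / delta ** 2);
console.log(n + ' trials per build'); // on the order of 140 with this scatter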
Warm startup, 2 successive trials (times in ms):

1.23 pgo      984, 735
1.24 non-pgo  1579, 741

I think this is also a no-difference result. 750 ms is about what a warm startup takes on this machine generally. The first trial may not have been fully warmed up.
SunSpider, 10 trials (times in ms):

1.23 pgo      260.0, 266.6, 284.7, 285.1, 264.8, 288.7, 269.7, 271.8, 269.4, 266.1  (mean = 272.7)
1.24 non-pgo  286.0, 278.1, 277.3, 276.6, 274.7, 278.1, 279.3, 282.1, 279.7, 281.0  (mean = 279.3)

The t-test p-value for the difference in the means is 0.0596. That's pretty close to statistical significance with the usual 0.05 threshold. I'm not entirely sure what to make of it, but I note that the lowest non-pgo score was 274.7, and pgo had 7 scores lower than that value, so I'm inclined to think it is a real difference. The difference I measured in these tests was 6.6 ms, or 2.5%.
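[Editor's note: the reported p-value is reproducible with a pooled-variance (Student's) two-sample t-test on the trials above. A minimal JavaScript sketch, assuming equal variances: it gives t ≈ 2.01 with 18 degrees of freedom, just under the two-tailed 0.05 critical value of 2.101, consistent with p ≈ 0.0596.]

// Pooled-variance two-sample t statistic for the SunSpider trials above.
const pgo    = [260.0, 266.6, 284.7, 285.1, 264.8, 288.7, 269.7, 271.8, 269.4, 266.1];
const nonPgo = [286.0, 278.1, 277.3, 276.6, 274.7, 278.1, 279.3, 282.1, 279.7, 281.0];

const avg = xs => xs.reduce((a, b) => a + b, 0) / xs.length;
const sumSq = xs => {
  const m = avg(xs);
  return xs.reduce((a, x) => a + (x - m) ** 2, 0);
};

const n1 = pgo.length, n2 = nonPgo.length;
const pooledVar = (sumSq(pgo) + sumSq(nonPgo)) / (n1 + n2 - 2);
const t = (avg(nonPgo) - avg(pgo)) / Math.sqrt(pooledVar * (1 / n1 + 1 / n2));
console.log('t =', t.toFixed(2)); // ≈ 2.01, df = 18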
Dromaeo DOM runs:

1.23 pgo      http://dromaeo.com/?id=189193  total score 2139
1.24 non-pgo  http://dromaeo.com/?id=189187  total score 1903

The pgo version scored about 1.12x better (2139/1903). The difference broadly held up over the different subtests. The main problem is that I don't know what significance a 1.12x Dromaeo DOM difference would have in actual usage. I suppose various people out there are comparing us based on Dromaeo, though.
Is there anything else we should be testing, or is the above enough to go on for decision-making?
> I suppose various people out there are comparing us based on Dromaeo, though.

Yep. In some ways, that's the biggest issue here.

That said, it's interesting to look at the breakdown of the numbers. For example, the "setAttribute" subtest of the "DOM Attributes" test is almost 40% faster with PGO than without. On the other hand, the "element.expando" tests, which are pure jitcode, are of course unaffected. That pattern holds across the board: tests which involve running lots of C++ code are much faster with PGO, while tests that are largely gated on the JIT, or bottlenecked on a single simple loop or library call in C++ (e.g. createTextNode), don't win nearly as much.

As far as the comments earlier in this bug about the JS engine go... I thought we already had PGO off for the JS engine on Windows. Is that not the case?

What I think is really worth measuring, and that I don't think we have good numbers for, is layout performance. I'm not talking pageload (we measure that with Tp); I'm talking "click the reply button in Gmail and see how long that takes".
(In reply to Boris Zbarsky (:bz) from comment #12)
> As far as the comments earlier in this bug about the JS engine go... I
> thought we already had PGO off for the JS engine on Windows. Is that not
> the case?

It was turned back on a while ago, but the bug number is escaping me atm.
(In reply to comment #12)
> What I think is really worth measuring, and that I don't think we have good
> numbers for, is layout performance. I'm not talking pageload (we measure
> that with Tp); I'm talking "click the reply button in Gmail and see how
> long that takes".

Do we have a good benchmark for this kind of thing? Something that we can use to get some numbers? (Microbenchmarks _could_ be useful here, since we're measuring differences in how well the compiler optimizes code, etc.)
I don't know of a good layout performance benchmark offhand, sadly...
But yes, we could try to write a microbenchmark for it.
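[Editor's note: a layout microbenchmark along these lines would force synchronous reflows in a loop and time them. A minimal sketch; the DOM shape and iteration counts are arbitrary illustrations, not an actual Talos test.]

// Build a biggish DOM, then repeatedly dirty style and read back
// offsetHeight, which forces a synchronous reflow each iteration.
const container = document.createElement('div');
for (let i = 0; i < 1000; i++) {
  const p = document.createElement('p');
  p.textContent = 'paragraph ' + i;
  container.appendChild(p);
}
document.body.appendChild(container);

const start = performance.now();
for (let i = 0; i < 100; i++) {
  container.style.width = (300 + (i % 2) * 50) + 'px';
  void container.offsetHeight; // forces the pending reflow now
}
console.log('layout: ' + (performance.now() - start).toFixed(1) + ' ms');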
roc, dbaron, do you know of a good layout microbenchmark?
(In reply to Boris Zbarsky (:bz) from comment #12)
> What I think is really worth measuring, and that I don't think we have good
> numbers for, is layout performance. I'm not talking pageload (we measure
> that with Tp); I'm talking "click the reply button in Gmail and see how
> long that takes".

Video cameras/screen captures work well for that sort of thing. I can do it but probably won't be able to until at least Wednesday.
(In reply to comment #18)
> (In reply to Boris Zbarsky (:bz) from comment #12)
> > What I think is really worth measuring, and that I don't think we have
> > good numbers for, is layout performance. I'm not talking pageload (we
> > measure that with Tp); I'm talking "click the reply button in Gmail and
> > see how long that takes".
>
> Video cameras/screen captures work well for that sort of thing. I can do it
> but probably won't be able to until at least Wednesday.

In that case, I think that's going to be valuable enough for us to delay having the final conversation until then. Thanks, Dave!
(In reply to :Ehsan Akhgari from comment #17)
> roc, dbaron, do you know of a good layout microbenchmark?

This one, maybe? http://www.craftymind.com/factory/guimark2/HTML5TextTest.html
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #20)
> (In reply to :Ehsan Akhgari from comment #17)
> > roc, dbaron, do you know of a good layout microbenchmark?
>
> This one, maybe? http://www.craftymind.com/factory/guimark2/HTML5TextTest.html

pgo      25.33
non-pgo  22.46  (= 1.13x slower)
I tried some Gmail interactions. Clicking reply didn't work because there was no visual effect for the click, so I couldn't get a start time. I successfully measured the time between clicking the login button and the password dialog appearing, the time between clicking the login button and seeing the emails, and the time between clicking an email and seeing it. I did 2 non-pgo trials, under the assumption that non-pgo trial 1 was warming up some caches. The camera is 30 fps, so there is an inherent imprecision of about +/- 0.033 seconds. Times are in seconds.

                  pwd dialog   show list   show email
non-pgo trial 1      0.20         2.83        0.20
pgo                  0.13         2.67        0.23
non-pgo trial 2      0.17         2.70        0.17

Hard to see what this means. I suspect I/O latency and animation timers are in play, so there could be CPU differences, but they are too small to observe against those latencies.

So far, in summary, it seems that we have seen:

- a clear difference in direct tests of DOM and layout speed (Dromaeo, the GUIMark2 test) of about 1.1-1.2x in most cases (but 1.4x in one case)
- a possibly real small difference on SunSpider of about 3%
- no clear difference in startup time
- no clear difference for Gmail interaction

I'm going to close this bug, but feel free to ask more questions and reopen.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Thanks a lot, David. This is very helpful.