Closed Bug 708641 Opened 13 years ago Closed 13 years ago

Firefox much slower at sprite perf test than Chrome

Categories

(Core :: JavaScript Engine, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla12
Tracking Status
firefox11 - ---

People

(Reporter: bugs, Assigned: bhackett1024)

References

()

Details

(Keywords: perf)

Attachments

(4 files)

So, I ran into http://www.scirra.com/labs/perftest-webgl via this article: http://www.scirra.com/blog/58/html5-2d-gaming-performance-analysis (found on planet-webgl.org). On my machine (Ubuntu 11.04, Radeon HD 4670, fglrx, Intel Core i5 660 @ 3.33GHz) Chromium 15 managed ~10,300 sprites at 30fps. Firefox 11a was closer to ~1700. What was interesting was that Firefox 8 managed ~2100. I tried disabling tracing and restarting; if anything, Firefox 8 seemed slightly faster. In all cases I let the render test run for about a minute to let it settle down a little.
Oh. Grabbed Firefox 9b5. It managed ~2350-2400 w/ tracejit enabled, and ~2500-2600 with it disabled. So at least the beta is fine :)
Besides the regression, there's also the fact that at its fastest Firefox was 4 times slower than Chrome. Attaching profiling from latest M-C
OK. I can definitely reproduce the regression. Profiling shows that something like 92% of our time is under EnterMethodJIT. There's some GL stuff, but the main bits are:

  57% under stubs::GetProp (of which 24% is in PropertyTable::search)
  16% in jitcode
  6% under stubs::callProp (of which 4% is in PropertyTable::search)

There are some other search() callers as well (from property cache fills and whatnot), so total time in PropertyTable::search is 33% of overall CPU time. Another 13% is code in (not under, in the function itself) js_GetPropertyHelper. Another 11% is in (not under) stubs::GetProp.

So we're first of all not ICing whatever property accesses these are, and spending a _lot_ of time looking for the properties for some reason.

As far as I can tell, Aurora is slower than Beta here, and Nightly slower still...
Keywords: perf, regression
So. Reran them all today, since this whole thing is a little fuzzy. On my machine, after leaving each one running for ~10k ticks in a new clean profile and watching for where 30fps seemed to settle:

FF8 - ~1800
FF9b5 - ~1800 (just redownloaded it off the build server)
FF9.0.1 - ~1900
FF10b - ~2000
FF12a - ~1900
Chromium 15 - ~13,500 (Chrome drifted a lot, but did stay consistently above 13,000 - sometimes closer to 14,000)

Sooo. I'm not really sure now what the pattern is. Again the beta seemed to do a bit better. Different numbers, but I was trying to pick a more standard point to measure. The only point of consistency was that Chromium continued to dominate, in fact doing better than the first time I tried to measure it. Oh well, at least builds past FF9 don't seem to be getting obviously *worse* :)
Summary: Regression? Firefox 11a slower at sprite perf test than Firefox 8 (does not seem tracing related) → Firefox much slower at sprite perf test than Chrome
What we really need here is to figure out why those PropertyTable::search calls are taking so long (or happening so much, or both)... and why we end up in those stubcalls to start with.
The ICs are being disabled because they get used with many objects (not sure how many) and all those objects are in dictionary mode. Dictionary-mode objects have different shape lineages, so the IC reaches its maximum number of stubs, and property cache performance is poor once we are in the VM. From the couple of hits on one of the disabled ICs I looked at, the property structure of the objects is identical, so if the objects were not in dictionary mode the ICs should work. The objects are in dictionary mode because the script Object.seal()'s them, and the seal() implementation interacts poorly with the object code in that any non-empty object will be converted to dictionary mode (this is the case both before and after objshrink). I'll write a patch to fix seal() and its friend freeze().
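To illustrate the pattern described above, here is a minimal sketch of script code that seals many objects with an identical property structure and then reads them from a single property-access site. The names makeSprite and sumX are hypothetical and not taken from the test page; the sketch only shows script-visible behavior and cannot show engine internals such as dictionary mode or IC stubs.

```javascript
// Hypothetical sprite objects; the property names are illustrative only.
function makeSprite(x, y) {
  const sprite = { x: x, y: y, angle: 0 };
  // seal() forbids adding or removing properties but leaves values writable.
  return Object.seal(sprite);
}

const sprites = [];
for (let i = 0; i < 1000; i++) {
  sprites.push(makeSprite(i, i * 2));
}

// One property-access site (s.x) that sees every sealed object.
function sumX(list) {
  let total = 0;
  for (const s of list) {
    total += s.x;
  }
  return total;
}

const total = sumX(sprites);
```

From the script's point of view every sealed object still has the same properties in the same order, which matches the observation that the ICs should work once sealing no longer forces dictionary mode.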
This seems to do the trick on the access I've been looking at. Using https://github.com/bhackett1024/CodeInspector (now works in nightlies!) this access is one of the family of property accesses which seem to be dominating the time spent in jitcode.
Assignee: general → bhackett1024
Attachment #586182 - Flags: review?(luke)
Wow! On a quick build of M-C on my system I now get ~7000. A big improvement! Might do a little better w/ optimisation. So. From ~7x slower to ~2x slower. Object.seal? Man. That seems totally unrelated to sprite rendering. Why would they add stuff like that :-/ I wonder if they thought it would improve optimisation in JS engines or something. Oh well. Cool!
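The "~7x" and "~2x" figures can be checked against the sprite counts reported earlier in this bug on the same machine (Chromium 15 at ~13,500, FF12a at ~1900, and ~7000 for the patched M-C build). A minimal sketch follows; the helper name slowdown is hypothetical, and the inputs are the rough, noisy numbers quoted above rather than new measurements.

```javascript
// Ratio of Chrome's sprite count to Firefox's, rounded to one decimal place.
function slowdown(chromeSprites, firefoxSprites) {
  return Math.round((chromeSprites / firefoxSprites) * 10) / 10;
}

const before = slowdown(13500, 1900); // roughly 7.1
const after = slowdown(13500, 7000);  // roughly 1.9
```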
Comment on attachment 586182: patch (0ac1cbff2a67)

Nice patch. It is nice that Object.freeze(), which seems like it would be a good/hygienic thing for code to do as much as they can, doesn't also destroy perf.

>+ ::Reverse(shapes.begin(), shapes.end());

Can you drop the ::?

>+ for (size_t i = 0, len = shapes.length(); i < len; i++) {

Pre-existing, but can you use i < shapes.length() (and i < props.length() below)?
Attachment #586182 - Flags: review?(luke) → review+
> So. From ~7x slower to ~2x slower.

On my box, comparing to Chrome dev, an opt build with this patch hits about 11500. Chrome hits about 7500. But this is on a Mac. Results might differ by OS? Or by 32-bit vs 64-bit?

In any case, an updated profile shows 60% of the time in mjit-generated code, a bit of stubs::StrictNe (at least sometimes on strings), array_push, various WebGL stuff, about 10% painting and 5% event loop overhead.
Pushed, with a couple of tweaks. Removed a bogus assertion that method properties are writable; there is no problem with making a method non-writable, and the barrier can still be tripped. Also marked properties as reconfigured, which does inhibit type-based optimizations somewhat. As Object.seal/freeze get more pickup, analysis precision could be improved here and the optimizations restored.

https://hg.mozilla.org/integration/mozilla-inbound/rev/8dc46cdc401b
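For reference, a small sketch of what freezing and sealing do to property attributes as seen from script, including a method property becoming non-writable. The object and function names (frozen, sealed, tryOverwrite) are illustrative assumptions; only the standard Object.freeze, Object.seal and Object.getOwnPropertyDescriptor calls are used, and nothing here reflects the engine-side barrier or type-inference changes in the patch.

```javascript
// Hypothetical objects, for illustration only.
const frozen = Object.freeze({
  speed: 3,
  update: function () { return this.speed; }
});
const sealed = Object.seal({ speed: 3 });

// freeze() makes every own data property non-writable and non-configurable,
// including properties whose value is a function.
const frozenMethod = Object.getOwnPropertyDescriptor(frozen, "update");

// seal() only makes properties non-configurable; values stay writable.
const sealedSpeed = Object.getOwnPropertyDescriptor(sealed, "speed");

// In strict mode, assigning to a non-writable property throws a TypeError.
function tryOverwrite(obj) {
  "use strict";
  try {
    obj.update = function () { return 0; };
    return "assigned";
  } catch (e) {
    return e instanceof TypeError ? "TypeError" : "other error";
  }
}

const overwriteResult = tryOverwrite(frozen);
sealed.speed = 5; // allowed: sealed properties remain writable
```

The frozen method can still be called as before; only reassignment of the property is rejected.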
(In reply to Boris Zbarsky (:bz) from comment #10)
> On my box, comparing to Chrome dev, an opt build with this patch hits about
> 11500. Chrome hits about 7500. But this is on a Mac. Results might differ
> by OS? Or by 32-bit vs 64-bit?

I think it depends on the OS. On Windows, Chrome (16) is a lot faster than Firefox. Without the patch: Firefox ~4000, Chrome ~18000.

Did you also try the Canvas version? (http://www.scirra.com/labs/perftest-2d/) With Canvas, without the patch: Firefox ~1800, Chrome ~2100. So with the patch it could also be faster; nemo, could you try?

If it depends on the OS and the difference between WebGL and Canvas is so large, maybe it's due to a WebGL issue?
> Without the patch Firefox ~4000, Chrome ~18000.

There's no point testing without the patch. Without the patch, the numbers on my machine were 5x lower for Firefox, so if that reproduces on your hardware Firefox with the patch would be at ~20000. Can you test a build with the patch, please? The ones at http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mozilla-inbound/ for example should have this patch.

> Did you try also the Canvas version?

No, but I just tried it. With the patch, the canvas version is ~550 in Firefox and ~450 on Chrome dev, still both on Mac. A quick profile shows that this version is completely gated on drawImage performance on Mac (90% of the time under drawImage), which is not surprising; on Windows with direct2d the situation might look different.
Alright... 2D canvas test, after 5k+ ticks:

Firefox 12a - ~550
M-C + patch - ~430
Chromium 15 - ~3200

Values fluctuated more, perhaps just 'cause the noise was even stronger at the lower sprite counts. M-C doing worse is probably just 'cause my build, even w/ optimisation enabled, is not as tweaked as the nightlies? Anyway. Chromium is ~6x faster than nightly. Should be a separate bug perhaps.
> M-C doing worse is probably just 'cause my build, even w/ optimisation enabled, is not as
> tweaked as the nightlies?

If you're on Windows and didn't build with PGO, yes.

Could you file a separate bug on the Windows performance on that canvas testcase, please? It totally doesn't match what I see here...
Uh. No. That was someone else under Windows. Different systems are being thrown about here, sooo. Here's everything about mine.

Running XFCE4 under Ubuntu 11.10.
M-C built with gcc 4.6.1-9ubuntu3.
Flags: --enable-optimize --disable-debug --enable-jprof --enable-profiling MOZ_DEBUG_SYMBOLS=1 -gstabs+ --disable-install-strip --enable-default-toolkit=cairo-gtk2

VGA compatible controller: ATI Technologies Inc RV730XT [Radeon HD 4670]
[155170.050] (II) Module fglrxdrm: vendor="FireGL - ATI Technologies Inc."
[155170.050] compiled for 1.4.99.906, module version = 8.91.4
model name : Intel(R) Core(TM) i5 CPU 660 @ 3.33GHz

And a little glxgears output at default window size of 300x300:

38379 frames in 5.0 seconds = 7675.734 FPS
37995 frames in 5.0 seconds = 7598.864 FPS
37905 frames in 5.0 seconds = 7580.973 FPS
37764 frames in 5.0 seconds = 7552.654 FPS
OK, that could just be an issue of Chrome using skia and us using RENDER, and RENDER sucking on your particular machine (for the 2d canvas case). It's really hard to tell without you profiling.
Ah. Could be related to the consistently low scores this card has yielded in things like HWACCEL where even my Intel card kicked its ass. You guys have mentioned in the past I think that ATI/fglrx has horrible x render support.
Firefox 2x faster than Chrome with Canvas. Firefox ~29000 with WebGL, Chrome ~18000. Firefox is using a bit more memory, but it's normal as there are 11000 more objects! However, your patch does wonders :D Later I'll try under Linux.
So here's the thing. For a testcase like this, there are two aspects to the performance, usually: JS and graphics (and sometimes also DOM). The JS bit is cross-platform, modulo some minor differences between 32-bit and 64-bit. The graphics bit is very platform-dependent. It sounds like the patch in this bug more or less addressed the obvious JS issues, leaving the graphics ones. nemo, want to file a specific bug for Linux for that? We can try to figure out exactly what's going on with your setup there...
To avoid even more bug spam, I'm not going to attach these. But it seemed worth mentioning. bz asked me to rerun the traces only w/ callgraphs enabled. I put them here. http://m8y.org/tmp/scirra-perftest/
On Linux (gfx card is NVIDIA):

For WebGL, Firefox ~16000, Chrome ~12500.
For Canvas, Firefox ~850 (~800 without the patch, so here the problem is the graphics), Chrome ~2200.

I'll try with Skia as soon as possible. However, how can WebGL be faster on Windows than on Linux? Isn't the NVIDIA binary driver the same between these two platforms?
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla12
Marco, the driver is not the same, no. Also, other things may muddy the situation. Just saying "under Linux" is a bit vague. For example, FOTN rendered about 20% faster for me when I disabled compositing. Certain compiz plugins seemed particularly heavy. gnome-shell? Unity? Metacity? XFCE4? Which nvidia binary blob under linux?
Are you sure? Afaik the NVIDIA binary blob shares a lot of code between Windows and Linux. I'm using Unity, so Compiz. The version of the NVIDIA driver is the latest stable release, 285. However, I've opened bug 716121 about the bad gfx performance on Canvas.
Also, some of the "webgl" time is spent in our own code, and that might well be faster on windows because of the better compiler.