Firefox much slower at sprite perf test than Chrome

RESOLVED FIXED in mozilla12

Status


Core
JavaScript Engine
RESOLVED FIXED
6 years ago
4 years ago

People

(Reporter: nemo, Assigned: bhackett)

Tracking

(Blocks: 1 bug, {perf})

Trunk
mozilla12
x86_64
Linux

Firefox Tracking Flags

(firefox11-)


Attachments

(4 attachments)

(Reporter)

Description

6 years ago
So, ran into:
http://www.scirra.com/labs/perftest-webgl
off this article:
http://www.scirra.com/blog/58/html5-2d-gaming-performance-analysis

Off of planet-webgl.org.

On my machine (Ubuntu 11.04, Radeon HD 4670, fglrx, Intel Core i5 660@3.33GHz) Chromium 15 managed ~10,300 sprites at 30fps.
Firefox 11a was closer to ~1700. 

What was interesting was Firefox 8 which managed in the range of ~2100.  I tried disabling tracing and restarting.  If anything Firefox 8 seemed slightly faster.

In all cases I let the render test run for about a minute to let it settle down a little.
(Reporter)

Comment 1

6 years ago
Oh. Grabbed Firefox 9b5.
It managed ~2350-2400 w/ tracejit enabled, and ~2500-2600 with it disabled.
So at least the beta is fine :)
Blocks: 579390
(Reporter)

Comment 2

6 years ago
Created attachment 581768 [details]
Profiling of the sprite test

Besides the regression, there's also the fact that at its fastest Firefox was 4 times slower than Chrome.
Attaching profiling from latest M-C
Comment 3

OK.  I can definitely reproduce the regression.

Profiling shows that something like 92% of our time is under EnterMethodJIT.  There's some GL stuff, but the main bits are:

  57% under stubs::GetProp (of which 24% is in PropertyTable::search)
  16% in jitcode
   6% under stubs::callProp (of which 4% is in PropertyTable::search)

There are some other search() callers as well (from property cache fills and whatnot), so total time in PropertyTable::search is 33% of overall CPU time.

Another 13% is code in (not under, in the function itself) js_GetPropertyHelper.  Another 11% is in (not under) stubs::GetProp.

So we're first of all not ICing whatever property accesses these are, and spending a _lot_ of time looking for the properties for some reason.

As far as I can tell, Aurora is slower than Beta here, and Nightly slower still...
tracking-firefox11: --- → ?
Keywords: perf, regression

Updated

6 years ago
tracking-firefox11: ? → +
Keywords: regressionwindow-wanted
(Reporter)

Comment 4

6 years ago
So. Reran them all today, since this whole thing is a little fuzzy.

On my machine, after leaving each one running for ~10k ticks in a new clean profile and watching for where 30fps seemed to settle:
FF8     - ~1800
FF9b5   - ~1800  (just redownloaded it off the build server)
FF9.0.1 - ~1900
FF10b   - ~2000
FF12a   - ~1900

Chromium 15 - ~13,500 (Chrome drifted a lot, but did stay consistently above 13,000 - sometimes closer to 14,000)

Sooo. I'm not really sure now what the pattern is.  Again the beta seemed to do a bit better.  Different numbers, but I was trying to pick a more standard point to measure.

The only point of consistency was that Chromium continued to dominate; in fact, it did better than the first time I tried to measure it.

Oh well, at least builds past FF9 don't seem to be getting obviously *worse* :)
Keywords: regression, regressionwindow-wanted
Summary: Regression? Firefox 11a slower at sprite perf test than Firefox 8 (does not seem tracing related) → Firefox much slower at sprite perf test than Chrome
tracking-firefox11: + → -
Comment 5

What we really need here is to figure out why those PropertyTable::search calls are taking so long (or happening so much, or both)... and why we end up in those stubcalls to start with.
(Assignee)

Comment 6

6 years ago
The ICs are being disabled because they get used with many objects (not sure how many) and all those objects are in dictionary mode.  Each object therefore has its own shape lineage, the IC reaches its maximum number of stubs, and property cache performance is poor once we are in the VM.  From the couple of hits on one of the disabled ICs I looked at, the property structure of the objects is identical, so if the objects were not in dictionary mode the ICs should work.

The objects are in dictionary mode because the script Object.seal()'s them, and the seal() implementation interacts poorly with the object code in that any non-empty object will be converted to dictionary mode (this is the case both before and after objshrink).  I'll write a patch to fix seal() and its friend freeze().
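
The access pattern described above can be sketched roughly as follows. This is a hypothetical reduction, not code taken from the Scirra test: `makeSprite`, `makeSprites`, `sumPositions`, `timeIt` and the property names are invented for illustration, and whether the sealed case is measurably slower depends entirely on the engine and its version.

```javascript
// Hypothetical reduction: many objects with an identical property layout,
// each sealed right after construction, then read in a hot loop.
function makeSprite(i, seal) {
  const s = { x: i, y: i * 2, dx: 1, dy: -1 };
  // Object.seal() forbids adding/removing properties; values stay writable.
  return seal ? Object.seal(s) : s;
}

function makeSprites(count, seal) {
  const sprites = [];
  for (let i = 0; i < count; i++) sprites.push(makeSprite(i, seal));
  return sprites;
}

// The property reads and writes here are the accesses an IC would need to cover.
function sumPositions(sprites) {
  let total = 0;
  for (const s of sprites) {
    s.x += s.dx;
    s.y += s.dy;
    total += s.x + s.y;
  }
  return total;
}

// Coarse wall-clock timing; Date.now() is used only for portability.
function timeIt(label, sprites, rounds) {
  const start = Date.now();
  let total = 0;
  for (let r = 0; r < rounds; r++) total += sumPositions(sprites);
  console.log(label, Date.now() - start, "ms");
  return total;
}

const plainTotal = timeIt("plain ", makeSprites(10000, false), 200);
const sealedTotal = timeIt("sealed", makeSprites(10000, true), 200);
```

Both runs compute the same result; only the elapsed time is of interest, and a single run like this is far too noisy to stand in for the profiles attached to this bug.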
(Assignee)

Comment 7

6 years ago
Created attachment 586182 [details] [diff] [review]
patch (0ac1cbff2a67)

This seems to do the trick on the access I've been looking at.  Using https://github.com/bhackett1024/CodeInspector (now works in nightlies!) this access is one of the family of property accesses which seem to be dominating the time spent in jitcode.
Assignee: general → bhackett1024
Attachment #586182 - Flags: review?(luke)
(Reporter)

Comment 8

6 years ago
Wow! On a quick build of M-C on my system I now get ~7000. A big improvement!

Might do a little better w/ optimisation.
So.  From ~7x slower to ~2x slower.

Object.seal? Man. That seems totally unrelated to sprite rendering. Why would they add stuff like that :-/
I wonder if they thought it would improve optimisation in JS engines or something.

Oh well. Cool!
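
For reference, the "~7x" and "~2x" figures can be reproduced from numbers already posted in this bug, assuming the comparison is against the Chromium 15 result from comment 4 (that pairing is an assumption; the comment does not say which Chromium figure was used). `timesSlower` is just an illustrative helper name.

```javascript
// Sprite counts at 30fps reported in this bug (reporter's Linux machine).
const chromium15 = 13500;   // comment 4
const firefoxBefore = 1900; // FF12a, comment 4
const firefoxAfter = 7000;  // M-C build with the patch, comment 8

// How many times more sprites the reference browser sustains.
function timesSlower(reference, measured) {
  return reference / measured;
}

const before = timesSlower(chromium15, firefoxBefore);
const after = timesSlower(chromium15, firefoxAfter);
console.log(before.toFixed(1) + "x -> " + after.toFixed(1) + "x"); // prints "7.1x -> 1.9x"
```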

Comment 9

6 years ago
Comment on attachment 586182 [details] [diff] [review]
patch (0ac1cbff2a67)

Nice patch.  It is nice that Object.freeze(), which seems like a good/hygienic thing for code to do as much as it can, doesn't also destroy perf.

>+        ::Reverse(shapes.begin(), shapes.end());

Can you drop the ::?

>+        for (size_t i = 0, len = shapes.length(); i < len; i++) {

Pre-existing, but can you use i < shapes.length() (and i < props.length() below)?
Attachment #586182 - Flags: review?(luke) → review+
Comment 10

> So.  From ~7x slower to ~2x slower.

On my box, comparing to Chrome dev, an opt build with this patch hits about 11500.  Chrome hits about 7500.  But this is on a Mac.  Results might differ by OS?  Or by 32-bit vs 64-bit?

In any case, an updated profile shows 60% of the time in mjit-generated code, a bit of stubs::StrictNe (at least sometimes on strings), array_push, various WebGL stuff, about 10% painting and 5% event loop overhead.
(Assignee)

Comment 11

6 years ago
Pushed, with a couple tweaks.  Removed a bogus assertion that method properties are writable; there is no problem with making a method non-writable, and the barrier can still be tripped.  Also marked properties as reconfigured, which does inhibit type-based optimizations some.  As Object.seal/freeze get more pickup, analysis precision could be improved here and the optimizations restored.

https://hg.mozilla.org/integration/mozilla-inbound/rev/8dc46cdc401b
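
At the language level, the point about non-writable methods can be illustrated with property descriptors. This is a small sketch of standard Object.seal/Object.freeze behaviour, not of the engine internals touched by the patch; the `sprite` object and its `area` method are made up for the example.

```javascript
// A made-up object with one data property pair and one method property.
const sprite = {
  w: 3,
  h: 4,
  area() { return this.w * this.h; },
};

// seal(): properties become non-configurable but stay writable.
Object.seal(sprite);
const sealedDesc = Object.getOwnPropertyDescriptor(sprite, "area");
console.log(sealedDesc.writable, sealedDesc.configurable); // true false

// freeze(): data properties, including methods, also become non-writable.
Object.freeze(sprite);
const frozenDesc = Object.getOwnPropertyDescriptor(sprite, "area");
console.log(frozenDesc.writable, frozenDesc.configurable); // false false

// The method is still an ordinary callable property.
const area = sprite.area();
console.log(area); // 12
```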
Comment 12

(In reply to Boris Zbarsky (:bz) from comment #10)
> On my box, comparing to Chrome dev, an opt build with this patch hits about
> 11500.  Chrome hits about 7500.  But this is on a Mac.  Results might differ
> by OS?  Or by 32-bit vs 64-bit?

I think it depends on the OS. On Windows, Chrome (16) is a lot faster than Firefox. Without the patch Firefox ~4000, Chrome ~18000.

Did you try also the Canvas version? (http://www.scirra.com/labs/perftest-2d/)
With Canvas without the patch, Firefox ~1800, Chrome ~2100. So with the patch it could also be faster; nemo, could you try?

If it depends on the OS and the difference is so high between WebGL and Canvas, maybe it's due to a webgl issue?
Comment 13

> Without the patch Firefox ~4000, Chrome ~18000.

There's no point testing without the patch.  Without the patch, the numbers on my machine were 5x lower for Firefox, so if that reproduces on your hardware Firefox with the patch would be at ~20000.  Can you test a build with the patch, please?  The ones at http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mozilla-inbound/ for example should have this patch.

> Did you try also the Canvas version?

No, but I just tried it.  With the patch, the canvas version is ~550 in Firefox and ~450 on Chrome dev, still both on Mac.  A quick profile shows that this version is completely gated on drawImage performance on Mac (90% of the time under drawImage), which is not surprising; on Windows with direct2d the situation might look different.
(Reporter)

Comment 14

6 years ago
Alright...
2D canvas test, after 5k+ ticks:
Firefox 12a - ~550
M-C + patch - ~430

Chromium 15 - ~3200

Values fluctuated more, perhaps just 'cause the noise was even stronger at the lower sprite counts.
M-C doing worse is probably just 'cause my build, even w/ optimisation enabled, is not as tweaked as the nightlies?

Anyway. Chromium is ~6x faster than nightly.

Should be a separate bug perhaps.
Comment 15

> M-C doing worse is probably just 'cause my build, even w/ optimisation enabled, is not as
> tweaked as the nightlies?

If you're on Windows and didn't build with PGO, yes.  Could you file a separate bug on the Windows performance on that canvas testcase, please?  It totally doesn't match what I see here...
(Reporter)

Comment 16

6 years ago
Uh. No. That was someone else under Windows.

Different systems are being thrown about here, sooo. Here's everything about mine.
Running XFCE4 under Ubuntu 11.10.

M-C built with gcc 4.6.1-9ubuntu3.  Flags: --enable-optimize --disable-debug --enable-jprof --enable-profiling MOZ_DEBUG_SYMBOLS=1 -gstabs+ --disable-install-strip --enable-default-toolkit=cairo-gtk2

VGA compatible controller: ATI Technologies Inc RV730XT [Radeon HD 4670]
[155170.050] (II) Module fglrxdrm: vendor="FireGL - ATI Technologies Inc."
[155170.050]    compiled for 1.4.99.906, module version = 8.91.4

model name      : Intel(R) Core(TM) i5 CPU         660  @ 3.33GHz

And a little glxgears output at default window size of 300x300:
38379 frames in 5.0 seconds = 7675.734 FPS
37995 frames in 5.0 seconds = 7598.864 FPS
37905 frames in 5.0 seconds = 7580.973 FPS
37764 frames in 5.0 seconds = 7552.654 FPS
Comment 17

OK, that could just be an issue of Chrome using skia and us using RENDER and RENDER sucking on your particular machine (for the 2d canvas case).  It's really hard to tell without you profiling.
(Reporter)

Comment 18

6 years ago
Ah. Could be related to the consistently low scores this card has yielded in things like HWACCEL where even my Intel card kicked its ass.
You guys have mentioned in the past I think that ATI/fglrx has horrible x render support.
Comment 19

Firefox 2x faster than Chrome with Canvas.
Firefox ~29000 with WebGL, Chrome ~18000. Firefox is using a bit more memory, but it's normal as there are 11000 more objects!

However, your patch does wonders :D

Later I'll try under Linux.
(Reporter)

Comment 20

6 years ago
Created attachment 586423 [details]
3D sprite test, after bhackett's patch
(Reporter)

Comment 21

6 years ago
Created attachment 586424 [details]
2D sprite test. This is also after his patch
Comment 22

So here's the thing.  For a testcase like this, there are two aspects to the performance, usually: JS and graphics (and sometimes also DOM).

The JS bit is cross-platform, modulo some minor differences between 32-bit and 64-bit.

The graphics bit is very platform-dependent.

It sounds like the patch in this bug more or less addressed the obvious JS issues, leaving the graphics ones.  nemo, want to file a specific bug for Linux for that?  We can try to figure out exactly what's going on with your setup there...
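
A very coarse way to see the JS/graphics split from script is to time the two phases of a frame separately. The sketch below is only an illustration under assumptions: `profilePhases`, `update` and `draw` are hypothetical names, the workloads are stand-ins, and the profiles in this bug came from a native profiler rather than anything like this. Graphics work that the browser or driver performs asynchronously will not show up in the draw-phase time.

```javascript
// Use performance.now() where available (browsers, recent Node), else Date.now().
const now = (typeof performance !== "undefined" && performance.now)
  ? () => performance.now()
  : () => Date.now();

// Runs `frames` iterations of update() then draw(), accumulating time per phase.
function profilePhases(update, draw, frames) {
  let updateMs = 0;
  let drawMs = 0;
  for (let f = 0; f < frames; f++) {
    const t0 = now();
    update();
    const t1 = now();
    draw();
    const t2 = now();
    updateMs += t1 - t0;
    drawMs += t2 - t1;
  }
  const total = updateMs + drawMs;
  return {
    updateMs,
    drawMs,
    // Fraction of measured time spent in script logic; 0 when nothing was measured.
    updateShare: total > 0 ? updateMs / total : 0,
  };
}

// Stand-in workloads so the sketch runs anywhere; a real page would pass its own.
const sprites = Array.from({ length: 1000 }, (_, i) => ({ x: i, dx: 1 }));
let drawn = 0;
const result = profilePhases(
  () => { for (const s of sprites) s.x += s.dx; },
  () => { drawn += sprites.length; }, // placeholder for drawImage / GL calls
  100
);
console.log(result);
```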
(Reporter)

Comment 23

6 years ago
To avoid even more bug spam, I'm not going to attach these.  But it seemed worth mentioning.
bz asked me to rerun the traces only w/ callgraphs enabled.

I put them here.
http://m8y.org/tmp/scirra-perftest/
Comment 24

On Linux (gfx card is NVIDIA).
For WebGL, Firefox ~16000, Chrome ~12500
For Canvas, Firefox ~850 (~800 without the patch, so here the problem is the graphics), Chrome ~2200.

I'll try with Skia as soon as possible.

However, how can WebGL be faster on Windows than on Linux? Isn't the NVIDIA binary driver the same between these two platforms?
Comment 25

https://hg.mozilla.org/mozilla-central/rev/8dc46cdc401b
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla12
Blocks: 716121
(Reporter)

Comment 26

6 years ago
Marco, the driver is not the same, no.  Also, other things may muddy the situation.  Just saying "under Linux" is a bit vague.
For example, FOTN rendered about 20% faster for me when I disabled compositing.
Certain compiz plugins seemed particularly heavy.  gnome-shell? Unity? Metacity? XFCE4? Which nvidia binary blob under linux?
Comment 27

Are you sure? Afaik the NVIDIA binary blob shares a lot of code between Windows and Linux.
I'm using Unity, so Compiz. The version of the NVIDIA driver is the latest stable released, 285.

However I've opened bug 716121 about the bad gfx performance on Canvas.
Comment 28

Also, some of the "webgl" time is spent in our own code, and that might well be faster on windows because of the better compiler.
(Assignee)

Updated

4 years ago
Duplicate of this bug: 909728