Bug 708641 - Firefox much slower at sprite perf test than Chrome
Keywords: perf
Product: Core
Classification: Components
Component: JavaScript Engine
Version: Trunk
Platform: x86_64 Linux
Importance: -- normal
Target Milestone: mozilla12
Assigned To: Brian Hackett (:bhackett)
Duplicates: 909728
Depends on:
Blocks: WebJSPerf 716121
Reported: 2011-12-08 08:35 PST by nemo
Modified: 2013-08-27 07:36 PDT

Profiling of the sprite test (844.58 KB, text/plain)
2011-12-14 13:47 PST, nemo
patch (0ac1cbff2a67) (5.09 KB, patch)
2012-01-05 12:30 PST, Brian Hackett (:bhackett)
luke: review+
3D sprite test, after bhackett's patch (334.96 KB, text/plain)
2012-01-06 07:46 PST, nemo
2D sprite test. This is also after his patch (311.71 KB, text/plain)
2012-01-06 07:48 PST, nemo

Description nemo 2011-12-08 08:35:17 PST
So, ran into:
off this article:

Off of

On my machine (Ubuntu 11.04, Radeon HD 4670, fglrx, Intel Core i5 660@3.33GHz) Chromium 15 managed ~10,300 sprites at 30fps.
Firefox 11a was closer to ~1700. 

What was interesting was Firefox 8 which managed in the range of ~2100.  I tried disabling tracing and restarting.  If anything Firefox 8 seemed slightly faster.

In all cases I let the render test run for about a minute to let it settle down a little.
Comment 1 nemo 2011-12-08 10:07:17 PST
Oh. Grabbed Firefox 9b5.
It managed ~2350-2400 w/ tracejit enabled, and ~2500-2600 with it disabled.
So at least the beta is fine :)
Comment 2 nemo 2011-12-14 13:47:48 PST
Created attachment 581768
Profiling of the sprite test

Besides the regression, there's also the fact that at its fastest Firefox was 4 times slower than Chrome.
Attaching profiling from latest M-C
Comment 3 Boris Zbarsky [:bz] 2011-12-14 14:50:11 PST
OK.  I can definitely reproduce the regression.

Profiling shows that something like 92% of our time is under EnterMethodJIT.  There's some GL stuff, but the main bits are:

  57% under stubs::GetProp (of which 24% is in PropertyTable::search)
  16% in jitcode
   6% under stubs::callProp (of which 4% is in PropertyTable::search)

There are some other search() callers as well (from property cache fills and whatnot), so total time in PropertyTable::search is 33% of overall CPU time.

Another 13% is code in (not under, in the function itself) js_GetPropertyHelper.  Another 11% is in (not under) stubs::GetProp.

So we're first of all not ICing whatever property accesses these are, and spending a _lot_ of time looking for the properties for some reason.

As far as I can tell, Aurora is slower than Beta here, and Nightly slower still...
Comment 4 nemo 2012-01-03 13:06:10 PST
So. Reran them all today, since this whole thing is a little fuzzy.

On my machine, after leaving each one running for ~10k ticks in a new clean profile and watching for where 30fps seemed to settle:
FF8     - ~1800
FF9b5   - ~1800  (just redownloaded it off the build server)
FF9.0.1 - ~1900
FF10b   - ~2000
FF12a   - ~1900

Chromium 15 - ~13,500 (Chrome drifted a lot, but did stay consistently above 13,000 - sometimes closer to 14,000)

Sooo. I'm not really sure now what the pattern is.  Again the beta seemed to do a bit better.  Different numbers, but I was trying to pick a more standard point to measure.

Only point of consistency was that Chromium continued to dominate, in fact, doing better than my first time I tried to measure it.

Oh well, at least builds past FF9 don't seem to be getting obviously *worse* :)
Comment 5 Boris Zbarsky [:bz] 2012-01-03 22:34:15 PST
What we really need here is to figure out why those PropertyTable::search calls are taking so long (or happening so much, or both)... and why we end up in those stubcalls to start with.
Comment 6 Brian Hackett (:bhackett) 2012-01-05 11:37:50 PST
The ICs are being disabled because they get used with many objects (not sure how many) and all those objects are in dictionary mode, hence different shape lineages, max stubs reached on the IC and poor property cache performance once we are in the VM.  From the couple hits on one of the disabled ICs I looked at, the property structure of the objects is identical so if the objects were not in dictionary mode the ICs should work.

The objects are in dictionary mode because the script Object.seal()'s them, and the seal() implementation interacts poorly with the object code in that any non-empty object will be converted to dictionary mode (this is the case both before and after objshrink).  I'll write a patch to fix seal() and its friend freeze().
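A minimal sketch of the pattern described above, assuming the test script builds many sprite objects with the same property layout and seals each one. The `makeSprite` helper and its fields are hypothetical and not taken from the test's source. The `s.x` / `s.y` reads in `step` stand in for the kind of access site an inline cache would normally serve; per the analysis above, seal() was putting every such object into dictionary mode, so the cache saw a different shape for each sprite.

```javascript
// Hypothetical sprite factory: every object gets the same properties in the
// same order, so unsealed objects would normally be able to share a shape.
function makeSprite(i) {
  const sprite = { x: i, y: 2 * i, dx: 1, dy: -1 };
  // Sealing forbids adding or removing properties; values stay writable.
  return Object.seal(sprite);
}

// One set of property-access sites that sees every sprite in turn.
function step(sprites) {
  let sum = 0;
  for (const s of sprites) {
    s.x += s.dx;
    s.y += s.dy;
    sum += s.x + s.y;
  }
  return sum;
}

const sprites = Array.from({ length: 1000 }, (_, i) => makeSprite(i));
const total = step(sprites);
console.log(sprites.every((s) => Object.isSealed(s)), total);
```

Nothing in this snippet is slow by design; whether the accesses in `step` stay on the fast path depends on how the engine represents sealed objects, which is what the patch in this bug changes.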
Comment 7 Brian Hackett (:bhackett) 2012-01-05 12:30:59 PST
Created attachment 586182
patch (0ac1cbff2a67)

This seems to do the trick on the access I've been looking at.  Using (now works in nightlies!) this access is one of the family of property accesses which seem to be dominating the time spent in jitcode.
Comment 8 nemo 2012-01-05 14:21:36 PST
Wow! On a quick build of M-C on my system I now get ~7000. A big improvement!

Might do a little better w/ optimisation.
So.  From ~7x slower to ~2x slower.

Object.seal? Man. That seems totally unrelated to sprite rendering. Why would they add stuff like that :-/
I wonder if they thought it would improve optimisation in JS engines or something.

Oh well. Cool!
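As a rough check on the "~7x" and "~2x" figures, assuming they compare against the ~13,500 (Chromium 15) and ~1900 (FF12a) counts from comment 4; the comment itself does not say which baseline was used:

```javascript
// Sprite counts at ~30fps reported earlier in this bug (same machine).
const chromium = 13500;   // Chromium 15, comment 4
const beforePatch = 1900; // FF12a, comment 4
const afterPatch = 7000;  // M-C build with the patch, comment 8

// How many times more sprites Chromium sustains than a given Firefox build.
const slowdown = (firefox) => chromium / firefox;

console.log(slowdown(beforePatch).toFixed(1)); // 7.1
console.log(slowdown(afterPatch).toFixed(1));  // 1.9
```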
Comment 9 Luke Wagner [:luke] 2012-01-05 15:22:58 PST
Comment on attachment 586182
patch (0ac1cbff2a67)

Nice patch.  It is nice that Object.freeze(), which seems like it would be a good/hygienic thing for code to do as much as it can, doesn't also destroy perf.

>+        ::Reverse(shapes.begin(), shapes.end());

Can you drop the ::?

>+        for (size_t i = 0, len = shapes.length(); i < len; i++) {

Pre-existing, but can you use i < shapes.length() (and i < props.length() below)?
Comment 10 Boris Zbarsky [:bz] 2012-01-05 19:40:21 PST
> So.  From ~7x slower to ~2x slower.

On my box, comparing to Chrome dev, an opt build with this patch hits about 11500.  Chrome hits about 7500.  But this is on a Mac.  Results might differ by OS?  Or by 32-bit vs 64-bit?

In any case, an updated profile shows 60% of the time in mjit-generated code, a bit of stubs::StrictNe (at least sometimes on strings), array_push, various WebGL stuff, about 10% painting and 5% event loop overhead.
Comment 11 Brian Hackett (:bhackett) 2012-01-05 19:55:26 PST
Pushed, with a couple tweaks.  Removed a bogus assertion that method properties are writable; there is no problem with making a method non-writable, and the barrier can still be tripped.  Also marked properties as reconfigured, which does inhibit type based optimizations some.  As Object.seal/freeze get more pickup, analysis precision could be improved here and the optimizations restored.
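A small sketch of the language-level behavior behind the first tweak: Object.freeze() makes every own property non-writable and non-configurable, function-valued ("method") properties included, while Object.seal() only makes them non-configurable. The objects and the `flags` / `tryOverwrite` helpers are illustrative only; nothing here reflects engine internals.

```javascript
// Report the two attributes that seal() and freeze() change.
function flags(obj, key) {
  const d = Object.getOwnPropertyDescriptor(obj, key);
  return { writable: d.writable, configurable: d.configurable };
}

const sealed = Object.seal({ draw() { return "sealed"; } });
const frozen = Object.freeze({ draw() { return "frozen"; } });

console.log(flags(sealed, "draw")); // { writable: true, configurable: false }
console.log(flags(frozen, "draw")); // { writable: false, configurable: false }

// In strict mode, assigning to a non-writable property throws a TypeError.
function tryOverwrite(obj) {
  "use strict";
  try {
    obj.draw = () => "replaced";
    return "overwritten";
  } catch (e) {
    return e instanceof TypeError ? "TypeError" : "other error";
  }
}

console.log(tryOverwrite(sealed)); // overwritten
console.log(tryOverwrite(frozen)); // TypeError
```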
Comment 12 Marco Castelluccio [:marco] 2012-01-06 01:49:25 PST
(In reply to Boris Zbarsky (:bz) from comment #10)
> On my box, comparing to Chrome dev, an opt build with this patch hits about
> 11500.  Chrome hits about 7500.  But this is on a Mac.  Results might differ
> by OS?  Or by 32-bit vs 64-bit?

I think it depends on the OS. On Windows, Chrome (16) is a lot faster than Firefox. Without the patch Firefox ~4000, Chrome ~18000.

Did you try also the Canvas version?
With Canvas without the patch, Firefox ~1800, Chrome ~2100. So with the patch it could be also faster, nemo could you try?

If it depends on the OS and the difference is so high between WebGL and Canvas, maybe it's due to a webgl issue?
Comment 13 Boris Zbarsky [:bz] 2012-01-06 07:18:21 PST
> Without the patch Firefox ~4000, Chrome ~18000.

There's no point testing without the patch.  Without the patch, the numbers on my machine were 5x lower for Firefox, so if that reproduces on your hardware Firefox with the patch would be at ~20000.  Can you test a build with the patch, please?  The ones at for example should have this patch.

> Did you try also the Canvas version?

No, but I just tried it.  With the patch, the canvas version is ~550 in Firefox and ~450 on Chrome dev, still both on Mac.  A quick profile shows that this version is completely gated on drawImage performance on Mac (90% of the time under drawImage), which is not surprising; on Windows with direct2d the situation might look different.
Comment 14 nemo 2012-01-06 07:29:35 PST
2D canvas test, after 5k+ ticks:
Firefox 12a - ~550
M-C + patch - ~430

Chromium 15 - ~3200

Values fluctuated more, perhaps just 'cause the noise was even stronger at the lower sprite counts.
M-C doing worse is probably just 'cause my build, even w/ optimisation enabled, is not as tweaked as the nightlies?

Anyway. ~6x faster than nightly.

Should be a separate bug perhaps.
Comment 15 Boris Zbarsky [:bz] 2012-01-06 07:32:20 PST
> M-C doing worse is probably just 'cause my build, even w/ optimisation enabled, is not as
> tweaked as the nightlies?

If you're on Windows and didn't build with PGO, yes.  Could you file a separate bug on the Windows performance on that canvas testcase, please?  It totally doesn't match what I see here...
Comment 16 nemo 2012-01-06 07:36:52 PST
Uh. No. That was someone else under Windows.

Different systems are being thrown about here, sooo. Here's everything about mine.
Running XFCE4 under Ubuntu 11.10.

M-C built with gcc 4.6.1-9ubuntu3.  Flags: --enable-optimize --disable-debug --enable-jprof --enable-profiling MOZ_DEBUG_SYMBOLS=1 -gstabs+ --disable-install-strip --enable-default-toolkit=cairo-gtk2

VGA compatible controller: ATI Technologies Inc RV730XT [Radeon HD 4670]
[155170.050] (II) Module fglrxdrm: vendor="FireGL - ATI Technologies Inc."
[155170.050]    compiled for, module version = 8.91.4

model name      : Intel(R) Core(TM) i5 CPU         660  @ 3.33GHz

And a little glxgears output at default window size of 300x300:
38379 frames in 5.0 seconds = 7675.734 FPS
37995 frames in 5.0 seconds = 7598.864 FPS
37905 frames in 5.0 seconds = 7580.973 FPS
37764 frames in 5.0 seconds = 7552.654 FPS
Comment 17 Boris Zbarsky [:bz] 2012-01-06 07:38:53 PST
OK, that could just be an issue of Chrome using skia and us using RENDER and RENDER sucking on your particular machine (for the 2d canvas case).  It's really hard to tell without you profiling.
Comment 18 nemo 2012-01-06 07:44:07 PST
Ah. Could be related to the consistently low scores this card has yielded in things like HWACCEL where even my Intel card kicked its ass.
You guys have mentioned in the past I think that ATI/fglrx has horrible x render support.
Comment 19 Marco Castelluccio [:marco] 2012-01-06 07:44:37 PST
Firefox 2x faster than Chrome with Canvas.
Firefox ~29000 with WebGL, Chrome ~18000. Firefox is using a bit more memory, but it's normal as there are 11000 more objects!

However, your patch does wonders :D

Later I'll try under Linux.
Comment 20 nemo 2012-01-06 07:46:37 PST
Created attachment 586423
3D sprite test, after bhackett's patch
Comment 21 nemo 2012-01-06 07:48:24 PST
Created attachment 586424
2D sprite test. This is also after his patch
Comment 22 Boris Zbarsky [:bz] 2012-01-06 08:54:42 PST
So here's the thing.  For a testcase like this, there are two aspects to the performance, usually: JS and graphics (and sometimes also DOM).

The JS bit is cross-platform, modulo some minor differences between 32-bit and 64-bit.

The graphics bit is very platform-dependent.

It sounds like the patch in this bug more or less addressed the obvious JS issues, leaving the graphics ones.  nemo, want to file a specific bug for Linux for that?  We can try to figure out exactly what's going on with your setup there...
Comment 23 nemo 2012-01-06 10:18:28 PST
To avoid even more bug spam, I'm not going to attach these.  But it seemed worth mentioning.
bz asked me to rerun the traces only w/ callgraphs enabled.

I put them here.
Comment 24 Marco Castelluccio [:marco] 2012-01-06 15:20:53 PST
On Linux (gfx card is NVIDIA).
For WebGL, Firefox ~16000, Chrome ~12500
For Canvas, Firefox ~850 (~800 without the patch, so here the problem is the graphics), Chrome ~2200.

I'll try with Skia as soon as possible.

However, how can WebGL be faster on Windows than on Linux? Isn't the NVIDIA binary driver the same between these two platforms?
Comment 25 Ed Morley [:emorley] 2012-01-06 15:53:15 PST
Comment 26 nemo 2012-01-06 16:32:24 PST
Marco, the driver is not the same, no.  Also, other things may muddy the situation.  Just saying "under Linux" is a bit vague.
For example, FOTN rendered about 20% faster for me when I disabled compositing.
Certain compiz plugins seemed particularly heavy.  gnome-shell? Unity? Metacity? XFCE4? Which nvidia binary blob under linux?
Comment 27 Marco Castelluccio [:marco] 2012-01-06 16:43:33 PST
Are you sure? Afaik the NVIDIA binary blob shares a lot of code between Windows and Linux.
I'm using Unity, so Compiz. The version of the NVIDIA driver is the latest stable released, 285.

However I've opened bug 716121 about the bad gfx performance on Canvas.
Comment 28 Boris Zbarsky [:bz] 2012-01-06 17:01:58 PST
Also, some of the "webgl" time is spent in our own code, and that might well be faster on windows because of the better compiler.
Comment 29 Brian Hackett (:bhackett) 2013-08-27 07:36:33 PDT
*** Bug 909728 has been marked as a duplicate of this bug. ***
