Bug 712261 - HUGE performance regression on FF9+ (sunspider benchmark)
Status: RESOLVED FIXED
Whiteboard: [qa+]
Product: Core
Classification: Components
Component: JavaScript Engine
Version: 9 Branch
Platform: x86 Linux
Importance: -- major
Target Milestone: mozilla12
Assigned To: David Anderson [:dvander]
Duplicates: 713063
Depends on: 696291
Blocks:
Reported: 2011-12-20 04:19 PST by Miguel Angel
Modified: 2012-03-29 12:14 PDT


Attachments
disable sse2 optimizations if not available (18.72 KB, patch)
2012-01-09 14:44 PST, David Anderson [:dvander]
no flags
v2 (20.73 KB, patch)
2012-01-12 16:00 PST, David Anderson [:dvander]
bhackett1024: review+
akeybl: approval-mozilla-beta-

Description Miguel Angel 2011-12-20 04:19:00 PST
User Agent: Mozilla/5.0 (X11; U; Linux i686; es-ES; rv:1.9.2.24) Gecko/20111101 Firefox/3.6.24
Build ID: 2011110100

Steps to reproduce:

Test JavaScript performance with the SunSpider 0.9.1 benchmark, using official i686 Linux builds.


Actual results:

Performance dropped 5-fold from Firefox 8.x to Firefox 9+.
FF9 is now much slower even than FF3.6 (2614.2ms total in FF3.6).

FF9+ (beta channel) sunspider run:
============================================
RESULTS (means and 95% confidence intervals)
--------------------------------------------
Total:                 4263.0ms +/- 6.0%
--------------------------------------------

  3d:                   658.4ms +/- 10.4%
    cube:               204.7ms +/- 6.4%
    morph:              221.2ms +/- 6.9%
    raytrace:           232.5ms +/- 21.1%

  access:               931.1ms +/- 5.5%
    binary-trees:        94.4ms +/- 22.6%
    fannkuch:           452.1ms +/- 6.5%
    nbody:              241.2ms +/- 3.3%
    nsieve:             143.4ms +/- 2.8%

  bitops:               816.9ms +/- 7.1%
    3bit-bits-in-byte:  110.6ms +/- 7.7%
    bits-in-byte:       164.6ms +/- 24.1%
    bitwise-and:        333.1ms +/- 2.3%
    nsieve-bits:        208.6ms +/- 17.9%

  controlflow:          110.0ms +/- 19.6%
    recursive:          110.0ms +/- 19.6%

  crypto:               357.6ms +/- 20.0%
    aes:                167.0ms +/- 19.5%
    md5:                100.0ms +/- 29.7%
    sha1:                90.6ms +/- 10.4%

  date:                 224.6ms +/- 2.8%
    format-tofte:       132.2ms +/- 1.2%
    format-xparb:        92.4ms +/- 6.8%

  math:                 548.4ms +/- 5.1%
    cordic:             220.6ms +/- 1.1%
    partial-sums:       192.1ms +/- 9.6%
    spectral-norm:      135.7ms +/- 6.8%

  regexp:                26.8ms +/- 2.1%
    dna:                 26.8ms +/- 2.1%

  string:               589.2ms +/- 20.6%
    base64:             102.6ms +/- 12.1%
    fasta:              182.3ms +/- 22.2%
    tagcloud:           121.6ms +/- 26.5%
    unpack-code:         77.4ms +/- 22.6%
    validate-input:     105.3ms +/- 30.8%

FF8 sunspider run:
============================================
RESULTS (means and 95% confidence intervals)
--------------------------------------------
Total:                  856.5ms +/- 5.8%
--------------------------------------------

  3d:                   184.7ms +/- 4.2%
    cube:                56.0ms +/- 2.1%
    morph:               20.3ms +/- 2.9%
    raytrace:           108.4ms +/- 7.1%

  access:               130.1ms +/- 7.3%
    binary-trees:        58.6ms +/- 12.3%
    fannkuch:            27.7ms +/- 5.8%
    nbody:               18.2ms +/- 23.9%
    nsieve:              25.6ms +/- 10.5%

  bitops:                35.2ms +/- 25.1%
    3bit-bits-in-byte:    1.4ms +/- 26.4%
    bits-in-byte:        19.0ms +/- 43.0%
    bitwise-and:          3.0ms +/- 0.0%
    nsieve-bits:         11.8ms +/- 33.3%

  controlflow:           62.6ms +/- 11.2%
    recursive:           62.6ms +/- 11.2%

  crypto:                52.6ms +/- 7.8%
    aes:                 28.1ms +/- 12.8%
    md5:                 17.3ms +/- 8.5%
    sha1:                 7.2ms +/- 7.8%

  date:                 105.6ms +/- 15.5%
    format-tofte:        64.7ms +/- 2.4%
    format-xparb:        40.9ms +/- 40.0%

  math:                  60.5ms +/- 13.2%
    cordic:              28.7ms +/- 16.1%
    partial-sums:        18.7ms +/- 12.7%
    spectral-norm:       13.1ms +/- 8.7%

  regexp:                32.9ms +/- 13.2%
    dna:                 32.9ms +/- 13.2%

  string:               192.3ms +/- 6.4%
    base64:               8.4ms +/- 17.6%
    fasta:               44.4ms +/- 12.9%
    tagcloud:            59.2ms +/- 1.7%
    unpack-code:         58.8ms +/- 7.9%
    validate-input:      21.5ms +/- 29.8%


Expected results:

Performance should have increased or stayed the same.
Comment 1 Miguel Angel 2011-12-20 04:28:58 PST
FF 11.0a1 nightly is no better:

RESULTS (means and 95% confidence intervals)
--------------------------------------------
Total:                  5134.6ms +/- 11.2%
Comment 2 Jan de Mooij [:jandem] 2011-12-20 05:28:44 PST
If you have extensions like Firebug installed, can you disable or uninstall them and see if that helps?

If that does not help, please make sure javascript.options.methodjit.content is enabled in about:config, and let us know whether you can reproduce with a clean profile (http://support.mozilla.com/en-US/kb/Managing-profiles).
Comment 3 Miguel Angel 2011-12-20 06:16:10 PST
I did all the tests with a new profile created just for that.
No extensions at all (except Feedback 1.1.2)
I confirm javascript.options.methodjit.content is enabled (by default)
Comment 4 Boris Zbarsky [:bz] 2011-12-21 11:03:04 PST
Miguel, would you be willing to use http://harthur.github.com/mozregression/ to figure out when the problem appears for you?  This is the first report of this that we have, so presumably something specific about your exact configuration is relevant...
Comment 5 Miguel Angel 2011-12-21 12:33:22 PST
Sure!

Last good nightly: 2011-08-29
First bad nightly: 2011-08-30

Pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2011-08-29&enddate=2011-08-30


Can I do anything else to help?
Comment 6 Boris Zbarsky [:bz] 2011-12-21 12:45:52 PST
Hmm.  That looks like the TI landing.  That's ... really odd.

Are these 32-bit or 64-bit Linux builds?
Comment 7 Miguel Angel 2011-12-21 12:51:40 PST
Good old 32 bit.
Comment 8 Matthias Versen [:Matti] 2011-12-22 14:50:54 PST
*** Bug 713063 has been marked as a duplicate of this bug. ***
Comment 9 Boris Zbarsky [:bz] 2011-12-22 15:10:38 PST
Miguel, could you attach your about:support page here?
Comment 10 Boris Zbarsky [:bz] 2011-12-22 15:11:48 PST
Miguel, what's your exact CPU hardware?
Comment 11 Miguel Angel 2011-12-22 15:58:28 PST
My CPU is an AMD AthlonXP (cpu family 6, model 10)

about:support

  Application Basics

        Name:                 Firefox
        Version:              9.0
        User Agent:           Mozilla/5.0 (X11; Linux i686; rv:9.0) Gecko/20100101 Firefox/9.0
        Profile Directory:    Open Containing Folder
        Enabled Plugins:      about:plugins
        Build Configuration:  about:buildconfig
        Crash Reports:        about:crashes
        Memory Use:           about:memory

  Extensions

        Name                         Version  Enabled  ID
        Feedback                     1.1.2    true     testpilot@labs.mozilla.com
        openSUSE Firefox Extensions  1.0.1    false    susefox@opensuse.org

  Modified Preferences

        Name                                                       Value
        browser.places.smartBookmarksVersion                       2
        browser.startup.homepage_override.buildID                  20111212185108
        browser.startup.homepage_override.mstone                   rv:9.0
        extensions.lastAppVersion                                  9.0
        network.cookie.prefsMigrated                               true
        places.history.expiration.transient_current_max_pages      39363
        places.history.expiration.transient_optimal_database_size  62980096
        privacy.sanitize.migrateFx3Prefs                           true

  Graphics

        Adapter Description:      GLXtest process failed (exited with status 1): GLX version older than the required 1.3
        WebGL Renderer:           Blocked for your graphics card because of unresolved driver issues.
        GPU Accelerated Windows:  0/1. Blocked for your graphics driver version. Try updating your graphics driver to version <Anything with EXT_texture_from_pixmap support> or newer.
Comment 12 David Mandelin [:dmandelin] 2011-12-22 15:59:36 PST
Miguel, could you try a few things for me?

1. Could you try SunSpider in version 8 with javascript.options.methodjit.content=false? This is to test the possibility that in 9, things are running in the interpreter only.

2. Could you try the V8 benchmarks in version 9? I'd like to know if it's specific to SunSpider or if it affects everything.
Comment 13 Miguel Angel 2011-12-22 16:42:45 PST
1. SunSpider in version 8 with javascript.options.methodjit.content=false: 
Just as fast as javascript.options.methodjit.content=true

RESULTS (means and 95% confidence intervals)
--------------------------------------------
Total:                  899.2ms +/- 8.8%
--------------------------------------------

2. V8 benchmarks (version 6):
I'd say it affects everything. Score is about 6 times lower.

FF8:
Score: 889
Richards: 2962
DeltaBlue: 1256
Crypto: 2972
RayTrace: 291
EarleyBoyer: 380
RegExp: 625
Splay: 575

FF9:
Score: 150
Richards: 69.1
DeltaBlue: 80.9
Crypto: 99.4
RayTrace: 146
EarleyBoyer: 260
RegExp: 214
Splay: 387
Comment 14 Boris Zbarsky [:bz] 2011-12-22 17:37:31 PST
> This is to test the possibility that in 9, things are running in the interpreter only.

Given comment 11, that's exactly what's happening.  "AMD AthlonXP (cpu family 6, model 10)" would probably be a CPU without SSE2 support according to <http://en.wikipedia.org/wiki/SSE2#Notable_IA-32_CPUs_not_supporting_SSE2>.  Miguel, could you confirm by looking at your /proc/cpuinfo ?

TraceMonkey worked on such CPUs, with runtime SSE2 detection.   JaegerMonkey does not: it just disables itself if SSE2 is not available.  So Miguel is getting pure-interp performance in 9.  That matches comment 13: disabling JM in 8 should be a performance hit on Sunspider if it were actually working.
Comment 15 Miguel Angel 2011-12-23 09:06:04 PST
That is correct, there is no SSE2 in this CPU:
flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow up
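The check above can be automated. The following is a minimal sketch, assuming the flags line has already been read from /proc/cpuinfo into a string; the helper name `hasCpuFlag` is hypothetical and is not part of any Mozilla code. It matches whole words, so `sse` in the list does not count as `sse2`.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: returns true if `flag` appears as a whole word in a
// space-separated "flags" line such as the one /proc/cpuinfo prints on Linux.
bool hasCpuFlag(const std::string& flagsLine, const std::string& flag) {
    // Skip an optional "flags:" label in front of the list itself.
    std::string::size_type colon = flagsLine.find(':');
    std::istringstream words(colon == std::string::npos
                                 ? flagsLine
                                 : flagsLine.substr(colon + 1));
    std::string word;
    while (words >> word) {
        // Compare whole tokens so that "sse" never matches "sse2".
        if (word == flag)
            return true;
    }
    return false;
}
```

Called with the flags line quoted in this comment, `hasCpuFlag(line, "sse2")` is false while `hasCpuFlag(line, "sse")` is true, which agrees with the diagnosis in comment 14.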
Comment 16 Boris Zbarsky [:bz] 2011-12-23 15:51:12 PST
Yeah, then the regression is expected: you no longer have a working JIT.  :(
Comment 17 Miguel Angel 2011-12-24 13:02:32 PST
I really hope it doesn't mean this is a wontfix.
There are a lot of 32-bit AMD CPUs out there, and I don't think it is very wise to drop the JIT altogether just because they lack a few SSE2 instructions which provide a marginal gain at best.
Comment 18 Martin 2011-12-24 15:21:22 PST
Setting "javascript.options.typeinference" to false seems to enable the JIT on my Athlon XP. I assume this is using TraceMonkey.

Correct me if I'm wrong, but I don't see any obvious x87 instructions in the Nitro assembler source code, so it looks like adding support for non-SSE2 chips would be a big job.
Comment 19 David Anderson [:dvander] 2011-12-24 15:59:20 PST
The problem is that the x87 fpu is annoying to work with. It's a completely different instruction set. The trace JIT (which is removed in Fx11+) dealt with x87 by assuming it only had one register, so it generated bad code but at least it generated something. SSE2 is something normal and easy to use.

One option for JaegerMonkey would be to disable floating-point optimizations entirely if SSE2 isn't present. You'd still generate JIT code but floating point math would call out to C++, making it much slower than on a slightly newer CPU that had modern extensions. bug 696291 does this (it also disables type inference if there's no SSE2).
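The difference between the behaviour reported in comment 14 and the option described here can be summarised as a small table of feature switches. This is only an illustrative sketch: the `JitPolicy` struct, its field names, and both functions are hypothetical and do not come from the SpiderMonkey source, and the assumption that type inference was left on before the fix is inferred from comment 18 rather than stated outright.

```cpp
// Hypothetical summary of which engine features are active.
struct JitPolicy {
    bool methodJitEnabled;  // JaegerMonkey compiles scripts at all
    bool inlineFloatMath;   // doubles are handled with inline SSE2 code
    bool typeInference;     // type inference feeds the compiler
};

// Behaviour described in comment 14: without SSE2 the method JIT turns
// itself off, so everything runs in the interpreter.
// Assumption: type inference stays at its default (on) in both cases.
JitPolicy policyAsReported(bool hasSSE2) {
    return JitPolicy{hasSSE2, hasSSE2, true};
}

// Option described in comment 19: keep generating JIT code, but without
// SSE2 send floating-point math to C++ calls and disable type inference.
JitPolicy policyProposed(bool hasSSE2) {
    return JitPolicy{true, hasSSE2, hasSSE2};
}
```

Under this reading, a CPU without SSE2 goes from no JIT at all to a JIT that is only slow on floating-point work, which is the trade-off the comment describes.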
Comment 20 Martin 2011-12-27 08:00:30 PST
> One option for JaegerMonkey would be to disable floating-point optimizations
> entirely if SSE2 isn't present.

Well, I would vote for that if it's easy enough to do; it's bound to be better than nothing.
Comment 21 David Mandelin [:dmandelin] 2012-01-03 15:41:28 PST
(In reply to David Anderson [:dvander] from comment #19)
> One option for JaegerMonkey would be to disable floating-point optimizations
> entirely if SSE2 isn't present. You'd still generate JIT code but floating
> point math would call out to C++, making it much slower than on a slightly
> newer CPU that had modern extensions. bug 696291 does this (it also disables
> type inference if there's no SSE2).

Is that patch ready to land?
Comment 22 David Anderson [:dvander] 2012-01-09 13:49:31 PST
(In reply to David Mandelin from comment #21)
> Is that patch ready to land?

No, it doesn't seem to apply at all. I'll rebase.
Comment 23 David Anderson [:dvander] 2012-01-09 14:44:50 PST
Created attachment 587160
disable sse2 optimizations if not available

pushed to try
Comment 24 David Anderson [:dvander] 2012-01-09 19:34:13 PST
Could anyone with this problem try a build from here:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/danderson@mozilla.com-41c5e3bac27d

(avoid the -debug ones)
Comment 25 fnitram 2012-01-10 01:26:26 PST
Huge performance regression fixed for me.

Sunspider :
 Before (ff 8) : 1080 ms
 Today (ff 9) : 7800 ms
 After (ff 12a1 (2012-01-09)) : 1370 ms

Good Job B)
Thanks
Comment 26 Miguel Angel 2012-01-10 10:12:50 PST
FF 12.0a1

RESULTS (means and 95% confidence intervals)
--------------------------------------------
Total:                 1097.9ms +/- 1.6%

That's much better: about 28% slower than FF8 but about 2.4 times faster than FF3.6.
May I suggest that this patch go in FF10? Since FF10 will be a LTS release, it wouldn't be wise to leave a good chunk of users with a low performance LTS.
Comment 27 David Anderson [:dvander] 2012-01-10 10:33:39 PST
Thanks for testing! I'm confident this patch basically works but need to see why it came up orange on the tryserver.
Comment 28 Henri Sivonen (:hsivonen) 2012-01-11 01:48:44 PST
Nominating for tracking-firefox10 even though it's very late, because it would be bad not to have a JIT in ESR (for some CPUs).
Comment 29 Alex Keybl [:akeybl] 2012-01-12 12:53:46 PST
If this proves to be an issue for enterprise, we can consider uplifting after the 10.0 release. Let's wait for their feedback before tracking though.
Comment 30 David Anderson [:dvander] 2012-01-12 16:00:12 PST
Created attachment 588224
v2

fixes orange
Comment 31 Brian Hackett (:bhackett) 2012-01-12 16:26:37 PST
Comment on attachment 588224
v2

Review of attachment 588224:
-----------------------------------------------------------------

::: js/src/methodjit/FastArithmetic.cpp
@@ +244,5 @@
>      bool canDoIntMath = op != JSOP_DIV && type != JSVAL_TYPE_DOUBLE &&
>                          !(rhs->isType(JSVAL_TYPE_DOUBLE) || lhs->isType(JSVAL_TYPE_DOUBLE));
>  
> +    if (!canDoIntMath || (frame.haveSameBacking(lhs, rhs) && !masm.supportsFloatingPoint()))
> +        return jsop_binary_slow(op, stub, type, lhs, rhs);

This test looks wrong, won't we always make a stub call instead of going through jsop_binary_double, even if the CPU has SSE2?

@@ +1627,2 @@
>  
> +        if (!lhs->isTypeKnown() || !rhs->isTypeKnown()) {

Can the code below this test just be unconditional?  This opaque test is confusing.

@@ +1718,3 @@
>  
> +        /* Link all incoming slow paths to here. */
> +        if (!lhs->isTypeKnown() || !rhs->isTypeKnown()) {

Ditto.
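The first review remark can be checked by isolating the condition from the FastArithmetic.cpp hunk as a plain boolean function. The sketch below is illustrative only: the function names are hypothetical, `takesSlowPathAsPosted` copies the posted condition, and `takesSlowPathGuarded` is one possible reading of what the reviewer expects, not the fix that actually landed (which is not shown in this thread).

```cpp
// The condition exactly as written in the v2 patch: true means the
// compiler emits a stub call (jsop_binary_slow) for the operation.
bool takesSlowPathAsPosted(bool canDoIntMath, bool sameBacking,
                           bool supportsFloatingPoint) {
    return !canDoIntMath || (sameBacking && !supportsFloatingPoint);
}

// Hypothetical alternative: only fall back to the stub call when the
// CPU lacks floating-point (SSE2) support in the first place.
bool takesSlowPathGuarded(bool canDoIntMath, bool sameBacking,
                          bool supportsFloatingPoint) {
    return !supportsFloatingPoint && (!canDoIntMath || sameBacking);
}
```

For a double operation on an SSE2-capable CPU (`canDoIntMath == false`, `supportsFloatingPoint == true`), the posted condition returns true, so the double fast path is never reached; the guarded variant returns false in that case and still returns true when SSE2 is missing.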
Comment 32 David Anderson [:dvander] 2012-01-13 17:33:07 PST
Okay thanks, I've fixed those things and sent to try.
Comment 33 David Anderson [:dvander] 2012-01-17 19:07:58 PST
https://bugzilla.mozilla.org/show_bug.cgi?id=712261
Comment 34 David Anderson [:dvander] 2012-01-24 17:40:36 PST
I botched comment #33 - this landed a week ago and appears to have stuck. It should appear in Firefox 12.
Comment 35 Boris Zbarsky [:bz] 2012-01-25 01:06:18 PST
Is this something that is worth backporting to 11?
Comment 36 David Anderson [:dvander] 2012-02-23 14:09:47 PST
Comment on attachment 588224
v2

[Approval Request Comment]
Regression caused by (bug #): bug 698201
User impact if declined: older CPUs will have extremely slow JS (no JIT)
Testing completed (on m-c, etc.): yes
Risk to taking this patch (and alternatives if risky):
String changes made by this patch:
Comment 37 Alex Keybl [:akeybl] 2012-02-23 14:33:14 PST
(In reply to David Anderson [:dvander] from comment #36)
> Risk to taking this patch (and alternatives if risky):

Can you address the risk to uplifting to beta in our second to last beta?
Comment 38 David Anderson [:dvander] 2012-02-23 15:21:58 PST
The risk is it could introduce a JIT bug, either a regression, or could expose an existing bug that users with older CPUs wouldn't have otherwise seen.
Comment 39 Alex Keybl [:akeybl] 2012-02-27 08:56:03 PST
Comment on attachment 588224
v2

[Triage Comment]
Given the risk evaluation and the fact that this is not a regression from FF10, let's let this bake more and then release with FF12. Thanks David.
