Last Comment Bug 683804 - 2x regression with JM+TI compared to JM+TM on the darkroom subtest of kraken
: 2x regression with JM+TI compared to JM+TM on the darkroom subtest of kraken
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: Trunk
: All All
: -- normal with 1 vote (vote)
: mozilla10
Assigned To: Brian Hackett (:bhackett)
:
Mentors:
Depends on:
Blocks: infer-perf-regress 691314
  Show dependency treegraph
 
Reported: 2011-08-31 19:45 PDT by Boris Zbarsky [:bz] (TPAC)
Modified: 2012-01-20 09:47 PST (History)
19 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
+


Attachments
easy fix (3.01 KB, patch)
2011-10-17 15:05 PDT, Brian Hackett (:bhackett)
dvander: review+
Details | Diff | Splinter Review

Description Boris Zbarsky [:bz] (TPAC) 2011-08-31 19:45:31 PDT
http://www.extremetech.com/computing/94532-firefox-9-javascript-performance-improved-by-20-30-with-type-inference has TI+JM at 2.32 times as slow.  The difference between the two is about 250ms, or about 10% of total kraken time.
Comment 1 Andrew McCreight [:mccr8] 2011-08-31 19:55:19 PDT
From my reading of this article, it sounded like they may have used Nightly and just disabled Type Inference, which I presume has some different performance characteristics compared to TM+JM prior to the TI landing.  I'm not really sure.
Comment 2 Boris Zbarsky [:bz] (TPAC) 2011-08-31 20:09:05 PDT
Unless the argument is that nightly with TI disabled is a lot faster than pre-landing TM+JM was on this test _and_ is faster than TI+JM on this test, comment 1 doesn't really make sense to me....

For what it's worth, http://arewefastyet.com/?a=b&view=breakdown shows the same thing: TI+JM is at 800ms while TM+JM+profiling is at about 400ms.
Comment 3 Andrew McCreight [:mccr8] 2011-08-31 20:10:03 PDT
Oops!  Yes, I was confused.
Comment 4 David Mandelin [:dmandelin] 2011-09-01 13:49:47 PDT
My machine, darkroom:

  Firefox 6   234.9
  Nightly     551.1
Comment 5 Onno 2011-09-09 04:03:05 PDT
I can confirm this as well with Nightly of sep. 8:

Darkroom on Kraken 1.1:
Firefox 6: 363.1
Nightly:   789.0

(No complaints though: overall score went from 10465 to 5479; a 90% improvement!)

Beside this, Gaussian Blur is also slower:
Firefox 6: 783.9
Nightly:   994.3
Comment 6 Boris Zbarsky [:bz] (TPAC) 2011-09-09 08:21:56 PDT
> (No complaints though: overall score went from 10465 to 5479; a 90% improvement!)

I _have_ to comment on this.  When computing change percentages, they're always percentages of the _old_ number.  So this is a 48% improvement.  A 90% improvement would have been the score going to 1047 (aka a 10x speedup).
Comment 7 Jan de Mooij [:jandem] 2011-09-09 09:05:33 PDT
According to Instruments we spend 10% of the time in TypeMonitorResult. I'll try to find out why..

Gaussian Blur needs CSE/GVN. Hard to do in JM but IonMonkey will have this optimization.
Comment 8 Jan de Mooij [:jandem] 2011-09-09 09:56:00 PDT
OK so the problem is that we call Math.log and Math.pow from inlined functions like this:

function FastLog2(x) {
    return Math.log(x) / Math.LN2;
}

The IC is disabled for the Math.log call because "we can't recompile them yet". If I comment out the f.regs.inlined() check (unsafe of course) we are like 300 ms faster...

Brian, how hard is it to fix this?
Comment 9 Brian Hackett (:bhackett) 2011-09-09 10:12:29 PDT
I think the main issue is that when expanding inline frames we can kick a frame into the interpreter while retaining jitcode for the (outer) script making the call.  This doesn't work right when the recompiler steals native stubs, as it will patch up jumps in the stub without fixing the jumps entering the code.  It would need to make sure to reset the IC when stealing the stub, but this should not be hard.  The same would also apply to getter ICs, for which recompilation works the same way as native ICs and which are also disabled for inline frames.
Comment 10 Brian Hackett (:bhackett) 2011-09-10 22:56:02 PDT
Fix allowing native and getter ICs to be generated for inline frames. Takes my darkroom time from 640 to 400 (-mjp is about 300).

http://hg.mozilla.org/projects/jaegermonkey/rev/323595f354b1
Comment 11 Brian Hackett (:bhackett) 2011-10-06 11:42:05 PDT
This was causing crashes and didn't make it into Firefox 9.  It also hurts some of the Peacekeeper tests, such as stringDetect.
Comment 12 David Mandelin [:dmandelin] 2011-10-11 10:28:44 PDT
(In reply to Brian Hackett from comment #11)
> This was causing crashes and didn't make it into Firefox 9.  It also hurts
> some of the Peacekeeper tests, such as stringDetect.

Just to make things clear, did the patch hurt stringDetect, or does not having the patch hurt stringDetect?
Comment 13 Brian Hackett (:bhackett) 2011-10-11 10:41:42 PDT
This patch helps stringDetect, which repeatedly calls RegExp.test outside of a loop.  We inline the run() method call, and don't generate ICs at the test() calls.

There is an easy fix here which is to just mark as uninlineable scripts we try to generate native ICs for.  The time saved by inlining the scripted call is small compared to the cost incurred by not generating the IC.  I don't know yet whether to do the easy fix or the 'real' fix, but given the botch when this initially landed the easy fix is probably better.  I'd like it if I never need to touch the recompiler again.
Comment 14 Brian Hackett (:bhackett) 2011-10-17 15:05:36 PDT
Created attachment 567592 [details] [diff] [review]
easy fix

Easy fix described above, just mark functions as uninlineable when we try to generate native ICs for them (native calls handled via FastBuiltins don't affect inlining).  Gives the same speedup to darkroom, and a lot simpler.
Comment 15 Octoploid 2011-10-18 06:29:37 PDT
x86_64-pc-linux-gnu gcc-4.7.0 profiledbuild

before/after patch

TEST                         COMPARISON            FROM                 TO               DETAILS

====================================================================================

** TOTAL **:                 1.074x as fast    4366.0ms +/- 0.6%   4064.5ms +/- 0.2%     significant

====================================================================================

  ai:                        1.141x as fast     277.5ms +/- 4.1%    243.2ms +/- 1.4%     significant
    astar:                   1.141x as fast     277.5ms +/- 4.1%    243.2ms +/- 1.4%     significant

  audio:                     1.095x as fast    1469.6ms +/- 1.3%   1341.9ms +/- 0.6%     significant
    beat-detection:          -                  339.9ms +/- 0.6%    340.0ms +/- 0.9% 
    dft:                     1.29x as fast      584.1ms +/- 3.0%    453.7ms +/- 0.7%     significant
    fft:                     ??                 235.4ms +/- 1.5%    235.8ms +/- 1.2%     
    oscillator:              ??                 310.2ms +/- 0.7%    312.4ms +/- 1.6%     

  imaging:                   1.067x as fast    1763.1ms +/- 0.7%   1652.3ms +/- 0.4%     significant
    gaussian-blur:           ??                 959.3ms +/- 0.2%    960.9ms +/- 0.4%     
    darkroom:                1.29x as fast      500.6ms +/- 0.9%    386.6ms +/- 1.1%     significant
    desaturate:              ??                 303.2ms +/- 3.1%    304.8ms +/- 0.1%     

  json:                      1.050x as fast     133.5ms +/- 1.9%    127.2ms +/- 0.4%     significant
    parse-financial:         1.046x as fast      70.7ms +/- 1.7%     67.6ms +/- 0.5%     significant
    stringify-tinderbox:     1.054x as fast      62.8ms +/- 2.5%     59.6ms +/- 0.8%     significant

  stanford:                  1.032x as fast     722.3ms +/- 0.8%    699.9ms +/- 0.3%     significant
    crypto-aes:              1.035x as fast     178.1ms +/- 1.5%    172.1ms +/- 0.7%     significant
    crypto-ccm:              1.032x as fast     124.1ms +/- 0.6%    120.2ms +/- 0.7%     significant
    crypto-pbkdf2:           -                  296.1ms +/- 3.1%    287.9ms +/- 0.5% 
    crypto-sha256-iterative: -                  124.0ms +/- 4.7%    119.7ms +/- 0.6% 

Firefox with patch/ Chromium 16.0.904.0

TEST                         COMPARISON            FROM                 TO               DETAILS

====================================================================================

** TOTAL **:                 1.163x as fast    4064.5ms +/- 0.2%   3495.5ms +/- 0.8%     significant

====================================================================================

  ai:                        *1.81x as slow*    243.2ms +/- 1.4%    441.0ms +/- 0.7%     significant
    astar:                   *1.81x as slow*    243.2ms +/- 1.4%    441.0ms +/- 0.7%     significant

  audio:                     1.151x as fast    1341.9ms +/- 0.6%   1165.5ms +/- 1.7%     significant
    beat-detection:          *1.081x as slow*   340.0ms +/- 0.9%    367.5ms +/- 1.7%     significant
    dft:                     1.28x as fast      453.7ms +/- 0.7%    355.1ms +/- 4.6%     significant
    fft:                     ??                 235.8ms +/- 1.2%    236.5ms +/- 1.5%     
    oscillator:              1.51x as fast      312.4ms +/- 1.6%    206.4ms +/- 2.4%     significant

  imaging:                   1.26x as fast     1652.3ms +/- 0.4%   1314.4ms +/- 1.2%     significant
    gaussian-blur:           3.32x as fast      960.9ms +/- 0.4%    289.0ms +/- 0.4%     significant
    darkroom:                *1.28x as slow*    386.6ms +/- 1.1%    494.0ms +/- 3.5%     significant
    desaturate:              *1.74x as slow*    304.8ms +/- 0.1%    531.4ms +/- 0.7%     significant

  json:                      *1.23x as slow*    127.2ms +/- 0.4%    156.2ms +/- 1.1%     significant
    parse-financial:         *1.179x as slow*    67.6ms +/- 0.5%     79.7ms +/- 1.8%     significant
    stringify-tinderbox:     *1.28x as slow*     59.6ms +/- 0.8%     76.5ms +/- 1.8%     significant

  stanford:                  1.67x as fast      699.9ms +/- 0.3%    418.4ms +/- 1.2%     significant
    crypto-aes:              1.70x as fast      172.1ms +/- 0.7%    101.5ms +/- 1.7%     significant
    crypto-ccm:              1.21x as fast      120.2ms +/- 0.7%     99.1ms +/- 3.3%     significant
    crypto-pbkdf2:           2.11x as fast      287.9ms +/- 0.5%    136.2ms +/- 1.3%     significant
    crypto-sha256-iterative: 1.47x as fast      119.7ms +/- 0.6%     81.6ms +/- 3.0%     significant
Comment 16 Brian Hackett (:bhackett) 2011-10-22 07:36:32 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/356cd0d8f5a1
Comment 17 Ed Morley [:emorley] 2011-10-23 09:30:51 PDT
https://hg.mozilla.org/mozilla-central/rev/356cd0d8f5a1

Note You need to log in before you can comment on or make changes to this bug.