Last Comment Bug 765325 - IonMonkey: Inline small functions more aggressively, turn off insertRecompileCheck for inliningDepth > 0
: IonMonkey: Inline small functions more aggressively, turn off insertRecompile...
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: unspecified
: All All
: -- normal (vote)
: ---
Assigned To: general
:
: Jason Orendorff [:jorendorff]
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2012-06-15 11:42 PDT by Kannan Vijayan [:djvj]
Modified: 2012-06-26 13:57 PDT (History)
1 user (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Patch (1.80 KB, patch)
2012-06-15 11:47 PDT, Kannan Vijayan [:djvj]
sstangl: review+
Details | Diff | Splinter Review
Benchmark differences (27.71 KB, text/plain)
2012-06-15 11:50 PDT, Kannan Vijayan [:djvj]
no flags Details

Description Kannan Vijayan [:djvj] 2012-06-15 11:42:18 PDT
Small functions should be inlined more aggressively, because we have less to lose from doing that.

Also, removing recompile check when compiling inlined functions.  This was suggested by jandem and seems to yield some nice performance boost when combined with the small function inlining.
Comment 1 Kannan Vijayan [:djvj] 2012-06-15 11:47:43 PDT
Created attachment 633606 [details] [diff] [review]
Patch

This patch does the following:

Turns off insertRecompileCheck when inliningDepth > 0.

During inline check, if the function we're trying to inline (the target function) is "small" (<100 bytes in bytecode), then we increase the inliningDepth by 1, and decrease the useCount requirement on the caller from usesBeforeInlining to usesBeforeInlining/4 (with a lower threshold of 10).

These are the perf results on various benches.  Note the decent boost all around on V8:

==========================                                                             |  ==========================
==========================                                                             |  ==========================
||||                                                                                   |  ||||
|||| REFERENCE_AFTERPULL/bench.ion-n-jm.kraken                                         |  |||| SMALLTARGET_INLINE_DEPTH3_AND_SKIPRECOMPILECHECK_AFTERPULL/bench.ion-n-jm.kraken
||||                                                                                   |  ||||
==========================                                                             |  ==========================
==========================                                                             |  ==========================
                                                                                       |  
RESULTS (means and 95% confidence intervals)                                           |  RESULTS (means and 95% confidence intervals)
-----------------------------------------------                                        |  -----------------------------------------------
Total:                       1975.2ms +/- 0.5%                                         |  Total:                       1982.5ms +/- 0.4%
-----------------------------------------------                                        |  -----------------------------------------------
                                                                                       |  
  ai:                         118.1ms +/- 0.3%                                         |    ai:                         118.8ms +/- 0.4%
    astar:                    118.1ms +/- 0.3%                                         |      astar:                    118.8ms +/- 0.4%
                                                                                       |  
  audio:                      621.7ms +/- 0.5%                                         |    audio:                      622.3ms +/- 0.4%
    beat-detection:           178.8ms +/- 2.0%                                         |      beat-detection:           177.5ms +/- 0.2%
    dft:                      177.1ms +/- 0.5%                                         |      dft:                      178.6ms +/- 0.7%
    fft:                      115.2ms +/- 0.3%                                         |      fft:                      115.7ms +/- 0.7%
    oscillator:               150.6ms +/- 1.4%                                         |      oscillator:               150.5ms +/- 1.4%
                                                                                       |  
  imaging:                    584.2ms +/- 1.1%                                         |    imaging:                    588.1ms +/- 1.1%
    gaussian-blur:            200.4ms +/- 2.9%                                         |      gaussian-blur:            203.4ms +/- 2.9%
    darkroom:                 248.8ms +/- 0.4%                                         |      darkroom:                 249.8ms +/- 0.7%
    desaturate:               135.0ms +/- 1.0%                                         |      desaturate:               134.9ms +/- 0.9%
                                                                                       |  
  json:                        95.0ms +/- 0.7%                                         |    json:                        95.2ms +/- 0.6%
    parse-financial:           60.9ms +/- 0.7%                                         |      parse-financial:           61.3ms +/- 0.8%
    stringify-tinderbox:       34.1ms +/- 1.2%                                         |      stringify-tinderbox:       33.9ms +/- 0.7%
                                                                                       |  
  stanford:                   556.2ms +/- 0.6%                                         |    stanford:                   558.1ms +/- 0.5%
    crypto-aes:               120.2ms +/- 0.8%                                         |      crypto-aes:               119.4ms +/- 0.3%
    crypto-ccm:               121.9ms +/- 0.5%                                         |      crypto-ccm:               122.4ms +/- 0.7%
    crypto-pbkdf2:            241.3ms +/- 1.0%                                         |      crypto-pbkdf2:            243.2ms +/- 1.0%
    crypto-sha256-iterative:   72.8ms +/- 0.6%                                         |      crypto-sha256-iterative:   73.1ms +/- 0.6%
                                                                                       |  
Results are located at kraken-1.1-results/sunspider-results-2012-06-14-16.43.35.js     |  Results are located at kraken-1.1-results/sunspider-results-2012-06-14-16.35.51.js
                                                                                       |  
==========================                                                             |  ==========================
==========================                                                             |  ==========================
||||                                                                                   |  ||||
|||| REFERENCE_AFTERPULL/bench.ion-n-jm.sunspider                                      |  |||| SMALLTARGET_INLINE_DEPTH3_AND_SKIPRECOMPILECHECK_AFTERPULL/bench.ion-n-jm.sunspider
||||                                                                                   |  ||||
==========================                                                             |  ==========================
==========================                                                             |  ==========================
                                                                                       |  
RESULTS (means and 95% confidence intervals)                                           |  RESULTS (means and 95% confidence intervals)
--------------------------------------------                                           |  --------------------------------------------
Total:                  288.2ms +/- 0.1%                                               |  Total:                  289.7ms +/- 0.2%
--------------------------------------------                                           |  --------------------------------------------
                                                                                       |  
  3d:                    34.1ms +/- 0.8%                                               |    3d:                    34.8ms +/- 1.8%
    cube:                12.4ms +/- 2.1%                                               |      cube:                13.1ms +/- 4.8%
    morph:                6.2ms +/- 0.2%                                               |      morph:                6.2ms +/- 0.2%
    raytrace:            15.5ms +/- 0.1%                                               |      raytrace:            15.5ms +/- 0.2%
                                                                                       |  
  access:               124.9ms +/- 0.1%                                               |    access:               125.9ms +/- 0.1%
    binary-trees:       111.4ms +/- 0.1%                                               |      binary-trees:       112.4ms +/- 0.1%
    fannkuch:             7.2ms +/- 0.2%                                               |      fannkuch:             7.2ms +/- 0.2%
    nbody:                3.6ms +/- 0.3%                                               |      nbody:                3.6ms +/- 0.4%
    nsieve:               2.7ms +/- 0.2%                                               |      nsieve:               2.7ms +/- 0.3%
                                                                                       |  
  bitops:                11.7ms +/- 0.3%                                               |    bitops:                11.6ms +/- 0.2%
    3bit-bits-in-byte:    1.0ms +/- 0.4%                                               |      3bit-bits-in-byte:    1.0ms +/- 0.4%
    bits-in-byte:         4.5ms +/- 0.6%                                               |      bits-in-byte:         4.5ms +/- 0.5%
    bitwise-and:          2.9ms +/- 0.2%                                               |      bitwise-and:          2.9ms +/- 0.1%
    nsieve-bits:          3.3ms +/- 0.3%                                               |      nsieve-bits:          3.3ms +/- 0.2%
                                                                                       |  
  controlflow:            2.1ms +/- 0.2%                                               |    controlflow:            2.1ms +/- 0.1%
    recursive:            2.1ms +/- 0.2%                                               |      recursive:            2.1ms +/- 0.1%
                                                                                       |  
  crypto:                18.2ms +/- 0.2%                                               |    crypto:                18.2ms +/- 0.2%
    aes:                  8.9ms +/- 0.2%                                               |      aes:                  8.9ms +/- 0.2%
    md5:                  5.8ms +/- 0.3%                                               |      md5:                  5.8ms +/- 0.4%
    sha1:                 3.6ms +/- 0.3%                                               |      sha1:                 3.6ms +/- 0.2%
                                                                                       |  
  date:                  23.1ms +/- 0.5%                                               |    date:                  23.1ms +/- 0.5%
    format-tofte:        13.3ms +/- 0.3%                                               |      format-tofte:        13.3ms +/- 0.3%
    format-xparb:         9.8ms +/- 1.1%                                               |      format-xparb:         9.8ms +/- 0.9%
                                                                                       |  
  math:                  14.1ms +/- 0.1%                                               |    math:                  14.1ms +/- 0.1%
    cordic:               3.3ms +/- 0.3%                                               |      cordic:               3.3ms +/- 0.2%
    partial-sums:         8.2ms +/- 0.1%                                               |      partial-sums:         8.3ms +/- 0.2%
    spectral-norm:        2.6ms +/- 0.2%                                               |      spectral-norm:        2.6ms +/- 0.2%
                                                                                       |  
  regexp:                 9.8ms +/- 0.2%                                               |    regexp:                 9.8ms +/- 0.1%
    dna:                  9.8ms +/- 0.2%                                               |      dna:                  9.8ms +/- 0.1%
                                                                                       |  
  string:                50.3ms +/- 0.2%                                               |    string:                50.1ms +/- 0.2%
    base64:               3.6ms +/- 0.4%                                               |      base64:               3.6ms +/- 0.3%
    fasta:                6.1ms +/- 0.4%                                               |      fasta:                6.0ms +/- 0.2%
    tagcloud:            17.5ms +/- 0.4%                                               |      tagcloud:            17.4ms +/- 0.2%
    unpack-code:         17.1ms +/- 0.4%                                               |      unpack-code:         17.0ms +/- 0.4%
    validate-input:       6.0ms +/- 0.2%                                               |      validate-input:       6.0ms +/- 0.3%
                                                                                       |  
Results are located at sunspider-1.0-results/sunspider-results-2012-06-14-16.44.01.js  |  Results are located at sunspider-1.0-results/sunspider-results-2012-06-14-16.36.18.js
                                                                                       |  
==========================                                                             |  ==========================
==========================                                                             |  ==========================
||||                                                                                   |  ||||
|||| REFERENCE_AFTERPULL/bench.ion-n-jm.v8                                             |  |||| SMALLTARGET_INLINE_DEPTH3_AND_SKIPRECOMPILECHECK_AFTERPULL/bench.ion-n-jm.v8
||||                                                                                   |  ||||
==========================                                                             |  ==========================
==========================                                                             |  ==========================
                                                                                       |  
Richards: 9267                                                                         |  Richards: 9592
DeltaBlue: 6248                                                                        |  DeltaBlue: 6863
Crypto: 13033                                                                          |  Crypto: 13170
RayTrace: 3612                                                                         |  RayTrace: 3678
EarleyBoyer: 9034                                                                      |  EarleyBoyer: 8917
RegExp: 2172                                                                           |  RegExp: 2177
Splay: 11270                                                                           |  Splay: 11295
NavierStokes: 17049                                                                    |  NavierStokes: 17459
----                                                                                   |  ----
Score (version 7): 7524                                                                |  Score (version 7): 7688
                                                                                       |
Comment 2 Kannan Vijayan [:djvj] 2012-06-15 11:50:46 PDT
Created attachment 633609 [details]
Benchmark differences

This is the benchmark results attached as a file.

The without-patch is on the left and with-patch on the right.
The first 3 sections are kraken, sunspider, and v8 with '--ion -n -m', and the last 3 sections are kraken, sunspider, and v8 with '--ion -n --no-jm'
Comment 3 Sean Stangl [:sstangl] 2012-06-19 14:05:23 PDT
Comment on attachment 633606 [details] [diff] [review]
Patch

Review of attachment 633606 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/IonBuilder.cpp
@@ +2827,5 @@
>      }
>  };
>  
> +static bool IsSmallFunction(JSFunction *target) {
> +    return target->isInterpreted() && target->script()->length < 100;

How was 100 determined? What happens if other values are used?

@@ +2835,5 @@
>  IonBuilder::makeInliningDecision(JSFunction *target)
>  {
> +    bool smallTarget = IsSmallFunction(target);
> +
> +    if ((smallTarget && inliningDepth >= 3) || (!smallTarget && inliningDepth >= 2))

A patch from jandem recently changed the maximum inlining depth from 2 to 3. Three comments:

1) What are the results from applying the patch after adding yet another inlining depth? Is it still a small win?

2) Instead of hardcoding constants, we should use named static const uint32.

3) The rationale behind extra inlining for small functions should be provided in a comment above.

@@ +2842,5 @@
> +    uint32 checkUses = js_IonOptions.usesBeforeInlining;
> +    if (smallTarget) {
> +        checkUses = js_IonOptions.usesBeforeInlining / 4;
> +        if (checkUses < 10)
> +            checkUses = 10;

This seems more arbitrary than usual. I would almost prefer setting some js_IonOptions.smallFunctionUsesBeforeInlining rather than having a distant calculation in the IonBuilder.
Comment 4 Kannan Vijayan [:djvj] 2012-06-20 10:22:38 PDT
(In reply to Sean Stangl from comment #3)
> How was 100 determined? What happens if other values are used?

Empirically.  Just tried a bunch of different values, ran full benches, compared results.

> 1) What are the results from applying the patch after adding yet another
> inlining depth? Is it still a small win?

Re-ran with jandem's patches included.  The gains are still there.  But I removed the inlining depth change only (leaving in the lowering of the usecount check), and realized that the depth change wasn't having a direct effect.  It was the other aggressiveness (reducing the use count requirements) that were leading to the gains.  So I removed that.

> 2) Instead of hardcoding constants, we should use named static const uint32.
>
> 3) The rationale behind extra inlining for small functions should be
> provided in a comment above.

Done and done :)

> This seems more arbitrary than usual. I would almost prefer setting some
> js_IonOptions.smallFunctionUsesBeforeInlining rather than having a distant
> calculation in the IonBuilder.

I added both smallFunctionMaxBytecodeLength and smallFunctionUsesBeforeInlining to js_IonOptions, with a short explanation of how they were arrived at.


Pushed: https://hg.mozilla.org/projects/ionmonkey/rev/a0c81536dd0b
Waiting for tbpl and awfy before closing.

Note You need to log in before you can comment on or make changes to this bug.