Closed Bug 765325 Opened 12 years ago Closed 12 years ago

IonMonkey: Inline small functions more aggressively, turn off insertRecompileCheck for inliningDepth > 0

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: djvj, Unassigned)

Details

Attachments

(2 files)

Small functions should be inlined more aggressively, because we have less to lose from doing that.

Also, removing recompile check when compiling inlined functions.  This was suggested by jandem and seems to yield some nice performance boost when combined with the small function inlining.
Attached patch PatchSplinter Review
This patch does the following:

Turns off insertRecompileCheck when inliningDepth > 0.

During inline check, if the function we're trying to inline (the target function) is "small" (<100 bytes in bytecode), then we increase the inliningDepth by 1, and decrease the useCount requirement on the caller from usesBeforeInlining to usesBeforeInlining/4 (with a lower threshold of 10).

These are the perf results on various benches.  Note the decent boost all around on V8:

==========================                                                             |  ==========================
==========================                                                             |  ==========================
||||                                                                                   |  ||||
|||| REFERENCE_AFTERPULL/bench.ion-n-jm.kraken                                         |  |||| SMALLTARGET_INLINE_DEPTH3_AND_SKIPRECOMPILECHECK_AFTERPULL/bench.ion-n-jm.kraken
||||                                                                                   |  ||||
==========================                                                             |  ==========================
==========================                                                             |  ==========================
                                                                                       |  
RESULTS (means and 95% confidence intervals)                                           |  RESULTS (means and 95% confidence intervals)
-----------------------------------------------                                        |  -----------------------------------------------
Total:                       1975.2ms +/- 0.5%                                         |  Total:                       1982.5ms +/- 0.4%
-----------------------------------------------                                        |  -----------------------------------------------
                                                                                       |  
  ai:                         118.1ms +/- 0.3%                                         |    ai:                         118.8ms +/- 0.4%
    astar:                    118.1ms +/- 0.3%                                         |      astar:                    118.8ms +/- 0.4%
                                                                                       |  
  audio:                      621.7ms +/- 0.5%                                         |    audio:                      622.3ms +/- 0.4%
    beat-detection:           178.8ms +/- 2.0%                                         |      beat-detection:           177.5ms +/- 0.2%
    dft:                      177.1ms +/- 0.5%                                         |      dft:                      178.6ms +/- 0.7%
    fft:                      115.2ms +/- 0.3%                                         |      fft:                      115.7ms +/- 0.7%
    oscillator:               150.6ms +/- 1.4%                                         |      oscillator:               150.5ms +/- 1.4%
                                                                                       |  
  imaging:                    584.2ms +/- 1.1%                                         |    imaging:                    588.1ms +/- 1.1%
    gaussian-blur:            200.4ms +/- 2.9%                                         |      gaussian-blur:            203.4ms +/- 2.9%
    darkroom:                 248.8ms +/- 0.4%                                         |      darkroom:                 249.8ms +/- 0.7%
    desaturate:               135.0ms +/- 1.0%                                         |      desaturate:               134.9ms +/- 0.9%
                                                                                       |  
  json:                        95.0ms +/- 0.7%                                         |    json:                        95.2ms +/- 0.6%
    parse-financial:           60.9ms +/- 0.7%                                         |      parse-financial:           61.3ms +/- 0.8%
    stringify-tinderbox:       34.1ms +/- 1.2%                                         |      stringify-tinderbox:       33.9ms +/- 0.7%
                                                                                       |  
  stanford:                   556.2ms +/- 0.6%                                         |    stanford:                   558.1ms +/- 0.5%
    crypto-aes:               120.2ms +/- 0.8%                                         |      crypto-aes:               119.4ms +/- 0.3%
    crypto-ccm:               121.9ms +/- 0.5%                                         |      crypto-ccm:               122.4ms +/- 0.7%
    crypto-pbkdf2:            241.3ms +/- 1.0%                                         |      crypto-pbkdf2:            243.2ms +/- 1.0%
    crypto-sha256-iterative:   72.8ms +/- 0.6%                                         |      crypto-sha256-iterative:   73.1ms +/- 0.6%
                                                                                       |  
Results are located at kraken-1.1-results/sunspider-results-2012-06-14-16.43.35.js     |  Results are located at kraken-1.1-results/sunspider-results-2012-06-14-16.35.51.js
                                                                                       |  
==========================                                                             |  ==========================
==========================                                                             |  ==========================
||||                                                                                   |  ||||
|||| REFERENCE_AFTERPULL/bench.ion-n-jm.sunspider                                      |  |||| SMALLTARGET_INLINE_DEPTH3_AND_SKIPRECOMPILECHECK_AFTERPULL/bench.ion-n-jm.sunspider
||||                                                                                   |  ||||
==========================                                                             |  ==========================
==========================                                                             |  ==========================
                                                                                       |  
RESULTS (means and 95% confidence intervals)                                           |  RESULTS (means and 95% confidence intervals)
--------------------------------------------                                           |  --------------------------------------------
Total:                  288.2ms +/- 0.1%                                               |  Total:                  289.7ms +/- 0.2%
--------------------------------------------                                           |  --------------------------------------------
                                                                                       |  
  3d:                    34.1ms +/- 0.8%                                               |    3d:                    34.8ms +/- 1.8%
    cube:                12.4ms +/- 2.1%                                               |      cube:                13.1ms +/- 4.8%
    morph:                6.2ms +/- 0.2%                                               |      morph:                6.2ms +/- 0.2%
    raytrace:            15.5ms +/- 0.1%                                               |      raytrace:            15.5ms +/- 0.2%
                                                                                       |  
  access:               124.9ms +/- 0.1%                                               |    access:               125.9ms +/- 0.1%
    binary-trees:       111.4ms +/- 0.1%                                               |      binary-trees:       112.4ms +/- 0.1%
    fannkuch:             7.2ms +/- 0.2%                                               |      fannkuch:             7.2ms +/- 0.2%
    nbody:                3.6ms +/- 0.3%                                               |      nbody:                3.6ms +/- 0.4%
    nsieve:               2.7ms +/- 0.2%                                               |      nsieve:               2.7ms +/- 0.3%
                                                                                       |  
  bitops:                11.7ms +/- 0.3%                                               |    bitops:                11.6ms +/- 0.2%
    3bit-bits-in-byte:    1.0ms +/- 0.4%                                               |      3bit-bits-in-byte:    1.0ms +/- 0.4%
    bits-in-byte:         4.5ms +/- 0.6%                                               |      bits-in-byte:         4.5ms +/- 0.5%
    bitwise-and:          2.9ms +/- 0.2%                                               |      bitwise-and:          2.9ms +/- 0.1%
    nsieve-bits:          3.3ms +/- 0.3%                                               |      nsieve-bits:          3.3ms +/- 0.2%
                                                                                       |  
  controlflow:            2.1ms +/- 0.2%                                               |    controlflow:            2.1ms +/- 0.1%
    recursive:            2.1ms +/- 0.2%                                               |      recursive:            2.1ms +/- 0.1%
                                                                                       |  
  crypto:                18.2ms +/- 0.2%                                               |    crypto:                18.2ms +/- 0.2%
    aes:                  8.9ms +/- 0.2%                                               |      aes:                  8.9ms +/- 0.2%
    md5:                  5.8ms +/- 0.3%                                               |      md5:                  5.8ms +/- 0.4%
    sha1:                 3.6ms +/- 0.3%                                               |      sha1:                 3.6ms +/- 0.2%
                                                                                       |  
  date:                  23.1ms +/- 0.5%                                               |    date:                  23.1ms +/- 0.5%
    format-tofte:        13.3ms +/- 0.3%                                               |      format-tofte:        13.3ms +/- 0.3%
    format-xparb:         9.8ms +/- 1.1%                                               |      format-xparb:         9.8ms +/- 0.9%
                                                                                       |  
  math:                  14.1ms +/- 0.1%                                               |    math:                  14.1ms +/- 0.1%
    cordic:               3.3ms +/- 0.3%                                               |      cordic:               3.3ms +/- 0.2%
    partial-sums:         8.2ms +/- 0.1%                                               |      partial-sums:         8.3ms +/- 0.2%
    spectral-norm:        2.6ms +/- 0.2%                                               |      spectral-norm:        2.6ms +/- 0.2%
                                                                                       |  
  regexp:                 9.8ms +/- 0.2%                                               |    regexp:                 9.8ms +/- 0.1%
    dna:                  9.8ms +/- 0.2%                                               |      dna:                  9.8ms +/- 0.1%
                                                                                       |  
  string:                50.3ms +/- 0.2%                                               |    string:                50.1ms +/- 0.2%
    base64:               3.6ms +/- 0.4%                                               |      base64:               3.6ms +/- 0.3%
    fasta:                6.1ms +/- 0.4%                                               |      fasta:                6.0ms +/- 0.2%
    tagcloud:            17.5ms +/- 0.4%                                               |      tagcloud:            17.4ms +/- 0.2%
    unpack-code:         17.1ms +/- 0.4%                                               |      unpack-code:         17.0ms +/- 0.4%
    validate-input:       6.0ms +/- 0.2%                                               |      validate-input:       6.0ms +/- 0.3%
                                                                                       |  
Results are located at sunspider-1.0-results/sunspider-results-2012-06-14-16.44.01.js  |  Results are located at sunspider-1.0-results/sunspider-results-2012-06-14-16.36.18.js
                                                                                       |  
==========================                                                             |  ==========================
==========================                                                             |  ==========================
||||                                                                                   |  ||||
|||| REFERENCE_AFTERPULL/bench.ion-n-jm.v8                                             |  |||| SMALLTARGET_INLINE_DEPTH3_AND_SKIPRECOMPILECHECK_AFTERPULL/bench.ion-n-jm.v8
||||                                                                                   |  ||||
==========================                                                             |  ==========================
==========================                                                             |  ==========================
                                                                                       |  
Richards: 9267                                                                         |  Richards: 9592
DeltaBlue: 6248                                                                        |  DeltaBlue: 6863
Crypto: 13033                                                                          |  Crypto: 13170
RayTrace: 3612                                                                         |  RayTrace: 3678
EarleyBoyer: 9034                                                                      |  EarleyBoyer: 8917
RegExp: 2172                                                                           |  RegExp: 2177
Splay: 11270                                                                           |  Splay: 11295
NavierStokes: 17049                                                                    |  NavierStokes: 17459
----                                                                                   |  ----
Score (version 7): 7524                                                                |  Score (version 7): 7688
                                                                                       |
Attachment #633606 - Flags: review?(sstangl)
Attached file Benchmark differences
This is the benchmark results attached as a file.

The without-patch is on the left and with-patch on the right.
The first 3 sections are kraken, sunspider, and v8 with '--ion -n -m', and the last 3 sections are kraken, sunspider, and v8 with '--ion -n --no-jm'
Comment on attachment 633606 [details] [diff] [review]
Patch

Review of attachment 633606 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/IonBuilder.cpp
@@ +2827,5 @@
>      }
>  };
>  
> +static bool IsSmallFunction(JSFunction *target) {
> +    return target->isInterpreted() && target->script()->length < 100;

How was 100 determined? What happens if other values are used?

@@ +2835,5 @@
>  IonBuilder::makeInliningDecision(JSFunction *target)
>  {
> +    bool smallTarget = IsSmallFunction(target);
> +
> +    if ((smallTarget && inliningDepth >= 3) || (!smallTarget && inliningDepth >= 2))

A patch from jandem recently changed the maximum inlining depth from 2 to 3. Three comments:

1) What are the results from applying the patch after adding yet another inlining depth? Is it still a small win?

2) Instead of hardcoding constants, we should use named static const uint32.

3) The rationale behind extra inlining for small functions should be provided in a comment above.

@@ +2842,5 @@
> +    uint32 checkUses = js_IonOptions.usesBeforeInlining;
> +    if (smallTarget) {
> +        checkUses = js_IonOptions.usesBeforeInlining / 4;
> +        if (checkUses < 10)
> +            checkUses = 10;

This seems more arbitrary than usual. I would almost prefer setting some js_IonOptions.smallFunctionUsesBeforeInlining rather than having a distant calculation in the IonBuilder.
Attachment #633606 - Flags: review?(sstangl) → review+
(In reply to Sean Stangl from comment #3)
> How was 100 determined? What happens if other values are used?

Empirically.  Just tried a bunch of different values, ran full benches, compared results.

> 1) What are the results from applying the patch after adding yet another
> inlining depth? Is it still a small win?

Re-ran with jandem's patches included.  The gains are still there.  But I removed the inlining depth change only (leaving in the lowering of the usecount check), and realized that the depth change wasn't having a direct effect.  It was the other aggressiveness (reducing the use count requirements) that were leading to the gains.  So I removed that.

> 2) Instead of hardcoding constants, we should use named static const uint32.
>
> 3) The rationale behind extra inlining for small functions should be
> provided in a comment above.

Done and done :)

> This seems more arbitrary than usual. I would almost prefer setting some
> js_IonOptions.smallFunctionUsesBeforeInlining rather than having a distant
> calculation in the IonBuilder.

I added both smallFunctionMaxBytecodeLength and smallFunctionUsesBeforeInlining to js_IonOptions, with a short explanation of how they were arrived at.


Pushed: https://hg.mozilla.org/projects/ionmonkey/rev/a0c81536dd0b
Waiting for tbpl and awfy before closing.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.