Fix function inlining issue on Octane-deltablue

RESOLVED FIXED in Firefox 39

Status

()

RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: jandem, Assigned: jandem)

Tracking

unspecified
mozilla39
Points:
---

Firefox Tracking Flags

(firefox39 fixed)

Details

Attachments

(6 attachments, 2 obsolete attachments)

(Assignee)

Description

4 years ago
Bug 1059364 eliminated ObjectGroupDispatch's falback path if the dispatch is known to handle all incoming objects. This was a win on Octane-deltablue, but not a big one because there are some call sites where we're not inlining one callee, so we still need the fallback path.

The problem is that we hit the max inlining depth. I need to look into this more, but if I bump the smallFunctionMaxBytecodeLength from 100 to 125 I get the following scores on Octane-deltablue:

32-bit:

default:         53500 -> 55228
unboxed-objects: 55625 -> 68016

64-bit:

default:         40696 -> 46455
unboxed-objects: 50547 -> 69093

So it's a pretty big win, especially with --unboxed-objects (20-30%).
(Assignee)

Comment 1

4 years ago
Created attachment 8576871 [details] [diff] [review]
Increase smallFunctionMaxBytecodeLength_

This increases smallFunctionMaxBytecodeLength_ from 100 to 125. "Small functions" have a different inlining depth.

This is a pretty big win on Octane-deltablue (see comment 0) and Octane-raytrace, both with/without --unboxed-objects. I don't see any big regressions, but will keep an eye on AWFY.
Attachment #8576871 - Flags: review?(bhackett1024)
Comment on attachment 8576871 [details] [diff] [review]
Increase smallFunctionMaxBytecodeLength_

Review of attachment 8576871 [details] [diff] [review]:
-----------------------------------------------------------------

Nice!

::: js/src/jit/JitOptions.cpp
@@ +157,5 @@
>      SET_DEFAULT(osrPcMismatchesBeforeRecompile, 6000);
>  
>      // The bytecode length limit for small function.
>      //
>      // The default for this was arrived at empirically via benchmarking.

Maybe remove this sentence too, it's pretty weird/vague.
Attachment #8576871 - Flags: review?(bhackett1024) → review+
(Assignee)

Comment 3

4 years ago
Created attachment 8577192 [details] [diff] [review]
Fix loop inlining

Actually the first patch was a win but didn't fix the root cause and regressed richards a bit.

Here's an alternative patch based on what Hannes did in bug 768288. We record the max inlining depth when we inline in IonBuilder, and when we inline a function with loops we take that into account.

This allows us to inline loops in a lot of cases, removes some very narrow heuristics nbp added, fixes deltablue and doesn't regress richards or anything else as far as I can see.

Raytrace improves less than with the other patch, we can look into that later.
Attachment #8577192 - Flags: review?(hv1989)
(Assignee)

Comment 4

4 years ago
Created attachment 8578055 [details] [diff] [review]
Part 4 - Fix loop inlining (v2)

This fixes the issue we discussed on IRC today. Good catch btw!
Attachment #8577192 - Attachment is obsolete: true
Attachment #8577192 - Flags: review?(hv1989)
Attachment #8578055 - Flags: review?(hv1989)
Comment on attachment 8578055 [details] [diff] [review]
Part 4 - Fix loop inlining (v2)

Review of attachment 8578055 [details] [diff] [review]:
-----------------------------------------------------------------

Awesome!
Just some suggestions on (hopefully) improving the explanation of this new heuristic.

::: js/src/jit/BaselineJIT.h
@@ +214,5 @@
>  
> +    // The max inlining depth that should be used (in addition to the normal
> +    // depth checks) when deciding to inline this script. This starts as
> +    // UINT8_MAX, so that it can always be inlined, and is updated when we
> +    // Ion-compile this script. See makeInliningDecision for more info.

// The max inlining depth when inlining this script where we still can inline all currently inlined functions.
// This starts as UINT8_MAX, since we have no data yet and want to allow anything, and is updated when we Ion-compile this script.
// See makeInliningDecision for more info.

::: js/src/jit/IonBuilder.cpp
@@ +4867,5 @@
> +    BaselineScript *outerBaseline = outermostBuilder()->script()->baselineScript();
> +    if (inliningDepth_ >= maxInlineDepth) {
> +        // We hit the depth limit, so don't allow inlining the outer script in
> +        // other scripts. This is currently only used if it has loops, see the
> +        // comment below.

// We hit the depth limit and don't inline a function. Mark the outer most script to always failing to inline a function, even if the outer most script isn't inlined. This heuristic is currently only used for inlining loops, see the comment bellow.

@@ +4890,5 @@
> +    //
> +    // To avoid this problem, we record a separate max inlining depth for each
> +    // script, based on the functions we inline in it. So for the example above,
> +    // if we inline g in f, we will no longer allow inlining f if it means we
> +    // can't inline g as well.

// To avoid this problem, we record a separate max inlining depth for each
// script, indicating at which depth we won't be able to inline all inner functions anymore.
// Which solves the issue above and only inlines f, if it means we can also inline g.

@@ +4897,5 @@
> +    {
> +        trackOptimizationOutcome(TrackedOutcome::CantInlineExceededDepth);
> +        return DontInline(targetScript, "Vetoed: exceeding allowed script inline depth");
> +    }
> +

Can you add here:

// End of heuristics.
//
// Update properties knowing we definitely shall inline this script
Attachment #8578055 - Flags: review?(hv1989) → review+
(Assignee)

Comment 6

4 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/b79cddbe7de8

Thanks, I adjusted the comments a bit.

Comment 8

4 years ago
On Windows, according to AWFY, this patch was:

13% regression on ss-aes
25% regression on kraken-crypto-ccm
14% regression on kraken-crypto-pbkdf2
3% regression on kraken-crypto-sha256-iterative
2% regression on dromaeo-sunspider-bitops-nsieve-bits

8% improvement on octane-crypto
20% improvement on octane-deltablue
2% improvement on octane-earleyboyer
3% improvement on octane-richards
18% improvement on octane-typescript
3% improvement on dromaeo-sunspider-access-nsieve
9% improvement on dromaeo-sunspider-crypto-aes
7% improvement on dromaeo-sunspider-math-cordic
17% improvement on dromaeo-v8-deltablue
4% improvement on dromaeo-v8-earley-boyer
3% improvement on dromaeo-v8-richards

On Mac there was less up and down, it seems, but looking only on the things that don't run on Windows:

6% regression on misc-bugs-508716-fluid-dynamics
16% regression on misc-bugs-636096-model2d
18% regression on misc-bugs-658844-sha256-bitcoins

16% improvement on misc-bugs-1131099-lodash1
4% improvement on asmjs-ubench-life (no asmjs)
41% improvement on asmjs-ubench-mandelbrot-native (no asmjs)
(Assignee)

Comment 9

4 years ago
Created attachment 8580559 [details] [diff] [review]
Part 1 - Fix an old typo

Ugh, inliningMaxCallerBytecodeLength() returned inlineMaxTotalBytecodeLength_ (1000) instead of inliningMaxCallerBytecodeLength_ (10000), so in a lot of scripts we only inlined small functions.

This patch is a big win (20% or so) on raytrace and explains why my initial patch for this bug was a win there.
Attachment #8576871 - Attachment is obsolete: true
Attachment #8580559 - Flags: review?(hv1989)
(Assignee)

Comment 10

4 years ago
Created attachment 8580580 [details] [diff] [review]
Part 2 - Lower inline limit if off-thread compilation is not available

The test that turned orange inlines a ton of stuff, but we don't have off-thread compilation on the Amazon EC2 slaves (I think), so we hit a 10 second setTimeout somewhere because we were compiling *huge* amounts of code on the main thread.

If we have no off-thread compilation, don't inline scripts with bytecode length > 500 (instead of 1000 for the main thread).
Attachment #8580580 - Flags: review?(hv1989)
(Assignee)

Comment 11

4 years ago
Created attachment 8580585 [details] [diff] [review]
Part 3 - Cap max inlined bytecode length

This patch keeps track of the total bytecode length inlined in a single script. On Octane, the max value I saw was 53703 bytes.

On the pdf.js test that turned orange, I got values as high as 90,000-165,000 (!).

I set the limit to 80,000, so we never hit it on Octane (and most other benchmarks I think), but it should avoid uncontrolled inlining in pathological cases.
Attachment #8580585 - Flags: review?(hv1989)
(Assignee)

Updated

4 years ago
Attachment #8580580 - Attachment description: Part 3 - Lower inline limit if off-thread compilation is not available → Part 2 - Lower inline limit if off-thread compilation is not available
(Assignee)

Updated

4 years ago
Attachment #8580559 - Attachment description: Part 2 - Fix an old typo → Part 1 - Fix an old typo
(Assignee)

Updated

4 years ago
Attachment #8578055 - Attachment description: Fix loop inlining (v2) → Part 4 - Fix loop inlining (v2)
Comment on attachment 8580559 [details] [diff] [review]
Part 1 - Fix an old typo

Review of attachment 8580559 [details] [diff] [review]:
-----------------------------------------------------------------

Good catch!
Attachment #8580559 - Flags: review?(hv1989) → review+
Comment on attachment 8580580 [details] [diff] [review]
Part 2 - Lower inline limit if off-thread compilation is not available

Review of attachment 8580580 [details] [diff] [review]:
-----------------------------------------------------------------

Can you replace all uses of "OffThreadCompilationAvailable" with JitCompileOptions.offThreadCompilationAvailable()

::: js/src/jit/CompileWrappers.cpp
@@ +286,5 @@
>      JS::CompartmentOptions &options = cx->compartment()->options();
>      cloneSingletons_ = options.cloneSingletons();
>      spsSlowAssertionsEnabled_ = cx->runtime()->spsProfiler.enabled() &&
>                                  cx->runtime()->spsProfiler.slowAssertionsEnabled();
> +    offThreadCompilationAvailable_ = offThreadCompilationAvailable;

Can we move the content of "OffThreadCompilationAvailable" to here? So that this value is always correct?

::: js/src/jit/IonAnalysis.cpp
@@ +3191,5 @@
>          return false;
>      }
>  
>      BaselineInspector inspector(script);
> +    const JitCompileOptions options(cx, /* offThreadCompilationAvailable = */ true);

Why are we taking default "true" here? We should also calculate it.

@@ +3417,5 @@
>      if (!constraints)
>          return false;
>  
>      BaselineInspector inspector(script);
> +    const JitCompileOptions options(cx, /* offThreadCompilationAvailable = */ true);

Here it makes no difference, since the value is only used for inlining heuristics. But I would prefer to have the correct input here. If we add more checks that use this variable it can introduce faults...

::: js/src/jit/IonOptimizationLevels.h
@@ +214,5 @@
>  
> +    uint32_t inlineMaxBytecodePerCallSite(bool offThread) const {
> +        return offThread
> +               ? inlineMaxBytecodePerCallSiteOffThread_
> +               : inlineMaxBytecodePerCallSiteMainThread_;

Can you check "js_JitOptions.limitScriptSize" and return "inlineMaxBytecodePerCallSiteOffThread_" instead of "inlineMaxBytecodePerCallSiteMainThread_" if we dont' limit it?
Attachment #8580580 - Flags: review?(hv1989) → review+
Comment on attachment 8580585 [details] [diff] [review]
Part 3 - Cap max inlined bytecode length

Review of attachment 8580585 [details] [diff] [review]:
-----------------------------------------------------------------

I was thinking this would trip on asmjs things, but apparently not on the asmjs benchmarks we have. Also the testcase in bug 1010417 doesn't trip. So it looks fine :D
Attachment #8580585 - Flags: review?(hv1989) → review+
(Assignee)

Comment 16

4 years ago
(In reply to Guilherme Lima from comment #8)
> On Windows, according to AWFY, this patch was:

Thanks for making that list! :)

Let's see how the latest patches affect AWFY. I'll look into the regressions later today or next week.
(Assignee)

Comment 17

4 years ago
Created attachment 8580726 [details] [diff] [review]
Part 5 - Lower inliningMaxCallerBytecodeLength

There were some new regressions compared to yesterday's landing. I think those were caused by the new part 1: it effectively increased inliningMaxCallerBytecodeLength from 1000 to 10000.

This patch lowers it to 1500, that's enough to keep the raytrace win and seems to fix the TypeScript regression locally (unfortunately I don't see the Mandreel and regexp regressions here).
Attachment #8580726 - Flags: review?(hv1989)
Attachment #8580726 - Flags: review?(hv1989) → review+
These patches are regressing sunspider runs on FxOS flame devices.
Especially ss-aes, ss-cube and ss-md5.

http://arewefastyet.com/#machine=26&view=single&suite=ss&subtest=cube&start=1426845007&end=1426873392
(Assignee)

Comment 23

4 years ago
NI myself to look into the Kraken crypto ccm/pbkdf2 regression.
Flags: needinfo?(jdemooij)
(Assignee)

Comment 24

4 years ago
Created attachment 8581727 [details] [diff] [review]
Part 6 - Fix Kraken regression

Don't inline scripts that are known to inline a lot of code. This fixes the crypto-ccm regression locally and doesn't regress Octane/Sunspider or other Kraken tests.

Might be a small win on Octane Crypto or TypeScript, hard to say.
Flags: needinfo?(jdemooij)
Attachment #8581727 - Flags: review?(hv1989)
(Assignee)

Updated

4 years ago
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment on attachment 8581727 [details] [diff] [review]
Part 6 - Fix Kraken regression

Review of attachment 8581727 [details] [diff] [review]:
-----------------------------------------------------------------

Can you static assert that "inlineMaxCalleeInlinedBytecodeLength_" is smaller than 16bits? Or maybe also make it 16bits?
Attachment #8581727 - Flags: review?(hv1989) → review+
Depends on: 1147379
https://hg.mozilla.org/mozilla-central/rev/9bf8f0e455c4
Status: REOPENED → RESOLVED
Last Resolved: 4 years ago4 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.