Closed Bug 1142669 Opened 9 years ago Closed 9 years ago

Fix function inlining issue on Octane-deltablue

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla39

Tracking Flags:

Tracking

Status

firefox39

---

fixed

People

(Reporter: jandem, Assigned: jandem)

References

Details

Attachments

(6 files, 2 obsolete files)

Increase smallFunctionMaxBytecodeLength_ 9 years ago Jan de Mooij [:jandem] 884 bytes, patch	bhackett1024 : review+	Details \| Diff \| Splinter Review
Fix loop inlining 9 years ago Jan de Mooij [:jandem] 9.38 KB, patch		Details \| Diff \| Splinter Review
Part 4 - Fix loop inlining (v2) 9 years ago Jan de Mooij [:jandem] 10.62 KB, patch	h4writer : review+	Details \| Diff \| Splinter Review
Part 1 - Fix an old typo 9 years ago Jan de Mooij [:jandem] 958 bytes, patch	h4writer : review+	Details \| Diff \| Splinter Review
Part 2 - Lower inline limit if off-thread compilation is not available 9 years ago Jan de Mooij [:jandem] 9.25 KB, patch	h4writer : review+	Details \| Diff \| Splinter Review
Part 3 - Cap max inlined bytecode length 9 years ago Jan de Mooij [:jandem] 8.13 KB, patch	h4writer : review+	Details \| Diff \| Splinter Review
Part 5 - Lower inliningMaxCallerBytecodeLength 9 years ago Jan de Mooij [:jandem] 933 bytes, patch	h4writer : review+	Details \| Diff \| Splinter Review
Part 6 - Fix Kraken regression 9 years ago Jan de Mooij [:jandem] 8.72 KB, patch	h4writer : review+	Details \| Diff \| Splinter Review

Jan de Mooij [:jandem]

Assignee

Description

•

9 years ago

Bug 1059364 eliminated ObjectGroupDispatch's falback path if the dispatch is known to handle all incoming objects. This was a win on Octane-deltablue, but not a big one because there are some call sites where we're not inlining one callee, so we still need the fallback path.

The problem is that we hit the max inlining depth. I need to look into this more, but if I bump the smallFunctionMaxBytecodeLength from 100 to 125 I get the following scores on Octane-deltablue:

32-bit:

default:         53500 -> 55228
unboxed-objects: 55625 -> 68016

64-bit:

default:         40696 -> 46455
unboxed-objects: 50547 -> 69093

So it's a pretty big win, especially with --unboxed-objects (20-30%).

Jan de Mooij [:jandem]

Assignee

Comment 1

•

9 years ago

Attached patch Increase smallFunctionMaxBytecodeLength_ (obsolete) — Details — Splinter Review

This increases smallFunctionMaxBytecodeLength_ from 100 to 125. "Small functions" have a different inlining depth.

This is a pretty big win on Octane-deltablue (see comment 0) and Octane-raytrace, both with/without --unboxed-objects. I don't see any big regressions, but will keep an eye on AWFY.

Attachment #8576871 - Flags: review?(bhackett1024)

Brian Hackett [Laid off!]

Comment 2

•

9 years ago

Comment on attachment 8576871 [details] [diff] [review]
Increase smallFunctionMaxBytecodeLength_

Review of attachment 8576871 [details] [diff] [review]:
-----------------------------------------------------------------

Nice!

::: js/src/jit/JitOptions.cpp
@@ +157,5 @@
>      SET_DEFAULT(osrPcMismatchesBeforeRecompile, 6000);
>  
>      // The bytecode length limit for small function.
>      //
>      // The default for this was arrived at empirically via benchmarking.

Maybe remove this sentence too, it's pretty weird/vague.

Attachment #8576871 - Flags: review?(bhackett1024) → review+

Jan de Mooij [:jandem]

Assignee

Comment 3

•

9 years ago

Attached patch Fix loop inlining (obsolete) — Details — Splinter Review

Actually the first patch was a win but didn't fix the root cause and regressed richards a bit.

Here's an alternative patch based on what Hannes did in bug 768288. We record the max inlining depth when we inline in IonBuilder, and when we inline a function with loops we take that into account.

This allows us to inline loops in a lot of cases, removes some very narrow heuristics nbp added, fixes deltablue and doesn't regress richards or anything else as far as I can see.

Raytrace improves less than with the other patch, we can look into that later.

Attachment #8577192 - Flags: review?(hv1989)

Jan de Mooij [:jandem]

Assignee

Comment 4

•

9 years ago

Attached patch Part 4 - Fix loop inlining (v2) — Details — Splinter Review

This fixes the issue we discussed on IRC today. Good catch btw!

Attachment #8577192 - Attachment is obsolete: true

Attachment #8577192 - Flags: review?(hv1989)

Attachment #8578055 - Flags: review?(hv1989)

Hannes Verschore [:h4writer]

Comment 5

•

9 years ago

Comment on attachment 8578055 [details] [diff] [review]
Part 4 - Fix loop inlining (v2)

Review of attachment 8578055 [details] [diff] [review]:
-----------------------------------------------------------------

Awesome!
Just some suggestions on (hopefully) improving the explanation of this new heuristic.

::: js/src/jit/BaselineJIT.h
@@ +214,5 @@
>  
> +    // The max inlining depth that should be used (in addition to the normal
> +    // depth checks) when deciding to inline this script. This starts as
> +    // UINT8_MAX, so that it can always be inlined, and is updated when we
> +    // Ion-compile this script. See makeInliningDecision for more info.

// The max inlining depth when inlining this script where we still can inline all currently inlined functions.
// This starts as UINT8_MAX, since we have no data yet and want to allow anything, and is updated when we Ion-compile this script.
// See makeInliningDecision for more info.

::: js/src/jit/IonBuilder.cpp
@@ +4867,5 @@
> +    BaselineScript *outerBaseline = outermostBuilder()->script()->baselineScript();
> +    if (inliningDepth_ >= maxInlineDepth) {
> +        // We hit the depth limit, so don't allow inlining the outer script in
> +        // other scripts. This is currently only used if it has loops, see the
> +        // comment below.

// We hit the depth limit and don't inline a function. Mark the outer most script to always failing to inline a function, even if the outer most script isn't inlined. This heuristic is currently only used for inlining loops, see the comment bellow.

@@ +4890,5 @@
> +    //
> +    // To avoid this problem, we record a separate max inlining depth for each
> +    // script, based on the functions we inline in it. So for the example above,
> +    // if we inline g in f, we will no longer allow inlining f if it means we
> +    // can't inline g as well.

// To avoid this problem, we record a separate max inlining depth for each
// script, indicating at which depth we won't be able to inline all inner functions anymore.
// Which solves the issue above and only inlines f, if it means we can also inline g.

@@ +4897,5 @@
> +    {
> +        trackOptimizationOutcome(TrackedOutcome::CantInlineExceededDepth);
> +        return DontInline(targetScript, "Vetoed: exceeding allowed script inline depth");
> +    }
> +

Can you add here:

// End of heuristics.
//
// Update properties knowing we definitely shall inline this script

Attachment #8578055 - Flags: review?(hv1989) → review+

Jan de Mooij [:jandem]

Assignee

Comment 6

•

9 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/b79cddbe7de8

Thanks, I adjusted the comments a bit.

Ryan VanderMeulen [:RyanVM]

Comment 7

•

9 years ago

Backed out for making bug 1112947 spike significantly.
bug https://hg.mozilla.org/integration/mozilla-inbound/rev/519e18aa7875

Guilherme Lima

Comment 8

•

9 years ago

On Windows, according to AWFY, this patch was:

13% regression on ss-aes
25% regression on kraken-crypto-ccm
14% regression on kraken-crypto-pbkdf2
3% regression on kraken-crypto-sha256-iterative
2% regression on dromaeo-sunspider-bitops-nsieve-bits

8% improvement on octane-crypto
20% improvement on octane-deltablue
2% improvement on octane-earleyboyer
3% improvement on octane-richards
18% improvement on octane-typescript
3% improvement on dromaeo-sunspider-access-nsieve
9% improvement on dromaeo-sunspider-crypto-aes
7% improvement on dromaeo-sunspider-math-cordic
17% improvement on dromaeo-v8-deltablue
4% improvement on dromaeo-v8-earley-boyer
3% improvement on dromaeo-v8-richards

On Mac there was less up and down, it seems, but looking only on the things that don't run on Windows:

6% regression on misc-bugs-508716-fluid-dynamics
16% regression on misc-bugs-636096-model2d
18% regression on misc-bugs-658844-sha256-bitcoins

16% improvement on misc-bugs-1131099-lodash1
4% improvement on asmjs-ubench-life (no asmjs)
41% improvement on asmjs-ubench-mandelbrot-native (no asmjs)

Jan de Mooij [:jandem]

Assignee

Comment 9

•

9 years ago

Attached patch Part 1 - Fix an old typo — Details — Splinter Review

Ugh, inliningMaxCallerBytecodeLength() returned inlineMaxTotalBytecodeLength_ (1000) instead of inliningMaxCallerBytecodeLength_ (10000), so in a lot of scripts we only inlined small functions.

This patch is a big win (20% or so) on raytrace and explains why my initial patch for this bug was a win there.

Attachment #8576871 - Attachment is obsolete: true

Attachment #8580559 - Flags: review?(hv1989)

Jan de Mooij [:jandem]

Assignee

Comment 10

•

9 years ago

Attached patch Part 2 - Lower inline limit if off-thread compilation is not available — Details — Splinter Review

The test that turned orange inlines a ton of stuff, but we don't have off-thread compilation on the Amazon EC2 slaves (I think), so we hit a 10 second setTimeout somewhere because we were compiling *huge* amounts of code on the main thread.

If we have no off-thread compilation, don't inline scripts with bytecode length > 500 (instead of 1000 for the main thread).

Attachment #8580580 - Flags: review?(hv1989)

Jan de Mooij [:jandem]

Assignee

Comment 11

•

9 years ago

Attached patch Part 3 - Cap max inlined bytecode length — Details — Splinter Review

This patch keeps track of the total bytecode length inlined in a single script. On Octane, the max value I saw was 53703 bytes.

On the pdf.js test that turned orange, I got values as high as 90,000-165,000 (!).

I set the limit to 80,000, so we never hit it on Octane (and most other benchmarks I think), but it should avoid uncontrolled inlining in pathological cases.

Attachment #8580585 - Flags: review?(hv1989)

Jan de Mooij [:jandem]

Assignee

Updated

•

9 years ago

Attachment #8580580 - Attachment description: Part 3 - Lower inline limit if off-thread compilation is not available → Part 2 - Lower inline limit if off-thread compilation is not available

Jan de Mooij [:jandem]

Assignee

Updated

•

9 years ago

Attachment #8580559 - Attachment description: Part 2 - Fix an old typo → Part 1 - Fix an old typo

Jan de Mooij [:jandem]

Assignee

Updated

•

9 years ago

Attachment #8578055 - Attachment description: Fix loop inlining (v2) → Part 4 - Fix loop inlining (v2)

Hannes Verschore [:h4writer]

Comment 12

•

9 years ago

Comment on attachment 8580559 [details] [diff] [review]
Part 1 - Fix an old typo

Review of attachment 8580559 [details] [diff] [review]:
-----------------------------------------------------------------

Good catch!

Attachment #8580559 - Flags: review?(hv1989) → review+

Hannes Verschore [:h4writer]

Comment 13

•

9 years ago

Comment on attachment 8580580 [details] [diff] [review]
Part 2 - Lower inline limit if off-thread compilation is not available

Review of attachment 8580580 [details] [diff] [review]:
-----------------------------------------------------------------

Can you replace all uses of "OffThreadCompilationAvailable" with JitCompileOptions.offThreadCompilationAvailable()

::: js/src/jit/CompileWrappers.cpp
@@ +286,5 @@
>      JS::CompartmentOptions &options = cx->compartment()->options();
>      cloneSingletons_ = options.cloneSingletons();
>      spsSlowAssertionsEnabled_ = cx->runtime()->spsProfiler.enabled() &&
>                                  cx->runtime()->spsProfiler.slowAssertionsEnabled();
> +    offThreadCompilationAvailable_ = offThreadCompilationAvailable;

Can we move the content of "OffThreadCompilationAvailable" to here? So that this value is always correct?

::: js/src/jit/IonAnalysis.cpp
@@ +3191,5 @@
>          return false;
>      }
>  
>      BaselineInspector inspector(script);
> +    const JitCompileOptions options(cx, /* offThreadCompilationAvailable = */ true);

Why are we taking default "true" here? We should also calculate it.

@@ +3417,5 @@
>      if (!constraints)
>          return false;
>  
>      BaselineInspector inspector(script);
> +    const JitCompileOptions options(cx, /* offThreadCompilationAvailable = */ true);

Here it makes no difference, since the value is only used for inlining heuristics. But I would prefer to have the correct input here. If we add more checks that use this variable it can introduce faults...

::: js/src/jit/IonOptimizationLevels.h
@@ +214,5 @@
>  
> +    uint32_t inlineMaxBytecodePerCallSite(bool offThread) const {
> +        return offThread
> +               ? inlineMaxBytecodePerCallSiteOffThread_
> +               : inlineMaxBytecodePerCallSiteMainThread_;

Can you check "js_JitOptions.limitScriptSize" and return "inlineMaxBytecodePerCallSiteOffThread_" instead of "inlineMaxBytecodePerCallSiteMainThread_" if we dont' limit it?

Attachment #8580580 - Flags: review?(hv1989) → review+

Hannes Verschore [:h4writer]

Comment 14

•

9 years ago

Comment on attachment 8580585 [details] [diff] [review]
Part 3 - Cap max inlined bytecode length

Review of attachment 8580585 [details] [diff] [review]:
-----------------------------------------------------------------

I was thinking this would trip on asmjs things, but apparently not on the asmjs benchmarks we have. Also the testcase in bug 1010417 doesn't trip. So it looks fine :D

Attachment #8580585 - Flags: review?(hv1989) → review+

Jan de Mooij [:jandem]

Assignee

Comment 15

•

9 years ago

Let's try this again:

https://hg.mozilla.org/integration/mozilla-inbound/rev/1c9568d7d26f
https://hg.mozilla.org/integration/mozilla-inbound/rev/310b3af47e93
https://hg.mozilla.org/integration/mozilla-inbound/rev/157929ef51b3
https://hg.mozilla.org/integration/mozilla-inbound/rev/f7299a88c59c

Try push confirms bc2 is happy:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=80788bc9b1f9

Jan de Mooij [:jandem]

Assignee

Comment 16

•

9 years ago

(In reply to Guilherme Lima from comment #8)
> On Windows, according to AWFY, this patch was:

Thanks for making that list! :)

Let's see how the latest patches affect AWFY. I'll look into the regressions later today or next week.

Jan de Mooij [:jandem]

Assignee

Comment 17

•

9 years ago

Attached patch Part 5 - Lower inliningMaxCallerBytecodeLength — Details — Splinter Review

There were some new regressions compared to yesterday's landing. I think those were caused by the new part 1: it effectively increased inliningMaxCallerBytecodeLength from 1000 to 10000.

This patch lowers it to 1500, that's enough to keep the raytrace win and seems to fix the TypeScript regression locally (unfortunately I don't see the Mandreel and regexp regressions here).

Attachment #8580726 - Flags: review?(hv1989)

Hannes Verschore [:h4writer]

Updated

•

9 years ago

Attachment #8580726 - Flags: review?(hv1989) → review+

Nicolas B. Pierron [:nbp]

Comment 18

•

9 years ago

These patches are regressing sunspider runs on FxOS flame devices.
Especially ss-aes, ss-cube and ss-md5.

http://arewefastyet.com/#machine=26&view=single&suite=ss&subtest=cube&start=1426845007&end=1426873392

Nicolas B. Pierron [:nbp]

Comment 19

•

9 years ago

And also kraken crypto benchmarks:

http://arewefastyet.com/#machine=26&view=single&suite=kraken&subtest=crypto-pbkdf2&start=1426842497&end=1426873392

Jan de Mooij [:jandem]

Assignee

Comment 20

•

9 years ago

Part 5:

https://hg.mozilla.org/integration/mozilla-inbound/rev/751aea13dd05

Phil Ringnalda (:philor)

Comment 21

•

9 years ago

https://hg.mozilla.org/mozilla-central/rev/1c9568d7d26f
https://hg.mozilla.org/mozilla-central/rev/310b3af47e93
https://hg.mozilla.org/mozilla-central/rev/157929ef51b3
https://hg.mozilla.org/mozilla-central/rev/f7299a88c59c
https://hg.mozilla.org/mozilla-central/rev/847f9159b3e9

Status: ASSIGNED → RESOLVED

Closed: 9 years ago

status-firefox39: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla39

Phil Ringnalda (:philor)

Comment 22

•

9 years ago

https://hg.mozilla.org/mozilla-central/rev/751aea13dd05

Jan de Mooij [:jandem]

Assignee

Comment 23

•

9 years ago

NI myself to look into the Kraken crypto ccm/pbkdf2 regression.

Flags: needinfo?(jdemooij)

Jan de Mooij [:jandem]

Assignee

Comment 24

•

9 years ago

Attached patch Part 6 - Fix Kraken regression — Details — Splinter Review

Don't inline scripts that are known to inline a lot of code. This fixes the crypto-ccm regression locally and doesn't regress Octane/Sunspider or other Kraken tests.

Might be a small win on Octane Crypto or TypeScript, hard to say.

Flags: needinfo?(jdemooij)

Attachment #8581727 - Flags: review?(hv1989)

Jan de Mooij [:jandem]

Assignee

Updated

•

9 years ago

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Hannes Verschore [:h4writer]

Comment 25

•

9 years ago

Comment on attachment 8581727 [details] [diff] [review]
Part 6 - Fix Kraken regression

Review of attachment 8581727 [details] [diff] [review]:
-----------------------------------------------------------------

Can you static assert that "inlineMaxCalleeInlinedBytecodeLength_" is smaller than 16bits? Or maybe also make it 16bits?

Attachment #8581727 - Flags: review?(hv1989) → review+

Joel Maher ( :jmaher ) (UTC -8)

Updated

•

9 years ago

Depends on: 1147379

Jan de Mooij [:jandem]

Assignee

Comment 26

•

9 years ago

Part 6:

https://hg.mozilla.org/integration/mozilla-inbound/rev/9bf8f0e455c4

Ryan VanderMeulen [:RyanVM]

Comment 27

•

9 years ago

https://hg.mozilla.org/mozilla-central/rev/9bf8f0e455c4

Status: REOPENED → RESOLVED

Closed: 9 years ago → 9 years ago

Resolution: --- → FIXED

You need to log in before you can comment on or make changes to this bug.