Closed
Bug 1142669
Opened 10 years ago
Closed 10 years ago
Fix function inlining issue on Octane-deltablue
Categories
(Core :: JavaScript Engine: JIT, defect)
Core
JavaScript Engine: JIT
Tracking
()
RESOLVED
FIXED
mozilla39
Tracking | Status | |
---|---|---|
firefox39 | --- | fixed |
People
(Reporter: jandem, Assigned: jandem)
References
Details
Attachments
(6 files, 2 obsolete files)
10.62 KB,
patch
|
h4writer
:
review+
|
Details | Diff | Splinter Review |
958 bytes,
patch
|
h4writer
:
review+
|
Details | Diff | Splinter Review |
9.25 KB,
patch
|
h4writer
:
review+
|
Details | Diff | Splinter Review |
8.13 KB,
patch
|
h4writer
:
review+
|
Details | Diff | Splinter Review |
933 bytes,
patch
|
h4writer
:
review+
|
Details | Diff | Splinter Review |
8.72 KB,
patch
|
h4writer
:
review+
|
Details | Diff | Splinter Review |
Bug 1059364 eliminated ObjectGroupDispatch's falback path if the dispatch is known to handle all incoming objects. This was a win on Octane-deltablue, but not a big one because there are some call sites where we're not inlining one callee, so we still need the fallback path.
The problem is that we hit the max inlining depth. I need to look into this more, but if I bump the smallFunctionMaxBytecodeLength from 100 to 125 I get the following scores on Octane-deltablue:
32-bit:
default: 53500 -> 55228
unboxed-objects: 55625 -> 68016
64-bit:
default: 40696 -> 46455
unboxed-objects: 50547 -> 69093
So it's a pretty big win, especially with --unboxed-objects (20-30%).
Assignee | ||
Comment 1•10 years ago
|
||
This increases smallFunctionMaxBytecodeLength_ from 100 to 125. "Small functions" have a different inlining depth.
This is a pretty big win on Octane-deltablue (see comment 0) and Octane-raytrace, both with/without --unboxed-objects. I don't see any big regressions, but will keep an eye on AWFY.
Attachment #8576871 -
Flags: review?(bhackett1024)
Comment 2•10 years ago
|
||
Comment on attachment 8576871 [details] [diff] [review]
Increase smallFunctionMaxBytecodeLength_
Review of attachment 8576871 [details] [diff] [review]:
-----------------------------------------------------------------
Nice!
::: js/src/jit/JitOptions.cpp
@@ +157,5 @@
> SET_DEFAULT(osrPcMismatchesBeforeRecompile, 6000);
>
> // The bytecode length limit for small function.
> //
> // The default for this was arrived at empirically via benchmarking.
Maybe remove this sentence too, it's pretty weird/vague.
Attachment #8576871 -
Flags: review?(bhackett1024) → review+
Assignee | ||
Comment 3•10 years ago
|
||
Actually the first patch was a win but didn't fix the root cause and regressed richards a bit.
Here's an alternative patch based on what Hannes did in bug 768288. We record the max inlining depth when we inline in IonBuilder, and when we inline a function with loops we take that into account.
This allows us to inline loops in a lot of cases, removes some very narrow heuristics nbp added, fixes deltablue and doesn't regress richards or anything else as far as I can see.
Raytrace improves less than with the other patch, we can look into that later.
Attachment #8577192 -
Flags: review?(hv1989)
Assignee | ||
Comment 4•10 years ago
|
||
This fixes the issue we discussed on IRC today. Good catch btw!
Attachment #8577192 -
Attachment is obsolete: true
Attachment #8577192 -
Flags: review?(hv1989)
Attachment #8578055 -
Flags: review?(hv1989)
Comment 5•10 years ago
|
||
Comment on attachment 8578055 [details] [diff] [review]
Part 4 - Fix loop inlining (v2)
Review of attachment 8578055 [details] [diff] [review]:
-----------------------------------------------------------------
Awesome!
Just some suggestions on (hopefully) improving the explanation of this new heuristic.
::: js/src/jit/BaselineJIT.h
@@ +214,5 @@
>
> + // The max inlining depth that should be used (in addition to the normal
> + // depth checks) when deciding to inline this script. This starts as
> + // UINT8_MAX, so that it can always be inlined, and is updated when we
> + // Ion-compile this script. See makeInliningDecision for more info.
// The max inlining depth when inlining this script where we still can inline all currently inlined functions.
// This starts as UINT8_MAX, since we have no data yet and want to allow anything, and is updated when we Ion-compile this script.
// See makeInliningDecision for more info.
::: js/src/jit/IonBuilder.cpp
@@ +4867,5 @@
> + BaselineScript *outerBaseline = outermostBuilder()->script()->baselineScript();
> + if (inliningDepth_ >= maxInlineDepth) {
> + // We hit the depth limit, so don't allow inlining the outer script in
> + // other scripts. This is currently only used if it has loops, see the
> + // comment below.
// We hit the depth limit and don't inline a function. Mark the outer most script to always failing to inline a function, even if the outer most script isn't inlined. This heuristic is currently only used for inlining loops, see the comment bellow.
@@ +4890,5 @@
> + //
> + // To avoid this problem, we record a separate max inlining depth for each
> + // script, based on the functions we inline in it. So for the example above,
> + // if we inline g in f, we will no longer allow inlining f if it means we
> + // can't inline g as well.
// To avoid this problem, we record a separate max inlining depth for each
// script, indicating at which depth we won't be able to inline all inner functions anymore.
// Which solves the issue above and only inlines f, if it means we can also inline g.
@@ +4897,5 @@
> + {
> + trackOptimizationOutcome(TrackedOutcome::CantInlineExceededDepth);
> + return DontInline(targetScript, "Vetoed: exceeding allowed script inline depth");
> + }
> +
Can you add here:
// End of heuristics.
//
// Update properties knowing we definitely shall inline this script
Attachment #8578055 -
Flags: review?(hv1989) → review+
Assignee | ||
Comment 6•10 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/b79cddbe7de8
Thanks, I adjusted the comments a bit.
Comment 7•10 years ago
|
||
Backed out for making bug 1112947 spike significantly.
bug https://hg.mozilla.org/integration/mozilla-inbound/rev/519e18aa7875
Comment 8•10 years ago
|
||
On Windows, according to AWFY, this patch was:
13% regression on ss-aes
25% regression on kraken-crypto-ccm
14% regression on kraken-crypto-pbkdf2
3% regression on kraken-crypto-sha256-iterative
2% regression on dromaeo-sunspider-bitops-nsieve-bits
8% improvement on octane-crypto
20% improvement on octane-deltablue
2% improvement on octane-earleyboyer
3% improvement on octane-richards
18% improvement on octane-typescript
3% improvement on dromaeo-sunspider-access-nsieve
9% improvement on dromaeo-sunspider-crypto-aes
7% improvement on dromaeo-sunspider-math-cordic
17% improvement on dromaeo-v8-deltablue
4% improvement on dromaeo-v8-earley-boyer
3% improvement on dromaeo-v8-richards
On Mac there was less up and down, it seems, but looking only on the things that don't run on Windows:
6% regression on misc-bugs-508716-fluid-dynamics
16% regression on misc-bugs-636096-model2d
18% regression on misc-bugs-658844-sha256-bitcoins
16% improvement on misc-bugs-1131099-lodash1
4% improvement on asmjs-ubench-life (no asmjs)
41% improvement on asmjs-ubench-mandelbrot-native (no asmjs)
Assignee | ||
Comment 9•10 years ago
|
||
Ugh, inliningMaxCallerBytecodeLength() returned inlineMaxTotalBytecodeLength_ (1000) instead of inliningMaxCallerBytecodeLength_ (10000), so in a lot of scripts we only inlined small functions.
This patch is a big win (20% or so) on raytrace and explains why my initial patch for this bug was a win there.
Attachment #8576871 -
Attachment is obsolete: true
Attachment #8580559 -
Flags: review?(hv1989)
Assignee | ||
Comment 10•10 years ago
|
||
The test that turned orange inlines a ton of stuff, but we don't have off-thread compilation on the Amazon EC2 slaves (I think), so we hit a 10 second setTimeout somewhere because we were compiling *huge* amounts of code on the main thread.
If we have no off-thread compilation, don't inline scripts with bytecode length > 500 (instead of 1000 for the main thread).
Attachment #8580580 -
Flags: review?(hv1989)
Assignee | ||
Comment 11•10 years ago
|
||
This patch keeps track of the total bytecode length inlined in a single script. On Octane, the max value I saw was 53703 bytes.
On the pdf.js test that turned orange, I got values as high as 90,000-165,000 (!).
I set the limit to 80,000, so we never hit it on Octane (and most other benchmarks I think), but it should avoid uncontrolled inlining in pathological cases.
Attachment #8580585 -
Flags: review?(hv1989)
Assignee | ||
Updated•10 years ago
|
Attachment #8580580 -
Attachment description: Part 3 - Lower inline limit if off-thread compilation is not available → Part 2 - Lower inline limit if off-thread compilation is not available
Assignee | ||
Updated•10 years ago
|
Attachment #8580559 -
Attachment description: Part 2 - Fix an old typo → Part 1 - Fix an old typo
Assignee | ||
Updated•10 years ago
|
Attachment #8578055 -
Attachment description: Fix loop inlining (v2) → Part 4 - Fix loop inlining (v2)
Comment 12•10 years ago
|
||
Comment on attachment 8580559 [details] [diff] [review]
Part 1 - Fix an old typo
Review of attachment 8580559 [details] [diff] [review]:
-----------------------------------------------------------------
Good catch!
Attachment #8580559 -
Flags: review?(hv1989) → review+
Comment 13•10 years ago
|
||
Comment on attachment 8580580 [details] [diff] [review]
Part 2 - Lower inline limit if off-thread compilation is not available
Review of attachment 8580580 [details] [diff] [review]:
-----------------------------------------------------------------
Can you replace all uses of "OffThreadCompilationAvailable" with JitCompileOptions.offThreadCompilationAvailable()
::: js/src/jit/CompileWrappers.cpp
@@ +286,5 @@
> JS::CompartmentOptions &options = cx->compartment()->options();
> cloneSingletons_ = options.cloneSingletons();
> spsSlowAssertionsEnabled_ = cx->runtime()->spsProfiler.enabled() &&
> cx->runtime()->spsProfiler.slowAssertionsEnabled();
> + offThreadCompilationAvailable_ = offThreadCompilationAvailable;
Can we move the content of "OffThreadCompilationAvailable" to here? So that this value is always correct?
::: js/src/jit/IonAnalysis.cpp
@@ +3191,5 @@
> return false;
> }
>
> BaselineInspector inspector(script);
> + const JitCompileOptions options(cx, /* offThreadCompilationAvailable = */ true);
Why are we taking default "true" here? We should also calculate it.
@@ +3417,5 @@
> if (!constraints)
> return false;
>
> BaselineInspector inspector(script);
> + const JitCompileOptions options(cx, /* offThreadCompilationAvailable = */ true);
Here it makes no difference, since the value is only used for inlining heuristics. But I would prefer to have the correct input here. If we add more checks that use this variable it can introduce faults...
::: js/src/jit/IonOptimizationLevels.h
@@ +214,5 @@
>
> + uint32_t inlineMaxBytecodePerCallSite(bool offThread) const {
> + return offThread
> + ? inlineMaxBytecodePerCallSiteOffThread_
> + : inlineMaxBytecodePerCallSiteMainThread_;
Can you check "js_JitOptions.limitScriptSize" and return "inlineMaxBytecodePerCallSiteOffThread_" instead of "inlineMaxBytecodePerCallSiteMainThread_" if we dont' limit it?
Attachment #8580580 -
Flags: review?(hv1989) → review+
Comment 14•10 years ago
|
||
Comment on attachment 8580585 [details] [diff] [review]
Part 3 - Cap max inlined bytecode length
Review of attachment 8580585 [details] [diff] [review]:
-----------------------------------------------------------------
I was thinking this would trip on asmjs things, but apparently not on the asmjs benchmarks we have. Also the testcase in bug 1010417 doesn't trip. So it looks fine :D
Attachment #8580585 -
Flags: review?(hv1989) → review+
Assignee | ||
Comment 15•10 years ago
|
||
Let's try this again:
https://hg.mozilla.org/integration/mozilla-inbound/rev/1c9568d7d26f
https://hg.mozilla.org/integration/mozilla-inbound/rev/310b3af47e93
https://hg.mozilla.org/integration/mozilla-inbound/rev/157929ef51b3
https://hg.mozilla.org/integration/mozilla-inbound/rev/f7299a88c59c
Try push confirms bc2 is happy:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=80788bc9b1f9
Assignee | ||
Comment 16•10 years ago
|
||
(In reply to Guilherme Lima from comment #8)
> On Windows, according to AWFY, this patch was:
Thanks for making that list! :)
Let's see how the latest patches affect AWFY. I'll look into the regressions later today or next week.
Assignee | ||
Comment 17•10 years ago
|
||
There were some new regressions compared to yesterday's landing. I think those were caused by the new part 1: it effectively increased inliningMaxCallerBytecodeLength from 1000 to 10000.
This patch lowers it to 1500, that's enough to keep the raytrace win and seems to fix the TypeScript regression locally (unfortunately I don't see the Mandreel and regexp regressions here).
Attachment #8580726 -
Flags: review?(hv1989)
Updated•10 years ago
|
Attachment #8580726 -
Flags: review?(hv1989) → review+
Comment 18•10 years ago
|
||
These patches are regressing sunspider runs on FxOS flame devices.
Especially ss-aes, ss-cube and ss-md5.
http://arewefastyet.com/#machine=26&view=single&suite=ss&subtest=cube&start=1426845007&end=1426873392
Comment 19•10 years ago
|
||
And also kraken crypto benchmarks:
http://arewefastyet.com/#machine=26&view=single&suite=kraken&subtest=crypto-pbkdf2&start=1426842497&end=1426873392
Assignee | ||
Comment 20•10 years ago
|
||
Comment 21•10 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/1c9568d7d26f
https://hg.mozilla.org/mozilla-central/rev/310b3af47e93
https://hg.mozilla.org/mozilla-central/rev/157929ef51b3
https://hg.mozilla.org/mozilla-central/rev/f7299a88c59c
https://hg.mozilla.org/mozilla-central/rev/847f9159b3e9
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
status-firefox39:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
Comment 22•10 years ago
|
||
Assignee | ||
Comment 23•10 years ago
|
||
NI myself to look into the Kraken crypto ccm/pbkdf2 regression.
Flags: needinfo?(jdemooij)
Assignee | ||
Comment 24•10 years ago
|
||
Don't inline scripts that are known to inline a lot of code. This fixes the crypto-ccm regression locally and doesn't regress Octane/Sunspider or other Kraken tests.
Might be a small win on Octane Crypto or TypeScript, hard to say.
Flags: needinfo?(jdemooij)
Attachment #8581727 -
Flags: review?(hv1989)
Assignee | ||
Updated•10 years ago
|
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 25•10 years ago
|
||
Comment on attachment 8581727 [details] [diff] [review]
Part 6 - Fix Kraken regression
Review of attachment 8581727 [details] [diff] [review]:
-----------------------------------------------------------------
Can you static assert that "inlineMaxCalleeInlinedBytecodeLength_" is smaller than 16bits? Or maybe also make it 16bits?
Attachment #8581727 -
Flags: review?(hv1989) → review+
Assignee | ||
Comment 26•10 years ago
|
||
Comment 27•10 years ago
|
||
Status: REOPENED → RESOLVED
Closed: 10 years ago → 10 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•