Closed
Bug 1364908
Opened 8 years ago
Closed 7 years ago
Support inlining functions that use arguments[x]
Categories
(Core :: JavaScript Engine: JIT, enhancement)
Core
JavaScript Engine: JIT
Tracking
()
Tracking | Status | |
---|---|---|
firefox56 | --- | fixed |
People
(Reporter: jandem, Assigned: nbp)
References
(Blocks 2 open bugs)
Details
Attachments
(4 files, 2 obsolete files)
15.52 KB,
patch
|
Details | Diff | Splinter Review | |
35.81 KB,
patch
|
jandem
:
review+
|
Details | Diff | Splinter Review |
7.04 KB,
patch
|
jandem
:
review+
|
Details | Diff | Splinter Review |
45.01 KB,
patch
|
jandem
:
review+
|
Details | Diff | Splinter Review |
We're currently unable to inline functions in Ion that use lazy arguments + arguments[x]. Unfortunately this is pretty common, also in self-hosted code, so we should fix it.
When the number of arguments is small, we could get away with a MIR instruction that takes all arguments as operands and then just branches on the index. This doesn't really scale but in practice the number of arguments is often small so doing this for <= 3 arguments or so might be a reasonable short-term improvement.
Handling the many-arguments case is a lot more complicated. We have to reserve a range of slots for the arguments somewhere, make sure the arguments are stored there, and then for arguments[x] we could add a MIR instruction to load the argument from slot x.
Reporter | ||
Comment 1•8 years ago
|
||
(In reply to Jan de Mooij [:jandem] from comment #0)
> We have to
> reserve a range of slots for the arguments somewhere, make sure the
> arguments are stored there, and then for arguments[x] we could add a MIR
> instruction to load the argument from slot x.
For this I think our main options are:
(1) Somehow force the register allocator to store the inlined arguments where we want them. Maybe for each inlined argument we could add an LUse with a fixed stack slot. Then ideally the allocator will be smart enough to move these spills out of the loop.
(2) Push some kind of JIT frame (JitFrame_IonInlined) when we enter the inlined function, and pop it when we leave. This will complicate codegen (need to add the size of this frame to slot offsets) and frame iterators though.
Reporter | ||
Comment 2•8 years ago
|
||
(In reply to Jan de Mooij [:jandem] from comment #0)
> This doesn't really scale but in practice the number of arguments
> is often small so doing this for <= 3 arguments or so might be a reasonable
> short-term improvement.
Hm I checked Speedometer2 and various websites and the data confirms that handling <= 3 arguments will cover the vast majority of cases (like > 97%), so I think I'll start with this when I'm back from PTO (later this week).
On Gmail it's also pretty common to abort inlining when we have arguments[x] and 0 actual arguments, that's just silly.
Flags: needinfo?(jdemooij)
Reporter | ||
Updated•8 years ago
|
Flags: needinfo?(jdemooij)
Whiteboard: [qf]
Reporter | ||
Updated•8 years ago
|
Flags: needinfo?(jdemooij)
Comment 4•8 years ago
|
||
This allows using an "unlimited" number of arguments by using stack arguments. However after running out of registers the performance sucks. Instead of using a linear number of compares, maybe a jump table could be used? I would probably just limit this to <= 3 as suggested, a small number of compares should still be cheap.
Getting this to work on x86 is going to be painful, we need to support Values where the tag or payload can be on the stack.
I hope this is useful, I didn't read your last comment about working on this after you are back.
Updated•8 years ago
|
Blocks: Speedometer_V2
Updated•8 years ago
|
Blocks: TimeToFirstPaint_FB
Updated•8 years ago
|
No longer blocks: TimeToFirstPaint_FB
Assignee | ||
Comment 5•7 years ago
|
||
The most interesting part of this feature, is mostly the fact that we would inline the code of the function in the caller. Still we don't want to make it smaller to access the arguments, so I did a bunch of testing, with a build using this WIP patch, and a build using the backtracking on inlining failures:
Benchmark style:
function sum() {
var res = 0;
for (var i = 0; i < arguments.length; i++)
res += arguments[i];
return res;
}
function args10() {
var isInIon = false;
var start, end;
for (var c = 0; c < 1000000; ) {
sum(1,2,3,4,5,6,7,8,9,10);
if (inIon()) {
c++;
} else {
start = dateNow();
}
}
end = dateNow();
print("10 arguments:", end - start);
}
Results:
(this patch) (today)
10 arguments: 373.260009765625 396.137939453125
9 arguments: 355.77783203125 400.9169921875
8 arguments: 367.971923828125 410.825927734375
7 arguments: 360.7490234375 393.195068359375
6 arguments: 350.8798828125 394.631103515625
5 arguments: 359.70703125 396.396240234375
4 arguments: 354.60009765625 391.651123046875
3 arguments: 360.801025390625 388.56005859375
2 arguments: 348.429931640625 386.27001953125
1 arguments: 346.49609375 376.010986328125
The current implementation implements a switch case, with an indirection table. Here, the arguments are all constants which are flagged as emitAtUses, which means that they are baked-in the switch case generated in CodeGenerator::LoadElementFromStateV.
The reason why I adding MArgumentState and MLoadElementFromState, is that we might be able to use MLoadElementFromState with an MArrayState produced by ScalarReplacement. And later, we could also imagine using the same mechanism for property accesses on MObjectstate.
I will split this patch in various refactoring, and a final one which adds MArgumentState and MLoadElementFromState.
Assignee | ||
Comment 6•7 years ago
|
||
Attachment #8882277 -
Flags: review?(jdemooij)
Assignee | ||
Comment 7•7 years ago
|
||
Attachment #8882278 -
Flags: review?(jdemooij)
Assignee | ||
Comment 8•7 years ago
|
||
Attachment #8882279 -
Flags: review?(jdemooij)
Reporter | ||
Comment 9•7 years ago
|
||
Comment on attachment 8882277 [details] [diff] [review]
Move moveValue into the generic MacroAssembler.
Review of attachment 8882277 [details] [diff] [review]:
-----------------------------------------------------------------
Nice patch.
::: js/src/jit/MacroAssembler-inl.h
@@ +337,5 @@
> + moveValue(src.value(), dest);
> + return;
> + }
> +
> + if (src.reg().hasValue()) {
Wouldn't it be clearer to have the TypedOrValueRegister overload call moveValue(ValueOperand) if needed? Now we can only call it with TypedOrValueRegister if reg.hasTyped() is true, but we could easily handle the ValueOperand case there instead of here.
Attachment #8882277 -
Flags: review?(jdemooij) → review+
Reporter | ||
Comment 10•7 years ago
|
||
Comment on attachment 8882278 [details] [diff] [review]
Add MacroAssembler::branchToComputedAddress.
Review of attachment 8882278 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/jit/arm/MacroAssembler-arm-inl.h
@@ +2067,5 @@
> +
> + if (addr.offset != 0) {
> + SecondScratchRegisterScope scratch2(*this);
> + ScratchRegisterScope scratch(*this);
> + MOZ_ASSERT(base != pc, "NYI: offset from pc should be shifted.");
We assert addr.base == pc at the start of this function. Does that mean this code is dead?
Attachment #8882278 -
Flags: review?(jdemooij) → review+
Assignee | ||
Updated•7 years ago
|
Attachment #8879680 -
Attachment is obsolete: true
Assignee | ||
Comment 11•7 years ago
|
||
(In reply to Jan de Mooij [:jandem] from comment #10)
> ::: js/src/jit/arm/MacroAssembler-arm-inl.h
> @@ +2067,5 @@
> > +
> > + if (addr.offset != 0) {
> > + SecondScratchRegisterScope scratch2(*this);
> > + ScratchRegisterScope scratch(*this);
> > + MOZ_ASSERT(base != pc, "NYI: offset from pc should be shifted.");
>
> We assert addr.base == pc at the start of this function. Does that mean this
> code is dead?
Yes it is, I will replace the above condition by an assertion.
Assignee | ||
Comment 12•7 years ago
|
||
(In reply to Jan de Mooij [:jandem] from comment #9)
> ::: js/src/jit/MacroAssembler-inl.h
> @@ +337,5 @@
> > + moveValue(src.value(), dest);
> > + return;
> > + }
> > +
> > + if (src.reg().hasValue()) {
>
> Wouldn't it be clearer to have the TypedOrValueRegister overload call
> moveValue(ValueOperand) if needed? Now we can only call it with
> TypedOrValueRegister if reg.hasTyped() is true, but we could easily handle
> the ValueOperand case there instead of here.
I will move the previous condition into the moveValue(TypedOrValueRegister, ValueOperand) functions.
Thus removing the assertion at the top of these functions.
Reporter | ||
Comment 13•7 years ago
|
||
Comment on attachment 8882279 [details] [diff] [review]
IonMonkey: Add LoadElementFromSate to support argument[x] in inlined functions.
Review of attachment 8882279 [details] [diff] [review]:
-----------------------------------------------------------------
Looks good but some questions below.
::: js/src/jit/CodeGenerator.cpp
@@ +11146,5 @@
> + codegen->visitOutOfLineSwitch(this);
> + }
> +
> + public:
> + OutOfLineSwitch(TempAllocator& alloc)
Nit: explicit
@@ +11169,5 @@
> + if (tableType == SwitchTableType::Inline) {
> +#if defined(JS_CODEGEN_ARM)
> + base = ::js::jit::pc;
> +#else
> + MOZ_ASSERT(false, "NYI: SwitchTableType::Inline");
MOZ_CRASH?
@@ +11173,5 @@
> + MOZ_ASSERT(false, "NYI: SwitchTableType::Inline");
> +#endif
> + } else {
> +#if defined(JS_CODEGEN_ARM) || defined(JS_CODEGEN_ARM64)
> + MOZ_ASSERT(false, "NYI: SwitchTableType::OutOfLine");
And here
@@ +11212,5 @@
> +{
> + jumpTable->setOutOfLine();
> + if (tableType == SwitchTableType::OutOfLine) {
> +#if defined(JS_CODEGEN_ARM) || defined(JS_CODEGEN_ARM64)
> + MOZ_ASSERT(false, "NYI: SwitchTableType::OutOfLine");
And here
::: js/src/jit/IonBuilder.cpp
@@ +7641,5 @@
> MOZ_TRY(getElemTryArguments(&emitted, obj, index));
> if (emitted)
> return Ok();
>
> + trackOptimizationAttempt(TrackedStrategy::GetElem_ArgumentsInlinedConstant);
There are no changes to TrackedOptimizationInfo.h in this patch?
@@ +8365,5 @@
> + obj->setImplicitlyUsedUnchecked();
> +
> + MOZ_ASSERT(!info().argsObjAliasesFormals());
> +
> + // Ensure index is an integer.
We should probably check index->type() == MIRType::Int32 or at least IsNumberType(index->type()).
@@ +8370,5 @@
> + MInstruction* idInt32 = MToInt32::New(alloc(), index);
> + current->add(idInt32);
> + index = idInt32;
> +
> + // Bailouts if we read more than the number of actual arguments.
s/Bailouts/Bailout/
Can you add a comment here saying bailing out is okay because reading out of bounds arguments will disable the lazy arguments optimization for this script? That should ensure we don't get stuck in a bailout loop.
(I was also thinking LoadElementFromState could return undefined in this case. That's not correct because of getters on the prototype, but we could fix that with TI.)
::: js/src/jit/Lowering.cpp
@@ +3245,5 @@
> + temp1 = temp();
> +#endif
> + LLoadElementFromStateV* lir = new(alloc()) LLoadElementFromStateV(temp(), temp1, tempDouble());
> + if (!ins->array()->isArgumentState()) {
> + abort(AbortReason::Alloc, "LIRGenerator::visitLoadElementFromState: Unsupported state object");
Hm I don't like having non-OOM aborts in the compiler backend, and AbortReason::Alloc seems wrong.
Can this actually happen? If yes can you add a comment explaining when, and why this is uncommon enough that aborting is okay? If no, can we MOZ_ASSERT?
@@ +3295,5 @@
> + lir->setOperand(1 + BOX_PIECES * i + 1, LAllocation());
> +#endif
> + break;
> + default:
> + abort(AbortReason::Alloc, "LIRGenerator::visitLoadElementFromState: Unsupported element type.");
Same here, I'd prefer a MOZ_CRASH. These aborts always show up on some random benchmark or website out there.
::: js/src/jit/MIR.h
@@ +3957,5 @@
> }
> };
>
> +// Hold the arguments of an inlined frame. At the moment this class is not
> +// recovered on bailout, but as it does not have an implementation and it should
Nit: remove the "but " here?
::: js/src/jit/shared/LIR-shared.h
@@ +5737,5 @@
> return getOperand(0);
> }
> };
>
> +// Load a value from a dense array's elements vector. Bail out if it's the hole value.
This comment should be updated.
Reporter | ||
Updated•7 years ago
|
Flags: needinfo?(jdemooij)
Reporter | ||
Updated•7 years ago
|
Attachment #8882279 -
Flags: review?(jdemooij)
Assignee | ||
Comment 14•7 years ago
|
||
Delta:
- Apply nits from comment 13.
- Add the uncommitted files:
- TrackedOptimizationInfo.h
- jit-test/tests/ion/inlining/inline-getelem-args.js
- Add test cases to check for underflow/overflow bailouts.
- Add support for properly loading boolean from the stack:
- see StackSlotAllocator::width(LDefinition::TypeFrom(elem->type()))
Attachment #8884831 -
Flags: review?(jdemooij)
Assignee | ||
Updated•7 years ago
|
Attachment #8882279 -
Attachment is obsolete: true
Reporter | ||
Comment 15•7 years ago
|
||
Comment on attachment 8884831 [details] [diff] [review]
IonMonkey: Add LoadElementFromSate to support argument[x] in inlined functions.
Review of attachment 8884831 [details] [diff] [review]:
-----------------------------------------------------------------
Thanks. Some gnarly CodeGen stuff but I can't think of a good way to simplify that.
It will be interesting to see if this affects Speedometer - hopefully there won't be regressions from increased compilation time :)
::: js/src/jit/CodeGenerator.cpp
@@ +11298,5 @@
> + if (elem->type() == MIRType::Double) {
> + masm.loadDouble(ToAddress(a), tempD);
> + input = TypedOrValueRegister(elem->type(), AnyRegister(tempD));
> + } else if (elem->type() == MIRType::Value) {
> + typeReg = temp0;
Nit: can remove this line since we also set it before the if-else.
@@ +11304,5 @@
> +#ifdef JS_PUNBOX64
> + input = TypedOrValueRegister(ValueOperand(temp0));
> +#endif
> + } else {
> + typeReg = temp0;
Same.
::: js/src/jit/IonBuilder.cpp
@@ +8357,5 @@
> + return Ok();
> +
> + // Currently, we do not support any arguments vector larger than 10, as this
> + // is being translated into code at the call site, and it would be better to
> + // the arguments aligned on the stack.
s/better to/better to store/?
Attachment #8884831 -
Flags: review?(jdemooij) → review+
Comment 16•7 years ago
|
||
Pushed by npierron@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/a02a6cb9c94c
Move moveValue into the generic MacroAssembler. r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/60809bf55571
Add MacroAssembler::branchToComputedAddress. r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/b6be7c2be50e
IonMonkey: Add LoadElementFromSate to support argument[x] in inlined functions. r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/a249596785aa
Fix ion/bug1284485.js unreported OOM. r=bbouvier
Comment 17•7 years ago
|
||
Backed out for build bustage: undeclared 'StackSlotAllocator' at js/src/jit/CodeGenerator.cpp:11199:
https://hg.mozilla.org/integration/mozilla-inbound/rev/273e8ce1c9e38f4ae3527508f9512adfe2d0bec4
https://hg.mozilla.org/integration/mozilla-inbound/rev/5913aca4915039cc0e05e7933922db23c30e4366
https://hg.mozilla.org/integration/mozilla-inbound/rev/9d84a411ea6da7680c5086f29ecdfe20d5c4a688
https://hg.mozilla.org/integration/mozilla-inbound/rev/a51672e13a1b018a41a6e9f6c156605861863293
Push with bustage: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=a249596785aa58973113cf830a55a54d0a5257de&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=runnable
Build log: https://treeherder.mozilla.org/logviewer.html#?job_id=114912641&repo=mozilla-inbound
[task 2017-07-17T17:11:01.326683Z] 17:11:01 INFO - In file included from /home/worker/workspace/build/src/obj-firefox/js/src/Unified_cpp_js_src11.cpp:38:0:
[task 2017-07-17T17:11:01.326804Z] 17:11:01 INFO - /home/worker/workspace/build/src/js/src/jit/CodeGenerator.cpp: In member function 'virtual void js::jit::CodeGenerator::visitLoadElementFromStateV(js::jit::LLoadElementFromStateV*)':
[task 2017-07-17T17:11:01.326870Z] 17:11:01 INFO - /home/worker/workspace/build/src/js/src/jit/CodeGenerator.cpp:11199:32: error: 'StackSlotAllocator' has not been declared
[task 2017-07-17T17:11:01.326925Z] 17:11:01 INFO - size_t width = StackSlotAllocator::width(LDefinition::TypeFrom(elem->type()));
[task 2017-07-17T17:11:01.326958Z] 17:11:01 INFO - ^
[task 2017-07-17T17:11:01.327011Z] 17:11:01 INFO - /home/worker/workspace/build/src/config/rules.mk:1051: recipe for target 'Unified_cpp_js_src11.o' failed
[task 2017-07-17T17:11:01.327048Z] 17:11:01 INFO - gmake[5]: *** [Unified_cpp_js_src11.o] Error 1
Flags: needinfo?(nicolas.b.pierron)
Assignee | ||
Comment 18•7 years ago
|
||
(In reply to Sebastian Hengst [:aryx][:archaeopteryx] (needinfo on intermittent or backout) from comment #17)
> Backed out for build bustage: undeclared 'StackSlotAllocator' at
> js/src/jit/CodeGenerator.cpp:11199:
Fixed by adding a missing include in CodeGenerator.cpp:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=8f98b23446f2eb7dd240fe4fe952f0e43e7df73e
Flags: needinfo?(nicolas.b.pierron)
Comment 19•7 years ago
|
||
Pushed by npierron@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/0f61bc51de11
Move moveValue into the generic MacroAssembler. r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/8be467b057f6
Add MacroAssembler::branchToComputedAddress. r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/07e91ed49793
IonMonkey: Add LoadElementFromSate to support argument[x] in inlined functions. r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/09937ba00629
Fix ion/bug1284485.js unreported OOM. r=bbouvier
Comment 20•7 years ago
|
||
Pushed by npierron@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/2817d497d4b1
Fix test case basic/bug858097.js, to checking in Compacting GC cases. r=jonco
Reporter | ||
Comment 21•7 years ago
|
||
This regressed the assorted misc-bugs-658844-sha256-bitcoins test by 68% (22 ms -> 38 ms) and the misc-basic-arguments test from 15 to 24 ms. Would you mind taking a look? :)
Flags: needinfo?(nicolas.b.pierron)
Comment 22•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/0f61bc51de11
https://hg.mozilla.org/mozilla-central/rev/8be467b057f6
https://hg.mozilla.org/mozilla-central/rev/07e91ed49793
https://hg.mozilla.org/mozilla-central/rev/09937ba00629
https://hg.mozilla.org/mozilla-central/rev/2817d497d4b1
Status: NEW → RESOLVED
Closed: 7 years ago
status-firefox56:
--- → fixed
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Assignee | ||
Comment 23•7 years ago
|
||
(In reply to Jan de Mooij [:jandem] from comment #21)
> This regressed the assorted misc-bugs-658844-sha256-bitcoins test by 68% (22
> ms -> 38 ms) and the misc-basic-arguments test from 15 to 24 ms. Would you
> mind taking a look? :)
Testing with misc-basic-arguments benchmark, and changing the number of actual arguments:
before after
10 7.0341796875 10.79296875
9 7.213134765625 9.68408203125
8 7.39208984375 10.72802734375
7 7.550048828125 10.2041015625
6 7.85595703125 10.31298828125
5 8.593017578125 10.824951171875
4 8.97998046875 12.198974609375
3 9.863037109375 11.991943359375
2 11.97412109375 14.195068359375
1 18.015869140625 14.73193359375
Based on these results, it sounds like we should handle the case where there is only one argument to the function separately, and probably look for allocating a vector on the stack otherwise. I will look if there is an easy way to handle that.
Assignee | ||
Comment 24•7 years ago
|
||
Manually unrolling the loop which contains arguments[x], and making sure that these reads are not LICM-ed out of the loop highlights that we can optimize arguments[x] better by only unrolling the loops which are containing these expressions.
before after
10 7.76513671875 7.136962890625
9 6.701171875 7.051025390625
8 7.461181640625 6.993896484375
7 7.212158203125 6.819091796875
6 7.501220703125 6.922119140625
5 8.198974609375 7.032958984375
4 9.0849609375 6.64404296875
3 12.39794921875 4.43798828125
2 15.126953125 3.48095703125
1 27.291015625 1.794921875
If the body of a loop is too large to be unrolled, then the cost of the arguments[x] probably does not matter as much compared to the cost of the body of the loop.
In these results the cost of the arguments[x] expression (not yet folded into a single argument read, but still implemented as a switch case) seems to confirm that the issue seen in comment 23 & comment 21 comes from the fact that the generated code does not play nicely with the branch predictor (within a loop).
Unrolling the loop containing arguments[x] removes the miss-prediction issue and indirect branch is known by the branch predictor to always use the same case statement, leading to the results above. If we were to fold the MLoadElementFromState given a known index, into the proper argument read, then we could potentially get even more performance out of it.
I suggest to focus on enabling loop unrolling as a solution for these regressions.
Assignee | ||
Updated•7 years ago
|
Flags: needinfo?(nicolas.b.pierron)
Updated•6 years ago
|
Assignee: nobody → nicolas.b.pierron
Updated•3 years ago
|
Performance Impact: --- → P1
Whiteboard: [qf:p1]
You need to log in
before you can comment on or make changes to this bug.
Description
•