Support inlining functions that use arguments[x]

RESOLVED FIXED in Firefox 56

Status

()

Core
JavaScript Engine: JIT
RESOLVED FIXED
9 months ago
6 months ago

People

(Reporter: jandem, Unassigned)

Tracking

(Depends on: 1 bug, Blocks: 2 bugs)

unspecified
mozilla56
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(firefox56 fixed)

Details

(Whiteboard: [qf:p1])

Attachments

(4 attachments, 2 obsolete attachments)

(Reporter)

Description

9 months ago
We're currently unable to inline functions in Ion that use lazy arguments + arguments[x]. Unfortunately this is pretty common, also in self-hosted code, so we should fix it.

When the number of arguments is small, we could get away with a MIR instruction that takes all arguments as operands and then just branches on the index. This doesn't really scale but in practice the number of arguments is often small so doing this for <= 3 arguments or so might be a reasonable short-term improvement.

Handling the many-arguments case is a lot more complicated. We have to reserve a range of slots for the arguments somewhere, make sure the arguments are stored there, and then for arguments[x] we could add a MIR instruction to load the argument from slot x.
(Reporter)

Comment 1

9 months ago
(In reply to Jan de Mooij [:jandem] from comment #0)
> We have to
> reserve a range of slots for the arguments somewhere, make sure the
> arguments are stored there, and then for arguments[x] we could add a MIR
> instruction to load the argument from slot x.

For this I think our main options are:

(1) Somehow force the register allocator to store the inlined arguments where we want them. Maybe for each inlined argument we could add an LUse with a fixed stack slot. Then ideally the allocator will be smart enough to move these spills out of the loop.

(2) Push some kind of JIT frame (JitFrame_IonInlined) when we enter the inlined function, and pop it when we leave. This will complicate codegen (need to add the size of this frame to slot offsets) and frame iterators though.
(Reporter)

Comment 2

9 months ago
(In reply to Jan de Mooij [:jandem] from comment #0)
> This doesn't really scale but in practice the number of arguments
> is often small so doing this for <= 3 arguments or so might be a reasonable
> short-term improvement.

Hm I checked Speedometer2 and various websites and the data confirms that handling <= 3 arguments will cover the vast majority of cases (like > 97%), so I think I'll start with this when I'm back from PTO (later this week).

On Gmail it's also pretty common to abort inlining when we have arguments[x] and 0 actual arguments, that's just silly.
Flags: needinfo?(jdemooij)
(Reporter)

Updated

9 months ago
Flags: needinfo?(jdemooij)
Whiteboard: [qf]
(Reporter)

Updated

9 months ago
Flags: needinfo?(jdemooij)
P1 because it impacts Speedometer
Whiteboard: [qf] → [qf:p1]
Created attachment 8869147 [details] [diff] [review]
POC for x64

This allows using an "unlimited" number of arguments by using stack arguments. However after running out of registers the performance sucks. Instead of using a linear number of compares, maybe a jump table could be used? I  would probably just limit this to <= 3 as suggested, a small number of compares should still be cheap.

Getting this to work on x86 is going to be painful, we need to support Values where the tag or payload can be on the stack.

I hope this is useful, I didn't read your last comment about working on this after you are back.
Blocks: 1339557

Updated

9 months ago
Blocks: 1364854

Updated

9 months ago
Blocks: 1366777

Updated

9 months ago
No longer blocks: 1366777
Created attachment 8879680 [details] [diff] [review]
WIP x64 - LoadElementFromState

The most interesting part of this feature, is mostly the fact that we would inline the code of the function in the caller.  Still we don't want to make it smaller to access the arguments, so I did a bunch of testing, with a build using this WIP patch, and a build using the backtracking on inlining failures:

Benchmark style:

function sum() {
    var res = 0;
    for (var i = 0; i < arguments.length; i++)
        res += arguments[i];
    return res;
}

function args10() {
    var isInIon = false;
    var start, end;
    for (var c = 0; c < 1000000; ) {
        sum(1,2,3,4,5,6,7,8,9,10);

        if (inIon()) {
            c++;
        } else {
            start = dateNow();
        }
    }
    end = dateNow();
    print("10 arguments:", end - start);
}

Results:

               (this patch)                         (today)
10 arguments: 373.260009765625			396.137939453125
9 arguments:  355.77783203125			400.9169921875
8 arguments:  367.971923828125			410.825927734375
7 arguments:  360.7490234375			393.195068359375
6 arguments:  350.8798828125			394.631103515625
5 arguments:  359.70703125			396.396240234375
4 arguments:  354.60009765625			391.651123046875
3 arguments:  360.801025390625			388.56005859375
2 arguments:  348.429931640625			386.27001953125
1 arguments:  346.49609375			376.010986328125

The current implementation implements a switch case, with an indirection table. Here, the arguments are all constants which are flagged as emitAtUses, which means that they are baked-in the switch case generated in CodeGenerator::LoadElementFromStateV.

The reason why I adding MArgumentState and MLoadElementFromState, is that we might be able to use MLoadElementFromState with an MArrayState produced by ScalarReplacement.  And later, we could also imagine using the same mechanism for property accesses on MObjectstate.

I will split this patch in various refactoring, and a final one which adds MArgumentState and MLoadElementFromState.
Depends on: 1376921
Created attachment 8882277 [details] [diff] [review]
Move moveValue into the generic MacroAssembler.
Attachment #8882277 - Flags: review?(jdemooij)
Created attachment 8882278 [details] [diff] [review]
Add MacroAssembler::branchToComputedAddress.
Attachment #8882278 - Flags: review?(jdemooij)
Created attachment 8882279 [details] [diff] [review]
IonMonkey: Add LoadElementFromSate to support argument[x] in inlined functions.
Attachment #8882279 - Flags: review?(jdemooij)
(Reporter)

Comment 9

8 months ago
Comment on attachment 8882277 [details] [diff] [review]
Move moveValue into the generic MacroAssembler.

Review of attachment 8882277 [details] [diff] [review]:
-----------------------------------------------------------------

Nice patch.

::: js/src/jit/MacroAssembler-inl.h
@@ +337,5 @@
> +        moveValue(src.value(), dest);
> +        return;
> +    }
> +
> +    if (src.reg().hasValue()) {

Wouldn't it be clearer to have the TypedOrValueRegister overload call moveValue(ValueOperand) if needed? Now we can only call it with TypedOrValueRegister if reg.hasTyped() is true, but we could easily handle the ValueOperand case there instead of here.
Attachment #8882277 - Flags: review?(jdemooij) → review+
(Reporter)

Comment 10

8 months ago
Comment on attachment 8882278 [details] [diff] [review]
Add MacroAssembler::branchToComputedAddress.

Review of attachment 8882278 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/arm/MacroAssembler-arm-inl.h
@@ +2067,5 @@
> +
> +    if (addr.offset != 0) {
> +        SecondScratchRegisterScope scratch2(*this);
> +        ScratchRegisterScope scratch(*this);
> +        MOZ_ASSERT(base != pc, "NYI: offset from pc should be shifted.");

We assert addr.base == pc at the start of this function. Does that mean this code is dead?
Attachment #8882278 - Flags: review?(jdemooij) → review+
Attachment #8879680 - Attachment is obsolete: true
(In reply to Jan de Mooij [:jandem] from comment #10)
> ::: js/src/jit/arm/MacroAssembler-arm-inl.h
> @@ +2067,5 @@
> > +
> > +    if (addr.offset != 0) {
> > +        SecondScratchRegisterScope scratch2(*this);
> > +        ScratchRegisterScope scratch(*this);
> > +        MOZ_ASSERT(base != pc, "NYI: offset from pc should be shifted.");
> 
> We assert addr.base == pc at the start of this function. Does that mean this
> code is dead?

Yes it is, I will replace the above condition by an assertion.
(In reply to Jan de Mooij [:jandem] from comment #9)
> ::: js/src/jit/MacroAssembler-inl.h
> @@ +337,5 @@
> > +        moveValue(src.value(), dest);
> > +        return;
> > +    }
> > +
> > +    if (src.reg().hasValue()) {
> 
> Wouldn't it be clearer to have the TypedOrValueRegister overload call
> moveValue(ValueOperand) if needed? Now we can only call it with
> TypedOrValueRegister if reg.hasTyped() is true, but we could easily handle
> the ValueOperand case there instead of here.

I will move the previous condition into the moveValue(TypedOrValueRegister, ValueOperand) functions.
Thus removing the assertion at the top of these functions.
(Reporter)

Comment 13

8 months ago
Comment on attachment 8882279 [details] [diff] [review]
IonMonkey: Add LoadElementFromSate to support argument[x] in inlined functions.

Review of attachment 8882279 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good but some questions below.

::: js/src/jit/CodeGenerator.cpp
@@ +11146,5 @@
> +        codegen->visitOutOfLineSwitch(this);
> +    }
> +
> +  public:
> +    OutOfLineSwitch(TempAllocator& alloc)

Nit: explicit

@@ +11169,5 @@
> +        if (tableType == SwitchTableType::Inline) {
> +#if defined(JS_CODEGEN_ARM)
> +            base = ::js::jit::pc;
> +#else
> +            MOZ_ASSERT(false, "NYI: SwitchTableType::Inline");

MOZ_CRASH?

@@ +11173,5 @@
> +            MOZ_ASSERT(false, "NYI: SwitchTableType::Inline");
> +#endif
> +        } else {
> +#if defined(JS_CODEGEN_ARM) || defined(JS_CODEGEN_ARM64)
> +            MOZ_ASSERT(false, "NYI: SwitchTableType::OutOfLine");

And here

@@ +11212,5 @@
> +{
> +    jumpTable->setOutOfLine();
> +    if (tableType == SwitchTableType::OutOfLine) {
> +#if defined(JS_CODEGEN_ARM) || defined(JS_CODEGEN_ARM64)
> +        MOZ_ASSERT(false, "NYI: SwitchTableType::OutOfLine");

And here

::: js/src/jit/IonBuilder.cpp
@@ +7641,5 @@
>          MOZ_TRY(getElemTryArguments(&emitted, obj, index));
>          if (emitted)
>              return Ok();
>  
> +        trackOptimizationAttempt(TrackedStrategy::GetElem_ArgumentsInlinedConstant);

There are no changes to TrackedOptimizationInfo.h in this patch?

@@ +8365,5 @@
> +    obj->setImplicitlyUsedUnchecked();
> +
> +    MOZ_ASSERT(!info().argsObjAliasesFormals());
> +
> +    // Ensure index is an integer.

We should probably check index->type() == MIRType::Int32 or at least IsNumberType(index->type()).

@@ +8370,5 @@
> +    MInstruction* idInt32 = MToInt32::New(alloc(), index);
> +    current->add(idInt32);
> +    index = idInt32;
> +
> +    // Bailouts if we read more than the number of actual arguments.

s/Bailouts/Bailout/

Can you add a comment here saying bailing out is okay because reading out of bounds arguments will disable the lazy arguments optimization for this script? That should ensure we don't get stuck in a bailout loop.

(I was also thinking LoadElementFromState could return undefined in this case. That's not correct because of getters on the prototype, but we could fix that with TI.)

::: js/src/jit/Lowering.cpp
@@ +3245,5 @@
> +    temp1 = temp();
> +#endif
> +    LLoadElementFromStateV* lir = new(alloc()) LLoadElementFromStateV(temp(), temp1, tempDouble());
> +    if (!ins->array()->isArgumentState()) {
> +        abort(AbortReason::Alloc, "LIRGenerator::visitLoadElementFromState: Unsupported state object");

Hm I don't like having non-OOM aborts in the compiler backend, and AbortReason::Alloc seems wrong.

Can this actually happen? If yes can you add a comment explaining when, and why this is uncommon enough that aborting is okay? If no, can we MOZ_ASSERT?

@@ +3295,5 @@
> +            lir->setOperand(1 + BOX_PIECES * i + 1, LAllocation());
> +#endif
> +            break;
> +          default:
> +            abort(AbortReason::Alloc, "LIRGenerator::visitLoadElementFromState: Unsupported element type.");

Same here, I'd prefer a MOZ_CRASH. These aborts always show up on some random  benchmark or website out there.

::: js/src/jit/MIR.h
@@ +3957,5 @@
>      }
>  };
>  
> +// Hold the arguments of an inlined frame. At the moment this class is not
> +// recovered on bailout, but as it does not have an implementation and it should

Nit: remove the "but " here?

::: js/src/jit/shared/LIR-shared.h
@@ +5737,5 @@
>          return getOperand(0);
>      }
>  };
>  
> +// Load a value from a dense array's elements vector. Bail out if it's the hole value.

This comment should be updated.
(Reporter)

Updated

8 months ago
Flags: needinfo?(jdemooij)
(Reporter)

Updated

7 months ago
Attachment #8882279 - Flags: review?(jdemooij)
Created attachment 8884831 [details] [diff] [review]
IonMonkey: Add LoadElementFromSate to support argument[x] in inlined functions.

Delta:
 - Apply nits from comment 13.
 - Add the uncommitted files:
   - TrackedOptimizationInfo.h
   - jit-test/tests/ion/inlining/inline-getelem-args.js
 - Add test cases to check for underflow/overflow bailouts.
 - Add support for properly loading boolean from the stack:
   - see StackSlotAllocator::width(LDefinition::TypeFrom(elem->type()))
Attachment #8884831 - Flags: review?(jdemooij)
Attachment #8882279 - Attachment is obsolete: true
(Reporter)

Comment 15

7 months ago
Comment on attachment 8884831 [details] [diff] [review]
IonMonkey: Add LoadElementFromSate to support argument[x] in inlined functions.

Review of attachment 8884831 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks. Some gnarly CodeGen stuff but I can't think of a good way to simplify that.

It will be interesting to see if this affects Speedometer - hopefully there won't be regressions from increased compilation time :)

::: js/src/jit/CodeGenerator.cpp
@@ +11298,5 @@
> +            if (elem->type() == MIRType::Double) {
> +                masm.loadDouble(ToAddress(a), tempD);
> +                input = TypedOrValueRegister(elem->type(), AnyRegister(tempD));
> +            } else if (elem->type() == MIRType::Value) {
> +                typeReg = temp0;

Nit: can remove this line since we also set it before the if-else.

@@ +11304,5 @@
> +#ifdef JS_PUNBOX64
> +                input = TypedOrValueRegister(ValueOperand(temp0));
> +#endif
> +            } else {
> +                typeReg = temp0;

Same.

::: js/src/jit/IonBuilder.cpp
@@ +8357,5 @@
> +        return Ok();
> +
> +    // Currently, we do not support any arguments vector larger than 10, as this
> +    // is being translated into code at the call site, and it would be better to
> +    // the arguments aligned on the stack.

s/better to/better to store/?
Attachment #8884831 - Flags: review?(jdemooij) → review+

Comment 16

7 months ago
Pushed by npierron@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/a02a6cb9c94c
Move moveValue into the generic MacroAssembler. r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/60809bf55571
Add MacroAssembler::branchToComputedAddress. r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/b6be7c2be50e
IonMonkey: Add LoadElementFromSate to support argument[x] in inlined functions. r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/a249596785aa
Fix ion/bug1284485.js unreported OOM. r=bbouvier
Backed out for build bustage: undeclared 'StackSlotAllocator' at js/src/jit/CodeGenerator.cpp:11199:

https://hg.mozilla.org/integration/mozilla-inbound/rev/273e8ce1c9e38f4ae3527508f9512adfe2d0bec4
https://hg.mozilla.org/integration/mozilla-inbound/rev/5913aca4915039cc0e05e7933922db23c30e4366
https://hg.mozilla.org/integration/mozilla-inbound/rev/9d84a411ea6da7680c5086f29ecdfe20d5c4a688
https://hg.mozilla.org/integration/mozilla-inbound/rev/a51672e13a1b018a41a6e9f6c156605861863293

Push with bustage: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=a249596785aa58973113cf830a55a54d0a5257de&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=runnable
Build log: https://treeherder.mozilla.org/logviewer.html#?job_id=114912641&repo=mozilla-inbound

[task 2017-07-17T17:11:01.326683Z] 17:11:01     INFO -  In file included from /home/worker/workspace/build/src/obj-firefox/js/src/Unified_cpp_js_src11.cpp:38:0:
[task 2017-07-17T17:11:01.326804Z] 17:11:01     INFO -  /home/worker/workspace/build/src/js/src/jit/CodeGenerator.cpp: In member function 'virtual void js::jit::CodeGenerator::visitLoadElementFromStateV(js::jit::LLoadElementFromStateV*)':
[task 2017-07-17T17:11:01.326870Z] 17:11:01     INFO -  /home/worker/workspace/build/src/js/src/jit/CodeGenerator.cpp:11199:32: error: 'StackSlotAllocator' has not been declared
[task 2017-07-17T17:11:01.326925Z] 17:11:01     INFO -                   size_t width = StackSlotAllocator::width(LDefinition::TypeFrom(elem->type()));
[task 2017-07-17T17:11:01.326958Z] 17:11:01     INFO -                                  ^
[task 2017-07-17T17:11:01.327011Z] 17:11:01     INFO -  /home/worker/workspace/build/src/config/rules.mk:1051: recipe for target 'Unified_cpp_js_src11.o' failed
[task 2017-07-17T17:11:01.327048Z] 17:11:01     INFO -  gmake[5]: *** [Unified_cpp_js_src11.o] Error 1
Flags: needinfo?(nicolas.b.pierron)
(In reply to Sebastian Hengst [:aryx][:archaeopteryx] (needinfo on intermittent or backout) from comment #17)
> Backed out for build bustage: undeclared 'StackSlotAllocator' at
> js/src/jit/CodeGenerator.cpp:11199:

Fixed by adding a missing include in CodeGenerator.cpp:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=8f98b23446f2eb7dd240fe4fe952f0e43e7df73e
Flags: needinfo?(nicolas.b.pierron)

Comment 19

7 months ago
Pushed by npierron@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/0f61bc51de11
Move moveValue into the generic MacroAssembler. r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/8be467b057f6
Add MacroAssembler::branchToComputedAddress. r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/07e91ed49793
IonMonkey: Add LoadElementFromSate to support argument[x] in inlined functions. r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/09937ba00629
Fix ion/bug1284485.js unreported OOM. r=bbouvier

Comment 20

7 months ago
Pushed by npierron@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/2817d497d4b1
Fix test case basic/bug858097.js, to checking in Compacting GC cases. r=jonco
(Reporter)

Comment 21

7 months ago
This regressed the assorted misc-bugs-658844-sha256-bitcoins test by 68% (22 ms -> 38 ms) and the misc-basic-arguments test from 15 to 24 ms. Would you mind taking a look? :)
Flags: needinfo?(nicolas.b.pierron)
(In reply to Jan de Mooij [:jandem] from comment #21)
> This regressed the assorted misc-bugs-658844-sha256-bitcoins test by 68% (22
> ms -> 38 ms) and the misc-basic-arguments test from 15 to 24 ms. Would you
> mind taking a look? :)

Testing with misc-basic-arguments benchmark, and changing the number of actual arguments:

      before               after
10 7.0341796875		10.79296875
9 7.213134765625	9.68408203125
8 7.39208984375		10.72802734375
7 7.550048828125	10.2041015625
6 7.85595703125		10.31298828125
5 8.593017578125	10.824951171875
4 8.97998046875		12.198974609375
3 9.863037109375	11.991943359375
2 11.97412109375	14.195068359375
1 18.015869140625	14.73193359375

Based on these results, it sounds like we should handle the case where there is only one argument to the function separately, and probably look for allocating a vector on the stack otherwise.  I will look if there is an easy way to handle that.
Manually unrolling the loop which contains arguments[x], and making sure that these reads are not LICM-ed out of the loop highlights that we can optimize arguments[x] better by only unrolling the loops which are containing these expressions.

     before                after
10 7.76513671875	7.136962890625
9  6.701171875		7.051025390625
8  7.461181640625	6.993896484375
7  7.212158203125	6.819091796875
6  7.501220703125	6.922119140625
5  8.198974609375	7.032958984375
4  9.0849609375		6.64404296875
3  12.39794921875	4.43798828125
2  15.126953125		3.48095703125
1  27.291015625		1.794921875

If the body of  a loop is too large to be unrolled, then the cost of the arguments[x] probably does not matter as much compared to the cost of the body of the loop.

In these results the cost of the arguments[x] expression (not yet folded into a single argument read, but still implemented as a switch case) seems to confirm that the issue seen in comment 23 & comment 21 comes from the fact that the generated code does not play nicely with the branch predictor (within a loop).

Unrolling the loop containing arguments[x] removes the miss-prediction issue and indirect branch is known by the branch predictor to always use the same case statement, leading to the results above.  If we were to fold the MLoadElementFromState given a known index, into the proper argument read, then we could potentially get even more performance out of it.

I suggest to focus on enabling loop unrolling as a solution for these regressions.
Depends on: 1383828
Flags: needinfo?(nicolas.b.pierron)

Updated

6 months ago
Depends on: 1388005
You need to log in before you can comment on or make changes to this bug.