Closed Bug 721031 Opened 10 years ago Closed 10 years ago

IonMonkey: Make native calls faster


(Core :: JavaScript Engine, defect)

Not set





(Reporter: jandem, Assigned: sstangl)


(Blocks 1 open bug)



(1 file, 3 obsolete files)

Ion+TI runs Kraken imaging-darkroom slower than JM+TI, because our native calls are slower. The benchmark calls Math.log/Math.pow 3844801 times; we can measure the overhead of this with the following micro-benchmark:
function f() {
    for (var i=0; i<3844801; i++) {
var t = new Date;
print(new Date - t);

JM+TI:  87 ms (65% in math_log)
IM+TI: 210 ms (23% in math_log)

We spend a lot of time in js::Invoke, ContextStack::onTop, ContextStack::pushInvokeArgs etc. This seems to be responsible for most of our slowdown vs JM+TI.

FWIW, V8 inlines/specializes Math.log/Math.pow - we can do something similar, but this bug is about native calls in general, not specifically Math.log/Math.pow.
Assignee: general → sstangl
Attached patch WIP patch. (obsolete) — Splinter Review
This is basically done. It needs factoring out some common patterns, but otherwise works. The test case runs at the same speed with '--ion -n' as with '-m -n'.

It can't land yet until OSI changes are made -- since we're not going through callVM(), there is no call to a generateReturnWrapper() that can have a return address passed. We could fake-call the next instruction, but that's a bit silly. OSI is changing soon anyway, so blocking on that.

Passes all tests except basic/arrayConcat.js, which requires native invalidation.
Depends on: 722238
This is about a 4% speedup on SS.
Attached patch Patch v1. (obsolete) — Splinter Review
Handles native calls on all platforms in an efficient way, creating a fake exit frames. Passes arrayConcat.js (invalidation). ARM code is written and compiles and should work, but can't be tested because ARM segfaults for unrelated reasons in generate().

On x86, jandem's test with an iteration counter of 10,000,000 gives '-m -n' a score of 262; '--ion -n' gets 258, both results consistent.
Attachment #593221 - Attachment is obsolete: true
Attachment #594029 - Flags: review?(dvander)
Attached patch Patch v2 (obsolete) — Splinter Review
Actual patch.
Attachment #594029 - Attachment is obsolete: true
Attachment #594029 - Flags: review?(dvander)
Attachment #594034 - Flags: review?(dvander)
Comment on attachment 594029 [details] [diff] [review]
Patch v1.

Review of attachment 594029 [details] [diff] [review]:

Looks good but there's a bunch of little things that should probably be addressed first:

::: js/src/ion/CodeGenerator.cpp
@@ +454,5 @@
> +    const Register argVpReg = ToRegister(argVp);
> +
> +    // Misc. temporary registers.
> +    const LAllocation *temp = call->getTempReg();
> +    const Register tempReg = ToRegister(temp);

I'd just do ToRegister(call->getTempReg())

@@ +458,5 @@
> +    const Register tempReg = ToRegister(temp);
> +
> +    DebugOnly<uint32> initialStack = masm.framePushed();
> +
> +#ifdef JS_CPU_ARM

Does this need to be #ifdef ARM? Might be nicer to make it an empty call on x86/x64.

@@ +469,5 @@
> +    // are the function arguments.
> +
> +    // Allocate space for the outparam, moving the StackPointer to &vp[0].
> +    unusedStack -= sizeof(Value);
> +    masm.adjustStack(unusedStack);

This must push a Value containing the callee object. Natives are allowed to access their callee before setting rval.

::: js/src/ion/IonBuilder.cpp
@@ +130,5 @@
>      return state;
>  }
> +JSFunction *
> +IonBuilder::getSingleCallTarget(uint32 argc, jsbytecode *pc, bool inlining)

Why does this take in an |argc|?

@@ +153,5 @@
>      }
>      if (calleeTypes->getObjectCount() > 1) {
> +        if (inlining)
> +            IonSpew(IonSpew_Inlining, "Cannot inline due to multiple objects");

Using the bool parameter for this seems unwieldy. I'd just switch these to IonSpew_MIR, and change "Cannot inline" to "Cannot determine single call target". Or just remove the spew, it's not associated with any pc or line number so it doesn't seem useful.

@@ +2392,4 @@
>      // Bytecode order: Function, This, Arg0, Arg1, ..., ArgN, Call.
> +    for (int i = argc; i >= 0; i--)
> +        call->addArg(i, current->pop()->toPassArg());
> +    call->initFunction(current->pop());

If constructing, |thisv| must be replaced with a Magic value with why=JS_IS_CONSTRUCTING. You can do this either in MIR or in the CodeGenerator (by just writing to the |thisv| slot like JM, see MonoIC.cpp generateNative).

::: js/src/ion/Lowering.cpp
@@ +177,5 @@
> +        JSObject &obj = funcdef->toConstant()->value().toObject();
> +        if (!obj.isFunction())
> +            break;
> +
> +        target = obj.toFunction();

It might look cleaner to extract this code out as a separate function, like CheckKnownCallee(MDefinition *) or something.

@@ +186,5 @@
> +
> +    // Monomorphic native calls lower to LCallNative.
> +    if (target->isNative()) {
> +        LCallNative *lcall = new LCallNative(target, argslot,
> +#if defined(JS_CPU_X64)

We should try to avoid #ifdefs like this, especially in really generic files like this one. If this optimization makes a big difference (probably unlikely), it's worth making CPU-specific lowering functions.

::: js/src/ion/arm/MacroAssembler-arm.cpp
@@ +1047,5 @@
> +uint32
> +MacroAssemblerARMCompat::buildFakeExitFrame(const Register &scratch)
> +{
> +    DebugOnly<uint32> initialDepth = framePushed();

Same comments as on x86.

::: js/src/ion/shared/MacroAssembler-x86-shared.h
@@ +200,5 @@
>      }
> +
> +    // Builds an exit frame on the stack, with a return address to an internal
> +    // non-function. Returns offset to be passed to markSafepointAt().
> +    uint32 buildFakeExitFrame(const Register &scratch) {

I think this function can be a lot simpler:

    Label label;

If it's followed immediately by a markSafepoint(), you don't even need to return anything.

::: js/src/ion/x64/MacroAssembler-x64.h
@@ +605,4 @@
>              unboxNonDouble(Operand(address), dest.gpr());
>      }
> +    void loadInstructionPointerAfterCall(const Register &scratch) {


::: js/src/ion/x86/MacroAssembler-x86.h
@@ +549,4 @@
>          shrl(imm, dest);
>      }
> +    void loadInstructionPointerAfterCall(const Register &scratch) {

Attachment #594029 - Attachment is obsolete: false
Note on the JS_IS_CONSTRUCTING bit: that is only for when calling natives.
(In reply to David Anderson [:dvander] from comment #5)
> I think this function can be a lot simpler:
>     Label label;
>     Push(Imm32(descriptor));
>     call(&label);
>     bind(&label);
> If it's followed immediately by a markSafepoint(), you don't even need to
> return anything.

The branch predictor on Intel chips wants each call to be matched with a ret. I can measure both ways, but there are strongly-worded warnings in Intel documentation to absolutely never call non-functions.
Would it work to just have a bogus ret as well then? (leaveFakeExitFrame?)
Not reasonably. The ret would go to the code after the call, which would have to be a jump to after leaveFakeExitFrame().

But the code after the call has to be from markSafepoint(), and we probably don't want that code to be re-executed. Since we can't have both a jump and markSafepoint() code immediately following the call, the jump has to be after markSafepoint(), so we wind up not only using space in the branch predictor, but also executing more instructions.

Furthermore, that jump then has to be bound in visitCallNative(), but not in the case of ARM, so visitCallNative() becomes platform-specific, at least with #ifdefs.

It would be breaking up buildFakeExitFrame() into two halves, and requiring the caller to stitch them together. Keeping the frame building code close together seems simpler.
Random idea: do we even need a call instruction? We could just push the next offset, and make sure in C++ to call the right overload to getSafepointIndex/getOsiIndex. For debugging we will need some way to differentiate C++ exit frames from JSNative exit frames, that could be a bit in the frame descriptor, telling us how to interpret the return address.
That's a good idea.

I was talking with Marty last night, and it's likely the (untested) ARM implementation of markSnapshot() is wrong -- because of the moving pools, currentOffset() can never be called, since the code can be shifted to make way for a pool. We could always patch it in later, though.

If we fix that, keeping a unique offset around would be sufficient for the native case.
Attached patch Patch v3.Splinter Review
Magic value placed in MIR; callee value pushed as part of codegen. X64-specific optimization removed -- it made about 4ms difference in a test suite with 1,000,000 iterations.

The logic to create the exit frame was kept, since it works on x86/x64, is no more broken than ARM, and lets this patch not touch OSI. Will file a follow-up bug for that work once this lands.
Attachment #594029 - Attachment is obsolete: true
Attachment #594034 - Attachment is obsolete: true
Attachment #594853 - Flags: review?(dvander)
Comment on attachment 594853 [details] [diff] [review]
Patch v3.

Review of attachment 594853 [details] [diff] [review]:

::: js/src/ion/Lowering.cpp
@@ +191,5 @@
> +                tempFixed(CallTempReg0), tempFixed(CallTempReg1), tempFixed(CallTempReg2),
> +                tempFixed(CallTempReg3));
> +        if (!defineReturn(lcall, call))
> +            return false;
> +        ins = (LInstruction *)lcall;

Are these casts needed?
Attachment #594853 - Flags: review?(dvander) → review+
Closed: 10 years ago
Resolution: --- → FIXED

Small fix to get building on OS X.

Backed out: although the check-* tests for v8 pass individually, this patch prevented the harness from running to completion. Will debug that and re-land.
Resolution: FIXED → ---

The bug involved argument padding for native calls -- since the Native functions take an argc parameter, padding with |undefined| and lying about argc is incorrect.
Closed: 10 years ago10 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.