Last Comment Bug 670484 - IonMonkey: Implement function calls
: IonMonkey: Implement function calls
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: unspecified
: x86 Windows 7
: -- normal (vote)
: ---
Assigned To: Sean Stangl [:sstangl]
:
Mentors:
Depends on: 685370 676389 680286 682454 684563 685097 685099 692211 692291 693724 694169
Blocks: IonMonkey
  Show dependency treegraph
 
Reported: 2011-07-09 19:16 PDT by David Anderson [:dvander]
Modified: 2011-10-12 15:30 PDT (History)
14 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Implement JSOP_NOTEARG. (2.82 KB, patch)
2011-08-24 14:24 PDT, Sean Stangl [:sstangl]
dvander: review+
Details | Diff | Splinter Review
Implement DynamicArityList. (1.41 KB, patch)
2011-08-24 14:28 PDT, Sean Stangl [:sstangl]
dvander: review+
Details | Diff | Splinter Review
Implement Call lowering and generation. (33.53 KB, patch)
2011-08-24 14:50 PDT, Sean Stangl [:sstangl]
dvander: feedback+
Details | Diff | Splinter Review
LSRA support for calls. (3.97 KB, patch)
2011-08-24 14:53 PDT, Sean Stangl [:sstangl]
no flags Details | Diff | Splinter Review
GreedyAllocator support for calls. (3.03 KB, patch)
2011-08-24 14:54 PDT, Sean Stangl [:sstangl]
no flags Details | Diff | Splinter Review
Implement JSOP_NOTEARG [v2] (3.48 KB, patch)
2011-09-06 14:19 PDT, Sean Stangl [:sstangl]
dvander: review+
Details | Diff | Splinter Review
Implement FixedList [v2] (1.53 KB, patch)
2011-09-06 14:20 PDT, Sean Stangl [:sstangl]
dvander: review+
Details | Diff | Splinter Review
Implement Call lowering and generation [v2] (41.69 KB, patch)
2011-09-06 14:28 PDT, Sean Stangl [:sstangl]
no flags Details | Diff | Splinter Review
GreedyAllocator support for calls. (3.02 KB, patch)
2011-09-06 14:33 PDT, Sean Stangl [:sstangl]
dvander: review+
Details | Diff | Splinter Review
LSRA support for calls [v2] (3.84 KB, patch)
2011-09-06 14:37 PDT, Sean Stangl [:sstangl]
adrake: review+
Details | Diff | Splinter Review
Implement Call lowering and generation [v3] (41.37 KB, patch)
2011-09-06 15:34 PDT, Sean Stangl [:sstangl]
no flags Details | Diff | Splinter Review
Implement Call lowering and generation [v4] (44.92 KB, patch)
2011-09-07 16:02 PDT, Sean Stangl [:sstangl]
dvander: review+
Details | Diff | Splinter Review
Function tests, where functions are passed via argument. (1.95 KB, application/javascript)
2011-09-07 16:29 PDT, Sean Stangl [:sstangl]
no flags Details

Description David Anderson [:dvander] 2011-07-09 19:16:39 PDT
To implement non-inlined function calls, we need some sort of calling convention. In JS, you can omit or add extra arguments to a function call, which makes it hard to make sure both calling functions and accessing arguments are fast.

In JM, a callee accesses its arguments by looking above the current frame, peeking into the previous frame's stack. The i-th argument is at position [frame - formalArgs + i]. If argc != formalArgs, the frame has to be padded or shrunk, and this process is expensive.

In IonMonkey we should be able to pretty easily fix this by reversing the order in which arguments are pushed, so the i-th argument is at [frame - i]. Then we would never need to resize the callee's frame. But naively doing this at the CALL instruction is suboptimal. For example, consider this:

   call(a.b, c(), "d");

 0: getprop
 1: call
 2: string("d")
 3: push(2)
 4: push(1)
 5: push(0)
 6: call

If we need to spill in [0, 2] then the pushes become memory-to-memory. Instead, we want something like:

 0: getprop
 1: push(0)
 2: call
 3: push(2)
 4: string("d")
 5: push(4)
 6: call

But this isn't in the right order: "push" would have to mean "write to the correct argument slot on the stack". That seems plausible, but now we'd need a way to associate stack pushes as argument pushes. One idea: emit a JSOP_PUSHARG or something, and if necessary prefix calls by a JSOP_PREPARECALL. We want that association no matter what order we push arguments.

Note: it might be that FixupArity can be made really cheap, if we can get it down to a few instructions generated as a special trampoline. 

Aside from pushing arguments:
 * Caller must communicate to the register allocator that it interferes with
   volatile registers.
 * Callee must save and restore the frame pointer (ebp).
 * Callee must save some nugget of compiler information in a canonical place on
   the frame, so that C++ can traverse the stack if necessary. (can be a
   follow-up bug)
 * Callee must guard against stack overflow. (can be a follow-up bug)
 * Callee must allocate a closure environment if needed. (can be a follow-up bug)
Comment 1 Luke Wagner [:luke] 2011-07-09 19:32:12 PDT
I'm a big fan of reverse-ordered arguments and dropping both FixupArity and this whole "canonicalActualArgs" business.
Comment 2 David Mandelin [:dmandelin] 2011-07-11 10:08:22 PDT
(In reply to comment #1)
> I'm a big fan of reverse-ordered arguments and dropping both FixupArity and
> this whole "canonicalActualArgs" business.

Thirded.

(In reply to comment #0)
> For example, consider this:
> 
>    call(a.b, c(), "d");
> 
>  0: getprop
>  1: call
>  2: string("d")
>  3: push(2)
>  4: push(1)
>  5: push(0)
>  6: call
> 
> If we need to spill in [0, 2] then the pushes become memory-to-memory.
> Instead, we want something like:
> 
>  0: getprop
>  1: push(0)
>  2: call
>  3: push(2)
>  4: string("d")
>  5: push(4)
>  6: call
> 
> But this isn't in the right order: "push" would have to mean "write to the
> correct argument slot on the stack". 

Yes.

> That seems plausible, but now we'd need
> a way to associate stack pushes as argument pushes. One idea: emit a
> JSOP_PUSHARG or something, and if necessary prefix calls by a
> JSOP_PREPARECALL. We want that association no matter what order we push
> arguments.

AIUI, the conventional way to express this (at a HIR/MIR level) is something like:

  0: getprop
  1: call
  2: string("d")
  3: call(0, 1, 2)

The call instruction is responsible for putting the arguments in the right place. As LIR, 3 would expand to something like:

  L32: move argR0, L2
  L31: move argR1, L1
  L30: move argR2, L0   // L0 is whatever LIR thing computes the value of MIR 0
  L33: call

I've written it so that |argR0| means the first argument slot in stack order, thus the last argument in the argument list. The idea is that |argR0| is something that represents the address of that argument-area location.

Of course, it's always nice to have a way to make L0 compute its result directly into argR2 if it can.

> Note: it might be that FixupArity can be made really cheap, if we can get it
> down to a few instructions generated as a special trampoline. 

What is FixupArity in this version, anyway? It seems that excess arguments are no problem at all. And can insufficient arguments be fixed up by the IC in the caller?

> Aside from pushing arguments:
>  * Caller must communicate to the register allocator that it interferes with
>    volatile registers.

The liveness pass can compute length-0 live intervals for all the volatile registers at every call instruction, or something like that.

>  * Callee must save and restore the frame pointer (ebp).
>  * Callee must save some nugget of compiler information in a canonical place
> on
>    the frame, so that C++ can traverse the stack if necessary. (can be a
>    follow-up bug)

Do you mean a pointer to the previous stack frame, or to a map of the new stack frame.

>  * Callee must guard against stack overflow. (can be a follow-up bug)

Can we use memory protection of this?

>  * Callee must allocate a closure environment if needed. (can be a follow-up
> bug)

Yep.
Comment 3 David Anderson [:dvander] 2011-07-11 10:30:19 PDT
> The call instruction is responsible for putting the arguments in the right
> place. As LIR, 3 would expand to something like:
> 
>   L32: move argR0, L2
>   L31: move argR1, L1
>   L30: move argR2, L0   // L0 is whatever LIR thing computes the value of
> MIR 0
>   L33: call
>
> Of course, it's always nice to have a way to make L0 compute its result
> directly into argR2 if it can.

Is there an easy way to do this optimization if the argument movement happens at the call, rather than at the arguments? (I'm not even sure if it matters - if it doesn't, the design you mention is indeed nicer.)

> What is FixupArity in this version, anyway? It seems that excess arguments
> are no problem at all. And can insufficient arguments be fixed up by the IC
> in the caller?

This just meant, "If we don't want to do the reversal thing, FixupArity could probably be made really cheap." But it's a bunch of extra complexity and everyone seems to want reversing.

> >  * Callee must save and restore the frame pointer (ebp).
> >  * Callee must save some nugget of compiler information in a canonical place
> > on
> >    the frame, so that C++ can traverse the stack if necessary. (can be a
> >    follow-up bug)
> 
> Do you mean a pointer to the previous stack frame, or to a map of the new
> stack frame.

A map of the new stack frame - I'm not entirely sure of what needs to be in it yet, but definitely something with the frame size so the stack can be traversed.

> Can we use memory protection of this?

That would be awesome, we should investigate. We would have to jump in before breakpad and detect that JIT is running and that it's trying to go over the stack limit.
Comment 4 David Anderson [:dvander] 2011-07-11 10:32:39 PDT
Follow-up: tentatively the code generator is computing frame stack offsets relative to esp, not ebp, so in theory we could allocate ebp and only save/restore it in the JIT trampoline. I guess whether we do this depends on how valuable having an extra register is on x86, and whether we care what it does to breakpad.
Comment 6 Andrew Drake [:adrake] 2011-08-18 18:33:37 PDT
(In reply to Andrew Drake [:adrake] from comment #5)

Disregard this comment, it was targeted at bug 676389.
Comment 7 Andrew Drake [:adrake] 2011-08-19 13:26:07 PDT
With the "bogus" interval support in LSRA from bug 676389, it should be fairly easy to handle the necessary spills for calls. There are a couple of options:

1) We hang a "clobbered regs" field on instructions, and LSRA just generates bogus intervals in the output half of instructions for those registers on all instructions during interval building.

2) LSRA checks "isCall" during its interval building phase and generates bogus intervals in the output half of that instruction for all registers in CallClobberedRegs (or something like that).

The second one imposes no additional memory requirements, but is quite a bit grosser. They are about equally easy to implement.
Comment 8 Sean Stangl [:sstangl] 2011-08-24 14:24:39 PDT
Created attachment 555542 [details] [diff] [review]
Implement JSOP_NOTEARG.

Translation from JS bytecode to the SSA-form IR maintains a Value stack while processing the bytecode, as in the interpreter. A naive implementation of support for JSOP_CALL would then list the function operands as arguments to some MCall, push them when the MCall is reached, and execute. But we really want the lifetime of each argument to be short, so that it's stored to the right place in the argument vector before the call is reached, when the interpreter has the value being pushed -- then it can't spill, and we get more free registers.

In order to do this, we need to know which pushes are done for the sake of satisfying a future function call. This patch adds a no-op JSOP_NOTEARG after each push that corresponds to a function argument. The IonBuilder can use this opcode to wrap arguments in an MPassArg that terminates the lifetime of the Value before the callsite.
Comment 9 Sean Stangl [:sstangl] 2011-08-24 14:28:03 PDT
Created attachment 555545 [details] [diff] [review]
Implement DynamicArityList.

MCalls have a static argc, but the value is unknown until runtime, so a FixedArityList, which is templated on the length, cannot be used. This patch implements a DynamicArityList, which is of a fixed length, but that length is determined at time of construction. The interface is otherwise identical to that of the FixedArityList.
Comment 10 Sean Stangl [:sstangl] 2011-08-24 14:50:50 PDT
Created attachment 555551 [details] [diff] [review]
Implement Call lowering and generation.

This patch modifies the IR to support calls and implements code generation. It is mostly as finished as we can get it at this point (lacking C++ calls), and snapshot support is missing, so I am just requesting feedback instead of review. I nevertheless expect that snapshot support will introduce only minor changes.

The changes to the MIR introduce MStartCall, MPassArg, and MCall.
The changes to the LIR introduce LStackArg and LCall.

MStartCall is inserted upon reaching a JSOP_CALLARG, which occurs before any arguments are pushed for a call. The MStartCall eventually becomes the first operand to its corresponding MCall, but bears no data otherwise. It exists to solve the problem of argument placement during lowering. Consider the following:

> f(1, g(2), 3);

Since JS evaluates from left-to-right, but IonMonkey uses the C-style right-to-left argument pushing convention (hooray!), the stack upon invocation of g() looks like:

> [   ] <- space for 3, eventually -- hasn't been evaluated yet.
> [   ] <- space for g()'s return, eventually.
> [ 1 ] <- evaluated before g().
> [   ] <- space for f()'s |this|, eventually.
> [ 2 ]
> [ t ] <- g()'s |this|

Note that arguments are not simply pushed; they are written into an argument vector, and there can be more than one active vector in the case of nesting. In other words, when we lower an MPassArg to an LStackArg, the LStackArg must be informed of its position in a frame-wide argument vector, and its position is not necessarily derived solely from argc.

We assign slots into the argument vector by keeping track of its length during lowering -- when we reach an MStartCall, we search through its use chain to find its MCall, and ask the MCall for its argc value; the argument vector is increased to the given length. When we reach an MCall, the argument vector is decreased by that length. We keep track of the maximum length of this vector, and it eventually nestles into a space just below the local slot storage for the frame.

Finally, when we go to make a call, the stack pointer is moved up to the position of the first argument on the stack. So the junk space inserted to fill up the given FrameSizeClass does not contribute to any nasty memory usage.
Comment 11 Sean Stangl [:sstangl] 2011-08-24 14:53:38 PDT
Created attachment 555553 [details] [diff] [review]
LSRA support for calls.

We need to dump and reload all register state around a call. This patch does so by inserting bogus intervals into the output end of call, expressing a requirement to hold all registers, forcing them to be spilled.

There is currently no need for a way to express "registers used in a given calling convention", but it would be simple to add when we reach that point.
Comment 12 Sean Stangl [:sstangl] 2011-08-24 14:54:15 PDT
Created attachment 555556 [details] [diff] [review]
GreedyAllocator support for calls.

Similar
Comment 13 Sean Stangl [:sstangl] 2011-08-24 14:55:50 PDT
(In reply to Sean Stangl from comment #12)
> Similar

Never press Return to complete the r? field, it seems!
In any case, that patch is theoretically isometric to the LSRA patch in Comment 10.
Comment 14 David Anderson [:dvander] 2011-08-24 16:26:06 PDT
Comment on attachment 555545 [details] [diff] [review]
Implement DynamicArityList.

Review of attachment 555545 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/FixedArityList.h
@@ +78,5 @@
>  };
>  
> +// List of a fixed length, but the length is unknown until runtime.
> +template <typename T>
> +class DynamicArityList

"Dynamic arity" feels a little misleading - maybe FixedList or ExactList? Or even better if it's possible to still use FixedArityList but somehow just omit the template parameter, but I don't know C++ well enough to say if that's possible.

@@ +81,5 @@
> +template <typename T>
> +class DynamicArityList
> +{
> +    size_t length_;
> +    T *list_;

Luke suggests having hidden copy and assignment constructors.
Comment 15 David Anderson [:dvander] 2011-08-24 16:31:09 PDT
Comment on attachment 555556 [details] [diff] [review]
GreedyAllocator support for calls.

Review of attachment 555556 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/GreedyAllocator.cpp
@@ +570,5 @@
> +    for (size_t i = 0; i < Registers::Total; i++) {
> +        Register r = Register::FromCode(i);
> +        if (r.allocatable() && !maybeEvict(AnyRegister(r)))
> +            return false;
> +    }

Looks good. For C++ calls we'll want to preserve non-volatile regs so we may have to tweak this somehow later.

@@ +633,5 @@
>      assertValidRegisterState();
>  
> +    // Step 1. If necessary, save registers around a call.
> +    if (ins->isCall() && !spillForCall(ins))
> +        return false;

This should come after allocating inputs and before informing the snapshot. Otherwise, you will undo the evictions by immediately allocating new registers.
Comment 16 David Anderson [:dvander] 2011-08-25 17:15:48 PDT
Comment on attachment 555551 [details] [diff] [review]
Implement Call lowering and generation.

Review of attachment 555551 [details] [diff] [review]:
-----------------------------------------------------------------

Great start. Looks like it's pretty close.

::: js/src/ion/IonBuilder.cpp
@@ +127,5 @@
> +IonBuilder::pushStartCall()
> +{
> +    MInstruction *ins = new MStartCall;
> +    if (!ins)
> +        return false;

No need to NULL check here. (also, any opinion on SpiderMonkey parlance "Init" or "Prepare" instead of "Start"?)

@@ +438,5 @@
> +        if (!pushStartCall())
> +            return false;
> +        current->pushArg(GET_SLOTNO(pc));
> +        if (!pushConstant(UndefinedValue())) // Implicit |this|.
> +            return false;

Why is the MStartCall part of the stack? This will cause problems for bailouts while pushing arguments, since you'll reify a stack that's off-by-one.

@@ +1619,5 @@
> +{
> +    // JSOP_NOTEARG notes that the value in current->pop() has just
> +    // been pushed onto the stack for use in calling a function.
> +    MDefinition *def = current->pop();
> +    MPassArg *arg = MPassArg::New(def);

Nice, this seems like exactly what we want.

@@ +1631,5 @@
> +
> +bool
> +IonBuilder::jsop_call(uint32 argc)
> +{
> +    MCall *ins = MCall::New(argc + 1); // +1 for implicit this.

Same comment as in LIR - we'll have different kinds of calls, including one or two where the actual function is known ahead of time, so you might want to consider a more explicit name. (You also want to make sure that the call captures an actual MUse of the callee, so it doesn't get eliminated.)

::: js/src/ion/IonCode.h
@@ +68,1 @@
>      uint8 *code_;

To avoid exposing this you can have something like:

static size_t OffsetOfCode()
{
    return offsetof(IonCode, code_);
}

::: js/src/ion/LOpcodes.h
@@ +51,4 @@
>      _(Parameter)                    \
>      _(TableSwitch)                  \
>      _(Goto)                         \
> +    _(Call)                         \

Do we want to distinguish names based on what kind of call it is? Since we'll have at least three types, excluding ICs:
  * Monomorphic JS (Call?)
  * Monomorphic Native (CallNative?)
  * Polymorphic (CallAny?)
  * Polymorphic w/ IC (CallIC?)

::: js/src/ion/Lowering.cpp
@@ +160,5 @@
> +    ins->getDef(0)->setOutput(LGeneralReg(JSReturnReg_Type));
> +    ins->getDef(1)->setOutput(LGeneralReg(JSReturnReg_Data));
> +#elif defined(JS_PUNBOX64)
> +    ins->getDef(0)->setOutput(LGeneralReg(JSReturnReg));
> +#endif

Maybe there's a way we could ask the Lowering-$(ARCH) to do this?

::: js/src/ion/MIR.h
@@ +793,5 @@
> +{
> +    DynamicArityList<MDefinition *> operands_;
> +
> +  public:
> +    bool init(size_t length) {

Could this be hidden, and used by the instruction's ::New constructor?

@@ +1006,5 @@
> +// MCall. Boxes the input and stores it to the correct location on stack.
> +//
> +// Arguments are *not* simply pushed onto a call stack: they are evaluated
> +// left-to-right, but stored in the arg vector in C-style, right-to-left.
> +class MPassArg

MPushArg would better describe the interpreter semantics here, I think, even if someday during lowering we choose to pass through some other convention. Failing that, MArgument is good too.

::: js/src/ion/TypePolicy.cpp
@@ +314,5 @@
> +        return true;
> +
> +    MInstruction *unbox = MUnbox::New(func, MIRType_Object);
> +    if (!unbox)
> +        return false;

No need to NULL check when allocating instructions.

::: js/src/ion/shared/CodeGenerator-shared.h
@@ +97,5 @@
> +        return offset;
> +    }
> +
> +    // For argument construction for calls. Argslots are Value-sized.
> +    inline int32 ArgslotToStackOffset(int32 slot) const {

Is there a less ambiguous name we could use here? Like maybe StackOffsetOfPushedArg?

::: js/src/ion/x64/Assembler-x64.h
@@ +406,5 @@
>      }
> +    void cmpq(Imm32 lhs, const Register &rhs) {
> +        masm.cmpq_ir(lhs.value, rhs.code());
> +    }
> +    void cmpPtr(Imm32 lhs, const Register &rhs) {

cmpPtr should be in MacroAssembler-x64

@@ +416,5 @@
>      }
> +    void testq(const Register &lhs, const Register &rhs) {
> +        masm.testq_rr(lhs.code(), rhs.code());
> +    }
> +    void testPtr(const Register &lhs, const Register &rhs) {

Ditto

::: js/src/ion/x64/MacroAssembler-x64.h
@@ +176,5 @@
> +    void unboxObject(const ValueOperand &src, const Register &dest) {
> +        // TODO: Can we unbox more efficiently? Bug 680294.
> +        movq(JSVAL_PAYLOAD_MASK, ScratchReg);
> +        movq(src.value(), dest);
> +        andq(ScratchReg, dest);

Forewarning, we might collide a bit - TestVAndBranch introduces testObject().

::: js/src/ion/x86/Assembler-x86.h
@@ +262,5 @@
> +        return cmpl(lhs, rhs);
> +    }
> +    void testPtr(const Register &lhs, const Register &rhs) {
> +        return testl(lhs, rhs);
> +    }

These should be in MacroAssembler-x86

::: js/src/ion/x86/CodeGenerator-x86.cpp
@@ +188,5 @@
> +    uint32 argslot = arg->getArgslot();
> +    int32 stack_offset = ArgslotToStackOffset(argslot);
> +
> +    masm.mov(ToRegister(type), Operand(StackPointer, stack_offset + sizeof(void *)));
> +    masm.mov(ToRegister(data), Operand(StackPointer, stack_offset));

Off-topic: One thing that occurred to me here is that if you pass in a constant, or typed value, it'll go to registers first. We should consider adding a "useBoxOrConstant" (if it's not there already) and extending ValueOperand, but that's out of this bug's scope.
Comment 17 Nicolas B. Pierron [:nbp] 2011-08-29 02:00:37 PDT
Comment on attachment 555551 [details] [diff] [review]
Implement Call lowering and generation.

Review of attachment 555551 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/shared/CodeGenerator-x86-shared.cpp
@@ +565,5 @@
> +    uint32 size_descriptor = stack_size << 1;
> +
> +    // Construct the IonFramePrefix.
> +    masm.push(Imm32(0x0)); // TODO calleeToken
> +    masm.push(Imm32(size_descriptor));

Based on the content of IonFrameData, I think you need ImmWord here instead of Imm32.  May be the best would be to delegate such writing to an inline-able function inside IonFrames.h.
Comment 18 Sean Stangl [:sstangl] 2011-08-29 09:15:06 PDT
(In reply to Nicolas B. Pierron [:pierron] from comment #17)
> Based on the content of IonFrameData, I think you need ImmWord here instead
> of Imm32.  May be the best would be to delegate such writing to an
> inline-able function inside IonFrames.h.

A masm.push is always word-sized, e.g., masm.push(Imm32(0x0)) on x86_64 outputs "pushq $0x0". The forced 32-bit variant would be called "masm.pushl" in the current naming convention.

It's a good idea to use ImmWord for clarity reasons, though. Thanks.
Comment 19 Sean Stangl [:sstangl] 2011-09-06 14:19:36 PDT
Created attachment 558607 [details] [diff] [review]
Implement JSOP_NOTEARG [v2]

Differs from the original patch in that JSOP_NOTEARG also notes that |this| is an argument, which is important to handle cases like "1()", since there is no JSOP_CALLFOO to hint at |this| on stack.
Comment 20 David Anderson [:dvander] 2011-09-06 14:20:47 PDT
Comment on attachment 558607 [details] [diff] [review]
Implement JSOP_NOTEARG [v2]

Review of attachment 558607 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsemit.cpp
@@ +6790,5 @@
> +        if (!callop) {
> +                if (js_Emit1(cx, cg, JSOP_PUSH) < 0)
> +                        return JS_FALSE;
> +                if (js_Emit1(cx, cg, JSOP_NOTEARG) < 0)
> +                        return JS_FALSE;

Somehow some 8-space indents leaked in here
Comment 21 Sean Stangl [:sstangl] 2011-09-06 14:20:51 PDT
Created attachment 558609 [details] [diff] [review]
Implement FixedList [v2]

Same as the old DynamicArityList patch, except renamed, and with hidden copy/assignment constructors.
Comment 22 Sean Stangl [:sstangl] 2011-09-06 14:28:54 PDT
Created attachment 558612 [details] [diff] [review]
Implement Call lowering and generation [v2]

This patch has been reworked from the previous iteration; the majority of TODOs are resolved, and every case is handled, except for the case of passing too few arguments, which will be handled in a separate patch.

From discussion of the previous patch, we resolved:
 - There will be just one MCall
 - This MCall will lower to LCall{Generic|Specific}{|Native}. So this patch implements MCall and LCallGeneric.
 - MPassArg is probably the least confusing name, given that I tried MArgument (which conflicted with MParameter), then MPushArg (which was confusing, since the arguments aren't actually pushed), then finally MPassArg.

There are known failure cases after this patch is applied, but those failures are the result of bugs in ResumePoint logic, since function calls provide the ability to have a ResumePoint that is not at the head of the block. This will be addressed in a follow-up bug.

If you are using this code and expecting results, please note that only JSOP_CALLARG currently works. Bug 684402 will enable wider test coverage of functions.
Comment 23 Sean Stangl [:sstangl] 2011-09-06 14:33:32 PDT
Created attachment 558614 [details] [diff] [review]
GreedyAllocator support for calls.

We resolved that the first position is correct for spillForCall().

Note that this patch is somewhat distasteful, because it explicitly tests against isCallGeneric(). This is intentional but temporary, since not doing so involves changes to the LIR, but I do not want to make those changes before I understand what the eventual requirements will be.

One option is to provide flags on the LIR, and set isCall(), but dvander has expressed a dislike for LIR flags.

Another option is to give every LInstruction a mir_ private, and then since all calls are lowered from the same MCall, check LInstruction->mir()->isCall(). But it is unclear if we want spill logic around something that is not explicitly a call.

So I took the easy route, which should work for now.
Comment 24 Sean Stangl [:sstangl] 2011-09-06 14:37:44 PDT
Created attachment 558615 [details] [diff] [review]
LSRA support for calls [v2]

Same comment about isCallGeneric() as with the GreedyAllocator patch above.

This patch is known to produce redundant moves involving Temporaries, but is nevertheless correct, and we can address that in a follow-up bug.

Andrew Drake helped debug the previous errors related to this patch, resulting in http://hg.mozilla.org/projects/ionmonkey/rev/aa856d61ef85 , which fixed all known funcall/LSRA issues. Thanks!
Comment 25 Sean Stangl [:sstangl] 2011-09-06 14:43:45 PDT
(In reply to Sean Stangl from comment #22)
I forgot to commit the change renaming ArgslotToStackOffset() to StackOffsetOfPassedArg(). Kindly pretend that it is actually named as the latter.
Comment 26 Sean Stangl [:sstangl] 2011-09-06 15:34:17 PDT
Created attachment 558640 [details] [diff] [review]
Implement Call lowering and generation [v3]

Re-submitting this patch since a merge from m-c caused the patch to not apply cleanly, and it may be useful for some specific people to actually apply the patch.

For clarity, the changeset order is intended to be:

0001-patch-Implement-JSOP_NOTEARG.patch
0002-patch-Implement-FixedList.patch
0003-patch-Implement-Call-lowering-and-generation.patch
0004-patch-GreedyAllocator-support-for-calls.patch
0005-patch-LSRA-support-for-calls.patch
Comment 27 David Anderson [:dvander] 2011-09-06 17:55:16 PDT
Comment on attachment 558640 [details] [diff] [review]
Implement Call lowering and generation [v3]

Review of attachment 558640 [details] [diff] [review]:
-----------------------------------------------------------------

Okay, sorry for the amount of comments - they're mostly nits and renaming suggestions, though there are places where we need to check whether our abstractions are holding or whether they need to change, so for that I'd like to take another pass over the final patch.

::: js/src/ion/IonBuilder.cpp
@@ +1639,5 @@
> +    // been pushed onto the stack for use in calling a function.
> +    MDefinition *def = current->pop();
> +    MPassArg *arg = MPassArg::New(def);
> +    if (!arg)
> +        return false;

No need for NULL check

@@ +1649,5 @@
> +
> +bool
> +IonBuilder::jsop_call(uint32 argc)
> +{
> +    MCall *ins = MCall::New(argc + 1); // +1 for implicit this.

This, however, is fallible, because it calls init(), right?

@@ +1666,3 @@
>  
> +    current->add(ins);
> +    ins->setResultType(MIRType_Value);

Should the setResultType be in the MCall constructor?

::: js/src/ion/IonCode.h
@@ +114,4 @@
>          return code;
>      }
>  
> +    static size_t CodeOffset() {

OffsetOfCode

::: js/src/ion/LIR-Common.h
@@ +185,5 @@
> +class LCallGeneric : public LInstructionHelper<BOX_PIECES, 1, 1>
> +{
> +    // Slot below which %esp should be adjusted to make the call.
> +    // Functions without arguments have lastargslot_ == 0.
> +    uint32 argslot_;

What is lastargslot_ ?

@@ +197,5 @@
> +        setOperand(0, func);
> +        setTemp(0, temp);
> +    }
> +
> +    uint32 getArgslot() {

const

::: js/src/ion/Lowering.cpp
@@ +125,5 @@
> +
> +bool
> +LIRGenerator::visitPassArg(MPassArg *arg)
> +{
> +    MDefinition *opd = arg->getOperand(0);

Should this get an accessor?

@@ +138,5 @@
> +    // Pass through the virtual register of the operand.
> +    // This causes snapshots to correctly copy the operand on the stack.
> +    // 
> +    // TODO: This keeps the backing store around longer than strictly required.
> +    // We could do better by informing snapshots about the argument vector.

-TODO:

@@ +140,5 @@
> +    // 
> +    // TODO: This keeps the backing store around longer than strictly required.
> +    // We could do better by informing snapshots about the argument vector.
> +    arg->setVirtualRegister(opd->virtualRegister());
> +    arg->setResultType(opd->type());

Is setting the result type here necessary? It'll always be MIRType_Value, so if so you could setResultType in the MPassArg's constructor.

::: js/src/ion/Lowering.h
@@ +86,5 @@
>      bool precreatePhi(LBlock *block, MPhi *phi);
>      bool definePhis();
>  
> +    // Allocate argument slots for a future function call.
> +    void expandArgumentVector(uint32 argc);

allocateArguments?

@@ +92,5 @@
> +    // Slots are indexed from 1. argnum is indexed from 0.
> +    uint32 getArgumentSlot(uint32 argnum);
> +    uint32 getArgumentSlotForCall() { return argslots_; }
> +    // Free argument slots following a function call.
> +    void shrinkArgumentVector(uint32 argc);

freeArguments?

::: js/src/ion/MIR.cpp
@@ +359,5 @@
> +MCall *
> +MCall::New(size_t argc)
> +{
> +    MCall *ins = new MCall;
> +    if (!ins || !ins->init(argc + NumNonArgumentOperands))

Don't have to NULL check, we have infallible small allocations.

::: js/src/ion/MIR.h
@@ +807,5 @@
> +    { }
> +
> +    // Get the vector size for the upcoming call by looking at the call.
> +    // An MPrepareCall cannot know its argc at the point of construction, since
> +    // the MCall may have additional arguments added during analysis phases.

Is this comment still true? It looks like MCall knows its argc at construction and we should never add more arguments after that.

@@ +813,5 @@
> +};
> +
> +// An instruction with a variable number of operands.
> +// For example, different MCall instructions may have different argc.
> +// Performs dynamic memory allocation, 

Seems like this got cut off.

@@ +814,5 @@
> +
> +// An instruction with a variable number of operands.
> +// For example, different MCall instructions may have different argc.
> +// Performs dynamic memory allocation, 
> +class MVariableInstruction : public MInstruction

MVariableArity or Length? or MVariadicInstruction?

@@ +844,5 @@
> +  public:
> +    // An MCall uses the MPrepareCall, MDefinition for the function, and
> +    // MPassArg instructions. They are stored in the same list.
> +    static const size_t PrepareCallOperandIndex  = 0;
> +    static const size_t FunctionOperandIndex   = 1;

I'd advocate just having named accessors instead of these, it ends up being shorter and clearer. We could introduce macros like:

#define OPERAND_ACCESSOR(name, number)

To auto-generate get, replace.

@@ +849,5 @@
> +    static const size_t NumNonArgumentOperands = 2;
> +
> +  protected:
> +    MCall()
> +    { }

Needed?

@@ +862,5 @@
> +    }
> +    void initFunc(MDefinition *func) {
> +        JS_ASSERT(!func->isPassArg());
> +        return MNode::initOperand(FunctionOperandIndex, func);
> +    }

Just initOperand() should work

@@ +866,5 @@
> +    }
> +
> +    void addArg(size_t argnum, MPassArg *arg);
> +    
> +    MDefinition *getFunc() {

We can spare the "tion" :) though for super simple accessors (like argc too) that don't have inputs, it sometimes looks better to elide the "get".

::: js/src/ion/TypePolicy.cpp
@@ +294,5 @@
> +
> +void
> +CallPolicy::specializeInputs(MInstruction *ins, TypeAnalysis *analysis)
> +{
> +    // No specialization: input must be an object.

There should be a:
   analyzer->preferType(ins->getOperand(0), MIRType_Object)

Here, which will help the type analyzer optimistically unbox values for you.

@@ +301,5 @@
> +bool
> +CallPolicy::adjustInputs(MInstruction *ins)
> +{
> +    JS_ASSERT(ins->isCall());
> +    MCall *call = (MCall *)ins;

ins->toCall(), which will also assert for you.

@@ +310,5 @@
> +
> +    // If we already know that the function is impossible to call, just abort.
> +    // Most likely, such code doesn't exist on the web.
> +    if (func->type() != MIRType_Value)
> +        return false;

return false will propagate no error, acting as an OOM. The pattern you want here is like:

    if (in->type() != MIRType_Value)
        in = boxAt(def, in);

The subsequent unbox will fail if it's ever taken, so basically equivalent to having an MBailout instruction.

@@ +314,5 @@
> +        return false;
> +
> +    MInstruction *unbox = MUnbox::New(func, MIRType_Object);
> +    call->block()->insertBefore(call, unbox);
> +    call->replaceOperand(MCall::FunctionOperandIndex, unbox);

call->replaceFunction[Operand?](unbox) seems nicer

::: js/src/ion/TypePolicy.h
@@ +147,5 @@
> +class CallPolicy : public TypePolicy
> +{
> +  public:
> +    bool respecialize(MInstruction *def);
> +    void specializeInputs(MInstruction *ins, TypeAnalysis *analyzer);

Inherit from BoxInputsPolicy and then you can remove the respecialize() callback.

::: js/src/ion/shared/CodeGenerator-x86-shared.cpp
@@ +630,5 @@
>  bool
> +CodeGeneratorX86Shared::visitCallGeneric(LCallGeneric *call)
> +{
> +    // Holds the function object.
> +    LAllocation *obj = call->getOperand(0);

For this and getTemp etc, make accessors that return const pointers.

@@ +655,5 @@
> +    masm.cmpl(tokreg, Imm32(JSFUN_INTERPRETED));
> +    if (!bailoutIf(Assembler::Below, call->snapshot()))
> +        return false;
> +
> +    // TODO: Implement a maximum recursion depth.

This should have a bug on file, blocking this one, but I suspect it should be on the callee side, where the new frame size is known.

@@ +672,5 @@
> +
> +    masm.movePtr(Operand(objreg, offsetof(IonScript, method_)), objreg);
> +    masm.movePtr(Operand(objreg, IonCode::CodeOffset()), objreg);
> +
> +    // TODO: Handle the case when too few arguments are passed. (Bail.)

Make sure to actually bail here.

@@ +696,5 @@
> +    
> +    if (restore_diff > 0)
> +        masm.addPtr(Imm32(restore_diff), StackPointer);
> +    else if (restore_diff < 0)
> +        masm.subPtr(Imm32(-restore_diff), StackPointer);

Nice.

::: js/src/ion/shared/Lowering-shared-inl.h
@@ +124,5 @@
> +    defineBox(lir, mir, LDefinition::PRESET);
> +
> +#if defined(JS_NUNBOX32)
> +    lir->getDef(0)->setOutput(LGeneralReg(JSReturnReg_Type));
> +    lir->getDef(1)->setOutput(LGeneralReg(JSReturnReg_Data));

While we're here, 0 can be TYPE_INDEX and 1 -> DATA_INDEX.

::: js/src/ion/x64/Architecture-x64.h
@@ +117,5 @@
>          (1 << JSC::X86Registers::r11);      // This is ScratchReg.
>  
>      static const uint32 AllocatableMask = AllMask & ~NonAllocatableMask;
> +
> +    static const uint32 CallClobberMask =

JSCallClobberMask, since C++ calls may have a different convention

::: js/src/ion/x64/MacroAssembler-x64.h
@@ +125,4 @@
>      void movePtr(ImmWord imm, const Register &dest) {
>          movq(imm, dest);
>      }
> +    void movePtr(Operand op, const Register &dest) {

I'm on the fence about this because, on one hand, I'm really not liking how we currently have "mov" because it's totally ambiguous as opposed to "movq" and "movl". On the other hand, the macro assembler cannot expose Operand here, since Operand is platform-specific.

If the intent is to be used only in x86/64 shared code, let's following MacroAssembler-x86-shared's lead and organize the functions in this file into what's exposed across all platforms and what's not. Then this is totally fine and I'd advocate getting rid of mov() entirely.

::: js/src/ion/x86/Architecture-x86.h
@@ +105,5 @@
>          (1 << JSC::X86Registers::esp);
>  
>      static const uint32 AllocatableMask = AllMask & ~NonAllocatableMask;
> +
> +    static const uint32 CallClobberMask =

Same as the x64 comment

::: js/src/ion/x86/CodeGenerator-x86.cpp
@@ +183,5 @@
> +bool
> +CodeGeneratorX86::visitStackArg(LStackArg *arg)
> +{
> +    LAllocation *type = arg->getOperand(0);
> +    LAllocation *data = arg->getOperand(1);

TYPE_INDEX/DATA_INDEX, or even better use ToValueOperand(), and have MacroAssembler::storeValue(ValueOperand, address), so eventually ValueOperand can bake in constants or known types.

::: js/src/ion/x86/MacroAssembler-x86.h
@@ +95,5 @@
> +
> +    void cmpPtr(const Imm32 lhs, const Register &rhs) {
> +        return cmpl(rhs, lhs);
> +    }
> +    void cmpPtr(const ImmWord lhs, const Register &rhs) {

Non-nit: the inputs to cmpPtr should be reversed. They should be consistent everywhere so that   cmp(a, b) ; j<op> ; is (a <op> b).
Comment 28 Sean Stangl [:sstangl] 2011-09-07 16:02:57 PDT
Created attachment 558989 [details] [diff] [review]
Implement Call lowering and generation [v4]

Fourth version of the patch.

Note that CodeGeneratorX86::visitStackArg() is exactly the same as CodeGeneratorX64::visitStackArg() -- the duplication exists since CodeGeneratorX86::ToValue() and CodeGeneratorX64::ToValue() would be inaccessible from any CodeGeneratorX86Shared::visitStackArg(), due to "specific" inheriting from "general", until the last step of inheritance, when "most general" is typedef'd to "most specific".

We could solve this by factoring out some code into some "common" class that respects equal interfaces from specific codegenerators, but that would probably be invasive.
Comment 29 David Anderson [:dvander] 2011-09-07 16:14:20 PDT
Comment on attachment 558989 [details] [diff] [review]
Implement Call lowering and generation [v4]

Looks great, let's land it!
Comment 30 Sean Stangl [:sstangl] 2011-09-07 16:29:20 PDT
Created attachment 558994 [details]
Function tests, where functions are passed via argument.
Comment 31 Andrew Drake [:adrake] 2011-09-07 16:44:32 PDT
Comment on attachment 558615 [details] [diff] [review]
LSRA support for calls [v2]

Review of attachment 558615 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good. One benign issue: any allocated temporaries that are in the clobber mask will generate a single spurious spill move before the call. This doesn't impact correctness at all, but it does generate a useless instruction on the call path.

The reason for this is the way spill intervals work: they're one-length intervals at the output of the instruction, which guarantees that spills are emitted before the instruction and that they won't collide with function parameters. Unfortunately, temporaries /do/ collide with parameters, so they are both the input and output half of the instruction. After the spill intervals are allocated, the temporary interval will be allocated and will get split at the now-occupied output half, and a spurious move will be inserted.

An easy bandaid would be to special case temporary intervals -- make them not get split, or make it so that no moves will be inserted for splits they do happen. I'm not sure how possible it is to do better than this -- this behavior is a side-effect of having temporaries come into existence in the input half of the instruction (which normally would not make sense).
Comment 32 Sean Stangl [:sstangl] 2011-09-07 16:53:08 PDT
Filed Bug 685370 to address temporary intervals from Comment 31.

Note You need to log in before you can comment on or make changes to this bug.