Closed Bug 898963 Opened 11 years ago Closed 10 years ago

Odinmonkey (ARM): check that d14, the NANReg, is being restored if necessary in FFI calls.

Categories

(Core :: JavaScript Engine, defect)

ARM
All
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla31

People

(Reporter: dougc, Assigned: dougc)

References

Details

Attachments

(2 files, 12 obsolete files)

4.56 KB, patch
luke
: review+
Details | Diff | Splinter Review
27.76 KB, patch
dougc
: review+
Details | Diff | Splinter Review
With the fast path from Odin to Ion code added in bug 860838 it might be the case that floating point register d15 is not being restored after returning to Odin.

This might interact with bug 893553 if it changes the calling conventions for Ion and Odin code.

It should be possible to test for such an issue, and tests need to be developed.
Summary: Odinmonkey (ARM): check that d15, the NANReg, is being restored is necessary in FFI calls. → Odinmonkey (ARM): check that d15, the NANReg, is being restored if necessary in FFI calls.
Sorry, mistyped a bug number above.  This might interact with bug 893552.
Blocks: 896808
Note that bug 937944 changed the NANReg from d15 to d14.
Summary: Odinmonkey (ARM): check that d15, the NANReg, is being restored if necessary in FFI calls. → Odinmonkey (ARM): check that d14, the NANReg, is being restored if necessary in FFI calls.
Assignee: general → dtc-moz
Attachment #782412 - Attachment is obsolete: true
Optimize the stack usage further:

* Allocate only one stack frame for GenerateFFIIonExit and GenerateOOLConvert, saving a little stack manipulation.

* Keep the stack aligned, allowing the ABI calls to enable and disable the activation to avoid dynamic stack alignment.  Extended setupAlignedABICall to handle this usage.  This also avoided another push/pop pair.

Also inline setActive() which is called from EnableActivation and DisableActivation.  These are hot functions and are down to a half dozen instructions now, ripe to be inlined.  Is it worth taking time to try and inline these now or are there other plans for the activations?

Tested on jit-tests and some of the big asm.js demos.  Shall do some more testing.
Attachment #8391973 - Attachment is obsolete: true
Attachment #8395359 - Flags: feedback?(luke)
* Fix non-unified builds.

* Avoid a couple more instructions on the x64 and ARM.

* Timing results for a micro-benchmark (run time with patch / run time without patch):

x64: 70%
x86: 65%
ARM: 84%

* Here's a perf profiling result (x64) for the microbenchmark.  The result looks a little odd but suggests that Enable/DisableActivationFromAsmJS are still hot.

 44.04%  js  js                  [.] DisableActivationFromAsmJS(js::AsmJSActivation*)
 36.99%  js  perf-5107.map       [.] 0x00000000f77170f7
 17.73%  js  js                  [.] EnableActivationFromAsmJS(js::AsmJSActivation*)


* Here's Enable/DisableActivationFromAsmJS (ARM) that are not inlined, but it looks possible.  I suggest landing the current improvement first.

EnableActivationFromAsmJS:
   0x00073b6c <+0>:	ldr	r3, [r0, #0]
   0x00073b6e <+2>:	movs	r1, #1
   0x00073b70 <+4>:	ldr	r2, [r3, #0]
   0x00073b72 <+6>:	ldr	r2, [r2, #104]	; 0x68
   0x00073b74 <+8>:	strb.w	r1, [r2, #33]	; 0x21
   0x00073b78 <+12>:	ldr	r1, [r3, #0]
   0x00073b7a <+14>:	ldr	r1, [r1, #92]	; 0x5c
   0x00073b7c <+16>:	str	r1, [r2, #24]
   0x00073b7e <+18>:	ldr	r1, [r3, #0]
   0x00073b80 <+20>:	ldr	r1, [r1, #96]	; 0x60
   0x00073b82 <+22>:	str	r1, [r2, #28]
   0x00073b84 <+24>:	ldr	r2, [r3, #0]
   0x00073b86 <+26>:	str	r3, [r2, #96]	; 0x60
   0x00073b88 <+28>:	bx	lr

DisableActivationFromAsmJS:
   0x00073b50 <+0>:	ldr	r3, [r0, #0]
   0x00073b52 <+2>:	movs	r1, #0
   0x00073b54 <+4>:	ldr	r2, [r3, #0]
   0x00073b56 <+6>:	ldr	r2, [r2, #104]	; 0x68
   0x00073b58 <+8>:	strb.w	r1, [r2, #33]	; 0x21
   0x00073b5c <+12>:	ldr	r0, [r2, #24]
   0x00073b5e <+14>:	ldr	r1, [r3, #0]
   0x00073b60 <+16>:	ldr	r2, [r2, #28]
   0x00073b62 <+18>:	str	r0, [r1, #92]	; 0x5c
   0x00073b64 <+20>:	ldr	r3, [r3, #0]
   0x00073b66 <+22>:	str	r2, [r3, #96]	; 0x60
   0x00073b68 <+24>:	bx	lr
Attachment #8395359 - Attachment is obsolete: true
Attachment #8395359 - Flags: feedback?(luke)
Nice!  Is the patch ready for me to review?

On a side note: there is a bunch of duplication in GenerateFFIInterpreterExit that it looks like could be factored out to be more like GenerateFFIIonExit ... if by any chance you were interested :)
This patch just cleans up duplication in GenerateFFIInterpreterExit, as requested in comment 9, and should otherwise be neutral.
Attachment #8396358 - Flags: review?(luke)
Comment on attachment 8396358 [details] [diff] [review]
2. Refactor GenerateFFIInterpreterExit.

\m/
Attachment #8396358 - Flags: review?(luke) → review+
Comment on attachment 8395561 [details] [diff] [review]
1. Optimize asm.js FFI calls to Ion functions trimming the registers saved on the stack and trim stack maniplation.

Testing look good.

Note GenerateOOLConvert no longer allocates its own stack frame, but rather re-uses the stack frame already set up.  There is a frame check in GenerateOOLConvert but it's a bit rough.
Attachment #8395561 - Attachment description: Optimize asm.js FFI calls to Ion functions trimming the registers saved on the stack and trim stack maniplation. → 1. Optimize asm.js FFI calls to Ion functions trimming the registers saved on the stack and trim stack maniplation.
Attachment #8395561 - Flags: review?(luke)
Comment on attachment 8395561 [details] [diff] [review]
1. Optimize asm.js FFI calls to Ion functions trimming the registers saved on the stack and trim stack maniplation.

Review of attachment 8395561 [details] [diff] [review]:
-----------------------------------------------------------------

This is looking great; very excited to have these improvements.  A few opening comments/questions:

::: js/src/jit/AsmJS.cpp
@@ +6526,4 @@
>      unsigned argBytes = 3 * sizeof(size_t) + (1 + exit.sig().args().length()) * sizeof(Value);
>      unsigned extraBytes = 0;
>  #if defined(JS_CODEGEN_ARM)
>      extraBytes += sizeof(size_t);

I was looking at this code earlier and I was curious about extraBytes.  It'd be nice to comment what this is and why we can't just masm.Push something so that sizeof(size_t) is already in masm.framePushed().

@@ +6529,5 @@
>      extraBytes += sizeof(size_t);
>  #endif
> +
> +    unsigned alreadyPushed = AlignmentAtPrologue + masm.framePushed();
> +    unsigned stackDec = AlignBytes(alreadyPushed + argBytes + extraBytes, StackAlignment) - alreadyPushed;

I'd like to use StackDecrementForCall as much as possible.  Is the reason you unfolded the definition here to avoid the emptyVector?  I guess StackArgBytes(argTypes) also adds in the ShadowStackSpace; is that the issue?

@@ +6540,4 @@
>  
>      // 2. Callee
>      Register callee = ABIArgGenerator::NonArgReturnVolatileReg0;
> +    Register scratch = ABIArgGenerator::NonVolatileReg;

If a non-volatile register is used as scratch, won't that clobber a register of the caller?

@@ +6543,5 @@
> +    Register scratch = ABIArgGenerator::NonVolatileReg;
> +#if defined(JS_CODEGEN_X64) || defined(JS_CODEGEN_ARM)
> +    Register arg0 = IntArgReg0;
> +#else
> +    Register arg0 = callee;

This is rather confusing at this point in the code.  (It's not clear what arg0 means or why it's not IntArgReg0 on ARM.)  It'd be nice to push this down closer to the use so it made more sense or have some comments to explain why the ARM variance.

@@ +6598,5 @@
> +    AssertStackAlignment(masm);
> +    LoadAsmJSActivationIntoRegister(masm, arg0);
> +    masm.setupAlignedABICall(1, AlignmentAtPrologue);
> +    masm.passABIArg(arg0);
> +    masm.callWithABI(AsmJSImm_EnableActivationFromAsmJS);

In addition to making this path way faster, it seems like it'd be a nice simplification to remove these calls by inlining EnableActivation/DisableActivation (see bug 886411).

::: js/src/jit/arm/Assembler-arm.h
@@ +79,5 @@
>      ABIArg &current() { return current_; }
>      uint32_t stackBytesConsumedSoFar() const { return stackOffset_; }
>      static const Register NonArgReturnVolatileReg0;
>      static const Register NonArgReturnVolatileReg1;
> +    static const Register NonVolatileReg;

I don't see this defined on non-ARM.
(In reply to Luke Wagner [:luke] from comment #13)
> Comment on attachment 8395561 [details] [diff] [review]
> 1. Optimize asm.js FFI calls to Ion functions trimming the registers saved
> on the stack and trim stack maniplation.
> 
> Review of attachment 8395561 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> This is looking great; very excited to have these improvements.  A few
> opening comments/questions:

Thank you for the quick feedback.   I'll make another pass over this to try and address the feedback, but here's a few quick answers.

> 
> ::: js/src/jit/AsmJS.cpp
> @@ +6526,4 @@
> >      unsigned argBytes = 3 * sizeof(size_t) + (1 + exit.sig().args().length()) * sizeof(Value);
> >      unsigned extraBytes = 0;
> >  #if defined(JS_CODEGEN_ARM)
> >      extraBytes += sizeof(size_t);
> 
> I was looking at this code earlier and I was curious about extraBytes.  It'd
> be nice to comment what this is and why we can't just masm.Push something so
> that sizeof(size_t) is already in masm.framePushed().

Shall do.  The extraBytes here for the ARM is a word on the stack into which the return PC will be stored when making the call into Ion. The ARM Ion frames reserve a slot for the return PC and the caller stores it there when making the call. The x86 and x64 have a different convention as the return PC is pushed as part of the call instruction.    ARM Odin has a different convention again.

> @@ +6529,5 @@
> >      extraBytes += sizeof(size_t);
> >  #endif
> > +
> > +    unsigned alreadyPushed = AlignmentAtPrologue + masm.framePushed();
> > +    unsigned stackDec = AlignBytes(alreadyPushed + argBytes + extraBytes, StackAlignment) - alreadyPushed;
> 
> I'd like to use StackDecrementForCall as much as possible.  Is the reason
> you unfolded the definition here to avoid the emptyVector?  I guess
> StackArgBytes(argTypes) also adds in the ShadowStackSpace; is that the issue?

While it is possible to thread this through StackDecrementForCall, it makes the code less readable for me because this is a call into Ion, not into Odin, and the conventions are different - I find myself mentally unrolling this just to understand what the net effect is.

The ShadowStackSpace appears to be for Windows x64.  Forgot about this and will need to check.  It's not relevant for the ARM paths.

> @@ +6540,4 @@
> >  
> >      // 2. Callee
> >      Register callee = ABIArgGenerator::NonArgReturnVolatileReg0;
> > +    Register scratch = ABIArgGenerator::NonVolatileReg;
> 
> If a non-volatile register is used as scratch, won't that clobber a register
> of the caller?

The caller is Odin code and it does not yet take advantage of non-volatile registers across calls and does not expect them to be preserved. This code will be rewritten when the calls are inlined anyway.

> @@ +6543,5 @@
> > +    Register scratch = ABIArgGenerator::NonVolatileReg;
> > +#if defined(JS_CODEGEN_X64) || defined(JS_CODEGEN_ARM)
> > +    Register arg0 = IntArgReg0;
> > +#else
> > +    Register arg0 = callee;
> 
> This is rather confusing at this point in the code.  (It's not clear what
> arg0 means or why it's not IntArgReg0 on ARM.)  It'd be nice to push this
> down closer to the use so it made more sense or have some comments to
> explain why the ARM variance.

Shall look into moving this and documenting it.  The issue is that the x64 and ARM can pass the argument in a register so it saves a move to load the value directly into this register.
 
> @@ +6598,5 @@
> > +    AssertStackAlignment(masm);
> > +    LoadAsmJSActivationIntoRegister(masm, arg0);
> > +    masm.setupAlignedABICall(1, AlignmentAtPrologue);
> > +    masm.passABIArg(arg0);
> > +    masm.callWithABI(AsmJSImm_EnableActivationFromAsmJS);
> 
> In addition to making this path way faster, it seems like it'd be a nice
> simplification to remove these calls by inlining
> EnableActivation/DisableActivation (see bug 886411).

This does look possible, and I'll look into it.

> ::: js/src/jit/arm/Assembler-arm.h
> @@ +79,5 @@
> >      ABIArg &current() { return current_; }
> >      uint32_t stackBytesConsumedSoFar() const { return stackOffset_; }
> >      static const Register NonArgReturnVolatileReg0;
> >      static const Register NonArgReturnVolatileReg1;
> > +    static const Register NonVolatileReg;
> 
> I don't see this defined on non-ARM.

See Assembler-x64.cpp, and Assembler-x86.cpp
(In reply to Douglas Crosher [:dougc] from comment #14)
Thanks for all the explanations!

> While it is possible to thread this through StackDecrementForCall, it makes
> the code less readable for me because this is a call into Ion, not into
> Odin, and the conventions are different - I find myself mentally unrolling
> this just to understand what the net effect is.

Yeah, I guess you're right.

> > I don't see this defined on non-ARM.
> 
> See Assembler-x64.cpp, and Assembler-x86.cpp

Ah, those changes seem not to have made it into the posted patch (or at least into bugzilla's diff viewer :).
Inline EnableActivationFromAsmJS() and DisableActivationFromAsmJS().

* Timing results for a micro-benchmark show further improvement (run time with patch / run time without patch):

x64: 61%
x86: 31%
ARM: 67%

The patch passes the asm.js jit-tests on the x86, x64 and ARM.

Needs some more testing, and cleaning up, and source comments.
Attachment #8398460 - Attachment is obsolete: true
Clean up.  Retested ARM and Linux x86 and x64.  TODO check Windows 64 shadow stack usage and test on the Windows builds.
Attachment #8401833 - Attachment is obsolete: true
Comment on attachment 8402702 [details] [diff] [review]
1. Optimize asm.js FFI calls to Ion functions, trim regs saved, inline activation and deactivation.

Review of attachment 8402702 [details] [diff] [review]:
-----------------------------------------------------------------

That is awesome!

Can you add a comment to the JitActivation::active() function that the logic is inlined in odinmonkey. So that any change to that function should get reflected in AsmJS.cpp?
Thank you pointing out the shadow stack, it turned out to be quite an omission in prior patches.  Jit-tests succeed on all platforms now, except win64 builds which are timing out but win64 works locally, so I'll request review.

https://tbpl.mozilla.org/?tree=Try&rev=a935b239286c
https://tbpl.mozilla.org/?tree=Try&rev=01a551423d18

The set of registers allocated is rather complex, and it might be hard to read.  There are source comments, but feedback on this would be particularly welcomed.
Attachment #8402702 - Attachment is obsolete: true
Attachment #8405329 - Flags: review?(luke)
Comment on attachment 8405329 [details] [diff] [review]
1. Optimize asm.js FFI calls to Ion functions, trim regs saved, inline activation and deactivation.

Review of attachment 8405329 [details] [diff] [review]:
-----------------------------------------------------------------

This is looking great!  Thanks for all the work.  I'd like one more quick pass over this patch before r+.

::: js/src/jit/AsmJS.cpp
@@ +6545,2 @@
>      unsigned argBytes = 3 * sizeof(size_t) + (1 + exit.sig().args().length()) * sizeof(Value);
> +    // The extraBytes accounts for the stack difference when the call into Ion is made.

\n before this comment please

@@ +6550,1 @@
>      unsigned extraBytes = 0;

Could we name this 'bytesBeforeArgs'?  Also, I didn't quite understand this comment until reading ma_callIonNoPush and seeing what was going on.  Perhaps the comment could read more like:

"On ARM, we call with ma_callIonNoPush which, following the Ion calling convention, stores the return address into *sp. This means we need to include an extra word of space before the arguments in the stack allocation. (On x86/x64, the call instruction does the push itself and the ABI just requires us to be aligned before the call instruction.)"

@@ +6550,5 @@
>      unsigned extraBytes = 0;
>  #if defined(JS_CODEGEN_ARM)
>      extraBytes += sizeof(size_t);
>  #endif
> +    //

Can you convert this // into an empty \n line

@@ +6556,5 @@
> +    // The Ion frame does not use the system ABI for passing arguments and does not need
> +    // to allocate the shadow stack used on win64.
> +    unsigned alreadyPushed = AlignmentAtPrologue + masm.framePushed();
> +    unsigned stackDec = AlignBytes(alreadyPushed + argBytes + extraBytes, StackAlignment)
> +        - alreadyPushed;

Could you split StackDecrementForCall into:
  static unsigned
  StackDecrementForCall(MacroAssembler &masm, unsigned argBytes, unsigned extraBytes)
  {
      // the code you have above
  }
and then change the existing StackDecrementForCall to just
  return StackDecrementForCall(masm, StackArgBytes(argTypes), extraBytes);
?

@@ +6567,5 @@
> +    callArgTypes.infallibleAppend(typeArray, ArrayLength(typeArray));
> +    unsigned OOLStackDec = StackDecrementForCall(masm, callArgTypes, sizeof(Value));
> +
> +    // Allocate a frame large enough for both of the above calls.
> +    stackDec = Max(stackDec, OOLStackDec);

Rather than mutating stackDec, could you name the first definition 'stackDecForIonCall' (and perhaps rename OOLStackDec to 'stackDecForOOLCall' to keep with the locals-have-lowercase-names convention)?

@@ +6596,5 @@
>      // 2.2. Get callee
>      masm.loadPtr(Address(callee, offsetof(AsmJSModule::ExitDatum, fun)), callee);
>  
>      // 2.3. Save callee
> +    masm.storePtr(callee, Address(StackPointer, extraBytes + sizeof(size_t)));

Pre-existing, but since you are tweaking already: these manual stack offset computations seem a bit error prone.  What about instead having one variable declared up above:
  size_t argOffset = byteBeforeArgs;
and then each storePtr to Address(StackPointer, argOffset) is followed by a
  argOffset += sizeof(size_t) (or sizeof(Value))
then we can remove the 'offsetToArgs' computation below, and pass argOffset to FillArgumentsArray directly.  Finally, we can have:
  argOffset += exit.sig().args().length() * sizeof(Value);
  JS_ASSERT(argOffset == extraBytes + argBytes);

@@ +6624,5 @@
>      masm.branchIfFunctionHasNoScript(callee, &ionFailed);
>  #endif
>  
>      masm.loadPtr(Address(callee, JSFunction::offsetOfNativeOrScript()), scratch);
>      masm.loadBaselineOrIonNoArgCheck(scratch, scratch, SequentialExecution, maybeDebugBreakpoint);

IIUC, callee is dead after this loadPtr, so you could use 'callee' here and below instead of 'scratch'.  Name-wise, that'd be nice since 'callee' would be the actual callee.

@@ +6633,5 @@
> +    //
> +    // This sequence requires four registers, and needs to preserve the 'scratch'
> +    // register, so there are five live registers.  The 'scratch' register is currently
> +    // allocated to: ARM r6; x64 r13; x86 ebx.  The four registers chosen for this
> +    // sequence are: ARM r0, r1, r2, r3; x64 rax, rdi, rbx, rsi; x86 edi, eax, ecx, edx.

One idea for how to make this constraint more explicit is to add an AsmJSFFIIonExitRegCallee so that way, in all the Assembler-* files it is easy to see that all the named registers are different without needing any of these comments.  Instead, we'd just have:
  JS_ASSERT(callee == AsmJSFFIIonExitRegCallee);

@@ +6642,5 @@
> +
> +    LoadAsmJSActivationIntoRegister(masm, reg0);
> +
> +    // Load the JSContext from the AsmJSActivation
> +    // JSContext *cx = activation->cx();

For blocks of assembly like this, mostly I just want to scan over the code and see what how registers are used and the interleaved comments makes it a bit harder to follow.  Also, the masm function names mostly are clear enough w/o comments.  Could you instead have one comment up front that perhaps is just a copy of the C++ code that is being effectively inlined below?

@@ +6650,5 @@
> +    masm.loadPtr(Address(reg3, JSContext::offsetOfRuntime()), reg0);
> +    // Load the Activation from the JSContext mainThread().
> +    // Activation *act = cx->mainThread().activation();
> +    size_t offsetOfActivation = offsetof(JSRuntime, mainThread) +
> +        PerThreadData::offsetOfActivation();

Could you indent 'PerThreadData' with the 'offsetof' above (here and below x2)?

@@ +6678,1 @@
>      masm.push(scratch);

Instead of doing this, what if we just left the stack unaligned (and moved the AssertStackAlignment below into the #else branch)?  Then instead we could have:

  // ma_callIonNoPush will store the return address into *sp and, according to the Ion ABI,
  // the Ion callee will pop this value when returning. Thus, on return from Ion, the stack
  // is unaligned and incremented by sizeof(void*).
  masm.ma_callIonNoPush(scratch);
  masm.adjustFrame(-sizeof(void*));
  stackDec -= sizeof(void*);

which would shave off one insn from Ion calls on ARM.

@@ +6688,5 @@
> +    // This sequence needs three registers, and must preserve the JSReturnReg_Data (ARM
> +    // r2, x64 rcx, x86 ecx), and JSReturnReg_Type (ARM r3, x64 rcx, x86 edx).  The three
> +    // registers chosen are: ARM r0, r1, r4; x64 rax, rdi, rbx; x86 edi, eax, esi; The
> +    // definitions of reg0 and reg1 defined above are reused.
> +    Register reg4 = AsmJSFFIIonExitReg4;

To capture this constraints in the code, how about we have a separate set of registers (one for enable activation, one for disable) named AsmJSActivateIonReg* and AsmJSDeactiveIonReg*, resp.

Then we can have:
  JS_ASSERT(AsmJSDeactivateIonRegReturnData == JSReturnReg_Data);
  JS_ASSERT(AsmJSDeactivateIonRegReturnType == JSReturnReg_Type);
  Register reg0 = AsmJSDeactivateIonReg0;

Also, could you wrap both the activate/deactivate blocks in { } which will avoid the declaration conflict for 'reg0' but also visually set off these blocks.

@@ +6750,2 @@
>      masm.ret();
> +#endif

Can you JS_ASSERT(masm.framePushed() == 0); ?

::: js/src/jit/arm/Assembler-arm.h
@@ +97,5 @@
>  static MOZ_CONSTEXPR_VAR FloatRegister ScratchFloatReg = { FloatRegisters::d15 };
>  
>  static MOZ_CONSTEXPR_VAR FloatRegister NANReg = { FloatRegisters::d14 };
>  
> +// Registers used in GenerateFFIIonExit.

Can you add "None of these may be secondScratchRegister (lr)."
Attachment #8405329 - Flags: review?(luke)
Blocks: 882399
Thank you for the great feedback.  This has improved readability a lot.  All suggestions have been applied, except the removal of the stack push on return from the Ion call on the ARM (to keep the stack aligned) although it has been replaced by a subtraction instruction.

It does appear possible to remove this stack adjustment instruction, but it would need to be added back to the OOL paths because the stack needs to be aligned when making the OOL path ABI calls.  This could add a good deal of complexity for little gain.  Do you want this done?

A try run looks ok.  Win64 debug failed to build, but it looks like an infrastructure issue:
https://tbpl.mozilla.org/?tree=Try&rev=cfa316f52428
Attachment #8405329 - Attachment is obsolete: true
Attachment #8405825 - Flags: review?(luke)
(In reply to Douglas Crosher [:dougc] from comment #23)
> Created attachment 8405825 [details] [diff] [review]
> 1. Optimize asm.js FFI calls to Ion functions, trim regs saved, inline
> activation and deactivation.
> 
> Thank you for the great feedback.  This has improved readability a lot.  

I agree. Awesome work.

Could it be you overlooked comment 20? Would it still be possible to add a comment at
http://dxr.mozilla.org/mozilla-central/source/js/src/vm/Stack.cpp#1520
That this is inlined in AsmJS.cpp?
Address comment 20 which was overlook, sorry.  Adds a comment to setActive() to note that changes need to be reflected in GenerateFFIIonExit().
Attachment #8405825 - Attachment is obsolete: true
Attachment #8405825 - Flags: review?(luke)
Attachment #8405850 - Flags: review?(luke)
Comment on attachment 8405850 [details] [diff] [review]
1. Optimize asm.js FFI calls to Ion functions, trim regs saved, inline activation and deactivation.

Review of attachment 8405850 [details] [diff] [review]:
-----------------------------------------------------------------

Beautiful, thanks!

::: js/src/jit/AsmJS.cpp
@@ +6034,5 @@
>      return iter.stackBytesConsumedSoFar();
>  }
>  
> +static unsigned
> +StackDecrementForCall(MacroAssembler &masm, unsigned argBytes, unsigned extraBytes = 0)

Nit on my own suggestion: since there is no real distinction between argBytes vs. extraBytes in this function, having them separate raises the unnecessary question of what bytes should go where.  Can this function instead take a single "bytesToPush" argument (such that callers handle the addition)?

@@ +6545,5 @@
> +    //
> +    // Arguments to the Ion function are in the following order on the stack:
> +    // descriptor | callee | argc | this | arg1 | arg2 | ...
> +    //
> +    // Reserve and align space for the arguments.

This last line of comment doesn't seem to belong anymore, can you remove it and the preceding \n?

@@ +6651,5 @@
> +        LoadAsmJSActivationIntoRegister(masm, reg0);
> +
> +        // The following is inlined:
> +        //   JSContext *cx = activation->cx();
> +        //   Activation *act = cx->mainThread().activation();

Can you add: act->active_ = true;

::: js/src/jit/x86/Assembler-x86.h
@@ +91,5 @@
> +static MOZ_CONSTEXPR_VAR Register AsmJSFFIIonExitRegCallee = ecx;
> +static MOZ_CONSTEXPR_VAR Register AsmJSFFIIonExitRegE0 = edi;
> +static MOZ_CONSTEXPR_VAR Register AsmJSFFIIonExitRegE1 = eax;
> +static MOZ_CONSTEXPR_VAR Register AsmJSFFIIonExitRegE2 = ebx;
> +static MOZ_CONSTEXPR_VAR Register AsmJSFFIIonExitRegE3 = edx;

Nit on my own previously-suggested name: JSFFII boy that's a lot of capital letters together.  I think AsmJSIonExitReg* would be just as clear and less ugly.

::: js/src/vm/Stack.h
@@ +1340,5 @@
> +    }
> +    static size_t offsetOfPrevJitJSContext() {
> +        return offsetof(JitActivation, prevJitJSContext_);
> +    }
> +    static size_t offsetOfActive() {

Could you rename this to offsetOfActiveUint8() and JS_STATIC_ASSERT(sizeof(active_) == 8) in the body?
Attachment #8405850 - Flags: review?(luke) → review+
Hannes: could setActive be changed to only contain "active_ = active" and none of the rest?  That is, it seems like the rest could be done in the JitActivation ctor/dtor since they aren't going to change while inside asm.js code.  Is there some use of JitActivation that depends on these being null?  It seems like such uses should instead be guarded by isActive().
Address reviewer feedback. Carrying forward r+.
Attachment #8405850 - Attachment is obsolete: true
Attachment #8406888 - Flags: review+
Keywords: checkin-needed
(In reply to Luke Wagner [:luke] from comment #27)
> Hannes: could setActive be changed to only contain "active_ = active" and
> none of the rest?  That is, it seems like the rest could be done in the
> JitActivation ctor/dtor since they aren't going to change while inside
> asm.js code.  Is there some use of JitActivation that depends on these being
> null?  It seems like such uses should instead be guarded by isActive().

1) Theoretical: yes. My initial version did this. (Onlye active_ = active and cx()->mainThread().ionTop = asJit()->prevIonTop(); when active = false;)

2) Practical: no. See bug 886255. So doing this created crashes which are really very deep and hard to debug. Eventually I adjusted setActive to be more like the ctor/dtor of JitActivation, which fixed the issue. One brave man might try to remove them again. But be aware it is an error prone and should be done carefully to not introduce crashes. (I assume that some of the fields get overwritten or something, or get changed during execution)
(In reply to Hannes Verschore [:h4writer] from comment #29)
Hmm, creepy.  I'm a bit confused how bug 886255 hit this, though since the URL doesn't appear to run any asm.js.  Perhaps there was a different/related bug that got fixed as part of the overall work?
(In reply to Luke Wagner [:luke] from comment #30)
> (In reply to Hannes Verschore [:h4writer] from comment #29)
> Hmm, creepy.  I'm a bit confused how bug 886255 hit this, though since the
> URL doesn't appear to run any asm.js.  Perhaps there was a different/related
> bug that got fixed as part of the overall work?

I think I fixed one additional thing. If I am still able to reproduce. I can possible see if the additional thing fixed it. If that is the case. I'm willing to try to remove the extra code again...
Flags: needinfo?(hv1989)
Ah, that seems likely.  In particular, setActive() is only called for asm.js FFI Ion calls so it seems like any change in behavior in setActive() couldn't possibly affect the site in bug 886255.
Fix build failure on Windows, sorry.  The compiler does not accept sizeof(slot), so use sizeof(bool).  Also the failing test should have been using a static assert.  Carrying forward r+.

Successful try run on Windows:
https://tbpl.mozilla.org/?tree=Try&rev=99a5029f34a8
Attachment #8406888 - Attachment is obsolete: true
Attachment #8407532 - Flags: review+
Whiteboard: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/28d75105f7c8
https://hg.mozilla.org/mozilla-central/rev/d1b66744e023
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla31
(In reply to Luke Wagner [:luke] from comment #32)
> Ah, that seems likely.  In particular, setActive() is only called for asm.js
> FFI Ion calls so it seems like any change in behavior in setActive()
> couldn't possibly affect the site in bug 886255.

Okay, I was able to reproduce the crash. (Thanks github for history of the website). And I can officially say the bug in bug 886255 was not fixed by the setActive changes. I tried to bisect the real issue, but I only have a range, since firefox didn't want to build in those revisions.
So the good news is that there is no reason anymore against making setActive leaner. As in only retain the active=true/false part.
Flags: needinfo?(hv1989)
Thanks for the detective work Hannes!
You need to log in before you can comment on or make changes to this bug.