Align the local stack storage for asm.js ARM

RESOLVED FIXED in mozilla32

Status

()

defect
RESOLVED FIXED
6 years ago
5 years ago

People

(Reporter: dougc, Assigned: bbouvier)

Tracking

Trunk
mozilla32
ARM
All
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(4 attachments, 3 obsolete attachments)

The asm.js ARM local stack is currently not aligned to a dual word boundary which could hurt performance of double float stores and loads to the local stack.  The frame setup and access is also rather convoluted with workarounds for the misalignment.

This patch makes the frame descriptor for the asm.js ARM frames two words - the return pc plus an unused fill word. It then references that local stack to this dual word aligned datum. Dual word aligning the stack makes the asm.js code close to the regular Ion stack conventions and simplifies the code.

The problem could be solved in other ways.  For example having the Ion stack allocator manage the return pc stack slot.  The current patch addresses the alignment issue now without affecting common code.

The unused stack slot could potentially be used to store a frame pointer which would be filled by the caller and point back to the callers frame descriptor, and this could enable backtracing thought ARM asm.js frames.  A discussion of the use of the system ABI for asm.js and the potential in using alternative conventions might be helpful.
Attachment #736276 - Flags: review?(mrosenberg)
Summary: Align the locals stack storage for asm.js ARM → Align the local stack storage for asm.js ARM
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment on attachment 736276 [details] [diff] [review]
Align the local stack storage for ARM asm.js frames.

Review of attachment 736276 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry to drive-by nit pre-existing conditions, but the code just poked me in the eye...

::: js/src/ion/AsmJS.cpp
@@ +4858,2 @@
>      if(getenv("GDB_BREAK")) {
>          masm.breakpoint(js::ion::Assembler::Always);

Can you remove the whole if (getenv) {...}?

@@ +4858,4 @@
>      if(getenv("GDB_BREAK")) {
>          masm.breakpoint(js::ion::Assembler::Always);
>      }
>      // Copy parameters out of argv into the registers/stack-slots specified by

Can you add a \n before the comment?
Revised patch addressing Luke's comments.
Attachment #736276 - Attachment is obsolete: true
Attachment #736276 - Flags: review?(mrosenberg)
Attachment #736782 - Flags: review?(mrosenberg)
Marty and Douglas, is this patch to align ARM asm.js frames still applicable? The patch has been awaiting review for three months.
Flags: needinfo?(mrosenberg)
Comment on attachment 736782 [details] [diff] [review]
Align the local stack storage for ARM asm.js frames.

Review of attachment 736782 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/arm/MacroAssembler-arm.cpp
@@ +1581,5 @@
> +//  |  ...              |
> +//  |-------------------|
> +//  |  ...              |  <- Stack Pointer. Dual word aligned.
> +//  |-------------------|
> +//

We should probably have these diagrams in more places in the IM codebase.
Attachment #736782 - Flags: review?(mrosenberg) → review+
Flags: needinfo?(mrosenberg)
Posted patch rebased patch (obsolete) — Splinter Review
While I was working on bug 1014083, after I've asked what AlignmentMidPrologue is, dougc pointed me that bug. The patch is ready, I rebased it and updated it (and btw learned a lot about the different asm.js trampolines and stackwalking). As it changed a lot (and I'd like to make sure I am not breaking anything)), I'll ask people involved in that patch to give a look, and Marty for final re-review.

The only main difference is in the asm.js -> Ion trampoline, in which we push GlobalReg and HeapReg before reserving stack for the call. For that case, I preferred to update the reserveAsmJSStack interface rather than having fancy ways to move the pushes (to avoid triggering the assertions framePushed() == 0 in the original patch).

Making a try push for sanity, even though all tests are locally passing in the ARM simulator. I'll ask for f? and r? if everything is green.
https://tbpl.mozilla.org/?tree=Try&rev=d5a5a4b1c386
Comment on attachment 8429226 [details] [diff] [review]
rebased patch

It seems to be all green (locally + try build). Not sure irc if jit-tests are now run for ARM platforms, during the build or separately (seems it's not the case as i can't find the Jit label in the tests list for these platforms).

Asking for feedback from owners. Do you think it'd be worth to have it fuzzed?
Attachment #8429226 - Flags: review?(mrosenberg)
Attachment #8429226 - Flags: feedback?(luke)
Attachment #8429226 - Flags: feedback?(dtc-moz)
Maybe I don't understand all the constraints here, but I would've expected a slightly different strategy here:
 - masm.push(lr) in the prologue
 - set AlignmentAtPrologue = sizeof(void*) (making ARM like x86), keep NativeFrameSize = sizeof(void*)
 - no new reserveAsmJSStackFrame; StackDecrementForCall should compute the right thing given that StackAlignment = 8
 - change the ARM MacroAssembler's appendCallSite to add an extra sizeof(void*) like x86/x64 already does so that AsmJSFrameIter::popFrame doesn't need to change
Basically, the goal would be to act as if calls on ARM were like calls on x86.
(In reply to Luke Wagner [:luke] from comment #8)
>  - set AlignmentAtPrologue = sizeof(void*) (making ARM like x86), keep
> NativeFrameSize = sizeof(void*)
>  - no new reserveAsmJSStackFrame; StackDecrementForCall should compute the
> right thing given that StackAlignment = 8

Unless i am missing something, in GenerateEntry, StackDecrementForCall will align on StackAlignment, taking account for AlignmentAtPrologue, but the entry function doesn't have AlignmentAtPrologue, as we introduce it. So there would be a need for another const variable, which sounds equivalent to the Alignment{At,Mid}Prologue problem. Does that sound true? (i was actually trying to implement it the way you suggested and ran into this)
(In reply to Benjamin Bouvier [:bbouvier] from comment #9)
What I'd do for GenerateEntry is:
 - masm.push(lr) at the beginning (#ifdef CODEGEN_ARM)
 - change the return at the end of GenerateEntry/GenerateThrowExit to masm.ret() (which pops the stack into pc)
 - remove lr from NonVolatileRegs on ARM
This change will also be a nice preparation for profiler stack-walking where we need to be able to unwind the entry stub to find the return address of the calling C++.
Posted patch another way to do it (obsolete) — Splinter Review
That's the approach luke suggested. It is indeed cleaner, except for unmonitored calls (meaning, the calls that don't have a CallDescSite: calls in LSoftModI, LSoftDivI, LSoftUDivOrMod). For these calls, we need to take AlignmentAtPrologue into account when aligning stack bytes, hence the callWithAsmJSAbi function (i preferred to have a specific function which isn't in IonMacroAssembler rather than adding a third optional boolean argument to IonMacroAssembler::callWithABI, as this is pretty specific to ARM).
Attachment #8430118 - Flags: review?(mrosenberg)
Attachment #8430118 - Flags: feedback?(luke)
Attachment #8430118 - Flags: feedback?(dtc-moz)
Comment on attachment 8429226 [details] [diff] [review]
rebased patch

Cancelling all r? and f? for this patch, as the other one looks simpler.
Attachment #8429226 - Flags: review?(mrosenberg)
Attachment #8429226 - Flags: feedback?(luke)
Attachment #8429226 - Flags: feedback?(dtc-moz)
As this is only used by AsmJS, and as on ARM there's now a difference between the native alignment at prologue (0) and AsmJS alignment at prologue (sizeof(void*)) with the above patch applied, let's make it clear.
Attachment #8430119 - Flags: review?(luke)
Attachment #8430119 - Flags: review?(luke) → review+
Comment on attachment 8430118 [details] [diff] [review]
another way to do it

Review of attachment 8430118 [details] [diff] [review]:
-----------------------------------------------------------------

Awesome, this seems much simpler and removes several corner cases.  Thanks a lot!

::: js/src/jit/AsmJS.cpp
@@ +5980,5 @@
>                  FloatRegisterSet(FloatRegisters::AllMask));
>  #if defined(JS_CODEGEN_ARM)
>  // The ARM system ABI also includes d15 in the non volatile float registers.
> +// Also exclude r14 (a.k.a. lr) as we preserve it manually)
> +JS_STATIC_ASSERT(Registers::r14 == Registers::lr);

Instead of static asserting, can you just use 'lr' below?

@@ +6071,5 @@
>      // NB: GenerateExits assumes that masm.framePushed() == 0 before
>      // PushRegsInMask(NonVolatileRegs).
>      masm.setFramePushed(0);
> +#if defined(JS_CODEGEN_ARM)
> +    masm.push(lr); // masm.ret() will pop lr

Can you expand this comment a bit with something like:
  // Push lr without incrementing masm.framePushed since this push is accounted
  // for by AlignmentAtAsmJSPrologue. The masm.ret at the end will pop.
and put a \n before and after the #ifdef/#endif?

@@ +6324,5 @@
>      masm.align(CodeAlignment);
>      m.setInterpExitOffset(exitIndex);
>      masm.setFramePushed(0);
>  #if defined(JS_CODEGEN_ARM)
> +    masm.push(lr);

Can you add the same comment as above and also the \n before/after?

@@ +6495,5 @@
>  #if defined(JS_CODEGEN_X64)
>      masm.Push(HeapReg);
>  #elif defined(JS_CODEGEN_ARM)
> +    // The lr register holds the return address and needs to be saved.
> +    masm.push(lr);

Can you add the same comment as above and put a \n after?

@@ +6698,1 @@
>  # if defined(JS_CODEGEN_X64)

Now that these are not inside an #if, can you remove the spaces before the 'if' and the 'endif'?

@@ +6700,5 @@
>  # endif
> +#if defined(JS_CODEGEN_ARM)
> +    masm.ma_vimm(GenericNaN(), NANReg);
> +    masm.PopRegsInMask(GeneralRegisterSet((1<<GlobalReg.code()) | (1<<HeapReg.code())));
> +    // on ARM we have manually pushed lr at the beginning, so we can use masm.ret()

I don't think this comment is necessary.

::: js/src/jit/IonMacroAssembler.h
@@ +250,5 @@
>              }
>          }
>      }
>  
> +    // asm.js compilation handles its own IonContent-pushing

Hehe, *IonContext

::: js/src/jit/arm/CodeGenerator-arm.cpp
@@ -1052,5 @@
> -
> -    // The way the stack slots work, we assume that everything from depth == 0 downwards is writable
> -    // however, since our frame is included in this, ensure that the frame gets skipped
> -    if (gen->compilingAsmJS())
> -        offset -= AlignmentMidPrologue;

Good riddance

::: js/src/jit/arm/MacroAssembler-arm.cpp
@@ +3826,5 @@
> +    // AsmJS unmonitored calls to native functions (as in LSoft{Div,Mod}) are
> +    // unaligned because we have an AlignmentAtPrologue offset from the
> +    // StackAlignment boundary. Calls made with LAsmJSCall push a word right
> +    // before the actual call, which aligns the offset and there's no issue.
> +    uint32_t maybePrologueOffset = (asmJSNativeCall) ? AlignmentAtPrologue : 0;

Instead of "asmJSNativeCall", can you name it "callfromAsmJS"?  (The word "native" is rather confusing and overloaded in SM.)

Second, if the statement is written:
 uint32_t alignmentAtPrologue = callFromAsmJS ? AlignmentAtAsmJSPrologue : 0;
it sorta speaks for itself without a comment.

@@ +3835,2 @@
>      } else {
> +        JS_ASSERT(!asmJSNativeCall);

Is this assert necessary?  With dynamic alignment, we'd end up with an aligned pointer, so asm.js would be fine, I'd think.

::: js/src/jit/arm/MacroAssembler-arm.h
@@ +1530,5 @@
>      // Emits a call to a C/C++ function, resolving all argument moves.
>      void callWithABI(void *fun, MoveOp::Type result = MoveOp::GENERAL);
> +    void callWithABI(AsmJSImmPtr imm, MoveOp::Type result = MoveOp::GENERAL,
> +                     bool asmJSNativeCall = false);
> +    void callWithAsmJSABI(AsmJSImmPtr imm, MoveOp::Type result = MoveOp::GENERAL);

I would think that *every* callWithABI that passes an AsmJSImmPtr wants to pass "callFromAsmJS = true".  Is that the case?  If so, then there would be no need to add a new function.
Attachment #8430118 - Flags: feedback?(luke) → feedback+
(In reply to Luke Wagner [:luke] from comment #14)
> ::: js/src/jit/arm/MacroAssembler-arm.h
> @@ +1530,5 @@
> >      // Emits a call to a C/C++ function, resolving all argument moves.
> >      void callWithABI(void *fun, MoveOp::Type result = MoveOp::GENERAL);
> > +    void callWithABI(AsmJSImmPtr imm, MoveOp::Type result = MoveOp::GENERAL,
> > +                     bool asmJSNativeCall = false);
> > +    void callWithAsmJSABI(AsmJSImmPtr imm, MoveOp::Type result = MoveOp::GENERAL);
> 
> I would think that *every* callWithABI that passes an AsmJSImmPtr wants to
> pass "callFromAsmJS = true".  Is that the case?  If so, then there would be
> no need to add a new function.

That's actually not the case. For all asm.js calls made with MAsmJSCall / LAsmJSCall, we now add the return address right before the call (which harmonizes the ABI with x86 and x64), while we don't do the same with these particular function calls that are directly made from LSoftDiv, etc. That's what the comment in callWithABIPre tried to explain:

> +    // AsmJS unmonitored calls to native functions (as in LSoft{Div,Mod}) are
> +    // unaligned because we have an AlignmentAtPrologue offset from the
> +    // StackAlignment boundary. Calls made with LAsmJSCall push a word right
> +    // before the actual call, which aligns the offset and there's no issue.
> +    uint32_t maybePrologueOffset = (asmJSNativeCall) ? AlignmentAtPrologue : 0;

The other solution would be to push an empty word before the calls, but it looks a little bit silly. Are you fine with this? Does the comment need a better wording?

Try build (nits not fixed yet)
https://tbpl.mozilla.org/?tree=Try&rev=6f15797b50aa
(In reply to Benjamin Bouvier [:bbouvier] from comment #15)
> > I would think that *every* callWithABI that passes an AsmJSImmPtr wants to
> > pass "callFromAsmJS = true".  Is that the case?  If so, then there would be
> > no need to add a new function.
> 
> That's actually not the case. For all asm.js calls made with MAsmJSCall /
> LAsmJSCall, we now add the return address right before the call (which
> harmonizes the ABI with x86 and x64), while we don't do the same with these
> particular function calls that are directly made from LSoftDiv, etc.

I'm still confused: LAsmJSCall uses plain masm.call; not masm.callWithABI.
Attachment #8430118 - Flags: review?(mrosenberg) → review+
(In reply to Luke Wagner [:luke] from comment #16)
> I'm still confused: LAsmJSCall uses plain masm.call; not masm.callWithABI.

You're right! The only two other places we're using callWithABI with AsmJSImmPtr are unaligned calls, which don't have to deal with this alignmentAtPrologue issue, so we're fine. The patch gets way simpler.

Carrying forward r+.
Attachment #8429226 - Attachment is obsolete: true
Attachment #8430118 - Attachment is obsolete: true
Attachment #8430118 - Flags: feedback?(dtc-moz)
Attachment #8433219 - Flags: review+
As i understand it, the forceAlign constant in CodeGeneratorShared ctor says "we want to align stack in all cases for calls". That sounds exactly like what StackKeptAligned means, in the Assembler-*.h files, and as it exactly matches forceAlign, let's use it.

Also inverting the conditions in the test, just in case the compiler doesn't find out that StackKeptAligned is a constant boolean.
Assignee: general → benj
Status: NEW → ASSIGNED
Attachment #8433222 - Flags: review?(mrosenberg)
I don't want to block MIPS work (the last patch is just a cleanup).

remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/c8a1656249fc
remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/066f499d0544
Whiteboard: [leave open]
Attachment #8433222 - Flags: review?(mrosenberg) → review+
https://hg.mozilla.org/mozilla-central/rev/ba505f710435
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla32
Depends on: 1022142
This fix causes Bug 1022802 regression.
Blocks: 1022802
You need to log in before you can comment on or make changes to this bug.