Closed Bug 1439404 Opened 2 years ago Closed 2 years ago

Instantiate Rabaldr porting APIs for ARM64

Categories

(Core :: JavaScript Engine: JIT, enhancement, P3)

ARM64
All
enhancement

Tracking

()

RESOLVED FIXED
mozilla61
Tracking Status
firefox60 --- wontfix
firefox61 --- fixed

People

(Reporter: lth, Assigned: lth)

References

Details

Attachments

(4 files, 2 obsolete files)

Basically three things:

- support the ARM64 stack, which must be allocated in chunks, not 
  individual words
- provide ARM64 implementations for the porting points still present
  in WasmBaselineCompile.cpp
- change wasm::HasCompilerSupport and wasm::BaselineCanCompile to
  enable ARM64
Priority: -- → P2
Priority: P2 → P3
Make SpiderMonkey aware that we have wasm support for ARM64.  This will be the last patch to land.
Attachment #8954035 - Flags: review?(bbouvier)
Make Rabaldr push and pop its spill stack in larger chunks on ARM64, to avoid the mess with the aligned SP without using a shadow stack pointer.
Attachment #8954036 - Flags: review?(bbouvier)
Instantiate the porting points in Rabaldr for ARM64.  There aren't many of these left now.
Attachment #8954038 - Flags: review?(bbouvier)
Attachment #8954035 - Flags: review?(bbouvier) → review+
Comment on attachment 8954036 [details] [diff] [review]
bug1439404-arm64-rabaldr-chunky-stack.patch

Review of attachment 8954036 [details] [diff] [review]:
-----------------------------------------------------------------

I wonder from all of this if it wouldn't be better to have another kind of MacroAssembler specialized for the BaselineCompiler that reimplements framePushed()/setFramePushed()/Push()/Pop() etc, to avoid future misuse of framePushed() that wouldn't take the stackHeight into consideration. In particular, hiding the framePushedForStackHeight function would be great. But I guess this would be a huge pain to implement (b/o all the functions needed) as well as almost equivalent to the work that's done here; it would just benefit by adding another layer of abstraction solving framePushed misuses.

I am not a huge fan of the names here, but I can't really come up with anything better, and any alternatives I can think of are worse than stackHeight et al.

Cancelling review because it's complicated code and I'd like to see the comments I suggest to be added.

::: js/src/wasm/WasmBaselineCompile.cpp
@@ +1047,5 @@
>      uint32_t varHigh_;
>  
> +#ifdef RABALDR_CHUNKY_STACK
> +    // The effective height of the frame; masm.framePushed may be greater.
> +    uint32_t        stackHeight_;

Can you precise in comment that it doesn't include the InitialChunk size? Can you also explicit the CHUNK_INVARIANT implications here?

@@ +1114,5 @@
>  
>      void endFunctionPrologue() {
> +        MOZ_ASSERT(masm.framePushed() == initialSize());
> +#ifdef RABALDR_CHUNKY_STACK
> +        MOZ_ASSERT(masm.framePushed() == localSize_ + InitialChunk);

nit: This assert is redundant with the one that's above, considered what initialSize() returns for CHUNKY_STACK.

@@ +1116,5 @@
> +        MOZ_ASSERT(masm.framePushed() == initialSize());
> +#ifdef RABALDR_CHUNKY_STACK
> +        MOZ_ASSERT(masm.framePushed() == localSize_ + InitialChunk);
> +#endif
> +        MOZ_ASSERT(initialSize() != UINT32_MAX);

I think this assert is redundant too, and doesn't do what we expect in the CHUNKY case.

@@ +1238,5 @@
>      }
>  
> +#ifdef RABALDR_CHUNKY_STACK
> +
> +# define CHUNKY_INVARIANT() \

Can you undef it at some point (end of file)?

@@ +1239,5 @@
>  
> +#ifdef RABALDR_CHUNKY_STACK
> +
> +# define CHUNKY_INVARIANT() \
> +    MOZ_ASSERT(masm.framePushed() - stackHeight_ < ChunkSize + InitialChunk)

/me mutters about underflows

Can we change this invariant to make sure the LHS isn't an underflow subtract (slightly more documented by using InitialChunk too):

MOZ_ASSERT(stackHeight_ + InitialChunk <= masm.framePushed());
then your MOZ_ASSERT, which could then be rewritten as masm.framePushed() - stackHeight_ - InitialChunk < ChunkSize, which I think better expresses the intent.

@@ +1252,5 @@
> +
> +    void popChunkyBytes(uint32_t bytes) {
> +        CHUNKY_INVARIANT();
> +        stackHeight_ -= bytes;
> +        if (masm.framePushed() - stackHeight_ >= ChunkSize + InitialChunk) {

nit: as suggested above, moving InitialChunk on the LHS better expresses intent, I think.

@@ +1256,5 @@
> +        if (masm.framePushed() - stackHeight_ >= ChunkSize + InitialChunk) {
> +            // Sometimes, popChunkyBytes() is used to pop a larger area, as when
> +            // we drop values consumed by a call, and we may need to drop
> +            // several chunks.  But never drop the initial chunk.
> +            masm.freeStack((masm.framePushed() - InitialChunk - stackHeight_) & ~(ChunkSize - 1));

nit: swap InitialChunk and stackHeight_, or add parens to group them logically?

@@ +1265,5 @@
> +#endif
> +
> +    uint32_t framePushedForHeight(uint32_t stackHeight) {
> +#ifdef RABALDR_CHUNKY_STACK
> +        return localSize_ + AlignBytes(stackHeight - localSize_, ChunkSize) + InitialChunk;

Can you comment this just a bit, something along the lines of "returns the ChunkSize-aligned frame pushed equivalent for a given stack height"?

@@ +1336,5 @@
>      }
>  
>      // Set the frame height.  This must only be called with a value returned
>      // from stackHeight().
>      void setStackHeight(uint32_t amount) {

I think I've suggested this before, but just in case: would it make sense to have a light u32 wrapper type for "value returned from stackHeight()", so the compiler could statically ensure the values are correct? (I wonder if you said no in the past because it might be inconvenient to use, i.e. reimplementing basic arithmetic operations etc)

@@ +1515,2 @@
>              masm.reserveStack(size);
> +        }

nit: revert this change

@@ +1520,5 @@
>      void freeArgAreaAndPopBytes(size_t argSize, size_t dropSize) {
> +#ifdef RABALDR_CHUNKY_STACK
> +        if (argSize)
> +            masm.freeStack(argSize);
> +        popChunkyBytes(dropSize);

Why not popChunkBytes(argSize + dropSize)? Wouldn't it break the chunky invariant if argSize is large?
Attachment #8954036 - Flags: review?(bbouvier)
Comment on attachment 8954038 [details] [diff] [review]
bug1439404-arm64-rabaldr-porting-points.patch

Review of attachment 8954038 [details] [diff] [review]:
-----------------------------------------------------------------

Looks plausible, thanks.

::: js/src/wasm/WasmBaselineCompile.cpp
@@ +202,5 @@
> +
> +static const Register RabaldrScratchI32 = Register::FromCode(15);
> +// These cannot be:
> +//   - the assembler's ScratchFloat32Reg / ScratchDoubleReg
> +//   - registers used for parameter passing in any ABI we use

Can you static_assert a few of these requirements?

@@ +3811,5 @@
>          masm.as_mfhi(srcDest.reg);
> +# elif defined(JS_CODEGEN_ARM64)
> +        MOZ_ASSERT(reserved.isInvalid());
> +        ARMRegister sd(srcDest.reg, 64);
> +        ARMRegister r(rhs.reg, 64);

Not a huge fan of these short names. Maybe rename r to divisor at least?
Attachment #8954038 - Flags: review?(bbouvier) → review+
Attached patch stackheight-wrapper.patch (obsolete) — Splinter Review
(In reply to Benjamin Bouvier [:bbouvier] from comment #4)
> Comment on attachment 8954036 [details] [diff] [review]
> bug1439404-arm64-rabaldr-chunky-stack.patch
> 

> @@ +1336,5 @@
> >      }
> >  
> >      // Set the frame height.  This must only be called with a value returned
> >      // from stackHeight().
> >      void setStackHeight(uint32_t amount) {
> 
> I think I've suggested this before, but just in case: would it make sense to
> have a light u32 wrapper type for "value returned from stackHeight()", so
> the compiler could statically ensure the values are correct? (I wonder if
> you said no in the past because it might be inconvenient to use, i.e.
> reimplementing basic arithmetic operations etc)

I have investigated this and I'm attaching a sketch of what I think is a workable solution.  In this case we distinguish between the /stack height/, which is something we need at control flow joins, and the /stack position/, which is something we use to talk about pushed values.  stackHeight() returns a StackHeight wrapper and setStackHeight() takes one for an argument, but pushPtr() and the other pushers still operate on stack positions represented as uint32_t.  We can change the type of the exposed stack position too, but we'd want a different wrapper type, or we'd have gained very little.
Flags: needinfo?(bbouvier)
(In reply to Benjamin Bouvier [:bbouvier] from comment #4)
> Comment on attachment 8954036 [details] [diff] [review]
> bug1439404-arm64-rabaldr-chunky-stack.patch
> 
> Review of attachment 8954036 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> @@ +1239,5 @@
> >  
> > +#ifdef RABALDR_CHUNKY_STACK
> > +
> > +# define CHUNKY_INVARIANT() \
> > +    MOZ_ASSERT(masm.framePushed() - stackHeight_ < ChunkSize + InitialChunk)
> 
> /me mutters about underflows

Yeah, OK...

> Can we change this invariant to make sure the LHS isn't an underflow
> subtract (slightly more documented by using InitialChunk too):
> 
> MOZ_ASSERT(stackHeight_ + InitialChunk <= masm.framePushed());
> then your MOZ_ASSERT, which could then be rewritten as masm.framePushed() -
> stackHeight_ - InitialChunk < ChunkSize, which I think better expresses the
> intent.

How about:

  MOZ_ASSERT(masm.framePushed() >= stackHeight_ && 
             masm.framePushed() - stackHeight_ < ChunkSize + InitialChunk)

Here masm.framePushed() >= stackHeight_ is precisely the right condition for those two operands and also guards against overflow.  Then, what we're trying to assert here is that there's not too much unused memory on the stack, ergo masm.framePushed() - stackHeight_ is the correct lhs and ChunkSize + InitialChunk is the correct rhs.

To clarify I will define ChunkCutoff = ChunkSize + InitialChunk and use that a few places.

> @@ +1256,5 @@
> > +        if (masm.framePushed() - stackHeight_ >= ChunkSize + InitialChunk) {
> > +            // Sometimes, popChunkyBytes() is used to pop a larger area, as when
> > +            // we drop values consumed by a call, and we may need to drop
> > +            // several chunks.  But never drop the initial chunk.
> > +            masm.freeStack((masm.framePushed() - InitialChunk - stackHeight_) & ~(ChunkSize - 1));
> 
> nit: swap InitialChunk and stackHeight_

Nice.
 
> @@ +1265,5 @@
> > +#endif
> > +
> > +    uint32_t framePushedForHeight(uint32_t stackHeight) {
> > +#ifdef RABALDR_CHUNKY_STACK
> > +        return localSize_ + AlignBytes(stackHeight - localSize_, ChunkSize) + InitialChunk;
> 
> Can you comment this just a bit, something along the lines of "returns the
> ChunkSize-aligned frame pushed equivalent for a given stack height"?

I will.  I also think that computation is incorrect.  I think this is closer:

 return stackHeight <= initialSize()
        ? initialSize()
        : initialSize() + AlignBytes(stackHeight - initialSize(), ChunkSize);

but I'll need to experiment some.

> @@ +1520,5 @@
> >      void freeArgAreaAndPopBytes(size_t argSize, size_t dropSize) {
> > +#ifdef RABALDR_CHUNKY_STACK
> > +        if (argSize)
> > +            masm.freeStack(argSize);
> > +        popChunkyBytes(dropSize);
> 
> Why not popChunkBytes(argSize + dropSize)? Wouldn't it break the chunky
> invariant if argSize is large?

It's because the frame does not really know about the outgoing arguments area.  A call is set up by a direct invocation of masm.reserveStack() following by an ABI iter copying of values into that area.  This freeStack() is the mate for the reserveStack performed elsewhere, this is actually kind of gross but since in the non-chunky case we can perform a single freeStack, I thought it was worthwhile to break the abstraction boundary.  I'll add a comment about this.
On further thought, it's possible the chunky invariant can be even tighter, so I'll look into that.
Patch looks plausible. How do you feel about it? Did you find that it helped you in any way or was it just more boilerplate?
Flags: needinfo?(bbouvier)
(In reply to Benjamin Bouvier [:bbouvier] from comment #9)
> Patch looks plausible. How do you feel about it? Did you find that it helped
> you in any way or was it just more boilerplate?

Usually I don't care too much about wrappers like these but in this case I'm more inclined to go with it, since the StackHeight is exclusively a datum exported from and passed back into the stack abstraction.  In principle it could be represented as a pointer into stack storage and this would still work.  (Not proposing that!!)  It's plausibly even a "stack mark", not a "stack height".

I'll fold it into the patch I think.
Drive-by fix porting a change from the ARM simulator to the ARM64 simulator.
Attachment #8957488 - Flags: review?(bbouvier)
Attachment #8956435 - Attachment is obsolete: true
I think this addresses all comments.  It also fixes some bugs that were discovered as a result of the review and testing on more complicated code (tanks + zen garden); the stack invariant is more intuitive now.

I also managed to hide the "height" property of the StackHeight structure except inside the BaseStackFrame, so now StackHeight really is an opaque value to the compiler.
Attachment #8954036 - Attachment is obsolete: true
Attachment #8957490 - Flags: review?(bbouvier)
Attachment #8957488 - Flags: review?(bbouvier) → review+
Comment on attachment 8957490 [details] [diff] [review]
bug1439404-arm64-rabaldr-chunky-stack-v2.patch

Review of attachment 8957490 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM. The new invariant is really what I would have unconsciously expected the first time I saw this patch. Thanks!

::: js/src/wasm/WasmBaselineCompile.cpp
@@ +1063,5 @@
>      // High byte offset + 1 of local area for true locals.
>      uint32_t varHigh_;
>  
> +#ifdef RABALDR_CHUNKY_STACK
> +    // The current logical height of the frame, ie the sum of space for locals

nit: i.e.

@@ +1068,5 @@
> +    // and space for what is currently pushed.  The allocated size of the frame
> +    // -- provided by masm.framePushed() -- is usually larger than stackHeight_,
> +    // notably at the beginning of execution when we've allocated InitialChunk
> +    // extra space.
> +    uint32_t stackHeight_;

light suggestion: I really like to think of stackHeight as you describe it above: as a logical frame pushed. Do you think logicalFramePushed would be a better name? (the set of {framePushed,stackHeight,stackPosition,initialSize} make it hard to distinguish how each of them is different from the others)

@@ +1179,4 @@
>  
>      // The fixed amount of memory, in bytes, allocated on the stack below the
>      // Frame for purposes such as locals and other fixed values.  Includes all
>      // necessary alignment.

pre-existing: are there really "other fixed values", or can this be removed from this comment?

@@ +1180,5 @@
>      // The fixed amount of memory, in bytes, allocated on the stack below the
>      // Frame for purposes such as locals and other fixed values.  Includes all
>      // necessary alignment.
>  
>      uint32_t initialSize() const {

light suggestion: how about renaming fixedFrameSize()? or fixedInitialSize()? or localsAndFixedSize()? or fixedSlotsSize()? or even fixedSize()?

@@ +1258,5 @@
> +
> +# define CHUNKY_INVARIANT()                                             \
> +    MOZ_ASSERT(masm.framePushed() >= stackHeight_);                     \
> +    MOZ_ASSERT(masm.framePushed() == initialSize() ||                   \
> +               masm.framePushed() - stackHeight_ < ChunkSize)
Attachment #8957490 - Flags: review?(bbouvier) → review+
(In reply to Benjamin Bouvier [:bbouvier] from comment #13)
> Comment on attachment 8957490 [details] [diff] [review]
> bug1439404-arm64-rabaldr-chunky-stack-v2.patch
> 
> Review of attachment 8957490 [details] [diff] [review]:
> -----------------------------------------------------------------
> @@ +1179,4 @@
> >  
> >      // The fixed amount of memory, in bytes, allocated on the stack below the
> >      // Frame for purposes such as locals and other fixed values.  Includes all
> >      // necessary alignment.
> 
> pre-existing: are there really "other fixed values", or can this be removed
> from this comment?

The tls slot is one such "fixed value" at the moment.
(In reply to Benjamin Bouvier [:bbouvier] from comment #13)
> Comment on attachment 8957490 [details] [diff] [review]
> bug1439404-arm64-rabaldr-chunky-stack-v2.patch
> 
> Review of attachment 8957490 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> @@ +1068,5 @@
> > +    // and space for what is currently pushed.  The allocated size of the frame
> > +    // -- provided by masm.framePushed() -- is usually larger than stackHeight_,
> > +    // notably at the beginning of execution when we've allocated InitialChunk
> > +    // extra space.
> > +    uint32_t stackHeight_;
> 
> light suggestion: I really like to think of stackHeight as you describe it
> above: as a logical frame pushed. Do you think logicalFramePushed would be a
> better name? (the set of {framePushed,stackHeight,stackPosition,initialSize}
> make it hard to distinguish how each of them is different from the others)

I agree this is pretty messy.  I will do this:

Rename stackHeight_ => currentFramePushed_
Rename stackPosition() => currentFramePushed()
Rename maxStackHeight_ => maxFramePushed_ and clarify that its value is relative to currentFramePushed()

I will keep StackHeight as it is as it is for external consumption; we can revisit later.

I will try to do something about initialSize() but I haven't decided what, yet.  Certainly something less generic.
Pushed by lhansen@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/5014a6b01aea
Wasm on ARM64: gating.  r=bbouvier
https://hg.mozilla.org/integration/mozilla-inbound/rev/56d7b668df45
ARM64 simulator bugfix, add missing guard.  r=bbouvier
https://hg.mozilla.org/integration/mozilla-inbound/rev/27bc0a49fe15
Wasm baseline, support the chunky ARM64 stack.  r=bbouvier
https://hg.mozilla.org/integration/mozilla-inbound/rev/32fdbeda9305
Wasm baseline, fill in the porting APIs for ARM64.  r=bbouvier
Depends on: 1526579
Depends on: 1535848
You need to log in before you can comment on or make changes to this bug.