ARM64: Import Architecture and Assembler

Status: RESOLVED FIXED in Firefox 41
Component: Core :: JavaScript Engine: JIT
People: (Reporter: sstangl, Assigned: sstangl)
Version: unspecified
Target Milestone: mozilla41
Firefox Tracking Flags: (firefox41 fixed)

Attachments: (1 attachment)

Description
Created attachment 8607807 [details] [diff] [review]
0001-Define-ARM64-architecture-and-assembler.patch

This patch imports Architecture-arm64, Assembler-arm64 (based on the vixl::Assembler from Bug 1160672), and MacroAssembler-arm64.

The intention is to land these patches independently of the other ARM64 patches ASAP, then hook them all together with a build system patch at the end.
Attachment #8607807 - Flags: review?(nicolas.b.pierron)
Comment 1

Comment on attachment 8607807 [details] [diff] [review]
0001-Define-ARM64-architecture-and-assembler.patch

Review of attachment 8607807 [details] [diff] [review]:
-----------------------------------------------------------------

I will start reviewing tomorrow.
Comment 2

Comment on attachment 8607807 [details] [diff] [review]
0001-Define-ARM64-architecture-and-assembler.patch

Review of attachment 8607807 [details] [diff] [review]:
-----------------------------------------------------------------

(Architecture-arm64 review)

::: js/src/jit/arm64/Architecture-arm64.cpp
@@ +25,5 @@
> +        return fp;
> +    if (strcmp(name, "x30") == 0) // Default name "lr"
> +        return x30;
> +    if (strcmp(name, "x31") == 0) // Default name "sp"
> +        return sp;

GetName returns "lr" and "sp"; is there any reason to match these names as well?

::: js/src/jit/arm64/Architecture-arm64.h
@@ +76,5 @@
> +        w27 = 27, x27 = 27,
> +        w28 = 28, x28 = 28,
> +        w29 = 29, x29 = 29, fp = 29,
> +        w30 = 30, x30 = 30, lr = 30,
> +        w31 = 31, x31 = 31, wzr = 31, xzr = 31, sp = 31, // Special: both stack pointer and a zero register.

Given the special nature of this register, the documentation excludes it from the x* and w* registers.
Wouldn't it be better not to provide an x* and w* name either?

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0801b/BABJFBJG.html

@@ +130,5 @@
> +    static const SetType ArgRegMask =
> +        (1 << Registers::x0) | (1 << Registers::x1) |
> +        (1 << Registers::x2) | (1 << Registers::x3) |
> +        (1 << Registers::x4) | (1 << Registers::x5) |
> +        (1 << Registers::x6) | (1 << Registers::x7);

x8 is the out-param register: when the returned value is too large to fit in the return registers, x8 holds a pointer to the caller-allocated result location.
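For illustration (not part of the patch), the ABI rule being referred to can be seen with plain C++ aggregate returns: under AAPCS64, a 32-byte struct is returned through a caller-provided slot whose address travels in x8, while a 16-byte struct fits directly in x0/x1. The x8 convention is invisible at the source level; the comments describe the generated AArch64 code.

```cpp
#include <cstdint>

// 32 bytes: too large for the x0/x1 return registers, so under AAPCS64
// the caller passes the address of the result slot in x8 (the
// "indirect result location register").
struct Big { uint64_t a, b, c, d; };

Big makeBig() { return Big{1, 2, 3, 4}; }

// 16 bytes: still returned directly in x0/x1; x8 is not involved.
struct Small { uint64_t a, b; };

Small makeSmall() { return Small{5, 6}; }
```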

@@ +144,5 @@
> +        (1 << Registers::x13) | (1 << Registers::x14) |
> +        (1 << Registers::x14) | (1 << Registers::x15);
> +
> +    static const SetType NonVolatileMask =
> +                                (1 << Registers::x16) |

This does not seem to match the documentation:

    A subroutine invocation must preserve the contents of the registers r19-r29 and SP

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ihi0055b/index.html
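As a sketch (type and function names assumed, not the patch's actual definitions), the AAPCS64 callee-saved set quoted above can be expressed as a mask over x19-x29; the constexpr loop requires C++14:

```cpp
#include <cstdint>

using SetType = uint32_t;  // assumed to mirror the patch's SetType

// AAPCS64: "A subroutine invocation must preserve the contents of the
// registers r19-r29 and SP." Build the corresponding bit mask.
constexpr SetType CalleeSavedMask() {
    SetType mask = 0;
    for (unsigned r = 19; r <= 29; r++)
        mask |= SetType(1) << r;
    return mask;
}

// Bits 19 through 29 inclusive.
static_assert(CalleeSavedMask() == 0x3FF80000u, "x19-x29 are callee-saved");
```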

@@ +176,5 @@
> +
> +    // Registers returned from a JS -> C call.
> +    static const SetType CallMask =
> +        (1 << Registers::x0) |
> +        (1 << Registers::x1); // Used for double-sized returns.

Only one register is required for 64-bit return values.
The same applies to JSCallMask.
Comment 3

Comment on attachment 8607807 [details] [diff] [review]
0001-Define-ARM64-architecture-and-assembler.patch

Review of attachment 8607807 [details] [diff] [review]:
-----------------------------------------------------------------

(Assembler-arm64 review: 3/4)

::: js/src/jit/arm64/Assembler-arm64.cpp
@@ +61,5 @@
> +    return current_;
> +}
> +
> +const Register ABIArgGenerator::NonArgReturnReg0 = r8;
> +const Register ABIArgGenerator::NonArgReturnReg1 = r9;

Isn't that just r8, which is used as a pointer to the stack space reserved for the out-param?

@@ +115,5 @@
> +        // Each jump entry is of the form:
> +        //   LDR ip0 [PC, 8]
> +        //   BR ip0
> +        //   [Patchable 8-byte constant low bits]
> +        //   [Patchable 8-byte constant high bits]

nit: Mention that these instructions are made patchable by mapping them onto a JumpTableEntry structure.
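For context, a hypothetical JumpTableEntry mirroring the sequence above might look like the following (field names follow the review discussion; the patch's actual definition appears later in Assembler-arm64.h and uses different accessor types):

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the patch's JumpTableEntry: two fixed
// instructions followed by the patchable 8-byte jump target.
//   LDR ip0, [PC, #8]   ; load the 64-bit constant below
//   BR  ip0             ; jump to it
struct JumpTableEntry {
    uint32_t ldr;   // encoded "LDR ip0, [PC, #8]"
    uint32_t br;    // encoded "BR ip0"
    void* data;     // patchable 8-byte target, written when linking

    // Address of the LDR instruction, used to retarget short branches.
    uint32_t* getLdr() { return &ldr; }
};

// The LDR's literal offset is hard-coded, so the pointer must sit
// exactly 8 bytes into a 16-byte entry (on a 64-bit host).
static_assert(sizeof(JumpTableEntry) == 16, "two insts + 8-byte constant");
static_assert(offsetof(JumpTableEntry, data) == 8, "target at offset 8");
```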

@@ +118,5 @@
> +        //   [Patchable 8-byte constant low bits]
> +        //   [Patchable 8-byte constant high bits]
> +        DebugOnly<size_t> preOffset = size_t(armbuffer_.nextOffset().getOffset());
> +
> +        ldr(vixl::ip0, ptrdiff_t(2));

What is this 2? Shouldn't it be something like 8 / sizeof(Instruction)?

@@ +119,5 @@
> +        //   [Patchable 8-byte constant high bits]
> +        DebugOnly<size_t> preOffset = size_t(armbuffer_.nextOffset().getOffset());
> +
> +        ldr(vixl::ip0, ptrdiff_t(2));
> +        br(vixl::ip0);

Why do we mix assembler functions of different kinds, with different argument types?
Since we are apparently using implementation inheritance, I would prefer if we could mix them cleanly in our assembler, such that we use ip0 instead of vixl::ip0, even if we have to provide an implicit cast from Register to ARMRegister.
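The suggestion could be sketched as an implicit converting constructor. These are simplified toy types, not the real jit::Register or vixl::ARMRegister — and note the later objection in this thread that real VIXL registers carry per-instruction size annotations:

```cpp
// Toy stand-in for the untyped jit::Register.
struct Register {
    unsigned code;
};

// Toy stand-in for a sized vixl-style register.
struct ARMRegister {
    unsigned code;
    unsigned size;  // in bits: 32 or 64
    ARMRegister(unsigned c, unsigned s) : code(c), size(s) {}
    // Implicit conversion: assume the full 64-bit view of the register.
    ARMRegister(Register r) : code(r.code), size(64) {}
};

// An assembler entry point written against the sized type...
unsigned EncodeRt(ARMRegister rt) { return rt.code & 31; }

// ...which can now be called with an untyped register directly.
inline unsigned demo() {
    Register ip0{16};      // x16 is the first intra-procedure-call register
    return EncodeRt(ip0);  // implicit Register -> ARMRegister conversion
}
```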

@@ +163,5 @@
> +                branch->SetImmPCOffsetTarget(target);
> +            } else {
> +                JumpTableEntry* entry = &extendedJumpTable[i];
> +                branch->SetImmPCOffsetTarget(entry->getLdr());
> +                entry->data = target;

Nice :)

@@ +167,5 @@
> +                entry->data = target;
> +            }
> +        } else {
> +            // Currently a two-instruction call, it should be possible to optimize this
> +            // into a single instruction call + nop in some instances, but this will work.

We should patch the literal value of the target?

@@ +177,5 @@
> +Assembler::immPool(ARMRegister dest, uint8_t* value, vixl::LoadLiteralOp op, ARMBuffer::PoolEntry* pe)
> +{
> +    uint32_t inst = op | Rt(dest);
> +    const size_t numInst = 1;
> +    const unsigned numPoolEntries = 2; // Each pool entry is 32 bits, so this stores 64.

Define a constant named sizeOfPoolEntryInBits = 32, and write 64 / sizeOfPoolEntryInBits.
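The suggested constant, sketched out (the name comes from the comment above; the values come from the surrounding code):

```cpp
#include <cstddef>

// Each constant-pool entry is 32 bits wide.
constexpr size_t sizeOfPoolEntryInBits = 32;

// Number of pool entries needed for an immediate of the given width,
// e.g. 64 / sizeOfPoolEntryInBits in immPool, and
// dest.size() / sizeOfPoolEntryInBits in fImmPool.
constexpr unsigned poolEntriesFor(size_t bits) {
    return unsigned(bits / sizeOfPoolEntryInBits);
}

static_assert(poolEntriesFor(64) == 2, "64-bit immediate: two entries");
static_assert(poolEntriesFor(32) == 1, "32-bit float: one entry");
```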

@@ +192,5 @@
> +Assembler::immPool64Branch(RepatchLabel* label, ARMBuffer::PoolEntry* pe, Condition c)
> +{
> +    MOZ_CRASH("immPool64Branch");
> +
> +    #if 0

Remove dead code

@@ +214,5 @@
> +Assembler::fImmPool(ARMFPRegister dest, uint8_t* value, vixl::LoadLiteralOp op)
> +{
> +    uint32_t inst = op | Rt(dest);
> +    const size_t numInst = 1;
> +    const unsigned numPoolEntries = dest.size() / 32; // Each pool entry is 32 bits.

Use sizeOfPoolEntryInBits here.

@@ +338,5 @@
> +
> +void
> +PatchJump(CodeLocationJump& jump_, CodeLocationLabel label)
> +{
> +    MOZ_CRASH("PatchJump -- incomplete, untested code.");

dead code?

@@ +512,5 @@
> +    // If the jump is within the code buffer, it uses the extended jump table.
> +    if (target >= code->raw() && target < code->raw() + code->instructionsSize()) {
> +        MOZ_ASSERT(target + Assembler::SizeOfJumpTableEntry <= code->raw() + code->instructionsSize());
> +
> +        uint8_t** patchablePtr = (uint8_t**)(target + Assembler::OffsetOfJumpTableEntryPointer);

Use the JumpTableEntry structure instead.

::: js/src/jit/arm64/Assembler-arm64.h
@@ +403,5 @@
> +
> +    struct JumpTableEntry
> +    {
> +        uint32_t ldr;
> +        uint32_t br;

Add a const to these fields, such that we never change them accidentally.

@@ +507,5 @@
> +    static const Register NonArgReturnReg1;
> +    static const Register NonVolatileReg;
> +    static const Register NonArg_VolatileReg;
> +    static const Register NonReturn_VolatileReg0;
> +    static const Register NonReturn_VolatileReg1;

What are these used for?
Comment 4
(In reply to Nicolas B. Pierron [:nbp] from comment #2)
> Comment on attachment 8607807 [details] [diff] [review]
> 0001-Define-ARM64-architecture-and-assembler.patch
> 
> Review of attachment 8607807 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> @@ +144,5 @@
> > +        (1 << Registers::x13) | (1 << Registers::x14) |
> > +        (1 << Registers::x14) | (1 << Registers::x15);
> > +
> > +    static const SetType NonVolatileMask =
> > +                                (1 << Registers::x16) |
> 
> This does not seems to match the documentation,
> 
>     A subroutine invocation must preserve the contents of the registers
> r19-r29 and SP
> 
> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ihi0055b/index.
> html

Doesn't that match the documentation, though? AFAICT VolatileMask is for caller-save registers, and NonVolatileMask is for callee-save registers excluding sp. For example, on x86, VolatileMask is {rax, rcx, rdx}.
Comment 5
(In reply to Nicolas B. Pierron [:nbp] from comment #3)
> Comment on attachment 8607807 [details] [diff] [review]
> 0001-Define-ARM64-architecture-and-assembler.patch
> 
> Review of attachment 8607807 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> (Assembler-arm64 review: 3/4)
> 
> ::: js/src/jit/arm64/Assembler-arm64.cpp
> @@ +61,5 @@
> > +    return current_;
> > +}
> > +
> > +const Register ABIArgGenerator::NonArgReturnReg0 = r8;
> > +const Register ABIArgGenerator::NonArgReturnReg1 = r9;
> 
> Isn't that just r8, which is used as a pointer to the stack space which is
> reserved for the out-param?

The only use of NonArgReturnReg1 is as a scratch register in some AsmJS code. Suggest disregarding the name.

> @@ +119,5 @@
> > +        //   [Patchable 8-byte constant high bits]
> > +        DebugOnly<size_t> preOffset = size_t(armbuffer_.nextOffset().getOffset());
> > +
> > +        ldr(vixl::ip0, ptrdiff_t(2));
> > +        br(vixl::ip0);
> 
> Why do we mix different type of assembler function with different type for
> arguments?
> As we are apparently doing an implementation inheritance, I would prefer if
> we could mix it cleanly in our assembler, such that we use ip0 instead of
> vixl::ip0, even if we have to provide an implicit cast from Register to
> ARMRegister.

Usually this is not necessary, because all VIXL registers are imported into the js::jit namespace, with the exception of ip0 and ip1. In general this suggestion probably would not be a good idea, since our registers have untyped sizes, while VIXL registers require size annotations, which can differ on a per-instruction basis.

I changed "vixl::ip0" to just "x16" here, but I like it much less.

> @@ +163,5 @@
> > +                branch->SetImmPCOffsetTarget(target);
> > +            } else {
> > +                JumpTableEntry* entry = &extendedJumpTable[i];
> > +                branch->SetImmPCOffsetTarget(entry->getLdr());
> > +                entry->data = target;
> 
> Nice :)
> 
> @@ +167,5 @@
> > +                entry->data = target;
> > +            }
> > +        } else {
> > +            // Currently a two-instruction call, it should be possible to optimize this
> > +            // into a single instruction call + nop in some instances, but this will work.
> 
> We should patch the literal value of the target?

It turns out that the jump table isn't even used. Marty was planning to replace it with loads from nearby literal pools.
Comment 6

(In reply to Sean Stangl [:sstangl] from comment #5)
> I changed "vixl::ip0" to just "x16" here, but I like it much less.

I agree ip0 is better naming than x16, but I wish we could remove the vixl prefix.
Comment 7

Comment on attachment 8607807 [details] [diff] [review]
0001-Define-ARM64-architecture-and-assembler.patch

Review of attachment 8607807 [details] [diff] [review]:
-----------------------------------------------------------------

(Assembler-arm64 done, still 4000 lines to do)

::: js/src/jit/arm64/Assembler-arm64.cpp
@@ +554,5 @@
> +            Value v = IMPL_TO_JSVAL(layout);
> +            TraceManuallyBarrieredEdge(trc, &v, "ion-masm-value");
> +            *word = JSVAL_TO_IMPL(v).asBits;
> +
> +            // TODO: When we can, flush caches here.

nit: We only have to flush caches if one of the pointers in this section was moved.

@@ +560,5 @@
> +        }
> +
> +        // No barriers needed since the pointers are constants.
> +        TraceManuallyBarrieredGenericPointerEdge(trc, reinterpret_cast<gc::Cell**>(literalAddr),
> +                                                 "ion-masm-ptr");

nit: And we should probably flush caches here too.

@@ +583,5 @@
> +
> +    while (reader.more()) {
> +        size_t offset = reader.readUnsigned();
> +        Instruction* ins = (Instruction*)(buffer + offset);
> +        uintptr_t* word_ptr = ins->LiteralAddress<uintptr_t*>();

Why don't we have the same logic as in ::TraceDataRelocations?

@@ +593,5 @@
> +        JSObject* obj = nurseryObjects[index];
> +        *word_ptr = uintptr_t(obj);
> +
> +                // Either all objects are still in the nursery, or all objects are
> +        // tenured.

nit: fix comment indentation.

@@ +599,5 @@
> +
> +        if (!hasNurseryPointers && IsInsideNursery(obj))
> +            hasNurseryPointers = true;
> +
> +    }

nit: Remove extra blank line before closing curly brace.

::: js/src/jit/arm64/Assembler-arm64.h
@@ +288,5 @@
> +    int labelOffsetToPatchOffset(int labelOff) {
> +        return actualOffset(labelOff);
> +    }
> +    static uint8_t* PatchableJumpAddress(JitCode* code, uint32_t index) {
> +        return code->raw() + index;

nit: MOZ_CRASH("huh?");  ?


or make a separate patch to add /rename this on all platform and use it here:
https://dxr.mozilla.org/mozilla-central/source/js/src/jit/IonCaches.cpp#59

::: js/src/jit/arm64/MacroAssembler-arm64.cpp
@@ +15,5 @@
> +namespace js {
> +namespace jit {
> +
> +void
> +MacroAssembler::PushRegsInMask(LiveRegisterSet set)

I did a lot of reorganization of these functions, so that the MacroAssembler:: functions can be found easily.
Please try to follow the same locations as on other architectures, meaning that these functions should be at the bottom, under the large comment saying "Stack functions".

@@ +25,5 @@
> +        vixl::CPURegister src3 = vixl::NoCPUReg;
> +
> +        src0 = ARMRegister(*iter, 64);
> +        adjustFrame(8);
> +        ++iter;

nit: swap these 2 lines to make this consistent with the 3 below.

@@ +109,5 @@
> +
> +        if (!iter.more() || ignore.has(*iter)) {
> +            // There is no 'next' that can be loaded, and there is already one
> +            // element in the queue, just deal with that element.
> +            Ldr(src0, MemOperand(GetStackPointer64(), offset));

nit: s/src/dst/ ?

@@ +120,5 @@
> +        ++iter;
> +
> +        MOZ_ASSERT(!src0.Is(src1));
> +        ldp(src0, src1, MemOperand(GetStackPointer64(), offset));
> +    }

What do you think about:

    vixl::CPURegister src[2] = { vixl::NoCPUReg, vixl::NoCPUReg };
    size_t srcCount = 0;
    for (FloatRegisterIterator iter(fset); iter.more(); ++iter) {
        // Pack multiple registers into one load instruction.
        nextOffset += sizeof(double);
        if (!ignore.has(*iter))
            src[srcCount++] = ARMFPRegister(*iter);

        // Load the registers if we are full, or if they are not contiguous in memory.
        if (srcCount == 2) {
            ldp(src[0], src[1], MemOperand(GetStackPointer64(), offset));
            offset = nextOffset;
            srcCount = 0;
        } else if ((ignore.has(*iter) || !iter.more()) && srcCount == 1) {
            Ldr(src[0], MemOperand(GetStackPointer64(), offset));
            offset = nextOffset;
            srcCount = 0;
        } else if (ignore.has(*iter)) {
            offset = nextOffset;
        }
     }
     MOZ_ASSERT(srcCount == 0);

The same could be done for PushRegsInMask, which would make this code more symmetrical.
Comment 8

Comment on attachment 8607807 [details] [diff] [review]
0001-Define-ARM64-architecture-and-assembler.patch

Review of attachment 8607807 [details] [diff] [review]:
-----------------------------------------------------------------

(MacroAssembler-arm64 done)

::: js/src/jit/arm64/MacroAssembler-arm64.cpp
@@ +152,5 @@
> +        // There is both more, and it isn't being ignored.
> +        src1 = ARMRegister(*iter, 64);
> +        ++iter;
> +        nextOffset += sizeof(double);
> +        ldp(src0, src1, MemOperand(GetStackPointer64(), offset));

Any reason not to use Pop with 4 arguments instead of Ldr, and bump the stack pointer manually when needed?

@@ +229,5 @@
> +                                 (uint8_t*)&instructionScratch, literalAddr);
> +}
> +
> +void
> +MacroAssemblerCompat::handleFailureWithHandlerTail(void* handler)

If I understand correctly, we have no support for SPS? How do we forbid that in the generic MacroAssembler?

@@ +237,5 @@
> +    Sub(GetStackPointer64(), GetStackPointer64(), Operand(size));
> +    if (!GetStackPointer64().Is(sp))
> +        Add(sp, GetStackPointer64(), Operand(0));
> +
> +    Add(x0, GetStackPointer64(), Operand(0));

Use AddPtr instead of Add.
Use Imm32 instead of Operand.

@@ +342,5 @@
> +    // Unhandled for sp -- needs slightly different logic.
> +    MOZ_ASSERT(!GetStackPointer64().Is(sp));
> +
> +    // Remember the stack address on entry.
> +    Add(scratch64, GetStackPointer64(), Operand(0));

nit: Use Imm32() or similar instead of Operand()

@@ +346,5 @@
> +    Add(scratch64, GetStackPointer64(), Operand(0));
> +
> +    // Make alignment, including the effective push of the previous sp.
> +    Sub(GetStackPointer64(), GetStackPointer64(), Operand(8));
> +    And(GetStackPointer64(), GetStackPointer64(), Operand(alignment));

nit: same here.

::: js/src/jit/arm64/MacroAssembler-arm64.h
@@ +791,5 @@
> +    void movePtr(Register src, Register dest) {
> +        Mov(ARMRegister(dest, 64), ARMRegister(src, 64));
> +    }
> +    void movePtr(ImmWord imm, Register dest) {
> +        Mov(ARMRegister(dest, 64), (int64_t)imm.value);

nit: use C++ cast,  int64_t(imm.value)

@@ +1045,5 @@
> +    // StackPointer manipulation.
> +    template <typename T>
> +    void addToStackPtr(T t) { addPtr(t, getStackPointer()); }
> +    template <typename T>
> +    void addStackPtrTo(T t) { addPtr(getStackPointer(), t); }

Would there be a proper way to assert that we are not using the stack pointer as an argument of addPtr when outside this scope?
I am thinking of something like:

   void addPtr(Register, Register, StackPointerContext* = nullptr)

where StackPointerContext is a private phantom type of the Macro Assembler?


This way we would be able to assert within addPtr that we are not supposed to use the stack pointer.
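A rough sketch of the phantom-type idea follows. All names here are hypothetical, and the real MacroAssembler's addPtr emits code rather than recording operands; the point is only that the private token restricts who may pass the stack pointer:

```cpp
#include <cassert>

struct Register { unsigned code; };

class MacroAssembler {
    // Phantom token: only MacroAssembler members can name this type,
    // so only the *StackPtr* wrappers can supply a non-null context.
    struct StackPointerContext {};

    static constexpr unsigned kPseudoSP = 28;  // assumed pseudo stack pointer

  public:
    void addPtr(Register src, Register dest,
                const StackPointerContext* ctx = nullptr) {
        // Outside the sanctioned wrappers, touching the stack pointer
        // directly would be a bug.
        if (!ctx)
            assert(src.code != kPseudoSP && dest.code != kPseudoSP);
        lastSrc = src.code;
        lastDest = dest.code;  // stand-in for actually emitting the add
    }

    void addToStackPtr(Register t) {
        StackPointerContext ctx;
        addPtr(t, Register{kPseudoSP}, &ctx);
    }

    unsigned lastSrc = 0, lastDest = 0;  // exposed for this sketch only
};
```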

@@ +2289,5 @@
> +                tst(ScratchReg64, Operand(-1ll << JSVAL_TAG_SHIFT));
> +                return cond;
> +            }
> +
> +            MOZ_CRASH("NYI: non-equality comparisons");

Invert the condition above in order to reduce the indentation level.

@@ +2497,5 @@
> +        vixl::UseScratchRegisterScope temps(this);
> +        const Register scratch = temps.AcquireX().asUnsized();
> +        MOZ_ASSERT(src.base != scratch);
> +        MOZ_ASSERT(src.index != scratch);
> +        splitTag(src, scratch);

These functions are rewritten multiple times, with the major difference being which assertions are executed. Could we factor out these assertions?
Then we could use template instantiations later in this file, to avoid redundant, error-prone definitions.

@@ +2833,5 @@
> +
> +    void callAndPushReturnAddress(Label* label);
> +
> +    void profilerEnterFrame(Register framePtr, Register scratch) {
> +        AbsoluteAddress activation(GetJitContext()->runtime->addressOfProfilingActivation());

I do not understand; the presence of this code seems inconsistent with the lack of an SPS check in handleFailureWithHandlerTail.
Shouldn't we just MOZ_ASSERT if we never plan to support SPS on AArch64?

@@ +3091,5 @@
> +        // Unfortunately, we can't forbid pool prevention, because we're trying
> +        // to add an entry to a pool. So as a temporary fix, just flush the pool
> +        // now, so that it won't add later. If you're changing this, also
> +        // check ToggleCall(), which will probably break.
> +        armbuffer_.flushPool();

Wouldn't it be better to flush the pool before syncStackPtr(), given that the code below relies on this instruction's encoding?

@@ +3116,5 @@
> +        int ret = 8;
> +        Instruction* cur = (Instruction*)code;
> +        uint32_t* curw = (uint32_t*)code;
> +        // oh god, this is bad, just hard-code the stack sync instruction
> +        if (*curw == 0x9100039f) {

static const stackSync = …; // mov r28, sp ?

@@ +3215,5 @@
> +        const ARMRegister scratch64 = temps.AcquireX();
> +
> +        Mov(scratchAddr64, uint64_t(dest.addr));
> +        ldr(scratch64, MemOperand(scratchAddr64, 0));
> +        add(scratch64, scratch64, Operand(1));

nit: Imm32()?
Comment 9

Comment on attachment 8607807 [details] [diff] [review]
0001-Define-ARM64-architecture-and-assembler.patch

Review of attachment 8607807 [details] [diff] [review]:
-----------------------------------------------------------------

As commented, there are many remarks, questions, clean-ups, and refactorings that it would be nice to address.
I do not see any blocking issue that would justify a second review.
Attachment #8607807 - Flags: review?(nicolas.b.pierron) → review+
Updated

Assignee: nobody → sstangl

Comment 10
https://hg.mozilla.org/integration/mozilla-inbound/rev/2d25727bb45e
https://hg.mozilla.org/mozilla-central/rev/2d25727bb45e
Status: NEW → RESOLVED
status-firefox41: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla41