Closed Bug 693449 Opened 13 years ago Closed 13 years ago

IonMonkey: More ARM improvements

Categories

(Core :: JavaScript Engine, defect)

ARM
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: mjrosenb, Assigned: mjrosenb)

Details

Attachments

(4 files, 1 obsolete file)

Most things that I have tested seem to work.
Current things that I know need to be implemented:
Function calls
Pools (Sadly, we *really* want these for loading doubles)
In general, handling immediates that cannot fit into a single instruction
ability to patch code after finalization.
*rename toInt() to encode()
*rename writeBlob() to writeInst()
*move most functions from {Macro,}Assembler-arm.h to {Macro,}Assembler.cpp
and create MacroAssembler-arm.cpp, since it did not previously exist.
Attachment #566033 - Flags: review?(Jacob.Bramley)
Comment on attachment 566079 [details] [diff] [review]
Implement all of the simple, far reaching changes discussed in 690053

Most of this work was accomplished via emacs macros. Notably, I never bothered to strip out the comments from the header files.  I suspect the comments delimiting sections should stay, and others should be stripped
(or the comments should not exist, and the declarations should be grouped using some syntatic construct?)
Attachment #566079 - Flags: review?(Jacob.Bramley)
Comment on attachment 566033 [details] [diff] [review]
Improve on last week's patch of the ARM/IM code

Review of attachment 566033 [details] [diff] [review]:
-----------------------------------------------------------------

Mostly fine, but one or two questions.

::: .hgignore
@@ +11,5 @@
>  ^\.sw.$
>  .[^/]*\.sw.$
>  
> +(^|/)#*#
> +(^|/).#*#

What are these? Emacs swap files? Comment please!

::: js/src/ion/arm/Architecture-arm.h
@@ +64,5 @@
>  // These offsets are related to bailouts.
>  ////
>  
>  // Size of each bailout table entry. On arm, this is at most a ldr, then a branch
> +static const uint32 BAILOUT_TABLE_ENTRY_SIZE    = 4;

The size no longer matches up with the comment. LDR+B is 8 bytes. What's going on here?

::: js/src/ion/arm/Assembler-arm.cpp
@@ +341,5 @@
>  
>  ALUOp
>  ion::ALUNeg(ALUOp op, Imm32 *imm)
>  {
> +    return op_invalid;

Did you mean to leave this in here?

::: js/src/ion/arm/Assembler-arm.h
@@ +199,5 @@
> +    }
> +    struct Serial;
> +    Serial encode();
> +    // for serializing values
> +    struct Serial {

I don't like the name of this. It is a very specialized struct with a generic name. How about VFPRegIndexSplit?

@@ +482,3 @@
>          return imm4L | (imm4H << 16);
>      };
> +    Imm8VFPImmData() : imm4L(-1), imm4H(-1), isInvalid(-1) {}

Why do you need a default constructor?

@@ +733,2 @@
>      VFPOffImm(uint32 imm)
> +        : VFPOff(datastore::Imm8VFPOffData(imm >> 2), imm < 0 ? IsDown : IsUp) {}

I'd add ASSERT(!(imm & ~0x3)) to the content.

@@ +1209,5 @@
>  
>      // VFP instructions!
> +    enum vmov_op {
> +        opv_dbl = 0xC000A10,
> +        opv_sng = 0xC000A10

Are these supposed to be identical? Also, you've only given 7 nibbles. If you meant 0x0C000A10, it's best to write it that way to avoid confusion.

@@ +1225,5 @@
>          JS_ASSERT(vd.equiv(vn) && vd.equiv(vm));
> +        vfp_size sz = isDouble;
> +        if (!vd.isDouble()) {
> +            sz = isSingle;
> +        }

It's personal preference, but I think this would be neater:
vfp_size sz = (vd.isDouble()) ? (isDouble) : (isSingle);
(Also seen elsewhere.)

@@ +1296,5 @@
>  
>      // specifically, a move between two same sized-registers
>      void as_vmov(VFPRegister vd, VFPRegister vsrc, Condition c = Always)
>      {
> +        as_vfp_float(vd, NoVFPRegister, vsrc, opv_add, c);

opv_add?

@@ +1306,5 @@
>      };
> +
> +    enum VFPXferSize {
> +        WordTransfer   = 0x0E000A10,
> +        DoubleTransfer = 0x0C400A10

These are just single bits combined with opv_dbl and opv_sng. Ideally, you should express them in terms of opv_(dbl|sng) or just express the single bits and combine them in as_vxfer.

@@ +1320,5 @@
>                    Condition c = Always)
>      {
> +        vfp_size sz = isSingle;
> +        if (vm.isDouble()) {
> +            // Technically, this can be done with a vmov à la A8-644/A8-646

What are A8-644 and A8-646?

@@ +1325,5 @@
> +            // however, that requires at least an extra bit saying if the
> +            // operation should be performed on the lower or upper half of the
> +            // double.  Moving a single to/from 2N/2N+1 isn't equivalent,
> +            // since there are 32 single registers, and 32 double registers
> +            // so there is no way to encode the last 16 double registers.

For the record, you can also VMOV two adjacent but unaligned S registers.

@@ +1367,5 @@
> +        vfp_size sz = isDouble;
> +        if (!vd.isDouble()) {
> +            sz = isSingle;
> +        } else {
> +            length = length << 1;

"length *= 2" is clearer, in my opinion.

@@ +1381,5 @@
> +        vfp_size sz = isDouble;
> +        if (!vd.isDouble()) {
> +            // totally do not know how to handle this right now
> +            sz = isSingle;
> +            JS_NOT_REACHED("non-double immediate");

See here: http://code.google.com/p/v8/source/browse/branches/bleeding_edge/src/arm/assembler-arm.cc#1908
or my original here (for a month): http://pastebin.mozilla.org/1353337

That is actually my code (with ARM copyright) so you can use it under the tri-license without any licensing mess. I wrote it for NanoJIT but never integrated it. I gave it to some V8 contributors at ARM, which is how it ended up there.

@@ +1605,5 @@
> +            ret = (ret << 1) | b;
> +        return ret;
> +    }
> +    uint32 encode(uint8 value) {
> +        //ARM ARM A7-25

Page references to the ARM ARM are (unfortunately) a bad idea. Things might get moved around in future revisions and suchlike.

::: js/src/ion/arm/CodeGenerator-arm.cpp
@@ +96,5 @@
> +        masm.ma_b(ifTrue->label(), cond);
> +    } else {
> +        masm.ma_b(ifFalse->label(), Assembler::InvertCondition(cond));
> +        if (!isNextBlock(ifTrue))
> +            masm.ma_b(ifTrue->label());

Curly brackets. Also, this would be an ideal place to flush the literal pool (once we get to that). To stop it happening too often, a 'MaybeFlushLiteralPool' function would be handy.

@@ +130,5 @@
>  
>      // Test the operand
> +    // jbramley says that cmp is more efficent than tst.
> +    // this can be replaced with cmp opd, 0.
> +    masm.ma_tst(ToRegister(opd), ToRegister(opd));

He does! Why did you revert this back to ma_tst?

@@ +310,5 @@
> +
> +    if (rhs->isConstant())
> +        masm.ma_sub(ToRegister(lhs), Imm32(ToInt32(rhs)), ToRegister(dest));
> +    else
> +        masm.ma_sub(ToRegister(lhs), ToOperand(rhs), ToRegister(dest));

Curly brackets. (I'm not going to mention them elsewhere, but there are several other missing ones.)

@@ +492,5 @@
> +    const LAllocation *lhs = ins->getOperand(0);
> +    const LAllocation *rhs = ins->getOperand(1);
> +    const LDefinition *dest = ins->getDef(0);
> +
> +    switch (ins->bitop()) {

In all of these, you actually need to explicitly AND the rhs register with 0x1f, for the same reason that you do for the immediates. (That is, unless JSOP_*SH guarantees that it won't give shifts larger than 0x1f.)

@@ +512,5 @@
> +                masm.ma_asr(Imm32(ToInt32(rhs) & 0x1F), ToRegister(lhs), ToRegister(dest));
> +            else
> +                masm.ma_asr(ToRegister(rhs), ToRegister(lhs), ToRegister(dest));
> + 
> +            // Note: this is an unsigned operation.

ASR (arithmetic shift right) is surely a signed operation. Should you have used LSR here, and ASR for the JSOP_RSH case?

@@ +848,5 @@
>      const LConstantIndex *cindex = ins->getOperand(0)->toConstantIndex();
>      const Value &v = graph.getConstant(cindex->index());
>  
>      jsdpun dpun;
>      dpun.d = v.toDouble();

Type punning through a union is a GCC extension. Using memcpy is always better.

@@ +849,5 @@
>      const Value &v = graph.getConstant(cindex->index());
>  
>      jsdpun dpun;
>      dpun.d = v.toDouble();
> +    if ((dpun.u64 &0xffffffff) == 0) {

Put a space after '&'.

@@ +850,5 @@
>  
>      jsdpun dpun;
>      dpun.d = v.toDouble();
> +    if ((dpun.u64 &0xffffffff) == 0) {
> +        VFPImm dblEnc(dpun.u64 >> 32);

It looks like you really want a punning destination with two 32-bit words. That would probably generate better code, too.

@@ +922,5 @@
> +    uint32 argslot = arg->argslot();
> +    int32 stack_offset = StackOffsetOfPassedArg(argslot);
> +
> +    masm.ma_str(val.typeReg(), DTRAddr(StackPointer, DtrOffImm(stack_offset)));
> +    masm.ma_str(val.payloadReg(), DTRAddr(StackPointer, DtrOffImm(stack_offset+4)));

Ideally, these would be passed through some ma_str64 method that knows about STRD and suchlike, like the Store64* methods I added to Macro Assembler. You need to have suitable typeReg and payloadReg assignments for optimal code, though.

::: js/src/ion/arm/CodeGenerator-arm.h
@@ +86,5 @@
>  
>      bool emitDoubleToInt32(const FloatRegister &src, const Register &dest, Label *fail);
>      void emitTruncateDouble(const FloatRegister &src, const Register &dest, Label *fail);
> +    // Emits a conditional set.
> +    void emitSet(Assembler::Condition cond, const Register &dest);

This isn't defined (yet). However, I'm curious. To what is 'dest' set? Shouldn't this take an additional argument (or two)?
Comment on attachment 566079 [details] [diff] [review]
Implement all of the simple, far reaching changes discussed in 690053

Review of attachment 566079 [details] [diff] [review]:
-----------------------------------------------------------------

I skimmed the bits that just looked like they were a mess move and suchlike.

I'd leave the comments in the headers, as they are now.

::: js/src/ion/arm/Assembler-arm.cpp
@@ +714,5 @@
> +    as_alu(InvalidReg, src1, op2, op_tst, SetCond, c);
> +}
> +
> +// Not quite ALU worthy, but useful none the less:
> +// These also have the isue of these being formatted

s/isue/issue/

@@ +802,5 @@
> +}
> +
> +// Branch can branch to an immediate *or* to a register.
> +// Branches to immediates are pc relative, branches to registers
> +// are absolute

The B instruction itself is relative-immediate-only. BX can be used to branch to an address in a register. Note that the three as_b implementations only actually use immediate offsets here.
Attachment #566079 - Flags: review?(Jacob.Bramley) → review+
(In reply to Jacob Bramley [:jbramley] from comment #4)
> I skimmed the bits that just looked like they were a mess move and suchlike.

I meant "mass move" of course.
Assignee: general → mrosenberg
Status: NEW → ASSIGNED
Attached patch Fix the nitsSplinter Review
This also rebases against the latest patchset in the IonMonkey repo...
Now to rebase the next patch
Attachment #566033 - Attachment is obsolete: true
Attachment #566033 - Flags: review?(Jacob.Bramley)
Attachment #566768 - Flags: review?(Jacob.Bramley)
This time around, I'm attempting to fix the nits that need fixing, then explain the rest, since I missed some last time.
(In reply to Jacob Bramley [:jbramley] from comment #3)
> Comment on attachment 566033 [details] [diff] [review] [diff] [details] [review]
> Improve on last week's patch of the ARM/IM code
> 

> >  ^\.sw.$
> >  .[^/]*\.sw.$
> >  
> > +(^|/)#*#
> > +(^|/).#*#
> 
> What are these? Emacs swap files? Comment please!
> 
Those are from my own .hgignore.  I should find some way of adding those that doesn't get
pushed upstream ever :(

> ::: js/src/ion/arm/Architecture-arm.h
> @@ +64,5 @@
> >  // These offsets are related to bailouts.
> >  ////
> >  
> >  // Size of each bailout table entry. On arm, this is at most a ldr, then a branch
> > +static const uint32 BAILOUT_TABLE_ENTRY_SIZE    = 4;
> 
> The size no longer matches up with the comment. LDR+B is 8 bytes. What's
> going on here?
> 
comment has been updated, and expanded to remind me to actually fix it back to the way it should be (8, but for a different reason).

> ::: js/src/ion/arm/Assembler-arm.cpp
> @@ +341,5 @@
> >  
> >  ALUOp
> >  ion::ALUNeg(ALUOp op, Imm32 *imm)
> >  {
> > +    return op_invalid;
> 
> Did you mean to leave this in here?
> 
No, but it did turn up a bug!
I should probably make all of the functions that are able to fail without spoiling the compilation fail under certain conditions (env variables maybe?)

> ::: js/src/ion/arm/Assembler-arm.h
> @@ +199,5 @@
> > +    }
> > +    struct Serial;
> > +    Serial encode();
> > +    // for serializing values
> > +    struct Serial {
> 
> I don't like the name of this. It is a very specialized struct with a
> generic name. How about VFPRegIndexSplit?
I didn't like the name horribly either, which is why it is embedded in the VFPRegister class

> 
> @@ +482,3 @@
> >          return imm4L | (imm4H << 16);
> >      };
> > +    Imm8VFPImmData() : imm4L(-1), imm4H(-1), isInvalid(-1) {}
> 
> Why do you need a default constructor?
>
because I declare an array of them, which requires a default constructor.
 
> @@ +733,2 @@
> >      VFPOffImm(uint32 imm)
> > +        : VFPOff(datastore::Imm8VFPOffData(imm >> 2), imm < 0 ? IsDown : IsUp) {}
> 
> I'd add ASSERT(!(imm & ~0x3)) to the content.
> 
I'm not going to bother, since I have another patch in the pipeline that changes this up again.

> @@ +1209,5 @@
> >  
> >      // VFP instructions!
> > +    enum vmov_op {
> > +        opv_dbl = 0xC000A10,
> > +        opv_sng = 0xC000A10
> 
> Are these supposed to be identical? Also, you've only given 7 nibbles. If
> you meant 0x0C000A10, it's best to write it that way to avoid confusion.
> 
They aren't referenced, and have been removed.

> @@ +1225,5 @@
> >          JS_ASSERT(vd.equiv(vn) && vd.equiv(vm));
> > +        vfp_size sz = isDouble;
> > +        if (!vd.isDouble()) {
> > +            sz = isSingle;
> > +        }
> 
> It's personal preference, but I think this would be neater:
> vfp_size sz = (vd.isDouble()) ? (isDouble) : (isSingle);
> (Also seen elsewhere.)
It is indeed neater, but that was so I could break on said lines.

> 
> @@ +1296,5 @@
> >  
> >      // specifically, a move between two same sized-registers
> >      void as_vmov(VFPRegister vd, VFPRegister vsrc, Condition c = Always)
> >      {
> > +        as_vfp_float(vd, NoVFPRegister, vsrc, opv_add, c);
> 
> opv_add?
good catch.

> 
> @@ +1306,5 @@
> >      };
> > +
> > +    enum VFPXferSize {
> > +        WordTransfer   = 0x0E000A10,
> > +        DoubleTransfer = 0x0C400A10
> 
> These are just single bits combined with opv_dbl and opv_sng. Ideally, you
> should express them in terms of opv_(dbl|sng) or just express the single
> bits and combine them in as_vxfer.
> 
Expressing the single bits is fine, but I am not sure where to put the rest of the immediate.
the old opv_dbl didn't really belong.

> @@ +1320,5 @@
> >                    Condition c = Always)
> >      {
> > +        vfp_size sz = isSingle;
> > +        if (vm.isDouble()) {
> > +            // Technically, this can be done with a vmov à la A8-644/A8-646
> 
> What are A8-644 and A8-646?
pages in the ARM ARM.  that part of the comment has been stripped.

> 
> @@ +1325,5 @@
> > +            // however, that requires at least an extra bit saying if the
> > +            // operation should be performed on the lower or upper half of the
> > +            // double.  Moving a single to/from 2N/2N+1 isn't equivalent,
> > +            // since there are 32 single registers, and 32 double registers
> > +            // so there is no way to encode the last 16 double registers.
> 
> For the record, you can also VMOV two adjacent but unaligned S registers.
good to know!


> @@ +130,5 @@
> >  
> >      // Test the operand
> > +    // jbramley says that cmp is more efficent than tst.
> > +    // this can be replaced with cmp opd, 0.
> > +    masm.ma_tst(ToRegister(opd), ToRegister(opd));
> 
> He does! Why did you revert this back to ma_tst?
I think the code that I'd written was ma_cmp(ToRegister(opd), ToRegister(opd))
which was wrong, and rather than thinking about why it was wrong (for testing of course), I just reverted it back to ma_tst with the same arguments.


> 
> @@ +492,5 @@
> > +    const LAllocation *lhs = ins->getOperand(0);
> > +    const LAllocation *rhs = ins->getOperand(1);
> > +    const LDefinition *dest = ins->getDef(0);
> > +
> > +    switch (ins->bitop()) {
> 
> In all of these, you actually need to explicitly AND the rhs register with
> 0x1f, for the same reason that you do for the immediates. (That is, unless
> JSOP_*SH guarantees that it won't give shifts larger than 0x1f.)
> 
I've added a comment saying this needs to be done.  I need to change a bunch of code to steal an extra register, or mark one as dead.

> @@ +512,5 @@
> > +                masm.ma_asr(Imm32(ToInt32(rhs) & 0x1F), ToRegister(lhs), ToRegister(dest));
> > +            else
> > +                masm.ma_asr(ToRegister(rhs), ToRegister(lhs), ToRegister(dest));
> > + 
> > +            // Note: this is an unsigned operation.
> 
> ASR (arithmetic shift right) is surely a signed operation. Should you have
> used LSR here, and ASR for the JSOP_RSH case?
I'm going to assume that I was asleep while writing this.

> 
> @@ +848,5 @@
> >      const LConstantIndex *cindex = ins->getOperand(0)->toConstantIndex();
> >      const Value &v = graph.getConstant(cindex->index());
> >  
> >      jsdpun dpun;
> >      dpun.d = v.toDouble();
> 
> Type punning through a union is a GCC extension. Using memcpy is always
> better.
> 
that was copied from x86, If we change it here, it should also be changed there.

> @@ +850,5 @@
> >  
> >      jsdpun dpun;
> >      dpun.d = v.toDouble();
> > +    if ((dpun.u64 &0xffffffff) == 0) {
> > +        VFPImm dblEnc(dpun.u64 >> 32);
> 
> It looks like you really want a punning destination with two 32-bit words.
> That would probably generate better code, too.
> 
I think the union already has a 32+32 struct in it.  Mind if this gets rolled into a later patch?
> @@ +922,5 @@
> > +    uint32 argslot = arg->argslot();
> > +    int32 stack_offset = StackOffsetOfPassedArg(argslot);
> > +
> > +    masm.ma_str(val.typeReg(), DTRAddr(StackPointer, DtrOffImm(stack_offset)));
> > +    masm.ma_str(val.payloadReg(), DTRAddr(StackPointer, DtrOffImm(stack_offset+4)));
> 
> Ideally, these would be passed through some ma_str64 method that knows about
> STRD and suchlike, like the Store64* methods I added to Macro Assembler. You
> need to have suitable typeReg and payloadReg assignments for optimal code,
> though.
Agreed.  I'm not too sure how the register allocator needs to be modified to assign these regs efficently.
> 
> ::: js/src/ion/arm/CodeGenerator-arm.h
> @@ +86,5 @@
> >  
> >      bool emitDoubleToInt32(const FloatRegister &src, const Register &dest, Label *fail);
> >      void emitTruncateDouble(const FloatRegister &src, const Register &dest, Label *fail);
> > +    // Emits a conditional set.
> > +    void emitSet(Assembler::Condition cond, const Register &dest);
> 
> This isn't defined (yet). However, I'm curious. To what is 'dest' set?
> Shouldn't this take an additional argument (or two)?
It is supposed to mimmic an x86 instruction that sets a register depending on the condition codes, so setne %eax will set eax to 1 if the equal bit is off, and to 0 if it is on.
(In reply to Marty Rosenberg [:Marty] from comment #7)
> > >  ALUOp
> > >  ion::ALUNeg(ALUOp op, Imm32 *imm)
> > >  {
> > > +    return op_invalid;
> > 
> > Did you mean to leave this in here?
> > 
> No, but it did turn up a bug!
> I should probably make all of the functions that are able to fail without
> spoiling the compilation fail under certain conditions (env variables maybe?)

That is an excellent idea. Debug-only environment variables would work
well. It'd also be handy to have conditional failures, like 'fail after
N times', 'fail when called by X', 'fail with probability Y' and
suchlike. These are things to add later, however.

> > Why do you need a default constructor?
> >
> because I declare an array of them, which requires a default constructor.

Oh, I see. I didn't spot that.

> > I'd add ASSERT(!(imm & ~0x3)) to the content.
> > 
> I'm not going to bother, since I have another patch in the pipeline that
> changes this up again.

Right-o.

> > @@ +1306,5 @@
> > >      };
> > > +
> > > +    enum VFPXferSize {
> > > +        WordTransfer   = 0x0E000A10,
> > > +        DoubleTransfer = 0x0C400A10
> > 
> > These are just single bits combined with opv_dbl and opv_sng. Ideally, you
> > should express them in terms of opv_(dbl|sng) or just express the single
> > bits and combine them in as_vxfer.
> > 
> Expressing the single bits is fine, but I am not sure where to put the rest
> of the immediate.
> the old opv_dbl didn't really belong.

Well you removed opv_dbl anyway, so it's probably best to least
VFPXferSize as it is.

> > Type punning through a union is a GCC extension. Using memcpy is always
> > better.
> > 
> that was copied from x86, If we change it here, it should also be changed
> there.

Ok, not a big problem, just something specific that bugs me :-) I'm
_really_ conservative about type aliasing because it results in the
worst possible class of bugs when it goes wrong. (Weird wrong-result
failures rather than crashes, intermittent, goes away with different
compiler options and suchlike.)

> > It looks like you really want a punning destination with two 32-bit words.
> > That would probably generate better code, too.
> > 
> I think the union already has a 32+32 struct in it.  Mind if this gets
> rolled into a later patch?

No problem.
Attachment #566768 - Flags: review?(Jacob.Bramley) → review+
Summary: More ARM/IM improvements → IonMonkey: More ARM improvements
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.