Last Comment Bug 745391 - IonMonkey: SETELEM: Fast path for typed arrays on ARM
: IonMonkey: SETELEM: Fast path for typed arrays on ARM
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: unspecified
: ARM Linux
: -- normal (vote)
: ---
Assigned To: general
:
: Jason Orendorff [:jorendorff]
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2012-04-13 17:57 PDT by Marty Rosenberg [:mjrosenb]
Modified: 2012-07-30 08:58 PDT (History)
1 user (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
/home/mrosenberg/patches/add_SETELEM_TypedArray-r0.patch (36.96 KB, patch)
2012-04-13 17:57 PDT, Marty Rosenberg [:mjrosenb]
no flags Details | Diff | Splinter Review
the actual patch this time... (25.70 KB, patch)
2012-04-16 22:48 PDT, Marty Rosenberg [:mjrosenb]
Jacob.Bramley: review+
Details | Diff | Splinter Review

Description Marty Rosenberg [:mjrosenb] 2012-04-13 17:57:04 PDT
Created attachment 614976 [details] [diff] [review]
/home/mrosenberg/patches/add_SETELEM_TypedArray-r0.patch

Mostly straightforward.  You may want to check my logic on truncating doubles to uint8's.  I believe I've implemented it the way the spec specifies, but I really don't trust myself to get it right. (also, whee, new assembler functions!)
Comment 1 Jacob Bramley [:jbramley] 2012-04-16 03:07:52 PDT
Comment on attachment 614976 [details] [diff] [review]
/home/mrosenberg/patches/add_SETELEM_TypedArray-r0.patch

Review of attachment 614976 [details] [diff] [review]:
-----------------------------------------------------------------

This looks very familiar. Does it replace another patch that I reviewed? There are no older patches on this bug.

Actually, I can't see your double-to-uint8 implementation. Did you forget to qrefresh?

::: js/src/ion/IonMacroAssembler.cpp
@@ +243,5 @@
>          else
>              loadDouble(src, dest.fpu());
>  
> +#ifdef JS_CPU_ARM
> +        // this code won't get committed, it is waiting on a patch from jandem so I can just call branchCompareDoubles

Make does it does the right thing. (The vmul solution, as discussed on IRC, would be fine here.)

Also, you should add a comment so the purpose of this is clear. The #else clause has one, but this one comes first.

::: js/src/ion/arm/Assembler-arm.cpp
@@ +3,5 @@
>   *
>   * ***** BEGIN LICENSE BLOCK *****
>   * Version: MPL 1.1/GPL 2.0/LGPL 2.1
>   *
> + * THE contents of this file are subject to the Mozilla Public License Version

This looks unintentional.

::: js/src/ion/arm/MacroAssembler-arm.cpp
@@ +797,5 @@
>  MacroAssemblerARM::ma_dtr(LoadStore ls, Register rn, Imm32 offset, Register rt,
>                            Index mode, Assembler::Condition cc)
>  {
> +    ma_dataTransferN(ls, 32, true, rn, offset, rt, mode, cc);
> +#if 0

Delete the block.

@@ +835,5 @@
>  {
> +    ma_dataTransferN(ls, 32, true,
> +                     Register::FromCode(addr.base()), Imm32(addr.disp()),
> +                     rt, mode, cc);
> +#if 0

Delete the block.

@@ +943,5 @@
>      JS_NOT_REACHED("Feature NYI");
>  }
>  
>  void
>  MacroAssemblerARM::ma_dataTransferN(LoadStore ls, int size, bool IsSigned,

This function contains relatively complicated logic, but I think the code was copied from elsewhere. (I recognize it from an earlier IonMonkey review.) This needs testing as it's very easy to make errors in code like this, and they often don't show up for a long time. However, I think I gave the same comment before, so it's likely that this code is already reasonably well tested in a different context.

@@ +952,1 @@
>      // we can encode this as a standard ldr... MAKE IT SO

I don't think this comment helps here. We might use as_extdtr for example, and we might use more than one instruction for large offsets.

@@ +1443,5 @@
> +
> +    if (src.offset != 0) {
> +        ma_mov(base, ScratchRegister);
> +        base = ScratchRegister;
> +        ma_add(Imm32(src.offset), base);

Can't you do this?

ma_add(Imm32(src.offset), base, ScratchRegister);
base = ScratchRegister;

(I don't think I'll ever get used to that destination-on-the-right ordering.)

@@ +1470,5 @@
> +        if (index != ScratchRegister) {
> +            ma_mov(index, ScratchRegister);
> +            index = ScratchRegister;
> +        }
> +        ma_add(Imm32(src.offset), index);

ma_add(Imm32(src.offset), index, ScratchRegister);
index = ScratchRegister;

With this, it doesn't matter whether index and ScratchRegister are the same or not.

@@ +1482,5 @@
>      ma_dataTransferN(IsLoad, 16, false, address.base, Imm32(address.offset), dest);
>  }
>  
>  void
> +MacroAssemblerARMCompat::load16ZeroExtend_mask(const Address &address, Imm32 mask, const Register &dest)

I like this new naming scheme. I'm not sure why it's useful to perform a load and a mask in one call though.

@@ +1529,5 @@
> +        if (index != ScratchRegister) {
> +            ma_mov(index, ScratchRegister);
> +            index = ScratchRegister;
> +        }
> +        ma_add(Imm32(src.offset), index);

ma_add(Imm32(src.offset), index, ScratchRegister);
index = ScratchRegister;

With this, it doesn't matter whether index and ScratchRegister are the same or not.

@@ +1619,5 @@
> +MacroAssemblerARMCompat::loadFloat(const Address &address, const FloatRegister &dest)
> +{
> +    VFPRegister rt = dest;
> +    ma_vdtr(IsLoad, address, rt.singleOverlay());
> +    as_vcvt(rt, rt.singleOverlay());

Why store values as floats but convert them to doubles for handling? Is this required by the specification? Single-precision operations will be faster than double-precision operations, so we should use them if we can.
Comment 2 Marty Rosenberg [:mjrosenb] 2012-04-16 22:45:38 PDT
(In reply to Jacob Bramley [:jbramley] from comment #1)
> Comment on attachment 614976 [details] [diff] [review]
> /home/mrosenberg/patches/add_SETELEM_TypedArray-r0.patch
> 
> Review of attachment 614976 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> This looks very familiar. Does it replace another patch that I reviewed?
> There are no older patches on this bug.
> 
> Actually, I can't see your double-to-uint8 implementation. Did you forget to
> qrefresh?
> 
Well, this is embarrassing. I managed to grab the wrong file when I posted the patch.  You did in fact review this patch before.  The worst part about this is there were in fact 3 patches in the directory.
Comment 3 Marty Rosenberg [:mjrosenb] 2012-04-16 22:48:18 PDT
Created attachment 615620 [details] [diff] [review]
the actual patch this time...

Turns out that pasting the file name into the Description field does not actually ensure that you post the right file.
4f4da25cc2b607d73a39b8f64e4f0f98  add_SETELEM_TypedArray-r0.patch
Comment 4 Jacob Bramley [:jbramley] 2012-04-17 01:19:40 PDT
(In reply to Marty Rosenberg [:mjrosenb] from comment #2)
> Well, this is embarrassing. I managed to grab the wrong file when I posted
> the patch.  You did in fact review this patch before.

The question is, did I make the same comments? :-)
Comment 5 Jacob Bramley [:jbramley] 2012-04-17 03:39:26 PDT
Comment on attachment 615620 [details] [diff] [review]
the actual patch this time...

Review of attachment 615620 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good (with the changes requested).

::: js/src/ion/IonMacroAssembler.cpp
@@ +326,1 @@
>  #ifdef JS_CPU_ARM

This solution looks correct. However, why not just use vcvtr and let VFP do the hard work? (The 'vcvtr' instruction is the same as 'vcvt', except that it uses the FPSCR's rounding mode, which is round-to-nearest by default.)

  vcvtr.s32.f64   s0, d1
  vmov            r0, s0
  usat            r0, #8, r0

(Note the .s32 target format for vcvtr here; usat expects a signed input!)

@@ +328,5 @@
> +    ma_vimm(0.5, ScratchFloatReg);
> +    ma_vadd(input, ScratchFloatReg, ScratchFloatReg);
> +    // Convert the double into an unsigned fixed point value with 24 bits of
> +    // precision. The resulting number will look like 0xII.DDDDDD
> +    as_vcvtFixed(ScratchFloatReg, false, 24, true);

The fixed-point forms of vcvt are only available since VFPv3 (ARMv7), so this would not work on ARMv6.

@@ +340,5 @@
> +    ma_lsr(Imm32(24), output, output);
> +    // If any of the bottom 24 bits were non-zero, then we're good, since this number
> +    // can't be exactly XX.0
> +    ma_b(&notSplit, NonZero);
> +    as_vxfer(ScratchRegister, InvalidReg, ScratchFloatReg, FloatToCore);

This operation is expensive, and you already did it once. I'd duplicate the core register that you wrote to rather than emit another vmov.

::: js/src/ion/arm/Assembler-arm.cpp
@@ +1505,5 @@
>  }
>  void
>  Assembler::writePoolGuard(BufferOffset branch, Instruction *dest, BufferOffset afterPool)
>  {
> +    BOffImm off = afterPool.diffB<BOffImm>(branch);

That change doesn't look related, but looks ominous (negating an offset). Was it deliberate?

@@ +1735,4 @@
>          sz = isDouble;
> +        JS_ASSERT(idx == 0 || idx == 1);
> +        // If we are transferring a single half of the double
> +        // then it must be moving a VFP reg to a core reg.

Why not the other way?

@@ +1821,5 @@
> +    JS_ASSERT(vd.isFloat());
> +    uint32 sx = 0x1;
> +    vfp_size sf = vd.isDouble() ? isDouble : isSingle;
> +    int32 imm5 = fixedPoint;
> +    imm5 = (sx ? 32 : 16) - imm5;

It'd be better to really hard-code it, rather than leave it like this.

::: js/src/ion/arm/Assembler-arm.h
@@ +1503,5 @@
>      // our encoding actually allows just the src and the dest (and theiyr types)
>      // to uniquely specify the encoding that we are going to use.
>      void as_vcvt(VFPRegister vd, VFPRegister vm,
>                   Condition c = Always);
> +    // hard coded to the 32 bit variant for now

The comment isn't clear. Does it mean .f32? Perhaps something like "... the 32-bit fixed-point variant ..." or something similar.

::: js/src/ion/arm/MacroAssembler-arm.cpp
@@ +1623,5 @@
> +
> +    if (dest.offset != 0) {
> +        ma_mov(base, ScratchRegister);
> +        base = ScratchRegister;
> +        ma_add(base, Imm32(dest.offset), base);

ma_add(base, Imm32(dest.offset), ScratchRegister);
base = ScratchRegister;

@@ +1626,5 @@
> +        base = ScratchRegister;
> +        ma_add(base, Imm32(dest.offset), base);
> +    }
> +    ma_strb(src, DTRAddr(base, DtrRegImmShift(dest.index, LSL, scale)));
> +

Extra space.

@@ +1663,5 @@
> +        if (index != ScratchRegister) {
> +            ma_mov(index, ScratchRegister);
> +            index = ScratchRegister;
> +        }
> +        ma_add(Imm32(address.offset), index);

ma_add(index, Imm32(address.offset), ScratchRegister);
index = ScratchRegister;  // It doesn't matter if index already is ScratchRegister.

@@ +1702,5 @@
> +
> +    if (dest.offset != 0) {
> +        ma_mov(base, ScratchRegister);
> +        base = ScratchRegister;
> +        ma_add(base, Imm32(dest.offset), base);

Ditto.

@@ +1705,5 @@
> +        base = ScratchRegister;
> +        ma_add(base, Imm32(dest.offset), base);
> +    }
> +    ma_str(src, DTRAddr(base, DtrRegImmShift(dest.index, LSL, scale)));
> +

Space.

::: js/src/ion/arm/MacroAssembler-arm.h
@@ +961,5 @@
> +        // if it is <0, then we want to clamp to 0, otherwise, we wish to clamp to 255
> +        as_mov(ScratchRegister, asr(src, 8), SetCond);
> +        ma_mov(src, dest);
> +        ma_mov(Imm32(0xff), dest, NoSetCond, NotEqual);
> +        ma_mov(Imm32(0), dest, NoSetCond, Signed);

If you have ARMv6, you could just use usat. If you want to support older architectures, this is fine.

::: js/src/ion/shared/IonAssemblerBufferWithConstantPools.h
@@ +672,5 @@
>          }
>          if (!perforation.assigned()) {
>              // There isn't a perforation here, we need to dump the pool with a guard.
>              BufferOffset branch = this->nextOffset();
> +            this->markNextAsBranch();

This is an unrelated change. Is it intentional? (It looks safe to me, I think.)
Comment 6 Jacob Bramley [:jbramley] 2012-04-17 03:49:17 PDT
Actually, I don't think either of our solutions properly handles infinities. IDL specifies that they are represented as 0, but in our cases they will saturate to 0 or 255, depending on the infinity. NaNs and -0 are all handled correctly, however.
Comment 7 Marty Rosenberg [:mjrosenb] 2012-07-30 08:58:33 PDT
landed: http://hg.mozilla.org/projects/ionmonkey/rev/c45668383f51

Note You need to log in before you can comment on or make changes to this bug.