Last Comment Bug 669132 - Nitro's FP loads are inefficient on ARM.
: Nitro's FP loads are inefficient on ARM.
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: unspecified
: ARM All
: -- enhancement (vote)
: mozilla9
Assigned To: Jacob Bramley [:jbramley]
:
: Jason Orendorff [:jorendorff]
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2011-07-04 01:45 PDT by Jacob Bramley [:jbramley]
Modified: 2011-08-17 04:40 PDT (History)
7 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Optimize VFP memory accesses on ARM. (5.70 KB, patch)
2011-07-04 04:17 PDT, Jacob Bramley [:jbramley]
cdleary: review+
Details | Diff | Splinter Review
Rebase. (14.22 KB, patch)
2011-08-08 09:03 PDT, Jacob Bramley [:jbramley]
marty.rosenberg: review+
Details | Diff | Splinter Review

Description Jacob Bramley [:jbramley] 2011-07-04 01:45:58 PDT
We actually have some optimized load sequences in ARMAssembler::doubleTransfer, but they're hidden behind an alignment check that will never pass, so the optimized sequences are unreachable.

http://mxr.mozilla.org/mozilla-central/source/js/src/assembler/assembler/ARMAssembler.cpp#349

Ironically, the slow-case (which isn't guarded by the alignment check) will also fail if given an unaligned address. (We assert this, though WebKit don't.)

This is a fairly trivial fix and might come with a performance boost.
Comment 1 Jacob Bramley [:jbramley] 2011-07-04 04:17:13 PDT
Created attachment 543743 [details] [diff] [review]
Optimize VFP memory accesses on ARM.

The performance gain is lost in the noise, but the patch is simple and I don't like leaving unreachable code lying around.
Comment 2 Jacob Bramley [:jbramley] 2011-07-05 02:47:21 PDT
Comment on attachment 543743 [details] [diff] [review]
Optimize VFP memory accesses on ARM.

Review of attachment 543743 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/assembler/assembler/ARMAssembler.cpp
@@ +367,5 @@
>              fdtr_u(isLoad, srcDst, ARMRegisters::S0, (offset >> 2) & 0xff);
>              return;
>          }
> +    } else {
> +        if (offset >= 0x3ff) {

That one should be "offset >= -0x3ff".
Comment 3 Chris Leary [:cdleary] (not checking bugmail) 2011-07-13 13:13:18 PDT
Comment on attachment 543743 [details] [diff] [review]
Optimize VFP memory accesses on ARM.

Review of attachment 543743 [details] [diff] [review]:
-----------------------------------------------------------------

All of the (blah << 8) to LSL conversions check out on my end -- kind of crazy that was hardcoded to begin with!

Thanks for the patch!

::: js/src/assembler/assembler/ARMAssembler.h
@@ +318,5 @@
>              m_buffer.putInt(op | RN(rn) | RD(rd) | op2);
>          }
>  
> +        // Work out the pre-shifted constant necessary to encode the specified
> +        // logical shift left for op2 immediates. Only even shifs can be

s/shifs/shifts

@@ +321,5 @@
> +        // Work out the pre-shifted constant necessary to encode the specified
> +        // logical shift left for op2 immediates. Only even shifs can be
> +        // applied.
> +        //
> +        // Input validity is asserted in debug builds.

Is this generally implied?

@@ +324,5 @@
> +        //
> +        // Input validity is asserted in debug builds.
> +        ARMWord getOp2RotLSL(int lsl)
> +        {
> +            ASSERT((lsl >= 0) && (lsl <= 24));

Isn't this valid for all lsl <= 30?
Comment 4 Jacob Bramley [:jbramley] 2011-07-14 00:43:18 PDT
Comment on attachment 543743 [details] [diff] [review]
Optimize VFP memory accesses on ARM.

Review of attachment 543743 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/assembler/assembler/ARMAssembler.h
@@ +318,5 @@
>              m_buffer.putInt(op | RN(rn) | RD(rd) | op2);
>          }
>  
> +        // Work out the pre-shifted constant necessary to encode the specified
> +        // logical shift left for op2 immediates. Only even shifs can be

Good catch, thanks!

@@ +321,5 @@
> +        // Work out the pre-shifted constant necessary to encode the specified
> +        // logical shift left for op2 immediates. Only even shifs can be
> +        // applied.
> +        //
> +        // Input validity is asserted in debug builds.

This method assumes that the calling code is correct, and just asserts if it's not. It acts as a drop-in replacement for all those hard-coded 'rotation' field encodings, so it doesn't try to do anything clever. This method makes the complicated rotation-field encoding look like a left shift, and hopefully makes the calling code easier to read.

@@ +324,5 @@
> +        //
> +        // Input validity is asserted in debug builds.
> +        ARMWord getOp2RotLSL(int lsl)
> +        {
> +            ASSERT((lsl >= 0) && (lsl <= 24));

ARM immediates are formed from an 8-bit value _rotated_ right (by an even number). This method tries to make it look like a shift, but a shift of more than 24 bits would actually be a rotation. I suppose I could rename this to 'getOp2RotROL' (for rotate-left), but I doubt it would ever be useful and 'LSL' is probably more readily recognizable.
Comment 5 Chris Leary [:cdleary] (not checking bugmail) 2011-07-14 11:16:58 PDT
(In reply to comment #4)
> more
> than 24 bits would actually be a rotation. I suppose I could rename this to
> 'getOp2RotROL' (for rotate-left), but I doubt it would ever be useful and
> 'LSL' is probably more readily recognizable.

Thanks for the explanation -- didn't recognize that at first!
Comment 6 Jacob Bramley [:jbramley] 2011-07-22 07:19:31 PDT
By the way, I'm waiting for bug 673078 so I can push this. I haven't forgotten it!
Comment 7 Jacob Bramley [:jbramley] 2011-08-08 09:03:42 PDT
Created attachment 551472 [details] [diff] [review]
Rebase.

Marty, this patch had quite a hefty rebase, mostly around the areas that you changed in one of your recent patches, so if you don't mind I'd like you to give it a quick sanity-check to make sure I haven't broken anything. It's in Try at the moment.
Comment 8 Marty Rosenberg [:mjrosenb] 2011-08-08 14:36:28 PDT
Comment on attachment 551472 [details] [diff] [review]
Rebase.


># HG changeset patch
># Parent c2a23df05c4c280a6936f1f5a875e502a5e15328
>Bug 669132: Optimize VFP memory accesses on ARM. [r=cdleary]
>
>diff --git a/js/src/assembler/assembler/ARMAssembler.cpp b/js/src/assembler/assembler/ARMAssembler.cpp
>--- a/js/src/assembler/assembler/ARMAssembler.cpp
>+++ b/js/src/assembler/assembler/ARMAssembler.cpp
>@@ -288,19 +288,19 @@ void ARMAssembler::dataTransferN(bool is
>         // add upper bits of offset to the base, and store the result into the temp registe
*register.  I think I added that line in.  My bad.

>-        void fdtr_u(bool isLoad, int dd, int rn, ARMWord offset, Condition cc = AL)
>-        {
>-            char const * ins = isLoad ? "vldr.f64" : "vstr.f64";
>-            js::JaegerSpew(js::JSpew_Insns,
>-                    IPFX   "%-15s %s, [%s, #+%u]\n", MAYBE_PAD, ins, nameFpRegD(dd), nameGpReg(rn), offset);
>-            ASSERT(offset <= 0xff);
>-            emitInst(static_cast<ARMWord>(cc) | FDTR | DT_UP | (isLoad ? DT_LOAD : 0), dd, rn, offset);
>-        }
>-
>-        void fdtr_d(bool isLoad, int dd, int rn, ARMWord offset, Condition cc = AL)
>-        {
>-            char const * ins = isLoad ? "vldr.f64" : "vstr.f64";
>-            js::JaegerSpew(js::JSpew_Insns,
>-                    IPFX   "%-15s %s, [%s, #-%u]\n", MAYBE_PAD, ins, nameFpRegD(dd), nameGpReg(rn), offset);
>-            ASSERT(offset <= 0xff);
>-            emitInst(static_cast<ARMWord>(cc) | FDTR | (isLoad ? DT_LOAD : 0), dd, rn, offset);
>-        }
>-
Good to see some of the older functions are being removed.
Comment 9 Gary Kwong [:gkw] [:nth10sd] 2011-08-09 09:36:11 PDT
http://hg.mozilla.org/integration/mozilla-inbound/rev/ac20c581f617
Comment 10 Matt Brubeck (:mbrubeck) 2011-08-09 12:04:21 PDT
Backed out because of test failures on Android:
https://hg.mozilla.org/integration/mozilla-inbound/rev/29e59859d415
Comment 11 Jacob Bramley [:jbramley] 2011-08-16 06:54:47 PDT
Seems Ok on try, so I re-pushed. The failures were either caused by other changesets (as I pushed three), or something else.

http://hg.mozilla.org/integration/mozilla-inbound/rev/25fad7b95686

Note You need to log in before you can comment on or make changes to this bug.