Nitro's FP loads are inefficient on ARM.

RESOLVED FIXED in mozilla9

Status

()

Core
JavaScript Engine
--
enhancement
RESOLVED FIXED
6 years ago
6 years ago

People

(Reporter: jbramley, Assigned: jbramley)

Tracking

unspecified
mozilla9
ARM
All
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 1 obsolete attachment)

(Assignee)

Description

6 years ago
We actually have some optimized load sequences in ARMAssembler::doubleTransfer, but they're hidden behind an alignment check that will never pass, so the optimized sequences are unreachable.

http://mxr.mozilla.org/mozilla-central/source/js/src/assembler/assembler/ARMAssembler.cpp#349

Ironically, the slow-case (which isn't guarded by the alignment check) will also fail if given an unaligned address. (We assert this, though WebKit don't.)

This is a fairly trivial fix and might come with a performance boost.
(Assignee)

Comment 1

6 years ago
Created attachment 543743 [details] [diff] [review]
Optimize VFP memory accesses on ARM.

The performance gain is lost in the noise, but the patch is simple and I don't like leaving unreachable code lying around.
Attachment #543743 - Flags: review?(cdleary)
(Assignee)

Comment 2

6 years ago
Comment on attachment 543743 [details] [diff] [review]
Optimize VFP memory accesses on ARM.

Review of attachment 543743 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/assembler/assembler/ARMAssembler.cpp
@@ +367,5 @@
>              fdtr_u(isLoad, srcDst, ARMRegisters::S0, (offset >> 2) & 0xff);
>              return;
>          }
> +    } else {
> +        if (offset >= 0x3ff) {

That one should be "offset >= -0x3ff".
Comment on attachment 543743 [details] [diff] [review]
Optimize VFP memory accesses on ARM.

Review of attachment 543743 [details] [diff] [review]:
-----------------------------------------------------------------

All of the (blah << 8) to LSL conversions check out on my end -- kind of crazy that was hardcoded to begin with!

Thanks for the patch!

::: js/src/assembler/assembler/ARMAssembler.h
@@ +318,5 @@
>              m_buffer.putInt(op | RN(rn) | RD(rd) | op2);
>          }
>  
> +        // Work out the pre-shifted constant necessary to encode the specified
> +        // logical shift left for op2 immediates. Only even shifs can be

s/shifs/shifts

@@ +321,5 @@
> +        // Work out the pre-shifted constant necessary to encode the specified
> +        // logical shift left for op2 immediates. Only even shifs can be
> +        // applied.
> +        //
> +        // Input validity is asserted in debug builds.

Is this generally implied?

@@ +324,5 @@
> +        //
> +        // Input validity is asserted in debug builds.
> +        ARMWord getOp2RotLSL(int lsl)
> +        {
> +            ASSERT((lsl >= 0) && (lsl <= 24));

Isn't this valid for all lsl <= 30?
Attachment #543743 - Flags: review?(cdleary) → review+
(Assignee)

Comment 4

6 years ago
Comment on attachment 543743 [details] [diff] [review]
Optimize VFP memory accesses on ARM.

Review of attachment 543743 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/assembler/assembler/ARMAssembler.h
@@ +318,5 @@
>              m_buffer.putInt(op | RN(rn) | RD(rd) | op2);
>          }
>  
> +        // Work out the pre-shifted constant necessary to encode the specified
> +        // logical shift left for op2 immediates. Only even shifs can be

Good catch, thanks!

@@ +321,5 @@
> +        // Work out the pre-shifted constant necessary to encode the specified
> +        // logical shift left for op2 immediates. Only even shifs can be
> +        // applied.
> +        //
> +        // Input validity is asserted in debug builds.

This method assumes that the calling code is correct, and just asserts if it's not. It acts as a drop-in replacement for all those hard-coded 'rotation' field encodings, so it doesn't try to do anything clever. This method makes the complicated rotation-field encoding look like a left shift, and hopefully makes the calling code easier to read.

@@ +324,5 @@
> +        //
> +        // Input validity is asserted in debug builds.
> +        ARMWord getOp2RotLSL(int lsl)
> +        {
> +            ASSERT((lsl >= 0) && (lsl <= 24));

ARM immediates are formed from an 8-bit value _rotated_ right (by an even number). This method tries to make it look like a shift, but a shift of more than 24 bits would actually be a rotation. I suppose I could rename this to 'getOp2RotROL' (for rotate-left), but I doubt it would ever be useful and 'LSL' is probably more readily recognizable.
(In reply to comment #4)
> more
> than 24 bits would actually be a rotation. I suppose I could rename this to
> 'getOp2RotROL' (for rotate-left), but I doubt it would ever be useful and
> 'LSL' is probably more readily recognizable.

Thanks for the explanation -- didn't recognize that at first!
(Assignee)

Comment 6

6 years ago
By the way, I'm waiting for bug 673078 so I can push this. I haven't forgotten it!
(Assignee)

Comment 7

6 years ago
Created attachment 551472 [details] [diff] [review]
Rebase.

Marty, this patch had quite a hefty rebase, mostly around the areas that you changed in one of your recent patches, so if you don't mind I'd like you to give it a quick sanity-check to make sure I haven't broken anything. It's in Try at the moment.
Attachment #543743 - Attachment is obsolete: true
Attachment #551472 - Flags: review?(mrosenberg)
Comment on attachment 551472 [details] [diff] [review]
Rebase.


># HG changeset patch
># Parent c2a23df05c4c280a6936f1f5a875e502a5e15328
>Bug 669132: Optimize VFP memory accesses on ARM. [r=cdleary]
>
>diff --git a/js/src/assembler/assembler/ARMAssembler.cpp b/js/src/assembler/assembler/ARMAssembler.cpp
>--- a/js/src/assembler/assembler/ARMAssembler.cpp
>+++ b/js/src/assembler/assembler/ARMAssembler.cpp
>@@ -288,19 +288,19 @@ void ARMAssembler::dataTransferN(bool is
>         // add upper bits of offset to the base, and store the result into the temp registe
*register.  I think I added that line in.  My bad.

>-        void fdtr_u(bool isLoad, int dd, int rn, ARMWord offset, Condition cc = AL)
>-        {
>-            char const * ins = isLoad ? "vldr.f64" : "vstr.f64";
>-            js::JaegerSpew(js::JSpew_Insns,
>-                    IPFX   "%-15s %s, [%s, #+%u]\n", MAYBE_PAD, ins, nameFpRegD(dd), nameGpReg(rn), offset);
>-            ASSERT(offset <= 0xff);
>-            emitInst(static_cast<ARMWord>(cc) | FDTR | DT_UP | (isLoad ? DT_LOAD : 0), dd, rn, offset);
>-        }
>-
>-        void fdtr_d(bool isLoad, int dd, int rn, ARMWord offset, Condition cc = AL)
>-        {
>-            char const * ins = isLoad ? "vldr.f64" : "vstr.f64";
>-            js::JaegerSpew(js::JSpew_Insns,
>-                    IPFX   "%-15s %s, [%s, #-%u]\n", MAYBE_PAD, ins, nameFpRegD(dd), nameGpReg(rn), offset);
>-            ASSERT(offset <= 0xff);
>-            emitInst(static_cast<ARMWord>(cc) | FDTR | (isLoad ? DT_LOAD : 0), dd, rn, offset);
>-        }
>-
Good to see some of the older functions are being removed.
Attachment #551472 - Flags: review?(mrosenberg) → review+
(Assignee)

Updated

6 years ago
Whiteboard: [inbound]
http://hg.mozilla.org/integration/mozilla-inbound/rev/ac20c581f617
Backed out because of test failures on Android:
https://hg.mozilla.org/integration/mozilla-inbound/rev/29e59859d415
Whiteboard: [inbound]
(Assignee)

Comment 11

6 years ago
Seems Ok on try, so I re-pushed. The failures were either caused by other changesets (as I pushed three), or something else.

http://hg.mozilla.org/integration/mozilla-inbound/rev/25fad7b95686
Whiteboard: [inbound]
http://hg.mozilla.org/mozilla-central/rev/25fad7b95686
Status: ASSIGNED → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Whiteboard: [inbound]
Target Milestone: --- → mozilla9
You need to log in before you can comment on or make changes to this bug.