Closed Bug 588033 Opened 14 years ago Closed 2 years ago

IonMonkey: Investigate the use of LDRD and STRD for handling fatvals on ARM.

Categories

(Core :: JavaScript Engine, enhancement)

ARM
Linux
enhancement
Not set
normal

RESOLVED WORKSFORME

People

(Reporter: jbramley, Unassigned)

Attachments

(1 file)

ARM can perform 64-bit integer loads and stores using LDRD and STRD. These would be ideal for handling fatvals. However, for ARM code at least, there is one significant restriction: the two registers used must form an adjacent pair whose first register is even-numbered (like r0-r1, but not r1-r2).
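The pairing restriction can be sketched as a small predicate. This is only an illustration: `ldrd_pair_ok` is a hypothetical helper name, not something taken from the attached patch or the engine, and it assumes ARM-state encodings with registers numbered 0-15.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not from the patch): can ARM-state LDRD/STRD
 * transfer the register pair (rt, rt2)? Registers are numbered 0-15. */
static bool ldrd_pair_ok(unsigned rt, unsigned rt2)
{
    assert(rt < 16 && rt2 < 16);
    if (rt % 2 != 0)      /* first register must be even-numbered */
        return false;
    if (rt2 != rt + 1)    /* second register must be the next one up */
        return false;
    if (rt == 14)         /* r14 would pair with pc (r15), so exclude it */
        return false;
    return true;
}
```

With this check, r0-r1 and r2-r3 are accepted, while r1-r2 is rejected because its first register is odd.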

In order to make use of this on ARM, the register allocator needs to be able to find adjacent registers. Also, the scratch register set should really put two scratch registers next to each other so the back-end can issue LDRD and STRD even if JM specifies awkward arguments (like immediates).
A good example, from InjectJaegerReturn:

    "ldr r1, [r11, #40]"                    "\n" /* fp->rval data */
    "ldr r2, [r11, #44]"                    "\n" /* fp->rval type */

If a more suitable register pair were used (such as r2+r3), the following would be preferable:

    "ldrd r2, r3, [r11, #40]"               "\n"

It shouldn't be too hard to re-assign S0 to r1 and use r2-r3 for the data/type pair.
Attached patch v1: Use LDRD.
This patch makes very little difference to performance with either SunSpider or V8 (on A9 at least). I think the LDRD sequence is faster than the old LDR-based one, but I've reserved an extra scratch register for some variants so that probably negates any benefit.

Also note that I haven't changed the register allocation patterns, so we won't be making use of all the new code-paths.

I'll investigate further at some point.
If the register allocator is not nice to us, we can still produce more efficient code in some cases by using the writeback bit.
I am thinking specifically of the code generated by load64WithAddressOffsetPatch, which looks like:

    ldr s0, c
    add s0, s0, r_base
    ldr dest1, [s0]
    ldr dest2, [s0, #4]

We should be able to eliminate the add by using pre-indexed addressing with writeback:

    ldr s0, c
    ldr dest1, [s0, r_base]!
    ldr dest2, [s0, #4]

I suspect that this in particular won't help by a huge amount, but there are likely other places that we can apply this.
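To see why the two sequences are equivalent, here is a rough C model of both. It is a sketch under assumptions, not engine code: memory is modelled as a byte buffer, addresses are offsets into that buffer, and the names `load_word`, `load64_with_add` and `load64_with_writeback` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint32_t dest1, dest2; } Pair64;

/* Read one 32-bit word at byte offset addr of the modelled memory. */
static uint32_t load_word(const uint8_t *mem, size_t size, uint32_t addr)
{
    uint32_t w;
    assert((size_t)addr + sizeof w <= size);
    memcpy(&w, mem + addr, sizeof w);
    return w;
}

/* Original sequence: ldr / add / ldr / ldr. */
static Pair64 load64_with_add(const uint8_t *mem, size_t size,
                              uint32_t r_base, uint32_t c)
{
    Pair64 out;
    uint32_t s0 = c;                          /* ldr s0, c           */
    s0 = s0 + r_base;                         /* add s0, s0, r_base  */
    out.dest1 = load_word(mem, size, s0);     /* ldr dest1, [s0]     */
    out.dest2 = load_word(mem, size, s0 + 4); /* ldr dest2, [s0, #4] */
    return out;
}

/* Proposed sequence: the pre-indexed load computes s0 + r_base,
 * loads from that address, and writes the address back into s0. */
static Pair64 load64_with_writeback(const uint8_t *mem, size_t size,
                                    uint32_t r_base, uint32_t c)
{
    Pair64 out;
    uint32_t s0 = c;                          /* ldr s0, c                */
    uint32_t addr = s0 + r_base;              /* ldr dest1, [s0, r_base]! */
    out.dest1 = load_word(mem, size, addr);
    s0 = addr;                                /* writeback updates s0     */
    out.dest2 = load_word(mem, size, s0 + 4); /* ldr dest2, [s0, #4]      */
    return out;
}
```

Both functions read the same two words, so the writeback form saves one instruction without changing the result. Note that the model does not cover the case where dest1 and s0 are the same register, which the writeback form cannot use.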
Does IM's ARM backend do this?
Flags: needinfo?(mrosenberg)
Yes and no. IM's backend is able to emit ldrd for loading a value, but it requires the register allocator to place the two halves of the value in two consecutive registers, with the lower register even-numbered *and* the tag in the higher register. Unfortunately, in the examples that I've looked at (several months ago at this point), the register allocator always puts the tag in the lower register, meaning we almost never actually use this instruction. The other hare-brained ideas that I mentioned in this bug have not been implemented for IM.
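The allocation condition described in this comment can be sketched as follows. This is an illustrative model only: `pick_value_load` and the `ValueLoad` enum are hypothetical names, and the sketch assumes the payload sits at the lower address with the tag 4 bytes above it, as in the fp->rval offsets (#40 data, #44 type) quoted earlier in this bug.

```c
#include <assert.h>

typedef enum { LOAD_LDRD, LOAD_TWO_LDR } ValueLoad;

/* Hypothetical sketch: choose how to load a boxed value given the
 * registers the allocator picked for its payload and its tag. */
static ValueLoad pick_value_load(unsigned payload_reg, unsigned tag_reg)
{
    assert(payload_reg < 16 && tag_reg < 16 && payload_reg != tag_reg);
    /* LDRD fills the even register from the lower address and the next
     * register from address + 4, so the payload must be in the even
     * register and the tag in the register directly above it. */
    if (payload_reg % 2 == 0 && payload_reg != 14 &&
        tag_reg == payload_reg + 1)
        return LOAD_LDRD;
    /* Otherwise fall back to two separate LDR instructions. */
    return LOAD_TWO_LDR;
}
```

Under this model, an allocation that puts the tag in the lower register (for example tag in r2, payload in r3) always falls back to two loads, which matches the behaviour reported above.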
Flags: needinfo?(mrosenberg)
Given comment 5, I'm changing the description to tag ion.
Summary: JM: Investigate the use of LDRD and STRD for handling fatvals on ARM. → IonMonkey: Investigate the use of LDRD and STRD for handling fatvals on ARM.

The bug assignee hasn't logged in to Bugzilla in the last 7 months.
:sdetar, could you have a look please?
For more information, please visit auto_nag documentation.

Assignee: Jacob.Bramley → nobody
Status: ASSIGNED → NEW
Flags: needinfo?(sdetar)

We use LDRD/STRD when possible, and it's not worth investing time for ARM32 perf optimizations at this point.

Status: NEW → RESOLVED
Closed: 2 years ago
Flags: needinfo?(sdetar)
Resolution: --- → WORKSFORME