Closed Bug 520905 Opened 15 years ago Closed 15 years ago

TM: collapse callee saved register spills/reloads into LDMIA/STMIA instructions

Categories

(Core :: JavaScript Engine, defect)

ARM
All
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: gal, Assigned: jseward)

References

Details

(Whiteboard: fixed-in-nanojit, fixed-in-tracemonkey)

Attachments

(1 file)

## compiling trunk 615C2CA0
      r4 = iparam 0 r4
  0061bdaa24   str r4, [fp, #-8]      r4(r4)  <= spill r4
      r5 = iparam 1 r5
  0061bdaa28   str r5, [fp, #-12]     r5(r5)  <= spill r5
      r6 = iparam 2 r6
  0061bdaa2c   str r6, [fp, #-16]     r6(r6)  <= spill r6
      r7 = iparam 3 r7
  0061bdaa30   str r7, [fp, #-20]     r7(r7)  <= spill r7
      r8 = iparam 4 r8
  0061bdaa34   str r8, [fp, #-24]     r8(r8)  <= spill r8
      r9 = iparam 5 r9
  0061bdaa38   str r9, [fp, #-28]     r9(r9)  <= spill r9
      r10 = iparam 6 r10
  0061bdaa3c   str r10, [fp, #-32]    r10(r10)  <= spill r10
      state = iparam 0 r0

...

  ## merging registers (intersect) with existing edge
  0061bd1464           restore r10   
  0061bd1464   ldr r10, [fp, #-32]   
  0061bd1468           restore r9     r10(r10)
  0061bd1468   ldr r9, [fp, #-28]     r10(r10)
  0061bd146c           restore r8     r9(r9) r10(r10)
  0061bd146c   ldr r8, [fp, #-24]     r9(r9) r10(r10)
  0061bd1470           restore r7     r8(r8) r9(r9) r10(r10)
  0061bd1470   ldr r7, [fp, #-20]     r8(r8) r9(r9) r10(r10)
  0061bd1474           restore r6     r7(r7) r8(r8) r9(r9) r10(r10)
  0061bd1474   ldr r6, [fp, #-16]     r7(r7) r8(r8) r9(r9) r10(r10)
  0061bd1478           restore r5     r6(r6) r7(r7) r8(r8) r9(r9) r10(r10)
  0061bd1478   ldr r5, [fp, #-12]     r6(r6) r7(r7) r8(r8) r9(r9) r10(r10)
  0061bd147c           restore r4     r5(r5) r6(r6) r7(r7) r8(r8) r9(r9) r10(r10)
  0061bd147c   ldr r4, [fp, #-8]      r5(r5) r6(r6) r7(r7) r8(r8) r9(r9) r10(r10)

The backend should recognize LIR_ld/LIR_sti sequences that can be collapsed into load/store multiple instructions.
Blocks: 506138
The "platform" field should be ARM, right?
Yup.
OS: Mac OS X → All
Hardware: x86 → ARM
Blocks: 526914
No longer blocks: 506138
(In reply to comment #0)
> The backend should recognize LIR_ld/LIR_sti sequences that can be collapsed
> into load/store multiple instructions.

Yes.  However, this isn't as simple as it sounds, because the sequences
as shown are the "wrong" way round.  LDM/STM require the lowest numbered
register to be at the lowest address, and what we've got currently has
the highest numbered register at the lowest address.  That's easily
enough fixed by changing the direction of the loop in 
Assembler::intersectRegisterState.
Assignee: general → jseward
Status: NEW → ASSIGNED
It's worth noting that on Cortex-A9, you'll do better to have a few small LDMs than one big one. The practical upshot of this is that you get the data earlier than you would do with a single LDM. Of course, it's a trade-off between code size and data latency, and as there are multiple factors it's hard to give general guidance, but it's usually beneficial to stick to around 4 registers in a single LDM.

On Cortex-A8, you're _probably_ better off with a single LDMIA, but when compared to two 4-register LDMs, the difference is minimal.

None of this is too critical; you'll only save a couple of cycles anyway (except in the probably rare case where you get a cache line miss on the first few registers but a hit in later ones), but it's worth noting, and if we can easily limit each LDM to four registers, it's worth doing.
> patch to generate LDMDB/STMDB insns

Gives a 1.1% improvement on SunSpider default set and 2.1% on the v8
set.  Although SunSpider is even noisier on ARM linux than it is on
x86 linux, for whatever reason.

It generates LDMDB for the exit blocks and STMDB for the entries.
On the exits, (which obviously are the most significant target,
because there are a lot of them) it merges 8 LDR instructions into
a single LDMDB; on the entries, it merges 7 stores into one.  Code
size numbers obtained by TMFLAGS=fragprofile indicate roughly a 1/3
reduction in the amount of exit code in total.
Attachment #411752 - Flags: review?(vladimir)
Comment on attachment 411752 [details] [diff] [review]
patch to generate LDMDB/STMDB insns

Looks good!  Thanks for the comments as well, they made reviewing much easier.
Attachment #411752 - Flags: review?(vladimir) → review+
Keywords: checkin-needed
http://hg.mozilla.org/tracemonkey/rev/92ea7f43feb6
Keywords: checkin-needed
Whiteboard: fixed-in-nanojit, fixed-in-tracemonkey
http://hg.mozilla.org/mozilla-central/rev/92ea7f43feb6
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: