Closed Bug 552868 Opened 14 years ago Closed 14 years ago

TM: merge limit and capacity checks in denseArrayElement()

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: n.nethercote, Assigned: n.nethercote)

References

Details

Once the dslots==NULL check is removed (bug 552837) the limit and capacity checks can be merged into a single guard.  This probably won't improve the generated code, but it'll improve compile-time because there will be fewer calls to asm_leave_trace(), which is expensive.
Wow, this turns out to be a terrible idea.  With two guards the code looks
like this:

      ult8 = ult ld1, ld22       # codegen'd with the xf
      xf14: xf ult8 -> pc=...... imacpc=(nil) sp+32 rp+0 (GuardID=003)
  ......   cmp ebx,eax
  ......   jnb   ......
  ----------------------------------- ## END exit block ......
      ld20 = ld ld21[-4]
  ......   mov eax,-4(ecx)
      ult7 = ult ld1, ld20       # codegen'd with the xf
      xf13: xf ult7 -> pc=...... imacpc=(nil) sp+32 rp+0 (GuardID=005)
  ......   cmp ebx,eax
  ......   jnb   ......
  ----------------------------------- ## BEGIN exit block (LIR_xt|LIR_xf)
  ----------------------------------- ## END exit block ......

With them combined into an and-expression and a single guard, the code looks
like this:

      ult8 = ult ld1, ld22
  ......   cmp ebx,eax
  ......   setb eax
  ......   movzx eax,eax
      ld20 = ld ld21[-4]
  ......   mov edx,-4(ecx)
      ult7 = ult ld1, ld20
  ......   cmp ebx,edx
  ......   setb edx
  ......   movzx edx,edx
      and17 = and ult8, ult7
  ......   and eax,edx
      eq14 = eq and17, 0       # codegen'd with the xt
      xt10: xt eq14 -> pc=...... imacpc=(nil) sp+32 rp+0 (GuardID=003)
  ......   test eax,eax
  ......   je    ......

Basically, on x86 saving the result of a conditional test is awkward.  The
net result is that the array access fast path has 5 more x86 instructions.
The effect on SunSpider instruction counts is unsurprisingly bad.

However, if we instead find the minimum of 'length' and 'capacity' and then
compare the index against that:

     ld20 = ld ld22[-4]
  ......   mov edx,-4(ecx)
      ult8 = ult ld21, ld20       # codegen'd with the cmov
      cmov4 = cmov ult8 ? ld21 : ld20
  ......   cmp eax,edx
  ......   cmovae eax,edx
      ult7 = ult ld1, cmov4       # codegen'd with the xf
      xf10: xf ult7 -> pc=...... imacpc=(nil) sp+32 rp+0 (GuardID=003)
  ......   cmp ebx,eax
  ......   jnb   ......
  ----------------------------------- ## BEGIN exit block (LIR_xt|LIR_xf)
  ----------------------------------- ## END exit block ......

Back to five x86 instructions.  More analysis later...
Abandoning this, doesn't seem like a good idea.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.