Closed Bug 707644 Opened 13 years ago Closed 13 years ago

f2i appears very poorly implemented

Categories

(Tamarin Graveyard :: Baseline JIT (CodegenLIR), defect, P2)

x86
macOS
defect

Tracking

(Not tracked)

VERIFIED FIXED
Q2 12 - Cyril

People

(Reporter: lhansen, Assigned: edwsmith)

References

Details

Test case:

package
{
  function f(x:float): int
  {
    return x;
  }
  f(0.0f);
}

Generated code:

    immi2 = immi 4 (in reg? -1)
    ldf1 = ldf.o/c parami3[4] (in reg? -1)
0x671fb2  xorpd xmm0,xmm0
0x671fb6  movss xmm0,4(eax)
    floatToAtom1 = calli.none #floatToAtom ( immi1/*0x516020 core*/ ldf1 )
      (in reg? -1)
0x671fbb  sub esp,8
0x671fbe  sub esp,4
0x671fc1  movss 0(esp),xmm0
0x671fc6  push 5333024
0x671fcb  call floatToAtom
0x671fd0  add esp,16
    integer1 = calli.none #integer ( floatToAtom1 ) (in reg? -1)
0x671fd3  sub esp,12
0x671fd6  push eax
0x671fd7  call integer
0x671fdc  add esp,16
    ldi4 = ldi.o methodFrame[0] (in reg? -1)
0x671fdf  mov ecx,-12(ebp)
    sti.o immi1/*0x516020 core*/[44] = ldi4 (in reg? -1)
0x671fe2  mov 5333068(0),ecx
    reti integer1 (in reg? -1)
0x671fe8  mov esp,ebp
0x671fea  pop ebp
0x671feb  ret
    livei methodFrame (in reg? -1)
    livei parami1 (in reg? -1)
    livei immi1/*0x516020 core*/ (in reg? -1)
    livei immi2/*4*/ (in reg? -1)

If I read that correctly we first call floatToAtom() to box the atom, then integer() to convert the atom to an int.  Given the guaranteed boxing that is fairly bad.

In contrast, for a Number -> int conversion we call doubleToInt32 directly (and then only if the argument is not an integer value).
Assignee: nobody → edwsmith
Doing a fast f2i (CVTTSS2SI), then taking a slow path for inexact results is straightfoward.  in the slow path, is doubleToInt32((double)f) okay, or should we develop an optimized floatToInt32() helper?  (a lot of bit banging happens in doubleToInt32).  I'm inclined to just reuse the code for double instead of rewrite it.
For now it's probably OK to reuse the double code but you should file a bug to follow up on it unless you have good evidence that the performance difference between that and the "optimal" code is close.  Which it might be, given that it's the slow path.
It appears that promoting to float, then using all the existing double->int machenery, is the fastest thing to do at least on x86-32:

float->double->int (CVTTSD2SI fastpath, doubleToInt32 slowpath)

test                          avg     avg     %diff
Dir: asmicro/
  float-toint-1               264.2  1591.4   502.4   502.4   // int-valued floats
  float-toint-2               265.2  1601.4   503.8   503.8   // fractional floats
  float-toint-3               207.8   437.6   110.6   110.6   // out-of-range floats

float->int (CVTTSS2SI fastpath, doubleToInt32(float->double) slowpath)

test                          avg     avg     %diff
Dir: asmicro/
  float-toint-1               265.7  1272.7   379.1   379.1
  float-toint-2               265.2  1254.7   373.1   373.1
  float-toint-3               206.6    62.3   -69.9   -69.9

Notice how along the fast *and* slow path, its faster on x86 to use doubles.
tr-float:

changeset:   7032:16fc97ad3fc6
tag:         tip
user:        Edwin Smith <edwsmith@adobe.com>
date:        Mon Dec 05 12:55:34 2011 -0500
summary:     Bug 707644 - f2i appears very poorly implemented

This patch implements f2i as AvmCore::integer_d(double(f)), which
tested fastest on my Core i7 notebook with x83-32.  I opened Bug 707738
to investigate why CVTTSS2SI wasn't faster; in the mean time this establishes 
a toe-hold.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Blocks: 708204
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.