Closed Bug 707644 Opened 13 years ago Closed 13 years ago

f2i appears very poorly implemented

Categories

(Tamarin Graveyard :: Baseline JIT (CodegenLIR), defect, P2)

x86
macOS
defect

Tracking

(Not tracked)

VERIFIED FIXED
Q2 12 - Cyril

People

(Reporter: lhansen, Assigned: edwsmith)

References

Details

Test case: package { function f(x:float): int { return x; } f(0.0f); } Generated code: immi2 = immi 4 (in reg? -1) ldf1 = ldf.o/c parami3[4] (in reg? -1) 0x671fb2 xorpd xmm0,xmm0 0x671fb6 movss xmm0,4(eax) floatToAtom1 = calli.none #floatToAtom ( immi1/*0x516020 core*/ ldf1 ) (in reg? -1) 0x671fbb sub esp,8 0x671fbe sub esp,4 0x671fc1 movss 0(esp),xmm0 0x671fc6 push 5333024 0x671fcb call floatToAtom 0x671fd0 add esp,16 integer1 = calli.none #integer ( floatToAtom1 ) (in reg? -1) 0x671fd3 sub esp,12 0x671fd6 push eax 0x671fd7 call integer 0x671fdc add esp,16 ldi4 = ldi.o methodFrame[0] (in reg? -1) 0x671fdf mov ecx,-12(ebp) sti.o immi1/*0x516020 core*/[44] = ldi4 (in reg? -1) 0x671fe2 mov 5333068(0),ecx reti integer1 (in reg? -1) 0x671fe8 mov esp,ebp 0x671fea pop ebp 0x671feb ret livei methodFrame (in reg? -1) livei parami1 (in reg? -1) livei immi1/*0x516020 core*/ (in reg? -1) livei immi2/*4*/ (in reg? -1) If I read that correctly we first call floatToAtom() to box the atom, then integer() to convert the atom to an int. Given the guaranteed boxing that is fairly bad. In contrast, for a Number -> int conversion we call doubleToInt32 directly (and then only if the argument is not an integer value).
Assignee: nobody → edwsmith
Doing a fast f2i (CVTTSS2SI), then taking a slow path for inexact results is straightfoward. in the slow path, is doubleToInt32((double)f) okay, or should we develop an optimized floatToInt32() helper? (a lot of bit banging happens in doubleToInt32). I'm inclined to just reuse the code for double instead of rewrite it.
For now it's probably OK to reuse the double code but you should file a bug to follow up on it unless you have good evidence that the performance difference between that and the "optimal" code is close. Which it might be, given that it's the slow path.
It appears that promoting to float, then using all the existing double->int machenery, is the fastest thing to do at least on x86-32: float->double->int (CVTTSD2SI fastpath, doubleToInt32 slowpath) test avg avg %diff Dir: asmicro/ float-toint-1 264.2 1591.4 502.4 502.4 // int-valued floats float-toint-2 265.2 1601.4 503.8 503.8 // fractional floats float-toint-3 207.8 437.6 110.6 110.6 // out-of-range floats float->int (CVTTSS2SI fastpath, doubleToInt32(float->double) slowpath) test avg avg %diff Dir: asmicro/ float-toint-1 265.7 1272.7 379.1 379.1 float-toint-2 265.2 1254.7 373.1 373.1 float-toint-3 206.6 62.3 -69.9 -69.9 Notice how along the fast *and* slow path, its faster on x86 to use doubles.
tr-float: changeset: 7032:16fc97ad3fc6 tag: tip user: Edwin Smith <edwsmith@adobe.com> date: Mon Dec 05 12:55:34 2011 -0500 summary: Bug 707644 - f2i appears very poorly implemented This patch implements f2i as AvmCore::integer_d(double(f)), which tested fastest on my Core i7 notebook with x83-32. I opened Bug 707738 to investigate why CVTTSS2SI wasn't faster; in the mean time this establishes a toe-hold.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Blocks: 708204
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.