Closed Bug 606441 - Opened 15 years ago - Closed 15 years ago

TM: specialize Math.abs() for integers when possible

Categories

(Core :: JavaScript Engine, defect)

Platform: x86 macOS
Type: defect
Priority: Not set
Severity: normal

Tracking


RESOLVED FIXED

People

(Reporter: n.nethercote, Assigned: n.nethercote)

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(1 file, 1 obsolete file)

imaging-gaussian-blur in Kraken has several lines like this:

  r += squidImageData[4 * ((y + j) * width + (x + i)) + 0] * kernel[Math.abs(j)][Math.abs(i)];

We only have a double version of Math.abs(), so we convert 'i' and 'j' to
doubles, call the double version, then convert the result back to an integer.
The LIR looks like this:

  ldi6 = ldi.sp sp[-16]
  $stack10 = i2d ldi6
  ...
  mathCache = immi 0x91add90
  math_abs_tn1 = calld.none #math_abs_tn ( mathCache/*0x91add90*/ $stack10 )
  std.sp sp[24] = math_abs_tn1
  ...
  d2i1 = d2i math_abs_tn1
  i2d7 = i2d d2i1
  eqd1 = eqd math_abs_tn1, i2d7
  xf7: xf eqd1 -> pc=0x91ca26a imacpc=(nil) sp+32 rp+0 (GuardID=019)
  ...
  ...use d2i1 as the array index...

It can be improved to this:

  ldi6 = ldi.sp sp[-16]
  ...
  neg = negi ldi6
  isNeg = lti ldi6, 0
  abs = cmovi isNeg, neg, ldi6
  sti.sp sp[24] = abs
  ...
  ...use abs as the array index...

The attached draft patch does this.  imaging-gaussian-blur runs ~1.6x faster,
and Kraken overall runs ~1.07x faster.

Unfortunately, Waldo pointed out this is too optimistic because it doesn't
handle INT_MIN correctly.  So I'll need to add a guard for that case, which
will pull the improvement back a little, but hopefully not too much -- avoiding
the function call is the most important part of the change.
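[As a rough illustration of the semantics the improved LIR implements (not the
tracer code itself), a minimal C++ sketch of the cmov-style integer abs might
look like this; the function name is hypothetical:]

  #include <cstdint>
  #include <cassert>

  // Hypothetical helper mirroring the LIR sequence above:
  //   neg   = negi  a
  //   isNeg = lti   a, 0
  //   abs   = cmovi isNeg, neg, a
  // Valid for every int32_t except INT_MIN, whose negation is not
  // representable; the trace has to guard against that case separately.
  static int32_t abs_cmov_style(int32_t a)
  {
      assert(a != INT32_MIN);          // guarded out before we get here
      int32_t neg = -a;                // negi
      bool isNeg = a < 0;              // lti a, 0
      return isNeg ? neg : a;          // cmovi (compilers often emit cmov)
  }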
I made this version:

  LIns* shift_right = lir->ins2(LIR_rshi, a, lir->insImmI(31));
  LIns* _xor = lir->ins2(LIR_xori, shift_right, a);
  LIns* sub = lir->ins2(LIR_subi, _xor, shift_right);

  set(&vp[0], i2d(abs_ins));

Based on the code MSVC generates:

  cdq
  xor eax, edx
  sub eax, edx

And because we again don't have cdq, I replaced that instruction with:

  mov edx, eax
  sar edx, 31
  xor eax, edx
  sub eax, edx

This solution is a tiny bit faster and avoids branching, but the guard for
INT_MIN is needed anyway.
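[For reference, a hypothetical C++ sketch of the shift/xor/sub trick the LIR
above computes.  It relies on arithmetic right shift of a negative value
(guaranteed since C++20, and what LIR_rshi does) and on the caller having
already excluded INT_MIN, whose negation is not representable:]

  #include <cstdint>

  static int32_t abs_shift_xor_sub(int32_t a)
  {
      int32_t mask = a >> 31;          // 0 if a >= 0, -1 (all ones) if a < 0
      return (a ^ mask) - mask;        // a unchanged for a >= 0, -a for a < 0
  }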
(In reply to comment #1)
> I made this version:
>   LIns* shift_right = lir->ins2(LIR_rshi, a, lir->insImmI(31));
>   LIns* _xor = lir->ins2(LIR_xori, shift_right, a);
>   LIns* sub = lir->ins2(LIR_subi, _xor, shift_right);
>
>   set(&vp[0], i2d(abs_ins));
>
> Based on the code MSVC generates:
>   cdq
>   xor eax, edx
>   sub eax, edx
>
> And because we again don't have cdq, I replaced that instruction with:
>   mov edx, eax
>   sar edx, 31
>   xor eax, edx
>   sub eax, edx
>
> This solution is a tiny bit faster and avoids branching, but the guard for
> INT_MIN is needed anyway.

Thanks for the suggestion.  To paraphrase, you've done this:

  mask = a >> 31
  abs = (a ^ mask) - mask

According to http://graphics.stanford.edu/~seander/bithacks.html#IntegerAbs
that version is patented (how ridiculous!) but this similar version is not:

  mask = a >> 31
  abs = (a + mask) ^ mask

I tried that and surprisingly enough it was clearly worse than my cmov version.
First, Nanojit generated one more x86 instruction:

  negi1 = negi ldi1                        0xf6db6f6d  mov ecx,ebx
                                           0xf6db6f6f  neg ecx
  isNeg = lti ldi1, strict/*0*/
  abs = cmovi isNeg ? negi1 : ldi1         0xf6db6f71  cmp ebx,0
                                           0xf6db6f74  cmovge ecx,ebx

versus:

  immi4 = immi 31
  mask = rshi ldi1, immi4/*31*/            0xf6cfff6c  mov edx,ebx
                                           0xf6cfff6e  sar edx,31
  addi2 = addi ldi1, mask                  0xf6cfff71  mov ecx,ebx
                                           0xf6cfff73  add ecx,edx
  abs = xori addi2, mask                   0xf6cfff75  xor ecx,edx

On my MacBook Pro, my version gave a speed-up of 1.65x (this is with the
INT_MIN guard, so that didn't hurt at all, thankfully), and the non-cmov
version gave a speed-up of 1.48x.  On my Linux desktop the corresponding
numbers were 1.45x and 1.34x.

Maybe the non-cmov version is slower because the computation is entirely
serial, compared to the cmov version where the 'neg' and the 'cmp' can be done
in parallel.  Since the cmov version is faster and more obvious I'll stick
with it.

dvander: I'm confident about the LIR generation, but less so about the
native/get/set/pendingSpecializedNative changes, which I cribbed from the
charAt()-handling code, so please check that carefully.
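[A hypothetical C++ sketch of the non-patented variant tried above, again
assuming INT_MIN has been guarded out.  Note the fully serial dependency chain
(shift, then add, then xor); in the cmov version the neg and the cmp have no
dependency on each other, which may be why it measured faster here:]

  #include <cstdint>

  static int32_t abs_add_xor(int32_t a)
  {
      int32_t mask = a >> 31;          // 0 if a >= 0, -1 if a < 0
      return (a + mask) ^ mask;        // a for a >= 0, -a for a < 0
  }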
Attachment #485254 - Attachment is obsolete: true
Attachment #485640 - Flags: review?(dvander)
Comment on attachment 485640 [details] [diff] [review]
patch (against TM 55731:95ba3d6a1ecf)

>+    /* abs(INT_MIN) can't be done using integers; exit if we see it. */
>+    LIns* intMin_ins = addName(lir->insImmI(0x80000000), "INT_MIN");
>+    LIns* isIntMin_ins = addName(lir->ins2(LIR_eqi, a, intMin_ins), "isIntMin");
>+    guard(false, isIntMin_ins, OVERFLOW_EXIT);

Need a compile-time check for this to deoptimize, otherwise we'll keep trying
to build new traces (unless you change that to MISMATCH_EXIT).  r=me w/ that.
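[For context on why the guard is needed: in 32-bit two's complement, -INT_MIN
is not representable, so every pure-integer abs trick above hands INT_MIN back
unchanged.  A minimal stand-alone sketch (not the patch's code) showing the
wrap-around, computed through unsigned arithmetic so it is well defined in C++:]

  #include <cstdint>
  #include <cstdio>

  int main()
  {
      int32_t a = INT32_MIN;                        // 0x80000000
      // Negation wraps at the machine level: the result is INT_MIN again.
      int32_t negated = int32_t(0u - uint32_t(a));
      std::printf("integer abs of %d = %d\n", a, negated);
      // Prints: integer abs of -2147483648 = -2147483648
      return 0;
  }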
Attachment #485640 - Flags: review?(dvander) → review+
Whiteboard: fixed-in-tracemonkey
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED