Closed
Bug 606441
Opened 15 years ago
Closed 15 years ago
TM: specialize Math.abs() for integers when possible
Categories
(Core :: JavaScript Engine, defect)
RESOLVED
FIXED
People
(Reporter: n.nethercote, Assigned: n.nethercote)
Details
(Whiteboard: fixed-in-tracemonkey)
Attachments
(1 file, 1 obsolete file)
5.82 KB, patch | dvander: review+ | Details | Diff | Splinter Review
imaging-gaussian-blur in Kraken has several lines like this:
r += squidImageData[4 * ((y + j) * width + (x + i)) + 0] * kernel[Math.abs(j)][Math.abs(i)];
We only have a double version of Math.abs(), so we convert 'i' and 'j' to
doubles, call the double version, then convert the result back to an
integer. The LIR looks like this:
ldi6 = ldi.sp sp[-16]
$stack10 = i2d ldi6
...
mathCache = immi 0x91add90
math_abs_tn1 = calld.none #math_abs_tn ( mathCache/*0x91add90*/ $stack10 )
std.sp sp[24] = math_abs_tn1
...
d2i1 = d2i math_abs_tn1
i2d7 = i2d d2i1
eqd1 = eqd math_abs_tn1, i2d7
xf7: xf eqd1 -> pc=0x91ca26a imacpc=(nil) sp+32 rp+0 (GuardID=019)
...
...use d2i1 as the array index...
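The LIR above is roughly equivalent to this C sketch (function and variable names are illustrative, not from TraceMonkey): the integer index is widened to a double, the double `abs` helper is called, and a guard verifies that narrowing the result back to an int lost nothing.

```c
#include <math.h>

/* Hypothetical sketch of the unspecialized path for kernel[Math.abs(j)]:
 * widen to double (i2d), call the double abs helper (calld math_abs_tn),
 * narrow back (d2i), and guard that the narrowing was exact (i2d + eqd). */
static int abs_via_double(int j, int *guard_ok)
{
    double widened  = (double)j;   /* i2d  */
    double result   = fabs(widened); /* stands in for math_abs_tn */
    int    narrowed = (int)result; /* d2i  */
    /* xf guard: side-exit the trace if the result wasn't an exact int */
    *guard_ok = ((double)narrowed == result);
    return narrowed;
}
```

The point of the bug is that all three conversions plus the call are wasted work when the operand is already a small integer.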
It can be improved to this:
ldi6 = ldi.sp sp[-16]
...
neg = negi ldi6
isNeg = lti ldi6, 0
abs = cmovi isNeg, neg, ldi6
sti.sp sp[24] = abs
...
...use abs as the array index...
The attached draft patch does this. imaging-gaussian-blur runs ~1.6x faster,
and Kraken overall runs ~1.07x faster.
Unfortunately, Waldo pointed out that this is too optimistic because it doesn't handle INT_MIN correctly. So I'll need to add a guard for that case, which will pull the improvement back a little, but hopefully not too much -- avoiding the function call is the most important part of the change.
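The specialized path plus the INT_MIN guard can be sketched in C like this (names are illustrative; the select mirrors what the `cmovi` compiles to, and the early return stands in for the trace side-exit):

```c
#include <limits.h>
#include <stdbool.h>

/* Sketch of the specialized integer abs: guard against INT_MIN, whose
 * absolute value (2147483648) is not representable in a 32-bit int,
 * then select between a and -a without a double round trip. */
static bool int_abs_checked(int a, int *out)
{
    if (a == INT_MIN)           /* guard: would side-exit the trace */
        return false;
    int  neg   = -a;            /* negi */
    bool isNeg = a < 0;         /* lti a, 0 */
    *out = isNeg ? neg : a;     /* cmovi */
    return true;
}
```

Note that in C, `-INT_MIN` is undefined behavior, which is exactly why the guard has to come first.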
Comment 1•15 years ago
I made this version:
LIns* shift_right = lir->ins2(LIR_rshi, a, lir->insImmI(31));
LIns* _xor = lir->ins2(LIR_xori, shift_right, a);
LIns* abs_ins = lir->ins2(LIR_subi, _xor, shift_right);
set(&vp[0], i2d(abs_ins));
Based on the code MSVC generates:
cdq
xor eax, edx
sub eax, edx
And because we don't have cdq, I replaced that instruction with:
mov edx, eax
sar edx, 31
xor eax, edx
sub eax, edx
This solution is a tiny bit faster and avoids branching, but the guard for INT_MIN is needed anyway.
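The rshi/xori/subi sequence above corresponds to this C sketch. It uses `int32_t` because C leaves right-shifting a negative value implementation-defined; on the two's-complement targets Nanojit cares about, `>> 31` is an arithmetic shift that yields 0 or -1.

```c
#include <stdint.h>

/* Branchless abs via shift/xor/sub: mask is 0 for a >= 0 and -1 (all
 * bits set) for a < 0, so for negative a this flips the bits and adds
 * one, i.e. negates it. Still wrong for INT32_MIN, hence the guard. */
static int32_t abs_shift_xor_sub(int32_t a)
{
    int32_t mask = a >> 31;
    return (a ^ mask) - mask;
}
```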
Assignee
Comment 2•15 years ago
(In reply to comment #1)
> I made this version:
> LIns* shift_right = lir->ins2(LIR_rshi, a, lir->insImmI(31));
> LIns* _xor = lir->ins2(LIR_xori, shift_right, a);
> LIns* abs_ins = lir->ins2(LIR_subi, _xor, shift_right);
>
> set(&vp[0], i2d(abs_ins));
> Based on the code MSVC generates:
> cdq
> xor eax, edx
> sub eax, edx
> And because we don't have cdq, I replaced that instruction with:
> mov edx, eax
> sar edx, 31
> xor eax, edx
> sub eax, edx
>
> This solution is a tiny bit faster and avoids branching, but the guard for
> INT_MIN is needed anyway.
Thanks for the suggestion. To paraphrase, you've done this:
mask = a >> 31
abs = (a ^ mask) - mask
According to http://graphics.stanford.edu/~seander/bithacks.html#IntegerAbs
that version is patented (how ridiculous!) but this similar version is not:
mask = a >> 31
abs = (a + mask) ^ mask
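That unpatented variant can be sketched the same way as the first trick; only the order of the operations differs, and it has the same INT32_MIN blind spot, so the guard is needed regardless.

```c
#include <stdint.h>

/* Branchless abs via add/xor: for negative a, adding mask (-1)
 * subtracts one and the xor then flips the bits, which together
 * negate a two's-complement value. Wrong for INT32_MIN. */
static int32_t abs_add_xor(int32_t a)
{
    int32_t mask = a >> 31;     /* 0 for a >= 0, -1 for a < 0 */
    return (a + mask) ^ mask;
}
```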
I tried that and surprisingly enough it was clearly worse than my cmov
version. First, Nanojit generated one more x86 instruction:
negi1 = negi ldi1
0xf6db6f6d mov ecx,ebx
0xf6db6f6f neg ecx
isNeg = lti ldi1, strict/*0*/
abs = cmovi isNeg ? negi1 : ldi1
0xf6db6f71 cmp ebx,0
0xf6db6f74 cmovge ecx,ebx
versus:
immi4 = immi 31
mask = rshi ldi1, immi4/*31*/
0xf6cfff6c mov edx,ebx
0xf6cfff6e sar edx,31
addi2 = addi ldi1, mask
0xf6cfff71 mov ecx,ebx
0xf6cfff73 add ecx,edx
abs = xori addi2, mask
0xf6cfff75 xor ecx,edx
On my MacBook Pro, my version gave a speed-up of 1.65x (this is with the
INT_MIN guard, so that didn't hurt at all, thankfully), the non-cmov version
gave a speed-up of 1.48x. On my Linux desktop the corresponding numbers
were 1.45x and 1.34x. Maybe the non-cmov version is slower because the
computation is entirely serial, compared to the cmov version where the 'neg'
and the 'cmp' can be done in parallel.
Since the cmov version is faster and more obvious, I'll stick with it.
dvander: I'm confident about the LIR generation, but less so about the native/get/set/pendingSpecializedNative changes, which I cribbed from the charAt()-handling code, so please check those carefully.
Attachment #485254 - Attachment is obsolete: true
Attachment #485640 - Flags: review?(dvander)
Comment on attachment 485640 [details] [diff] [review]
patch (against TM 55731:95ba3d6a1ecf)
>+ /* abs(INT_MIN) can't be done using integers; exit if we see it. */
>+ LIns* intMin_ins = addName(lir->insImmI(0x80000000), "INT_MIN");
>+ LIns* isIntMin_ins = addName(lir->ins2(LIR_eqi, a, intMin_ins), "isIntMin");
>+ guard(false, isIntMin_ins, OVERFLOW_EXIT);
Need a compile-time check for this to deoptimize; otherwise we'll keep trying to build new traces (unless you change that to MISMATCH_EXIT).
r=me w/ that.
Attachment #485640 - Flags: review?(dvander) → review+
Assignee
Comment 4•15 years ago
Whiteboard: fixed-in-tracemonkey
Comment 5•15 years ago
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED