Closed
Bug 412978
Opened 18 years ago
Closed 18 years ago
TT: doubleToInt32 needs optimization
Categories
(Tamarin Graveyard :: Tracing Virtual Machine, defect, P1)
Tamarin Graveyard
Tracing Virtual Machine
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: stejohns, Unassigned)
Details
Attachments
(1 file)
|
6.41 KB,
patch
|
stejohns
:
review+
|
Details | Diff | Splinter Review |
Some benchmarks show the doubleToInt32 primitive at over 3% of processing time. The code itself can be optimized further (especially on SSE2), but it also is likely getting called too often (eg, in unnecessary int->double promotion in loops).
Comment 1•18 years ago
|
||
The performance of the "doubleToInt32(double d)" primitive in TT can be significantly improved by using the highly optimized code that we propose here. This primitive is hot and is used for array index conversions among others.
Here are the speedups I can reproduce on a Core-Duo laptop:
Test % Speedup
------------ ---------
math-cordic 111.11
bitops-bit-in-byte 63.71
run-nsieve 57.45
bitops-nsieve-bits 34.76
crypto-sha1 34.29
run-access-fannkuch 25.20
3dmorph 20.51
*Note that Sunspider performance numbers with extremely short running time should be seen with caution.
JSBench/HeapSort 25%
This is a long-running test, and the speedup is stable.
Here's the propose code:
-------------------------------------------------------
typedef unsigned __int64 UINT64;
typedef signed __int64 SINT64;
typedef union {
double d;
UINT64 i;
struct { unsigned int il, ih;} i32;
} double_int;
int32_t FASTCALL doubleToInt32(double d)
{
double_int du, duh, two32;
#if DBLTOINT32_INT64
double_int adu;
UINT64 sign_d;
SINT64 MASK;
#endif
unsigned DI_H, u_tmp, expon, shift_amount;
signed int res, mask32;
// Here are the ECMA Specs
// From the ES3 spec, 9.5
// 2. If Result(1) is NaN, +0, -0, +Inf, or -Inf, return +0.
// 3. Compute sign(Result(1)) * floor(abs(Result(1))).
// 4. Compute Result(3) modulo 2^32; that is, a finite integer value k of Number
// type with positive sign and less than 2^32 in magnitude such the mathematical
// difference of Result(3) and k is mathematically an integer multiple of 2^32.
// 5. If Result(4) is greater than or equal to 2^31, return Result(4)- 2^32,
// otherwise return Result(4).
// Algorithm Outline
// Step 1. If d is NaN, +/-Inf or |d|>=2^84 or |d|<1, then return 0
// All of this is implemented based on an exponent comparison.
// Step 2. If |d|<2^31, then return (int)d
// The cast to integer (conversion in RZ mode) returns the correct result.
// Step 3. If |d|>=2^32, d:=fmod(d, 2^32) is taken -- but without a call
// Step 4. If |d|>=2^31, then the fractional bits are cleared before applying the correction by 2^32: d - sign(d)*2^32
// Step 5. Return (int)d
du.d = d;
DI_H = du.i32.ih;
u_tmp = (DI_H & 0x7ff00000) - 0x3ff00000;
if(u_tmp >= (0x45300000-0x3ff00000))
// d is Nan, +/-Inf or +/-0, or |d|>=2^(32+52) or |d|<1, in which case result=0
return 0;
if(u_tmp < 0x01f00000) // |d|<2^31
{
res = (signed int)d;
return (int32_t)res;
}
if(u_tmp > 0x01f00000)
{
// |d|>=2^32
expon = u_tmp >> 20;
shift_amount = expon - 21;
duh.i = du.i;
#if DBLTOINT32_INT64
MASK = 0x8000000000000000ll;
MASK = MASK >> shift_amount;
#else
mask32 = 0x80000000;
if(shift_amount<32) {
mask32 >>= shift_amount;
duh.i32.ih = du.i32.ih & mask32;
duh.i32.il = 0;
}
else {
mask32 >>= (shift_amount-32);
duh.i32.ih = du.i32.ih;
duh.i32.il = du.i32.il & mask32;
}
#endif
du.d -= duh.d;
}
DI_H = du.i32.ih;
// eliminate fractional bits
u_tmp = (DI_H & 0x7ff00000);
if(u_tmp >= 0x41e00000) // |d|>=2^31
{
expon = u_tmp >> 20;
shift_amount = expon - (0x3ff - 11);
#if DBLTOINT32_INT64
MASK = 0x8000000000000000ll;
MASK = MASK >> shift_amount;
du.i &= (UINT64)MASK;
#else
mask32 = 0x80000000;
if(shift_amount<32) {
mask32 >>= shift_amount;
du.i32.ih &= mask32;
du.i32.il = 0;
}
else {
mask32 >>= (shift_amount-32);
du.i32.il &= mask32;
}
#endif
#if DBLTOINT32_INT64
sign_d = du.i & 0x8000000000000000ull;
two32.i = 0x41f0000000000000ull ^ sign_d;
#else
two32.i32.ih = 0x41f00000 ^ (du.i32.ih & 0x80000000);
two32.i32.il = 0;
#endif
du.d -= two32.d;
}
res = (signed int)du.d;
return (int32_t)res;
}
| Reporter | ||
Comment 2•18 years ago
|
||
I'm unclear on what DBLTOINT32_INT64 indicates and how/whether it is defined.
Also, a minor point: we're now using C99 integer types for new code in TT, so this chould probably use uint64_t, etc.
Comment 3•18 years ago
|
||
Here's the same code in patch form. I converted to C99 types with the following assumptions, besides the obvious conversion:
(signed int) => int32_t
hopefully that's right for both 32bit and 64bit compilers. I also separated out the code based on DBLTOINT32_INT64.
Attachment #319495 -
Flags: review?(mohammad.r.haghighat)
Comment 4•18 years ago
|
||
winxp bootcamp on macbook pro 2.6ghz core 2 duo (winxp noise is +/- 15ms)
TT TT+patch
boids 5813 5735
boidshackHeadless 454 407
gameoflife 422 360
access-binary-trees 141 156
access-fannkuch 157 125
access-nbody 187 188
access-nsieve 62 62
bitops-3bit-bits-in-byte 16 16
bitops-bits-in-byte 47 32
bitops-bitwise-and 266 251
bitops-nsieve-bits 79 47
crypto-md5 187 187
crypto-sha1 47 47
math-cordic 79 47
math-partial-sums 234 235
math-spectral-norm 32 31
s3d-cube 172 156
s3d-morph 94 78
s3d-raytrace 266 250
string-fasta 172 172
controlflow-recursive 31 31
chess 51 50
Crypt 1078 703
SOR 1218 860
HeapSort 10807 9493
Comment 5•18 years ago
|
||
Setting DBLTOINT32_INT64, should result in even more speedup on 64-bit systems (e.g., Intel EM64T).
Comment 6•18 years ago
|
||
A clarification: DBLTOINT32_INT64 should be set only if the target platform is 64-bit, otherwise (i.e., if it is set when the target platform is 32-bit) the code will still run correctly but not necessarily faster.
Comment 7•18 years ago
|
||
The code looks good. It, however, needs to be guarded so it is used only on little-endian targets, such as x86. I'm not sure how this would work on all versions of ARM.
Comment 8•18 years ago
|
||
Comment on attachment 319495 [details] [diff] [review]
code below as patch to TT
marking + on behalf of Moh
Attachment #319495 -
Flags: review?(mohammad.r.haghighat) → review+
Comment 9•18 years ago
|
||
In SpiderMonkey on Windows, the total number of calls to js_DoubleToECMAInt32()
during a complete run of Sunspider (5 iterations) is currently ~7.07M, while in
TT, the function doubleToInt32() is currently called ~33.36M times (almost a
factor of 5x). We should be able to eliminate a large number of unnecessary calls.
Comment 10•18 years ago
|
||
SM has another function, js_ValueToInt32() that is used in other situations. And, TT is using doubleToInt32 (ecma semantics) in cases when it could just use the simpler double->int cast.
So, TT should use a simpler primitive where possible, but we should track stats on both uses.
Even once the stats are adjusted i still agree TT could eliminate many calls to either primitive. some factors
* SM (presumably) tries to keep ints as ints, by checking for overflow. TT currently will promote int->double, in cases then later use a primitive to check if the result is in the integer range. partly this is so type-analysis can be done... the result of an op is either always int or always double. Because SM doesn't need a result to always be one type or another it can be more aggressive about avoiding int->double promotion.
(TT can/should do this too, but its more interesting while also trying to do type analysis both statically and at trace time).
Updated•18 years ago
|
Priority: -- → P1
Comment 11•18 years ago
|
||
(In reply to comment #10)
> SM has another function, js_ValueToInt32() that is used in other situations.
Just to clarify, this is the original value-to-int32 function but as you can see from a quick grep:
$ grep js_ValueToInt32 *.[ch]
jsapi.c: *ip = js_ValueToInt32(cx, &tvr.u.value);
jsnum.c:js_ValueToInt32(JSContext *cx, jsval *vp)
jsnum.h:js_ValueToInt32(JSContext *cx, jsval *vp);
it's no longer used except by its API wrapper. The old name stuck in the JS API, and the API entry point JS_ValueToInt32 is used from setTimeout still (no clear DOM standard yet -- the WHATWG is working on it, though I think back-burner for now). Compatibility means more names, thus the ECMA name-parts.
So no benchmark should show any SM code in js_ValueToInt32 -- only in the js_{Double,Value}ToECMA{Ui,I}nt32 quad.
/be
Comment 12•18 years ago
|
||
scrubbing tamarin bugs changing component to Tracing Virtual Machine.
Component: Virtual Machine → Tracing Virtual Machine
QA Contact: vm → tracing-vm
Updated•18 years ago
|
Attachment #319495 -
Flags: review+ → review?(stejohns)
Updated•18 years ago
|
Attachment #319495 -
Flags: review?(rwinchel)
| Reporter | ||
Comment 13•18 years ago
|
||
Comment on attachment 319495 [details] [diff] [review]
code below as patch to TT
Does the double_int union need an ifdef AVMPLUS_LITTLE_ENDIAN/AVMPLUS_BIG_ENDIAN?
Otherwise, looks good
Attachment #319495 -
Flags: review?(stejohns) → review+
Comment 14•18 years ago
|
||
aside from not having any big endian targets at the moment, yes.
changeset: 415:ef89a85db977
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
Updated•18 years ago
|
Attachment #319495 -
Flags: review?(rwinchel)
Comment 15•17 years ago
|
||
The 64-bit implementation in this patch is buggy. It fails TM's mul-overflow test.
You need to log in
before you can comment on or make changes to this bug.
Description
•