Closed Bug 412978 Opened 18 years ago Closed 18 years ago

TT: doubleToInt32 needs optimization

Categories

(Tamarin Graveyard :: Tracing Virtual Machine, defect, P1)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: stejohns, Unassigned)

Details

Attachments

(1 file)

Some benchmarks show the doubleToInt32 primitive at over 3% of processing time. The code itself can be optimized further (especially on SSE2), but it also is likely getting called too often (eg, in unnecessary int->double promotion in loops).
The performance of the "doubleToInt32(double d)" primitive in TT can be significantly improved by using the highly optimized code that we propose here. This primitive is hot and is used for array index conversions among others. Here are the speedups I can reproduce on a Core-Duo laptop: Test % Speedup ------------ --------- math-cordic 111.11 bitops-bit-in-byte 63.71 run-nsieve 57.45 bitops-nsieve-bits 34.76 crypto-sha1 34.29 run-access-fannkuch 25.20 3dmorph 20.51 *Note that Sunspider performance numbers with extremely short running time should be seen with caution. JSBench/HeapSort 25% This is a long-running test, and the speedup is stable. Here's the propose code: ------------------------------------------------------- typedef unsigned __int64 UINT64; typedef signed __int64 SINT64; typedef union { double d; UINT64 i; struct { unsigned int il, ih;} i32; } double_int; int32_t FASTCALL doubleToInt32(double d) { double_int du, duh, two32; #if DBLTOINT32_INT64 double_int adu; UINT64 sign_d; SINT64 MASK; #endif unsigned DI_H, u_tmp, expon, shift_amount; signed int res, mask32; // Here are the ECMA Specs // From the ES3 spec, 9.5 // 2. If Result(1) is NaN, +0, -0, +Inf, or -Inf, return +0. // 3. Compute sign(Result(1)) * floor(abs(Result(1))). // 4. Compute Result(3) modulo 2^32; that is, a finite integer value k of Number // type with positive sign and less than 2^32 in magnitude such the mathematical // difference of Result(3) and k is mathematically an integer multiple of 2^32. // 5. If Result(4) is greater than or equal to 2^31, return Result(4)- 2^32, // otherwise return Result(4). // Algorithm Outline // Step 1. If d is NaN, +/-Inf or |d|>=2^84 or |d|<1, then return 0 // All of this is implemented based on an exponent comparison. // Step 2. If |d|<2^31, then return (int)d // The cast to integer (conversion in RZ mode) returns the correct result. // Step 3. If |d|>=2^32, d:=fmod(d, 2^32) is taken -- but without a call // Step 4. If |d|>=2^31, then the fractional bits are cleared before applying the correction by 2^32: d - sign(d)*2^32 // Step 5. Return (int)d du.d = d; DI_H = du.i32.ih; u_tmp = (DI_H & 0x7ff00000) - 0x3ff00000; if(u_tmp >= (0x45300000-0x3ff00000)) // d is Nan, +/-Inf or +/-0, or |d|>=2^(32+52) or |d|<1, in which case result=0 return 0; if(u_tmp < 0x01f00000) // |d|<2^31 { res = (signed int)d; return (int32_t)res; } if(u_tmp > 0x01f00000) { // |d|>=2^32 expon = u_tmp >> 20; shift_amount = expon - 21; duh.i = du.i; #if DBLTOINT32_INT64 MASK = 0x8000000000000000ll; MASK = MASK >> shift_amount; #else mask32 = 0x80000000; if(shift_amount<32) { mask32 >>= shift_amount; duh.i32.ih = du.i32.ih & mask32; duh.i32.il = 0; } else { mask32 >>= (shift_amount-32); duh.i32.ih = du.i32.ih; duh.i32.il = du.i32.il & mask32; } #endif du.d -= duh.d; } DI_H = du.i32.ih; // eliminate fractional bits u_tmp = (DI_H & 0x7ff00000); if(u_tmp >= 0x41e00000) // |d|>=2^31 { expon = u_tmp >> 20; shift_amount = expon - (0x3ff - 11); #if DBLTOINT32_INT64 MASK = 0x8000000000000000ll; MASK = MASK >> shift_amount; du.i &= (UINT64)MASK; #else mask32 = 0x80000000; if(shift_amount<32) { mask32 >>= shift_amount; du.i32.ih &= mask32; du.i32.il = 0; } else { mask32 >>= (shift_amount-32); du.i32.il &= mask32; } #endif #if DBLTOINT32_INT64 sign_d = du.i & 0x8000000000000000ull; two32.i = 0x41f0000000000000ull ^ sign_d; #else two32.i32.ih = 0x41f00000 ^ (du.i32.ih & 0x80000000); two32.i32.il = 0; #endif du.d -= two32.d; } res = (signed int)du.d; return (int32_t)res; }
I'm unclear on what DBLTOINT32_INT64 indicates and how/whether it is defined. Also, a minor point: we're now using C99 integer types for new code in TT, so this chould probably use uint64_t, etc.
Here's the same code in patch form. I converted to C99 types with the following assumptions, besides the obvious conversion: (signed int) => int32_t hopefully that's right for both 32bit and 64bit compilers. I also separated out the code based on DBLTOINT32_INT64.
Attachment #319495 - Flags: review?(mohammad.r.haghighat)
winxp bootcamp on macbook pro 2.6ghz core 2 duo (winxp noise is +/- 15ms) TT TT+patch boids 5813 5735 boidshackHeadless 454 407 gameoflife 422 360 access-binary-trees 141 156 access-fannkuch 157 125 access-nbody 187 188 access-nsieve 62 62 bitops-3bit-bits-in-byte 16 16 bitops-bits-in-byte 47 32 bitops-bitwise-and 266 251 bitops-nsieve-bits 79 47 crypto-md5 187 187 crypto-sha1 47 47 math-cordic 79 47 math-partial-sums 234 235 math-spectral-norm 32 31 s3d-cube 172 156 s3d-morph 94 78 s3d-raytrace 266 250 string-fasta 172 172 controlflow-recursive 31 31 chess 51 50 Crypt 1078 703 SOR 1218 860 HeapSort 10807 9493
Setting DBLTOINT32_INT64, should result in even more speedup on 64-bit systems (e.g., Intel EM64T).
A clarification: DBLTOINT32_INT64 should be set only if the target platform is 64-bit, otherwise (i.e., if it is set when the target platform is 32-bit) the code will still run correctly but not necessarily faster.
The code looks good. It, however, needs to be guarded so it is used only on little-endian targets, such as x86. I'm not sure how this would work on all versions of ARM.
Comment on attachment 319495 [details] [diff] [review] code below as patch to TT marking + on behalf of Moh
Attachment #319495 - Flags: review?(mohammad.r.haghighat) → review+
In SpiderMonkey on Windows, the total number of calls to js_DoubleToECMAInt32() during a complete run of Sunspider (5 iterations) is currently ~7.07M, while in TT, the function doubleToInt32() is currently called ~33.36M times (almost a factor of 5x). We should be able to eliminate a large number of unnecessary calls.
SM has another function, js_ValueToInt32() that is used in other situations. And, TT is using doubleToInt32 (ecma semantics) in cases when it could just use the simpler double->int cast. So, TT should use a simpler primitive where possible, but we should track stats on both uses. Even once the stats are adjusted i still agree TT could eliminate many calls to either primitive. some factors * SM (presumably) tries to keep ints as ints, by checking for overflow. TT currently will promote int->double, in cases then later use a primitive to check if the result is in the integer range. partly this is so type-analysis can be done... the result of an op is either always int or always double. Because SM doesn't need a result to always be one type or another it can be more aggressive about avoiding int->double promotion. (TT can/should do this too, but its more interesting while also trying to do type analysis both statically and at trace time).
Priority: -- → P1
(In reply to comment #10) > SM has another function, js_ValueToInt32() that is used in other situations. Just to clarify, this is the original value-to-int32 function but as you can see from a quick grep: $ grep js_ValueToInt32 *.[ch] jsapi.c: *ip = js_ValueToInt32(cx, &tvr.u.value); jsnum.c:js_ValueToInt32(JSContext *cx, jsval *vp) jsnum.h:js_ValueToInt32(JSContext *cx, jsval *vp); it's no longer used except by its API wrapper. The old name stuck in the JS API, and the API entry point JS_ValueToInt32 is used from setTimeout still (no clear DOM standard yet -- the WHATWG is working on it, though I think back-burner for now). Compatibility means more names, thus the ECMA name-parts. So no benchmark should show any SM code in js_ValueToInt32 -- only in the js_{Double,Value}ToECMA{Ui,I}nt32 quad. /be
scrubbing tamarin bugs changing component to Tracing Virtual Machine.
Component: Virtual Machine → Tracing Virtual Machine
QA Contact: vm → tracing-vm
Attachment #319495 - Flags: review+ → review?(stejohns)
Attachment #319495 - Flags: review?(rwinchel)
Comment on attachment 319495 [details] [diff] [review] code below as patch to TT Does the double_int union need an ifdef AVMPLUS_LITTLE_ENDIAN/AVMPLUS_BIG_ENDIAN? Otherwise, looks good
Attachment #319495 - Flags: review?(stejohns) → review+
aside from not having any big endian targets at the moment, yes. changeset: 415:ef89a85db977
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
Attachment #319495 - Flags: review?(rwinchel)
The 64-bit implementation in this patch is buggy. It fails TM's mul-overflow test.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: