Closed Bug 432516 Opened 16 years ago Closed 14 years ago

SM: consider doubleToInt32 optimization

Categories

(Core :: JavaScript Engine, defect)

Other Branch
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: jorendorff, Unassigned)

References

Details

Attachments

(4 files, 1 obsolete file)

+++ This bug was initially created as a clone of Bug #412978 +++

Intel posted a straightforward patch there that improves tamarin-tracing performance dramatically for some benchmarks.  We should try it out in SpiderMonkey.
I'll check this out.
Assignee: general → sayrer
We should also look in using that idea to optimize JSDOUBLE_IS_INT(). In  particular, the sequence JSDOUBLE_IS_INT(d, i_) && INT_FITS_IN_JSVAL(i_) should benefit from this. 
I'll also look at what Igor mentions in Comment #2 above.
(In reply to comment #2)
> We should also look in using that idea to optimize JSDOUBLE_IS_INT(). In 
> particular, the sequence JSDOUBLE_IS_INT(d, i_) && INT_FITS_IN_JSVAL(i_) should
> benefit from this. 
> 

To clarify: it would be nice to have a fast macro or function like:

jsval v = JSDOUBLE_FITS_IN_JSVAL(d);
if (v != JSVAL_NULL) {
   // v here is JSVAL_INT
}


This tests measures the performance gain of the proposed optimized version of js_DoubleToECMAInt32(). The speedup depends on the value of the argument of the function. The test generates a large number of random variables in various ranges and measures the speedup. To run the test, do the following:

cl /O2 d2i.c
d2i.exe

Here's an example output on a Core2 Duo laptop:

c:\code\doubletoint32>cl /O2 d2i.c
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 15.00.21022.08 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.

d2i.c
Microsoft (R) Incremental Linker Version 9.00.21022.08
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:d2i.exe
d2i.obj

c:\code\doubletoint32>d2i.exe
max = 1
    orig time =   5313, res = 0
    new  time =    468, res = 0
speedup =  11.35x

max = 2147483648
    orig time =   5875, res = 31759025306893781
    new  time =    688, res = 31759025306893781
speedup =   8.54x

max = 4294967296
    orig time =   6546, res = -151450805860420
    new  time =   1766, res = -151450805860420
speedup =   3.71x

max = 1782633656570947600000000000000
    orig time =   9047, res = 0
    new  time =    469, res = 0
speedup =  19.29x
This tests measures the performance gain of the proposed optimized version of
js_DoubleToECMAInt32(). The speedup depends on the value of the argument of the
function. The test generates a large number of random variables in various
ranges and measures the speedup. To run the test, do the following:

cl /O2 d2i.c
d2i.exe

Here's an example output on a Core2 Duo laptop:

c:\code\doubletoint32>cl /O2 d2i.c
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 15.00.21022.08 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.

d2i.c
Microsoft (R) Incremental Linker Version 9.00.21022.08
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:d2i.exe
d2i.obj

c:\code\doubletoint32>d2i.exe
max = 1
    orig time =   5313, res = 0
    new  time =    468, res = 0
speedup =  11.35x

max = 2147483648
    orig time =   5875, res = 31759025306893781
    new  time =    688, res = 31759025306893781
speedup =   8.54x

max = 4294967296
    orig time =   6546, res = -151450805860420
    new  time =   1766, res = -151450805860420
speedup =   3.71x

max = 1782633656570947600000000000000
    orig time =   9047, res = 0
    new  time =    469, res = 0
speedup =  19.29x
Attachment #319700 - Attachment is obsolete: true
This is what I am going to performance test.
Assignee: sayrer → igor
In SpiderMonkey on Windows, the total number of calls to js_DoubleToECMAInt32() during a complete run of Sunspider (5 iterations) is currently ~7.07M, while in TT, the corresponding function is currently called ~33.36M times (almost a factor of 5x). So, the performance impact of this optimization on SpiderMonkey is going to be accordingly much smaller.
 
Blocks: js1.8.5
Flags: wanted-next+
Attached file Sunspider analysis
I just did some Sunspider perf testing with this modification. 

A. My test procedure was to build a JS shell with "make -f Makefile.ref BUILD_OPT=1" and run sunspider with 50 trials. This is on my MacBook Pro with 2.2 GHz Intel Core 2 Duo and 2 GB 667 MHz DDR2 SDRAM. I left my other processes (emacs, etc) open, but I didn't touch the machine while the tests were running.

B. The 3 configurations I tested were:

  - base (baseline trunk SM)
  - ints (replace toInt32 operation with Moh's version)
  - uint (replace toInt32 and toUint32 operation with Moh's version)

I don't have code for the toUint32 operation specifically. I just used Moh's toInt32 code with the return type changed to uint32. I think that might actually even give correct answers.

C. I wrote my own script to analyze the results, as the Sunspider comparison script doesn't give really detailed statistics. Also, they add in an extra degree of freedom for the t test that I think shouldn't be there, but maybe I'm missing something. My results, which are attached, have 3 columns:

  - %diff: the percentage relative difference in average time taken. For example, the very first result is a %diff of -0.4 for base vs. ints. This means the total time was 0.4% less with the ints patch applied, i.e., a 1.004x speedup.

  - t: the t statistic for the difference in means (i.e., the ratio (estimated difference in means) / (standard error)). Bigger absolute value means more significant. For reference, t >= about 2 means significance at the 95% level (note: I don't recommend using 95% confidence intervals for analyzing this). 

  - p-value: A p-value of 'z' means this: "If the means are in fact equal, the probability of observing this big of a difference (i.e., this big a t statistic) is 'z'." (This is true only if the assumptions of the model hold, in this case that test runs are independent and normally distributed with equal variance.) Thus, if you decide to call anything with p-value <= 0.05 significant (equivalent to using the 95% confidence interval), then 1/20 of what you call significant is actually bogus. Because there are 26 tests, at this level one would be pretty likely to be making some false inferences of significance.

I also print *s for significance level on each line: * means better than 0.05, ** for 0.01, *** for 0.001, and **** for 0.0001. In this context, * doesn't mean much, but I would say *** is safe to consider a real difference.

D. With all that explained, in my tests we're looking at a highly significant but small 0.3-0.4% speedup from using the new toInt code. 

Some tests see a greater speedup. For example, access-nbody is 5% faster. 

Some tests slow down using the new code, which is surprising. For example, crypto-sha1 is about 2% slower in both experimental runs. I have no idea what's going on there, but i-cache effects would be my dumb guess. Perhaps VTune can shed more light on this.
I switch the #if 0/1s to enable each item. The Uint one can be enabled only if the other one is.
Assignee: igor → general
Whiteboard: DUPEME
Commited here: http://hg.mozilla.org/tracemonkey/rev/319361b18289
Inlined here: http://hg.mozilla.org/tracemonkey/rev/fcd321cd60c1
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Whiteboard: DUPEME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: