Closed
Bug 432516
Opened 16 years ago
Closed 14 years ago
SM: consider doubleToInt32 optimization
Categories
(Core :: JavaScript Engine, defect)
RESOLVED
FIXED
People
(Reporter: jorendorff, Unassigned)
References
Details
Attachments
(4 files, 1 obsolete file)
- 5.61 KB, text/plain
- 4.84 KB, patch
- 6.89 KB, text/plain
- 6.72 KB, patch
+++ This bug was initially created as a clone of Bug #412978 +++ Intel posted a straightforward patch there that improves tamarin-tracing performance dramatically for some benchmarks. We should try it out in SpiderMonkey.
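For context, the usual technique behind this kind of patch is to avoid the slow fistp/fmod-based conversion path by operating on the IEEE-754 bit pattern of the double directly. The sketch below illustrates the general idea with ECMA-262 ToInt32 semantics (truncate toward zero, then wrap modulo 2^32); it is an illustration only, not the actual code from bug 412978.

```c
#include <stdint.h>
#include <string.h>

/* Sketch of a bit-manipulation ToInt32.  Not the Intel patch itself. */
static int32_t double_to_int32(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);               /* safe type pun */

    int exp = (int)((bits >> 52) & 0x7ff) - 1023; /* unbiased exponent */
    if (exp < 0)
        return 0;                                 /* |d| < 1 truncates to 0 */
    if (exp >= 84)
        return 0;  /* every integer bit is a multiple of 2^32, so the result
                      is 0 mod 2^32 (this also catches NaN and infinities) */

    uint64_t mant = (bits & ((1ULL << 52) - 1)) | (1ULL << 52);
    uint64_t val  = (exp <= 52) ? (mant >> (52 - exp))
                                : (mant << (exp - 52));
    uint32_t u = (uint32_t)val;                   /* wrap modulo 2^32 */
    if (bits >> 63)
        u = 0u - u;                               /* modular negation, d < 0 */
    return (int32_t)u;  /* impl.-defined for u >= 2^31, but wraps everywhere */
}
```

Keeping the wrap in unsigned arithmetic avoids the signed-overflow trap at INT32_MIN; the final cast is implementation-defined in C but behaves as two's-complement wrap on every platform SpiderMonkey targets.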
Comment 2•16 years ago
We should also look into using that idea to optimize JSDOUBLE_IS_INT(). In particular, the sequence JSDOUBLE_IS_INT(d, i_) && INT_FITS_IN_JSVAL(i_) should benefit from this.
Comment 3•16 years ago
I'll also look at what Igor mentions in Comment #2 above.
Comment 4•16 years ago
(In reply to comment #2)
> We should also look in using that idea to optimize JSDOUBLE_IS_INT(). In
> particular, the sequence JSDOUBLE_IS_INT(d, i_) && INT_FITS_IN_JSVAL(i_)
> should benefit from this.

To clarify: it would be nice to have a fast macro or function like:

    jsval v = JSDOUBLE_FITS_IN_JSVAL(d);
    if (v != JSVAL_NULL) {
        // v here is JSVAL_INT
    }
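A hypothetical sketch of such a combined test-and-convert is below, assuming the classic SpiderMonkey layout where small ints are tagged in a 31-bit jsval payload. The tag scheme and names here are simplified stand-ins, not the real jsapi.h definitions.

```c
#include <stdint.h>
#include <math.h>

typedef intptr_t jsval;                           /* simplified stand-in */
#define JSVAL_NULL      ((jsval)0)
#define JSVAL_INT_MIN   (-((jsval)1 << 30))       /* 31-bit int payload */
#define JSVAL_INT_MAX   (((jsval)1 << 30) - 1)
#define INT_TO_JSVAL(i) ((((jsval)(i)) << 1) | 1) /* low tag bit = int */

/* Hypothetical JSDOUBLE_FITS_IN_JSVAL as comment 4 proposes: returns a
   tagged int jsval, or JSVAL_NULL if d cannot be stored as a small int. */
static jsval JSDOUBLE_FITS_IN_JSVAL(double d)
{
    /* Range check first so the cast below is well defined; this also
       rejects NaN, since every comparison against NaN is false. */
    if (!(d >= (double)JSVAL_INT_MIN && d <= (double)JSVAL_INT_MAX))
        return JSVAL_NULL;
    int32_t i = (int32_t)d;
    if ((double)i != d)
        return JSVAL_NULL;            /* fractional part: keep it a double */
    if (i == 0 && signbit(d))
        return JSVAL_NULL;            /* -0.0 must stay a double */
    return INT_TO_JSVAL(i);
}
```

The -0.0 check matters because ToInt32 loses the sign of negative zero, which the engine must preserve in a double jsval.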
Comment 5•16 years ago
This test measures the performance gain of the proposed optimized version of js_DoubleToECMAInt32(). The speedup depends on the value of the function's argument. The test generates a large number of random values in various ranges and measures the speedup.

To run the test, do the following:

    cl /O2 d2i.c
    d2i.exe

Here's an example output on a Core 2 Duo laptop:

    c:\code\doubletoint32>cl /O2 d2i.c
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 15.00.21022.08 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.
    d2i.c
    Microsoft (R) Incremental Linker Version 9.00.21022.08
    Copyright (C) Microsoft Corporation.  All rights reserved.
    /out:d2i.exe
    d2i.obj

    c:\code\doubletoint32>d2i.exe
    max = 1
    orig time = 5313, res = 0
    new time = 468, res = 0
    speedup = 11.35x
    max = 2147483648
    orig time = 5875, res = 31759025306893781
    new time = 688, res = 31759025306893781
    speedup = 8.54x
    max = 4294967296
    orig time = 6546, res = -151450805860420
    new time = 1766, res = -151450805860420
    speedup = 3.71x
    max = 1782633656570947600000000000000
    orig time = 9047, res = 0
    new time = 469, res = 0
    speedup = 19.29x
Comment 6•16 years ago
(Same description and example output as comment 5.)
Attachment #319700 -
Attachment is obsolete: true
Comment 7•16 years ago
This is what I am going to performance test.
Updated•16 years ago
Assignee: sayrer → igor
Comment 8•16 years ago
In SpiderMonkey on Windows, the total number of calls to js_DoubleToECMAInt32() during a complete run of Sunspider (5 iterations) is currently ~7.07M, while in TT the corresponding function is currently called ~33.36M times (almost 5x as many). So the performance impact of this optimization on SpiderMonkey is going to be correspondingly much smaller.
Updated•16 years ago
Flags: wanted-next+
Comment 9•16 years ago
I just did some Sunspider perf testing with this modification.

A. My test procedure was to build a JS shell with "make -f Makefile.ref BUILD_OPT=1" and run Sunspider with 50 trials. This is on my MacBook Pro with a 2.2 GHz Intel Core 2 Duo and 2 GB 667 MHz DDR2 SDRAM. I left my other processes (emacs, etc.) open, but I didn't touch the machine while the tests were running.

B. The 3 configurations I tested were:
- base (baseline trunk SM)
- ints (replace the toInt32 operation with Moh's version)
- uint (replace the toInt32 and toUint32 operations with Moh's version)

I don't have code for the toUint32 operation specifically. I just used Moh's toInt32 code with the return type changed to uint32. I think that might actually even give correct answers.

C. I wrote my own script to analyze the results, as the Sunspider comparison script doesn't give very detailed statistics. Also, it adds an extra degree of freedom for the t test that I think shouldn't be there, but maybe I'm missing something. My results, which are attached, have 3 columns:

- %diff: the percentage relative difference in average time taken. For example, the very first result is a %diff of -0.4 for base vs. ints. This means the total time was 0.4% less with the ints patch applied, i.e., a 1.004x speedup.
- t: the t statistic for the difference in means (i.e., the ratio (estimated difference in means) / (standard error)). A bigger absolute value means more significant. For reference, |t| >= about 2 means significance at the 95% level (note: I don't recommend using 95% confidence intervals for analyzing this).
- p-value: a p-value of 'z' means this: "If the means are in fact equal, the probability of observing this big a difference (i.e., this big a t statistic) is 'z'." (This is true only if the assumptions of the model hold, in this case that test runs are independent and normally distributed with equal variance.)

Thus, if you decide to call anything with p-value <= 0.05 significant (equivalent to using the 95% confidence interval), then 1/20 of what you call significant is actually bogus. Because there are 26 tests, at this level one would be pretty likely to be making some false inferences of significance. I also print *s for the significance level on each line: * means better than 0.05, ** for 0.01, *** for 0.001, and **** for 0.0001. In this context, * doesn't mean much, but I would say *** is safe to consider a real difference.

D. With all that explained, in my tests we're looking at a highly significant but small 0.3-0.4% speedup from using the new toInt code. Some tests see a greater speedup; for example, access-nbody is 5% faster. Some tests slow down using the new code, which is surprising. For example, crypto-sha1 is about 2% slower in both experimental runs. I have no idea what's going on there, but i-cache effects would be my dumb guess. Perhaps VTune can shed more light on this.
Comment 10•16 years ago
I switched the #if 0/1s to enable each item. The uint32 one can be enabled only if the other one is.
Updated•15 years ago
Assignee: igor → general
Updated•15 years ago
Whiteboard: DUPEME
Comment 11•14 years ago
Committed here: http://hg.mozilla.org/tracemonkey/rev/319361b18289
Inlined here: http://hg.mozilla.org/tracemonkey/rev/fcd321cd60c1
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED