Closed
Bug 461808
Opened 16 years ago
Closed 9 years ago
Build shell with icc on mac and linux and compare perf
Categories
(Core :: JavaScript Engine, enhancement)
Tracking
()
RESOLVED
WONTFIX
People
(Reporter: gal, Assigned: gal)
References
Details
Attachments
(1 file, 4 obsolete files)
5.45 KB,
patch
No description provided.
Assignee
Comment 1•16 years ago
The attached patch makes us build and pass SunSpider with DEBUG=1, but we go into an infinite loop (iloop) with BUILD_OPT=1. Also, performance lags behind gcc, at least on Mac.
Assignee
Comment 2•16 years ago
Attachment #344956 -
Attachment is obsolete: true
Assignee
Comment 3•16 years ago
It seems -Os caused the iloop/crash. With -O2 we get a working OPT build with excellent results (a 28% speedup over the gcc build, using icc+PGO). In the comparison below, FROM is the icc build and TO is the gcc build, so "as slow" means the gcc build is slower.

TEST                   COMPARISON        FROM                TO                  DETAILS
=============================================================================
** TOTAL **:           *1.28x as slow*   1019.2ms +/- 0.9%   1303.4ms +/- 0.9%   significant
=============================================================================
  3d:                  *1.166x as slow*  104.7ms +/- 0.9%    122.1ms +/- 1.3%    significant
    cube:              *1.183x as slow*  37.1ms +/- 2.6%     43.9ms +/- 3.1%     significant
    morph:             *1.076x as slow*  27.5ms +/- 1.4%     29.6ms +/- 2.3%     significant
    raytrace:          *1.21x as slow*   40.1ms +/- 1.3%     48.6ms +/- 1.2%     significant
  access:              *1.49x as slow*   141.7ms +/- 0.9%    211.6ms +/- 1.5%    significant
    binary-trees:      *1.080x as slow*  35.1ms +/- 1.8%     37.9ms +/- 1.7%     significant
    fannkuch:          *1.62x as slow*   69.1ms +/- 1.2%     111.9ms +/- 1.5%    significant
    nbody:             *1.95x as slow*   25.7ms +/- 1.3%     50.0ms +/- 1.5%     significant
    nsieve:            -                 11.8ms +/- 2.6%     11.8ms +/- 3.8%
  bitops:              *2.19x as slow*   37.8ms +/- 2.3%     82.7ms +/- 1.3%     significant
    3bit-bits-in-byte: -                 1.6ms +/- 23.1%     1.6ms +/- 23.1%
    bits-in-byte:      ??                7.8ms +/- 3.9%      8.1ms +/- 2.8%      not conclusive: might be *1.038x as slow*
    bitwise-and:       -                 2.7ms +/- 12.8%     2.5ms +/- 15.1%
    nsieve-bits:       *2.74x as slow*   25.7ms +/- 1.3%     70.5ms +/- 1.3%     significant
  controlflow:         1.019x as fast    32.5ms +/- 1.2%     31.9ms +/- 1.3%     significant
    recursive:         1.019x as fast    32.5ms +/- 1.2%     31.9ms +/- 1.3%     significant
  crypto:              *1.27x as slow*   47.1ms +/- 1.5%     59.7ms +/- 1.3%     significant
    aes:               *1.23x as slow*   26.6ms +/- 1.4%     32.8ms +/- 1.4%     significant
    md5:               *1.37x as slow*   14.6ms +/- 2.5%     20.0ms +/- 1.7%     significant
    sha1:              *1.169x as slow*  5.9ms +/- 6.9%      6.9ms +/- 3.3%      significant
  date:                *1.21x as slow*   176.4ms +/- 1.3%    214.1ms +/- 1.3%    significant
    format-tofte:      *1.27x as slow*   87.5ms +/- 1.0%     111.0ms +/- 1.2%    significant
    format-xparb:      *1.160x as slow*  88.9ms +/- 1.6%     103.1ms +/- 1.4%    significant
  math:                1.032x as fast    41.5ms +/- 1.7%     40.2ms +/- 2.2%     significant
    cordic:            1.27x as fast     24.1ms +/- 1.7%     19.0ms +/- 1.8%     significant
    partial-sums:      *1.38x as slow*   9.9ms +/- 2.3%      13.7ms +/- 2.5%     significant
    spectral-norm:     -                 7.5ms +/- 5.0%      7.5ms +/- 5.0%
  regexp:              *1.30x as slow*   154.0ms +/- 0.9%    199.6ms +/- 1.1%    significant
    dna:               *1.30x as slow*   154.0ms +/- 0.9%    199.6ms +/- 1.1%    significant
  string:              *1.20x as slow*   283.5ms +/- 1.0%    341.5ms +/- 1.0%    significant
    base64:            *1.23x as slow*   12.5ms +/- 3.0%     15.4ms +/- 2.4%     significant
    fasta:             *1.159x as slow*  61.6ms +/- 1.6%     71.4ms +/- 1.0%     significant
    tagcloud:          *1.170x as slow*  89.4ms +/- 0.9%     104.6ms +/- 1.6%    significant
    unpack-code:       *1.28x as slow*   94.0ms +/- 1.1%     119.9ms +/- 1.0%    significant
    validate-input:    *1.162x as slow*  26.0ms +/- 1.3%     30.2ms +/- 1.0%     significant
Assignee
Comment 4•16 years ago
The icc build causes a regression in 3d-morph. Both the interpreter and the JIT produce an incorrect result. This is probably a built-in issue (potentially a rounding error).
Comment 5•15 years ago
The part of the patch that applies to jstracer.cpp seems to have been landed on the TraceMonkey branch. The part that applies to Makefile.ref hasn't, but it doesn't seem appropriate as it has hardwired paths.

As well as the SunSpider failures, there are also some failures in trace-test.js, which I see on my Mac when I build 'js' with ICC (see below; I grepped for "FAILURE"). As in the SunSpider case, these occur with an optimised build (--disable-debug --enable-optimize) but not with a debug build (--enable-debug --disable-optimize). All the errors occur both with and without tracing (-j), so it does look like a built-in issue.

Infinity/Math.asin(-0) : FAILED: expected number ( -Infinity ) != actual number ( Infinity )
Infinity/Math.atan(-0) : FAILED: expected number ( -Infinity ) != actual number ( Infinity )
Math.atan2(0,-0) : FAILED: expected number ( 3.141592653589793 ) != actual number ( 0 )
Infinity/Math.atan2(-0,1) : FAILED: expected number ( -Infinity ) != actual number ( Infinity )
Math.atan2(-0, -0) : FAILED: expected number ( -3.141592653589793 ) != actual number ( 0 )
Math.atan2(-0, -1) : FAILED: expected number ( -3.141592653589793 ) != actual number ( 3.141592653589793 )
Math.atan2(-1,Number.POSITIVE_INFINITY) : FAILED: expected number ( 0 ) != actual number ( -0 )
Infinity/Math.ceil('-0') : FAILED: expected number ( -Infinity ) != actual number ( Infinity )
Infinity/Math.ceil(-0) : FAILED: expected number ( -Infinity ) != actual number ( Infinity )
Infinity/Math.ceil(-Number.MIN_VALUE) : FAILED: expected number ( -Infinity ) != actual number ( Infinity )
Math.ceil(-0.9) : FAILED: expected number ( 0 ) != actual number ( -0 )
Math.ceil(-0.9) : FAILED: expected number ( 0 ) != actual number ( -0 )
Infinity/Math.floor(-0) : FAILED: expected number ( -Infinity ) != actual number ( Infinity )
Infinity/Math.max(-0,-0) : FAILED: expected number ( -Infinity ) != actual number ( Infinity )
Infinity/Math.min(0,-0) : FAILED: expected number ( -Infinity ) != actual number ( Infinity )
Infinity/Math.min(-0,-0) : FAILED: expected number ( -Infinity ) != actual number ( Infinity )
Math.pow(Number.NEGATIVE_INFINITY, -1) : FAILED: expected number ( 0 ) != actual number ( -0 )
Math.pow(Number.NEGATIVE_INFINITY, -3) : FAILED: expected number ( 0 ) != actual number ( -0 )
Infinity/Math.pow(-0, 1) : FAILED: expected number ( -Infinity ) != actual number ( Infinity )
Infinity/Math.pow(-0,3) : FAILED: expected number ( -Infinity ) != actual number ( Infinity )
Math.pow(-0, -1) : FAILED: expected number ( -Infinity ) != actual number ( Infinity )
Math.pow(-0, -10001) : FAILED: expected number ( -Infinity ) != actual number ( Infinity )
Infinity/Math.round(-0) : FAILED: expected number ( -Infinity ) != actual number ( Infinity )
Math.round(-0.49) : FAILED: expected number ( 0 ) != actual number ( -0 )
Math.round(-0.5) : FAILED: expected number ( 0 ) != actual number ( -0 )
Infinity/Math.sqrt(-0) : FAILED: expected number ( -Infinity ) != actual number ( Infinity )
Infinity/Math.tan(-0) : FAILED: expected number ( -Infinity ) != actual number ( Infinity )
FAILED: Infinity/Math.asin(-0),Infinity/Math.atan(-0),Math.atan2(0,-0),Infinity/Math.atan2(-0,1),Math.atan2(-0, -0),Math.atan2(-0, -1),Math.atan2(-1,Number.POSITIVE_INFINITY),Infinity/Math.ceil('-0'),Infinity/Math.ceil(-0),Infinity/Math.ceil(-Number.MIN_VALUE),Math.ceil(-0.9),Math.ceil(-0.9),Infinity/Math.floor(-0),Infinity/Math.max(-0,-0),Infinity/Math.min(0,-0),Infinity/Math.min(-0,-0),Math.pow(Number.NEGATIVE_INFINITY, -1),Math.pow(Number.NEGATIVE_INFINITY, -3),Infinity/Math.pow(-0, 1),Infinity/Math.pow(-0,3),Math.pow(-0, -1),Math.pow(-0, -10001),Infinity/Math.round(-0),Math.round(-0.49),Math.round(-0.5),Infinity/Math.sqrt(-0),Infinity/Math.tan(-0)
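Many of the checks above have the form Infinity/Math.f(-0) because -0 and +0 compare equal, so only a side effect such as the sign of a resulting infinity reveals which zero came back. A minimal C sketch of the same detection idea follows; the function names are hypothetical, and it assumes IEEE 754 doubles with C99 Annex F semantics (1.0 / -0.0 yields -infinity rather than trapping).

```c
#include <math.h>
#include <stdbool.h>

/* +0.0 and -0.0 compare equal, so == alone cannot tell them apart.
   volatile keeps the comparison from being folded at compile time. */
bool zeros_compare_equal(void) {
    volatile double pz = 0.0, nz = -0.0;
    return pz == nz;
}

/* The trace-test idiom: dividing by a zero gives an infinity that carries
   the zero's sign, so 1/x (like Infinity/x in JS) exposes -0. */
bool is_negative_zero_by_division(double x) {
    return x == 0.0 && (1.0 / x) < 0.0;
}

/* The same check using the C99 signbit() macro, with no division. */
bool is_negative_zero(double x) {
    return x == 0.0 && signbit(x) != 0;
}
```

The signbit() form is usually preferable in C code because it does not depend on division-by-zero behavior; the division form mirrors what the JavaScript tests can express.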
Comment 6•15 years ago
The failures all seem related to handling special values (NaN, Inf, etc.). I'll investigate.
Comment 7•15 years ago
The following one-liner demonstrates the problem:

print(Math.atan2(0,-0));

The answer is supposed to be 3.141592653589793; with ICC-opt the answer is zero. By the time math_atan2() is reached, the problem is already manifest: the 2nd argument has somehow become the (jsval) integer 0 rather than the (jsval) double -0. I suspect this one problem (-0 becoming 0) is causing all the above failures. Working out where the bad conversion took place is beyond me at the moment...
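A -0 that turns into the integer 0 is exactly what a naive "does this double hold an exact integer?" test produces, because (int)-0.0 is 0 and 0 == -0.0 is true. The sketch below illustrates the pitfall and the usual fix; the function names are hypothetical, and this is not SpiderMonkey's actual jsval conversion code.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical naive check: does d hold an exact int32 value?
   It wrongly accepts -0.0: (int32_t)-0.0 == 0 and 0.0 == -0.0. */
bool fits_int32_naive(double d, int32_t *out) {
    if (!(d >= INT32_MIN && d <= INT32_MAX))   /* also rejects NaN */
        return false;
    int32_t i = (int32_t)d;                    /* in range, so the cast is defined */
    if ((double)i != d)                        /* reject fractional values */
        return false;
    *out = i;
    return true;
}

/* Sign-aware variant: -0.0 must stay a double, or its sign is lost. */
bool fits_int32(double d, int32_t *out) {
    if (d == 0.0 && signbit(d))
        return false;
    return fits_int32_naive(d, out);
}
```

If an optimizer drops or reorders the negative-zero test, the naive behavior reappears, which matches the symptom described above.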
Comment 8•15 years ago
(In reply to comment #7)
> Working out where the bad conversion took place is beyond me at the moment...

Since you did all of the hard work, I figured I'd swoop in with the easy stuff: it appears that the code at http://hg.mozilla.org/mozilla-central/file/23aa9ede6535/js/src/jsinterp.cpp#l3901 wants to be turned into a configure test (JS_NEG_ZERO_BUG?) rather than being specific to HPUX.
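A configure test of the kind suggested here could compile and run a tiny probe that checks whether negating +0.0 yields -0.0, with a fallback that flips the sign bit directly. This is a sketch under assumptions: JS_NEG_ZERO_BUG is only the name proposed in the comment, the function names are hypothetical, double is assumed to be IEEE 754 binary64, and because the ICC problem depended on optimization level, a probe only reflects the flags it was built with.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Run-time probe: does unary minus on +0.0 produce -0.0?
   A configure script could build and run this, and define the
   (hypothetical) JS_NEG_ZERO_BUG macro when it returns false.
   volatile stops the compiler from folding the negation away. */
bool negation_yields_negative_zero(void) {
    volatile double zero = 0.0;
    double neg = -zero;
    return signbit(neg) != 0;
}

/* Fallback negation that flips the IEEE 754 sign bit directly instead of
   relying on the generated code for unary minus. Assumes 64-bit doubles. */
double negate_by_sign_bit(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    bits ^= UINT64_C(1) << 63;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```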
Comment 9•15 years ago
Changing all three places where the HPUX-specific code appears reduces the number of failures from 28 to 10:

Math.ceil('-0') : FAILED: expected number ( -0 ) != actual number ( 0 )
Infinity/Math.ceil('-0') : FAILED: expected number ( -Infinity ) != actual number ( Infinity )
Math.ceil(-0) : FAILED: expected number ( -0 ) != actual number ( 0 )
Infinity/Math.ceil(-0) : FAILED: expected number ( -Infinity ) != actual number ( Infinity )
Math.ceil(-Number.MIN_VALUE) : FAILED: expected number ( -0 ) != actual number ( 0 )
Infinity/Math.ceil(-Number.MIN_VALUE) : FAILED: expected number ( -Infinity ) != actual number ( Infinity )
Math.ceil(-Number.MIN_VALUE) : FAILED: expected number ( -0 ) != actual number ( 0 )
Math.floor(-0) : FAILED: expected number ( -0 ) != actual number ( 0 )
Infinity/Math.floor(-0) : FAILED: expected number ( -Infinity ) != actual number ( Infinity )

I used diagnostic printfs to confirm that all three places are executed. Attached is a very dirty patch that fixes those tests if you're using ICC (it breaks with all other compilers). It should be a good starting point for a cleaner patch; I wasn't sure how the #defines should be done.

For the remaining failures, ICC's implementations of ceil() and floor() give the wrong answer when the argument is -0, returning 0 instead of -0! I'm also not sure how best to address that.
Assignee
Comment 10•15 years ago
Without -O2, ICC seems to generate correct code, so this definitely looks like a compiler bug in ICC (if you configure SM without --enable-optimization, we pass all tests). Maybe Moh can get us a non-buggy version. In parallel, I think we should hack up a workaround. The ICC misoptimization seems localized enough to work around.
Comment 11•15 years ago
I submitted a bug report against ICC and will let you know when the fix is ready. Meanwhile, we may try a quick and dirty fix such as:

#ifdef __INTEL_COMPILER
/* Return x itself for +0 and -0 so the sign of zero survives (x is evaluated twice). */
#define ceil(x) (((x) == 0.0) ? (x) : ceil(x))
#define floor(x) (((x) == 0.0) ? (x) : floor(x))
#endif
Assignee
Comment 12•15 years ago
Yeah, I added macros just like that. Unfortunately 3d-morph is still wrong. There must be some other regression somewhere, probably also -0 related. Patch to follow.
Assignee
Comment 13•15 years ago
Passes trace-tests. 3d-morph still incorrect.
Attachment #344958 -
Attachment is obsolete: true
Attachment #364474 -
Attachment is obsolete: true
Assignee
Comment 14•15 years ago
3d-morph is a fairly simple loop involving sin(), and icc shows a serious rounding error in the result. I will try to verify whether sin() is the culprit.

expected: 6.394884621840902e-14
icc: 6.750155989720952e-14
Comment 15•15 years ago
Comment 16•15 years ago
Sorry for pushing the wrong button ;) I think I found the reason. It is a one-liner:

printf("%lf", neg0*0.0);

ICC at -O2 computes (-0.0)*0.0 as 0.0, while at -Od it computes it as -0.0, which is the result IEEE 754 requires. In 3d-morph, we have the expression sin() * -f30, which turns into 0.0 * (-0.0). I don't have access to your ICC build of TM. If 3d-morph is changed appropriately, this can quickly be tested.
Assignee
Comment 17•15 years ago
I posted these in the wrong bug:

So it seems icc's sin() implementation is a bit off. This affects ICC both at -O2 and with no optimization.

whale:src gal$ ./Darwin_OPT.OBJ/js -e "print(Math.sin(10))"
-0.5440211108893699
whale:src gal$ ./Darwin_ICC_OPT.OBJ/js -e "print(Math.sin(10))"
-0.5440211108893698

MSVC: -0.5440211108893698 (tested using a Firefox Windows build). Maybe GCC is the one that is off here. This should be investigated further in a separate bug.
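To make "a bit off" concrete, one can count how many representable doubles lie between two results. The helper below is a hypothetical sketch assuming IEEE 754 binary64 and finite inputs: it maps each bit pattern onto an ordered integer line, so adjacent doubles map to adjacent integers. Applied to the two printed values of Math.sin(10), it reports a distance of 1, meaning the builds differ only in the last bit.

```c
#include <stdint.h>
#include <string.h>

/* Map a double onto a signed integer line on which adjacent doubles are
   adjacent integers. Assumes IEEE 754 binary64; +0.0 and -0.0 both map to 0. */
static int64_t ordered_bits(double d) {
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    const uint64_t sign = UINT64_C(1) << 63;
    if (u & sign)                      /* negative: mirror the magnitude below 0 */
        return -(int64_t)(u & ~sign);
    return (int64_t)u;
}

/* Number of representable steps between two finite doubles (0 if equal).
   The difference is taken in unsigned arithmetic, so it cannot overflow. */
uint64_t ulp_distance(double a, double b) {
    int64_t x = ordered_bits(a), y = ordered_bits(b);
    return x >= y ? (uint64_t)x - (uint64_t)y : (uint64_t)y - (uint64_t)x;
}
```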
Assignee
Comment 19•15 years ago
Latest numbers: we are now at around a 20% speedup with icc (FROM is the gcc build, TO is the icc build). It probably went down because we trace more code.

TEST                   COMPARISON        FROM                TO                  DETAILS
=============================================================================
** TOTAL **:           1.197x as fast    1030.6ms +/- 0.1%   860.8ms +/- 0.1%    significant
=============================================================================
  3d:                  1.24x as fast     154.1ms +/- 0.2%    124.4ms +/- 0.3%    significant
    cube:              1.182x as fast    40.4ms +/- 0.7%     34.2ms +/- 0.8%     significant
    morph:             1.23x as fast     29.1ms +/- 0.3%     23.7ms +/- 0.6%     significant
    raytrace:          1.27x as fast     84.6ms +/- 0.2%     66.5ms +/- 0.3%     significant
  access:              1.078x as fast    132.2ms +/- 0.2%    122.6ms +/- 0.2%    significant
    binary-trees:      1.074x as fast    40.0ms +/- 0.4%     37.3ms +/- 0.3%     significant
    fannkuch:          1.058x as fast    57.0ms +/- 0.2%     53.9ms +/- 0.2%     significant
    nbody:             1.170x as fast    23.9ms +/- 0.5%     20.4ms +/- 0.7%     significant
    nsieve:            1.025x as fast    11.3ms +/- 1.1%     11.0ms +/- 0.5%     significant
  bitops:              1.060x as fast    35.5ms +/- 0.5%     33.5ms +/- 0.5%     significant
    3bit-bits-in-byte: ??                1.6ms +/- 8.8%      1.6ms +/- 8.4%      not conclusive: might be *1.025x as slow*
    bits-in-byte:      1.049x as fast    8.1ms +/- 1.0%      7.7ms +/- 1.7%      significant
    bitwise-and:       *1.108x as slow*  2.0ms +/- 2.8%      2.3ms +/- 5.6%      significant
    nsieve-bits:       1.086x as fast    23.8ms +/- 0.5%     21.9ms +/- 0.7%     significant
  controlflow:         1.035x as fast    32.5ms +/- 0.4%     31.4ms +/- 0.4%     significant
    recursive:         1.035x as fast    32.5ms +/- 0.4%     31.4ms +/- 0.4%     significant
  crypto:              1.26x as fast     60.9ms +/- 0.5%     48.4ms +/- 0.4%     significant
    aes:               1.20x as fast     34.6ms +/- 0.4%     28.8ms +/- 0.4%     significant
    md5:               1.41x as fast     19.7ms +/- 0.7%     14.0ms +/- 0.7%     significant
    sha1:              1.183x as fast    6.6ms +/- 2.2%      5.6ms +/- 2.6%      significant
  date:                1.24x as fast     170.1ms +/- 0.1%    137.5ms +/- 0.2%    significant
    format-tofte:      1.32x as fast     67.5ms +/- 0.2%     51.2ms +/- 0.3%     significant
    format-xparb:      1.188x as fast    102.5ms +/- 0.2%    86.3ms +/- 0.2%     significant
  math:                1.061x as fast    38.8ms +/- 0.5%     36.6ms +/- 0.6%     significant
    cordic:            *1.149x as slow*  19.0ms +/- 0.5%     21.8ms +/- 0.5%     significant
    partial-sums:      1.51x as fast     13.8ms +/- 0.9%     9.1ms +/- 1.1%      significant
    spectral-norm:     1.071x as fast    6.1ms +/- 1.5%      5.7ms +/- 2.4%      significant
  regexp:              1.024x as fast    44.1ms +/- 0.3%     43.0ms +/- 0.4%     significant
    dna:               1.024x as fast    44.1ms +/- 0.3%     43.0ms +/- 0.4%     significant
  string:              1.28x as fast     362.6ms +/- 0.1%    283.5ms +/- 0.1%    significant
    base64:            1.30x as fast     16.2ms +/- 0.7%     12.5ms +/- 1.1%     significant
    fasta:             1.23x as fast     75.3ms +/- 0.2%     61.3ms +/- 0.2%     significant
    tagcloud:          1.190x as fast    99.4ms +/- 0.2%     83.5ms +/- 0.3%     significant
    unpack-code:       1.41x as fast     140.5ms +/- 0.1%    99.7ms +/- 0.2%     significant
    validate-input:    1.176x as fast    31.2ms +/- 0.4%     26.5ms +/- 0.5%     significant
Assignee
Comment 20•15 years ago
whale:v8 gal$ ../Darwin_ICC_OPT.OBJ/js -j run.js
Richards: 358
DeltaBlue: 142
Crypto: 867
RayTrace: 292
EarleyBoyer: 367
----
Score: 342
whale:v8 gal$ ../Darwin_OPT.OBJ/js -j run.js
Richards: 298
DeltaBlue: 109
Crypto: 832
RayTrace: 243
EarleyBoyer: 350
----
Score: 297
Assignee
Comment 22•15 years ago
CCing Jim. icc gets confused by the shell and editline static libraries. The PGO data is placed next to the executable, and the linker can't find it when linking the libraries. I also currently can't use -ipo, for a similar reason (it is not compatible with static linking). I don't see a huge win from the static libraries. Let's just integrate them into the regular build/link.
Assignee
Comment 23•15 years ago
Back to a 27% speedup using -ipo:

TEST                   COMPARISON        FROM                TO                  DETAILS
=============================================================================
** TOTAL **:           1.27x as fast     1030.6ms +/- 0.1%   813.3ms +/- 0.2%    significant
=============================================================================
  3d:                  1.30x as fast     154.1ms +/- 0.2%    118.3ms +/- 0.3%    significant
    cube:              1.25x as fast     40.4ms +/- 0.7%     32.3ms +/- 0.7%     significant
    morph:             1.29x as fast     29.1ms +/- 0.3%     22.5ms +/- 0.6%     significant
    raytrace:          1.33x as fast     84.6ms +/- 0.2%     63.5ms +/- 0.2%     significant
  access:              1.147x as fast    132.2ms +/- 0.2%    115.3ms +/- 0.2%    significant
    binary-trees:      1.154x as fast    40.0ms +/- 0.4%     34.7ms +/- 0.4%     significant
    fannkuch:          1.110x as fast    57.0ms +/- 0.2%     51.4ms +/- 0.3%     significant
    nbody:             1.32x as fast     23.9ms +/- 0.5%     18.2ms +/- 0.6%     significant
    nsieve:            1.020x as fast    11.3ms +/- 1.1%     11.1ms +/- 1.0%     significant
  bitops:              1.095x as fast    35.5ms +/- 0.5%     32.4ms +/- 0.7%     significant
    3bit-bits-in-byte: -                 1.6ms +/- 8.8%      1.5ms +/- 9.4%
    bits-in-byte:      1.047x as fast    8.1ms +/- 1.0%      7.7ms +/- 1.7%      significant
    bitwise-and:       -                 2.0ms +/- 2.8%      2.0ms +/- 3.5%
    nsieve-bits:       1.122x as fast    23.8ms +/- 0.5%     21.2ms +/- 0.6%     significant
  controlflow:         *1.007x as slow*  32.5ms +/- 0.4%     32.7ms +/- 0.4%     significant
    recursive:         *1.007x as slow*  32.5ms +/- 0.4%     32.7ms +/- 0.4%     significant
  crypto:              1.30x as fast     60.9ms +/- 0.5%     46.7ms +/- 0.4%     significant
    aes:               1.24x as fast     34.6ms +/- 0.4%     27.9ms +/- 0.5%     significant
    md5:               1.46x as fast     19.7ms +/- 0.7%     13.5ms +/- 1.1%     significant
    sha1:              1.23x as fast     6.6ms +/- 2.2%      5.3ms +/- 2.5%      significant
  date:                1.31x as fast     170.1ms +/- 0.1%    129.3ms +/- 0.2%    significant
    format-tofte:      1.37x as fast     67.5ms +/- 0.2%     49.4ms +/- 0.3%     significant
    format-xparb:      1.28x as fast     102.5ms +/- 0.2%    80.0ms +/- 0.2%     significant
  math:                1.067x as fast    38.8ms +/- 0.5%     36.4ms +/- 0.7%     significant
    cordic:            *1.153x as slow*  19.0ms +/- 0.5%     21.9ms +/- 0.4%     significant
    partial-sums:      1.54x as fast     13.8ms +/- 0.9%     8.9ms +/- 0.9%      significant
    spectral-norm:     1.086x as fast    6.1ms +/- 1.5%      5.6ms +/- 2.5%      significant
  regexp:              1.039x as fast    44.1ms +/- 0.3%     42.4ms +/- 0.3%     significant
    dna:               1.039x as fast    44.1ms +/- 0.3%     42.4ms +/- 0.3%     significant
  string:              1.40x as fast     362.6ms +/- 0.1%    259.8ms +/- 0.3%    significant
    base64:            1.40x as fast     16.2ms +/- 0.7%     11.6ms +/- 1.2%     significant
    fasta:             1.45x as fast     75.3ms +/- 0.2%     52.1ms +/- 0.3%     significant
    tagcloud:          1.30x as fast     99.4ms +/- 0.2%     76.3ms +/- 0.6%     significant
    unpack-code:       1.49x as fast     140.5ms +/- 0.1%    94.5ms +/- 0.2%     significant
    validate-input:    1.23x as fast     31.2ms +/- 0.4%     25.4ms +/- 0.5%     significant

Build instructions:

mkdir icc-build
cd icc-build
AR="/opt/intel/Compiler/11.0/056/bin/ia32/xiar" CXX="icpc -m32 -ipo" CC="icc -m32 -ipo" ../configure --enable-optimization --disable-debug
MOZ_PROFILE_GENERATE=1 make
cd ..
./bench-icc.sh
cd icc-build
cp pgopti.dpi editline
cp pgopti.dpi shell
MOZ_PROFILE_USE=1 make
Comment 24•15 years ago
gal: it's been a while since I fiddled with icc, but I believe I had PGO working if you ran the binary out of dist/bin. Can you try running your bench script with ./dist/bin/js and see if the build system can find the PGO files properly? What's the status here? Are you able to build correctly with a shipping version of ICC?
Assignee
Comment 25•15 years ago
Ted, I did run js out of the bin directory. The PGO data is always placed in the same directory as the executable. It's the subsequent recompilation/linking that fails. I essentially need the two cp lines from comment 23 in the makefile. As for icc: yes, with v4 applied we build correctly with icc and pass all our JIT regression tests. I will run the full JS regression tests today.
Comment 26•15 years ago
Andreas, would you please try the ICC command-line option -fp-model precise on the original source, i.e., with the original ceil, floor, etc.?

Linux: icc -fp-model precise ...
Windows: icl /fp:precise ...

This should solve the problem, but we need to know the performance impact of enforcing the precise floating-point model. We'd need a SunSpider run.
Assignee
Comment 27•15 years ago
Hi Moh. I am running the benchmarks with -fp-model precise right now.
Comment 28•15 years ago
I apparently attempted to handle this using the -prof-dir option: http://mxr.mozilla.org/mozilla-central/source/js/src/configure.in#4661 Did that change (or stop working)?
Assignee
Comment 29•15 years ago
New test run with precise FP (-fp-model precise) and without the modifications to the ceil/floor code and the various other workarounds (NEGZERO_BUG). This passes trace-tests. We lose about 3ms of performance (816.2ms vs. 813.3ms in comment 23). Still a very good result.

TEST                   COMPARISON        FROM                TO                  DETAILS
=============================================================================
** TOTAL **:           1.26x as fast     1030.6ms +/- 0.1%   816.2ms +/- 0.1%    significant
=============================================================================
  3d:                  1.29x as fast     154.1ms +/- 0.2%    119.5ms +/- 0.3%    significant
    cube:              1.23x as fast     40.4ms +/- 0.7%     32.8ms +/- 0.8%     significant
    morph:             1.27x as fast     29.1ms +/- 0.3%     22.9ms +/- 0.6%     significant
    raytrace:          1.32x as fast     84.6ms +/- 0.2%     63.9ms +/- 0.3%     significant
  access:              1.132x as fast    132.2ms +/- 0.2%    116.8ms +/- 0.2%    significant
    binary-trees:      1.151x as fast    40.0ms +/- 0.4%     34.7ms +/- 0.4%     significant
    fannkuch:          1.097x as fast    57.0ms +/- 0.2%     52.0ms +/- 0.3%     significant
    nbody:             1.25x as fast     23.9ms +/- 0.5%     19.1ms +/- 0.5%     significant
    nsieve:            1.029x as fast    11.3ms +/- 1.1%     11.0ms +/- 0.7%     significant
  bitops:              1.103x as fast    35.5ms +/- 0.5%     32.2ms +/- 0.7%     significant
    3bit-bits-in-byte: -                 1.6ms +/- 8.8%      1.5ms +/- 9.6%
    bits-in-byte:      1.047x as fast    8.1ms +/- 1.0%      7.7ms +/- 1.7%      significant
    bitwise-and:       1.074x as fast    2.0ms +/- 2.8%      1.9ms +/- 4.5%      significant
    nsieve-bits:       1.129x as fast    23.8ms +/- 0.5%     21.1ms +/- 0.5%     significant
  controlflow:         *1.031x as slow*  32.5ms +/- 0.4%     33.5ms +/- 0.4%     significant
    recursive:         *1.031x as slow*  32.5ms +/- 0.4%     33.5ms +/- 0.4%     significant
  crypto:              1.31x as fast     60.9ms +/- 0.5%     46.6ms +/- 0.3%     significant
    aes:               1.24x as fast     34.6ms +/- 0.4%     27.9ms +/- 0.3%     significant
    md5:               1.47x as fast     19.7ms +/- 0.7%     13.5ms +/- 1.1%     significant
    sha1:              1.27x as fast     6.6ms +/- 2.2%      5.2ms +/- 2.2%      significant
  date:                1.31x as fast     170.1ms +/- 0.1%    130.2ms +/- 0.2%    significant
    format-tofte:      1.36x as fast     67.5ms +/- 0.2%     49.7ms +/- 0.3%     significant
    format-xparb:      1.27x as fast     102.5ms +/- 0.2%    80.4ms +/- 0.2%     significant
  math:                1.129x as fast    38.8ms +/- 0.5%     34.4ms +/- 0.7%     significant
    cordic:            1.109x as fast    19.0ms +/- 0.5%     17.1ms +/- 0.5%     significant
    partial-sums:      1.168x as fast    13.8ms +/- 0.9%     11.8ms +/- 1.0%     significant
    spectral-norm:     1.110x as fast    6.1ms +/- 1.5%      5.5ms +/- 2.6%      significant
  regexp:              1.095x as fast    44.1ms +/- 0.3%     40.2ms +/- 0.5%     significant
    dna:               1.095x as fast    44.1ms +/- 0.3%     40.2ms +/- 0.5%     significant
  string:              1.38x as fast     362.6ms +/- 0.1%    262.9ms +/- 0.1%    significant
    base64:            1.35x as fast     16.2ms +/- 0.7%     12.0ms +/- 0.3%     significant
    fasta:             1.42x as fast     75.3ms +/- 0.2%     53.1ms +/- 0.2%     significant
    tagcloud:          1.26x as fast     99.4ms +/- 0.2%     78.7ms +/- 0.3%     significant
    unpack-code:       1.49x as fast     140.5ms +/- 0.1%    94.2ms +/- 0.2%     significant
    validate-input:    1.25x as fast     31.2ms +/- 0.4%     24.9ms +/- 0.5%     significant
Assignee
Comment 30•15 years ago
Ted, I am not very good at parsing configure scripts. Could you spell out what exactly I should try?
Comment 31•15 years ago
(In reply to comment #29)
Great. The ICC team analyzed my submitted test, and they suggest using -fp-model precise or -fp-model source. The performance loss with -fp-model source is higher than with -fp-model precise, but it seems -fp-model precise is sufficiently precise for us here. I'll post a link to a reference on the details for possible future use.
Assignee
Comment 32•15 years ago
This is the current configure setting to build with ICC:

AR="/opt/intel/Compiler/11.0/056/bin/ia32/xiar" CXX="icpc -m32 -ipo -fp-model precise" CC="icc -m32 -ipo -fp-model precise" ../configure --enable-optimization --disable-debug

Also note that you must delete config.* from the build directory; otherwise configure does not pick up changes to CXX/CC.
Comment 33•15 years ago
(In reply to comment #30)
> Ted, I am not very good at parsing configure scripts. Could you spell out what
> exactly I should try?

I don't have any suggestions; I'm just wondering whether "-prof-dir" no longer works. moh: do you happen to know?
Comment 34•15 years ago
-prof-dir should work. It provides a convenient way of pointing profmerge and the compiler to the intended profile directory. The prof-gen build should also remove the old .dyn/.dpi files from the profile directory. Otherwise, if a source change alters the control-flow graph, we'll end up with profile data (.dyn/.dpi files) that make different assumptions about the structure of the source. Later, in the profmerge or prof-use phase, profmerge picks up the profile files in a given order and throws away inconsistent profile data (a warning is issued about the profile-data mismatch; it's good to always check for that warning). But this can have a major negative performance impact. I hope this is also taken care of in the build scripts.
Comment 35•15 years ago
From the profile data, one can get nice coverage reports very easily. I opened a separate item for tracking. https://bugzilla.mozilla.org/show_bug.cgi?id=480603
Assignee
Comment 36•15 years ago
The shell makefile is still just as broken as ever, but here are some updated numbers for an icc PGO build:

============================================
RESULTS (means and 95% confidence intervals)
--------------------------------------------
Total:                 760.4ms +/- 0.2%
--------------------------------------------

  3d:                  111.4ms +/- 0.3%
    cube:              33.9ms +/- 0.9%
    morph:             20.7ms +/- 0.7%
    raytrace:          56.8ms +/- 0.3%
  access:              113.1ms +/- 0.2%
    binary-trees:      36.1ms +/- 0.2%
    fannkuch:          47.0ms +/- 0.3%
    nbody:             18.8ms +/- 0.6%
    nsieve:            11.3ms +/- 1.1%
  bitops:              31.5ms +/- 0.8%
    3bit-bits-in-byte: 1.4ms +/- 10.0%
    bits-in-byte:      7.7ms +/- 1.7%
    bitwise-and:       2.6ms +/- 5.6%
    nsieve-bits:       19.8ms +/- 0.5%
  controlflow:         32.0ms +/- 0.3%
    recursive:         32.0ms +/- 0.3%
  crypto:              43.8ms +/- 0.9%
    aes:               25.2ms +/- 1.2%
    md5:               11.9ms +/- 0.8%
    sha1:              6.8ms +/- 1.8%
  date:                110.6ms +/- 0.2%
    format-tofte:      53.7ms +/- 0.3%
    format-xparb:      56.9ms +/- 0.2%
  math:                22.8ms +/- 0.9%
    cordic:            8.1ms +/- 1.8%
    partial-sums:      9.0ms +/- 0.6%
    spectral-norm:     5.7ms +/- 2.4%
  regexp:              40.2ms +/- 0.3%
    dna:               40.2ms +/- 0.3%
  string:              255.1ms +/- 0.3%
    base64:            12.3ms +/- 1.1%
    fasta:             54.2ms +/- 0.4%
    tagcloud:          78.4ms +/- 0.6%
    unpack-code:       84.9ms +/- 0.3%
    validate-input:    25.3ms +/- 0.6%
Assignee
Comment 37•15 years ago
v8-bleeding-edge vs. icc build (FROM is V8 bleeding edge, TO is the icc build):

TEST                   COMPARISON        FROM                TO                  DETAILS
=============================================================================
** TOTAL **:           *1.46x as slow*   518.7ms +/- 0.6%    754.9ms +/- 0.5%    significant
=============================================================================
  3d:                  *1.32x as slow*   83.0ms +/- 0.6%     109.6ms +/- 0.5%    significant
    cube:              *1.39x as slow*   23.8ms +/- 1.9%     33.1ms +/- 1.2%     significant
    morph:             1.70x as fast     34.6ms +/- 1.1%     20.4ms +/- 1.8%     significant
    raytrace:          *2.28x as slow*   24.6ms +/- 1.5%     56.1ms +/- 0.4%     significant
  access:              *2.99x as slow*   37.5ms +/- 1.3%     112.1ms +/- 0.6%    significant
    binary-trees:      *11.6x as slow*   3.1ms +/- 7.3%      35.9ms +/- 0.6%     significant
    fannkuch:          *3.47x as slow*   13.4ms +/- 2.8%     46.5ms +/- 0.8%     significant
    nbody:             *1.108x as slow*  16.7ms +/- 2.9%     18.5ms +/- 2.0%     significant
    nsieve:            *2.60x as slow*   4.3ms +/- 8.0%      11.2ms +/- 2.7%     significant
  bitops:              1.145x as fast    36.4ms +/- 1.4%     31.8ms +/- 1.4%     significant
    3bit-bits-in-byte: 2.07x as fast     3.1ms +/- 7.3%      1.5ms +/- 25.1%     significant
    bits-in-byte:      -                 8.0ms +/- 0.0%      8.0ms +/- 0.0%
    bitwise-and:       3.70x as fast     10.0ms +/- 0.0%     2.7ms +/- 12.8%     significant
    nsieve-bits:       *1.28x as slow*   15.3ms +/- 2.3%     19.6ms +/- 1.9%     significant
  controlflow:         *11.0x as slow*   2.9ms +/- 7.8%      31.8ms +/- 0.9%     significant
    recursive:         *11.0x as slow*   2.9ms +/- 7.8%      31.8ms +/- 0.9%     significant
  crypto:              *1.191x as slow*  36.6ms +/- 1.4%     43.6ms +/- 2.5%     significant
    aes:               *1.47x as slow*   17.0ms +/- 0.0%     25.0ms +/- 3.8%     significant
    md5:               *1.175x as slow*  10.3ms +/- 3.4%     12.1ms +/- 1.9%     significant
    sha1:              1.43x as fast     9.3ms +/- 3.7%      6.5ms +/- 5.8%      significant
  date:                *1.81x as slow*   60.9ms +/- 1.3%     110.0ms +/- 0.6%    significant
    format-tofte:      *1.52x as slow*   35.1ms +/- 1.2%     53.3ms +/- 0.6%     significant
    format-xparb:      *2.20x as slow*   25.8ms +/- 1.8%     56.7ms +/- 0.9%     significant
  math:                1.99x as fast     44.9ms +/- 0.5%     22.6ms +/- 3.1%     significant
    cordic:            2.21x as fast     17.7ms +/- 2.0%     8.0ms +/- 6.0%      significant
    partial-sums:      2.17x as fast     19.5ms +/- 1.9%     9.0ms +/- 0.0%      significant
    spectral-norm:     1.38x as fast     7.7ms +/- 4.5%      5.6ms +/- 6.6%      significant
  regexp:              *1.57x as slow*   25.5ms +/- 1.5%     40.1ms +/- 0.6%     significant
    dna:               *1.57x as slow*   25.5ms +/- 1.5%     40.1ms +/- 0.6%     significant
  string:              *1.33x as slow*   191.0ms +/- 0.7%    253.3ms +/- 0.6%    significant
    base64:            1.59x as fast     19.4ms +/- 1.9%     12.2ms +/- 2.5%     significant
    fasta:             *1.94x as slow*   27.7ms +/- 1.2%     53.6ms +/- 0.7%     significant
    tagcloud:          *1.61x as slow*   48.4ms +/- 1.0%     77.9ms +/- 1.2%     significant
    unpack-code:       *1.27x as slow*   66.5ms +/- 0.8%     84.5ms +/- 0.4%     significant
    validate-input:    1.155x as fast    29.0ms +/- 1.2%     25.1ms +/- 0.9%     significant
Updated•13 years ago
Summary: TM: Build shell with icc on mac and linux and compare perf → Build shell with icc on mac and linux and compare perf
Assignee
Updated•9 years ago
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX