Closed Bug 443237 Opened 17 years ago Closed 7 years ago

investigate int64 jsval slowdown on 32-bit x86

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED INVALID
mozilla1.9.1

People

(Reporter: brendan, Assigned: mohammad.r.haghighat)

Details

Attachments

(4 files)

The attached patch is fresh. If it has bugs, my apologies -- I hacked it up last week and IIRC it passed the JS testsuite, but you should reconfirm that. We run the suite by cd'ing to js/tests and typing
    ./jsDriver.pl -t -e smdebug -L lc2 lc3 spidermonkey-n.tests slow-n.tests
after building the debug ("smdebug") SpiderMonkey js shell in js/src via
    make -f Makefile.ref
I built optimized via
    make BUILD_OPT=1 OPTIMIZER=-Os -f Makefile.ref
on my MacBook Pro (since Apple's gcc does better with -Os than stock gcc, IIRC, I always build optimized this way) and saw ~27% slowdown in the js-shell-based SunSpider benchmark (from http://webkit.org/) when I enlarged jsval to int64 but kept the jsval int domain restricted to 31 bits. When I enlarged ints to 32 bits I got back 1-2%. Disappointing but not totally surprising to me. If there is an easy way to win back the lost perf, we are interested. If you guys can show why we lose perf, that would be helpful to guide future work. In any case, Moh kindly offered to help investigate this question, so I'm giving him the bug and patch. /be
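To make the two configurations in the comment above concrete, here is a minimal sketch of an integer-tagged value word in both widths. The tag layout and every name in it (`val32_t`, `val64_t`, `V64_TAG_INT`, the helper functions) are assumptions chosen for illustration; they are not taken from the attached patch or from SpiderMonkey.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 32-bit value word: low bit 1 tags an int, leaving 31 payload bits. */
typedef uint32_t val32_t;

/* Hypothetical 64-bit value word: tag in the high half, full 32-bit int in the low half. */
typedef uint64_t val64_t;

#define V64_TAG_INT ((uint64_t)1 << 32)  /* illustrative tag value only */

/* Does i fit in the signed 31-bit int domain? */
static int fits_int31(int32_t i) {
    return i >= -(INT32_C(1) << 30) && i < (INT32_C(1) << 30);
}

static val32_t v32_from_int(int32_t i) {
    assert(fits_int31(i));
    return ((uint32_t)i << 1) | 1u;
}

static int32_t v32_to_int(val32_t v) {
    uint32_t payload = v >> 1;                   /* 31-bit payload, bit 30 is the sign */
    int32_t low30 = (int32_t)(payload & 0x3FFFFFFFu);
    return (payload & 0x40000000u) ? low30 - (INT32_C(1) << 30) : low30;
}

static val64_t v64_from_int(int32_t i) {
    return V64_TAG_INT | (uint32_t)i;            /* no range restriction needed */
}

static int v64_is_int(val64_t v) {
    return (v >> 32) == 1;
}

static int32_t v64_to_int(val64_t v) {
    uint32_t low = (uint32_t)v;
    int32_t result;
    memcpy(&result, &low, sizeof result);        /* reinterpret the low 32 bits */
    return result;
}
```

In this sketch the 64-bit word holds any `int32_t` without a range check, at the cost of doubling the size of every value that is loaded, stored, or pushed.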
The patch includes jsinterp.cpp and jsscope.cpp, while in the mozilla source tree these files are .c files. Is there any particular reason for this? Thanks. -- Carmen
Hi Carmen, we're using http://hg.mozilla.org/mozilla-central now, cvs.mozilla.org is for Mozilla 1.9.0.x maintenance releases (Firefox 3.0.x, I think) only. For more on Mercurial, see http://developer.mozilla.org/en/docs/Mercurial /be
Hello, I have built both the debug and optimized versions of the JS Shell using the original JS sources from Mozilla Central on Windows w/ the Visual C++ compiler. I changed a few files in order to get the JS Shell to build: I changed jsinterp.c to jsinterp.cpp on line 99 of rules.mk and added an explicit cast to uint32* on line 858 of js.cpp. However, when I run jsDriver.pl I get the following errors:
1) Running jsDriver.pl w/ the debug, unoptimized, original version of Mozilla Central's JS Shell:
    ./jsDriver.pl -t -e smdebug -L lc2 lc3 spidermonkey-n.tests slow-n.tests
    -*- executing: ./../src/WINNT5.1_DBG.OBJ/js.exe -f ./shell.js -f ./js1_5/shell.js -f ./js1_5/Regress/shell.js -f ./js1_5/Regress/regress-281487.js -f ./js-test-driver-end.js
    An unhandled win 32 exception occurred in js.exe [1344]:
    js.cpp, line 272 : Unhandled exception at 0x610b12f4 in js.exe: 0xC0000005: Access violation reading location 0x00045174.
jsDriver.pl resumed the tests' execution after I exited the Visual Studio debugger. FINAL OUTPUT:
    -#- 22 test(s) failed
2) Running jsDriver.pl w/ the optimized, original version of Mozilla Central's JS Shell:
    ./jsDriver.pl -t -e smopt -L lc2 lc3 spidermonkey-n.tests slow-n.tests
    -*- executing: ./../src/WINNT5.1_OPT.OBJ/js.exe -f ./shell.js -f ./e4x/shell.js -f ./e4x/decompilation/shell.js -f ./e4x/decompilation/decompile-xml-escapes.js -f ./js-test-driver-end.js
    An unhandled win 32 exception occurred in js.exe [6124]:
    _file.c, line 238: Unhandled exception at 0x7c918fea in js.exe: 0xC0000005: Access violation writing location 0x00000010.
The error repeats for the subsequent tests. Note that this is the original code from Mozilla Central. Also, in the debug version the error occurs in js.cpp, while in the optimized version it happens in _file.c. Do you know what is causing this problem? Should I be testing this on iMac instead? Thank you. -- Carmen
Can you post a stack dump? I will look at the code in the meantime from the line number info.
This looks like a weird bug. A stack dump would help a lot. In the meantime I suggest you switch to Mac if possible. Brendan and I both use Mac, so that tends to be what we test against.
Here's the call stack for the first error (from Visual Studio):
    js32.dll!61076ef4() [Frames below may be incorrect and/or missing, no symbols loaded for js32.dll]
    js32.dll!610853ff()
    js32.dll!61076d88()
    msvcr80.dll!78134c58()
    js32.dll!610b12cd()
    msvcr80.dll!78134c58()
    js32.dll!6100e113()
    js32.dll!61084d06()
    msvcr80.dll!7813ee63()
    js32.dll!61064205()
    js32.dll!6104b601()
    js32.dll!6108db72()
    js32.dll!610a4718()
    js32.dll!610a4718()
    js32.dll!61015794()
    js32.dll!6103b61a()
    js32.dll!610369f3()
    js32.dll!6102e19a()
    js32.dll!6103b415()
    js32.dll!6102e371()
    ntdll.dll!7c910732()
    ntdll.dll!7c910732()
    ntdll.dll!7c911596()
    js32.dll!6108ea39()
    js32.dll!6108ea8a()
    js32.dll!610a0d23()
    js32.dll!6108e021()
    js32.dll!6108dcdc()
    js32.dll!610a4718()
    js32.dll!6108db72()
    js32.dll!610a4718()
    js32.dll!6103b303()
    js32.dll!6102e19a()
    js32.dll!6102dd10()
    ntdll.dll!7c910e91()
    ntdll.dll!7c91056d()
    js32.dll!610a4718()
    ntdll.dll!7c91056d()
    msvcr80.dll!78134c39()
    msvcr80.dll!78134c58()
    js32.dll!6103c059()
    js32.dll!61015983()
    js32.dll!61086043()
    js32.dll!6108679b()
    js32.dll!610867b1()
    js32.dll!61062b59()
    js32.dll!6101375d()
    > js.exe!Process(JSContext * cx=0x00a0b398, JSObject * obj=0x00b11000, char * filename=0x00a05f26, int forceTTY=0x00000000) Line 272 + 0x16 bytes C++
    js.exe!ProcessArgs(JSContext * cx=0x00a0b398, JSObject * obj=0x00b11000, char * * argv=0x00a05e84, int argc=0x0000000a) Line 500 + 0x19 bytes C++
    js.exe!main(int argc=0x0000000a, char * * argv=0x00a05e84, char * * envp=0x00a041c0) Line 3940 + 0x15 bytes C++
    js.exe!__tmainCRTStartup() Line 586 + 0x17 bytes C
    kernel32.dll!7c816ff7()
Here's the call stack for the second error:
    ntdll.dll!7c918fea() [Frames below may be incorrect and/or missing, no symbols loaded for ntdll.dll]
    ntdll.dll!7c915041()
    ntdll.dll!7c915233()
    ntdll.dll!7c9155c9()
    ntdll.dll!7c90104b()
    > js32.dll!_lock_file(_iobuf * pf=0x0041ed20) Line 238 C
    js32.dll!getc(_iobuf * stream=0x0041ed20) Line 70 + 0x6 bytes C
    js32.dll!_js_fgets() + 0x35 bytes C++
    js32.dll!_js_GetToken() + 0x285e bytes C++
    js32.dll!_js_GetToken() + 0x6f0 bytes C++
    js32.dll!_js_PeekToken() + 0x35 bytes C++
    js32.dll!_js_CompileScript() + 0x102 bytes C++
    js32.dll!_JS_CompileFileHandleForPrincipals() + 0x37 bytes C++
    js32.dll!_JS_CompileFileHandle() + 0x16 bytes C++
    js.exe!_main() + 0x877 bytes C++
    js.exe!_main() + 0x660 bytes C++
    js.exe!_main() + 0x1a6 bytes C++
    js.exe!__tmainCRTStartup() Line 318 + 0x12 bytes C
    kernel32.dll!7c816ff7()
In the meantime, I'll build the JS shell on the iMac we have. Thanks for your help, Carmen
Hello, I have built the debug and optimized versions of the original JS code from Mozilla Central on the iMac we have. So far I have run the tests with jsDriver.pl for both versions, and some tests failed. For the debug version, the following tests failed:
    e4x/decompilation/decompile-xml-escapes.js
    e4x/Expressions/11.1.4-08.js
    e4x/Global/13.1.2.1.js
    e4x/Namespace/regress-292863.js
    e4x/TypeConversion/10.2.1.js
    ecma/Math/15.8.2.6.js
    ecma_3/RegExp/regress-311414.js
    ecma_3/String/15.5.4.11.js
    ecma_3/String/regress-392378.js
    js1_5/extensions/regress-322957.js
    js1_5/Regress/regress-320119.js
    js1_7/geniter/regress-347739.js
    js1_7/geniter/regress-349012-01.js
    js1_7/geniter/regress-349331.js
    js1_7/iterable/regress-340526-02.js
    js1_7/lexical/regress-346642-03.js
    js1_7/regress/regress-410649.js
The same tests, plus ecma/TypeConversion/9.3.1-3.js, failed for the optimized version. Are these failures to be expected? Thanks. -- Carmen
Yeah, those look familiar at a quick scan. You can baseline the set of failures against the unpatched tree, and use that to verify the patch. (We should get those tests fixed or excluded promptly, IMO, but I haven't been pulling my weight there lately. :/)
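The baselining step suggested above amounts to a set difference: anything that fails with the patch but not on the unpatched tree is a candidate regression. A minimal sketch of that comparison follows; the function names and the idea of holding the failure lists as in-memory string arrays are assumptions for illustration, not part of jsDriver.pl.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if name appears in list[0..n), else 0. */
static int contains(const char *const *list, size_t n, const char *name) {
    for (size_t i = 0; i < n; ++i) {
        if (strcmp(list[i], name) == 0)
            return 1;
    }
    return 0;
}

/*
 * Copy into out[] every entry of patched[] that is absent from baseline[]
 * and return how many were copied. out must have room for n_patched entries.
 */
static size_t new_failures(const char *const *baseline, size_t n_base,
                           const char *const *patched, size_t n_patched,
                           const char **out) {
    size_t count = 0;
    assert(out != NULL);
    for (size_t i = 0; i < n_patched; ++i) {
        if (!contains(baseline, n_base, patched[i]))
            out[count++] = patched[i];
    }
    return count;
}
```

The linear scan is quadratic overall, which is fine for failure lists with a few dozen entries; sorted input and a merge pass would be the usual choice for much longer lists.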
Hello, after building the debug and optimized versions of the patched JS code, the js-shell-based SunSpider benchmarks show a performance degradation of ~10% for the debug version and ~17% for the optimized version on the iMac I used. However, many more tests fail under jsDriver.pl: 91 for the debug version and 69 for the optimized version. I will use Shark to find out why this is happening. -- Carmen
The 91 failures are expected. The debug version contains a lot of debug/assertion overhead, so I think you can focus on the optimized build. For the non-failing cases, identifying the reason for the 17% slowdown would be very helpful. I am betting on register pressure (more L1 traffic), not memory bandwidth issues (L2 and misses), but let's see what Shark has to say.
The bottleneck is of course more visible in the optimized build. Number of retired instructions is also a good starting metric.
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
91 failures with the debug js shell is too many -- I got 17 the other day. Let me refresh the patch and test myself... /be
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
We have some internal tools for diff'ing the performance characteristics of a binary on two workloads or two builds of the same source using different compiler options. I'm not sure if it is available on iMac. Will investigate and provide an update.
Is sunspider working with the patch? Or is that failing like the 91 test cases?
The shell-based Sunspider works with the patch on iMac.
On Win32, it turned out that the original debug version is built with the -MD option while the optimized version is not using this switch. The crash of the optimized version (still without the patch) would occur in the system fgetc when running js.exe on Sunspider. We tried the -MD option on the optimized build. The problem goes away and Sunspider runs to completion. The optimized version with the patch still crashes even with the -MD option. The crash occurs in jsgc.cpp. When building it, however, there are a lot of warnings about potential loss of data in the jsval to uint32 conversion. These warnings are not generated on the Mac build. Are they safe to ignore? Some of the warnings:
    jsgc.cpp(2240) : warning C4244: 'argument' : conversion from 'jsval' to 'uint32', possible loss of data
    jsgc.cpp(2555) : warning C4244: '=' : conversion from 'jsval' to 'uint32', possible loss of data
    jsgc.cpp(2675) : warning C4244: 'argument' : conversion from 'jsval' to 'uint32', possible loss of data
    jsgc.cpp(2682) : warning C4244: 'argument' : conversion from 'jsval' to 'uint32', possible loss of data
    jsgc.cpp(2701) : warning C4244: 'argument' : conversion from 'jsval' to 'uint32', possible loss of data
    jsgc.cpp(2703) : warning C4244: 'argument' : conversion from 'jsval' to 'uint32', possible loss of data
    jsgc.cpp(2705) : warning C4244: 'argument' : conversion from 'jsval' to 'uint32', possible loss of data
    jsgc.cpp(2747) : warning C4244: 'argument' : conversion from 'jsval' to 'uint32', possible loss of data
    jsgc.cpp(2809) : warning C4244: 'argument' : conversion from 'jsval' to 'uint32', possible loss of data
    jsgc.cpp(2816) : warning C4244: 'argument' : conversion from 'jsval' to 'uint32', possible loss of data
    jsgc.cpp(2822) : warning C4244: 'argument' : conversion from 'jsval' to 'uint32', possible loss of data
    jsgc.cpp(2851) : warning C4244: 'argument' : conversion from 'jsval' to 'uint32', possible loss of data
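Whether such a warning is harmless depends on whether the upper 32 bits of the value carry information at that call site. The sketch below shows what a narrowing conversion from a 64-bit word to a 32-bit unsigned integer does; `wide_val_t` is a hypothetical stand-in for the enlarged jsval, and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t wide_val_t;  /* hypothetical stand-in for a 64-bit jsval */

/* What the flagged conversions do: keep the low 32 bits, drop the rest. */
static uint32_t narrow_low32(wide_val_t v) {
    return (uint32_t)v;
}

/* Returns 1 when narrowing v to uint32_t preserves its numeric value. */
static int narrowing_is_lossless(wide_val_t v) {
    return v >= 0 && v <= (wide_val_t)UINT32_MAX;
}

/* A checked variant that makes the assumption explicit at the call site. */
static uint32_t narrow_checked(wide_val_t v) {
    assert(narrowing_is_lossless(v));
    return (uint32_t)v;
}
```

If a tag or other data lives in the high half, as it would in many 64-bit value layouts, `narrow_low32` silently discards it, which is one plausible way a patched build could misbehave where the 32-bit build did not.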
On Win32, the debug build of the patched version dies on this assertion failure at line 1768 of jsgc.cpp:
    /* Try to get thing from the free list. */
    thing = arenaList->freeList;
    if (thing) {
        arenaList->freeList = thing->next;
        flagp = thing->flagp;
        JS_ASSERT(*flagp & GCF_FINAL);
The value of *flagp is 0x48 and GCF_FINAL is 32:
    0x48: 0010 1000
      32: 0001 0000
     and: 0000 0000
(In reply to comment #17) oops!
    0x48: 0100 1000
      32: 0010 0000
     and: 0000 0000
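The corrected bit patterns can be reproduced mechanically. In this small sketch, `GCF_FINAL_BIT` simply mirrors the value 32 reported in the comment; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define GCF_FINAL_BIT 0x20u  /* 32, the GCF_FINAL value reported in the comment */

/* Returns 1 when any bit of mask is set in flags. */
static int has_flag(uint8_t flags, uint8_t mask) {
    return (flags & mask) != 0;
}

/* Write v as "bbbb bbbb", most significant bit first; buf needs 10 bytes. */
static void to_binary8(uint8_t v, char buf[10]) {
    int pos = 0;
    assert(buf);
    for (int bit = 7; bit >= 0; --bit) {
        buf[pos++] = ((v >> bit) & 1) ? '1' : '0';
        if (bit == 4)
            buf[pos++] = ' ';
    }
    buf[pos] = '\0';
}
```

With these helpers, `has_flag(0x48, GCF_FINAL_BIT)` is 0, which matches the failing assertion: bit 5 is clear in 0x48.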
Update: We profiled the original and patched versions of the JS Shell with Shark on Mac OS X (there was a bit of trouble, as Shark will not show mixed source and assembly unless the JS Shell is built with the -gstabs+ option). I'll be looking into why the slowdown happens next. Still, it would be good to have the Win32 version of the patch working, as VTune is not supported on Mac OS X. -- Carmen
Update: I have looked at the generated code on Mac OS X for both the original and the patched versions. The performance degradation of the patch on the optimized build is consistently ~17%. According to Shark, the only function that shows a major sample increase is the js_interpret function (no surprise). For this function, the increase in samples is ~4.3%. I manually inspected some of the hot spots, and can see the extra instructions in the assembly code. So far, the extra instructions have all been extra moves to take care of the 64-bit vs. 32-bit data movements. At first glance, it does not seem that these extra instructions are spills/reloads. I plan to do a more thorough analysis by breaking down the instructions. We have a Pin tool (our binary instrumentation tool) that can help here. Unfortunately, it is not supported on Leopard. It works on Linux & Windows. But the patch doesn't work on Windows, so I'll try to see if I can get it to work on Linux. In that case, we would have a better understanding of the instruction mix profile with and without the patch. If this doesn't work, I'll try to write scripts to statically find the instruction mix from the compiler-generated assembly listings. If you have a suggestion, please let me know. Thanks.
I can't be much help with Windows, but if you have build issues with Linux I can take a look and we should be able to get that to work easily. I am building tracemonkey on Mac & Linux now and it's mostly in sync with mozilla central, minus the jsval64 patch. Can you post an example of the code differences? No spills is definitely good news. I wonder what exactly happens in the code. Did you succeed in building the code on Mac in 64-bit mode (-m64)? I wonder what the code looks like with more and wider registers.
Hopefully there won't be any issues on Linux, but if there's any trouble I'll definitely ask you for help :). Here is the assembly code generated for the original JS Shell version for line 5539 in jsinterp.cpp (PUSH_OPND(fp->vars[slot])):
    mov esi, dword ptr [ebp-396]
    mov ecx, dword ptr [ebp-76]
    mov edx, dword ptr [esi+52]
    mov eax, dword ptr [edx+eax*4]
    mov dword ptr [ecx], eax
    lea eax, dword ptr [ecx+4]
    mov dword ptr [ebp-76], eax
and here's the code for the same line with the patched version of the JS Shell:
    mov esi, dword ptr [ebp-740]
    mov ecx, dword ptr [ebp-92]
    mov edx, dword ptr [esi+56]
    lea edx, dword ptr [edx+eax*8]
    mov eax, dword ptr [edx]
    mov edx, dword ptr [edx+4]
    mov dword ptr [ecx], eax
    lea eax, dword ptr [ecx+8]
    mov dword ptr [ecx+4], edx
    mov dword ptr [ebp-92], eax
We haven't tried building in 64-bit mode yet... -- Carmen
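The difference between the two listings can be mirrored in C. The sketch below is not the interpreter's PUSH_OPND macro; the function and type names are hypothetical and only model "copy vars[slot] to the top of the operand stack and bump the stack pointer" for a 32-bit slot, a 64-bit slot, and a 64-bit slot written out as the two 32-bit halves that a 32-bit x86 compiler using general-purpose registers ends up moving.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit slots: one load, one store, stack pointer advances by 4 bytes. */
static uint32_t *push32(uint32_t *sp, const uint32_t *vars, unsigned slot) {
    assert(sp && vars);
    *sp = vars[slot];
    return sp + 1;
}

/* 64-bit slots: a single assignment in C, stack pointer advances by 8 bytes. */
static uint64_t *push64(uint64_t *sp, const uint64_t *vars, unsigned slot) {
    assert(sp && vars);
    *sp = vars[slot];
    return sp + 1;
}

/* The same 64-bit copy spelled out as two 32-bit halves. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} halves_t;

static halves_t *push_halves(halves_t *sp, const halves_t *vars, unsigned slot) {
    assert(sp && vars);
    sp->lo = vars[slot].lo;   /* first load/store pair */
    sp->hi = vars[slot].hi;   /* second load/store pair */
    return sp + 1;
}
```

Compiling `push64` with `-m32` and inspecting the output is a quick way to check whether a given compiler emits the two-halves pattern or uses a wider move instead; that outcome depends on the compiler and flags, so it is worth verifying rather than assuming.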
This is really disappointing. Looks like the loads & stores do not get fused and generate more bus traffic. I will hack up a microbenchmark to confirm this.
Looks like Brendan was right all along. Even when comparing 32-bit accesses in 32-bit x86 mode against 64-bit data movements in 64-bit x86 mode, we tank performance by 18%. 64-bit data movements in 32-bit mode have 80% overhead. Not sure icc would produce different results. It's probably difficult to schedule this tight loop differently. Pretty disappointing. This might mean a dead end for 64-bit slots on x86. 64-bit x86 mode might be worth a second look. The 18% overhead we might be able to recover (big maybe though) from other wins (no GC for doubles etc).
    #define I long
    #define L long long
    TT a[4096];
    TT b[4096];
    int main() {
        int i, j;
        for (i = 0; i < 500000; ++i)
            for (j = 0; j < 4096; ++j)
                a[j] = b[j];
    }
    h-233:tmp gal$ touch test.c && make CFLAGS="-O6 -DTT=I -m32" test
    cc -O6 -DTT=I -m32 test.c -o test
    h-233:tmp gal$ time ./test
    real 0m1.671s
    user 0m1.658s
    sys 0m0.005s
    h-233:tmp gal$ touch test.c && make CFLAGS="-O6 -DTT=I -m64" test
    cc -O6 -DTT=I -m64 test.c -o test
    h-233:tmp gal$ time ./test
    real 0m1.981s
    user 0m1.966s
    sys 0m0.005s
    h-233:tmp gal$ touch test.c && make CFLAGS="-O6 -DTT=L -m32" test
    cc -O6 -DTT=L -m32 test.c -o test
    h-233:tmp gal$ time ./test
    real 0m2.944s
    user 0m2.920s
    sys 0m0.008s
    h-233:tmp gal$ touch test.c && make CFLAGS="-O6 -DTT=L -m64" test
    cc -O6 -DTT=L -m64 test.c -o test
    h-233:tmp gal$ time ./test
    real 0m1.981s
    user 0m1.966s
    sys 0m0.005s
Similar results on Linux x86_64 (Core 2 Duo):
    dvander@mindknight:~/mozilla/tracemonkey/js/src$ cc test.c -O6 -DTT=I -m32 -otest
    dvander@mindknight:~/mozilla/tracemonkey/js/src$ time ./test
    real 0m0.968s
    user 0m0.964s
    sys 0m0.000s
    dvander@mindknight:~/mozilla/tracemonkey/js/src$ cc test.c -O6 -DTT=I -m64 -otest
    dvander@mindknight:~/mozilla/tracemonkey/js/src$ time ./test
    real 0m2.168s
    user 0m2.164s
    sys 0m0.004s
    dvander@mindknight:~/mozilla/tracemonkey/js/src$ cc test.c -O6 -DTT=L -m32 -otest
    dvander@mindknight:~/mozilla/tracemonkey/js/src$ time ./test
    real 0m3.078s
    user 0m3.080s
    sys 0m0.000s
    dvander@mindknight:~/mozilla/tracemonkey/js/src$ cc test.c -O6 -DTT=L -m64 -otest
    dvander@mindknight:~/mozilla/tracemonkey/js/src$ time ./test
    real 0m2.140s
    user 0m2.136s
    sys 0m0.004s
Actually the 2nd test case is off. long is 64-bit in 64-bit mode. Performance for that probably matches the 1st case if we had used #define I int.
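The `long` pitfall noted above goes away if the element type has a fixed width. Below is a hedged variant of the microbenchmark using `int32_t` and `int64_t`; the names (`SLOTS`, `bench32`, `bench64`) and the per-pass perturbation of the source array are additions for illustration and were not part of the test program posted in this bug.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define SLOTS 4096

static int32_t a32[SLOTS], b32[SLOTS];
static int64_t a64[SLOTS], b64[SLOTS];

static void copy32(int32_t *dst, const int32_t *src, size_t n) {
    for (size_t j = 0; j < n; ++j)
        dst[j] = src[j];
}

static void copy64(int64_t *dst, const int64_t *src, size_t n) {
    for (size_t j = 0; j < n; ++j)
        dst[j] = src[j];
}

/* CPU seconds for reps passes over 32-bit elements. */
static double bench32(int reps) {
    assert(reps >= 0);
    clock_t start = clock();
    for (int r = 0; r < reps; ++r) {
        b32[r % SLOTS] = (int32_t)r;  /* change the source so passes are not identical */
        copy32(a32, b32, SLOTS);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* CPU seconds for reps passes over 64-bit elements. */
static double bench64(int reps) {
    assert(reps >= 0);
    clock_t start = clock();
    for (int r = 0; r < reps; ++r) {
        b64[r % SLOTS] = (int64_t)r;
        copy64(a64, b64, SLOTS);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling `bench32(500000)` and `bench64(500000)` from a `main` built once with `-m32` and once with `-m64` would give the four combinations discussed here with unambiguous element widths. An optimizer may still vectorize or otherwise transform the copy loops, so inspecting the generated assembly remains advisable before drawing conclusions from the timings.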
proc: Dual Core AMD Opteron(tm) Processor 170
    dvander@hayate:~$ cc test.c -O6 -DTT=I -m32 -o test
    dvander@hayate:~$ time ./test
    real 0m5.547s
    user 0m5.548s
    sys 0m0.000s
    dvander@hayate:~$ cc test.c -O6 -DTT=I -m64 -o test
    dvander@hayate:~$ time ./test
    real 0m5.744s
    user 0m5.728s
    sys 0m0.004s
    dvander@hayate:~$ cc test.c -O6 -DTT=L -m32 -o test
    dvander@hayate:~$ time ./test
    real 0m11.795s
    user 0m11.797s
    sys 0m0.000s
    dvander@hayate:~$ cc test.c -O6 -DTT=L -m64 -o test
    dvander@hayate:~$ time ./test
    real 0m8.490s
    user 0m8.481s
    sys 0m0.004s
David corrected long -> int for the 32-bit case. There is still serious slowdown for 64-bit data movement even in 64-bit mode and AMD and Intel seem to behave identically.
Once we get the Pin dynamic instruction-mix profiles, we'll have a better understanding.
Status: REOPENED → ASSIGNED
Update: I've built the original and patched optimized versions of the JS Shell on a Linux machine. With jsDriver.pl, 16 tests fail for the original JS Shell and 70 tests fail for the patched JS Shell. I have also run the SunSpider benchmarks. On Linux, the patched optimized JS Shell shows only a 9.2% performance degradation compared to the original optimized JS Shell. This is a much better situation than on Mac, where the overhead was 17%. I will see what further info we can get regarding dynamic instruction-mix profiles using Pin. -- Carmen
Very interesting! I am not worried about the 70 fails for now.
Are you compiling with gcc or with icc? David is telling me gcc on Mac is from the last millennium. Linux+icc might be quite interesting if that works out of the box.
This is still with gcc. The situation with icc can be different or similar. I think we'll do the instruction mix study next, before trying icc.
Update: I've uploaded the instruction mix results for both the original and the patched version on one run of Sunspider. We do have the results for all functions, but the output size was larger than what Bugzilla allowed. So I include the results of only js_interpret and the total. Here is the sorted list of the summary:
    type                   original      patched  %increase
    mem-write-4          1691677705   2167068651       28.1
    stack-write          1420936462   1752317798       23.3
    mem-read-4           2857687955   3224465488       12.8
    stack-read           1893735049   2099248483       10.9
    mem-read-1            398201620    399689215        0.4
    mem-read-variable       2018520      2019241        0.0
    mem-write-variable      2261468      2262255        0.0
    mem-atomic                   48           48        0.0
    mem-write-1            14588693     14574886       -0.1
    mem-read-2             99033945     96835214       -2.2
    mem-write-2            33464422     32001450       -4.4
    mem-read-8             39821162     37870002       -4.9
    mem-write-8            47275542     44860871       -5.1
    TOTAL                9313636391  10742292707       15.3
The attached files show the number of executed instructions of each possible type. Meanwhile, I'll take a closer look; you're welcome to do so as well.
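The %increase column is the relative change from the original count to the patched count. A small sketch of that arithmetic, with a hypothetical helper name, lets the figures be rechecked:

```c
#include <assert.h>
#include <stdint.h>

/* Relative change from original to patched, in percent. original must be nonzero. */
static double pct_increase(uint64_t original, uint64_t patched) {
    assert(original != 0);
    return ((double)patched - (double)original) * 100.0 / (double)original;
}
```

For example, `pct_increase(1691677705, 2167068651)` evaluates to roughly 28.1 for the mem-write-4 row, and the TOTAL row comes out at roughly 15.3, in line with the summary.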
A comparative study of instruction mix of JSShell and TT would also be interesting.
Update: I have also built the original and patched versions of the JS Shell on iMac using the Intel C++ Compiler as follows:
- w/ 64-bit ICPC (BUILD_OPT=1 OPTIMIZER=-Os): the patched JS Shell shows a 1.7% improvement over the original JS Shell on the Sunspider benchmarks. This is encouraging news.
- w/ 32-bit ICPC (BUILD_OPT=1 OPTIMIZER=-O2): the patched JS Shell shows ~10% performance degradation relative to the original JS Shell on the Sunspider benchmarks.
On iMac, out of the compilers I've used to build the original JS Shell (g++ -Os, 64-bit ICPC -Os and 32-bit ICPC -O2), the JS Shell built w/ the 32-bit ICPC does slightly better than the others on the Sunspider benchmarks. All these results are without PGO.
What do you see on 32-bit with OPTIMIZER=-Os? We generally do better with -Os than with -O2, maybe that's not the case here?
With the 32-bit ICPC and the -Os option, the JS Shell crashes with a bus error. Someone from the Intel compiler team is looking into this specific problem.
I used vprof to profile the Mozilla Central JS Shell on Sunspider benchmarks. Here are the execution counts of all the executed JS OPs. Any ideas on why there are so many "nops"? This is an optimized version: BUILD_OPT=1, OPTIMIZER=-Os. Thanks, Carmen
    OP COUNT
    ------ --------
    getvar 66437114
    nop 35041756
    getarg 16673490
    setvar 7608591
    add 5810054
    lt 5520459
    getelem 5127731
    int8 5117817
    name 4198866
    one 3922465
    call 3810889
    bitand 3789341
    return 3519948
    mul 3032927
    varinc 2984431
    getgvar 2874105
    setelem 2829807
    setname 2450781
    sub 2418990
    rsh 2402669
    callname 2101845
    lsh 1725135
    zero 1617453
    uint16 1604418
    pop 1577637
    callvar 1515630
    ifeq 1427404
    callarg 1318400
    goto 1172378
    bindname 1052321
    getprop 940707
    forvar 750556
    getvarprop 736472
    le 676907
    div 659412
    popv 652447
    uint24 650968
    getthisprop 649996
    gvarinc 608725
    dup 603585
    callprop 599996
    eq 584141
    gt 552905
    dup2 501730
    bitnot 433304
    group 365130
    FALSE 359971
    neg 352021
    incvar 344020
    ge 340196
    ifne 312143
    getargprop 301731
    getxprop 265970
    not 263939
    bitxor 258988
    TRUE 258248
    bitor 247140
    stop 241568
    this 218677
    mod 217847
    or 172517
    double 161927
    string 159337
    null 158148
    setprop 127892
    initelem 87504
    ne 70076
    new 68861
    enditer 58505
    iter 58505
    ursh 52617
    setarg 45586
    length 45117
    incgvar 43215
    endinit 38347
    newinit 38347
    vardec 32045
    nameinc 31574
    int32 12613
    and 11681
    argdec 9830
    setgvar 9123
    stricteq 7502
    typeof 7502
    strictne 7500
    initprop 5037
    regexp 4192
    propinc 408
    anonfunobj 226
    deffun 137
    forname 132
    defvar 92
    push 4
    deflocalfun 1
    lineno 1
(In reply to comment #39)
> With the 32-bit ICPC and the -Os option, the JS Shell crashes with a bus error.
> Someone from the Intel compiler team is looking into this specific problem.
The person from the Intel compiler team who was looking into this problem got back to me. He identified the problem to be in the jsdhash.cpp file. I have therefore built the original and patched versions of the Mozilla Central JS Shell using the 32-bit Intel C++ compiler with the OPTIMIZER=-Os option, compiling only the jsdhash.cpp file with the -O2 option (on iMac). I have also built the original and patched JS Shell w/ PGO on jsinterp.cpp only. I ran the Sunspider benchmarks using all the built versions of the JS Shell. Here are the performance results:
    BUILD                        ORIGINAL  PATCHED  %DIFF
    ICPC32+O2                      2114.7   2270     7.34
    ICPC32+Os*                     2164.3   2322.6   7.31
    ICPC32+O2+PGO_jsinterp.cpp     2015.2   2130.1   5.70
    ICPC32+Os*+PGO_jsinterp.cpp    2151.7   2295     6.66
    *jsdhash.cpp built w/ -O2
Out of all the builds, the combination with the smallest performance degradation is ICC w/ -O2 & PGO, at 5.7%. Moreover, this combination gives the best raw numbers as well. -- Carmen
(In reply to comment #40)
> I used vprof to profile the Mozilla Central JS Shell on Sunspider benchmarks.
> Here are the execution counts of all the executed JS OPs. Any ideas on why
> there are so many "nops"? This is an optimized version: BUILD_OPT=1,
> OPTIMIZER=-Os.
>
> Thanks,
> Carmen
>
> OP COUNT
> ------ --------
> getvar 66437114
> nop 35041756
How did you instrument these ops? Could you attach the patch you applied? /be
(In reply to comment #42)
> (In reply to comment #40)
> > I used vprof to profile the Mozilla Central JS Shell on Sunspider benchmarks.
> > Here are the execution counts of all the executed JS OPs. Any ideas on why
> > there are so many "nops"? This is an optimized version: BUILD_OPT=1,
> > OPTIMIZER=-Os.
> >
> > Thanks,
> > Carmen
> >
> > OP COUNT
> > ------ --------
> > getvar 66437114
> > nop 35041756
>
> How did you instrument these ops? Could you attach the patch you applied?
>
> /be
After having vprof.cpp and vprof.h as part of the build, the only thing you need to do is redefine the BEGIN_CASE(OP) macro as follows:
    #define BEGIN_CASE(OP) case OP: _nvprof(js_CodeName[op], 0);
But, in order to get js_CodeName[op] right in an OPTIMIZED build, lines 108 and 118 in jsopcode.cpp need to be commented out. These are the lines that guard the initialization of js_CodeName. vprof.cpp and vprof.h are in the Tamarin Tracing distribution under the vprof directory. I have made other changes to my copy of Mozilla Central as well, so a patch would show those too. If it's really necessary, I can create a patch off another copy of Mozilla Central. -- Carmen
vprof.cpp and vprof.h are available in the tracemonkey repository
The threaded interpreter case handles JSOP_NOP with ADD_EMPTY_CASE which in turn involves BEGIN_CASE, but you show a case OP: label, which says you are using the non-threaded (switch in a for (;;) loop) version -- which does not layer empty cases such as JSOP_NOP on top of BEGIN_CASE. Did you modify the !JS_THREADED_INTERP definition of ADD_EMPTY_CASE too? Again a patch, even by email, would help me render help here most quickly. /be
Oops, I'm wrong: ADD_EMPTY_CASE is layered on BEGIN_CASE in both versions, threaded and switch-loop. Still, I find the NOP count hard to understand. So if you could mail me the patch, that would be ideal. Thanks, /be
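The layering described above, where an empty case is built on top of BEGIN_CASE and therefore picks up any instrumentation added to it, can be shown with a toy switch-loop interpreter. Everything in this sketch is hypothetical: the opcode set, the `op_counts` array standing in for the vprof call, and the macro bodies are simplified illustrations, not the definitions in jsinterp.cpp.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy opcode set; the real interpreter has far more. */
enum { OP_NOP, OP_PUSH1, OP_ADD, OP_STOP, OP_LIMIT };

/* Per-opcode execution counters, standing in for the profiler call. */
static unsigned long op_counts[OP_LIMIT];

/* Every case label bumps the counter for its opcode. */
#define BEGIN_CASE(OP)     case OP: op_counts[OP]++;
#define END_CASE           break;
/* Empty cases are layered on BEGIN_CASE, so they are counted as well. */
#define ADD_EMPTY_CASE(OP) BEGIN_CASE(OP) END_CASE

/* Run a toy program; returns the top of stack at OP_STOP, or -1 on error. */
static int run(const unsigned char *code) {
    int stack[16];
    int sp = 0;
    assert(code);
    for (size_t pc = 0; ; ++pc) {
        switch (code[pc]) {
          ADD_EMPTY_CASE(OP_NOP)

          BEGIN_CASE(OP_PUSH1)
            if (sp == 16) return -1;
            stack[sp++] = 1;
          END_CASE

          BEGIN_CASE(OP_ADD)
            if (sp < 2) return -1;
            sp--;
            stack[sp - 1] += stack[sp];
          END_CASE

          BEGIN_CASE(OP_STOP)
            return sp > 0 ? stack[sp - 1] : 0;

          default:
            return -1;
        }
    }
}
```

In this model a program containing three `OP_NOP` bytes leaves `op_counts[OP_NOP]` at 3, so a large nop count in such a profile reflects nops that were actually dispatched rather than an artifact of the counting macro.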
Ok, I'll get a fresh copy, apply my changes and create a patch. FYI, the vproffed JS Shell results were collected on Windows. -- Carmen
The attached patch enables collecting execution counts for each executed JS opcode. I have built the vproffed JS Shell (from Mozilla Central) on Windows w/ the Visual C++ 2008 compiler as follows: make BUILD_OPT=1 OPTIMIZER="-Os -MD" -f Makefile.ref -- Carmen
The value representation has changed since this bug was filed.
Status: ASSIGNED → RESOLVED
Closed: 17 years ago → 7 years ago
Resolution: --- → INVALID