Closed Bug 507746 Opened 15 years ago Closed 8 years ago

Compare Microbenchmark Performance

Categories

(Core :: JavaScript Engine: JIT, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking


RESOLVED WONTFIX

People

(Reporter: wagnerg, Unassigned)

References

Details

Attachments

(2 files)

Attached file Testfile
The results of my attached testfile are:

             no JIT    -j
Function:    39293     39475
Array:       4667      1553
Number:      4450      1986
String:      4447      2028
Boolean:     4347      1867
Date:        6561      4052
RegExp:      9729      7192
Overall:     73494ms   58154ms

And now with a different order (functions at the end):

             no JIT    -j
Array:       4469      1309
Number:      4421      1963
String:      4377      1936
Boolean:     4302      1793
Date:        6529      4010
RegExp:      10147     7487
Function:    62508     62700
Overall:     96754ms   81198ms

There is a huge difference depending on whether I allocate functions at the beginning or at the end of the test, and there is no speedup for function allocation with the JIT enabled.
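The attached Testfile itself is not inlined in this bug. As a minimal sketch of the kind of per-constructor allocation loop these numbers suggest (the function names, iteration count, and exact constructors exercised here are guesses, not the actual attachment):

// Guess at the shape of one timed section in the attached Testfile: allocate
// many instances of one built-in constructor and report the elapsed time.
function timeSection(name, makeOne, iterations) {
  var start = Date.now();
  for (var i = 0; i < iterations; i++)
    makeOne();
  print(name + ": " + (Date.now() - start));
}

timeSection("Array",    function () { return new Array(); },      1000000);
timeSection("Function", function () { return new Function(""); }, 1000000);

The Function case matters for the rest of this bug: each new Function("") call compiles a small script, which is apparently where the js_PCToLineNumber cost discussed in later comments comes from.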
This is really good. But now I have some more suggestions on how to turn this into a proper ubenchmark suite. :-)

- Separate tests into individual files
- Create a test harness (a rough sketch follows below) that:
  - can run the files in any order
  - inserts the timing code itself (e.g., using -e or -f arguments to js) so that less boilerplate is required in the test cases
  - generates the loops itself or otherwise allows control over the number of iterations
  - can do comparisons with various VMs (incl. SFX and V8)
  - can compare with empty-loop perf, to get potentially more direct measurements of the time taken by the operation being tested
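A minimal sketch of such a harness, assuming each test file defines a global test() function and an ITERATIONS count (these conventions, the file names, and runBenchmark are invented here, not part of the attached tests). It could be driven with the js shell flags mentioned above, e.g. js -f harness.js -f array.js -e 'runBenchmark("array", test, ITERATIONS)':

// harness.js (hypothetical file name): timing helper loaded alongside a test.
// It times an empty loop of the same length first so the cost of the loop
// itself can be compared against the cost of the operation under test.
function runBenchmark(name, test, iterations) {
  var emptyStart = Date.now();
  for (var i = 0; i < iterations; i++) {}
  var emptyTime = Date.now() - emptyStart;

  var start = Date.now();
  for (var j = 0; j < iterations; j++)
    test();
  var total = Date.now() - start;

  print(name + ": " + total + "ms (empty loop: " + emptyTime + "ms)");
}

Running the same harness against other shells (e.g. jsc or d8) would then only require swapping the shell binary on the command line.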
Yeah, my benchmark file could be better... :)

As for the numbers, it seems GC and allocation are not to blame for the time difference.

207:src mozilla$ ./OPT.OBJ/js ./empty.js
Function:  39893
Number:    5900
String:    5739
Boolean:   5618
Date:      7789
RegExp:    11056
Array:     5909
Overall:   81905ms
Ticks in GC: 11981960340, 5.99 sec
Ticks in NewGCThing: 32510511444, 16.2 sec

207:src mozilla$ ./OPT.OBJ/js ./empty.js
Number:    5625
String:    5687
Boolean:   5570
Date:      7821
RegExp:    11362
Array:     5752
Function:  63180
Overall:   104997ms
Ticks in GC: 11914319808, 5.95 sec
Ticks in NewGCThing: 32676061224, 16.34 sec
(In reply to comment #2)
> Yeah my benchmark file could be better... :)

To be clear, I wasn't criticizing. I do think that we need the things I mentioned and they're on my personal to-do list, but they'll probably have to wait until I'm all done with closures, so if you would enjoy doing them, that would be excellent.
It seems js_PCToLineNumber is the "slow" function. In the "fast" version I spend about 8% of the whole time in this function. If I allocate the functions at the end of the test, I spend about 33% of the total time in this function. In seconds: 22 vs. 4.1.
Here is output for some recentish builds:

                       sm      tm      sfx     v8
add.js                 80.3    4.3     3.3     138.3
bitand.js              72.3    4.4     2.8     137.1
bitor.js               72.5    4.4     3.3     141.5
bitxor.js              74.6    4.3     4.3     142.4
call.js                164.0   4.0     9.0     147.0
div.js                 99.2    14.7    31.8    185.0
double.js              60.8    4.0     3.0     141.5
function.js            305.0   284.0   105.0   322.0
getprop.js             80.8    14.4    5.3     136.0
int.js                 58.0    3.4     2.8     134.7
mul.js                 75.6    5.0     3.7     140.5
new_date.js            504.0   280.0   212.0   331.0
new_object.js          334.0   117.0   67.0    176.0
new_object_1prop.js    498.0   278.0   57.0    163.0
new_object_2prop.js    566.0   295.0   70.0    170.0
new_object_braces.js   288.0   90.0    50.0    170.0
new_regexp.js          732.0   490.0   914.0   1600.0
new_string.js          346.0   126.0   78.0    181.0
setprop.js             82.4    8.0     4.5     14.6
sub.js                 80.4    4.1     3.7     147.6

[*] Numbers are time per iteration in microseconds

Standout items:

call.js       Inlining FTW! Still 16x slower than WebKit in interpreter.

major opportunities for improvement:

getprop.js    3x slower than WebKit with tracing
setprop.js    2x slower than WebKit
*object*.js   2x slower than WebKit (5x slower with properties in {} syntax)
There is an easy explanation for the slowdown when the functions are allocated at the beginning versus at the end of the benchmark. For each function, js_PCToLineNumber is called, and it traverses the whole script until the function is found. If the line number is 36 instead of 5, it takes about 7 times longer until the line number is found. Is this really necessary in the optimized build? Is it possible/worthwhile to store previously found line numbers?
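As a small illustration of the caching idea being asked about, written in JavaScript for consistency with the rest of this bug rather than in the engine's C code (everything here is an invented sketch, not SpiderMonkey code):

// cachedPcToLine remembers the last (script, pc) -> line answer so that
// repeated lookups for the same location skip the linear scan that a
// js_PCToLineNumber-style function performs over the whole script.
var lastScript = null, lastPc = -1, lastLine = -1;

function cachedPcToLine(script, pc, slowPcToLine) {
  if (script === lastScript && pc === lastPc)
    return lastLine;                    // cache hit: no rescan
  lastLine = slowPcToLine(script, pc);  // cache miss: do the expensive scan
  lastScript = script;
  lastPc = pc;
  return lastLine;
}

Whether a one-entry cache like this would actually help depends on whether the Function constructor asks for the same pc repeatedly, which the comments above do not establish.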
Can you show some representative stacks for js_PCToLineNumber? Would be good to know where we're calling it here, since in the non-error case I can't recall why we would use that at all.
The calls are from the Function constructor. Old bug, I argue low priority -- but a bug to be sure. /be
(In reply to comment #5)

Thanks, Dave.

> Standout items:
>
> call.js    Inlining FTW! Still 16x slower than WebKit in interpreter.

Bug 471425 at least. Nitro has really optimized call overhead down, we should do likewise.

> major opportunities for improvement:
>
> getprop.js    3x slower than WebKit with tracing
> setprop.js    2x slower than WebKit
> *object*.js   2x slower than WebKit (5x slower with properties in {} syntax)

These are all with tracing, right?

We have bugs on shape guard issues, but beyond those we have too many guards. With brute-force invalidation of cached code on rare, hazardous events, we should aim to guard once per get, set, or init. Need a bug on this.

/be
(In reply to comment #9)
> (In reply to comment #5)
>
> Thanks, Dave.
>
> > Standout items:
> >
> > call.js    Inlining FTW! Still 16x slower than WebKit in interpreter.
>
> Bug 471425 at least. Nitro has really optimized call overhead down, we should
> do likewise.

Yes. It will probably be many small steps. I also hope that the activation record formats will become more similar, which may make upvar/arguments-type tracing code simpler and more efficient.

> > major opportunities for improvement:
> >
> > getprop.js    3x slower than WebKit with tracing
> > setprop.js    2x slower than WebKit
> > *object*.js   2x slower than WebKit (5x slower with properties in {} syntax)
>
> These are all with tracing, right?

Yes.

> We have bugs on shape guard issues, but beyond those we have too many guards.
> With brute-force invalidation of cached code on rare, hazardous events, we
> should aim to guard once per get, set, or init. Need a bug on this.

Yes. Next step for me is to inspect the x86 and builtin code to see where the differences and inefficiencies are.
Results with current JS shell.

           INTERP    TM        JM        JM+TI
Array:     3033      3037      943       833
Number:    2414      2446      845       839
String:    2874      2866      1156      1140
Boolean:   2462      2495      900       929
Date:      22796     22864     20366     20347
RegExp:    4100      4247      2343      2312
Function:  43420     43371     41923     44138
Overall:   81101ms   81329ms   68481ms   70540ms

Looks like JM is a clear win across the board. JM+TI seems to be a mixed bag vs. plain JM. Sorry to say that I don't have a WebKit or d8 shell handy to test.
Blocks: 467263
Summary: TM: Microbenchmark Performance → Compare Microbenchmark Performance
d8 Results:

Array:     232
Number:    437
String:    207
Boolean:   497
Date:      1685
RegExp:    2335
Function:  9653
Overall:   15050ms
Assignee: general → nobody
I tried this now with Firefox 31 (release), Nightly 34, and Chrome 36 (release). I changed the Function bench to be 10x shorter. Nightly has some things turned on that might slow it down a bit (GC poisoning or something?), right? But I don't know whether that changed anything here.

Firefox 31
Array:     32
Number:    1109
String:    1051
Boolean:   953
Date:      6784
RegExp:    2184
Function:  6298

Nightly 34
Array:     31
Number:    1080
String:    1177
Boolean:   910
Date:      2697
RegExp:    3135
Function:  8580

Chrome 36
Array:     197
Number:    398
String:    134
Boolean:   401
Date:      3018
RegExp:    2085
Function:  1010

The improvement on Date should be because I'm on Win7 and Jan de Mooij made a huge improvement on 'new Date' one or two months ago. The RegExp regression might be a regression from Irregexp vs. YARR? For the Function regression I have no idea. The worst cases for Firefox, compared to Chrome, are 'String' and 'Function'.
Component: JavaScript Engine → JavaScript Engine: JIT
OS: Mac OS X → All
Hardware: x86 → All
(In reply to Guilherme Lima from comment #13)
> I changed the Function bench to be 10x shorter. Nightly has some things
> turned on that might slow down a bit (GC poisoning or something?), right?

You can turn this off by running with the environment variable JSGC_DISABLE_POISONING set. For instance, from a batch file:

set JSGC_DISABLE_POISONING=1
"path/to/firefox.exe" -no-remote -profile "path/to/empty/profile"
New results:

                       sm      v8
sub.js                 1.9     2.3
int.js                 1.8     2.3
new_string.js          70.0    16.0
new_regexp.js          341.0   124.0
function.js            10.0    15.0
mul.js                 1.7     2.2
new_object_2prop.js    5.0     15.0
bitor.js               2.1     2.3
double.js              1.8     2.2
new_object_braces.js   6.0     15.0
bitand.js              2.3     2.2
new_object.js          150.0   30.0
setprop.js             2.0     2.0
getprop.js             1.7     2.2
call.js                2.0     3.0
new_object_1prop.js    6.0     13.0
new_date.js            106.0   98.0
add.js                 2.2     2.2
div.js                 2.0     9.2
bitxor.js              2.1     2.2

At this point, it seems that object creation in general is a bit slower, but we could open new bugs to track issues there:

- The ubench code is great, but it creates empty loops in which the variables are not used, so they are very likely to be dead-code-eliminated (DCE'd). Result values higher than a few milliseconds just betray the fact that the engines don't DCE this code, so this is something we could investigate (or maybe it's just the time needed for the JITs to kick in). See the sketch after this comment for one way to keep such loops live.
- Although it has its cons, http://jsperf.com/ does a good job at micro-benchmarking tight loops as well, and has better ergonomics (no need to build a shell; anyone can use their browser instead).
- For micro-benchmarks, arewefastyet contains a suite dedicated to this purpose, so we can keep track of regressions and differences across different browsers/shells: https://arewefastyet.com/#machine=29&view=breakdown&suite=asmjs-ubench

Also, we have other bugs for the fact that object creation is slow. Closing.
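A minimal sketch, not taken from the ubench suite itself, of how one of these loops could be kept alive so the JIT cannot trivially DCE it; the blackhole accumulator and function name are invented for illustration:

// Feed each iteration's result into an observable accumulator and print it,
// so the engine has to perform the allocation/operation being measured.
var blackhole = 0;   // invented name; any value the engine must keep works

function benchNewObject(iterations) {
  var start = Date.now();
  for (var i = 0; i < iterations; i++) {
    var o = new Object();
    o.x = i;
    blackhole += o.x;   // use the result so the loop body stays live
  }
  return Date.now() - start;
}

print("new_object: " + benchNewObject(1000000) + "ms (checksum " + blackhole + ")");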
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX