Hmm, running Firebug, it seems most of the time is spent in memory allocation functions, which are called a lot but do very little (stackAlloc(), stackEnter(), stackExit(), etc.). So perhaps the efficiency of function calls is a factor here. (The generated code should be much more efficient, for sure - focus so far has been on accuracy, not performance.)
I see a fair amount of memory traffic in the profile: finalizer thread is 15% of the total samples; allocating arrays (not the GC, the allocations) is 10%. Other than that, many methodjit stub calls. Alon, I wouldn't trust the Firebug profiler to tell you anything useful at all...
> Alon, I wouldn't trust the Firebug profiler to tell you
> anything useful at all...

I guess not, but I was hoping at least the order would be meaningful. Anyhow it does seem that there are many, many calls to short functions. Perhaps v8 does better at that sort of thing?
Thanks, yeah, typed arrays are on my todo list. It isn't trivial since I'll need separate arrays for ints and floats, and another for everything else. But looks like typed arrays help a simple benchmark of compiled C++ code, fannkuch (that happens to only use ints, so was easy to test) by 30%.
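A minimal sketch of the split described above, assuming one typed array for ints, one for floats, and a plain array for everything else. The names HEAP_INT, HEAP_FLOAT, HEAP_OTHER and the load/store helpers are hypothetical and are not taken from the actual generated code.

```javascript
// Hypothetical memory layout: all names here are illustrative assumptions.
var HEAP_SIZE = 1024;
var HEAP_INT = new Int32Array(HEAP_SIZE);     // integer values
var HEAP_FLOAT = new Float64Array(HEAP_SIZE); // floating-point values
var HEAP_OTHER = new Array(HEAP_SIZE);        // everything else (e.g. function references)

// The compiler would have to pick the right array from the C++ type of each access.
function storeInt(ptr, value) { HEAP_INT[ptr] = value; }
function loadInt(ptr) { return HEAP_INT[ptr]; }
function storeFloat(ptr, value) { HEAP_FLOAT[ptr] = value; }
function loadFloat(ptr) { return HEAP_FLOAT[ptr]; }
function storeOther(ptr, value) { HEAP_OTHER[ptr] = value; }
function loadOther(ptr) { return HEAP_OTHER[ptr]; }

storeInt(3, 7);
storeFloat(3, 0.5);
storeOther(3, Math.sqrt);
```

The same pointer value indexes three independent arrays here, which is why the split is not trivial: every load and store needs to know the static type of what it touches.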
Ok, this is with inlining all the small method calls. The previous code was bottlenecked on that entirely - this runs several times faster, and is closer to what code compiled from C++ should look like. The difference between V8 and SM remains: V8 takes 5.2 seconds, trunk TraceMonkey takes 17.8 seconds, so TraceMonkey takes about 3.4X as long.
Attachment #480974 - Attachment is obsolete: true
Given that TM would inline those anyway, sounds like we weren't tracing this? Or do you mean you inlined the C calls (thus not having to do the stack setup etc)?
I meant that I inlined when compiling the C++ to JS. So before there were calls to stackEnter() in the generated JS code, and I replaced them with the contents of that function. The original C++ was not changed.
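Roughly, the change looks like the sketch below. The bodies of stackEnter(), stackAlloc() and stackExit() are assumptions made up for illustration (as are HEAP, STACKTOP and STACK_FRAMES); only the function names come from the generated code.

```javascript
// Hypothetical stack helpers: the bodies are assumed for illustration only.
var HEAP = [];
var STACKTOP = 0;
var STACK_FRAMES = [];

function stackEnter() { STACK_FRAMES.push(STACKTOP); }
function stackExit() { STACKTOP = STACK_FRAMES.pop(); }
function stackAlloc(size) { var ptr = STACKTOP; STACKTOP += size; return ptr; }

// Before: the generated function calls the tiny helpers.
function sumSquaresWithCalls(n) {
  stackEnter();
  var buf = stackAlloc(n);
  var total = 0;
  for (var i = 0; i < n; i++) {
    HEAP[buf + i] = i * i;
    total += HEAP[buf + i];
  }
  stackExit();
  return total;
}

// After: the compiler pastes the helper bodies in place of the calls.
function sumSquaresInlined(n) {
  STACK_FRAMES.push(STACKTOP);   // was stackEnter()
  var buf = STACKTOP;            // was stackAlloc(n)
  STACKTOP += n;
  var total = 0;
  for (var i = 0; i < n; i++) {
    HEAP[buf + i] = i * i;
    total += HEAP[buf + i];
  }
  STACKTOP = STACK_FRAMES.pop(); // was stackExit()
  return total;
}
```

Note the helper calls sit outside the loop in both versions, at function entry and exit.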
OK, so yeah, the tracer would have done that anyway.
Hmm, perhaps it isn't surprising that those weren't traced - those function calls were not inside loops, they were done right at the beginning of a function, before any loops.
Right, but are those functions in any loops? I guess maybe not well enough to trace.
Inlining in the C++ to JS translation process can indeed win more than current JS-level optimizations. Getting TMFLAGS=stats output would be good. /be
Ok, did pulls and clean rebuilds for everything, here is some better data (running on a faster machine):

tm        7.61 seconds
tm -j     5.66
tm -m     3.10
tm -m -j  3.11
v8        1.91

(tm = tracemonkey trunk). So, the difference is around 60% (tm -m vs. v8).

Results with TMFLAGS=stats:

recorder: started(51), aborted(39), completed(56), different header(0), trees trashed(15), slot promoted(0), unstable loop variable(3), breaks(32), returns(4), merged loop exits(0), unstableInnerCalls(10), blacklisted(530)
monitor: exits(630), timeouts(0), type mismatch(0), triggered(630), global mismatch(4), flushed(4)
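For reference, the "around 60%" figure is just the ratio of the two run times. A small sketch of that arithmetic follows; percentSlower is a hypothetical helper name, and the inputs are the tm -m and v8 timings from this comment.

```javascript
// Hypothetical helper: how much slower (in percent) a candidate time is than a baseline.
function percentSlower(candidateSeconds, baselineSeconds) {
  return (candidateSeconds / baselineSeconds - 1) * 100;
}

// tm -m (3.10 s) against v8 (1.91 s), timings taken from the comment above.
var slowdown = percentSlower(3.10, 1.91);
console.log('tm -m is about ' + Math.round(slowdown) + '% slower than v8');
```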
Same benchmark, after more optimizations, including passing through Closure Compiler's advanced optimizations. The code is now somewhat readable, unlike before, so hopefully easier to figure out what would make it faster. In particular it now looks like it would greatly benefit from a COPYELEM bytecode. Benchmarks:

v8: 2.63 seconds
tracemonkey -m -j: 4.10 seconds (55% slower)
Attachment #481422 - Attachment is obsolete: true
And here is a version that uses typed arrays. Benchmarks:

v8: 2.54 seconds
tracemonkey -m -j: 3.32 seconds (30% slower)

So, this is better than without typed arrays, but not tremendously so. Interestingly though, in other benchmarks with typed arrays, tracemonkey beats v8.
I tried to adapt this to a browser test so I could run with Chrome 10, but it seems not to work in Chrome.
You can see this code live here: http://www.syntensity.com/static/raytrace.html

It prints out how long it takes to run. On my laptop I get 2.91 seconds in Firefox (nightly) and 3.53 seconds in Chrome 10. I suspect typed arrays are a factor here. So, looks good!
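A timing wrapper of the kind such a page might use could look like the sketch below. This is an assumption for illustration, not the code on the linked page; timeRun and sampleWorkload are hypothetical names, and the workload is a stand-in for the raytracer.

```javascript
// Hypothetical harness: runs a workload once and returns wall-clock seconds.
function timeRun(workload) {
  var start = Date.now();
  workload();
  return (Date.now() - start) / 1000;
}

// Stand-in workload (the real page would run the compiled raytracer instead).
function sampleWorkload() {
  var total = 0;
  for (var i = 0; i < 1000000; i++) {
    total += Math.sqrt(i);
  }
  return total;
}

var seconds = timeRun(sampleWorkload);
console.log('Run took ' + seconds.toFixed(2) + ' seconds');
```

Date.now() has millisecond resolution, which is adequate for runs that take seconds as in the numbers above.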
Status: NEW → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → FIXED
At least until they make typed arrays work with Crankshaft... ;)