Last Comment Bug 762561 - Specialize big typed arrays with a singleton type
: Specialize big typed arrays with a singleton type
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: unspecified
: All All
-- normal (vote)
: mozilla16
Assigned To: Brian Hackett (:bhackett)
: Jason Orendorff [:jorendorff]
Depends on: 769433 785776
Blocks: gecko-games
  Show dependency treegraph
Reported: 2012-06-07 10:05 PDT by Alon Zakai (:azakai)
Modified: 2012-08-26 19:34 PDT (History)
8 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---

basic patch (e758973c6ab1) (9.50 KB, patch)
2012-06-07 17:24 PDT, Brian Hackett (:bhackett)
dvander: review+
Details | Diff | Splinter Review

Description User image Alon Zakai (:azakai) 2012-06-07 10:05:24 PDT
In compiled code from Emscripten and other compilers, memory is implemented as a big typed array. Speeding that up would give improvements across the board in compiled code benchmarks. luke and bhackett say 

> we can give big typed arrays a singleton type so
> that we could further specialize get/set element
> paths to individual typed arrays.  That would allow
> us to bake in the element base and hardcode the limit in
> the bounds check.  That take us down from ~12 ops to do
> a get/set to 3-5.
Comment 1 User image Brian Hackett (:bhackett) 2012-06-07 17:24:53 PDT
Created attachment 631224 [details] [diff] [review]
basic patch (e758973c6ab1)

Basic patch that gives singleton types to typed arrays allocated in global code, and optimizes GETELEM/SETELEM based on them by baking in lengths and base addresses.  This doesn't do LICM based on such arrays, e.g. we will keep testing the length on accesses in a loop and will keep on loading the constant base into a register.  Fixing this wouldn't be horribly complicated but I don't know if this is necessary (IM will do better with LICM and CSE) and anyways it should be done in a separate patch.  Times:

var memory = new Int32Array(10000);
function foo() { 
  var n = 0;
  for (var i = 0; i < 10000; i++) {
    for (var j = 0; j < 10000; j++)

Before: 186
After:  111

var memory = new Int32Array(10000);
function foo() { 
  var n = 0;
  for (var i = 0; i < 10000; i++) {
    for (var j = 0; j < 10000; j++)
      memory[j] = j;

Before: 194
After:  148

This microbenchmark will be pretty sensitive to regalloc etc. and probably not representative of behavior in general.  Wondering how this does on the tests of interest.
Comment 2 User image Alon Zakai (:azakai) 2012-06-07 18:23:21 PDT
Some results:

corrections: unchanged
dlmalloc: 8% faster
fannkuch: 15% faster
fasta: unchanged
memops: 7% faster
primes: unchanged
skinning: 10% faster

So looks very nice.

However, I see no speedup on larger benchmarks, box2d and bullet ( ), where I was hoping/expecting for an improvement. They do plenty of memory accesses, but also a lot of float math. Any ideas?
Comment 3 User image Luke Wagner [:luke] 2012-06-07 19:28:28 PDT
It would be useful to see what % of element reads/writes are seeing the singleton array type; perhaps something about the large benchmarks is confounding TI.
Comment 4 User image Brian Hackett (:bhackett) 2012-06-08 07:52:09 PDT
I looked at one of the hot functions in the bullet benchmark and the type information is fine and the optimization is performed.  This is a short benchmark though (.8 seconds for me) and seems to be compiling a lot of code; the two hot functions only execute about 1 million times each.  I'm guessing that the performance gains from this patch are being drowned out by compilation costs.  What happens if you make the benchmark longer running?
Comment 5 User image Alon Zakai (:azakai) 2012-06-08 10:10:57 PDT
I made it run a lot longer, 13-14 seconds. I now get a 3% speedup. Is it surprising it is that little?
Comment 6 User image Brian Hackett (:bhackett) 2012-06-08 17:03:20 PDT
Not really, I guess.  We may just be running into Amdahl's law.  I think that what I mainly need is a more complete understanding of the workflows for how C code (or LLVM bytecode?) gets turned into machine code via our approach and via NaCl's approach and the avenues for moving towards the latter.  Will get on that right after I get back from China.
Comment 7 User image Brian Hackett (:bhackett) 2012-06-26 17:48:58 PDT
Pushed, with some tweaks.  This gives singleton types to all typed arrays and data views above a certain limit (10MB), to be more robust against other initialization patterns.  (e.g. mandreel apps seem to do their initialization in a function, not in global code)
Comment 8 User image Phil Ringnalda (:philor) 2012-06-26 19:57:28 PDT
Backed out in - the debug builds all say ( "jit-test/tests/jaeger/recompile/bug651119.js: Assertion failure: !fe->isConstant(), at ../../../js/src/methodjit/FrameState.cpp:1591"
Comment 9 User image Brian Hackett (:bhackett) 2012-06-27 07:11:35 PDT
Oops, last minute cleanup broke things.
Comment 10 User image Ed Morley [:emorley] 2012-06-28 01:10:38 PDT

Note You need to log in before you can comment on or make changes to this bug.