Closed Bug 762561 Opened 11 years ago Closed 11 years ago

Specialize big typed arrays with a singleton type


(Core :: JavaScript Engine, defect)

(Reporter: azakai, Assigned: bhackett1024)


(Blocks 1 open bug)


(Whiteboard: [js:t])


(1 file)

In compiled code from Emscripten and other compilers, memory is implemented as a big typed array. Speeding that up would give improvements across the board in compiled code benchmarks. luke and bhackett say 

> we can give big typed arrays a singleton type so
> that we could further specialize get/set element
> paths to individual typed arrays.  That would allow
> us to bake in the element base and hardcode the limit in
> the bounds check.  That takes us down from ~12 ops
> per get/set to 3-5.
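As a rough illustration of the op counts in that quote, here is plain JS standing in for the shape of the generated code; `genericGet` and `specializedGet` are hypothetical names for this sketch, not SpiderMonkey internals:

```javascript
// Generic typed-array element get: the length and data pointer live in
// the object, so the generated code must reload them (plus type guards)
// on every access -- roughly the ~12 ops mentioned above.
function genericGet(obj, i) {
  var len = obj.length;                    // load length from the object
  if ((i >>> 0) >= len) return undefined;  // bounds check on loaded length
  return obj[i];                           // load base pointer, indexed load
}

// Specialized get for one known (singleton-typed) array: the length is an
// immediate and the base address is a constant in the generated code,
// leaving just compare, branch, and load -- the 3-5 ops in the quote.
var MEMORY = new Int32Array(10000);        // the array the JIT specializes on
function specializedGet(i) {
  if ((i >>> 0) >= 10000) return undefined; // hardcoded limit
  return MEMORY[i];                          // constant base + index
}
```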
Blocks: gecko-games
Basic patch that gives singleton types to typed arrays allocated in global code, and optimizes GETELEM/SETELEM based on them by baking in lengths and base addresses.  This doesn't do LICM based on such arrays, e.g. we will keep testing the length on accesses in a loop and will keep reloading the constant base into a register.  Fixing this wouldn't be horribly complicated, but I don't know if it's necessary (IM will do better with LICM and CSE), and in any case it should be done in a separate patch.  Times:

var memory = new Int32Array(10000);
function foo() { 
  var n = 0;
  for (var i = 0; i < 10000; i++) {
    for (var j = 0; j < 10000; j++)
      n += memory[j];
  }
  return n;
}
Before: 186
After:  111

var memory = new Int32Array(10000);
function foo() { 
  var n = 0;
  for (var i = 0; i < 10000; i++) {
    for (var j = 0; j < 10000; j++)
      memory[j] = j;
  }
}

Before: 194
After:  148

This microbenchmark will be pretty sensitive to regalloc etc. and probably not representative of behavior in general.  Wondering how this does on the tests of interest.
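The missing LICM amounts to hoisting the (already constant) length test out of the loop. In source terms, a sketch with hypothetical function names:

```javascript
// Without LICM: the bounds test runs on every iteration, mirroring what
// the generated code currently does even for a singleton-typed array.
function sumChecked(mem, n) {
  var s = 0;
  for (var j = 0; j < n; j++) {
    if ((j >>> 0) >= mem.length)  // re-tested each iteration
      break;
    s += mem[j];
  }
  return s;
}

// With LICM: check the whole range once up front, then run the loop with
// no per-iteration test (and the constant base kept in a register).
function sumHoisted(mem, n) {
  var s = 0;
  var limit = n <= mem.length ? n : mem.length; // single hoisted check
  for (var j = 0; j < limit; j++)
    s += mem[j];
  return s;
}
```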
Assignee: general → bhackett1024
Attachment #631224 - Flags: review?(dvander)
Some results:

corrections: unchanged
dlmalloc: 8% faster
fannkuch: 15% faster
fasta: unchanged
memops: 7% faster
primes: unchanged
skinning: 10% faster

So looks very nice.

However, I see no speedup on the larger benchmarks, box2d and bullet, where I was hoping for an improvement. They do plenty of memory accesses, but also a lot of float math. Any ideas?
It would be useful to see what % of element reads/writes are seeing the singleton array type; perhaps something about the large benchmarks is confounding TI.
Attachment #631224 - Flags: review?(dvander) → review+
I looked at one of the hot functions in the bullet benchmark and the type information is fine and the optimization is performed.  This is a short benchmark though (.8 seconds for me) and seems to be compiling a lot of code; the two hot functions only execute about 1 million times each.  I'm guessing that the performance gains from this patch are being drowned out by compilation costs.  What happens if you make the benchmark longer running?
I made it run a lot longer, 13-14 seconds. I now get a 3% speedup. Is it surprising that it's so little?
Not really, I guess.  We may just be running into Amdahl's law.  I think that what I mainly need is a more complete understanding of the workflows for how C code (or LLVM bytecode?) gets turned into machine code via our approach and via NaCl's approach and the avenues for moving towards the latter.  Will get on that right after I get back from China.
Whiteboard: [js:t]
Pushed, with some tweaks.  This gives singleton types to all typed arrays and data views above a certain limit (10MB), to be more robust against other initialization patterns.  (e.g. mandreel apps seem to do their initialization in a function, not in global code)
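A sketch of that size heuristic as described above; the 10MB threshold is from this comment, and the function name is hypothetical:

```javascript
// Any typed array or DataView whose buffer is above 10MB would get a
// singleton type under this heuristic, wherever it was allocated.
var SINGLETON_THRESHOLD = 10 * 1024 * 1024; // 10MB
function wouldGetSingletonType(view) {
  return view.byteLength > SINGLETON_THRESHOLD;
}
```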
Backed out in - the debug builds all say: "jit-test/tests/jaeger/recompile/bug651119.js: Assertion failure: !fe->isConstant(), at ../../../js/src/methodjit/FrameState.cpp:1591"
Oops, last minute cleanup broke things.
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla16