Closed Bug 1145676 Opened 9 years ago Closed 8 years ago

OdinMonkey: Combine and hoist bounds checks

Categories

(Core :: JavaScript Engine: JIT, enhancement)

x86
All
enhancement
Not set
normal

Tracking

()

RESOLVED DUPLICATE of bug 1282618

People

(Reporter: sunfish, Unassigned)

References

Details

To reduce bounds checking overhead in OdinMonkey on 32-bit platforms, one of the first things we need to do is eliminate directly redundant bounds checks (in loops, and in straight-line code). This is conceptually simple to do, by splitting bounds checking into a separate instruction in MIR, and letting GVN and LICM do their work.

A more advanced optimization is to push constant offsets below the check operators, so that checks for heap accesses separated only by constant offsets can be factored out as redundant. This requires adding guard regions at the beginning and end of the heap and handling signals. Fortunately, we can limit the size of these guard regions by limiting the maximum constant offset we push below the bounds check operator.
Separating the bounds check from the access nodes seems useful, if practical.

GVN and LICM might be of limited help. Even straight line repeated accesses are typically to different addresses so their checks could not be trivially hoisted.

The protected guard zones could be of use.

Let's consider an example, array size = 16:
a[i+10] = 1;  // valid for i: -10 to 5 inclusive
a[i+11] = 2;  // valid for i: -11 to 4 inclusive
a[i+12] = 3;  // valid for i: -12 to 3 inclusive
a[i+13] = 4;  // valid for i: -13 to 2 inclusive

Lets say that OOB stores can be emulated in the SIGSEGV handler, within small guard zones.

Hoisting an OOB check:
if (-13 <= i && i <= 5) {
  a[i+10] = 1;  // SIGSEGV for i: -13 to -11 inclusive
  a[i+11] = 2;  // SIGSEGV for i: -13 to -12, and 5, inclusive
  a[i+12] = 3;  // SIGSEGV for i: -13, and 4 to 5, inclusive
  a[i+13] = 4;  // SIGSEGV for i: 3 to 5, inclusive
}

Even if there is a call that changes the heap length then the code might hold after patching if the guard zones are the same length?

It seems to add a lot of complexity when Emscripten could have just emitted hoisted checks or masking. All that is gained is a slight performance improvement on 64-bit.

Also Emscripten knows if the edges cases are important so can optimizes the checks further, for example it could emit the following and not need guard zones, and type derivation could eliminate the bounds checks:

if ((i >>> 0) < (3 >>> 0)) {
  a[i+10] = 1;
  a[i+11] = 2;
  a[i+12] = 3;
  a[i+13] = 4;
}
(In reply to Douglas Crosher [:dougc] from comment #1)
> All that is gained is a slight performance
> improvement on 64-bit.

1. Code without bounds checks *and* without masks would be faster than code with masks, on 32-bit as well.

2. Code with manual masking added by Emscripten is larger, so longer download and parse times.
(In reply to Alon Zakai (:azakai) from comment #2)
> (In reply to Douglas Crosher [:dougc] from comment #1)
> > All that is gained is a slight performance
> > improvement on 64-bit.
> 
> 1. Code without bounds checks *and* without masks would be faster than code
> with masks, on 32-bit as well.

For 64-bit yes, but I think the 32-bit outcome is yet to be determined. Masking is already hoisted by Odin in straight line code, and can be implemented in one instruction on ARM, and as noted in comment 2 Emscripten can further optimize explicitly hoisted bounds checks whereas the JS compiler needs to handle all the edge cases.

In the example in comment 1, the JS implementation needs to check two limits, but perhaps only one needs to be inlined by using an unsigned comparison and dealing with the other edge OOL?

The ARM can only efficiently encode some constants as an immediate argument to a comparison. Perhaps if the guard zones are large enough then the limit checks can be encoded efficiently on ARM.

The heart of this bug appears to be the addition of guard zones. These could help in deriving that bounds checks are unnecessary in general. They could be used to take advantage of masking with a non-power-of-two heap size too, and it could be used with explicitly hoisted bounds checks. But how practical is it to get all significant JS implementations using protected guard zones on asm.js heap buffers? This is not even implemented in Ion, and no one has been supportive of adding an asm.js specific buffer allocation function. There appear to be no other asm.js implementations using memory protection, even on 64-bit with all it's current advantages.

Lets be realistic, there are often uses of pointers in hot loops for which masking or bounds checks can not be hoisted. For example, code that loads a pointer from the heap and then uses it immediately and perhaps just once within a hot loop. These will not be eliminated or hoisted and limit performance improvements. For the ARM, a single instruction masking instruction seems optimal for performance and for code size.

> 2. Code with manual masking added by Emscripten is larger, so longer
> download and parse times.

The increase in the compressed JS code size might not be that significant. It's not very significant with explicit masking and there is a development path to eliminate some of this from the JS code, but it depends on improving the type derivation of deployed JS implements which will be slow and spotty. Parsing can be pipelined with downloads, and Chrome now implements this, see: http://blog.chromium.org/2015/03/new-javascript-techniques-for-rapid.html
(In reply to Douglas Crosher [:dougc] from comment #3)
> Parsing can be pipelined with downloads, and Chrome now implements this,
> see:
> http://blog.chromium.org/2015/03/new-javascript-techniques-for-rapid.html

Note that there is already bug 1061886 filed for async incremental parsing, which would allow pipelining downloads and parsing.
(In reply to Dan Gohman [:sunfish] from comment #0)
> To reduce bounds checking overhead in OdinMonkey on 32-bit platforms, one of
> the first things we need to do is eliminate directly redundant bounds checks
> (in loops, and in straight-line code). This is conceptually simple to do, by
> splitting bounds checking into a separate instruction in MIR, and letting
> GVN and LICM do their work.

What would the behavior of the split off bounds check instruction be?  We already have LBoundsCheck but this instruction bails out so isn't used by Odin afaik.  Do out of bounds memory accesses in Odin now throw or something?
The original idea (perhaps changed now) is that we already have an alignment mask op that mutates the pointer operand so, if we fold in the bounds check then the out-of-line path can mutate the pointer to a known-faulting offset and jump back in-line.  The signal handler can then do the same thing it does on x64.
The other thing to point out is that Odin doesn't currently use bailouts. Avoiding snapshots and OSR simplify Odin, but this also means that Odin has to keep running after an out-of-bounds array access. Scalar accesses don't throw (following regular JS semantics), so Odin has slow-path code to fix things up and then branch back to resume executing.
Depends on: 1148232
Bug 1282618 has set up a framework to eliminate bounds check in general, and ongoing work is being done in 1298203. Closing this.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.