Bug 865516 already added an optimization so that heap loads of the form H32[(i&M)>>2] (for a literal or const-variable M) would avoid any bounds check on x86/ARM (x64 already avoids the bounds check) when implied by the minimum heap length. It would be beneficial to extend this optimization to H32[(i&M)+C>>2] (for literal constants C) so that: 1. in a sequence of loads at different constant offsets (which is common), the (i&M) expression is hoistable 2. +C can be folded into the effective address of the load expression. We can put guard pages at the end of the asm.js heap so that out-of-bounds accesses are handled by AsmJSFaultHandler like normal IIUC, the alignment mask (required to simulate >> followed by array indexing) should always be foldable into the heap mask since: ((i & HM) + C) & AM = (i & (HM & AM)) + C where HM is the heap mask, AM is alignment mask (~(alignment-1)) and C is a constant offset which is a multiple of the alignment. This of course requires Emscripten to emit loads/stores of this form (which it currently does not). As this all validates as asm.js either way, we don't need to change the asm.js spec.
Would it be possible to do it like this? var x = 0; x = f()&M; g(H32[x+4>>2], H32[x+8>>2], H32[x+12>>2]); } That is, avoid writing &M repeatedly, and the JS engine would see the range of uses where x is known to be in a safe range (all uses dominated by x = something & M)?
Since we're already solidly in the "optimization, not type system" domain, I guess that would work and it'd be reliable. The downside is we won't know if Emscripten or other sources of asm.js intend to mask their loads, but do it wrong in a way that misses the optimization. I mean, ideally, it *would* be part of the type system so users would get a clear validation error if they intend for all heap accesses to be masked but they make a mistake (which is what they want, see bug 952847). We'd want masked heap accesses to be optional and backwards-compatible so we'd probably have some way to declare the intent at module scope, preceding all function definitions and determining which validation to apply later on.
There has already been some work on this in bug 915157 and perhaps it would be appropriate to mark one as a duplicate. I can upload a rebased patch. Note that it also requires the heap buffer to be 2^+C and might the asm.js spec. need updating regarding the heap lengths? The LICM often does a good job of hoisting the mask, once the mask has been explicitly hoisted before the +C. I have reservations about encouraging this pattern if Emscipten can not explicitly hoist the mask as suggested in comment 1 because it would result in asm.js performance being dependant on LICM which might be avoidable in future, and is seems practically impossible to cause a performance regression. However the LICM results do show that it is common to see the mask hoisted before a good block of heap accesses. For testing, a quick hack is to rewrite the heap access to explicitly hoist the mask, and most code still works even if it's not equivalent. The performance gains from this alone have not been great in the code explored. The critical loops just do not appear to benefit. There are often code sequences in loops that load a pointer from the heap and then dereference it within the critical loop and it is not possible to hoist the mask. There is still more to explore here. Using masked heap access alone can give a 5% performance gain in some applications, and adding Emscripten support for this might be a useful start.
Bug 870743 is exploring adding partial constant propagation to the range analysis and this propagates the partial constant bits created by a hoisted logical AND operation. This propagated type information is then used to eliminate redundant AND operations, such as those at a heap accesses. For testing, Emscripten was modified to emit masking of pointers at the entry to functions based on the point type and this supported the elimination of a good number of heap access masking operations. There was little performance improvement from this alone. The propagation of the partial constants into loop was not developed and is still to be explored. For example, making use of the loop stride could help eliminate low bit masking. Bug 910715 is another idea that could help this scheme, allowing asm.js function arguments and results to have a declared partial constant type. This might be used to help eliminate masking of arguments and results - it is only necessary for either the caller or callee to do the masking. The elimination of heap access masking does appear to have some real limits for asm.js code. For example, a loop running down a linked list would not be able to hoist heap access masking. There is just no concept of objects with a safe slot type. Developers might just need to take this into account when writing performance critical asm.js code.
Oops, I must have forgot to record bug 915157 in the OdinMonkey tracking wiki. Carry on then! One thing: I don't think the heap has to be a power of 2. The mask only allows Emscripten code to *access* a power-of-2 region of the heap, but the heap can be arbitrarily larger than the mask used.
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 915157
You need to log in before you can comment on or make changes to this bug.