Closed Bug 1080477 Opened 10 years ago Closed 2 years ago

IonMonkey: optimize heap access with a masked pointer plus a constant.

Categories

(Core :: JavaScript Engine: JIT, defect, P5)

x86
All
defect

Tracking

()

RESOLVED WONTFIX

People

(Reporter: dougc, Unassigned)

References

Details

(Whiteboard: leave-open)

Attachments

(4 files, 9 obsolete files)

1.56 KB, patch
bhackett1024
: review+
Details | Diff | Splinter Review
11.49 KB, patch
dougc
: review+
Details | Diff | Splinter Review
8.72 KB, patch
Details | Diff | Splinter Review
33.54 KB, patch
Details | Diff | Splinter Review
      No description provided.
This bug optimizes similar typed array access patterns to those discussed in bug 915157 for Odin - masking the index to allow bounds checks to be optimized away. Without some Ion optimizations the index masking would slow the compiled code and this might deter the use of this useful optimization. With some optimization to take advantage of these patterns it is possible to avoid degrading performance and to give a useful performance gain in many cases.

Bug 864214 added specialized typed array access paths for accesses with shifted indexes as used in asm.js style code. These paths are future optimized here to: optimize away the bounds check using range analysis; eliminate the shift masking when the index is already being masked; and to de-hoist a constant offset when the index can be proven to be within bounds.

Similar optimizations are being developed for the V8 Javascript implementation which also shows degraded performance when using index masking without some optimization. With similar optimizations it often shows a very useful performance gain when using the index masking code pattern which should help make this pattern more attractive to application developers.
This patch extends the LoadTypedArrayElementStatic and StoreTypedArrayElementStatic operations with a flag to disable the bounds check and an offset that is added to the access instruction addressing mode.

Range analysis collectRangeInfoPreTrunc methods are added to optimize away the bounds check when the index is within bounds, and this goes some way to optimize the index masking pattern.
Should bug 915157 dup to this?
Ah, n/m, I see this is for Ion.
(In reply to Luke Wagner [:luke] from comment #4)
> Ah, n/m, I see this is for Ion.

Yes, this is the Ion optimization for the same patterns that are optimized in bug 915157. Alon had requested this, and I understand some developers have their reasons for using asm.js style but not the Odin compiler. Also these optimizations are necessary on other JS implementations and they do not have a validating asm.js compiler like Odin so developers might make use of these patterns in general JS code and we would want to keep Ion competitive on such code.
This patch adds the offset de-hoisting support, and also eliminates a redundant masking operation when the index is already masked. It also eliminations the bounds check early during IonBuilder if possible.

Results for the zlib benchmark in Emscripten compiled to use index masking.

Ion compiled, runtimes with these patches/runtimes without: 80%

Ion compiled with these patches/Odin compiled without: 98%

Some very useful performance improvements can be obtain using index masking, enough to close the gap between Odin and Ion in this test, although obviously Odin will be well optimized too in bug 915157.
Rebase.
Attachment #8502502 - Attachment is obsolete: true
Comment on attachment 8515463 [details] [diff] [review]
1. Optimize heap access with a masked pointer plus a constant, using range analysis.

This patch allows the bounds check to be avoided using the range info.
Attachment #8515463 - Flags: review?(bhackett1024)
Comment on attachment 8515464 [details] [diff] [review]
3. Optimize heap access with a masked pointer plus a constant.

This patch detects the pattens a[(i&m)+c>>n] and a[(i&m)>>n] early and optimizes the shift to a low bit masking operation and merges this with the explicit index masking. It also optimizes away the bounds check if possible. The mask m is a constant that ensures that the access is within bounds. This is an asm.js pattern that might be common in future.

It has a chunk of code that detects these patterns. This avoids some MIR generation. An alternative would be to merge the bitand operations in a later optimization pass, and this will probably be done too in future, but it might save compile time and memory to optimize this potentially very common pattern early.

Is this an approach you would like to advance now?
Attachment #8515464 - Flags: feedback?(jdemooij)
Comment on attachment 8515463 [details] [diff] [review]
1. Optimize heap access with a masked pointer plus a constant, using range analysis.

Review of attachment 8515463 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/MIR.h
@@ +8195,5 @@
>              setResultType(MIRType_Int32);
>      }
>  
>      AlwaysTenured<JSObject*> someTypedArray_;
> +    int32_t offset_;

This field needs a comment.

@@ +8398,4 @@
>      {}
>  
>      AlwaysTenured<JSObject*> someTypedArray_;
> +    int32_t offset_;

Ditto.

::: js/src/jit/RangeAnalysis.cpp
@@ +2874,5 @@
> +
> +    if (range && range->hasInt32LowerBound() && (range->lower() + offset) >= 0 &&
> +        range->hasInt32UpperBound() && (uint32_t) (range->upper() + offset) < length) {
> +        setNeedsBoundsCheck(false);
> +    }

This arithmetic should use CheckedInt.

@@ +2887,5 @@
> +
> +    if (range && range->hasInt32LowerBound() && (range->lower() + offset) >= 0 &&
> +        range->hasInt32UpperBound() && (uint32_t) (range->upper() + offset) < length) {
> +        setNeedsBoundsCheck(false);
> +    }

Ditto.
Attachment #8515463 - Flags: review?(bhackett1024) → review+
(In reply to Douglas Crosher [:dougc] from comment #10)
> It has a chunk of code that detects these patterns. This avoids some MIR
> generation. An alternative would be to merge the bitand operations in a
> later optimization pass, and this will probably be done too in future, but
> it might save compile time and memory to optimize this potentially very
> common pattern early.
> 
> Is this an approach you would like to advance now?

I think bug 1064374 is supposed to make it easier to pattern match MIR instructions like this. Maybe we can try this on top of the patches there? :)
Also, if we do this in a later pass instead of in IonBuilder, can we share code with Odin?
Attachment #8515464 - Flags: feedback?(jdemooij)
This patch uses a low bit mask rather than a right and left shift for the pattern b[i>>n] on non-x86 paths.

A separate wip patch adds partial constant support to range analysis and can eliminate the redundant bitwise low bit mask generated in this patch. The bounds check can also be eliminated using range bounds information. The generated code is much improved on x64 and ARM but it still remains to dehoist the index offset into the load/store instructions and I am still looking in this.

The improvements to range analysis are just as applicable to asm.js and eliminate the asm.js low bit masking too. It's not yet clear if the offset dehoisting code can be shared.
While working to get this working well on the x64 and ARM, which do not use the *Static ops, I notice cases in which the typed array length is a known value on the x86 but not on the x64. The difference seems to be due to the different tests below.

The addTypedArrayLengthAndData() is used on the x64 and ARM, and obj->isConstant() test fails, even when obj->resultTypeSet() is true and obj->resultTypeSet()->getSingleton() returns an object.

IonBuilder.cpp:

addTypedArrayLengthAndData():
    ...
    if (obj->isConstant() && obj->toConstant()->value().isObject()) {
        JSObject *tarr = &obj->toConstant()->value().toObject();
        ...

The getElemTryTypedStatic() is used on the x86 and the length obtained from the tarrObj is hard coded into the JIT code. Could the x64 and ARM use this same pattern, even if obj->isConstant() is false?

getElemTryTypedStatic():
    ...
    if (!obj->resultTypeSet())
        return true;
    JSObject *tarrObj = obj->resultTypeSet()->getSingleton();
    if (!tarrObj)
        return true;
    ...
Flags: needinfo?(bhackett1024)
Here's an example patch to address the issue raised in comment 15. Does the resultTypeSet hold a type found by Baseline or the Interpreter?
Flags: needinfo?(bhackett1024)
Attachment #8531204 - Flags: review?(bhackett1024)
Comment on attachment 8531204 [details] [diff] [review]
2. recognize and optimize a singleton object typed array access

Review of attachment 8531204 [details] [diff] [review]:
-----------------------------------------------------------------

This is fine, though in general it would be nice if we could ensure that these definitions that only have one possible value are an MConstant.
Attachment #8531204 - Flags: review?(bhackett1024) → review+
Address reviewer feedback:

* Commented the offset slot.

* Reworked the new collectRangeInfoPreTrunc() methods to use int64_t maths to deal with overflows - this is a commonly used pattern in the same file and has been used rather than the suggested CheckedInt.

Also added a congruentTo method for MLoadTypedArrayElementStatic.

Carry forward r+.
Attachment #8515463 - Attachment is obsolete: true
Attachment #8532011 - Flags: review+
Attachment #8531204 - Attachment description: 4. recognize and optimize a singleton object typed array access → 2. recognize and optimize a singleton object typed array access
Attachment #8515464 - Attachment description: 2. Optimize heap access with a masked pointer plus a constant. → 3. Optimize heap access with a masked pointer plus a constant.
Attachment #8521392 - Attachment description: 3. Optimize type array access with an explicitly shifted index → 4. Optimize type array access with an explicitly shifted index
Requesting checkin of patches 1 and 2. The try run has some timeouts but they do not appear to be related: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=cb9a6ae7f2f8
Keywords: checkin-needed
Whiteboard: leave-open
(In reply to Douglas Crosher [:dougc] from comment #19)
> Requesting checkin of patches 1 and 2. The try run has some timeouts but
> they do not appear to be related:
> https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=cb9a6ae7f2f8

Hi Douglas,

seems the patch does not apply cleanly, see 

renamed 1080477 -> bug1080477-1-3.patch
applying bug1080477-1-3.patch
patching file js/src/jit/x86/CodeGenerator-x86.cpp
Hunk #1 FAILED at 300
Hunk #2 FAILED at 444
2 out of 2 hunks FAILED -- saving rejects to file js/src/jit/x86/CodeGenerator-x86.cpp.rej

could you take a look, thanks!
Flags: needinfo?(dtc-moz)
(In reply to Carsten Book [:Tomcat] from comment #20)
> (In reply to Douglas Crosher [:dougc] from comment #19)
> > Requesting checkin of patches 1 and 2. The try run has some timeouts but
> > they do not appear to be related:
> > https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=cb9a6ae7f2f8
> 
> Hi Douglas,
> 
> seems the patch does not apply cleanly, see 
> 
> renamed 1080477 -> bug1080477-1-3.patch
> applying bug1080477-1-3.patch
> patching file js/src/jit/x86/CodeGenerator-x86.cpp
> Hunk #1 FAILED at 300
> Hunk #2 FAILED at 444
> 2 out of 2 hunks FAILED -- saving rejects to file
> js/src/jit/x86/CodeGenerator-x86.cpp.rej
> 
> could you take a look, thanks!

Must be something incoming. Shall rebase and try again after it's all merged. Thanks.
Rebase. Carry forward r+.
Attachment #8532011 - Attachment is obsolete: true
Flags: needinfo?(dtc-moz)
Attachment #8532792 - Flags: review+
Requesting checkin of patches 1 and 2. Try run looks good: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=92868efc9113
Keywords: checkin-needed
Depends on: 1110494
Attachment #8515464 - Attachment is obsolete: true
Attachment #8521392 - Attachment is obsolete: true
Add a pass to dehoist the offsets. Works with Ion typed array access (static and non-static paths) and Odin typed array accesses.

This pass needs to run late, after redundant bit AND operations have been eliminated, and after bounds checks have been eliminated. Need to study the interaction with licm etc which run earlier. It seems to generate subtly different code to dehoisting early by detecting code patterns as in patch 4.
This version removes the pattern matching from the IonBuilder stage. The strategy being explored now is for optimizing these patterns and dehoist the offset after range analysis.

This patch is now focused on optimizing the pattern u[i>>n], transforming the shifts into a low bit masking operation. While there was already support for this on the x86 by using the *Static ops, this patch adds a 'prescaled-index' flag to the regular ops to also support this optimization on the other backends in Ion.
Attachment #8540980 - Attachment is obsolete: true
Attachment #8540981 - Attachment is obsolete: true
With bug 1137573 helping to optimize uses of the low bit alignment mask, might it be timely to advance this patch for Ion which convert the shifts to a mask?

This also adds some support for dehoisting an offset. The actual dehoisting is in a separate patch, but while you are working on this perhaps you could keep in mind how this could be appropriately done.

With the Odin dehoisting support landed, I was hoping that it might be possible for Emscripten to change how it emits code and to emit the index offset dehoisted which also saves variables? Perhaps also adding the Ion support would make a better case for doing this.

The x86 already makes this optimization, but by emitting separate MIR ops. I'm not sure if the Ion bounds check optimizations work with those MIR ops, and it might be better to use the strategy in this patch for the x86 too and remove those separate MIR ops.

Not requesting review yet as I would like to double check that the scale and offset are correctly handled wrt bounds checks.
Attachment #8565205 - Attachment is obsolete: true
Attachment #8570474 - Flags: feedback?(sunfish)
Comment on attachment 8570474 [details] [diff] [review]
3. Optimize heap access by replacing a right and left index shift with a low bit mask.

Review of attachment 8570474 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Douglas Crosher [:dougc] from comment #30)
> With the Odin dehoisting support landed, I was hoping that it might be
> possible for Emscripten to change how it emits code and to emit the index
> offset dehoisted which also saves variables? Perhaps also adding the Ion
> support would make a better case for doing this.

Are you saying Emscripten should emit A[(x>>2)+i]? That's not currently valid asm.js. Should it be? The "hoisted" form is not complicated to optimize, when needed, and it keeps the language simpler to have a single syntax, so it's not clearly beneficial.

> Not requesting review yet as I would like to double check that the scale and
> offset are correctly handled wrt bounds checks.

That sounds good. Another thing to check is to make sure that prescaling doesn't cause AliasAnalysis to get confused and think that two loads are at the same address though they may have different scales, or that they're at different addresses if they have the same index though they have scales which cause them to alias.

::: js/src/jit/MIR.h
@@ +8909,5 @@
>    : public MTernaryInstruction,
>      public StoreTypedArrayPolicy::Data
>  {
>      Scalar::Type arrayType_;
> +    bool prescaled_;

The direction I'm anticipating in bug 1116773 is to have the MIR nodes store the scale value itself, so that their state represents something more like an actual address mode. I think that makes sense for these Ion nodes too. Store the scale value in the non-prescaled case, and 1 in the prescaled case. This will also simplify the CodeGenerator code a bit, because it doesn't need to distinguish between the two cases.
Attachment #8570474 - Flags: feedback?(sunfish)
(In reply to Dan Gohman [:sunfish] from comment #31)
> Comment on attachment 8570474 [details] [diff] [review]
...
> (In reply to Douglas Crosher [:dougc] from comment #30)
> > With the Odin dehoisting support landed, I was hoping that it might be
> > possible for Emscripten to change how it emits code and to emit the index
> > offset dehoisted which also saves variables? Perhaps also adding the Ion
> > support would make a better case for doing this.
> 
> Are you saying Emscripten should emit A[(x>>2)+i]? That's not currently
> valid asm.js. Should it be? The "hoisted" form is not complicated to
> optimize, when needed, and it keeps the language simpler to have a single
> syntax, so it's not clearly beneficial.

No, just to dehoist the offsets to:
 A[(x + i)>>2]
rather than
 var x1 = x + 1;
 ...
 A[x1>>2]

This is a similar dehoisting operation to what Emscripten's aggressive variable elimination currently does, but it might be focused only on these indexes. If the compiler moves the offset into the MIR op before GVN then it will avoid hoisting them again before dehoisting.

This support helps Ion show an improvement with index masking - if index masking degrades performance in some compilers then it will be less attractive to developers. It is quite effective and almost narrowed the performance gap to Odin in some benchmarks, although with bug 986981 landed Ion probably has more catching up to do.

The bounds checking approach used in bug 986981 might also be useful for Ion. Ion has a high and low bounds check but I'm not sure it's as well optimized as the path in bug 986981.
 
> > Not requesting review yet as I would like to double check that the scale and
> > offset are correctly handled wrt bounds checks.
...
> >      Scalar::Type arrayType_;
> > +    bool prescaled_;
> 
> The direction I'm anticipating in bug 1116773 is to have the MIR nodes store
> the scale value itself, so that their state represents something more like
> an actual address mode. I think that makes sense for these Ion nodes too.
> Store the scale value in the non-prescaled case, and 1 in the prescaled
> case. This will also simplify the CodeGenerator code a bit, because it
> doesn't need to distinguish between the two cases.

Good idea, shall explore.
Douglas, still on your radar?
Flags: needinfo?(dtc-moz)
Priority: -- → P5
Yes, last I check this general approach of masking gave the best performance on some platforms, so the general approach is still of interest, but this issue is probably stale.
Flags: needinfo?(dtc-moz)

The bug assignee is inactive on Bugzilla, so the assignee is being reset.

Assignee: dtc-moz → nobody

A lot has changed and this bug is probably unlikely to go anywhere.

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: