Closed Bug 1329676 Opened 7 years ago Closed 7 years ago

Baldr: re-add constant-access bounds-check-removal to Ion backend

Categories

(Core :: JavaScript Engine: JIT, defect, P3)

defect

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox55 --- fixed

People

(Reporter: luke, Assigned: lth)

References

(Depends on 1 open bug)

Details

Attachments

(2 files, 4 obsolete files)

I think this may be coming with some broader WasmBCE improvements, but I just wanted to make a specific bug to track adding back bounds-check-removal on 32-bit for constant accesses that are less than the minimum heap length.

The bounds check for call_indirect's table access (in MacroAssembler::wasmCallIndirect) could also be removed for constant accesses below the table's minimum size.  This could be a popular pattern in various dynamic linking scenarios which use Tables to implement the PLT.
(In reply to Luke Wagner [:luke] from comment #0)

> The bounds check for call_indirect's table access (in
> MacroAssembler::wasmCallIndirect) could also be removed for constant
> accesses below the table's minimum size.  This could be a popular pattern in
> various dynamic linking scenarios which use Tables to implement the PLT.

Extra points for either hiding such bounds checking entirely in masm.wasmCallIndirect or exposing it in the API to that function so that the baseline compiler can easily opt into it.
Priority: -- → P3
Attached patch bug1329676-bce-constant.patch (obsolete) — Splinter Review
This is a possibility, but it seems to be neither fish nor fowl.  WasmBCE runs before constant folding, so this eliminates only true constant addresses.  But if that's all we want to do we can do it in WasmIonCompile, in the same way we do it in Rabaldr.

Also, it seems inelegant to store the min heap length in every BCE node but I've not yet found a way to get at the wasm environment from within Ion, possibly for good reason...
Dan: do you know if there is any reason BCE can't or shouldn't go after GVN?

Also, Lars: one place we use to stash one-off bits of state like this is MIRGenerator; you can see there's already a wasmMaxStackArgBytes this could go alongside.
Flags: needinfo?(sunfish)
No, I'm not aware of any reason BCE can't go after GVN.
Flags: needinfo?(sunfish)
Actually BCE runs after constant folding, but the node introduced for the constant is not (MConstant 4) but (MTruncateToInt32 (MConstant 4)), so the matcher has to be a little smarter.  Lowering discards the redundant truncation.
I put this in lowering because that's where we can rely on constants being visible as constants without having to walk a tree, see my previous comment.

(I do have another patch that has a more complex decision procedure in WasmBCE that handles the result of compile-time expression evaluation, if that's of interest, but it is less elegant and more brittle.)
Attachment #8840797 - Flags: review?(luke)
Comment on attachment 8840797 [details] [diff] [review]
bug1329676-bce-on-constant-pointers.patch

Review of attachment 8840797 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks!

::: js/src/jit/Lowering.cpp
@@ +4267,5 @@
> +
> +    if (input->isConstant()) {
> +        // WasmIonCompile will always fold large offsets into the pointer, so we
> +        // only have to test the pointer against the minimum heap length.
> +        // Small-offset overflows will be caught by the guard page.

Since MWasmBoundsCheck is independent of any MWasm(Load|Store) and has no offset itself, I don't think this comment really belongs: the decision that an offset can be safely ignored has already been made "higher up" and here we're just observing that we're given a value i that is a constant and less than the smallest possible heap length and thus no dynamic check is needed.

::: js/src/wasm/WasmIonCompile.cpp
@@ +706,5 @@
>          // If the offset is bigger than the guard region, a separate instruction
>          // is necessary to add the offset to the base and check for overflow.
> +        //
> +        // Lowering depends (for correctness of eliminating bounds checks) on
> +        // large offsets being folded into the pointer here.

Based on the other comment, I think this added comment can be removed as well: MWasmBoundsCheck has no notion of offset, only the value it is given which either does or does not include the offset.
Attachment #8840797 - Flags: review?(luke) → review+
Part 2: Eliminate bounds checks on call_indirect if it has a constant index that is below the minimum table length.

(This does not move the needle on emscripten-translated "raybench" code since none of the translated vcalls have constant table indices.)
Attachment #8841313 - Flags: review?(luke)
Comment on attachment 8841313 [details] [diff] [review]
bug1329676-bce-on-indirect-call.patch

Review of attachment 8841313 [details] [diff] [review]:
-----------------------------------------------------------------

Nice!

::: js/src/wasm/WasmBaselineCompile.cpp
@@ +2545,5 @@
>          loadI32(WasmTableCallIndexReg, indexVal);
>  
>          CallSiteDesc desc(call.lineOrBytecode, CallSiteDesc::Dynamic);
>          CalleeDesc callee = CalleeDesc::wasmTable(table, sig.id);
> +        masm.wasmCallIndirect(desc, callee, true);

nit: can you add /* needsBoundCheck = */ before 'true'?
Attachment #8841313 - Flags: review?(luke) → review+
Eventually we should probably do full bounds check elimination on table accesses, thus enabling Wasm generators to hoist bounds checks into assertions or to mask accesses.  From what I can tell Baldr already generates good code for heap accesses that are covered by hoisted checks or whose index is masked (to the heap minlength).
Assignee: luke → lhansen
(In reply to Lars T Hansen [:lth] from comment #10)
> From what I can tell Baldr already
> generates good code for heap accesses that are covered by hoisted checks or
> whose index is masked (to the heap minlength).

I take that back; BCE does not remove either reundant bounds check in this case.  That'll teach me not to run a 64-bit build when testing these things.

Test case for guarded loads:

     (module
      (memory 1)
      (func (param i32) (result i32)
       (block
	(br_if 0 (i32.lt_u (get_local 0) (i32.const 65536)))
	(unreachable))
       (i32.load (get_local 0)))
      (export "" 0))

Test case for masked load:

     (module
      (memory 1)
      (func (param i32) (result i32)
       (i32.load (i32.and (get_local 0) (i32.const 0xFFFF))))
      (export "" 0))
(In reply to Lars T Hansen [:lth] from comment #6)
> Created attachment 8840797 [details] [diff] [review]
> bug1329676-bce-on-constant-pointers.patch
> 
> I put this in lowering because that's where we can rely on constants being
> visible as constants without having to walk a tree, see my previous comment.

That turns out to be the wrong thing (or it needs more complexity to be right).  The reason is that with the patching elimination in bug 1338217, there will be separate MIR nodes for the memory base and the memory limit, hanging off the MIR node for the bounds check.  When the MIR node for the bounds check is lowered, we will already have lowered the node for the memory limit, but we may then decide that the bounds check is redundant and we won't need the limit.  Thus there may be a pointless load of the memory limit emitted since we can't un-lower the memory limit node (or so it seems).
An alternative take on eliminating the bounds check by putting the code back in WasmBCE, so as to avoid the spurious load for the memory limit resulting from the previous solution, see my previous comment.

This is not as messy as I thought it would be; there really seem to be only two interesting cases - MConstant(Int32) and MTruncateToInt32(MConstant(Int32)).
Attachment #8841480 - Flags: review?(luke)
(In reply to Lars T Hansen [:lth] from comment #12)
> The reason is that with the patching elimination in bug 1338217,
> there will be separate MIR nodes for the memory base and the memory limit,
> hanging off the MIR node for the bounds check.

Hmm, I don't follow this: I was expecting a new MIR node `MLoadTls(tlsPointer, offset=N)` and one MLoadTls was created for each of heap-base and heap-length.  These two nodes would then be used as inputs to MWasmBoundsCheck/MWasmLoad/Store, and so if the MWasmBoundsCheck was eliminated, the heap-length MLoadTls would be dead/removed.  I suppose the way redundancy is marked by BCE doesn't actually *eliminate* the MWasmBoundsCheck, but I think maybe that could be fixed.
Comment on attachment 8841480 [details] [diff] [review]
bug1329676-bce-on-constant-pointers-v2.patch

Review of attachment 8841480 [details] [diff] [review]:
-----------------------------------------------------------------

Could we move BCE after GVN so that operand->isConstant() is all you need?
(In reply to Luke Wagner [:luke] from comment #14)
> (In reply to Lars T Hansen [:lth] from comment #12)
> > The reason is that with the patching elimination in bug 1338217,
> > there will be separate MIR nodes for the memory base and the memory limit,
> > hanging off the MIR node for the bounds check.
> 
> Hmm, I don't follow this: I was expecting a new MIR node
> `MLoadTls(tlsPointer, offset=N)` and one MLoadTls was created for each of
> heap-base and heap-length.  These two nodes would then be used as inputs to
> MWasmBoundsCheck/MWasmLoad/Store, 

That is indeed how it is set up (with some variation in naming).

> and so if the MWasmBoundsCheck was
> eliminated, the heap-length MLoadTls would be dead/removed.

It is not, see below.

> I suppose the
> way redundancy is marked by BCE doesn't actually *eliminate* the
> MWasmBoundsCheck, but I think maybe that could be fixed.

That's right.  BCE leaves the node in the graph and then leaves it to lowering to get rid of the node.  In principle that works fine, but GVN has commoned the memoryBase and memoryLength nodes.  We can't mark the memoryBase and memoryLength as redundant when we mark the bounds check as redundant, because the former nodes might be shared.

As you say, moving BCE after GVN might fix that, but there are probably more local fixes.  I'm experimenting with that.
Right, makes sense that marking WasmBoundsCheck as redundant cannot go out and mark the heap-base/length nodes as redundant.  I think the problem is that the EliminateDeadCode() pass doesn't know about the special meaning of MWasmBoundsCheck::isRedundant() and so it considers MWasmBoundsCheck and its args as alive.  I think we need MWasmBoundsCheck::isRedundant() to get reflected in one of the fields that is queried by jit::DeadIfUnused(); I'm guessing isEffectful() or isGuard(), depending on what these mean.
That sounds nice and clean, I'll investigate that.  It'll require moving BCE to an earlier point, since DCE runs before it now.

The alternative, which seems to work is for the memoryLimit node to check whether all its uses (which must be MWasmBoundsCheck nodes) are marked as redundant.  This keeps the fix fairly local in Lowering (duplicated for all 32-bit platforms potentially) but is still fairly hacky.  It would be better for the bounds check node and its inputs just to never show up.
As the previous proposal, but gets DCE to remove redundant bounds checking nodes, and removes that code from Lowering.  Will also simplify the patch removal work.
Attachment #8841480 - Attachment is obsolete: true
Attachment #8841480 - Flags: review?(luke)
Attachment #8841613 - Flags: review?(luke)
Comment on attachment 8841613 [details] [diff] [review]
bug1329676-bce-on-constant-pointers-v3.patch

Review of attachment 8841613 [details] [diff] [review]:
-----------------------------------------------------------------

Ah yes, this is looking very nice.

::: js/src/jit/WasmBCE.cpp
@@ +49,5 @@
> +                //
> +                // Most constant folding that matters here results in an
> +                // MConstant node.  However, arithmetic folding results in a
> +                // TruncateToInt32 node with an MConstant node embedded in it,
> +                // see MBinaryArithInstruction::foldsTo().

But this still seems strange: if we have Int32 arguments and a truncated operation, I don't see why an MTruncateToInt32 would ever be necessary.  If this is common enough to warrant a special-case here, it seems like we should instead try to improve MBinaryArithInstruction::foldsTo() to eliminate that truncate because it'll still have runtime overhead.
(In reply to Luke Wagner [:luke] from comment #20)

> But this still seems strange: if we have Int32 arguments and a truncated
> operation, I don't see why an MTruncateToInt32 would ever be necessary.  If
> this is common enough to warrant a special-case here, it seems like we
> should instead try to improve MBinaryArithInstruction::foldsTo() to
> eliminate that truncate because it'll still have runtime overhead.

I don't know precisely why the MTruncateToInt32 node is there (I can guess, but so can we all :) but it breaks otherwise reasonable constant expression evaluation for Wasm, and plausibly for JS.  This code:

      (func (param i32) (result i32)
       (i32.load (i32.add (i32.add (i32.const 0) (i32.const 64)) (i32.const 8))))

should turn into a constant load but turns into an actual add with a subsequent bounds check of the (provably-good) pointer:

[Codegen] instruction WasmParameter
[Codegen] instruction WasmParameter
[Codegen] instruction Integer
[Codegen] movl       $0x40, %eax
[Codegen] instruction AddI
[Codegen] addl       $8, %eax
[Codegen] instruction WasmMemoryBase
[Codegen] movl       0xc(%esi), %ecx
[Codegen] instruction WasmMemoryLimit
[Codegen] movl       0x10(%esi), %edx
[Codegen] instruction WasmBoundsCheck
[Codegen] cmpl       %edx, %eax
[Codegen] jae        .Lfrom73
[Codegen] instruction WasmLoad
[Codegen] movl       0x0(%ecx,%eax,1), %eax
[Codegen] instruction WasmReturn

I will descend further down this rabbit's hole.
Depends on: 1343007
Blocks: 1338217
https://hg.mozilla.org/integration/mozilla-inbound/rev/757e96ecffc2d17406d7b66afe5368a801a412ec
Bug 1329676 - Wasm: eliminate redundant bounds checks on indirect calls. r=luke
Keywords: leave-open
Attachment #8841613 - Flags: review?(luke)
We should definitely remove the explicit tree walking here.  Bug 1343007 tries to get rid of the complex trees so that we can look for a constant directly here (and has uncovered a bug around the range analysis of MMod that needs to be fixed regardless), but it may be that the best fix for WasmBCE is to use ExtractLinearSum() to check for constant addressing expressions.  According to nbp it "should" walk through any TruncateToInt32 nodes for us, but he had some second thoughts about maybe that not being properly implemented just before he left on Thursday.  To be continued.
Depends on: 1346202
No longer blocks: 1338217
With the patches on bug 1343007 this is now further simplified and BCE only needs check for an MConstant node.  Otherwise this is the same as the previous iteration of this patch.
Attachment #8839368 - Attachment is obsolete: true
Attachment #8840797 - Attachment is obsolete: true
Attachment #8841613 - Attachment is obsolete: true
Attachment #8849471 - Flags: review?(luke)
Oh, and I think we should later attempt to change this code to use ExtractLinearSum, but that function needs to be fixed first, see bug 1346202, and I'd prefer not to wait for that to happen, as nbp is busy elsewhere.  So I can file a bug for that.
Comment on attachment 8849471 [details] [diff] [review]
bug1329676-bce-on-constant-pointers-v4.patch

Review of attachment 8849471 [details] [diff] [review]:
-----------------------------------------------------------------

Great, thanks!

::: js/src/jit/WasmBCE.cpp
@@ +52,4 @@
>  
> +                if (addr->isConstant() && addr->toConstant()->type() == MIRType::Int32) {
> +                    if (uint32_t(addr->toConstant()->toInt32()) < mir->minWasmHeapLength())
> +                        bc->setRedundant();

Could you add a static assert(MaxMemoryAccessSize < GuardSize, "guard page handles partial out-of-bounds")?

@@ +56,2 @@
>                  } else {
> +                    LastSeenMap::AddPtr ptr = lastSeen.lookupForAdd(addr->id());

To catch the case of redundant bounds checks on larger-than-minimum-heap-length constants, could the control flow be:
  if (isConstant && isInt32 && toInt32() < min) {
    bc->setRedundant()
  } else {
    ... lookup in lastSeen
  }
?
Attachment #8849471 - Flags: review?(luke) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/0893e7c2a3bb9abd4df4c0e90915e73f7f5fa840
Bug 1329676 - Wasm: eliminate redundant bounds checks on constant heap addresses. r=luke
Keywords: leave-open
https://hg.mozilla.org/mozilla-central/rev/0893e7c2a3bb
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
You need to log in before you can comment on or make changes to this bug.