Closed Bug 1295130 Opened 4 years ago Closed 4 years ago

Fold AddI opcode with constant into other AddI with constant

Categories

(Core :: JavaScript Engine: JIT, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla52
Tracking Status
firefox52 --- fixed

People

(Reporter: sandervv, Assigned: sandervv, Mentored)

References

Details

Attachments

(2 files, 6 obsolete files)

I'm seeing two AddI opcodes with a constant that operate on the same register. These opcodes should be folded together.

The following test case is extracted from an inner loop found in Bullet compiled with Cheerp:

Lpoints = new Uint8Array(2000);
do {
    var Mpoints = 8;
    var tmp4 = 1.2, tmp6 = 3.3, tmp8 = 1.4, tmp9,tmp10,tmp11,tmp12;
    for (var i = 0; i < 10;) {
        tmp9=i<<2;
        tmp10=+(Lpoints[(Mpoints>>0)+tmp9>>0]);
        tmp11=+(Lpoints[(Mpoints>>0)+(tmp9+1>>0)>>0]);
        tmp12=+(Lpoints[(Mpoints>>0)+(tmp9+2>>0)>>0]);
        tmp10=(tmp4*tmp10+tmp6*tmp11)+tmp8*tmp12;
        i=i+1>>0;
    }
} while(!inIon());
print(tmp10);

This will generate the following opcodes and assembly code for the loop body:

BB #5 [00156,6,8] -> #4 :: 10 hits
[MoveGroup]
movl       %ebp, %esi
[ShiftI:lsh]
shll       $2, %esi
[StoreSlotT]
movl       %esi, 0xc40(%rax)
[MoveGroup]
movl       %esi, %r8d
[AddI]
addl       $8, %r8d
[BoundsCheckRange]
[LoadUnboxedScalar]
movzbl     0x0(%rbx,%r8,1), %edi
[KeepAliveObject]
[StoreSlotT]
movl       %edi, 0xc48(%rax)
movl       $0xfff88000, 0xc4c(%rax)
[MoveGroup]
movl       %esi, %r9d
[AddI]
addl       $1, %r9d
[AddI]
addl       $8, %r9d
[LoadUnboxedScalar]
movzbl     0x0(%rbx,%r9,1), %r8d
[KeepAliveObject]
[StoreSlotT]
movl       %r8d, 0xc50(%rax)
[AddI]
addl       $2, %esi
[AddI]
addl       $8, %esi
[...]

When I print the add operator in foldTo, I'm seeing the following log messages:

[...]
MBinaryArithInstruction::foldsTo(): add145 = add lsh75 constant74
lhs: lsh75 = lsh loadslot65 constant74                           
rhs: constant74 = constant 0x2                                   
MBinaryArithInstruction::foldsTo(): add149 = add constant41 rsh148
lhs: constant41 = constant 0x8
rhs: rsh148 = rsh add145 constant57
rhs op0: add145 = add lsh75 constant74
rhs op1: constant57 = constant 0x0
[...]

which translates to the expression (8 + ((loadslot65 + 2)>>0)).

I'm seeing that rsh148 is folded in the log messages for MBinaryBitwiseInstruction::foldUnnecessaryBitop:

MBinaryBitwiseInstruction::foldUnnecessaryBitop(): rsh148 = rsh add145 constant57
lhs: add145 = add lsh75 constant74
rhs: constant57 = constant 0x0

Eliminating >>0 is implemented in MBinaryBitwiseInstruction::foldUnnecessaryBitop, but that happens after MBinaryArithInstruction::foldTo is called. This is causing the two add instructions.

I'm not sure what the correct solution would be to solve this problem. I've found that there is a reason that foldUnnecessaryBitop is not called during GVN (it could affect range analysis results): https://dxr.mozilla.org/mozilla-central/source/js/src/jit/RangeAnalysis.cpp#3182
Currently the Effective Address Analysis (EAA) is trying to do exactly this task of folding sequences of additions.

I think this would be a nice thing to add, to fold sequences of additions after range analysis, and before the EAA, and this would remove the loop from EAA.

Then, In this particular case it might make sense to also fold the sum into LoadUnboxedScalar BaseIndex address displacement as part of EAA.
Implemented folding of AddI with a constant in a separate pass "FoldLinearArithConstants" as discussed on IRC.

This patch depends on the IsRightShiftZero utility function from bug 1301379.
Assignee: nobody → sandervv
Attachment #8790225 - Flags: review?(nicolas.b.pierron)
This patch folds the AddI's constant into the LoadUnboxedScalar opcode, as discussed in this bug report.

This patch depends on the same IsRightShiftZero utility function as mentioned in the previous patch/comment.
Attachment #8790226 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8790225 [details] [diff] [review]
bug1295130-fold-addi-opcodes-with-constant.patch

Review of attachment 8790225 [details] [diff] [review]:
-----------------------------------------------------------------

Any reasons not to extract EAA part which is already doing the sum of multiple additions, and remove the code from EAA?

::: js/src/jit/FoldLinearArithConstants.cpp
@@ +32,5 @@
> +
> +    if (!lhs->isConstant())
> +        return;
> +
> +    // Detect: c1 + (y + c2) => c3 + y, where c3 = c1 + c2

Any reasons not to support:
  (y + c2) + c1
  c1 + (c2 + y) + c1
  (c2 + y) + c1
?

Also, note that the current code generates  c3 + y  instead of  y + c3, so this implies that this can not even cascade, as this addition would not match the (y + c2) rhs pattern.

@@ +38,5 @@
> +        if (rhs->isRecoveredOnBailout())
> +            return;
> +
> +        // Check if the right AddI is only used by the AddI node. Otherwise,
> +        // the right AddI cannot be folded into the AddI.

We are replacing 2 additions, by one addition, I can understand that this might matter when the constants are smallish as they can be encoded in shorter amount of code, but I am not sure if this is good as a general rule.

@@ +70,5 @@
> +        lhs->block()->insertBefore(lhs->toInstruction(), c3);
> +        add->replaceOperand(0, c3);
> +
> +        MDefinition* y = rhs->toAdd()->lhs();
> +        rhs->replaceAllUsesWith(y);

If I read correctly, "y + c2"  is replaced by "y" in all its uses, which includes resume points.
This sounds like a bug to me.

Any reasons to not create a new AddI with the correct operands and let the rest be either eliminated by DCE, or recovered on bailout by the Sink phase?

Also, if you create a new addition, I would expect the Truncate flag[1] to be set accordingly based on the two additions.

[1] http://searchfox.org/mozilla-central/rev/591354c5e5521cf845bf6b6ef44e8b3b0aeda17d/js/src/jit/MIR.h#654
Attachment #8790225 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8790226 [details] [diff] [review]
merge-addi-into-loadunboxedscalar.patch

Review of attachment 8790226 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/EffectiveAddressAnalysis.cpp
@@ +110,5 @@
> +//
> +// into:
> +//
> +//   [LoadUnboxedScalar]
> +//   movsd      0x48(%rbx,%ecx,8), %xmm4

nit: I do not think we can ever produce a mix of 64bits registers and 32bits registers in ModR/M.

@@ +124,5 @@
> +        return;
> +
> +    MAdd* add = load->getOperand(1)->toAdd();
> +
> +    if (add->specialization() != MIRType::Int32 || add->isRecoveredOnBailout())

nit: Check that the truncate kind is Truncate.

@@ +129,5 @@
> +        return;
> +
> +    // Check if the Add is only used by the LoadUnboxedScalar node. Otherwise,
> +    // the Add cannot be folded into the LoadUnboxedScalar.
> +    for (MUseIterator iter = add->usesBegin(); iter != add->usesEnd(); ++iter) {

nit: Same comment as in the previous patch, I think this loop is not necessary.

@@ +173,5 @@
> +    if (!SafeAdd(c1, c2, &offset))
> +        return;
> +
> +    load->setOffsetAdjustment(offset);
> +    add->replaceAllUsesWith(rhs);

As in the previous patch, no we cannot do that, as this will destroy any hope to bailout correctly.
Attachment #8790226 - Flags: review?(nicolas.b.pierron)
I'm using ExtractLinearSum to avoid duplicating logic for matching linear sums.

    Any reasons not to support:
      (y + c2) + c1
      c1 + (c2 + y) + c1
      (c2 + y) + c1
    ?

The expressions above are now folded successfully.

    Any reasons to not create a new AddI with the correct operands and let the rest be either eliminated by DCE, or recovered on bailout by the Sink phase?

I'm setting the flag RecoveredOnBailout for the to-be-removed nodes and leafs.
Attachment #8790225 - Attachment is obsolete: true
Attachment #8794755 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8794755 [details] [diff] [review]
bug1295130-fold-addi-opcodes-with-constant.patch

Review of attachment 8794755 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/FoldLinearArithConstants.cpp
@@ +22,5 @@
> +markNodesAsRecoveredOnBailout(MDefinition* def)
> +{
> +    for (MUseIterator iter = def->usesBegin(); iter != def->usesEnd(); iter++)
> +        if (iter->consumer()->isDefinition())
> +            markNodesAsRecoveredOnBailout(iter->consumer()->toDefinition());

nit: add curly-braces to the for loop.

@@ +39,5 @@
> +
> +    int uses = 0;
> +
> +    for (MUseIterator iter = def->usesBegin(); iter != def->usesEnd(); iter++) {
> +        if (iter->consumer()->isDefinition()) {

nit:
> if (!…)
>     continue;

@@ +44,5 @@
> +            MDefinition* use = iter->consumer()->toDefinition();
> +
> +            // Ignore unused nodes (DCE will ran after this pass).
> +            if (!use->hasUses())
> +                continue;

If this is a guard, then we would ignore it and mark the guard as recovered-on-bailout, which does not sounds safe.  I think you should also add a DeadIfUnused(use) check here.
I will also recommend using  hasLiveDefUses()  which will filter out all uses which are recovered-on-bailout and all resume points.

@@ +58,5 @@
> +        return true;
> +
> +    for (size_t i = 0; i < def->numOperands(); i++)
> +        if (hasMultipleUses(def->getOperand(i)))
> +            return true;

What is this loop made for?

@@ +98,5 @@
> +
> +    add->replaceAllLiveUsesWith(addNew);
> +    add->block()->insertBefore(add, addNew);
> +
> +    markNodesAsRecoveredOnBailout(add->getOperand(1 - idx));

I really want to understand why is this necessary in the first place, AFAIK, this should already be handled by Sink.
Can you paste the result of IONFLAGS=sink for the compilation you are interested in?
Attachment #8794755 - Flags: review?(nicolas.b.pierron)
As discussed on IRC, the Sink pass happens earlier in the pipeline. DCE is done after FLAC. Therefore, the to-be-removed nodes need a RecoveredOnBailout flag set.
Attachment #8794755 - Attachment is obsolete: true
Attachment #8796125 - Flags: review?(nicolas.b.pierron)
This patch merges an AddI into a LoadUnboxedScalar when the add has only one use (the load).
Attachment #8790226 - Attachment is obsolete: true
Attachment #8796127 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8796125 [details] [diff] [review]
bug1295130-fold-addi-opcodes-with-constant.patch

Review of attachment 8796125 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/FoldLinearArithConstants.cpp
@@ +22,5 @@
> +static void
> +markNodesAsRecoveredOnBailout(MDefinition* def)
> +{
> +    if (def->hasLiveDefUses() || !DeadIfUnused(def))
> +        return;

nit: … || !def->canRecoverOnBailout()

@@ +31,5 @@
> +    for (size_t i = 0; i < def->numOperands(); i++)
> +        markNodesAsRecoveredOnBailout(def->getOperand(i));
> +
> +    JitSpew(JitSpew_FLAC, "mark as recovered on bailout: %s%u", def->opName(), def->id());
> +    def->setRecoveredOnBailoutUnchecked();

nit: Move this before the for-loop, such that the uses are recovered on bailout before the operands.

@@ +64,5 @@
> +    // necessary because a node could be an unused right shift zero and we are
> +    // looking for the unused adds.
> +    for (size_t i = 0; i < def->numOperands(); i++) {
> +        if (hasMultipleUses(def->getOperand(i)))
> +            return true;

I still do not understand the usefulness of this loop, whatever this loop does, this sounds to me that we are always better to not have it.

@@ +97,5 @@
> +    }
> +
> +    // Nodes with multiple uses cannot be folded.
> +    if (hasMultipleUses(add->getOperand(1 - idx)))
> +        return;

The only thing where it seems to me that a similar condition might matter, is if we were computing a base operand.  In which case ARM code could benefit from adding a large base offset only once.
Apart from this consideration, this condition does not makes any sense as we are at least replacing (N+1) additions by N additions.

Also, if we were optimizing for ARM, we should probably make a custom ExtractLinearSum function to stop as soon as the immediate value overflow/underflow the -4096/4095 range.

@@ +112,5 @@
> +
> +    // Mark the stale nodes as RecoveredOnBailout since the Sink pass has
> +    // been run before this pass. DCE will then remove the unused nodes.
> +    add->setRecoveredOnBailoutUnchecked();
> +    markNodesAsRecoveredOnBailout(add->getOperand(1 - idx));

nit: replace these 2 lines by:
     markNodesAsRecoveredOnBailout(add);

@@ +118,5 @@
> +
> +bool
> +FoldLinearArithConstants(MIRGraph& graph)
> +{
> +    for (ReversePostorderIterator block(graph.rpoBegin()); block != graph.rpoEnd(); block++) {

You are doing optimizations which are looking at the operands.  By doing a RPO traversal, you will replace every sequence of addition by one addition.  Thus if you have a sequence of N additions, you will add N-1 addition, with N-2 which would be dead right after this optimization.

Change this loop to iterate in post-order instead of reverse-post-order.

@@ +121,5 @@
> +{
> +    for (ReversePostorderIterator block(graph.rpoBegin()); block != graph.rpoEnd(); block++) {
> +        for (MInstructionIterator i = block->begin(); i != block->end(); i++) {
> +            if (!graph.alloc().ensureBallast())
> +                return false;

nit: Check the shouldCancel function, as this loop might take a long time to traverse all nodes of the MIR graph. (see Bug 1304081)
Attachment #8796125 - Flags: review?(nicolas.b.pierron)
Fixed the nits.

> @@ +64,5 @@
> > +    // necessary because a node could be an unused right shift zero and we are
> > +    // looking for the unused adds.
> > +    for (size_t i = 0; i < def->numOperands(); i++) {
> > +        if (hasMultipleUses(def->getOperand(i)))
> > +            return true;
> 
> I still do not understand the usefulness of this loop, whatever this loop
> does, this sounds to me that we are always better to not have it.
I agree. It was not necessary.
 
> @@ +97,5 @@
> > +    }
> > +
> > +    // Nodes with multiple uses cannot be folded.
> > +    if (hasMultipleUses(add->getOperand(1 - idx)))
> > +        return;
> 
> The only thing where it seems to me that a similar condition might matter,
> is if we were computing a base operand.  In which case ARM code could
> benefit from adding a large base offset only once.
> Apart from this consideration, this condition does not makes any sense as we
> are at least replacing (N+1) additions by N additions.
> 
> Also, if we were optimizing for ARM, we should probably make a custom
> ExtractLinearSum function to stop as soon as the immediate value
> overflow/underflow the -4096/4095 range.
It's meant as a general optimization, not specific to ARM.

Try jobs results are good: https://treeherder.mozilla.org/#/jobs?repo=try&revision=7c47183d7936
Attachment #8796125 - Attachment is obsolete: true
Attachment #8797065 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8797065 [details] [diff] [review]
bug1295130-fold-addi-opcodes-with-constant.patch

Review of attachment 8797065 [details] [diff] [review]:
-----------------------------------------------------------------

Sounds good, r=me with the following not fixed.

::: js/src/jit/FoldLinearArithConstants.cpp
@@ +83,5 @@
> +            return false;
> +
> +        for (MInstructionIterator i = block->begin(); i != block->end(); i++) {
> +            if (!graph.alloc().ensureBallast())
> +                return false;

nit: Add mir->shouldCancel call here, such that we do not block the GC while compiling.
Attachment #8797065 - Flags: review?(nicolas.b.pierron) → review+
Comment on attachment 8796127 [details] [diff] [review]
merge-addi-into-loadunboxedscalar.patch

Review of attachment 8796127 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/EffectiveAddressAnalysis.cpp
@@ +126,5 @@
> +    JitSpew(JitSpew_EAA, "analyze: %s%u", load->opName(), load->id());
> +
> +    MAdd* add = load->getOperand(1)->toAdd();
> +
> +    if (add->specialization() != MIRType::Int32 || add->isRecoveredOnBailout() ||

nit: add being an operand of load, which is not recovered on bailout, implies that add should not be recovered on bailout.  So "add->isRecoveredOnBailout()" would always be false.

@@ +180,5 @@
> +    load->setOffsetAdjustment(offset);
> +    load->replaceOperand(1, node);
> +
> +    JitSpew(JitSpew_EAA, "mark as recovered on bailout: %s%u", add->opName(), add->id());
> +    add->setRecoveredOnBailoutUnchecked();

nit: Remove the previous loop and guard this statement with:
  if (!add->hasLiveDefUses() && DeadIfUnused(add) && add->canRecoverOnBailout())

Note, GVN might create multiple uses for the same Add expression if we see both add adding identical constants, which would thus prevent these optimizations.
Attachment #8796127 - Flags: review?(nicolas.b.pierron) → review+
Priority: -- → P5
I'll bump this to P1 (land in this release).
Seems the only thing left here is landing. Having P1 on this means we will get notifications if it didn't land before the merge.
Priority: P5 → P1
Added shouldCancel() in inner loop.
Attachment #8797065 - Attachment is obsolete: true
Attachment #8799755 - Flags: review+
Removed loop, and added check instead.
Removed add->isRecoveredOnBailout().
Attachment #8796127 - Attachment is obsolete: true
Attachment #8799756 - Flags: review+
Keywords: checkin-needed
Pushed by cbook@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/07358be0ec02
Fold AddI opcode with constant into other AddI with constant r=nbp
https://hg.mozilla.org/integration/mozilla-inbound/rev/44726da7a286
Merge AddI into LoadUnboxedScalar r=nbp
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/07358be0ec02
https://hg.mozilla.org/mozilla-central/rev/44726da7a286
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
You need to log in before you can comment on or make changes to this bug.