Closed Bug 1176230 Opened 9 years ago Closed 8 years ago

MPhi::foldsTernary() does not seems to work on Flog.RayTracer.Vector.prototype.initialize (raytrace.js:223)

Categories

(Core :: JavaScript Engine: JIT, defect, P5)

defect

Tracking

()

RESOLVED FIXED
mozilla52
Tracking Status
firefox41 --- affected
firefox52 --- fixed

People

(Reporter: nbp, Assigned: jschulte)

References

Details

Attachments

(3 files, 5 obsolete files)

While looking at raytrace, I noticed that at the end of the compilation, we still have patterns like:

          Block 1
        1 MUnbox 0  (Double)
          …
        2 MTest 1 Block2 Block3

  Block 2                         Block 3
4 Goto Block4                   5 Constant 0.0  (Double)
                                6 Goto Block4

          Block 4
        7 MPhi 1 5  (Double)

MPhi::foldsTernary() does not work on double, because the in the "x ? x : 0" expression, x can be NaN.  Thus replacing "x ? x : 0" is wrong.

Still the NaN case is probably the not as frequent as any other cases, and we can avoid the 2 blocks by replacing the MTest by a

2 MTestCond 1
3 Constant 0.0
3 MReplaceByIf 1 3 2

Where the MReplaceByIf generates an out-of-line path.  Another option would be to use a JS profiler to move the basic block automatically at the end of the function.
I try to realize the problem.

Does it aim to simplify the control paths to avoid MPhi? 
(Because MPhi::foldsTernary() does not handle double type operand)

So, as the suggested solution illustrates, 
we can remove the unnecessary blocks by replacing MTest with
MTestCond, Constant, and MReplaceByIf IRs.

I am not sure if I misunderstand the problem.
Another solution, might be to have a special instruction for that

 2 MConvertNaNToZero 1

And the important point here, is that these instructions can easily be recovered on bailout.  Which means that we would not have to keep computations of fields if they are unused, and we can move the unlikely case out-side the fast-path if they are used.
I check MIR.h and cannot find the relevant IRs.

Thus, to solve this issue, I should implement these new IRs.

No matter which solution I choose for implementation.

Is that correct?
(In reply to andy.zsshen from comment #3)
> Thus, to solve this issue, I should implement these new IRs.
> Is that correct?

This is correct.
I try to implement the new IR MConvertNaNToZero.

Here is the draft:
class MConvertNaNToZero
  : public MUnaryInstruction,
    public FloatingPointPolicy<0>::Data
{
    explicit MConvertNaNToZero(MDefinition* def)
      : MUnaryInstruction(def)
    {
        MOZ_ASSERT(def->type() == MIRType_Double);
        setResultType(MIRType_Double);
    }

  public:
    INSTRUCTION_HEADER(ConvertNaNToZero)

    static MConvertNaNToZero* New(TempAllocator& alloc, MDefinition* def)
    {
        return new(alloc) MConvertNaNToZero(def);
    }

    MDefinition* foldsTo(TempAllocator& alloc) override;
};

MDefinition*
MConvertNaNToZero::foldsTo(TempAllocator& alloc)
{
    MDefinition *def = getOperand(0);
    if (!def->isConstant())
        return this;

    Value *val = def->toConstant()->value();
    if (!val.isDouble())
        return this;

    if (!IsNaN(val))
        return this;

    MConstant* constant = MConstant::New(alloc, DoubleValue(0));
    constant->setResultType(MIRType_Double);
    return constant;
}

But I am not sure what are the necessary materials for this IR.
For example: Loop hoisting, bailout recovery, and etc.

Also, is folding the correct way for the implementation?

Sorry for my poor expression.
Could you provide more hints to guide the implementation?
Flags: needinfo?(nicolas.b.pierron)
(In reply to andy.zsshen from comment #5)
> But I am not sure what are the necessary materials for this IR.
> For example: Loop hoisting, bailout recovery, and etc.

You should not think of any of these at first, and just try to replace the basic block by this instruction, and make sure that we can produce code for this instruction.

> Also, is folding the correct way for the implementation?

Folding is used when we want to replace the current instruction by another.  Thus implementing MConvertNaNToZero::foldsTo makes sense if you want to remove MConvertNaNToZero by its operands when we can guarantee that the input is not NaN. This will not help you create such instruction.

Your implementation sounds good, but I would not start with it.

> Sorry for my poor expression.
> Could you provide more hints to guide the implementation?

I think you should focus on making sure that you can lower, and generate code for this instruction.  Then look for matching the pattern described in comment 0, and how to replace it by this instruction.
Flags: needinfo?(nicolas.b.pierron)
Is it appropriate to directly modify IonBuilder::jsop_ifeq(JSOp op) to 
remove the redundant diamonds?

After
https://dxr.mozilla.org/mozilla-central/source/js/src/jit/IonBuilder.cpp#4242

Like:
MDefinition *def = ins->input();
if (def->isConstant()) {
    Value *val = def->toConstant()->value();
    if (val.isDouble() && IsNaN(val)) {
        MConvertNaNToZero *rep = new MConvertNaNToZero(def);
        current->add(rep);
        current->push(rep);
        return true;
    }
}
Sorry!
I mis-understand the JSOP_IFEQ and MTest.
I should re-replan the position to match the pattern described in comment 0.
To remove the branch, you should replace the condition of the test instruction, and let GVN remove the branch for you.  I would not recommend doing it in IonBuilder, but doing it later, in MPhi::foldsTernary, or similar.
Attached patch v1.patch (obsolete) — Splinter Review
With bug 1064543 this results in a ~5% win on raytrace.
Attachment #8711069 - Flags: feedback?(nicolas.b.pierron)
Comment on attachment 8711069 [details] [diff] [review]
v1.patch

Review of attachment 8711069 [details] [diff] [review]:
-----------------------------------------------------------------

Nice work!

::: js/src/jit/MIR.cpp
@@ +1780,5 @@
>      }
>  
> +    // If testArg is a double type we can
> +    // replace the two blocks by an MConvertNaNToZero
> +    if (testArg->type() == MIRType_Double && c->vp()->toNumber() == 0 && c != trueDef) {

comment-nit:
   // - fold testArg ? testArg : 0.0 to ConvertNaNToZero(testArg)

::: js/src/jit/ValueNumbering.cpp
@@ +761,5 @@
>  
>          // If |sim| doesn't belong to a block, insert it next to |def|.
> +        if (isNewInstruction) {
> +            if (def->isPhi())
> +                def->block()->insertBefore(*def->block()->begin(), sim->toInstruction());

The beginning of basic blocks is an highly tricky location.  Usually the best way to do such thing is to use safeInsertTop function, but in this case this would not even work, because the instruction might be used, while being captured by the entry resume point.

I suggest to undo the change from this function, and to change MPhi::foldsTernary to insert the new instruction before the MTest instruction in the immediate dominator block.
Attachment #8711069 - Flags: feedback?(nicolas.b.pierron) → feedback+
Attached patch v2.patch (obsolete) — Splinter Review
Thanks for the quick feedback!
Assignee: nobody → j_schulte
Attachment #8711069 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #8715865 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8715865 [details] [diff] [review]
v2.patch

Review of attachment 8715865 [details] [diff] [review]:
-----------------------------------------------------------------

Apart from the few nits below, this patch would be Perfect :)
Please re-submit a new patch with the following nits fixed.

::: js/src/jit/CodeGenerator.cpp
@@ +10761,5 @@
>      // output *= ScaleInv
>      masm.mulDoublePtr(ImmPtr(&ScaleInv), tempReg, output);
>  }
>  
> +class OoLConvertNaNToZero : public OutOfLineCodeBase<CodeGenerator>

style-nit: All other out-of-line classes have explicit names, I suggest to follow the same implicit rule by naming this class "OutOfLineConvertNaNToZero".

::: js/src/jit/CodeGenerator.h
@@ +348,5 @@
>      void visitCheckReturn(LCheckReturn* ins);
>      void visitCheckObjCoercible(LCheckObjCoercible* ins);
>      void visitDebugCheckSelfHosted(LDebugCheckSelfHosted* ins);
> +    void visitOoLConvertNaNToZero(OoLConvertNaNToZero* ool);
> +    void visitConvertNaNToZero(LConvertNaNToZero* ins);

nit: swap the 2 lines to follow the same implicit rule that the out-of-line variant appear after.

::: js/src/jit/MIR.h
@@ +7189,5 @@
> +    public DoublePolicy<0>::Data
> +{
> +    bool operandIsNeverNaN_;
> +    MConvertNaNToZero(MDefinition* input)
> +        : MUnaryInstruction(input), operandIsNeverNaN_(false)

style-nit: the colon should be indented with 2 spaces instead of 4.

@@ +7197,5 @@
> +    }
> +  public:
> +    INSTRUCTION_HEADER(ConvertNaNToZero)
> +    static MConvertNaNToZero* New(TempAllocator& alloc, MDefinition* input)
> +    {

style-nit: Move the curly brace to the previous line, as done with the function below.

@@ +7208,5 @@
> +
> +    bool operandIsNeverNaN() const {
> +        return operandIsNeverNaN_;
> +    }
> +    void collectRangeInfoPreTrunc() override;

follow-up improvement: Provide a computeRange function, such that range analysis can keep track of the exponent of the value when it cannot be NaN.

::: js/src/jit/ValueNumbering.cpp
@@ +764,2 @@
>              def->block()->insertAfter(def->toInstruction(), sim->toInstruction());
> +        }

style-nit: no curly braces for single statements in the JS engine.

::: js/src/jit/shared/LIR-shared.h
@@ +3881,5 @@
> +  public:
> +    LIR_HEADER(ConvertNaNToZero)
> +
> +    explicit LConvertNaNToZero(const LAllocation& input)
> +    {

style-nit:  move this open brace at the end of the previous line, as done with the functions below.
Attachment #8715865 - Flags: review?(nicolas.b.pierron) → review+
Attached patch v3.patch (obsolete) — Splinter Review
So, I was finally able to look into this again. Problem was, that we didn't convert -0.0 to 0.0. As there seems to be no efficient way to test for negative zero, I ended up taking the OOL-path, whenever input is zero or NaN. But this hurts performance quite a lot, so that our win on raytrace is reduced to ~1000 points above m-c. Thus, I'm not sure, whether this is the right approach and whether we still want to land this, wdyt?

Also, maybe s/ConvertNaNToZero/ConvertFalseToZero, if this should land?
Attachment #8715865 - Attachment is obsolete: true
Flags: needinfo?(nicolas.b.pierron)
(In reply to Johannes Schulte [:jschulte] from comment #14)
> So, I was finally able to look into this again. Problem was, that we didn't
> convert -0.0 to 0.0. As there seems to be no efficient way to test for
> negative zero, I ended up taking the OOL-path, whenever input is zero or
> NaN. But this hurts performance quite a lot, so that our win on raytrace is
> reduced to ~1000 points above m-c. Thus, I'm not sure, whether this is the
> right approach and whether we still want to land this, wdyt?

Octane benchmarks are quite noisy. So a difference of ~1000 does not mean anything, and you might have to re-run the benchmark.  I did run the benchmark 20 times, and found that this does not give any speed-up.

I have not yet checked if we could win, if we were to enable MConvertNaNtoZero as a recover instruction.  In which case, we could remove unused conditions if the result of the property is also unused.

I also tried to inline the code without the out-of-line path if we have negative zero inputs:

        masm.loadConstantDouble(0.0, tmp);
        // DoubleNotEqual == DoubleNotEqualAndNotUnordered
        masm.branchDouble(Assembler::DoubleNotEqual, input, tmp, &identical);
        masm.moveDouble(tmp, output);
        masm.bind(&identical);

but this does not show any speed-up.  I have not checked yet if any of the compilations where using this code.

> Also, maybe s/ConvertNaNToZero/ConvertFalseToZero, if this should land?

If there is no improvement, there is no need to land.  I am hoping that we could get improvements with recover instructions, as we do not yet have RPhi, and this instruction should be easy to recover, while being a common JavaScript idiom. (for default values, even out-side doubles)
Sorry for the late reply and for digging this up again, I know you're quite busy, but I didn't want to leave the bug in this unclear state.

(In reply to Nicolas B. Pierron [:nbp] from comment #15)
> Octane benchmarks are quite noisy. So a difference of ~1000 does not mean
> anything, and you might have to re-run the benchmark.  I did run the
> benchmark 20 times, and found that this does not give any speed-up.

That's strange. I also averaged the benchmark over 100 runs and was able to reproduce the speed-up. Did you also apply the patch from bug 1064543 (which removes the then unneccesary MTest)? Or maybe my benchmarking method is flawed. Either way, I suppose even an 1000 point speedup doesn't justify the landing the added complexity? So WONTFIX, or are there any obvious blockers, after which it would make sense to look into this bug again?

> I have not yet checked if we could win, if we were to enable
> MConvertNaNtoZero as a recover instruction.  In which case, we could remove
> unused conditions if the result of the property is also unused.

Would there be more changes necessary than those, I already made in the patch?
(In reply to Johannes Schulte [:jschulte] from comment #16)
> > I have not yet checked if we could win, if we were to enable
> > MConvertNaNtoZero as a recover instruction.  In which case, we could remove
> > unused conditions if the result of the property is also unused.
> 
> Would there be more changes necessary than those, I already made in the
> patch?

Yes, this implies that we should add RConvertNaNToZero class in Recover.h & Recover.cpp.

The problem being that MPhi cannot be recovered on bailouts because we do not know how to recover from branches.  But if we were to encode the branch as part of the recover instruction, then we could get rid of the RConvertNaNToZero instruction.

Also, now, we have one more issue, which might make this a bit more difficult, which is that Branch Pruning might removes NaN/NegZero side of the branch if it is never used, and thus enforces us to keep the condition alive :/

Also, Have you tried on a micro benchmark dedicated to highlight this improvement?

Otherwise, if you think this is not worth the complexity, I will leave that to your judgment as you worked on it :)  Feel free to won't fix this issue.
Flags: needinfo?(nicolas.b.pierron)
Attached file microbenchmark.js
(In reply to Nicolas B. Pierron [:nbp] from comment #17)
> Yes, this implies that we should add RConvertNaNToZero class in Recover.h &
> Recover.cpp.
I've already added the class and implemented the recover logic (see patch). Is there more, that would be necessary?
 
> The problem being that MPhi cannot be recovered on bailouts because we do
> not know how to recover from branches.  But if we were to encode the branch
> as part of the recover instruction, then we could get rid of the
> RConvertNaNToZero instruction.
> 
> Also, now, we have one more issue, which might make this a bit more
> difficult, which is that Branch Pruning might removes NaN/NegZero side of
> the branch if it is never used, and thus enforces us to keep the condition
> alive :/
> 
> Also, Have you tried on a micro benchmark dedicated to highlight this
> improvement?

The attached benchmark is about 30% faster with the patches applied.

> Otherwise, if you think this is not worth the complexity, I will leave that
> to your judgment as you worked on it :)  Feel free to won't fix this issue.

Well, I'm obviously quite biased towards landing this, as it's my work :) Moreover I don't really have much experience, making such decisions, so I'm not really comfortable, deciding on this. We can also ask Jan, if you don't want to make that decision either :)
(In reply to Johannes Schulte [:jschulte] from comment #18)
> Created attachment 8776287 [details]
> microbenchmark.js
> 
> (In reply to Nicolas B. Pierron [:nbp] from comment #17)
> > Yes, this implies that we should add RConvertNaNToZero class in Recover.h &
> > Recover.cpp.
> I've already added the class and implemented the recover logic (see patch).
> Is there more, that would be necessary?

Oh, sorry, I was looking at the previous patch.
No, there is nothing more to do.  I just wished they were recovered on bailout on raytrace.

> > Also, Have you tried on a micro benchmark dedicated to highlight this
> > improvement?
> 
> The attached benchmark is about 30% faster with the patches applied.

Sweet :)

> > Otherwise, if you think this is not worth the complexity, I will leave that
> > to your judgment as you worked on it :)  Feel free to won't fix this issue.
> 
> Well, I'm obviously quite biased towards landing this, as it's my work :)
> Moreover I don't really have much experience, making such decisions, so I'm
> not really comfortable, deciding on this. We can also ask Jan, if you don't
> want to make that decision either :)

I think this is good to have that as this is a common idiom that we might find in code bases, and that being able to remove them with recover instruction make sense, while we were not able to with MPhi.

I think Hannes (:h4writer) might be a less partial party than us here.
This picture is taken with IonGraph (within gdb, and an optimized build compiled with --enable-jitspew) [1]

I think it would be good to investigate next how we can remove the second "test unbox" instruction which has identical (empty) branches, and no phi after.

[1] https://developer.mozilla.org/en-US/docs/Mozilla/Projects/SpiderMonkey/Hacking_Tips#Viewing_the_MIRGraph_of_IonOdin_compilations_(from_gdb)
(In reply to Nicolas B. Pierron [:nbp] from comment #20)
> Created attachment 8777301 [details]
> Raytrace MIR Graph during the Lowering phase.
> 
> This picture is taken with IonGraph (within gdb, and an optimized build
> compiled with --enable-jitspew) [1]
> 
> I think it would be good to investigate next how we can remove the second
> "test unbox" instruction which has identical (empty) branches, and no phi
> after.
> 
> [1]
> https://developer.mozilla.org/en-US/docs/Mozilla/Projects/SpiderMonkey/
> Hacking_Tips#Viewing_the_MIRGraph_of_IonOdin_compilations_(from_gdb)

These MTests and blocks are removed by the patch in bug 1064543. Both patches (this bug and bug 1064543) were applied when I measured raytrace and the microbenchmark.
(In reply to Nicolas B. Pierron [:nbp] from comment #19)
> (In reply to Johannes Schulte [:jschulte] from comment #18)
> > Well, I'm obviously quite biased towards landing this, as it's my work :)
> > Moreover I don't really have much experience, making such decisions, so I'm
> > not really comfortable, deciding on this. We can also ask Jan, if you don't
> > want to make that decision either :)
> 
> I think this is good to have that as this is a common idiom that we might
> find in code bases, and that being able to remove them with recover
> instruction make sense, while we were not able to with MPhi.
> 
> I think Hannes (:h4writer) might be a less partial party than us here.

Hannes, what do you think about this patch, is it worth landing? Wins are ~1000 points on raytrace and 30% on the attached microbenchmark (for more information see above).
Flags: needinfo?(hv1989)
Comment on attachment 8765609 [details] [diff] [review]
v3.patch

Review of attachment 8765609 [details] [diff] [review]:
-----------------------------------------------------------------

I think we should go ahead and land this:
1) Can you also implement MConvertToNaNToZero::computeRange?
2) Not sure about the name since it converts NaN and -0.0. Can you definitely add a comment above the MIR explaining both are happening. To make the MIR name shorter I think MNaNToZero would work too.

::: js/src/jit/MIR.cpp
@@ +2289,5 @@
>              c->block()->moveBefore(pred->lastIns(), c);
>          return trueDef;
>      }
>  
> +    // - fold testArg ? testArg : 0.0 to ConvertNaNToZero(testArg)

// If testArg is an double type we can:
// - fold testArg ? testArg : 0.0 to MConvertNaNToZero(testArg)
Flags: needinfo?(hv1989)
Priority: -- → P5
Attached patch v4.patch (obsolete) — Splinter Review
Let's get this in. Rebased, applied comments and re-measured (with same results).
Attachment #8765609 - Attachment is obsolete: true
Attachment #8800864 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8800864 [details] [diff] [review]
v4.patch

Review of attachment 8800864 [details] [diff] [review]:
-----------------------------------------------------------------

Ok, one last thing and the few nits, and this patch would be ready to land. :)

::: js/src/jit/CodeGenerator.cpp
@@ +12018,5 @@
> +    if (lir->mir()->operandIsNeverNegativeZero()){
> +        masm.branchDouble(Assembler::DoubleUnordered, input, input, ool->entry());
> +    } else {
> +        ScratchDoubleScope scratch(masm);
> +        masm.loadConstantDouble(0.0, scratch);

We should not use scratch register outside the MacroAssembler definitions, lower this instruction with a tempDouble() instead in Lowering.cpp

::: js/src/jit/MIR.cpp
@@ +2350,5 @@
>          return trueDef;
>      }
>  
> +    // If testArg is an double type we can:
> +    // - fold testArg ? testArg : 0.0 to MConvertNaNToZero(testArg)

nit: Update this comment with the new name of the MIR instruction.

::: js/src/jit/RangeAnalysis.cpp
@@ +1886,5 @@
>  
> +void
> +MNaNToZero::computeRange(TempAllocator& alloc)
> +{
> +    Range other(getOperand(0));

nit: s/getOperand(0)/input()/

@@ +3487,5 @@
> +MNaNToZero::collectRangeInfoPreTrunc()
> +{
> +    if (!Range(input()).canBeNaN())
> +        operandIsNeverNaN_ = true;
> +    if (!Range(input()).canBeNegativeZero())

nit: The Range constructor is not cheap, do it once.

::: js/src/jit/Recover.h
@@ +483,5 @@
> +{
> +  public:
> +    RINSTRUCTION_HEADER_(NaNToZero)
> +
> +    virtual uint32_t numOperands() const {

nit: Use RINSTRUCTION_HEADER_NUM_OP_ now, as above&below
Attachment #8800864 - Flags: review?(nicolas.b.pierron) → feedback+
Attached patch v5.patch (obsolete) — Splinter Review
Thanks for the quick feedback!
Attachment #8800864 - Attachment is obsolete: true
Attachment #8802634 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8802634 [details] [diff] [review]
v5.patch

Review of attachment 8802634 [details] [diff] [review]:
-----------------------------------------------------------------

Nice!

::: js/src/jit/Lowering.cpp
@@ +4849,5 @@
> +    if (ins->operandIsNeverNaN() && ins->operandIsNeverNegativeZero()) {
> +        redefine(ins, input);
> +        return;
> +    }
> +    LNaNToZero* lir = new(alloc()) LNaNToZero(useRegisterAtStart(input), tempDouble());

maybe-follow-up: tempDouble is not needed if operandIsNeverNegativeZero is true, in such case we could use a LDefinition::BogusTemp().

::: js/src/jit/MIR.h
@@ +7741,5 @@
> +    }
> +  public:
> +    INSTRUCTION_HEADER(NaNToZero)
> +    static MNaNToZero* New(TempAllocator& alloc, MDefinition* input) {
> +        return new(alloc) MNaNToZero(input);

nit: use TRIVIAL_NEW_WRAPPERS macro, which does the same thing for the infallible allocator and the fallible allocator version of it.

@@ +7745,5 @@
> +        return new(alloc) MNaNToZero(input);
> +    };
> +
> +    MDefinition* input() const {
> +        return getOperand(0);

nit: Remove this function as it is already defined as part of MUnaryInstruction [1].  Otherwise, in the future use NAMED_OPERANDS macro for such definitions.

[1] http://searchfox.org/mozilla-central/rev/d96317a351af8aa78ab9847e7feed964bbaac7d7/js/src/jit/MIR.h#1300
Attachment #8802634 - Flags: review?(nicolas.b.pierron) → review+
Keywords: checkin-needed
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/01d621c2dbe3
Try to fold ternary's with double-argument to NaNToZero. r=nbp
Keywords: checkin-needed
(In reply to Pulsebot from comment #29)
> Pushed by ryanvm@gmail.com:
> https://hg.mozilla.org/integration/mozilla-inbound/rev/01d621c2dbe3
> Try to fold ternary's with double-argument to NaNToZero. r=nbp

Nice work! If you are interested in fixing another bug, bug 1277086 is also about improving a foldsTo.
https://hg.mozilla.org/mozilla-central/rev/01d621c2dbe3
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
Depends on: 1314438
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: