Closed Bug 1302367 Opened 4 years ago Closed 4 years ago

Huge regression in Emscripten's conditionals caused by using ModD instead of integer modulo

Categories

(Core :: JavaScript Engine: JIT, defect, P3)

defect

Tracking

()

RESOLVED FIXED
mozilla52
Tracking Status
firefox52 --- fixed

People

(Reporter: sandervv, Assigned: sandervv)

References

Details

Attachments

(3 files, 5 obsolete files)

When running Emscripten's conditionals benchmark, there is a huge regression for running Emscripten's JavaScript code on SpiderMonkey with asmjs disabled and Cheerp's code on plain SpiderMonkey.

smvv@14d63f6018c8 ~/work/emsdk_portable/emscripten/master/tests $ python2 runner.py benchmark.test_conditionals
WARNING:root:use EM_ALL_ENGINES=1 in the env to run against all JS engines, which is slower but provides more coverage
LLVM_3_2 not defined, using our LLVM instead (/home/smvv/work/emsdk_portable/clang/fastcomp/build_master_64/bin)
Running Emscripten benchmarks... [ Tue Sep 13 08:33:39 2016 | em: commit 2dc0821adbdff0b54f0eae4e0252b2b9839754a6 | llvm: /home/smvv/work/emsdk_portable/clang/fastcomp/build_master_64/bin ]
test_conditionals (test_benchmark.benchmark) ...
        clang: mean: 1.920 (+-0.003) secs  median: 1.918  range: 1.918-1.924  (noise: 0.146%)  (3 runs)
           sm: mean: 1.697 (+-0.018) secs  median: 1.686  range: 1.675-1.719  (noise: 1.062%)  (3 runs)   Relative: 0.88 X slower
     sm-noasm: mean: 89.040 (+-0.088) secs  median: 88.978  range: 88.958-89.162  (noise: 0.099%)  (3 runs)   Relative: 46.37 X slower
    sm-cheerp: mean: 83.776 (+-0.792) secs  median: 83.216  range: 83.180-84.895  (noise: 0.945%)  (3 runs)   Relative: 43.63 X slower

The reason is that the inner loop for sm-noasm and sm-cheerp contains the following opcodes and instructions:

BB #6 [00065,12,3] -> #10 -> #8 :: 5030337 hits
[MoveGroup]
movl       0xc(%rsp), %eax
[AddI]
addl       $2, %eax
[MoveGroup]
movl       %eax, 0x8(%rsp)
movl       0xc(%rsp), %eax
[MulI:Integer]
imull      0xc(%rsp), %eax
[AddI]
addl       $11, %eax
[MoveGroup]
movl       %eax, %ecx
[UrshD]
cvtsi2sd   %rcx, %xmm0
[MoveGroup]
movsd      0x18(%rsp), %xmm1
[ModD]
movq       %rsp, %rax
andq       $0xfffffffffffffff0, %rsp
push       %rax
subq       $8, %rsp
call       .Lfrom368
addq       $8, %rsp
pop        %rsp
[TruncateDToInt32]
cvttsd2si  %xmm0, %rcx
cmpq       $0x1, %rcx
jo         .Lfrom388
movl       %ecx, %ecx
.set .Llabel390, .
[Compare:stricteq]
testl      %ecx, %ecx
sete       %al
movzbl     %al, %eax
[TestIAndBranch]
testl      %eax, %eax
jne        .Lfrom406

line 12 is:

    Lx$p11$pi=(((Math.imul(Lx$p11$pi,Lx$p11$pi)+11>>0)>>>0)%3>>0)>>0===0||(((Math.imul(tmp3,Lx$p11$pi)+17>>0)>>0)%5>>0)>>0===0?tmp3:(Lx$p11$pi+1>>0);

In asmjs mode, an integer modulo operations is performed, while a double modulo operator is used when asmjs is disabled.
This is a big regression, because this was not the case a few months back.

Is this a regression in range analysis, or what would be a likely area of regression?
Using auto-bisect:

changeset:   314334:3aa448374082
user:        Nicolas B. Pierron <nicolas.b.pierron@mozilla.com>
date:        Tue Mar 08 12:54:19 2016 +0000
summary:     Bug 1247880 - Only remove MUrsh operands when the input of MUrsh is guaranteed to be unsigned. r=sunfish

(This patch is correct, just mentioning when the regression started)
So, if I understand correctly the problem is that MUsrh as a double output because we monitored it as having values larger than Int32 max as output.  Thus, we use the ModD variant instead of the unsigned one.

I guess the problem is the same as always, we do not have any way to express MIRType::UInt32.

Note, using a call to implement ModD sounds bad.
Since it touches range analysis I've asked Jandem for a second review as well.

Try job results are pending: https://treeherder.mozilla.org/#/jobs?repo=try&revision=94ac768b1d8c

Benchmark results for Conditionals:
        clang: mean: 1.919 (+-0.002) secs  median: 1.917  range: 1.917-1.922  (noise: 0.100%)  (3 runs)
           sm: mean: 1.613 (+-0.050) secs  median: 1.594  range: 1.543-1.651  (noise: 3.086%)  (3 runs)   Relative: 0.84 X slower
     sm-noasm: mean: 2.054 (+-0.056) secs  median: 2.015  range: 2.012-2.133  (noise: 2.724%)  (3 runs)   Relative: 1.07 X slower
    sm-cheerp: mean: 1.210 (+-0.000) secs  median: 1.210  range: 1.210-1.210  (noise: 0.005%)  (3 runs)   Relative: 0.63 X slower
           v8: mean: 1.268 (+-0.002) secs  median: 1.267  range: 1.266-1.271  (noise: 0.138%)  (3 runs)   Relative: 0.66 X slower
Assignee: nobody → sandervv
Attachment #8792477 - Flags: review?(nicolas.b.pierron)
Attachment #8792477 - Flags: review?(jdemooij)
Comment on attachment 8792477 [details] [diff] [review]
bug1302367-unsigned-integer-modulo.patch

Review of attachment 8792477 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/MIR.cpp
@@ +2736,5 @@
> +        if (constant->type() == MIRType::Int32 && constant->toInt32() == 0) {
> +            for (MUseIterator use(usesBegin()); use != usesEnd(); use++) {
> +                MNode* ins = use->consumer();
> +                if (!ins->isDefinition())
> +                    break;

nit: hasOneDefUse look for live uses of MUrsh, which excludes resume points.  MUseIterator, as opposed to MUseDefIterator iterates over all uses, including resume points.  Thus you might miss opportunities here.

::: js/src/jit/MIR.h
@@ +7064,5 @@
>              specialization_ = type;
>          setResultType(type);
> +        possiblyUnsigned = NotPossible;
> +
> +        if (left->isUrsh() && left->getOperand(1)->isConstant()) {

I don't think this is the right place for this condition, the reason why I have such doubt, is that this would not trigger after branch pruning optimization.

  a[((x ? y : z) >>> 0) % page_size]

The way IonBuilder works, it will introduce Phi node for all stack slots, so having a branch between the Ursh and the Modulo. Adding the condition in the MIR node constructor will work only for cases where both operations are next to each others, without any branches.  (which I admit is likely to be the most common cases)

On the other hand, I think this is an acceptable patch until we can come up with a MTruncateUint32, and a MIRType::Uint32.

::: js/src/jit/RangeAnalysis.cpp
@@ +1580,5 @@
> +                unsigned_ = true;
> +            break;
> +          case RHSPossible:
> +            if (lhs.lower() > 0 && !lhs.canHaveFractionalPart())
> +                unsigned_ = true;

nit: With an int32 specialization, we can even accept   lhs.lower() >= 0  , as we should not expect any negative zero.
Attachment #8792477 - Flags: review?(nicolas.b.pierron) → review+
Updated patch according to nbp's comments.

@npb can you confirm that I should use |hasOneUse| and use |continnue| when |!ins->isDefinition()| is true? I used |break| there in the previous patch. You said that it can miss opportunities there, so I thought I should use |continue| and |hasOneUse|.
Attachment #8792477 - Attachment is obsolete: true
Attachment #8792477 - Flags: review?(jdemooij)
Flags: needinfo?(nicolas.b.pierron)
Attachment #8792518 - Flags: review?(jdemooij)
(In reply to Sander Mathijs van Veen from comment #5)
> Created attachment 8792518 [details] [diff] [review]
> bug1302367-unsigned-integer-modulo.patch
> 
> Updated patch according to nbp's comments.
> 
> @npb can you confirm that I should use |hasOneUse| and use |continnue| when
> |!ins->isDefinition()| is true? I used |break| there in the previous patch.
> You said that it can miss opportunities there, so I thought I should use
> |continue| and |hasOneUse|.

If you use hasOneUse, then this will discard any MUrsh which are captured by a resume point, which will certainly occur inside loops as we are adding resume point in-between every instruction.

You can use |hasOneUseDef| and |MUseDefIterator|, or |MUseIterator| with a |continue|.
Flags: needinfo?(nicolas.b.pierron)
Comment on attachment 8792518 [details] [diff] [review]
bug1302367-unsigned-integer-modulo.patch

Review of attachment 8792518 [details] [diff] [review]:
-----------------------------------------------------------------

Clearing review per comment 6..

::: js/src/jit/MIR.cpp
@@ +2730,5 @@
>  
>  MDefinition*
>  MBinaryBitwiseInstruction::foldUnnecessaryBitop()
>  {
> +    if (isUrsh() && hasOneUse() && getOperand(1)->isConstant()) {

Add a brief comment explaining the case(s) we're trying to optimize here.
Attachment #8792518 - Flags: review?(jdemooij)
I've added a comment for the optimization use case, and fixed the hasOneDefUse() / MUseDefIterator issue.
Attachment #8792518 - Attachment is obsolete: true
Attachment #8794768 - Flags: review?(jdemooij)
Comment on attachment 8794768 [details] [diff] [review]
bug1302367-unsigned-integer-modulo.patch

Review of attachment 8794768 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/MIR.h
@@ +7035,5 @@
>  
>  class MMod : public MBinaryArithInstruction
>  {
> +  public:
> +    enum PossiblyUnsigned {

Nit: make this an enum class, better type safety is worth some verbosity IMO.

@@ +7049,5 @@
>      bool canBePowerOfTwoDivisor_;
>      bool canBeDivideByZero_;
>      bool trapOnError_;
>  
> +    PossiblyUnsigned possiblyUnsigned;

Nit: possiblyUnsigned_ (trailing '_'), for consistency.

@@ +7062,5 @@
>      {
>          if (type != MIRType::Value)
>              specialization_ = type;
>          setResultType(type);
> +        possiblyUnsigned = NotPossible;

Nit: I'd move this to the initializer list, possiblyUnsigned_(NotPossible)
Attachment #8794768 - Flags: review?(jdemooij) → review+
Fixed :jandem's comments.
Attachment #8794768 - Attachment is obsolete: true
Attachment #8797103 - Flags: review+
Keywords: checkin-needed
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/6e75141df030
Use unsigned integer modulo instead of ModD opcode. r=nbp, r=jandem
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/6e75141df030
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
Backed out in bug 1307523 for a fuzzblocker regression. (Feel free to fix here or there)

https://hg.mozilla.org/integration/mozilla-inbound/rev/7d42989271c4
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Priority: -- → P3
The problem was that specialization() == MIRType::Int32 should be true in order to skip the lhs.hasInt32Bounds() check. Otherwise, it will check specialization == MIRType::Int32 below (which will fail), and proceed with https://dxr.mozilla.org/mozilla-central/source/js/src/jit/RangeAnalysis.cpp#1609 since unsigned is not set.
Attachment #8797103 - Attachment is obsolete: true
Attachment #8798457 - Flags: review?(jdemooij)
Comment on attachment 8798457 [details] [diff] [review]
bug1302367-unsigned-integer-modulo.patch

Review of attachment 8798457 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/RangeAnalysis.cpp
@@ +1562,5 @@
>      // If either operand is a NaN, the result is NaN. This also conservatively
>      // handles Infinity cases.
> +    if (((specialization() != MIRType::Int32 ||
> +          possiblyUnsigned_ == PossiblyUnsigned::NotPossible) &&
> +         !lhs.hasInt32Bounds()) || !rhs.hasInt32Bounds())

Are the parentheses correct? Why do we add an extra condition for lhs.hasInt32Bounds() but not rhs.hasInt32Bounds()?
Attachment #8798457 - Flags: review?(jdemooij) → review?(nicolas.b.pierron)
Comment on attachment 8798457 [details] [diff] [review]
bug1302367-unsigned-integer-modulo.patch

Review of attachment 8798457 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/MIR.h
@@ +7171,5 @@
> +
> +        if (left->isUrsh() && left->getOperand(1)->isConstant()) {
> +            MConstant* constant = left->getOperand(1)->toConstant();
> +            if (constant->type() == MIRType::Int32 && constant->toInt32() == 0)
> +                possiblyUnsigned_ = PossiblyUnsigned::LHSPossible;

Now that I think about it, I am not sure I understand why these checks are here, knowing that we are not supposed to remove them before we reached the foldUnnecessaryBitop phase, which happens after Range Analysis.

::: js/src/jit/RangeAnalysis.cpp
@@ +1584,5 @@
> +                unsigned_ = true;
> +            break;
> +          case PossiblyUnsigned::RHSPossible:
> +            if (lhs.lower() >= 0 && !lhs.canHaveFractionalPart())
> +                unsigned_ = true;

I think we should also check that "rhs.lower() > 0".

@@ +1590,5 @@
> +          case PossiblyUnsigned::BothPossible:
> +            if (lhs.lower() >= 0 && !lhs.canHaveFractionalPart())
> +                unsigned_ = true;
> +            else if (rhs.lower() > 0 && !rhs.canHaveFractionalPart())
> +                unsigned_ = true;

I am not completely sure why we are having more checks when we got the information that both inputs are flowing from MUrsh.
Attachment #8798457 - Flags: review?(nicolas.b.pierron)
(In reply to Nicolas B. Pierron [:nbp] from comment #18)
> Comment on attachment 8798457 [details] [diff] [review]
> bug1302367-unsigned-integer-modulo.patch
> 
> Review of attachment 8798457 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/jit/MIR.h
> @@ +7171,5 @@
> > +
> > +        if (left->isUrsh() && left->getOperand(1)->isConstant()) {
> > +            MConstant* constant = left->getOperand(1)->toConstant();
> > +            if (constant->type() == MIRType::Int32 && constant->toInt32() == 0)
> > +                possiblyUnsigned_ = PossiblyUnsigned::LHSPossible;
> 
> Now that I think about it, I am not sure I understand why these checks are
> here, knowing that we are not supposed to remove them before we reached the
> foldUnnecessaryBitop phase, which happens after Range Analysis.

Consider the following test case:

$ cat jit-test/tests/ion/bug1247880.js
function f(x) {
    var a = x;
    a = Number ? a | 0 : 0;
    a = a >>> 0;
    a = Math.imul(0x100000001, a);
    a = a % 2;
    a = a | 0;
    return a;
};

assertEq(f(0), 0);
assertEq(f(-1), -1);

Note that the imul operator is removed (since a = 1 * a => a = a => no-op), and that happens before range analysis.
If we don't check during the MIR construction, but only during range analysis, the modulo will become unsigned since the unsigned shift right zero is now a direct operand of the modulo operator.

> ::: js/src/jit/RangeAnalysis.cpp
> @@ +1584,5 @@
> > +                unsigned_ = true;
> > +            break;
> > +          case PossiblyUnsigned::RHSPossible:
> > +            if (lhs.lower() >= 0 && !lhs.canHaveFractionalPart())
> > +                unsigned_ = true;
> 
> I think we should also check that "rhs.lower() > 0".

I agree. Otherwise we get a different result for (x >>> 0) % 0 when x is int32 compared to when ModD is used.

> @@ +1590,5 @@
> > +          case PossiblyUnsigned::BothPossible:
> > +            if (lhs.lower() >= 0 && !lhs.canHaveFractionalPart())
> > +                unsigned_ = true;
> > +            else if (rhs.lower() > 0 && !rhs.canHaveFractionalPart())
> > +                unsigned_ = true;
> 
> I am not completely sure why we are having more checks when we got the
> information that both inputs are flowing from MUrsh.

I agree.
Attachment #8798457 - Attachment is obsolete: true
Attachment #8799807 - Flags: review?(nicolas.b.pierron)
(In reply to Sander Mathijs van Veen from comment #19)
> Note that the imul operator is removed (since a = 1 * a => a = a => no-op),
> and that happens before range analysis.
> If we don't check during the MIR construction, but only during range
> analysis, the modulo will become unsigned since the unsigned shift right
> zero is now a direct operand of the modulo operator.

So, this sounds like another problem where we replace the Math.imul operation by a no-op instead of a TruncateInt32.  If we do not do that, we mess-up with the implicit Uint32 return type provided by MUrsh at the construction of the graph.

I hate rejecting patches, but adding logic in the MIR, means that you cannot benefit from other optimizations.  This implies that the following would work:

  a = a >>> 0; b = b >>> 0;
  c = a % b;

but not:

  a = a >>> 0; b = b >>> 0;
  if (true) {}
  c = a % b;

I do not like doing special case optimizations, because this implies that we would be re-implementing multiple times the same code all over the compiler, and this fails us with a performance cliff as soon as we go out of the special case.

Can you open another bug, and fix this issue first?

By doing so, you might be able to remove the logic from the constructor, and directly check for MUrsh within range analysis code.
Comment on attachment 8799807 [details] [diff] [review]
bug1302367-unsigned-integer-modulo.patch

Review of attachment 8799807 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit-test/tests/basic/bug1307523.js
@@ +1,3 @@
> +var x = {} || [];
> +(function() {
> +    x % (2147483647 >>> 0) ? 1 : 0

nit: use AssertEq.

::: js/src/jit/MIR.cpp
@@ +2743,5 @@
> +        MConstant* constant = getOperand(1)->toConstant();
> +        if (constant->type() == MIRType::Int32 && constant->toInt32() == 0) {
> +            for (MUseDefIterator use(this); use; use++) {
> +                if (use.def()->isMod() && use.def()->toMod()->isUnsigned())
> +                    return getOperand(0);

The goal here, is to remove the MUrsh shift operation.  Maybe we should add a MToUint32 in MUrsh::foldIfZero  replace MUrsh by this new instruction which when lowered redefine the register.
Attachment #8799807 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8803870 [details]
Bug 1302367 - Regression in Emscripten's conditionals caused by ModD

https://reviewboard.mozilla.org/r/88070/#review87040

::: js/src/jit/IonAnalysis.h:30
(Diff revision 2)
> +MOZ_MUST_USE bool
> +IsUint32Type(const MDefinition* def);

There is no allocation, thus remove the MOZ_MUST_USE macro.

::: js/src/jit/MIR.cpp:4149
(Diff revision 2)
> +    if (input->type() == MIRType::Int32 && !IsUint32Type(input))
>          return input;

Do the same in MToInt32::foldsTo

::: js/src/jit/RangeAnalysis.cpp:1564
(Diff revision 2)
> +        getOperand(1)->type() == MIRType::Int32 &&
> +        (IsUint32Type(getOperand(1)) || getOperand(1)->isConstant()))

This condition does not check against negative, or zero inputs of the rhs argument.

I suggest to merge this "if" statement with the other if statement which is dedicated of setting the unsigned flag.

::: js/src/jit/RangeAnalysis.cpp:1567
(Diff revision 2)
> +    if (specialization() == MIRType::Int32 &&
> +        IsUint32Type(getOperand(0)) &&
> +        getOperand(1)->type() == MIRType::Int32 &&
> +        (IsUint32Type(getOperand(1)) || getOperand(1)->isConstant()))
> +    {
> +        unsigned_ = true;

This mutate a flag which will change the compilation of MMod, and does it before the guards against rhs being 0.

The condition, or this flag set should be moved after the 2 early returns which are below this code.
Attachment #8803870 - Flags: review?(nicolas.b.pierron) → review-
Comment on attachment 8803870 [details]
Bug 1302367 - Regression in Emscripten's conditionals caused by ModD

https://reviewboard.mozilla.org/r/88070/#review87726

This sounds good to me.
Attachment #8803870 - Flags: review?(nicolas.b.pierron) → review+
Comment on attachment 8803870 [details]
Bug 1302367 - Regression in Emscripten's conditionals caused by ModD

https://reviewboard.mozilla.org/r/88070/#review89608
Attachment #8803870 - Flags: review?(jdemooij) → review+
Keywords: checkin-needed
Pushed by cbook@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9653ade83c4b
Regression in Emscripten's conditionals caused by ModD r=jandem,nbp
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/9653ade83c4b
Status: REOPENED → RESOLVED
Closed: 4 years ago4 years ago
Resolution: --- → FIXED
Depends on: 1315640
You need to log in before you can comment on or make changes to this bug.