716504 - IonMonkey: LMulI should not bailout if the result is positive zero

Assignee

Description

•

13 years ago

Currently we always bailout if both operands are not constant and the result is 0. We should only bailout if the result will be -0, not if it's +0.

This should make ss-spectral-norm faster.

Hannes Verschore [:h4writer]

Comment 1

•

13 years ago

When implementing MUL it was deliberaty choosen not to do that (yet). It's also how JM/TM did it before and from the look of AWFY JM don't have a problem with that benchmark? So could it be the problem is somewhere else? Ofcourse when doing a loop and bailing on the first iteration will decrease speed for the first few iterations, but shouldn't OSR kick in to get back on the fast path? Now if it indeed does decreases performance a lot, this should get fixed. 
It is possible to check it rather easy by adding the checks (to check if the operands are negative) after the muli instruction and jump over the those checks when result isn't zero. This will of course degrade performance of normal multiplation a bit (because of taking the jmp). A better solution is to have those checks out of line.

Jan de Mooij [:jandem]

Assignee

Comment 2

•

13 years ago

Attached patch Patch — Details — Splinter Review

Jump to an OOL path if the result is 0. The OOL path then bails out if lhs or rhs is negative.

This patch helps spectral-norm, but only with the patch for bug 706472 applied, due to inlining different functions.

Attachment #587012 - Flags: review?(mrosenberg)

Attachment #587012 - Flags: review?(dvander)

Hannes Verschore [:h4writer]

Comment 3

•

13 years ago

Cool, that is already implemented. Just noticing but this uses an extra register. So extra pressure one the registers. Even when canBeNegativeZero() is set to false!

Jan de Mooij [:jandem]

Assignee

Comment 4

•

13 years ago

(In reply to Hannes Verschore from comment #1)
> When implementing MUL it was deliberaty choosen not to do that (yet). It's
> also how JM/TM did it before and from the look of AWFY JM don't have a
> problem with that benchmark?

JM also checks whether the lhs/rhs is negative, see jsop_binary_full (look for the "Restore original value" comment). Also, even if JM didn't do this, it'd fall back to a stub, which is more efficient than bailing out to the interpreter.

> Ofcourse when doing a loop and bailing on the first iteration will decrease
> speed for the first few iterations, but shouldn't OSR kick in to get back on
> the fast path? Now if it indeed does decreases performance a lot, this
> should get fixed. 

Yeah, OSR is "forbidden" once we bailout somewhere. We'll have to fix that too, but in the meantime this fix is still worthwhile.

> It is possible to check it rather easy by adding the checks (to check if the
> operands are negative) after the muli instruction and jump over the those
> checks when result isn't zero. This will of course degrade performance of
> normal multiplation a bit (because of taking the jmp). A better solution is
> to have those checks out of line.

Yep, see the patch I just attached.

Jan de Mooij [:jandem]

Assignee

Comment 5

•

13 years ago

(In reply to Hannes Verschore from comment #3)
> Cool, that is already implemented. Just noticing but this uses an extra
> register. So extra pressure one the registers. Even when canBeNegativeZero()
> is set to false!

The register for the original LHS value is also used by the snapshot, so it doesn't really matter. It's an extra register once we have mul's without overflow or -0 checks (because we don't need a snapshot then), but I didn't think handling that case was worth the complexity now.

Marty Rosenberg [:mjrosenb]

Comment 6

•

13 years ago

Comment on attachment 587012 [details] [diff] [review]
Patch

Review of attachment 587012 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/arm/CodeGenerator-arm.cpp
@@ +472,2 @@
>              masm.ma_cmp(ToRegister(lhs), Imm32(0));
> +            masm.ma_b(ool->entry(), Assembler::Equal);

I don't remember why we're comparing lhs with zero, shouldn't we be comparing dest with zero?

@@ +488,5 @@
> +    JS_ASSERT(result != lhs);
> +    JS_ASSERT(result != rhs);
> +
> +    // Result is -0 if lhs or rhs is negative.
> +    masm.ma_orr(lhs, rhs, result, SetCond);

use ma_cmn.  It adds its two arguments together, setting the condition bits accordingly, and discards the results.  This way we don't have to stomp over our poor return register.

@@ +492,5 @@
> +    masm.ma_orr(lhs, rhs, result, SetCond);
> +    if (!bailoutIf(Assembler::Signed, ins->snapshot()))
> +        return false;
> +
> +    masm.ma_mov(Imm32(0), result);

This can go away as well.

Is it actually worth it to put two instructions into an OOL path?

Attachment #587012 - Flags: review?(mrosenberg) → review+

Jan de Mooij [:jandem]

Assignee

Comment 7

•

13 years ago

(In reply to Marty Rosenberg [:Marty] from comment #6)
> I don't remember why we're comparing lhs with zero, shouldn't we be
> comparing dest with zero?

Yes, good catch. On x86/x64 dest and lhs are the same, but on ARM it should be dest.

> Is it actually worth it to put two instructions into an OOL path?

It's probably good for consistency.. On x86/x64 we have more instructions in the OOL path, and I'd like to use a forward (unlikely) branch there for the edge case (result == 0).  I don't know how this works on ARM, are forward branches seen as unlikely to be taken?

David Anderson [:dvander] - inactive, e-mail if emergency

Updated

•

13 years ago

Attachment #587012 - Flags: review?(dvander) → review+

Jan de Mooij [:jandem]

Assignee

Comment 8

•

13 years ago

https://hg.mozilla.org/projects/ionmonkey/rev/23ce5e04429d

Got rid of the OOL path on ARM, as suggested in comment 6. It'd have only 2 instructions and especially on ARM the compile-time overhead is probably not worth it.

Status: ASSIGNED → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

Hannes Verschore [:h4writer]

Comment 9

•

13 years ago

Uhmm I was just wondering about this and I think there is a better way.
We have a scratch register, right?

So we could do the following:

 bool
 CodeGeneratorX86Shared::visitMulI(LMulI* mul) {
   ...

+  if (mul->canBeNegativeZero())
+    masm.mov(lhs, ScratchReg);

   masm.imull(ToOperand(rhs), ToRegister(lhs));

   ...
 }

 bool
 CodeGeneratorX86Shared::visitMulNegativeZeroCheck(MulNegativeZeroCheck *ool)
 {
     LMulI *ins = ool->ins();
     Register result = ToRegister(ins->output());
-    Register lhsCopy = ToRegister(ins->lhsCopy());
+    Register lhsCopy = ScratchReg;
     Operand rhs = ToOperand(ins->rhs());
     JS_ASSERT(lhsCopy != result);
 
     // Result is -0 if lhs or rhs is negative.
     masm.movl(lhsCopy, result);
     masm.orl(rhs, result);
     if (!bailoutIf(Assembler::Signed, ins->snapshot()))
         return false;
 
     masm.xorl(result, result);
     masm.jmp(ool->rejoin());
     return true;
 }


As result:
1) LMulNegativeZeroCheck can become LMul again
2) lowerForALU can get used again
3) We get the extra register back
4) We only create a copy of the lhs when needed.

One problem atm is that ScratchRegister isn't defined in CodeGenerator-shared.
(I think because it is a different value on X86 and X64)

Jan de Mooij [:jandem]

Assignee

Comment 10

•

13 years ago

(In reply to Hannes Verschore from comment #9)
> Uhmm I was just wondering about this and I think there is a better way.
> We have a scratch register, right?

Not on x86.

Also note that in the most common cases there's no extra move/register atm. Assuming LHS is in eax and the result goes into ecx, we'll emit something like:

mov  eax, ecx
imul $rhs, ecx

The original lhs, eax, is used by the snapshot, other instructions and to recover the lhs. If we used a scratch reg here, we'd have

mov  eax, ecx
mov  ecx, scratch
imul $rhs, ecx

We'd still need eax for the snapshot and other uses of the lhs by other instructions.

Hannes Verschore [:h4writer]

Comment 11

•

13 years ago

(In reply to Jan de Mooij (:jandem) from comment #10)
> (In reply to Hannes Verschore from comment #9)
> > Uhmm I was just wondering about this and I think there is a better way.
> > We have a scratch register, right?
> 
> Not on x86.
> 
Ah really. Oh ok. Thank you for explanation. I should probably look a bit more into the register allocator too ;)

Bugzilla

Quick Search

IonMonkey: LMulI should not bailout if the result is positive zero

Categories

(Core :: JavaScript Engine, defect)

Tracking

()

People

(Reporter: jandem, Assigned: jandem)

References

(Blocks 1 open bug)

Details

Crash Data

Security

(public)

User Story

Attachments

(1 file)

Description

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Updated

Comment 8

Comment 9

Comment 10

Comment 11

Attachment

General

Description

File Name

Content Type