Closed Bug 868535 Opened 7 years ago Closed 7 years ago

x64 backend uses idiv for divisions by constant powers of two

Categories

(Core :: JavaScript Engine, enhancement)

x86_64
All
enhancement
Not set

Tracking

()

RESOLVED FIXED
mozilla23

People

(Reporter: sunfish, Unassigned)

Details

Attachments

(1 file, 1 obsolete file)

Some profiling in a large Javascript application on x64 turned up a hot spot where the code was doing an idiv with a constant power-of-two denominator. This can be significantly sped up by using shifts and simpler arithmetic.

This optimization is especially important for emscripten-generated code, since LLVM's optimizer assumes that backends will be able to perform this optimization.
Attached patch an experimental patch (obsolete) — Splinter Review
Attached is an experimental patch which implements this. It's experimental; I'm pretty unfamiliar with this code, so I'd appreciate any and all feedback on whether this is the right approach or whether I've missed important considerations.

Also, this patch doesn't implement all possible division-by-constant optimizations, just positive powers of two.
Attachment #745293 - Flags: review?(nicolas.b.pierron)
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment on attachment 745293 [details] [diff] [review]
an experimental patch

Review of attachment 745293 [details] [diff] [review]:
-----------------------------------------------------------------

The approach is good.  As you have different allocations, it makes sense to have a different LIR nodes.
I don't see any unhandled corner cases.

As I think you understand the code better than I do I will let you add the comment to the current patch.
Then attach the revised patch and ask me for checkin.

::: js/src/ion/shared/CodeGenerator-x86-shared.cpp
@@ +681,5 @@
>  bool
> +CodeGeneratorX86Shared::visitDivPowTwoI(LDivPowTwoI *ins)
> +{
> +    Register lhs = ToRegister(ins->getOperand(0));
> +    Register lhsCopy = ToRegister(ins->getOperand(1));

nit: Usually, we use accessors to wrap the getOperand, that way hard-coded indexes are all near each others.

@@ +697,5 @@
> +        if (shift > 1)
> +            masm.sarl(Imm32(31), lhs);
> +
> +        masm.shrl(Imm32(32 - shift), lhs);
> +        masm.addl(lhsCopy, lhs);

nit: Can you add a comment to explicit that these shifts and add are used to round negative numbers toward zero by adding "(1 << shift) - 1" to the original number.

Also, as an improvement for a later patch we can use the Range Analysis to check if they are needed, in which case we can even remove the copy of the left-hand-side.

@@ +702,5 @@
> +        masm.sarl(Imm32(shift), lhs);
> +    }
> +
> +    if (lhs == output)
> +      masm.movl(lhs, output);

typo: lhs != output.

nit: With defineReuseInput, they are not supposed to be different, so you can replace the current condition by an assertion.
Attachment #745293 - Flags: review?(nicolas.b.pierron) → review+
Attached patch revised patchSplinter Review
Thanks for the review! Here's the patch revised accordingly.
Attachment #745293 - Attachment is obsolete: true
Attachment #745458 - Flags: checkin?(nicolas.b.pierron)
Comment on attachment 745458 [details] [diff] [review]
revised patch

Review of attachment 745458 [details] [diff] [review]:
-----------------------------------------------------------------

https://hg.mozilla.org/integration/mozilla-inbound/rev/5c124c068992
Attachment #745458 - Flags: checkin?(nicolas.b.pierron) → checkin+
https://hg.mozilla.org/mozilla-central/rev/5c124c068992
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla23
You need to log in before you can comment on or make changes to this bug.