Closed Bug 898468 Opened 6 years ago Closed 6 years ago

min and max optimizations

Categories

(Core :: JavaScript Engine, enhancement)

enhancement
Not set

Tracking

()

RESOLVED FIXED
mozilla25

People

(Reporter: sunfish, Unassigned)

Details

Attachments

(2 files)

The following are two min/max optimization patches for x86/x64.
On my micro-benchmarks, this implementation is about the same speed on inputs where one operand of the min/max is always less/greater than the other, and about 15% faster when the operands are chosen randomly, due to having fewer branch mispredictions.

It also uses andpd instead of addsd to take the max of -0 and 0. This has the advantage of not requiring a branch to test for zero, and andpd is commonly a single-cycle operation. This has the disadvantage that andpd commonly shares execution resources with branches, so it's slower on tight loops with dense branches. But, it's faster, even on non-zero values, under less extreme conditions.
Attachment #781743 - Flags: review?(evilpies)
This patch builds on the previous one, using range analysis to eliminate NaN checks when possible.
Attachment #781745 - Flags: review?(evilpies)
Comment on attachment 781743 [details] [diff] [review]
optimize-min-max.patch

Review of attachment 781743 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/shared/CodeGenerator-x86-shared.cpp
@@ +388,2 @@
>  
> +    // Do a ucomisd to catch equality and NaNs, which both require special

an ucomisd

@@ +397,3 @@
>  
> +    // Ordered and equal. The operands are bit-identical unless they are zero
> +    // and is negative zero. These instructions merge the sign bits in that

is? This description is kind of weird.

@@ +413,3 @@
>  
> +    // When the values are inequal, or second is NaN, x86's min and max will
> +    // return the value we need.

I think this page http://x86.renejeschke.de/html/file_module_x86_id_173.html actually hints at what you implemented. Nice.
Attachment #781743 - Flags: review?(evilpies) → review+
Comment on attachment 781745 [details] [diff] [review]
optimize min/max with range analysis

Review of attachment 781745 [details] [diff] [review]:
-----------------------------------------------------------------

This certainly makes the code a lot tighter compared to what I had.
Attachment #781745 - Flags: review?(evilpies) → review+
(In reply to Tom Schuster [:evilpie] from comment #3)
> ::: js/src/ion/shared/CodeGenerator-x86-shared.cpp
> @@ +388,2 @@
> >  
> > +    // Do a ucomisd to catch equality and NaNs, which both require special
> 
> an ucomisd

Depends how you pronounce it mentally.  I think of it as "you-com-eye-ess-dee", in which case "a ucomisd" makes sense.
I left "a ucomisd" because I agree with Jeff, but I'm ok changing it if you prefer. And I accidentally left the "and is negative zero" error in my first push, so I did an extra push to fix it.

https://hg.mozilla.org/integration/mozilla-inbound/rev/86ea384b9c6c
https://hg.mozilla.org/integration/mozilla-inbound/rev/f2f06acd92b5
https://hg.mozilla.org/integration/mozilla-inbound/rev/0066614e509f
Whiteboard: [leave open]
(In reply to Tom Schuster [:evilpie] from comment #3)
> @@ +413,3 @@
> >  
> > +    // When the values are inequal, or second is NaN, x86's min and max will
> > +    // return the value we need.
> 
> I think this page http://x86.renejeschke.de/html/file_module_x86_id_173.html
> actually hints at what you implemented. Nice.

Actually that's something different yet :). I believe that's talking about using cmpsd and then and+andn+or to synthesize a conditional move to fix up the nan case. It has the advantage of not using any branches. It has the disadvantage of using more instructions, and the dependence chain in common cases is longer. Then, JavaScript has a fairly unique additional requirement, that min and max treat negative zero as less than zero, so it'd require even more instructions to handle that.

My solution uses fewer instructions, though it does so at the cost of using branches (though there are fewer branches than before, and the branch which is most often mispredicted is gone). I expect it's better than a branchless approach, though I haven't measured that.
You need to log in before you can comment on or make changes to this bug.