Closed Bug 1073016 Opened 5 years ago Closed 5 years ago

Optimize LRound codegen

Categories

(Core :: JavaScript Engine: JIT, defect)

x86
All
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla35

People

(Reporter: jandem, Assigned: mukilanthiagarajan, Mentored)

Details

(Whiteboard: [good first bug][lang=c++])

Attachments

(1 file, 1 obsolete file)

Code generated for an inlined Math.round call looks like this:

// input < 0?
xorpd      %xmm7, %xmm7
ucomisd    %xmm2, %xmm7
ja         ((554))

// input == 0?
xorpd      %xmm7, %xmm7
ucomisd    %xmm7, %xmm2
jne        ((568))

The second xorpd + ucomisd is redundant. Furthermore, input > 0 is probably the most common case, so we should consider checking <= 0 first and then moving the negative zero check to that path.

Also, range analysis could probably elide the checks in some cases.

This should be a small (but measurable) win on Kraken audio-oscillator.
Hannes/Nicolas, if you're interested in bugs to mentor, this is probably a good one. Unfortunately I don't really have time to do it myself atm.
Mentor: nicolas.b.pierron
To work on this bug you need some knowledge about x86 assembly instruction.

The code which have to be modified is CodeGeneratorX86Share::visitRound [1], and probably do the same thing for visitRoundF in the same file.

To check performance on kraken benchmark, you can follow the Hacking Tips[2] documentation.  Be careful to stop any other program which might be active at the same time before benchmarking.

[1] http://dxr.mozilla.org/mozilla-central/source/js/src/jit/shared/CodeGenerator-x86-shared.cpp#1899
[2] https://developer.mozilla.org/en-US/docs/Mozilla/Projects/SpiderMonkey/Hacking_Tips#Benchmarking_%28shell%29
Hardware: All → x86
Whiteboard: [good first bug][lang=c++]
Hi, I took a shot at this bug and implemented the change outlined in the description. 

On an x86 build, the results of the kraken benchmark (with 30 runs) were positive as expected:

TEST                COMPARISON            FROM                 TO               DETAILS

====================================================================================

** TOTAL **:        -                 1509.1ms +/- 0.4%   1502.9ms +/- 0.3%

====================================================================================

  audio:            -                 1509.1ms +/- 0.4%   1502.9ms +/- 0.3%
    beat-detection: -                  294.9ms +/- 1.0%    292.1ms +/- 1.1%
    dft:            -                  747.6ms +/- 0.2%    747.5ms +/- 0.2%
    fft:            ??                 173.9ms +/- 2.0%    174.7ms +/- 1.3%     might be *1.005x as slow*
    oscillator:     1.015x as fast     292.7ms +/- 0.5%    288.5ms +/- 0.5%     significant

However on an x64 build, the same changes cause the oscillator timings to increase rather than decrease as one would expect.

TEST                COMPARISON            FROM                 TO               DETAILS

====================================================================================

** TOTAL **:        *1.010x as slow*  1436.2ms +/- 0.4%   1450.0ms +/- 0.9%     significant

====================================================================================

  audio:            *1.010x as slow*  1436.2ms +/- 0.4%   1450.0ms +/- 0.9%     significant
    beat-detection: -                  281.2ms +/- 0.6%    281.4ms +/- 0.6%
    dft:            ??                 731.4ms +/- 0.7%    734.7ms +/- 1.8%     might be *1.005x as slow*
    fft:            ??                 156.9ms +/- 0.8%    157.5ms +/- 0.9%     might be *1.004x as slow*
    oscillator:     *1.036x as slow*   266.7ms +/- 0.5%    276.3ms +/- 0.5%     significant

I'm not sure what the reason for this is. Am I doing something wrong?

Thanks,
Mukilan
Attachment #8498073 - Flags: review?(nicolas.b.pierron)
(In reply to Mukilan Thiyagarajan from comment #3)
> Rev 1 - Optimize LRound and LRoundF
> 
> I'm not sure what the reason for this is. Am I doing something wrong?

Yes, the problem is that 30 runs is not enough to be precise enough, and running benchmarks accurately is complex / time consuming.
Assignee: nobody → mukilanthiagarajan
Comment on attachment 8498073 [details] [diff] [review]
Rev 1 - Optimize LRound and LRoundF

Review of attachment 8498073 [details] [diff] [review]:
-----------------------------------------------------------------

Awesome, this looks good!
And I forgot in my previous comment, Welcome :)

::: js/src/jit/shared/MacroAssembler-x86-shared.cpp
@@ +198,5 @@
>  MacroAssemblerX86Shared::branchNegativeZero(FloatRegister reg,
>                                              Register scratch,
> +                                            Label *label,
> +                                            bool maybeNonZero
> +                                            )

style-nit: Move the closing bracket to the end of the previous line.

@@ +208,5 @@
>      Label nonZero;
>  
> +    // if not already compared to zero
> +    if (maybeNonZero) {
> +      // Compare to zero. Lets through {0, -0}.

style-nit: We indent by 4 in the JavaScript engine.
Attachment #8498073 - Flags: review?(nicolas.b.pierron) → review+
Hi Nicolas,

Thanks for the review :)

(In reply to Nicolas B. Pierron [:nbp] from comment #4)
> Yes, the problem is that 30 runs is not enough to be precise enough, and
> running benchmarks accurately is complex / time consuming.

Should I report back with more runs? 

Thanks,
Mukilan
Attachment #8498073 - Attachment is obsolete: true
Attachment #8498354 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8498354 [details] [diff] [review]
Rev 2 - Optimize LRound and LRoundF

Review of attachment 8498354 [details] [diff] [review]:
-----------------------------------------------------------------

I've submitted this patch to be checked on our continuous integration server[1].
Once it returns green, we would be able to land this patch on mozilla-inbound.

[1] https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=84bd32ea12ab
Attachment #8498354 - Flags: review?(nicolas.b.pierron) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/5beba92a3f8e

Congratulation for your patch!

This link above is the link to mozilla-inbound, which is where developers are landing their patches.  After a day or so, it would be merged by a Sheriff into mozilla-central, at which time it would be available in Nightly versions of firefox ;)

On October 14, this would be merged into the Aurora channel where it would ride the 6 weeks train to beta and release.

Do you have any idea what you might want to contribute next?
https://hg.mozilla.org/mozilla-central/rev/5beba92a3f8e
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla35
You need to log in before you can comment on or make changes to this bug.