Closed Bug 1176864 Opened 9 years ago Closed 9 years ago

Modulo operator is not truncated for int32

Categories

(Core :: JavaScript Engine: JIT, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla42
Tracking Status
firefox41 --- affected
firefox42 --- fixed

People

(Reporter: sandervv, Unassigned)

References

Details

Attachments

(1 file, 3 obsolete files)

Attached patch modulo-with-int32.patch (obsolete) — Splinter Review
When running the emscripten benchmark 'copy', the code generated for the modulo operator is slow compared to v8. Consider the generated assembly code for the DivOrModConstantI opcode.

> [DivOrModConstantI]                                                             
> movl       $0x80808081, %eax                                                    
> imull      %r8d                                                                 
> addl       %r8d, %edx                                                           
> sarl       $7, %edx                                                             
> movl       %r8d, %eax                                                           
> sarl       $31, %eax                                                            
> subl       %eax, %edx                                                           
> imull      $-255, %edx, %eax                                                    
> addl       %r8d, %eax                                                           
> testl      %r8d, %r8d                                                           
> jge        .Lfrom840                                                            
> testl      %eax, %eax                                                           
> je         .Lfrom848  

In the copy benchmark, the result of the module operator stored in an int32, and the modulo operand is a constant. This enables a similar optimization that is enabled for the division operator (see attached commit). After applying the patch, the following code is generated for the DivOrModConstantI opcode:

> [DivOrModConstantI]                                                           
> movl       $0x80808081, %eax                                                  
> imull      %r8d                                                               
> addl       %r8d, %edx                                                         
> sarl       $7, %edx                                                           
> movl       %r8d, %eax                                                         
> sarl       $31, %eax                                                          
> subl       %eax, %edx                                                         
> imull      $-255, %edx, %eax                                                  
> addl       %r8d, %eax 

Below are the benchmark results before the patch:

> LLVM_3_2 not defined, using our LLVM instead (/home/smvv/work/leaningtech/emsdk_portable/clang/fastcomp/build_master_64/bin)
> Running Emscripten benchmarks... [ Tue Jun 23 17:03:48 2015 | em: commit 2643bd9513638ea32f8023c153f50d2564745a5f | llvm: /home/smvv/work/leaningtech/emsdk_portable/clang/fastcomp/build_master_64/bin ]
> test_copy (test_benchmark.benchmark) ... 
>        clang: mean: 6.127 (+-0.001) secs  median: 6.127  range: 6.127-6.129  (noise: 0.019%)  (3 runs)
>           sm: mean: 5.989 (+-0.037) secs  median: 5.971  range: 5.938-6.023  (noise: 0.617%)  (3 runs)   Relative: 0.98 X slower
>     sm-noasm: mean: 6.936 (+-0.045) secs  median: 6.912  range: 6.875-6.983  (noise: 0.648%)  (3 runs)   Relative: 1.13 X slower
>           v8: mean: 1.161 (+-0.012) secs  median: 1.152  range: 1.152-1.177  (noise: 1.016%)  (3 runs)   Relative: 0.19 X slower

and after applying the patch:

> LLVM_3_2 not defined, using our LLVM instead (/home/smvv/work/leaningtech/emsdk_portable/clang/fastcomp/build_master_64/bin)
> Running Emscripten benchmarks... [ Tue Jun 23 16:53:14 2015 | em: commit 2643bd9513638ea32f8023c153f50d2564745a5f | llvm: /home/smvv/work/leaningtech/emsdk_portable/clang/fastcomp/build_master_64/bin ]
> test_copy (test_benchmark.benchmark) ... 
>        clang: mean: 6.122 (+-0.000) secs  median: 6.122  range: 6.122-6.123  (noise: 0.004%)  (3 runs)
>           sm: mean: 6.026 (+-0.039) secs  median: 5.998  range: 5.993-6.081  (noise: 0.653%)  (3 runs)   Relative: 0.98 X slower
>     sm-noasm: mean: 5.621 (+-0.022) secs  median: 5.605  range: 5.602-5.652  (noise: 0.394%)  (3 runs)   Relative: 0.92 X slower
>            v8: mean: 1.159 (+-0.014) secs  median: 1.149  range: 1.149-1.178  (noise: 1.210%)  (3 runs)   Relative: 0.19 X slower

You can see that with the patch applied the benchmark on 'sm-noasm' has a mean of 5.621 and without a mean of 6.936.
Attachment #8625466 - Flags: review?(jdemooij)
As it touches Range Analysis, cc'ing the RA people.
Comment on attachment 8625466 [details] [diff] [review]
modulo-with-int32.patch

Review of attachment 8625466 [details] [diff] [review]:
-----------------------------------------------------------------

Taking review from Jan. Since I flood him with patches. That way I can contribute back.
Also we have a small policy that we need two r+ for RA stuff. Since it is sooooo easy to break.
So nominating nbp.
Attachment #8625466 - Flags: review?(nicolas.b.pierron)
Attachment #8625466 - Flags: review?(jdemooij)
Attachment #8625466 - Flags: review?(hv1989)
Comment on attachment 8625466 [details] [diff] [review]
modulo-with-int32.patch

Review of attachment 8625466 [details] [diff] [review]:
-----------------------------------------------------------------

I cannot see any corner case for the moment.
Attachment #8625466 - Flags: review?(nicolas.b.pierron) → review+
Note that this int32 truncation is already done for modulo in asmjs: https://mxr.mozilla.org/mozilla-central/source/js/src/jit/MIR.h#6450
Comment on attachment 8625466 [details] [diff] [review]
modulo-with-int32.patch

Review of attachment 8625466 [details] [diff] [review]:
-----------------------------------------------------------------

Good find!

::: js/src/jit/RangeAnalysis.cpp
@@ +2905,5 @@
> +    // Special case integer division and modulo: a/b can be infinite, and a%b
> +    // can be NaN but cannot actually have rounding errors induced by truncation.
> +    if ((candidate->isDiv() || candidate->isMod())
> +        && candidate->to<MBinaryArithInstruction>()->specialization() == MIRType_Int32)
> +    {

Small style nit: can you move the && on the previous line?
if (xxxx &&
    yyyy)
{

}
Attachment #8625466 - Flags: review?(hv1989) → review+
Attached patch modulo-with-int32.patch (obsolete) — Splinter Review
Thanks for the review. I've corrected the style nit.
Attachment #8625466 - Attachment is obsolete: true
Attached patch modulo-with-int32.patch (obsolete) — Splinter Review
I had to use static_cast<>() instead of the candidate->to<>() method, since to() checks if the type's typename is equal to the target typename using MOZ_ASSERT. MBinaryArithInstruction is a super class, and thus not the same typename, hence a compile failure during a non-optimized build.

I've submitted the patch to the try servers as well; https://treeherder.mozilla.org/#/jobs?repo=try&revision=53236160ef0c
Attachment #8626287 - Attachment is obsolete: true
The try build is done (https://treeherder.mozilla.org/#/jobs?repo=try&revision=53236160ef0c), but there were two 'testfails' for 'Mochitest e10s Browser Chrome'. I do not think that those two errors are caused by the patch above.

(BTW, is there a way to submit the try build results automatically to this bug report?)
Attachment #8626347 - Attachment is obsolete: true
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/48efca5bf907
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla42
According to AWFY:

on Mac 32-bit this was:
1.3% regression on asmjs-apps-zlib-throughput
6.9% regression on asmjs-ubench-fasta
19.4% improvement on asmjs-ubench-copy
6.5% improvement on asmjs-ubench-primes

on Mac 64-bit this was:
12.6% regression on asmjs-ubench-fasta
19% improvement on asmjs-ubench-copy
4.5% improvement on asmjs-ubench-life

All these results are for "Firefox (no asmjs)"
See Also: → 1180874
Depends on: 1180874
See Also: 1180874
The regressions caused by this patch are fixed in bug 1180874.

The final results on Mac 32-bit (http://arewefastyet.com/#machine=28&view=single&suite=asmjs-ubench) are:
1.0% regression on asmjs-apps-zlib-throughput
(12.4% - 6.9%) = 5.5% improvement on asmjs-ubench-fasta.
19.4% improvement on asmjs-ubench-copy
6.5% improvement on asmjs-ubench-primes

Final results on Mac 64-bit (http://arewefastyet.com/#machine=29&view=breakdown&suite=asmjs-ubench) are:
0.1% improvement on asmjs-ubench-fasta
19% improvement on asmjs-ubench-copy
4.5% improvement on asmjs-ubench-life
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: