Closed Bug 1176864 Opened 9 years ago Closed 9 years ago

Modulo operator is not truncated for int32

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla42

Tracking Flags:

Tracking

Status

firefox41

---

affected

firefox42

---

fixed

People

(Reporter: sandervv, Unassigned)

References

Details

Attachments

(1 file, 3 obsolete files)

modulo-with-int32.patch 9 years ago Sander Mathijs van Veen 1.41 KB, patch	h4writer : review+ nbp : review+	Details \| Diff \| Splinter Review
modulo-with-int32.patch 9 years ago Sander Mathijs van Veen 1.41 KB, patch		Details \| Diff \| Splinter Review
modulo-with-int32.patch 9 years ago Sander Mathijs van Veen 1.37 KB, patch		Details \| Diff \| Splinter Review
modulo-with-int32.patch 9 years ago Sander Mathijs van Veen 1.67 KB, patch		Details \| Diff \| Splinter Review

Sander Mathijs van Veen

Reporter

Description

•

9 years ago

Attached patch modulo-with-int32.patch (obsolete) — Details — Splinter Review

When running the emscripten benchmark 'copy', the code generated for the modulo operator is slow compared to v8. Consider the generated assembly code for the DivOrModConstantI opcode.

> [DivOrModConstantI]                                                             
> movl       $0x80808081, %eax                                                    
> imull      %r8d                                                                 
> addl       %r8d, %edx                                                           
> sarl       $7, %edx                                                             
> movl       %r8d, %eax                                                           
> sarl       $31, %eax                                                            
> subl       %eax, %edx                                                           
> imull      $-255, %edx, %eax                                                    
> addl       %r8d, %eax                                                           
> testl      %r8d, %r8d                                                           
> jge        .Lfrom840                                                            
> testl      %eax, %eax                                                           
> je         .Lfrom848  

In the copy benchmark, the result of the module operator stored in an int32, and the modulo operand is a constant. This enables a similar optimization that is enabled for the division operator (see attached commit). After applying the patch, the following code is generated for the DivOrModConstantI opcode:

> [DivOrModConstantI]                                                           
> movl       $0x80808081, %eax                                                  
> imull      %r8d                                                               
> addl       %r8d, %edx                                                         
> sarl       $7, %edx                                                           
> movl       %r8d, %eax                                                         
> sarl       $31, %eax                                                          
> subl       %eax, %edx                                                         
> imull      $-255, %edx, %eax                                                  
> addl       %r8d, %eax 

Below are the benchmark results before the patch:

> LLVM_3_2 not defined, using our LLVM instead (/home/smvv/work/leaningtech/emsdk_portable/clang/fastcomp/build_master_64/bin)
> Running Emscripten benchmarks... [ Tue Jun 23 17:03:48 2015 | em: commit 2643bd9513638ea32f8023c153f50d2564745a5f | llvm: /home/smvv/work/leaningtech/emsdk_portable/clang/fastcomp/build_master_64/bin ]
> test_copy (test_benchmark.benchmark) ... 
>        clang: mean: 6.127 (+-0.001) secs  median: 6.127  range: 6.127-6.129  (noise: 0.019%)  (3 runs)
>           sm: mean: 5.989 (+-0.037) secs  median: 5.971  range: 5.938-6.023  (noise: 0.617%)  (3 runs)   Relative: 0.98 X slower
>     sm-noasm: mean: 6.936 (+-0.045) secs  median: 6.912  range: 6.875-6.983  (noise: 0.648%)  (3 runs)   Relative: 1.13 X slower
>           v8: mean: 1.161 (+-0.012) secs  median: 1.152  range: 1.152-1.177  (noise: 1.016%)  (3 runs)   Relative: 0.19 X slower

and after applying the patch:

> LLVM_3_2 not defined, using our LLVM instead (/home/smvv/work/leaningtech/emsdk_portable/clang/fastcomp/build_master_64/bin)
> Running Emscripten benchmarks... [ Tue Jun 23 16:53:14 2015 | em: commit 2643bd9513638ea32f8023c153f50d2564745a5f | llvm: /home/smvv/work/leaningtech/emsdk_portable/clang/fastcomp/build_master_64/bin ]
> test_copy (test_benchmark.benchmark) ... 
>        clang: mean: 6.122 (+-0.000) secs  median: 6.122  range: 6.122-6.123  (noise: 0.004%)  (3 runs)
>           sm: mean: 6.026 (+-0.039) secs  median: 5.998  range: 5.993-6.081  (noise: 0.653%)  (3 runs)   Relative: 0.98 X slower
>     sm-noasm: mean: 5.621 (+-0.022) secs  median: 5.605  range: 5.602-5.652  (noise: 0.394%)  (3 runs)   Relative: 0.92 X slower
>            v8: mean: 1.159 (+-0.014) secs  median: 1.149  range: 1.149-1.178  (noise: 1.210%)  (3 runs)   Relative: 0.19 X slower

You can see that with the patch applied the benchmark on 'sm-noasm' has a mean of 5.621 and without a mean of 6.936.

Sander Mathijs van Veen

Reporter

Updated

•

9 years ago

Attachment #8625466 - Flags: review?(jdemooij)

Benjamin Bouvier [:bbouvier] (inactive)

Comment 1

•

9 years ago

As it touches Range Analysis, cc'ing the RA people.

Hannes Verschore [:h4writer]

Comment 2

•

9 years ago

Comment on attachment 8625466 [details] [diff] [review]
modulo-with-int32.patch

Review of attachment 8625466 [details] [diff] [review]:
-----------------------------------------------------------------

Taking review from Jan. Since I flood him with patches. That way I can contribute back.
Also we have a small policy that we need two r+ for RA stuff. Since it is sooooo easy to break.
So nominating nbp.

Attachment #8625466 - Flags: review?(nicolas.b.pierron)

Attachment #8625466 - Flags: review?(jdemooij)

Attachment #8625466 - Flags: review?(hv1989)

Nicolas B. Pierron [:nbp]

Comment 3

•

9 years ago

Comment on attachment 8625466 [details] [diff] [review]
modulo-with-int32.patch

Review of attachment 8625466 [details] [diff] [review]:
-----------------------------------------------------------------

I cannot see any corner case for the moment.

Attachment #8625466 - Flags: review?(nicolas.b.pierron) → review+

Sander Mathijs van Veen

Reporter

Comment 4

•

9 years ago

Note that this int32 truncation is already done for modulo in asmjs: https://mxr.mozilla.org/mozilla-central/source/js/src/jit/MIR.h#6450

Hannes Verschore [:h4writer]

Comment 5

•

9 years ago

Comment on attachment 8625466 [details] [diff] [review]
modulo-with-int32.patch

Review of attachment 8625466 [details] [diff] [review]:
-----------------------------------------------------------------

Good find!

::: js/src/jit/RangeAnalysis.cpp
@@ +2905,5 @@
> +    // Special case integer division and modulo: a/b can be infinite, and a%b
> +    // can be NaN but cannot actually have rounding errors induced by truncation.
> +    if ((candidate->isDiv() || candidate->isMod())
> +        && candidate->to<MBinaryArithInstruction>()->specialization() == MIRType_Int32)
> +    {

Small style nit: can you move the && on the previous line?
if (xxxx &&
    yyyy)
{

}

Attachment #8625466 - Flags: review?(hv1989) → review+

Sander Mathijs van Veen

Reporter

Comment 6

•

9 years ago

Attached patch modulo-with-int32.patch (obsolete) — Details — Splinter Review

Thanks for the review. I've corrected the style nit.

Attachment #8625466 - Attachment is obsolete: true

Sander Mathijs van Veen

Reporter

Comment 7

•

9 years ago

Attached patch modulo-with-int32.patch (obsolete) — Details — Splinter Review

I had to use static_cast<>() instead of the candidate->to<>() method, since to() checks if the type's typename is equal to the target typename using MOZ_ASSERT. MBinaryArithInstruction is a super class, and thus not the same typename, hence a compile failure during a non-optimized build.

I've submitted the patch to the try servers as well; https://treeherder.mozilla.org/#/jobs?repo=try&revision=53236160ef0c

Attachment #8626287 - Attachment is obsolete: true

Sander Mathijs van Veen

Reporter

Comment 8

•

9 years ago

The try build is done (https://treeherder.mozilla.org/#/jobs?repo=try&revision=53236160ef0c), but there were two 'testfails' for 'Mochitest e10s Browser Chrome'. I do not think that those two errors are caused by the patch above.

(BTW, is there a way to submit the try build results automatically to this bug report?)

Sander Mathijs van Veen

Reporter

Comment 9

•

9 years ago

Attached patch modulo-with-int32.patch — Details — Splinter Review

Attachment #8626347 - Attachment is obsolete: true

Sander Mathijs van Veen

Reporter

Updated

•

9 years ago

Keywords: checkin-needed

Pulsebot

Comment 10

•

9 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/48efca5bf907

Keywords: checkin-needed

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 11

•

9 years ago

https://hg.mozilla.org/mozilla-central/rev/48efca5bf907

Status: NEW → RESOLVED

Closed: 9 years ago

status-firefox42: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla42

Guilherme Lima

Comment 12

•

9 years ago

According to AWFY:

on Mac 32-bit this was:
1.3% regression on asmjs-apps-zlib-throughput
6.9% regression on asmjs-ubench-fasta
19.4% improvement on asmjs-ubench-copy
6.5% improvement on asmjs-ubench-primes

on Mac 64-bit this was:
12.6% regression on asmjs-ubench-fasta
19% improvement on asmjs-ubench-copy
4.5% improvement on asmjs-ubench-life

All these results are for "Firefox (no asmjs)"

Sander Mathijs van Veen

Reporter

Updated

•

9 years ago

Updated

•

9 years ago

Depends on: 1180874

Comment 13

•

9 years ago

The regressions caused by this patch are fixed in bug 1180874.

The final results on Mac 32-bit (http://arewefastyet.com/#machine=28&view=single&suite=asmjs-ubench) are:
1.0% regression on asmjs-apps-zlib-throughput
(12.4% - 6.9%) = 5.5% improvement on asmjs-ubench-fasta.
19.4% improvement on asmjs-ubench-copy
6.5% improvement on asmjs-ubench-primes

Final results on Mac 64-bit (http://arewefastyet.com/#machine=29&view=breakdown&suite=asmjs-ubench) are:
0.1% improvement on asmjs-ubench-fasta
19% improvement on asmjs-ubench-copy
4.5% improvement on asmjs-ubench-life

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Quick Search

Modulo operator is not truncated for int32

Categories

(Core :: JavaScript Engine: JIT, defect)

Tracking

()

People

(Reporter: sandervv, Unassigned)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(1 file, 3 obsolete files)

Description

Updated

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Updated

Comment 10

Comment 11

Comment 12

Updated

Updated

Comment 13

Attachment

General

Description

File Name

Content Type