Closed
Bug 1176864
Opened 9 years ago
Closed 9 years ago
Modulo operator is not truncated for int32
Categories
(Core :: JavaScript Engine: JIT, defect)
Core
JavaScript Engine: JIT
Tracking
()
RESOLVED
FIXED
mozilla42
People
(Reporter: sandervv, Unassigned)
References
Details
Attachments
(1 file, 3 obsolete files)
1.67 KB,
patch
|
Details | Diff | Splinter Review |
When running the emscripten benchmark 'copy', the code generated for the modulo operator is slow compared to v8. Consider the generated assembly code for the DivOrModConstantI opcode. > [DivOrModConstantI] > movl $0x80808081, %eax > imull %r8d > addl %r8d, %edx > sarl $7, %edx > movl %r8d, %eax > sarl $31, %eax > subl %eax, %edx > imull $-255, %edx, %eax > addl %r8d, %eax > testl %r8d, %r8d > jge .Lfrom840 > testl %eax, %eax > je .Lfrom848 In the copy benchmark, the result of the module operator stored in an int32, and the modulo operand is a constant. This enables a similar optimization that is enabled for the division operator (see attached commit). After applying the patch, the following code is generated for the DivOrModConstantI opcode: > [DivOrModConstantI] > movl $0x80808081, %eax > imull %r8d > addl %r8d, %edx > sarl $7, %edx > movl %r8d, %eax > sarl $31, %eax > subl %eax, %edx > imull $-255, %edx, %eax > addl %r8d, %eax Below are the benchmark results before the patch: > LLVM_3_2 not defined, using our LLVM instead (/home/smvv/work/leaningtech/emsdk_portable/clang/fastcomp/build_master_64/bin) > Running Emscripten benchmarks... [ Tue Jun 23 17:03:48 2015 | em: commit 2643bd9513638ea32f8023c153f50d2564745a5f | llvm: /home/smvv/work/leaningtech/emsdk_portable/clang/fastcomp/build_master_64/bin ] > test_copy (test_benchmark.benchmark) ... > clang: mean: 6.127 (+-0.001) secs median: 6.127 range: 6.127-6.129 (noise: 0.019%) (3 runs) > sm: mean: 5.989 (+-0.037) secs median: 5.971 range: 5.938-6.023 (noise: 0.617%) (3 runs) Relative: 0.98 X slower > sm-noasm: mean: 6.936 (+-0.045) secs median: 6.912 range: 6.875-6.983 (noise: 0.648%) (3 runs) Relative: 1.13 X slower > v8: mean: 1.161 (+-0.012) secs median: 1.152 range: 1.152-1.177 (noise: 1.016%) (3 runs) Relative: 0.19 X slower and after applying the patch: > LLVM_3_2 not defined, using our LLVM instead (/home/smvv/work/leaningtech/emsdk_portable/clang/fastcomp/build_master_64/bin) > Running Emscripten benchmarks... [ Tue Jun 23 16:53:14 2015 | em: commit 2643bd9513638ea32f8023c153f50d2564745a5f | llvm: /home/smvv/work/leaningtech/emsdk_portable/clang/fastcomp/build_master_64/bin ] > test_copy (test_benchmark.benchmark) ... > clang: mean: 6.122 (+-0.000) secs median: 6.122 range: 6.122-6.123 (noise: 0.004%) (3 runs) > sm: mean: 6.026 (+-0.039) secs median: 5.998 range: 5.993-6.081 (noise: 0.653%) (3 runs) Relative: 0.98 X slower > sm-noasm: mean: 5.621 (+-0.022) secs median: 5.605 range: 5.602-5.652 (noise: 0.394%) (3 runs) Relative: 0.92 X slower > v8: mean: 1.159 (+-0.014) secs median: 1.149 range: 1.149-1.178 (noise: 1.210%) (3 runs) Relative: 0.19 X slower You can see that with the patch applied the benchmark on 'sm-noasm' has a mean of 5.621 and without a mean of 6.936.
Reporter | ||
Updated•9 years ago
|
Attachment #8625466 -
Flags: review?(jdemooij)
Comment 1•9 years ago
|
||
As it touches Range Analysis, cc'ing the RA people.
Comment 2•9 years ago
|
||
Comment on attachment 8625466 [details] [diff] [review] modulo-with-int32.patch Review of attachment 8625466 [details] [diff] [review]: ----------------------------------------------------------------- Taking review from Jan. Since I flood him with patches. That way I can contribute back. Also we have a small policy that we need two r+ for RA stuff. Since it is sooooo easy to break. So nominating nbp.
Attachment #8625466 -
Flags: review?(nicolas.b.pierron)
Attachment #8625466 -
Flags: review?(jdemooij)
Attachment #8625466 -
Flags: review?(hv1989)
Comment 3•9 years ago
|
||
Comment on attachment 8625466 [details] [diff] [review] modulo-with-int32.patch Review of attachment 8625466 [details] [diff] [review]: ----------------------------------------------------------------- I cannot see any corner case for the moment.
Attachment #8625466 -
Flags: review?(nicolas.b.pierron) → review+
Reporter | ||
Comment 4•9 years ago
|
||
Note that this int32 truncation is already done for modulo in asmjs: https://mxr.mozilla.org/mozilla-central/source/js/src/jit/MIR.h#6450
Comment 5•9 years ago
|
||
Comment on attachment 8625466 [details] [diff] [review] modulo-with-int32.patch Review of attachment 8625466 [details] [diff] [review]: ----------------------------------------------------------------- Good find! ::: js/src/jit/RangeAnalysis.cpp @@ +2905,5 @@ > + // Special case integer division and modulo: a/b can be infinite, and a%b > + // can be NaN but cannot actually have rounding errors induced by truncation. > + if ((candidate->isDiv() || candidate->isMod()) > + && candidate->to<MBinaryArithInstruction>()->specialization() == MIRType_Int32) > + { Small style nit: can you move the && on the previous line? if (xxxx && yyyy) { }
Attachment #8625466 -
Flags: review?(hv1989) → review+
Reporter | ||
Comment 6•9 years ago
|
||
Thanks for the review. I've corrected the style nit.
Attachment #8625466 -
Attachment is obsolete: true
Reporter | ||
Comment 7•9 years ago
|
||
I had to use static_cast<>() instead of the candidate->to<>() method, since to() checks if the type's typename is equal to the target typename using MOZ_ASSERT. MBinaryArithInstruction is a super class, and thus not the same typename, hence a compile failure during a non-optimized build. I've submitted the patch to the try servers as well; https://treeherder.mozilla.org/#/jobs?repo=try&revision=53236160ef0c
Attachment #8626287 -
Attachment is obsolete: true
Reporter | ||
Comment 8•9 years ago
|
||
The try build is done (https://treeherder.mozilla.org/#/jobs?repo=try&revision=53236160ef0c), but there were two 'testfails' for 'Mochitest e10s Browser Chrome'. I do not think that those two errors are caused by the patch above. (BTW, is there a way to submit the try build results automatically to this bug report?)
Reporter | ||
Comment 9•9 years ago
|
||
Attachment #8626347 -
Attachment is obsolete: true
Reporter | ||
Updated•9 years ago
|
Keywords: checkin-needed
Comment 10•9 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/48efca5bf907
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/48efca5bf907
Status: NEW → RESOLVED
Closed: 9 years ago
status-firefox42:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla42
Comment 12•9 years ago
|
||
According to AWFY: on Mac 32-bit this was: 1.3% regression on asmjs-apps-zlib-throughput 6.9% regression on asmjs-ubench-fasta 19.4% improvement on asmjs-ubench-copy 6.5% improvement on asmjs-ubench-primes on Mac 64-bit this was: 12.6% regression on asmjs-ubench-fasta 19% improvement on asmjs-ubench-copy 4.5% improvement on asmjs-ubench-life All these results are for "Firefox (no asmjs)"
Updated•9 years ago
|
Reporter | ||
Comment 13•9 years ago
|
||
The regressions caused by this patch are fixed in bug 1180874. The final results on Mac 32-bit (http://arewefastyet.com/#machine=28&view=single&suite=asmjs-ubench) are: 1.0% regression on asmjs-apps-zlib-throughput (12.4% - 6.9%) = 5.5% improvement on asmjs-ubench-fasta. 19.4% improvement on asmjs-ubench-copy 6.5% improvement on asmjs-ubench-primes Final results on Mac 64-bit (http://arewefastyet.com/#machine=29&view=breakdown&suite=asmjs-ubench) are: 0.1% improvement on asmjs-ubench-fasta 19% improvement on asmjs-ubench-copy 4.5% improvement on asmjs-ubench-life
You need to log in
before you can comment on or make changes to this bug.
Description
•