Closed Bug 1416289 Opened 2 years ago Closed Last year

Don't use the MathCache for Math.sign and Math.trunc

Categories

(Core :: JavaScript Engine, enhancement, P3)

enhancement

Tracking

()

RESOLVED FIXED
mozilla62
Tracking Status
firefox62 --- fixed

People

(Reporter: anba, Assigned: anba)

References

Details

Attachments

(3 files, 3 obsolete files)

No longer use the Math cache for Math.sign() and Math.trunc(), instead directly inline the math_function<...> code in js::math_{trunc, sign}(...).

Math functions which don't use the cache mustn't set MMathFunction::cache_ [1], assert this in the MMathFunction constructor and update IonBuilder::inlineMathFunction() accordingly.

And I didn't rename math_{trunc,sign}_uncached to math_{trunc,sign}_impl (normally functions which don't use the cache have the _impl suffix) to avoid complicating this macro in WasmBuiltins.cpp [2].

[1] https://searchfox.org/mozilla-central/rev/30ead7d1ae5bf95b8bc0fd67b950cd46ca05e32c/js/src/jit/CodeGenerator.cpp#7004-7008
[2] https://searchfox.org/mozilla-central/rev/30ead7d1ae5bf95b8bc0fd67b950cd46ca05e32c/js/src/wasm/WasmBuiltins.cpp#665
Attachment #8927431 - Flags: review?(jdemooij)
Part 1 gives the following improvements to current tip.

// 290ms -> 220ms
function trunc1() {
    var q = 0;
    var t = dateNow();
    for (var i = 0; i < 10000000; ++i) {
        // Constant value to always hit the math cache.
        q += Math.trunc(1.23456);
    }
    return [dateNow() - t, q];
}

// 245ms -> 140ms
function sign1() {
    var q = 0;
    var t = dateNow();
    for (var i = 0; i < 10000000; ++i) {
        // Constant value to always hit the math cache.
        q += Math.sign(1.23456);
    }
    return [dateNow() - t, q];
}



Part 2 gives the following improvements compared to current tip.

// 310ms -> 100ms
function trunc2() {
    // Constant value to always hit the math cache.
    // Not passed as a constant to avoid optimizing away
    // the call in the patched version.
    var xs = Array(8).fill(1.23456);

    var q = 0;
    var t = dateNow();
    for (var i = 0; i < 10000000; ++i) {
        q += Math.trunc(xs[i & 7]);
    }
    return [dateNow() - t, q];
}

// 300ms -> 55ms
function sign2() {
    // Constant value to always hit the math cache.
    // Not passed as a constant to avoid optimizing away
    // the call in the patched version.
    var xs = Array(8).fill(1.23456);

    var q = 0;
    var t = dateNow();
    for (var i = 0; i < 10000000; ++i) {
        q += Math.sign(xs[i & 7]);
    }
    return [dateNow() - t, q];
}
Self-hosting Math.sign (bug 1021888) and Math.trunc could lead to additional speed-ups. For example with these implementations
---
function MathSign(x) {
    x = +x; // Inlined: ToNumber
    if (x > 0)
        return 1;
    if (x < 0)
        return -1;
    return x;
}

function MathTrunc(x) {
    x = +x; // Inlined: ToNumber
    if (x < 0) {
        // Equivalent to Math.ceil, but avoids repeated bailouts for certain
        // inputs like |-2147483648.1|.
        return -std_Math_floor(-x);
    }
    return std_Math_floor(x);
}
---

The Math.trunc() µ-benchmark from comment 3 improves to 25ms and the Math.sign() µ-benchmark improves to 20ms.
Priority: -- → P3
Comment on attachment 8927431 [details] [diff] [review]
bug1416289-part1-dont-cache-sign-and-trunc.patch

Review of attachment 8927431 [details] [diff] [review]:
-----------------------------------------------------------------

Makes sense.

::: js/src/jit/MIR.cpp
@@ +1912,5 @@
> +      case Floor: return false;
> +      case Ceil:  return false;
> +      case Round: return false;
> +      default:
> +        MOZ_CRASH("Unknown math function");

Nit: if you remove the 'default:' and add the MOZ_CRASH after this switch statement, the compiler will warn/error when we add a new enum value without updating this code.
Attachment #8927431 - Flags: review?(jdemooij) → review+
Comment on attachment 8927433 [details] [diff] [review]
bug1416289-part2-double-return-values.patch

Review of attachment 8927433 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to André Bargull [:anba] from comment #2)
> Always return double values from MMathFunction specialized functions to
> allow inlining them in MCallOptimize [1]. Also applies to Math.random [2]
> and Math.hypot [3].

This makes sense for Math.random because it ~always returns a double, but I think for sign() and trunc() it would be nice to specialize for int32 return types in Ion because these cases are so common. This patch is a performance improvement so maybe it's fine but the problem with returning doubles is that it can affect a lot of other expressions. I'll think about it more.
(Note: The three new patches completely replace the original two patches.)


js/src/jsmath.{h,cpp}
- |math_trunc_uncached| wasn't renamed to |math_trunc_impl| to keep supporting this code in WasmBuiltins.cpp: https://searchfox.org/mozilla-central/rev/b28b94dc81d60c6d9164315adbd4a5073526d372/js/src/wasm/WasmBuiltins.cpp#764-795

js/src/jit/{Lowering,MCallOptimize,MIR,Recover,LIR-shared}.{h,cpp}:
- I've basically only copied the existing approach for the inlining code of Math.{floor,ceil,round}().
- This includes the code for recovering and float32 specializations.

js/src/jit/{arm,arm64,mips-shared,x86-shared} files:
- The assembler code in CodeGenerator::visitTrunc(F) for x86 resp. in MacroAssemblerARMCompat::trunc(f) for ARM was copied from the existing code for ceil/floor (including comments and formatting).
- While I could have done the same for MIPS, I considered it too risky to check in MIPS code which cannot be tested by me, therefore the MIPS code currently simply crashes for the two new methods.

js/src/jit-test/tests/ion/dce-with-rinstructions.js
- Fixed what looks like a copy-paste error in rceil_double().
- Added rtrunc_number and rtrunc_double based on the existing tests for Math.round in that file.

js/src/jit-test/tests/ion/math{Floor,Round}.js
- Replaced try-catch-statement with a with-statement to actually make sure the function is not run in Ion.

js/src/jit-test/tests/ion/mathTrunc.js
- Copied and adapted from the existing math{Floor,Round}.js files in the same directory.

js/src/jit-test/tests/ion/testFloat32.js
- Added new test for Math.trunc().
- Also added missing test Math.round().
- And added negative tests for Math.log() and Math.exp().
Attachment #8927431 - Attachment is obsolete: true
Attachment #8927433 - Attachment is obsolete: true
Attachment #8927433 - Flags: review?(jdemooij)
Attachment #8973679 - Flags: review?(jdemooij)
Open question:
I've first tried to have only two lowering classes (LSignI and LSignD) and then use MToNumberInt32 for the case when the input is MIRType::Double and the expected output is MIRType::Int32 (in MCallOptimize). Unfortunately this breaks the recover tests for Math.sign(), because MToNumberInt32 itself is not recoverable. As a workaround I've simply added another lowering class (LSignDI) to handle the Double->Int32 case. Does that make sense or is there another, more preferable approach I should have taken?


CodeGenerator::visitSignI():

The assembler code was adapted from the Clang assembly for this C function (per godbolt.org for Clang6 with -O3).

int sign(int num) {
    if (num > 0)
        return 1;
    if (num < 0)
        return -1;
    return 0;
}

sign(int): # @sign(int)
  mov ecx, edi
  sar ecx, 31
  test edi, edi
  mov eax, 1
  cmovle eax, ecx
  ret

Except that I didn't know how to encode the conditional move with the existing MacroAssembler methods, so instead I've simply used a normal branch.


CodeGenerator::visitSignD():
- I don't know if there's much of a difference which comparisons are executed in which exact order.


CodeGenerator::visitSignDI():
- Copied (again) the code to detect negative zero. Maybe one day branchNegativeZero will be available for all archs. :-)
- The generated assembler code is probably still not really efficient, but at least it's better than what we had before...
Attachment #8973680 - Flags: review?(jdemooij)
js/src/jsmath.h:
- Removed duplicate declaration of math_imul().
- Moved math_sqrt_handle() next to math_sqrt() to match how other nearby code is structured.

js/src/jsmath.cpp:
- Removed unnecessary includes.
- Moved the |math_function| template function to the top, so it can be used by more functions.
- Also changed |math_function| to use |MutableHandleValue::setDouble()| instead of |MutableHandleValue::setNumber()| to ensure the functions using |math_function| can be inlined by MMathFunction.
- That means the following functions will now always return a DoubleValue:
  - Math.cbrt
  - Math.exp, Math.expm1
  - Math.log, Math.log10, Math.log2, Math.log1p
  - Math.cosh, Math.sinh, Math.tahn
  - Math.acosh, Math.asinh, Math.atanh
- Also changed Math.random() and Math.hypot() to always return DoubleValue for inlining purposes.
- Do you think it's okay if those remaining functions always return DoubleValues?
Attachment #8973682 - Flags: review?(jdemooij)
Ah I wanted to get back to this bug this week :) I'll review today or tomorrow.
Comment on attachment 8973679 [details] [diff] [review]
bug1416289-part1-inline-trunc.patch

Review of attachment 8973679 [details] [diff] [review]:
-----------------------------------------------------------------

Nice!
Attachment #8973679 - Flags: review?(jdemooij) → review+
Comment on attachment 8973680 [details] [diff] [review]
bug1416289-part2-inline-sign.patch

Review of attachment 8973680 [details] [diff] [review]:
-----------------------------------------------------------------

We could make MToNumberInt32(double) recoverable maybe, but I actually like having the SignD and SignDI versions :)

::: js/src/jit/CodeGenerator.cpp
@@ +7444,5 @@
> +
> +    Label done;
> +    masm.move32(input, output);
> +    masm.rshift32Arithmetic(Imm32(31), output);
> +    masm.branch32(Assembler::LessThanOrEqual, input, Imm32(0), &done);

The cmov version requires an extra temp register, but then you could use cmp32move32 like this:

masm.rshift32Arithmetic(Imm32(31), output);
masm.move32(Imm32(1), temp);
masm.cmp32move32(Assembler::GreaterThan, output, Imm32(0), temp, output);

What you have is also fine though.

::: js/src/jit/MCallOptimize.cpp
@@ +1528,5 @@
> +
> +    if (returnType != MIRType::Int32 && returnType != MIRType::Double)
> +        return InliningStatus_NotInlined;
> +
> +    if (!IsFloatingPointType(argType) && !(argType == MIRType::Int32 && returnType == argType))

Maybe returnType == MIRType::Int32 instead of returnType == argType here?

::: js/src/jit/MIR.cpp
@@ +1898,5 @@
> +    if (!input->isConstant() || !input->toConstant()->isTypeRepresentableAsDouble())
> +        return this;
> +
> +    double in = input->toConstant()->numberToDouble();
> +    double out = js::math_sign_uncached(in);

This will use AutoUnsafeCallWithABI off-thread. Probably not the end of the world, but it would be nice to avoid it. Can we add a sign function we can call directly here? Maybe in jslibmath.h?
Attachment #8973680 - Flags: review?(jdemooij) → review+
Comment on attachment 8973682 [details] [diff] [review]
bug1416289-part3-double-return.patch

Review of attachment 8973682 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for these patches! Please double check at least Kraken doesn't regress a lot (or check AWFY/Talos after landing), it has some very hot Math calls (but these return doubles IIRC).
Attachment #8973682 - Flags: review?(jdemooij) → review+
Thanks for the reviews! 


(In reply to Jan de Mooij [:jandem] from comment #12)
> ::: js/src/jit/CodeGenerator.cpp
> What you have is also fine though.

Okay, then I'll stick with the current version for now. :-)


> ::: js/src/jit/MIR.cpp
> > +    double out = js::math_sign_uncached(in);
> 
> This will use AutoUnsafeCallWithABI off-thread. Probably not the end of the
> world, but it would be nice to avoid it. Can we add a sign function we can
> call directly here? Maybe in jslibmath.h?

Hmm, is it okay to file a follow-up bug for this one, because it actually affects all Math functions: https://searchfox.org/mozilla-central/rev/f30847c12e5fb13791401ed4330ced3e1b9c8d81/js/src/jit/MIR.cpp#1945-2020


(In reply to Jan de Mooij [:jandem] from comment #13)
> Thanks for these patches! Please double check at least Kraken doesn't
> regress a lot (or check AWFY/Talos after landing), it has some very hot Math
> calls (but these return doubles IIRC).

I didn't spot any major performance differences when running Kraken locally (the first two entries for each subtest are without part 3, the last two entries with part 3). But I guess we'll see if Talos decides differently here. ;-)


[#0] ai-astar                   Cycles:10  Average:152.10  Median:153.00  stddev:2.81 (1.8%)  stddev-sans-first:2.60
[#0] ai-astar                   Cycles:10  Average:149.00  Median:148.00  stddev:3.06 (2.1%)  stddev-sans-first:1.92
[#0] ai-astar                   Cycles:10  Average:152.00  Median:152.00  stddev:4.67 (3.1%)  stddev-sans-first:4.94
[#0] ai-astar                   Cycles:10  Average:150.90  Median:151.00  stddev:3.38 (2.2%)  stddev-sans-first:3.50

[#1] audio-beat-detection       Cycles:10  Average:260.70  Median:256.00  stddev:40.95 (16.0%)  stddev-sans-first:42.19
[#1] audio-beat-detection       Cycles:10  Average:274.30  Median:278.00  stddev:27.94 (10.1%)  stddev-sans-first:25.98
[#1] audio-beat-detection       Cycles:10  Average:255.10  Median:244.50  stddev:33.87 (13.9%)  stddev-sans-first:35.14
[#1] audio-beat-detection       Cycles:10  Average:266.10  Median:279.00  stddev:28.90 (10.4%)  stddev-sans-first:28.38

[#2] audio-dft                  Cycles:10  Average:265.50  Median:271.00  stddev:8.77 (3.2%)   stddev-sans-first:9.15
[#2] audio-dft                  Cycles:10  Average:272.30  Median:270.50  stddev:10.46 (3.9%)  stddev-sans-first:11.08
[#2] audio-dft                  Cycles:10  Average:274.70  Median:275.00  stddev:15.92 (5.8%)  stddev-sans-first:15.84
[#2] audio-dft                  Cycles:10  Average:268.30  Median:265.50  stddev:12.96 (4.9%)  stddev-sans-first:13.61

[#3] audio-fft                  Cycles:10  Average:187.20  Median:177.50  stddev:32.60 (18.4%)  stddev-sans-first:34.33
[#3] audio-fft                  Cycles:10  Average:189.40  Median:182.00  stddev:30.51 (16.8%)  stddev-sans-first:31.71
[#3] audio-fft                  Cycles:10  Average:178.60  Median:176.50  stddev:16.81 (9.5%)  stddev-sans-first:17.75
[#3] audio-fft                  Cycles:10  Average:180.60  Median:172.00  stddev:21.73 (12.6%)  stddev-sans-first:22.64

[#4] audio-oscillator           Cycles:10  Average:135.10  Median:130.50  stddev:20.04 (15.4%)  stddev-sans-first:21.14
[#4] audio-oscillator           Cycles:10  Average:133.20  Median:129.00  stddev:15.41 (11.9%)  stddev-sans-first:16.09
[#4] audio-oscillator           Cycles:10  Average:132.80  Median:126.00  stddev:18.87 (15.0%)  stddev-sans-first:17.87
[#4] audio-oscillator           Cycles:10  Average:129.50  Median:125.50  stddev:13.95 (11.1%)  stddev-sans-first:14.66

[#5] imaging-gaussian-blur      Cycles:10  Average:164.50  Median:166.50  stddev:11.48 (6.9%)  stddev-sans-first:11.53
[#5] imaging-gaussian-blur      Cycles:10  Average:164.80  Median:165.00  stddev:9.05 (5.5%)   stddev-sans-first:9.40
[#5] imaging-gaussian-blur      Cycles:10  Average:174.20  Median:169.50  stddev:22.99 (13.6%)  stddev-sans-first:8.00
[#5] imaging-gaussian-blur      Cycles:10  Average:165.70  Median:164.50  stddev:9.23 (5.6%)  stddev-sans-first:9.69

[#6] imaging-darkroom           Cycles:10  Average:190.00  Median:199.00  stddev:15.84 (8.0%)  stddev-sans-first:16.70
[#6] imaging-darkroom           Cycles:10  Average:199.70  Median:200.50  stddev:15.74 (7.9%)  stddev-sans-first:15.55
[#6] imaging-darkroom           Cycles:10  Average:206.40  Median:207.00  stddev:16.40 (7.9%)  stddev-sans-first:16.96
[#6] imaging-darkroom           Cycles:10  Average:199.20  Median:202.00  stddev:6.86 (3.4%)  stddev-sans-first:7.26

[#7] imaging-desaturate         Cycles:10  Average:154.00  Median:154.50  stddev:13.04 (8.4%)  stddev-sans-first:12.95
[#7] imaging-desaturate         Cycles:10  Average:146.70  Median:150.00  stddev:7.24 (4.8%)   stddev-sans-first:7.68
[#7] imaging-desaturate         Cycles:10  Average:158.80  Median:164.00  stddev:20.62 (12.6%)  stddev-sans-first:21.63
[#7] imaging-desaturate         Cycles:10  Average:151.80  Median:154.00  stddev:17.54 (11.4%)  stddev-sans-first:17.98

[#8] json-parse-financial       Cycles:10  Average:64.70   Median:65.00   stddev:1.77 (2.7%)  stddev-sans-first:1.87
[#8] json-parse-financial       Cycles:10  Average:64.90   Median:66.00   stddev:2.64 (4.0%)  stddev-sans-first:2.77
[#8] json-parse-financial       Cycles:10  Average:65.20   Median:65.00   stddev:4.18 (6.4%)  stddev-sans-first:4.39
[#8] json-parse-financial       Cycles:10  Average:65.10   Median:66.50   stddev:3.87 (5.8%)  stddev-sans-first:3.84

[#9] json-stringify-tinderbox   Cycles:10  Average:47.90   Median:46.00   stddev:6.24 (13.6%)   stddev-sans-first:6.53
[#9] json-stringify-tinderbox   Cycles:10  Average:51.60   Median:45.50   stddev:18.45 (40.6%)  stddev-sans-first:19.42
[#9] json-stringify-tinderbox   Cycles:10  Average:45.90   Median:47.00   stddev:1.73 (3.7%)  stddev-sans-first:1.66
[#9] json-stringify-tinderbox   Cycles:10  Average:45.90   Median:46.00   stddev:1.97 (4.3%)  stddev-sans-first:1.96

[#10] stanford-crypto-aes       Cycles:10  Average:118.10  Median:116.50  stddev:7.52 (6.5%)   stddev-sans-first:7.94
[#10] stanford-crypto-aes       Cycles:10  Average:120.10  Median:122.00  stddev:10.54 (8.6%)  stddev-sans-first:10.22
[#10] stanford-crypto-aes       Cycles:10  Average:120.50  Median:119.00  stddev:12.34 (10.4%)  stddev-sans-first:13.08
[#10] stanford-crypto-aes       Cycles:10  Average:118.80  Median:118.50  stddev:5.83 (4.9%)  stddev-sans-first:5.92

[#11] stanford-crypto-ccm       Cycles:10  Average:164.50  Median:166.00  stddev:5.89 (3.5%)    stddev-sans-first:6.22
[#11] stanford-crypto-ccm       Cycles:10  Average:176.20  Median:171.50  stddev:23.59 (13.8%)  stddev-sans-first:24.99
[#11] stanford-crypto-ccm       Cycles:10  Average:167.50  Median:169.00  stddev:5.15 (3.0%)  stddev-sans-first:5.43
[#11] stanford-crypto-ccm       Cycles:10  Average:176.60  Median:172.00  stddev:24.25 (14.1%)  stddev-sans-first:25.52

[#12] stanford-crypto-pbkdf2    Cycles:10  Average:233.00  Median:234.00  stddev:10.41 (4.5%)  stddev-sans-first:9.93
[#12] stanford-crypto-pbkdf2    Cycles:10  Average:235.30  Median:234.50  stddev:13.47 (5.7%)  stddev-sans-first:13.20
[#12] stanford-crypto-pbkdf2    Cycles:10  Average:230.00  Median:230.50  stddev:7.92 (3.4%)  stddev-sans-first:8.26
[#12] stanford-crypto-pbkdf2    Cycles:10  Average:239.30  Median:239.00  stddev:10.52 (4.4%)  stddev-sans-first:10.42

[#13] stanford-crypto-sha256-iterative  Cycles:10  Average:109.10  Median:92.00  stddev:32.15 (34.9%)  stddev-sans-first:33.50
[#13] stanford-crypto-sha256-iterative  Cycles:10  Average:95.90   Median:96.00  stddev:5.53 (5.8%)  stddev-sans-first:5.86
[#13] stanford-crypto-sha256-iterative  Cycles:10  Average:104.00  Median:96.50  stddev:12.23 (12.7%)  stddev-sans-first:12.97
[#13] stanford-crypto-sha256-iterative  Cycles:10  Average:107.50  Median:90.50  stddev:44.70 (49.4%)  stddev-sans-first:47.25
(In reply to André Bargull [:anba] from comment #14)
> Hmm, is it okay to file a follow-up bug for this one, because it actually
> affects all Math functions:

Ah in that case it's probably fine as-is.
Updated part 2 per review comments, carrying r+.
Attachment #8973680 - Attachment is obsolete: true
Attachment #8974091 - Flags: review+
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/06bee42b4d49
Part 1: Add Ion-inline support for Math.trunc. r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/d37d926c33fa
Part 2: Add Ion-inline support for Math.sign. r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/93f3dc3cecc3
Part 3: Always use double values for Math functions which are inlined through MMathFunction. r=jandem
Keywords: checkin-needed
You need to log in before you can comment on or make changes to this bug.