Closed Bug 1339089 Opened 3 years ago Closed 3 years ago

Inline floor/ceil/trunc/nearest for floating-point values when we have adequate SSE support

Categories

(Core :: JavaScript Engine: JIT, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox54 --- wontfix
firefox55 --- fixed

People

(Reporter: bbouvier, Assigned: bbouvier)

References

Details

Attachments

(3 files, 2 obsolete files)

Marking as blocking wasm, but this could be generalized to Ion as well.
Here is a profile of the effect of floor() calls in Ogg Vorbis decoding (highlighted in yellow), which totals at about 9.6% of the total execution time in decoding.

(the items in red are time spent in float-to-int conversion, a separate asm.js to wasm regression item)
Priority: -- → P1
Attached patch 1.wasm-ion.patch (obsolete) — Splinter Review
Assignee: nobody → bbouvier
Status: NEW → ASSIGNED
Attachment #8842090 - Flags: review?(sunfish)
Attached patch 2.wasm-baseline.patch (obsolete) — Splinter Review
You might say I have broken one abstraction or two, here, but the compiler didn't prevent me from doing so, so I'd say I broke no abstractions :-}
Attachment #8842091 - Flags: review?(lhansen)
Comment on attachment 8842090 [details] [diff] [review]
1.wasm-ion.patch

Review of attachment 8842090 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/MIR.h
@@ +12280,5 @@
>      ALLOW_CLONE(MCeil)
>  };
>  
> +// Inlined version of Math.round(). Rounds the floating-point input to the
> +// nearest integer, ties towards zero.

The JS Math.round function rounds ties *up* (which is special).

@@ +12335,5 @@
> +{
> +    explicit MNearbyInt(MDefinition* num, MIRType resultType)
> +      : MUnaryInstruction(num)
> +    {
> +        MOZ_ASSERT(IsCompilingWasm(), "ion wants a type policy");

Would MOZ_ASSERT(IsFloatingPointType(resultType)) also make sense here?

::: js/src/jit/shared/LOpcodes-shared.h
@@ +367,5 @@
>      _(ToIdV)                        \
>      _(Floor)                        \
>      _(FloorF)                       \
> +    _(FloorToDouble)                \
> +    _(FloorToFloat32)               \

The names FloorToDouble and FloorToFloat32 give the impression that they might be converting from some other input type. As a minor bikeshed, I'd suggest renaming the existing Floor/etc. to FloorToInt32/etc., and then have Floor/FloorF be the ones that just round and return the same type as their input (and similar for the others below).

::: js/src/wasm/WasmIonCompile.cpp
@@ +2356,5 @@
> +        break;
> +      case SymbolicAddress::TruncD:
> +      case SymbolicAddress::TruncF:
> +        if (MRound::HasAssemblySupport())
> +            def = f.unary<MRound>(input, input->type());

For JS, MRound implements Math.round semantics, which is round-to-nearest except that it rounds ties up. However, here and in the implementation below, it appears MRound with a floating-point result type is being used for round-toward-zero. It's not clear from this patch that overloading MRound for different kinds of rounding is intended.
Attachment #8842090 - Flags: review?(sunfish)
Comment on attachment 8842091 [details] [diff] [review]
2.wasm-baseline.patch

Review of attachment 8842091 [details] [diff] [review]:
-----------------------------------------------------------------

Maybe I'm feeling grumpy this morning but this can easily be split into one platform-dependent function that is the upper part of tryInlineUnaryMathBuiltin and which returns true or false and takes a rounding mode out param (suitably abstracted by a typedef), along with platform abstractions that perform the appropriate conversion based on the value of that out param, and a lower part which tests the return value of the former function and calls the latter abstractions when appropriate.  I would like to see that instead.  Indeed there's no reason why any of the abstractions have to be particularly x86-specific.  ARM64 has all sorts of float-to-int conversions with different rounding modes and I expect we want to use those eventually, even if not at the moment.

Rant:

Breaking the abstraction boundary is not inherently bad; sometimes that is the way to go.  What's bad is breaking the abstraction boundary wilfully when there are equally good ways to avoid doing so, and just because you are not prevented from doing so is not a good reason.  On that note, I'm reminded of Eben Moglen, speaking on the GPL (https://www.linux.com/news/eben-moglen-gpl-compliance-and-building-communities-what-works):

"Spontaneous compliance is the only conceivable way to run a legal system, I must tell you. The United States is a country with an extraordinary amount —- apparently —- of complaining about taxes every four years or every two. But every year, Americans pay their taxes, and they don’t do it because they see crowds of people sent to jail. They do it because spontaneous compliance is the way law really works.

The problem of legal engineering which presented itself to me in 1993 and the problem we are still talking about this afternoon is how to ensure spontaneous legal compliance, not how to figure out an adequate degree of coercion which will make an adequate degree of compliance at the other end."
Attachment #8842091 - Flags: review?(lhansen)
(In reply to Lars T Hansen [:lth] from comment #5)
> Comment on attachment 8842091 [details] [diff] [review]
> 2.wasm-baseline.patch
> 
> Review of attachment 8842091 [details] [diff] [review]:
> -----------------------------------------------------------------
> [...]
> Rant:
> [...]

I really don't feel the spontaneous compliance argument works here, since this argument can be transferred to the code style guidelines, and the wasm baseline compiler never showed "spontaneous compliance" on that matter.

I also feel the intertwining of platform-specific decision here (viz. do we support SSE4.1) makes this analysis more efficient that it would be if it were split into a decision test (is this math builtin equivalent to a rounding mode that an assembly instruction *might* support) and an platform-specific application (is the assembly instruction actually supported and if so, generates it). Indeed, we can waste work if the decision test passes but the platform-specific application doesn't. Also, the baseline compiler is a tradeoff of analysis vs compilation time, and this two-parts analysis might just makes things slower, so considering this, your pushback and the amount of boilerplate needed to implement it, I'm very tempted to not have this optimization in the baseline compiler at all.
I certainly can't force you to offer the patch if you don't want to :)
Attached patch 1.ion.patchSplinter Review
Good catch on the behavior of Math.round() which ties up. I've finally decided to go the other way and only have one MIR node (MNearbyInt) for doing all the conversions, since:

- there are precedents for having different MIR nodes with the same semantics (e.g. MFloor or MMathFunction(floor)).
- it is much cleaner from an API point of view.
- a single MIR node reduces to only two LIR nodes now, although before N MIR nodes would reduce to N*2 LIR nodes. In terms of codegen also, it makes more sense: there aren't 4 MIR nodes that reduce to the same instruction.

I've also enabled this for Ion in general and tested on several small tests, and it just worked. A try-build will confirm it's true for all the jit tests.
Attachment #8842090 - Attachment is obsolete: true
Attachment #8843345 - Flags: review?(sunfish)
Attached patch 2.baseline.patchSplinter Review
After discussing on irc, we realized we were not talking about the same boundaries; I was thinking about the macro-assemblers, which in hindsight could only be a wrong guess. This simpler patch seems to respect the comment around line 3500.
Attachment #8842091 - Attachment is obsolete: true
Attachment #8843346 - Flags: review?(lhansen)
Comment on attachment 8843345 [details] [diff] [review]
1.ion.patch

Review of attachment 8843345 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good. A single MIR node class does eliminate a fair amount of redundant code :).

::: js/src/jit/IonTypes.h
@@ +878,5 @@
> +enum class RoundingMode {
> +    Down,
> +    Up,
> +    NearestTiesToEven,
> +    NearestTiesTowardsZero

This mode should be named just TowardsZero (it doesn't care about which integer is nearest).
Attachment #8843345 - Flags: review?(sunfish) → review+
Comment on attachment 8843346 [details] [diff] [review]
2.baseline.patch

Review of attachment 8843346 [details] [diff] [review]:
-----------------------------------------------------------------

Sweet.
Attachment #8843346 - Flags: review?(lhansen) → review+
Pushed by bbouvier@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/7183f7b691fb
Inline floor/ceil/trunc/nearest in Ion when we have sse4; r=sunfish
https://hg.mozilla.org/integration/mozilla-inbound/rev/9e31384f1ee1
Inline wasm::floor/trunc/nearest/round in the baseline compiler too; r=lth
https://hg.mozilla.org/mozilla-central/rev/7183f7b691fb
https://hg.mozilla.org/mozilla-central/rev/9e31384f1ee1
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Depends on: 1345427
thanks for landing this, talos is showing performance improvements:
== Change summary for alert #5325 (as of March 07 2017 13:55 UTC) ==

Improvements:

  3%  sessionrestore_no_auto_restore linux64 pgo e10s     671.62 -> 652.58

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=5325
You need to log in before you can comment on or make changes to this bug.