Closed Bug 1710024 Opened 3 years ago Closed 3 years ago

ARM64: Avoid defineReuseInput

Categories

(Core :: JavaScript: WebAssembly, enhancement, P3)

ARM64
All
enhancement

Tracking

()

RESOLVED FIXED
91 Branch
Tracking Status
firefox91 --- fixed

People

(Reporter: lth, Assigned: lth)

References

(Blocks 1 open bug)

Details

Attachments

(10 files)

48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review

Not too many operations on ARM64 still use defineReuseInput; the common lowering code is generally good about deferring to arm64 logic for this part. But some cases linger. Consider:

  (func $f1 (param $p i32) (param $q i32)
    (i32.store (local.get $p) (i32.mul (local.get $q) (i32.const -1)))
    (i32.store offset=4 (local.get $p) (local.get $q))))

The multiply by -1 is just a negate and the lowering optimizes this into an LNegI, but the LNegI is set up with defineReuseInput so the resulting code is:

0xe40902c1034  2a0103e2  mov     w2, w1
0xe40902c1038  4b0203e2  neg     w2, w2
0xe40902c103c  b8206aa2  str     w2, [x21, x0]
0xe40902c1040  91001010  add     x16, x0, #0x4 (4)
0xe40902c1044  b8306aa1  str     w1, [x21, x16]

where the initial mov is unnecessary because we could neg w2, w1 on this platform.

Edit: the following list is believed to be complete and will be maintained. See also bug 1712692 which is closely related and fixes some of these.

The operations that are relevant for wasm that use defineReuseInput in common lowering code appear to be:

  • MinMax for i32 (asm.js only) and f32/f64 (will "fix" these with a comment because SNaN handling makes the FP variants expensive and asm.js is dead; there's nothing to be gained from avoiding reuse)
  • Abs for i32 (asm.js only) and f32/f64
  • Neg (that is, Mul by -1)
  • WasmExtendU32Index, requires comment only
  • WasmWrapU32Index, requires comment only
  • Select has multiple cases:
    • The generic case pins output == trueReg but this is not necessary or desirable for any type
    • The optimized i32 compare-and-select case needs further study
    • There is no i64 compare-and-select case (moved to bug 1716580, so that x64 can be handled too)
    • The i64 case pins output == trueReg, but this is not necessary, we can use three different regs. Lowering.cpp has an ifdef arm64 here which is clearly not what we want either.

The operations that use defineReuseInput in platform code are:

  • copySignDouble
  • copySignFloat32
  • lowerForALUInt64 (add, subtract, and bit operations)
  • lowerForMulInt64
  • lowerForShiftInt64

(Folded into comment 0)

See Also: → 1694093

(Folded into comment 0)

Assignee: nobody → lhansen
Blocks: 1710403
Status: NEW → ASSIGNED

Reusing the input overconstrains the register allocator on this platform,
it is sufficient with 2x useAtStart and normal define, the regalloc will
still reuse the input if that is to our advantage.

Keywords: leave-open
Pushed by lhansen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ef1ff05e33a6
ARM64: Do not use defineReuseInput for mul64. r=jseward

This moves the documentation for the whitebox codegen test harness out of
the SIMD subdirectory into the main wasm test directory, and updates it
to match current reality. The documentation in the harness code is also
updated.

This adds some initial arm64 support for codegen testing, which we will use
to validate some optimizations.

Only x86 requires the input register to be reused; other platforms are better
off if that constraint is absent. Delegate the decision to platform code.

Added test cases for x64 and arm64 that the n*-1 => neg(n)
transformation works as expected for i32 on these platforms, as that
is how integer negation needs to be expressed in wasm.

Drive-by-added a test case for n*-1 turning into a negate on x64 when
n is an i64, this transformation is only implemented on x64 so far.

Depends on D115694

See Also: → 1712692
Pushed by lhansen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/c3d0066ec4ea
Add arm64 support to whitebox codegen test harness. r=jseward
https://hg.mozilla.org/integration/autoland/rev/ea37a81034b1
Avoid defineReuseInput for LNegI except on x86. r=jseward

Forgot to implement a crashing stub for the 'none' backend.

Flags: needinfo?(lhansen)
Pushed by lhansen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/48b1f36fbd9c
Add arm64 support to whitebox codegen test harness. r=jseward
https://hg.mozilla.org/integration/autoland/rev/17ffb503a083
Avoid defineReuseInput for LNegI except on x86. r=jseward
Depends on: 1712886

Add, Subtract, And, Or, and Xor don't need to overconstrain the
register allocator by reusing their input register.

Depends on D115983

There's no need for shift and rotate to overconstrain the regalloc by
reusing the input register.

Depends on D115984

Pushed by lhansen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/35332ecdf3fd
Avoid input reuse for i64 add/sub/bitop on arm64. r=yury
https://hg.mozilla.org/integration/autoland/rev/178b3ed4b9a3
Avoid input reuse for i64 shift and rotate. r=yury

This factors out the lowering of wasm select (except for the folded
compare-and-select case which needs more thought) into platform code.
The benefit of doing this is partly to get rid of a platform ifdef in
platform-independent code (this was a hack when Wasm Ion on ARM64 was
being bootstrapped) and partly to loosen up the register constraints
on ARM64, which are overly pessimistic and result in a lot of extra
moves.

This also improves the generated ARM64 code for FP select, using FCSEL
for float/double operations. There does not seem to be a select
instruction for vector data, but it's likely that vector data would be
selected lane-wise anyway, so here I tie the output register to the
true register and conditionally move the false register into output
using a jump.

Depends on D116088

Avoid tying the ABS input to output on arm/arm64/mips, generating
marginally better code for floating point ABS on these platforms. On
x86 there is no floating point ABS instruction and we must instead AND
with a mask, this requires input == output still.

Even though this is not very important, the patch also abstracts the
integer ABS codegen to avoid branchy code on arm and arm64 where
integer abs is easy to implement without branches. (It would be
possible to avoid the branch on x86 too, using more convoluted bit
manipulation code and extra registers.) Integer ABS is not part of
the wasm instruction set so this change will show up only in asm.js
and JS code. I have visually inspected that the correct code is
generated, and we have functionality test cases, but there's no asm.js
disassembly support in the shell so we can't have whitebox codegen
tests for this yet.

Drive-by fix: the implementation of ma_abs on arm contained an
unnecessary move, which I have removed.

Depends on D116149

Pushed by lhansen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d3486f284fe2
Clean up lowering of wasm 'select'.  r=yury

Looks like I failed to account for a path when changing the logic in CodeGenerator.cpp.

Flags: needinfo?(lhansen)

Lars: does this need to be "leave-open" still? in another bug you referenced it was fixed in part by these changes.

Flags: needinfo?(lhansen)

(In reply to Daniel Veditz [:dveditz] from comment #27)

Lars: does this need to be "leave-open" still? in another bug you referenced it was fixed in part by these changes.

Yeah, this bug collects a number of code generation improvements and there are more coming; only one of the patches here was pertinent to the bug you're referencing. So it should stay open, unless that causes a hardship. LMK.

Flags: needinfo?(lhansen)
Blocks: 1716580

On ARM64 we want different register allocation, so break that case
apart from the others and specialize it.

This patch keeps limiting the compare-and-select code to the case
where the comparison has type i32 and the value operands are also of
type i32, as that fits the code generation pattern on the other
platforms. But ARM64 is now set up for handling other comparisons and
other value types.

This is a straightforward generalization of the prior
compare-and-select patch on ARM64 only, for Float32 and Double both as
compare operands and as value operands.

(Int64 is a little harder due to the LIR structure of 64-bit data, and
supporting int64 is a separate bug: 1716580.)

Depends on D117915

Add a couple of comments about input reuse on arm64 so that we're
not confused in the future as to why this is OK.

Depends on D117916

See Also: → 1716710
Keywords: leave-open
Pushed by lhansen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ec5325d76057
Avoid defineReuseInput for compare-and-select on ARM64.  r=yury
https://hg.mozilla.org/integration/autoland/rev/d545aa5a387d
Generate good arm64 code for compare-and-select on fp types. r=yury
https://hg.mozilla.org/integration/autoland/rev/b626c01bd196
Add comments about input reuse, r=yury
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: