ARM64: Avoid defineReuseInput
Categories
(Core :: JavaScript: WebAssembly, enhancement, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox91 | --- | fixed |
People
(Reporter: lth, Assigned: lth)
References
(Blocks 1 open bug)
Details
Attachments
(10 files)
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review |
Not too many operations on ARM64 still use defineReuseInput; the common lowering code is generally good about deferring to arm64 logic for this part. But some cases linger. Consider:
(func $f1 (param $p i32) (param $q i32)
(i32.store (local.get $p) (i32.mul (local.get $q) (i32.const -1)))
(i32.store offset=4 (local.get $p) (local.get $q))))
The multiply by -1 is just a negate and the lowering optimizes this into an LNegI, but the LNegI is set up with defineReuseInput so the resulting code is:
0xe40902c1034 2a0103e2 mov w2, w1
0xe40902c1038 4b0203e2 neg w2, w2
0xe40902c103c b8206aa2 str w2, [x21, x0]
0xe40902c1040 91001010 add x16, x0, #0x4 (4)
0xe40902c1044 b8306aa1 str w1, [x21, x16]
where the initial mov is unnecessary because we could neg w2, w1
on this platform.
Edit: the following list is believed to be complete and will be maintained. See also bug 1712692 which is closely related and fixes some of these.
The operations that are relevant for wasm that use defineReuseInput in common lowering code appear to be:
MinMax for i32 (asm.js only) and f32/f64(will "fix" these with a comment because SNaN handling makes the FP variants expensive and asm.js is dead; there's nothing to be gained from avoiding reuse)Abs for i32 (asm.js only) and f32/f64Neg (that is, Mul by -1)WasmExtendU32Index, requires comment onlyWasmWrapU32Index, requires comment onlySelect has multiple cases:The generic case pins output == trueReg but this is not necessary or desirable for any typeThe optimized i32 compare-and-select case needs further studyThere is no i64 compare-and-select case(moved to bug 1716580, so that x64 can be handled too)The i64 case pins output == trueReg, but this is not necessary, we can use three different regs. Lowering.cpp has an ifdef arm64 here which is clearly not what we want either.
The operations that use defineReuseInput in platform code are:
copySignDoublecopySignFloat32lowerForALUInt64 (add, subtract, and bit operations)lowerForMulInt64lowerForShiftInt64
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Comment 3•3 years ago
|
||
Reusing the input overconstrains the register allocator on this platform,
it is sufficient with 2x useAtStart and normal define, the regalloc will
still reuse the input if that is to our advantage.
Assignee | ||
Updated•3 years ago
|
Pushed by lhansen@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/ef1ff05e33a6 ARM64: Do not use defineReuseInput for mul64. r=jseward
Assignee | ||
Comment 5•3 years ago
|
||
This moves the documentation for the whitebox codegen test harness out of
the SIMD subdirectory into the main wasm test directory, and updates it
to match current reality. The documentation in the harness code is also
updated.
This adds some initial arm64 support for codegen testing, which we will use
to validate some optimizations.
Assignee | ||
Comment 6•3 years ago
|
||
Only x86 requires the input register to be reused; other platforms are better
off if that constraint is absent. Delegate the decision to platform code.
Added test cases for x64 and arm64 that the n*-1 => neg(n)
transformation works as expected for i32 on these platforms, as that
is how integer negation needs to be expressed in wasm.
Drive-by-added a test case for n*-1 turning into a negate on x64 when
n is an i64, this transformation is only implemented on x64 so far.
Depends on D115694
Comment 7•3 years ago
|
||
bugherder |
Pushed by lhansen@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/c3d0066ec4ea Add arm64 support to whitebox codegen test harness. r=jseward https://hg.mozilla.org/integration/autoland/rev/ea37a81034b1 Avoid defineReuseInput for LNegI except on x86. r=jseward
Comment 9•3 years ago
|
||
Backed out for causing SM bustages.
Backout link: https://hg.mozilla.org/integration/autoland/rev/299b3a1c144d17f8cd3f2bf27b801eca577df28a
Failure log: https://treeherder.mozilla.org/logviewer?job_id=340741199&repo=autoland&lineNumber=891
Assignee | ||
Comment 10•3 years ago
|
||
Forgot to implement a crashing stub for the 'none' backend.
Comment 11•3 years ago
|
||
Pushed by lhansen@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/48b1f36fbd9c Add arm64 support to whitebox codegen test harness. r=jseward https://hg.mozilla.org/integration/autoland/rev/17ffb503a083 Avoid defineReuseInput for LNegI except on x86. r=jseward
Comment 12•3 years ago
|
||
bugherder |
Assignee | ||
Comment 13•3 years ago
|
||
Add, Subtract, And, Or, and Xor don't need to overconstrain the
register allocator by reusing their input register.
Depends on D115983
Assignee | ||
Comment 14•3 years ago
|
||
There's no need for shift and rotate to overconstrain the regalloc by
reusing the input register.
Depends on D115984
Comment 16•3 years ago
|
||
Pushed by lhansen@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/35332ecdf3fd Avoid input reuse for i64 add/sub/bitop on arm64. r=yury https://hg.mozilla.org/integration/autoland/rev/178b3ed4b9a3 Avoid input reuse for i64 shift and rotate. r=yury
Comment 17•3 years ago
|
||
bugherder |
Assignee | ||
Comment 18•3 years ago
|
||
This factors out the lowering of wasm select (except for the folded
compare-and-select case which needs more thought) into platform code.
The benefit of doing this is partly to get rid of a platform ifdef in
platform-independent code (this was a hack when Wasm Ion on ARM64 was
being bootstrapped) and partly to loosen up the register constraints
on ARM64, which are overly pessimistic and result in a lot of extra
moves.
This also improves the generated ARM64 code for FP select, using FCSEL
for float/double operations. There does not seem to be a select
instruction for vector data, but it's likely that vector data would be
selected lane-wise anyway, so here I tie the output register to the
true register and conditionally move the false register into output
using a jump.
Depends on D116088
Assignee | ||
Comment 19•3 years ago
|
||
Avoid tying the ABS input to output on arm/arm64/mips, generating
marginally better code for floating point ABS on these platforms. On
x86 there is no floating point ABS instruction and we must instead AND
with a mask, this requires input == output still.
Even though this is not very important, the patch also abstracts the
integer ABS codegen to avoid branchy code on arm and arm64 where
integer abs is easy to implement without branches. (It would be
possible to avoid the branch on x86 too, using more convoluted bit
manipulation code and extra registers.) Integer ABS is not part of
the wasm instruction set so this change will show up only in asm.js
and JS code. I have visually inspected that the correct code is
generated, and we have functionality test cases, but there's no asm.js
disassembly support in the shell so we can't have whitebox codegen
tests for this yet.
Drive-by fix: the implementation of ma_abs on arm contained an
unnecessary move, which I have removed.
Depends on D116149
Comment 20•3 years ago
|
||
Pushed by lhansen@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/d3486f284fe2 Clean up lowering of wasm 'select'. r=yury
Comment 21•3 years ago
|
||
bugherder |
Comment 22•3 years ago
|
||
Pushed by lhansen@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/69cf2e324189 Clean up lowering of 'abs'. r=yury
Comment 23•3 years ago
|
||
Backed out for failures on /bug1078871.js
-
backout: https://hg.mozilla.org/integration/autoland/rev/366da4f957828b0d4e2d64cc365428c8c9bf4b76
-
failure log:
- TEST-UNEXPECTED-FAIL | js/src/jit-test/tests/basic/bug1078871.js | /builds/worker/checkouts/gecko/js/src/jit-test/tests/basic/bug1078871.js:28:17 TypeError: can't access property "length", s is undefined (code 3, args "--ion-eager --ion-offthread-compile=off --more-compartments") [0.1 s]
- TEST-UNEXPECTED-FAIL | tests/jit-test/jit-test/tests/basic/bug1078871.js | /data/local/tmp/test_root/tests/basic/bug1078871.js:35:37 InternalError: allocation size overflow (code 3, args "--ion-eager --ion-offthread-compile=off --more-compartments") [3.6 s]
Assignee | ||
Comment 24•3 years ago
|
||
Looks like I failed to account for a path when changing the logic in CodeGenerator.cpp.
Comment 25•3 years ago
|
||
Pushed by lhansen@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/21f81e31ec79 Clean up lowering of 'abs'. r=yury
Comment 26•3 years ago
|
||
bugherder |
Comment 27•3 years ago
|
||
Lars: does this need to be "leave-open" still? in another bug you referenced it was fixed in part by these changes.
Assignee | ||
Comment 28•3 years ago
|
||
(In reply to Daniel Veditz [:dveditz] from comment #27)
Lars: does this need to be "leave-open" still? in another bug you referenced it was fixed in part by these changes.
Yeah, this bug collects a number of code generation improvements and there are more coming; only one of the patches here was pertinent to the bug you're referencing. So it should stay open, unless that causes a hardship. LMK.
Assignee | ||
Comment 29•3 years ago
|
||
On ARM64 we want different register allocation, so break that case
apart from the others and specialize it.
This patch keeps limiting the compare-and-select code to the case
where the comparison has type i32 and the value operands are also of
type i32, as that fits the code generation pattern on the other
platforms. But ARM64 is now set up for handling other comparisons and
other value types.
Assignee | ||
Comment 30•3 years ago
|
||
This is a straightforward generalization of the prior
compare-and-select patch on ARM64 only, for Float32 and Double both as
compare operands and as value operands.
(Int64 is a little harder due to the LIR structure of 64-bit data, and
supporting int64 is a separate bug: 1716580.)
Depends on D117915
Assignee | ||
Comment 31•3 years ago
|
||
Add a couple of comments about input reuse on arm64 so that we're
not confused in the future as to why this is OK.
Depends on D117916
Assignee | ||
Updated•3 years ago
|
Comment 32•3 years ago
|
||
Pushed by lhansen@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/ec5325d76057 Avoid defineReuseInput for compare-and-select on ARM64. r=yury https://hg.mozilla.org/integration/autoland/rev/d545aa5a387d Generate good arm64 code for compare-and-select on fp types. r=yury https://hg.mozilla.org/integration/autoland/rev/b626c01bd196 Add comments about input reuse, r=yury
Comment 33•3 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/ec5325d76057
https://hg.mozilla.org/mozilla-central/rev/d545aa5a387d
https://hg.mozilla.org/mozilla-central/rev/b626c01bd196
Description
•