Lower SIMD instructions so as to respect register preferences
Categories
(Core :: JavaScript: WebAssembly, enhancement, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox84 | --- | fixed |
People
(Reporter: lth, Assigned: lth)
References
(Blocks 1 open bug)
Details
Attachments
(4 files, 1 obsolete file)
A number of the SIMD instructions have register preferences. For example, a number of the unary instructions prefer src == dest to avoid a spurious move, and bitselect prefers its temp and mask registers to be the same. (I have a long list.) As a matter of general hygiene we should try to cater to the preferences in the Ion pipeline so that the last use of a value (frequently the only use of a value) can avoid extra moves and extra register pressure.
There's a slight conceptual jungle here:
- useRegisterAtStart signals that the codegen macro can handle src == dest (for a particular src, there being only one dest) without forcing that. For SIMD, we have set it up so that the codegen macros always can handle src == dest, but sometimes by introducing extra moves at the beginning, and we want to avoid those moves if we can
- defineReuseInput is used to force src == dest
- useRegister is used to force src != dest (it makes their lifetimes overlap)
- there is no obvious way to signal a preference for src == dest without forcing it with a defineReuseInput, indeed it does not appear that the regalloc will prioritize reusing the register of the (dead) src for the dest; it will pick that src register only if it is first in the set of available registers when the dest is allocated, and that set is not some LIFO thing.
Consider the case where we create flexibility by using useRegisterAtStart but not defineReuseInput but really prefer src == dest (last bullet above):
- If the regalloc does not choose to reuse the src register after all but the src is dead after the instruction, then we'll get suboptimal code (more moves) because the initial moves will be generated by the codegen macro anyway
- If we had chosen less flexibility, then we get locally optimal code (no initial moves), and if a copy is needed because the src is not dead, then the regalloc must create that copy before the instruction, so this is no worse, and uses no more registers
- Thus it is relatively locally optimal to force the hand of the regalloc here. There is of course a chance that this triggers bad behavior elsewhere because the constraints are now quite stringent (live ranges may have to be split more, say), ie it creates globally less optimal code.
- Of course this analysis only holds for SSE. For AVX, and for 3-address architectures, it would be different. But we don't care about anything but SSE here.
In the final analysis:
- if the codegen macro prefers src != dest then we will use useRegister(src) and plain define for the destination
- if the codegen macro prefers src == dest then we will use useRegisterAtStart(src) and defineReuseInput(src) for the destination
- if it doesn't matter then we will use useRegisterAtStart(src) and plain define, to create the maximum amount of flexibility and to reduce register pressure as much as we can
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 1•5 years ago
|
||
Depends on D77784
Assignee | ||
Comment 2•5 years ago
|
||
Use useRegister / useRegisterAtStart / define / defineReuseInput (as
well as comments) so as to generate (and document) the best code under
context-less assumptions.
We add test cases to assert that the best code is actually generated
when that makes sense. There are test cases only for operations where
the operation itself has explicit, but optional, reg->reg moves at the
beginning of the emitted code to deal with register misallocation.
The test cases assert that extraneous moves are not inserted at the
start of an operation under ideal circumstances, taking into account
whether the operation prefers src == dest or src != dest. For cases
where the operation prefers src == dest the input register is set up
to be the same as the output register, ie, we use local 0 as the
argument and generate a result in the result register; for cases where
the operation prefers src != dest, we use local 1 as the argument.
The test cases are fairly delicate - they may be sensitive to small
changes in Ion; comments in the test files highlight this, and the
tests are not meant to constrain Ion, only to highlight accidental
changes. That said, these tests have already uncovered bugs, and for
now let's see if we can make them stick.
Drive-by fixes:
-
prefer AND-with-literal-mask and XOR-with-literal-mask in a couple
of places (instead of preloading the constant) to generate better
code, mostly where this matters right now because we introduce
tests that need to inspect the code; a more general cleanup of this
type of pattern may happen later -
remove unused/unimplemented generality in the negFloatNNxM,
absFloatNNxM, and notTypeNNxM primitives -
make the reuseInput*() primitives assert that their inputs are
SIMD registers, and adapt client code accordingly; this
prevents some bugs since float registers do not compare equal
if they are not of the same type -
make moveSimd128{Int,Float} no longer emit a move if the source and
destination are the same register
Updated•5 years ago
|
Updated•5 years ago
|
Assignee | ||
Comment 3•5 years ago
|
||
Depends on D93630
Updated•5 years ago
|
Assignee | ||
Comment 4•5 years ago
|
||
Depends on D93721
Assignee | ||
Comment 5•5 years ago
|
||
Change register allocation for the shuffle operators so as to generate
better code. Mostly this is about those shuffles that decay into a
permutation of one of the operands.
Add test cases to check the code generation.
Depends on D93912
Comment 7•5 years ago
|
||
Backed out 13 changesets (bug 1641595, bug 1671998, bug 1672770, bug 1641641, bug 1672697, bug 1672911, bug 1671996, bug 1671907, bug 1640662) for causing sm failures in splat-x64-ion-codegen.js
Backout link: https://hg.mozilla.org/integration/autoland/rev/c2c595aa8fe1dc18803bf80df3bca7ca07443038
Comment 9•5 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/480aa6fe2e0a
https://hg.mozilla.org/mozilla-central/rev/21a7814d292e
https://hg.mozilla.org/mozilla-central/rev/fa2584575737
https://hg.mozilla.org/mozilla-central/rev/1b69d3cfd0d4
Description
•