Closed Bug 1640662 Opened 5 years ago Closed 5 years ago

Lower SIMD instructions so as to respect register preferences

Categories

(Core :: JavaScript: WebAssembly, enhancement, P3)

x86_64
All
enhancement

Tracking

()

RESOLVED FIXED
84 Branch
Tracking Status
firefox84 --- fixed

People

(Reporter: lth, Assigned: lth)

References

(Blocks 1 open bug)

Details

Attachments

(4 files, 1 obsolete file)

A number of the SIMD instructions have register preferences. For example, a number of the unary instructions prefer src == dest to avoid a spurious move, and bitselect prefers its temp and mask registers to be the same. (I have a long list.) As a matter of general hygiene we should try to cater to the preferences in the Ion pipeline so that the last use of a value (frequently the only use of a value) can avoid extra moves and extra register pressure.

There's a slight conceptual jungle here:

  • useRegisterAtStart signals that the codegen macro can handle src == dest (for a particular src, there being only one dest) without forcing that. For SIMD, we have set it up so that the codegen macros always can handle src == dest, but sometimes by introducing extra moves at the beginning, and we want to avoid those moves if we can
  • defineReuseInput is used to force src == dest
  • useRegister is used to force src != dest (it makes their lifetimes overlap)
  • there is no obvious way to signal a preference for src == dest without forcing it with a defineReuseInput, indeed it does not appear that the regalloc will prioritize reusing the register of the (dead) src for the dest; it will pick that src register only if it is first in the set of available registers when the dest is allocated, and that set is not some LIFO thing.

Consider the case where we create flexibility by using useRegisterAtStart but not defineReuseInput but really prefer src == dest (last bullet above):

  • If the regalloc does not choose to reuse the src register after all but the src is dead after the instruction, then we'll get suboptimal code (more moves) because the initial moves will be generated by the codegen macro anyway
  • If we had chosen less flexibility, then we get locally optimal code (no initial moves), and if a copy is needed because the src is not dead, then the regalloc must create that copy before the instruction, so this is no worse, and uses no more registers
  • Thus it is relatively locally optimal to force the hand of the regalloc here. There is of course a chance that this triggers bad behavior elsewhere because the constraints are now quite stringent (live ranges may have to be split more, say), ie it creates globally less optimal code.
  • Of course this analysis only holds for SSE. For AVX, and for 3-address architectures, it would be different. But we don't care about anything but SSE here.

In the final analysis:

  • if the codegen macro prefers src != dest then we will use useRegister(src) and plain define for the destination
  • if the codegen macro prefers src == dest then we will use useRegisterAtStart(src) and defineReuseInput(src) for the destination
  • if it doesn't matter then we will use useRegisterAtStart(src) and plain define, to create the maximum amount of flexibility and to reduce register pressure as much as we can
Severity: -- → N/A
Priority: -- → P3
No longer blocks: 1637332
See Also: → 1641641

Depends on D77784

Use useRegister / useRegisterAtStart / define / defineReuseInput (as
well as comments) so as to generate (and document) the best code under
context-less assumptions.

We add test cases to assert that the best code is actually generated
when that makes sense. There are test cases only for operations where
the operation itself has explicit, but optional, reg->reg moves at the
beginning of the emitted code to deal with register misallocation.

The test cases assert that extraneous moves are not inserted at the
start of an operation under ideal circumstances, taking into account
whether the operation prefers src == dest or src != dest. For cases
where the operation prefers src == dest the input register is set up
to be the same as the output register, ie, we use local 0 as the
argument and generate a result in the result register; for cases where
the operation prefers src != dest, we use local 1 as the argument.

The test cases are fairly delicate - they may be sensitive to small
changes in Ion; comments in the test files highlight this, and the
tests are not meant to constrain Ion, only to highlight accidental
changes. That said, these tests have already uncovered bugs, and for
now let's see if we can make them stick.

Drive-by fixes:

  • prefer AND-with-literal-mask and XOR-with-literal-mask in a couple
    of places (instead of preloading the constant) to generate better
    code, mostly where this matters right now because we introduce
    tests that need to inspect the code; a more general cleanup of this
    type of pattern may happen later

  • remove unused/unimplemented generality in the negFloatNNxM,
    absFloatNNxM, and notTypeNNxM primitives

  • make the reuseInput*() primitives assert that their inputs are
    SIMD registers, and adapt client code accordingly; this
    prevents some bugs since float registers do not compare equal
    if they are not of the same type

  • make moveSimd128{Int,Float} no longer emit a move if the source and
    destination are the same register

Attachment #9154101 - Attachment is obsolete: true
Attachment #9181730 - Attachment description: Bug 1640662 - wasm ion simd register preferences part 1: unary-ish ops. → Bug 1640662 - wasm ion simd register preferences part 1: unary-ish ops. r?jseward
Attachment #9181918 - Attachment description: Bug 1640662 - wasm ion simd register preferences part 2: binary-ish ops. → Bug 1640662 - wasm ion simd register preferences part 2: binary-ish ops. r?jseward
Depends on: 1671907

Change register allocation for the shuffle operators so as to generate
better code. Mostly this is about those shuffles that decay into a
permutation of one of the operands.

Add test cases to check the code generation.

Depends on D93912

Pushed by lhansen@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/2026ca13fc92 wasm ion simd register preferences part 1: unary-ish ops. r=jseward https://hg.mozilla.org/integration/autoland/rev/164c3fd55821 wasm ion simd register preferences part 2: binary-ish ops. r=jseward https://hg.mozilla.org/integration/autoland/rev/4e949bd393fa wasm ion simd register preferences part 3: bitselect. r=jseward https://hg.mozilla.org/integration/autoland/rev/bda987b1c505 wasm ion simd register preferences part 4: shuffle. r=jseward
Pushed by lhansen@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/480aa6fe2e0a wasm ion simd register preferences part 1: unary-ish ops. r=jseward https://hg.mozilla.org/integration/autoland/rev/21a7814d292e wasm ion simd register preferences part 2: binary-ish ops. r=jseward https://hg.mozilla.org/integration/autoland/rev/fa2584575737 wasm ion simd register preferences part 3: bitselect. r=jseward https://hg.mozilla.org/integration/autoland/rev/1b69d3cfd0d4 wasm ion simd register preferences part 4: shuffle. r=jseward
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: