Optimize SIMD v8x16.shuffle in Ion x86_64
Categories
(Core :: JavaScript: WebAssembly, enhancement, P2)
Tracking
()
Release | Tracking | Status |
---|---|---|
firefox78 | --- | fixed |
People
(Reporter: lth, Assigned: lth)
Attachments
(1 file)
The v8x16.shuffle opcode is a very general workhorse for wasm SIMD, performing byte shuffle and blend. The straightforward implementation is expensive, equivalent to at least five simple instructions (CONST + PSHUFB + CONST + PSHUFB + POR on x86). In many cases the shuffle patterns are simple and can be lowered to a small number of instructions. We should recognize a number of these patterns and generate better code for them.
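To make the cost of the general case concrete, here is a minimal scalar sketch of the CONST + PSHUFB + CONST + PSHUFB + POR sequence, checked against the reference semantics of the opcode. It is a model only, not SpiderMonkey code: the names `V128`, `shuffleReference`, `pshufb` and `shuffleGeneric` are hypothetical, and `pshufb` here is a plain-C++ emulation of the instruction rather than an intrinsic.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V128 = std::array<uint8_t, 16>;  // hypothetical stand-in for a 128-bit register

// Reference semantics of v8x16.shuffle: lane indices 0..15 select a byte
// from lhs, 16..31 select a byte from rhs.
V128 shuffleReference(const V128& lhs, const V128& rhs, const V128& lanes) {
  V128 out{};
  for (int i = 0; i < 16; i++) {
    assert(lanes[i] < 32);
    out[i] = lanes[i] < 16 ? lhs[lanes[i]] : rhs[lanes[i] - 16];
  }
  return out;
}

// Scalar model of PSHUFB: a control byte with its high bit set yields zero,
// otherwise its low four bits index into src.
V128 pshufb(const V128& src, const V128& control) {
  V128 out{};
  for (int i = 0; i < 16; i++) {
    out[i] = (control[i] & 0x80) ? 0 : src[control[i] & 0x0F];
  }
  return out;
}

// The general lowering: two constant masks, two PSHUFBs, one POR.
V128 shuffleGeneric(const V128& lhs, const V128& rhs, const V128& lanes) {
  V128 maskL{}, maskR{};  // the two CONSTs
  for (int i = 0; i < 16; i++) {
    maskL[i] = lanes[i] < 16 ? lanes[i] : uint8_t(0x80);
    maskR[i] = lanes[i] >= 16 ? uint8_t(lanes[i] - 16) : uint8_t(0x80);
  }
  V128 fromL = pshufb(lhs, maskL);  // bytes taken from lhs, zero elsewhere
  V128 fromR = pshufb(rhs, maskR);  // bytes taken from rhs, zero elsewhere
  V128 out{};
  for (int i = 0; i < 16; i++) {
    out[i] = fromL[i] | fromR[i];  // POR merges the two halves
  }
  return out;
}
```

Every shuffle can be expressed this way, which is why it serves as the fallback; the specializations are worthwhile whenever a mask lets one or two cheaper instructions replace the whole sequence.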
Comment 1 • 4 years ago (Assignee)
Implement some shuffle specializations in the MacroAssembler interface
(permutations, interleaves, concat-and-shift) and then add code to the
Ion x64 back-end to pattern match the shuffle masks and map as many
cases as we can to these specializations.
The pattern matcher is simple: it sorts instructions into buckets of
single-operand, single-operand-with-zero, and dual-operand, and then
matches patterns on the shuffle mask in a fixed order from what is
perceived as least expensive to most expensive. The matcher is
optimized for clarity, not for speed, since it will run very rarely.
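The bucket-then-match structure described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the patch: every name (`Lanes`, `Bucket`, `classify`, `selectLowering`, the returned strategy strings) is hypothetical, the `rhsIsZero` flag stands in for however the real matcher learns that an operand is zero, and only a handful of patterns are shown.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>

using Lanes = std::array<uint8_t, 16>;  // shuffle mask, each entry in 0..31

enum class Bucket { SingleOperand, SingleOperandWithZero, DualOperand };

// Step 1: sort the shuffle into a bucket. Assumption: the caller tells us
// whether the right-hand operand is known to be zero.
Bucket classify(const Lanes& lanes, bool rhsIsZero) {
  bool usesLhs = false, usesRhs = false;
  for (uint8_t lane : lanes) {
    assert(lane < 32);
    if (lane < 16) usesLhs = true; else usesRhs = true;
  }
  if (!(usesLhs && usesRhs)) return Bucket::SingleOperand;
  return rhsIsZero ? Bucket::SingleOperandWithZero : Bucket::DualOperand;
}

// True if the mask is start, start+1, ..., start+15.
bool isConsecutiveFrom(const Lanes& lanes, int start) {
  for (int i = 0; i < 16; i++) {
    if (lanes[i] != start + i) return false;
  }
  return true;
}

// True if the mask is 0,16,1,17,...,7,23 (interleave of the low halves).
bool isInterleaveLow8(const Lanes& lanes) {
  for (int i = 0; i < 8; i++) {
    if (lanes[2 * i] != i || lanes[2 * i + 1] != 16 + i) return false;
  }
  return true;
}

// Step 2: within the bucket, try patterns in a fixed order, from what is
// assumed cheapest to most expensive, ending with the general fallback.
std::string selectLowering(const Lanes& lanes, bool rhsIsZero) {
  switch (classify(lanes, rhsIsZero)) {
    case Bucket::SingleOperand:
      if (isConsecutiveFrom(lanes, 0) || isConsecutiveFrom(lanes, 16)) return "move";
      return "permute";
    case Bucket::SingleOperandWithZero:
      for (int k = 1; k < 16; k++) {
        if (isConsecutiveFrom(lanes, k)) return "byte-shift";
      }
      return "generic";
    case Bucket::DualOperand:
      if (isInterleaveLow8(lanes)) return "interleave-low";
      for (int k = 1; k < 16; k++) {
        if (isConsecutiveFrom(lanes, k)) return "concat-and-shift";
      }
      return "generic";
  }
  return "generic";
}
```

Because each predicate is a straightforward loop over sixteen bytes, adding a pattern means adding one predicate and one line in the relevant bucket, which fits the stated preference for clarity over speed.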
The patterns I've chosen are inspired by the SSE instruction set, the
v8 code, and the SIMD.js code. More can be added; some TODO remarks
are left in the code to indicate this.
A simple test infrastructure is added and used to ensure that
optimizations are triggered (and not triggered) as expected.
Currently the pattern matcher is in x64-specific code, since we only
support x64. But it will move without any substantive changes into
x86-shared code when we add x86 support (bug 1637332), and it is
mostly platform-independent and can eventually move into shared code,
possibly with some platform hooks and some extensions, when we add
arm64 support.
The matcher can also be used to optimize baseline code, should we wish
to do that.
Comment 2 • 4 years ago (Assignee)
Memo to self: we don't have to use PALIGNR for byte shifting the vector; we have PSLLDQ and PSRLDQ for that case and can avoid generating a zero or futzing with operand order.
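As a sanity check on that memo, the following scalar sketch models the three instructions and shows why the dedicated byte shifts suffice: PSRLDQ behaves like PALIGNR with a zero vector as the high operand, and PSLLDQ like PALIGNR with a zero vector as the low operand and a complementary count. The function names and the `V128` type are hypothetical, and these are plain-C++ emulations restricted to shift counts 0..16, not intrinsics.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V128 = std::array<uint8_t, 16>;  // hypothetical stand-in for a 128-bit register

// Model of PALIGNR: concatenate hi:lo (lo holds bytes 0..15, hi holds 16..31),
// shift right by n bytes, keep the low 16 bytes.
V128 palignr(const V128& hi, const V128& lo, int n) {
  assert(n >= 0 && n <= 16);
  V128 out{};
  for (int i = 0; i < 16; i++) {
    int j = i + n;
    out[i] = j < 16 ? lo[j] : hi[j - 16];
  }
  return out;
}

// Model of PSRLDQ: shift the whole vector right by n bytes, filling with zero.
V128 psrldq(const V128& v, int n) {
  assert(n >= 0 && n <= 16);
  V128 out{};
  for (int i = 0; i < 16; i++) {
    out[i] = i + n < 16 ? v[i + n] : 0;
  }
  return out;
}

// Model of PSLLDQ: shift the whole vector left by n bytes, filling with zero.
V128 pslldq(const V128& v, int n) {
  assert(n >= 0 && n <= 16);
  V128 out{};
  for (int i = 0; i < 16; i++) {
    out[i] = i >= n ? v[i - n] : 0;
  }
  return out;
}
```

In the PALIGNR formulation the zero vector has to be materialized and placed in the correct operand slot, which differs between the two shift directions; the single-operand shifts need neither step.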
Pushed by lhansen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2d1bf65618ad
wasm ion simd: optimize v8x16.shuffle. r=jseward
Comment 4 • 4 years ago (bugherder)