SIMD optimization x64/x86: Better code for variable swizzle
Categories
(Core :: JavaScript: WebAssembly, enhancement, P3)
Tracking
Flag | Tracking | Status
---|---|---
firefox95 | --- | fixed
People
(Reporter: lth, Assigned: yury)
References
(Blocks 1 open bug)
Attachments
(1 file)
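The variable swizzle on Intel can use PSHUFB to shuffle the bytes, but the mask vector must first be sanitized so that out-of-range lanes in the mask have the high bit set. (PSHUFB zeroes an output lane only when the high bit of the corresponding mask byte is set; otherwise it uses just the low four bits as the index, so a mask byte from 16 to 127 would select a byte instead of producing the 0 that wasm requires.) Currently we use a compare-with-constant and a POR to do this (and we don't even inline the constant load in the compare, sigh), but it's possible to do better by adding a constant into the mask with unsigned saturation: https://github.com/WebAssembly/simd/issues/68#issuecomment-470825324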
For the specific code generation, I'm not sure whether it's better to (a) splat a byte value into scratch (or load the constant into scratch) and add the mask to scratch, or (b) move the mask to scratch and add a constant from memory into scratch. Either way, the mask register itself is left unmodified.
Also see https://github.com/WebAssembly/simd/issues/93 for more discussion; it's probably worth reading, although it ranges across a lot of topics.
Comment 1 • 4 years ago (Assignee)
Comment 2 • 4 years ago (Assignee)
There is a 3-4% gain in a local microbenchmark test.
Comment 4 • 4 years ago
bugherder