SIMD optimization: sign replication
Categories
(Core :: JavaScript: WebAssembly, enhancement, P3)
Tracking
Version | Tracking | Status
---|---|---
firefox90 | --- | fixed
People
(Reporter: lth, Assigned: yury)
References
(Blocks 1 open bug)
Attachments
(1 file)
https://github.com/WebAssembly/simd/issues/437 points out that some idioms, such as iNxM.shr_s(v, N-1) and iNxM.shr_s(v, -1), actually mean "replicate the sign bit throughout the lane", and that on some architectures there are faster instruction sequences for this than a right shift by a constant. (See the ticket for suggestions.) It would be easy to optimize this, as we already handle the shift-by-constant case specially.
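To make the equivalence concrete, here is a minimal scalar sketch of a single 8-bit lane. It is an illustration only, not SpiderMonkey code: the helper names `shr_s_i8` and `sign_mask_i8` are hypothetical. It relies on the Wasm rule that the shift count is taken modulo the lane width, which is why a count of -1 behaves like N-1.

```rust
/// Scalar model of one lane of `i8x16.shr_s` (hypothetical helper).
/// Wasm reduces the shift count modulo the lane width (8 here).
fn shr_s_i8(lane: i8, count: u32) -> i8 {
    // `>>` on a signed integer is an arithmetic (sign-extending) shift.
    lane >> (count % 8)
}

/// Sign replication written as a compare: all ones if negative, else zero.
fn sign_mask_i8(lane: i8) -> i8 {
    if lane < 0 { -1 } else { 0 }
}

fn main() {
    for lane in i8::MIN..=i8::MAX {
        // N-1 is 7 for 8-bit lanes.
        assert_eq!(shr_s_i8(lane, 7), sign_mask_i8(lane));
        // A count of -1 is 0xFFFF_FFFF as a 32-bit operand, which reduces to 7.
        assert_eq!(shr_s_i8(lane, (-1i32) as u32), sign_mask_i8(lane));
    }
    println!("shift by N-1 or -1 matches the compare-based sign mask");
}
```

The compare form is what makes alternative instruction sequences possible: a backend can pick whichever of "shift" or "compare against zero" is cheaper for a given lane width.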
Comment 1 (Reporter) • 4 years ago
Google bug with some ideas for code generation: https://crbug.com/v8/11311
Comment 2 (Assignee) • 4 years ago
Updated • 4 years ago
Comment 3 (Assignee) • 4 years ago
Not sure if it makes sense to replace one instruction vpsraw/vpsrad with multiple instructions PXOR/PCMPGTx. Submitted a patch to optimize x86 for i8x16.shr_s and i64x2.shr_s
Comment 4 (Reporter) • 4 years ago
(In reply to Yury Delendik (:yury) from comment #3)
> Not sure if it makes sense to replace one instruction vpsraw/vpsrad with multiple instructions PXOR/PCMPGTx. Submitted a patch to optimize x86 for i8x16.shr_s and i64x2.shr_s
I agree. On x64 the 16-bit and 32-bit cases are best left as they are; the V8 bug also indicates that the payoff is primarily for the 8-bit and 64-bit cases.
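As a rough illustration of why the 64-bit case can pay off, the sketch below models one possible way to replicate the sign of each 64-bit lane using only 32-bit operations: an arithmetic shift by 31 in every 32-bit lane (as PSRAD would do), followed by a shuffle that copies each high half over its low half (as a PSHUFD-style shuffle would do). This sequence is an assumption chosen for illustration; the thread does not say which sequence the patch emits, and all names in the code are hypothetical.

```rust
/// Scalar model of a 128-bit vector viewed as four 32-bit lanes.
/// Little-endian layout: lanes 1 and 3 hold the high halves of the two i64 lanes.
type I32x4 = [i32; 4];

fn from_i64x2(v: [i64; 2]) -> I32x4 {
    [v[0] as i32, (v[0] >> 32) as i32, v[1] as i32, (v[1] >> 32) as i32]
}

fn to_i64x2(v: I32x4) -> [i64; 2] {
    let join = |lo: i32, hi: i32| ((hi as i64) << 32) | (lo as u32 as i64);
    [join(v[0], v[1]), join(v[2], v[3])]
}

/// Step 1: 32-bit arithmetic shift right by 31 in every lane (models PSRAD).
fn shr_s_32_by_31(v: I32x4) -> I32x4 {
    v.map(|x| x >> 31)
}

/// Step 2: copy each odd lane over the even lane below it
/// (models a shuffle selecting lanes 1, 1, 3, 3).
fn dup_high_halves(v: I32x4) -> I32x4 {
    [v[1], v[1], v[3], v[3]]
}

/// Assumed sequence: sign replication for i64x2 built from the two steps above.
fn sign_replicate_i64x2(v: [i64; 2]) -> [i64; 2] {
    to_i64x2(dup_high_halves(shr_s_32_by_31(from_i64x2(v))))
}

fn main() {
    for v in [[-1i64, 1], [i64::MIN, i64::MAX], [0, -42], [1 << 32, -(1 << 32)]] {
        // Reference result: a true 64-bit arithmetic shift by 63.
        assert_eq!(sign_replicate_i64x2(v), [v[0] >> 63, v[1] >> 63]);
    }
    println!("32-bit shift plus shuffle matches a 64-bit shr_s by 63");
}
```

The sketch works because the sign of a 64-bit lane lives entirely in its high 32 bits, so a 32-bit shift of that half already produces the full mask and only needs to be duplicated.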
On ARM64 there seems to be no particularly good reason to change anything; apart from too many moves (see bug 1712692) we already generate a single shift instruction, and we'll do this for all operand sizes. The ARM64 optimization manual indicates that the execution cost of the shifts does not differ from that of the compares.
Comment 6 (bugherder) • 4 years ago