Closed Bug 1694191 Opened 10 months ago Closed 3 months ago

SIMD optimization x64/x86: Improve codegen for extractLaneInt64x2 when lane=0

Categories

(Core :: Javascript: WebAssembly, enhancement, P3)

x86_64
All

RESOLVED FIXED
93 Branch
Tracking Status
firefox93 --- fixed

People

(Reporter: lth, Assigned: yury)

References

(Blocks 1 open bug)

Attachments

(1 file)

The codegen for extractLaneInt64x2 is specialized for x86 and x64 and lowers to a single vpextrq. The other extract-lane operations have a couple of optimizations for lane zero that are missing here: a move can be used instead to copy the value from src to dest (extract-lane is pretty slow, according to the manual), and if the src and dest registers are the same then this operation should generate no code at all.
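A minimal sketch of the lane-zero selection described above, written as a standalone C++ function that returns assembly text. The function name emitExtractLaneInt64x2 and the string-based register names are hypothetical and are not the SpiderMonkey MacroAssembler API; operands are in Intel order (destination first).

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the instruction a lowering might pick for
// extractLaneInt64x2. The destination is a general-purpose register and
// the source is a vector (XMM) register.
std::string emitExtractLaneInt64x2(const std::string& destGpr,
                                   const std::string& srcXmm,
                                   unsigned lane) {
    assert(lane < 2 && "an i64x2 vector has only lanes 0 and 1");
    if (lane == 0) {
        // Lane 0 is the low 64 bits of the vector, so a plain
        // vector-to-GPR move is enough.
        return "movq " + destGpr + ", " + srcXmm;
    }
    // Lane 1 still needs the extract instruction with an immediate index.
    return "vpextrq " + destGpr + ", " + srcXmm + ", " + std::to_string(lane);
}
```

For example, emitExtractLaneInt64x2("rax", "xmm0", 0) yields "movq rax, xmm0", while lane 1 yields "vpextrq rax, xmm0, 1".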

Use movq for extractLaneInt64x2 and lane=0.

Special case extractLaneInt16x8 and extractLaneInt8x16 for high/low bytes of low word.
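One possible reading of this special case, sketched at the value level in C++: once the low 32 bits of the vector have been moved to a GPR, lanes 0 and 1 of an i8x16 are the low and high bytes of the low 16-bit word, and lane 0 of an i16x8 is that word itself. The type V128 and all function names below are hypothetical, the model covers only the unsigned variants, and it does not claim to match the instructions the patch actually emits.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of a 128-bit vector: little-endian bytes, lane 0 first.
using V128 = std::array<uint8_t, 16>;

// Models a move of the low 32 bits of the vector into a GPR.
uint32_t moveLow32(const V128& v) {
    return uint32_t(v[0]) | (uint32_t(v[1]) << 8) |
           (uint32_t(v[2]) << 16) | (uint32_t(v[3]) << 24);
}

// i8x16 lane 0 or 1 (unsigned), read as the low or high byte of the low word.
uint8_t extractLaneU8FromLowWord(const V128& v, unsigned lane) {
    assert(lane < 2 && "only lanes 0 and 1 live in the low 16-bit word");
    return uint8_t(moveLow32(v) >> (8 * lane));
}

// i16x8 lane 0 (unsigned), read as the low 16-bit word.
uint16_t extractLane0U16(const V128& v) {
    return uint16_t(moveLow32(v));
}
```

The point of the sketch is only that these lanes can be read from the GPR after a single move, without a per-lane extract.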

Assignee: nobody → ydelendik
Status: NEW → ASSIGNED

the src and dest registers are the same then this operation should generate no code at all

I wonder if this is not applicable to the x86 and x64 instructions.

(In reply to Yury Delendik (:yury) from comment #2)

the src and dest registers are the same then this operation should generate no code at all

I wonder if this is not applicable to the x86 and x64 instructions.

Yes, I think that when the lane is an integer lane this will never happen: the destination is a GPR and the source is an FPR. It's a factor for FP extract-lane, but that's a different matter.

Attachment #9238587 - Attachment description: Bug 1694191 - Improve codegen for extractLaneIntXXX. r?lth → Bug 1694191 - Improve codegen for extractLaneIntXXX. r=lth
Pushed by ydelendik@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/fc7e4f88c1ab
Improve codegen for extractLaneIntXXX. r=lth
Status: ASSIGNED → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED
Target Milestone: --- → 93 Branch