Closed Bug 1664397 Opened 1 year ago Closed 1 year ago

Enhance Movi masm function to generate better code for SIMD constant loads

Categories

(Core :: Javascript: WebAssembly, enhancement, P2)

ARM64
All
enhancement

Tracking

()

RESOLVED FIXED
84 Branch
Tracking Status
firefox84 --- fixed

People

(Reporter: lth, Assigned: lth)

References

(Blocks 1 open bug)

Details

Attachments

(2 files)

Movi is not very sophisticated for SIMD loads, it needs to be improved to take advantage of load-and-splat, compressed constants, etc.

Severity: -- → N/A
Priority: -- → P3

Julian on IM talking about bitmask as implemented by the baseline compiler: "In particular, for the 8x16 case, the upper and lower 64 bits of constant are the same, so it winds up spending 10 insns overall where it can be done with 5."

It's not quite that black and white but we should probably prioritize fixing Movi. Here's the constant setup for i8x16.bitmask, from Julian:

[Codegen] [q] d2804030        mov     x16, #0x201
[Codegen] [q] f2a10090        movk    x16, #0x804, lsl #16
[Codegen] [q] f2c40210        movk    x16, #0x2010, lsl #32
[Codegen] [q] f2f00810        movk    x16, #0x8040, lsl #48
[Codegen] [q] 4e080e01        dup     v1.2d, x16
[Codegen] [q] d2804030        mov     x16, #0x201
[Codegen] [q] f2a10090        movk    x16, #0x804, lsl #16
[Codegen] [q] f2c40210        movk    x16, #0x2010, lsl #32
[Codegen] [q] f2f00810        movk    x16, #0x8040, lsl #48
[Codegen] [q] 4e181e01        mov     v1.d[1], x16

Obviously, there's an easy optimization when the high 64 bits match the low 64 bits, but maybe we can do better than that still.

Priority: P3 → P2
Assignee: nobody → lhansen
Status: NEW → ASSIGNED
Attachment #9183444 - Attachment description: Bug 1664397 - Improve simd constant loads (WIP) → Bug 1664397 - Improve simd constant loads. r?jseward

The only other improvable cases I know of are with i16x8.bitmask and i32x4.bitmask, in which hi == lo << 4 and hi == lo << 8 respectively. So they can be done with just one 64-bit constant load, the dup, a shift left of the integer reg, and the insn to copy that to the top of the vector. Not sure it's worth the effort. But I thought I should make a record of it somewhere.

Agreed. Did wonder about this. Decided that it was probably not worth the bother / required too much work to figure out if it was worth it. I would hope that a good implementation boils away MOV+MOVK* in some efficient way anyhow...

Pushed by lhansen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/cc07e95f3b14
Improve simd constant loads. r=jseward
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 84 Branch

Test case is actually broken.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

A test case was broken but this was not discovered because I mostly test on release builds, and
that's all we run on CI as well. On release-debug, with the disassembler available,
we don't quit early and hence run into the bug: the expected values for a case were wrong.
Fix this by changing the expected values and running the functionality test even when
the disassembler is not available.

Attachment #9185151 - Attachment description: Bug 1664397 - fix test case. r?jseward → Bug 1664397 - remove spurious print statements. r?jseward
Pushed by lhansen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f4c5176760c4
remove spurious print statements. r=jseward DONTBUILD
Status: REOPENED → RESOLVED
Closed: 1 year ago1 year ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.