Closed Bug 1690462 Opened 4 years ago Closed 4 years ago

SIMD optimization: sign replication

Status:

RESOLVED FIXED

Milestone:

90 Branch

Tracking Flags:

firefox90: tracking ---, status fixed

People

(Reporter: lth, Assigned: yury)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

Bug 1690462 - SIMD optimization for sign replication. r?lth (4 years ago, Yury Delendik (:yury), 48 bytes, text/x-phabricator-request)

Lars T Hansen [:lth]

Reporter

Description

•

4 years ago

https://github.com/WebAssembly/simd/issues/437 points out that some cliches, such as iNxM.shr_s(v, N-1) and iNxM.shr_s(v, -1), actually mean "replicate the sign bit throughout the lane" (the shift count is taken modulo the lane width N, so -1 is equivalent to N-1), and that on some architectures there are faster instruction sequences for this than a constant right shift. (See the ticket for suggestions.) It would be easy to optimize this, as we already handle the shift-by-constant case specially.
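To make the equivalence concrete, here is a minimal scalar sketch of the idea, not the engine's implementation. It models each lane as a Python integer and checks that a signed right shift by N-1 (or by -1, taken modulo N) gives the same result as a signed "0 > v" compare that yields all ones or zero. The helper names (`to_signed`, `shr_s`, `sign_mask`) are made up for this illustration.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def shr_s(vector, count: int, bits: int):
    """Model of iNxM.shr_s: arithmetic shift, count taken modulo the lane width."""
    effective = count % bits  # Python's % is non-negative here, so -1 maps to bits - 1
    return [to_signed(lane, bits) >> effective for lane in vector]


def sign_mask(vector, bits: int):
    """Model of a signed compare against zero (0 > lane): -1 (all ones) or 0 per lane."""
    return [-1 if to_signed(lane, bits) < 0 else 0 for lane in vector]


if __name__ == "__main__":
    v = [-128, -1, 0, 1, 127, -42, 5, -7]
    # Both spellings of the idiom agree with the compare-based mask.
    print(shr_s(v, 7, 8) == sign_mask(v, 8))
    print(shr_s(v, -1, 8) == sign_mask(v, 8))
```

A shift by any other count does not reduce to a mask, which is why only the N-1 case is a candidate for replacement.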

Lars T Hansen [:lth]

Reporter

Comment 1

•

4 years ago

Google bug with some ideas for code generation: https://crbug.com/v8/11311

Yury Delendik (:yury)

Assignee

Comment 2

•

4 years ago

Attached file: Bug 1690462 - SIMD optimization for sign replication. r?lth

Phabricator Automation

Updated

•

4 years ago

Assignee: nobody → ydelendik

Status: NEW → ASSIGNED

Yury Delendik (:yury)

Assignee

Comment 3

•

4 years ago

•

Edited

Not sure if it makes sense to replace one instruction vpsraw/vpsrad with multiple instructions PXOR/PCMPGTx. Submitted a patch to optimize x86 for i8x16.shr_s and i64x2.shr_s.

Lars T Hansen [:lth]

Reporter

Comment 4

•

4 years ago

(In reply to Yury Delendik (:yury) from comment #3)

Not sure if it makes sense to replace one instruction vpsraw/vpsrad with multiple instructions PXOR/PCMPGTx. Submitted a patch to optimize x86 for i8x16.shr_s and i64x2.shr_s.

I agree: on x64 the 16- and 32-bit cases are best left as they are. The V8 bug also indicates that the payoff is primarily for the 8- and 64-bit cases.

On ARM64 there seems to be no particularly good reason to change anything; apart from too many moves (see bug 1712692) we already generate a single shift instruction, and we'll do this for all operand sizes. The ARM64 optimization manual indicates that the execution cost of the shifts does not differ from that of the compares.
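The conclusion of comments 3 and 4 can be summarized as a small decision rule. The sketch below is only an illustration of that rule under the assumptions stated in the thread; it is not the code from the landed patch, and the names `Lowering` and `choose_lowering` and the architecture strings are hypothetical.

```python
from enum import Enum


class Lowering(Enum):
    CONSTANT_SHIFT = "constant shift"        # keep the existing shift-by-constant path
    COMPARE_WITH_ZERO = "compare with zero"  # zero a register, then signed compare


def choose_lowering(arch: str, lane_bits: int, count: int) -> Lowering:
    """Pick a lowering for iNxM.shr_s(v, count) with a constant count.

    Assumption from the discussion: only x64 benefits, and only for
    8-bit and 64-bit lanes; everything else keeps the plain shift.
    """
    effective = count % lane_bits  # the count is taken modulo the lane width
    is_sign_replication = effective == lane_bits - 1
    if arch == "x64" and is_sign_replication and lane_bits in (8, 64):
        return Lowering.COMPARE_WITH_ZERO
    return Lowering.CONSTANT_SHIFT


if __name__ == "__main__":
    for bits in (8, 16, 32, 64):
        print(bits, choose_lowering("x64", bits, -1).value,
              choose_lowering("arm64", bits, -1).value)
```

Keeping the 16- and 32-bit cases on the shift path reflects the point made above that a single vpsraw/vpsrad is not obviously worse than a two-instruction sequence.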

Pulsebot

Comment 5

•

4 years ago

Pushed by ydelendik@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/556aff9ffdc6 SIMD optimization for sign replication. r=lth

Alexandru Michis [:malexandru]

Comment 6

•

4 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/556aff9ffdc6

Status: ASSIGNED → RESOLVED

Closed: 4 years ago

status-firefox90: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 90 Branch

Categories

(Core :: JavaScript: WebAssembly, enhancement, P3)
