Closed Bug 1597790 Opened 2 years ago Closed 1 month ago

Optimization: Use SIMD for inlined bulk-memory operations

Categories

(Core :: Javascript: WebAssembly, enhancement, P3)

enhancement

Tracking

()

RESOLVED FIXED
93 Branch
Tracking Status
firefox93 --- fixed

People

(Reporter: rhunt, Assigned: yury)

References

Details

Attachments

(2 files)

This shouldn't block shipping bulk-memory.

Bug 1594204 add a basic OOL path for memory.copy/memory.fill that uses up to 4 byte loads/stores on 32-bit systems and up to 8 byte loads/stores on 64-bit systems. We can do better if we utilize the SIMD register bank.

I originally avoided this for expediency. By only using GPRs we could re-use our existing wasmLoad/wasmStore infrastructure for bounds checking and trap handling. SIMD will require more work.

I had a version of a patch to do this by adding a Scalar::Int128, and extending wasmLoad/Store to utilize this. It was a bit hacky, and didn't fit at all with the baseline compiler (which uses ValueType for the stack). It's not clear that we should care for this optimization in the baseline compiler, so maybe that's fine.

No longer blocks: wasm-bulkmem
Summary: Use SIMD for inlined bulk-memory operations → Optimization: Use SIMD for inlined bulk-memory operations
Assignee: nobody → rhunt

I had a version of a patch to do this by adding a Scalar::Int128, and extending wasmLoad/Store to utilize this. It was a bit hacky, and didn't fit at all with the baseline compiler (which uses ValueType for the stack).

Since we have Scalar::Simd128, we could just partially enable ENABLE_WASM_SIMD fragments. Though it becomes too complicated and adds overhead just for copy/fill when ENABLE_WASM_SIMD is off. The patch enables inlining with SIMD ops only when ENABLE_WASM_SIMD is on.

I did some microbenchmarking on x64 (attaching test):

before after
ion aligned 1921 1315 46.06%
ion unaligned 2086 1475 41.42%
baseline aligned 2078 1504 38.16%
baseline unaligned 2158 1634 32.06%
Attached file bulk_simd_tests.js

Nice results and nice compromise.

Assignee: rhunt → ydelendik
Status: NEW → ASSIGNED
Attachment #9234768 - Attachment description: WIP: Bug 1597790 - WIP use SIMD in inlined memory.{fill,copy} → Bug 1597790 - Use SIMD in inlined memory.{fill,copy} ops. r?lth

Yury, IIUC your test is checking size=128/129 byte copies and fills. Would it be possible to test on smaller sizes? Like 16,17,32,33 bytes? Just curious if that changes the picture. Would also be curious if we could test this on ARM64 to ensure we get reasonable codegen there too.

Flags: needinfo?(ydelendik)

(In reply to Ryan Hunt [:rhunt] from comment #5)

Yury, IIUC your test is checking size=128/129 byte copies and fills. Would it be possible to test on smaller sizes? Like 16,17,32,33 bytes? Just curious if that changes the picture. Would also be curious if we could test this on ARM64 to ensure we get reasonable codegen there too.

The benchmark checked length from 1 to 65, I think. The 128/129 is destination offset. ARM64 is pending.

Flags: needinfo?(ydelendik)
Pushed by ydelendik@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e8b9b45734fa
Use SIMD in inlined memory.{fill,copy} ops. r=rhunt
Status: ASSIGNED → RESOLVED
Closed: 1 month ago
Resolution: --- → FIXED
Target Milestone: --- → 93 Branch
You need to log in before you can comment on or make changes to this bug.