Bulk memory callout operations on shared memory are quite slow
Categories
(Core :: JavaScript: WebAssembly, enhancement, P3)
People
(Reporter: lth, Unassigned, NeedInfo)
References
(Blocks 1 open bug)
Details
I was running some tests on memory.fill on shared memory64, and on my workstation the test that fills 4GB of shared memory takes about 20s, while the test that fills 4GB of nonshared memory takes 3s.
The reason is that memset (which is used on unshared memory) is massively optimized while our fill function for shared memory (which is C++-UB-safe) is a bit clunky. The functions for memory.copy and memory.init (both use memcpy on unshared memory) are assumed to have the same problem; to be verified.
Since our code generators know how to generate good machine code for these operations (for the non-callout case) and this code would be C++-UB-safe, it should be possible for us to do better than we currently do for the shared memory ops.
Comment 1 • 3 years ago (Reporter)
memcpySafeWhenRacy actually seems to be competitive with memcpy, so this may be an artifact of memsetSafeWhenRacy.
Comment 2 • 1 year ago
I couldn't reproduce these numbers. On my system, filling shared memory was actually slightly faster than filling unshared memory (I'm not sure what to make of that):
$ hyperfine -w 3 -L file memory-fill-shared2,memory-fill2 "obj-js-release-x86_64-pc-linux-gnu/dist/bin/js -f js/src/jit-test/tests/wasm/memory64/{file}.js"
Benchmark 1: obj-js-release-x86_64-pc-linux-gnu/dist/bin/js -f js/src/jit-test/tests/wasm/memory64/memory-fill-shared2.js
Time (mean ± σ): 1.017 s ± 0.016 s [User: 0.200 s, System: 0.815 s]
Range (min … max): 0.984 s … 1.041 s 10 runs
Benchmark 2: obj-js-release-x86_64-pc-linux-gnu/dist/bin/js -f js/src/jit-test/tests/wasm/memory64/memory-fill2.js
Time (mean ± σ): 1.077 s ± 0.009 s [User: 0.139 s, System: 0.936 s]
Range (min … max): 1.069 s … 1.094 s 10 runs
Summary
obj-js-release-x86_64-pc-linux-gnu/dist/bin/js -f js/src/jit-test/tests/wasm/memory64/memory-fill-shared2.js ran
1.06 ± 0.02 times faster than obj-js-release-x86_64-pc-linux-gnu/dist/bin/js -f js/src/jit-test/tests/wasm/memory64/memory-fill2.js
I used the following code as the test file memory-fill-shared2.js:
var ins = new WebAssembly.Instance(new WebAssembly.Module(wasmTextToBinary(`
(module
(memory (export "mem") i64 65537 65537 shared)
(func (export "f") (param $p i64) (param $c i32) (param $n i64)
(memory.fill (local.get $p) (local.get $c) (local.get $n))))`)));
ins.exports.f(BigInt(0x0), 0x41, BigInt(0x100000100));
and the same code without the shared attribute for memory-fill2.js.
Setup: the JS interpreter built with optimizations enabled and debugging disabled, on Linux, in a 10-vCPU QEMU/KVM VM (the host has an Intel i7-11700 CPU).
I'm aware this is a crude way to test the performance of memory.fill, since it also measures the overhead of starting the JS interpreter, but I don't (yet) know how to use the performance tools in the codebase to benchmark JS functions.
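One possible way to keep interpreter startup out of the measurement is to time the fill calls in-process. The following is only a minimal sketch under several assumptions, not the method used in this bug: it uses a hand-assembled wasm32 module (not memory64) with a 16 MiB memory rather than 4GB, so that it does not depend on the shell-only wasmTextToBinary helper, and the names makeFillInstance and timeFill are hypothetical. It also does one untimed warm-up fill, since the hyperfine runs above report mostly system time, which may reflect first-touch page faults rather than the fill loop itself.

```javascript
// Hypothetical helper: instantiate a wasm32 module exporting
//   f(dst: i32, value: i32, len: i32)  -> memory.fill
//   mem                                -> a 256-page (16 MiB) memory
// The memory is shared or unshared depending on the flag.
function makeFillInstance(shared) {
  const limitsFlag = shared ? 0x03 : 0x01; // 0x03 = shared with max, 0x01 = max only
  const bytes = new Uint8Array([
    0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic, version
    0x01, 0x07, 0x01, 0x60, 0x03, 0x7f, 0x7f, 0x7f, 0x00, // type: (i32,i32,i32)->()
    0x03, 0x02, 0x01, 0x00,                               // one function of type 0
    0x05, 0x06, 0x01, limitsFlag, 0x80, 0x02, 0x80, 0x02, // memory: min=max=256 pages
    0x07, 0x0b, 0x02,                                     // two exports
    0x01, 0x66, 0x00, 0x00,                               //   "f"   -> func 0
    0x03, 0x6d, 0x65, 0x6d, 0x02, 0x00,                   //   "mem" -> memory 0
    0x0a, 0x0d, 0x01, 0x0b, 0x00,                         // code: one body, no locals
    0x20, 0x00, 0x20, 0x01, 0x20, 0x02,                   //   local.get 0, 1, 2
    0xfc, 0x0b, 0x00,                                     //   memory.fill (memory 0)
    0x0b,                                                 //   end
  ]);
  return new WebAssembly.Instance(new WebAssembly.Module(bytes));
}

// Hypothetical helper: mean milliseconds per memory.fill call.
// nbytes must not exceed 16 MiB (the memory size), or the call traps.
function timeFill(shared, nbytes, iterations) {
  const now = typeof performance !== "undefined"
    ? () => performance.now()
    : () => Date.now();
  const ins = makeFillInstance(shared);
  ins.exports.f(0, 0x41, nbytes); // untimed warm-up: touch the pages once
  const t0 = now();
  for (let i = 0; i < iterations; i++) {
    ins.exports.f(0, 0x41, nbytes);
  }
  return (now() - t0) / iterations;
}
```

Comparing `timeFill(true, 16 * 1024 * 1024, 20)` with `timeFill(false, 16 * 1024 * 1024, 20)` in the same process would then give per-call figures for the shared and unshared cases; whether a 16 MiB fill is representative of the 4GB case reported here is an open question.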
Is this behavior still reproducible? (I can't ask @lth directly because it appears his account is disabled).