Speed up OOL path for memory.copy and memory.fill
Categories
(Core :: JavaScript: WebAssembly, enhancement, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox72 | --- | fixed |
People
(Reporter: rhunt, Assigned: rhunt)
References
Details
Attachments
(7 files, 1 obsolete file)
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review |
Bug 1570112 indicates that the overhead to our OOL implementations of memory.copy and memory.fill can be significant. I have patches to improve this and I'm moving them to this separate bug to reduce the noise on that issue.
Assignee | ||
Comment 1•5 years ago
|
||
This commit removes the lock from the shared Wasm buffer that protects the length
of the buffer that is exposed to Wasm. The important correctness guarantee that must
be upheld is that a write to length is observed after the new pages have been
committed. If this is not upheld, some Wasm code may query the length and
perform a write within the reported bounds that then traps.
There is a lack of documentation on the memory ordering guarantees of memmap/
VirtualAlloc, so it's difficult to understand whether this change is safe or
not.
I would be surprised if memmap/VirtualAlloc didn't follow a strict ordering,
but I have been surprised many times before.
Assignee | ||
Comment 2•5 years ago
|
||
Whether a module uses shared memory or not is fixed throughout its lifetime. We
can use this to specialize the implementation of memCopy/memFill and remove a
branch on the memory type. This will also be useful when acquiring the memory
length in a future commit, which will require different code per shared-ness.
Assignee | ||
Comment 3•5 years ago
|
||
This commit declares WasmArrayRawBuffer in the ArrayBufferObject header. This
will allow WasmInstance to access the buffer in a future commit.
Assignee | ||
Comment 4•5 years ago
|
||
Currently the Wasm memory length for non-shared memory is stored in a slot of
the ArrayBuffer. This commit tracks it in the WasmArrayRawBuffer as well, for
use in a future commit.
Assignee | ||
Comment 5•5 years ago
|
||
This commit uses the previous commits to actually optimize the OOL
implementations.
This is done by:
- Passing the heap base pointer to each builtin
- Acquiring the WasmArrayRawBuffer/SharedArrayRawBuffer from this pointer
- This is trivial as they are embedded a fixed offset before the Wasm heap
- Acquiring the heap length from the raw buffer
By doing this, we avoid cache misses from accessing:
TLSData -> Instance -> WasmMemoryObject -> ArrayBufferObject -> WasmArrayRawBuffer
This is enough to get close enough to parity with V8 for small sizes. Further
improvements should be done with an inline generated code path.
Assignee | ||
Comment 6•5 years ago
|
||
The baseline compiler is able to directly acquire the heap base pointer using
HeapReg on non-x86 platforms. Ion will use HeapReg from the CodeGenerator for
appropriate instructions, but there's no way to use it on the MIR level for an
instance call. This commit adds a HeapBase MIR node for optimized acquisition
of the heap base for memCopy/memFill.
Assignee | ||
Comment 7•5 years ago
|
||
There are essentially two independent functional changes here.
- Use atomics to acquire the heap length from SharedArrayRawBuffer
- Pass the heap base pointer to the builtins, use that to directly access the raw buffers (which are fixed relative to the heap base) and acquire the heap length
The speed-ups result from avoiding a lot of indirection to acquire the heap base and heap length, and avoiding overhead of acquiring a lock.
I understand that (1) is more controversial, so if it really is too much of a risk we can drop it and still get some speed-up from (2).
Updated•5 years ago
|
Updated•5 years ago
|
Assignee | ||
Comment 8•5 years ago
|
||
Assignee | ||
Comment 9•5 years ago
|
||
I've figured out the Win64 issue. Adding a pointer argument to memCopy/memFill causes us to start passing by stack on Win64. This exposes an issue in our use of ABIArgType in GenerateBuiltinThunks. Will have a patch to fix it soon.
Assignee | ||
Comment 10•5 years ago
|
||
When adding a pointer to WasmInstance::memCopy, it appears that we ran out of
GPRs for calling and the pointer needed to be passed via stack.
GenerateBuiltinThunk uses ABIFunctionType/ABIArgType, and converts it to
MIRType for calling StackCopy [1]. ArgType_General is treated as being equal to
MIRType::Int32, leading to only half of the pointer being passed correctly on
64bit window systems.
This means that all instance functions currently have an incorrect
ABIFunctionType but avoid this issue because they don't have enough parameters
or don't have a 64bit value that is passed by the stack.
We have to use ABIFunctionType here for compatibility with the ARM/ARM64
simulators, otherwise it would be convenient to use a different representation.
This commit:
- Adds an ArgType_Pointer which is equivalent to MIRType::Pointer
- Adds a helper constexpr for defining ABIFunctionType
- Fixes the ABIFunctionType used for instance calls
- Fixes the simulators to recognize the new ABIFunctionType's
- Adds an assertion that the SymbolicAddressSignature and ABIFunctionType
are compatible.
Depends on D50378
Assignee | ||
Comment 11•5 years ago
|
||
Finally got a patch together to fix it. [1]
[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=639611ce487024b0fe6c146b3171821da1a8533a
Comment 12•5 years ago
|
||
Comment 13•5 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/911c7062f83f
https://hg.mozilla.org/mozilla-central/rev/2a4f82299920
https://hg.mozilla.org/mozilla-central/rev/08f232c06c37
https://hg.mozilla.org/mozilla-central/rev/6aa480e46a9f
https://hg.mozilla.org/mozilla-central/rev/756e84749f99
https://hg.mozilla.org/mozilla-central/rev/3cffff8e9f47
Comment 14•5 years ago
|
||
Comment 15•5 years ago
|
||
Comment 16•5 years ago
|
||
bugherder |
Description
•