Closed Bug 1591047 Opened 5 years ago Closed 5 years ago

Speed up OOL path for memory.copy and memory.fill

Categories

(Core :: JavaScript: WebAssembly, enhancement, P3)

enhancement

Tracking

()

RESOLVED FIXED
mozilla72
Tracking Status
firefox72 --- fixed

People

(Reporter: rhunt, Assigned: rhunt)

References

Details

Attachments

(7 files, 1 obsolete file)

Bug 1570112 indicates that the overhead to our OOL implementations of memory.copy and memory.fill can be significant. I have patches to improve this and I'm moving them to this separate bug to reduce the noise on that issue.

This commit removes the lock from the shared Wasm buffer that protects the length
of the buffer that is exposed to Wasm. The important correctness guarantee that must
be upheld is that a write to length is observed after the new pages have been
committed. If this is not upheld, some Wasm code may query the length and
perform a write within the reported bounds that then traps.

There is a lack of documentation on the memory ordering guarantees of memmap/
VirtualAlloc, so it's difficult to understand whether this change is safe or
not.

I would be surprised if memmap/VirtualAlloc didn't follow a strict ordering,
but I have been surprised many times before.

Whether a module uses shared memory or not is fixed throughout its lifetime. We
can use this to specialize the implementation of memCopy/memFill and remove a
branch on the memory type. This will also be useful when acquiring the memory
length in a future commit, which will require different code per shared-ness.

This commit declares WasmArrayRawBuffer in the ArrayBufferObject header. This
will allow WasmInstance to access the buffer in a future commit.

Currently the Wasm memory length for non-shared memory is stored in a slot of
the ArrayBuffer. This commit tracks it in the WasmArrayRawBuffer as well, for
use in a future commit.

This commit uses the previous commits to actually optimize the OOL
implementations.

This is done by:

  • Passing the heap base pointer to each builtin
  • Acquiring the WasmArrayRawBuffer/SharedArrayRawBuffer from this pointer
    • This is trivial as they are embedded a fixed offset before the Wasm heap
  • Acquiring the heap length from the raw buffer

By doing this, we avoid cache misses from accessing:
TLSData -> Instance -> WasmMemoryObject -> ArrayBufferObject -> WasmArrayRawBuffer

This is enough to get close enough to parity with V8 for small sizes. Further
improvements should be done with an inline generated code path.

The baseline compiler is able to directly acquire the heap base pointer using
HeapReg on non-x86 platforms. Ion will use HeapReg from the CodeGenerator for
appropriate instructions, but there's no way to use it on the MIR level for an
instance call. This commit adds a HeapBase MIR node for optimized acquisition
of the heap base for memCopy/memFill.

There are essentially two independent functional changes here.

  1. Use atomics to acquire the heap length from SharedArrayRawBuffer
  2. Pass the heap base pointer to the builtins, use that to directly access the raw buffers (which are fixed relative to the heap base) and acquire the heap length

The speed-ups result from avoiding a lot of indirection to acquire the heap base and heap length, and avoiding overhead of acquiring a lock.

I understand that (1) is more controversial, so if it really is too much of a risk we can drop it and still get some speed-up from (2).

Attachment #9104182 - Attachment description: Bug 1591047 part 1 - Remove locking from acquiring length of shared Wasm memory. r?lth → Bug 1591047 part 1 - Don't lock when acquiring length of shared Wasm memory. r?lth
Attachment #9104187 - Attachment is obsolete: true

I've figured out the Win64 issue. Adding a pointer argument to memCopy/memFill causes us to start passing by stack on Win64. This exposes an issue in our use of ABIArgType in GenerateBuiltinThunks. Will have a patch to fix it soon.

When adding a pointer to WasmInstance::memCopy, it appears that we ran out of
GPRs for calling and the pointer needed to be passed via stack.

GenerateBuiltinThunk uses ABIFunctionType/ABIArgType, and converts it to
MIRType for calling StackCopy [1]. ArgType_General is treated as being equal to
MIRType::Int32, leading to only half of the pointer being passed correctly on
64bit window systems.

This means that all instance functions currently have an incorrect
ABIFunctionType but avoid this issue because they don't have enough parameters
or don't have a 64bit value that is passed by the stack.

We have to use ABIFunctionType here for compatibility with the ARM/ARM64
simulators, otherwise it would be convenient to use a different representation.

This commit:

  1. Adds an ArgType_Pointer which is equivalent to MIRType::Pointer
  2. Adds a helper constexpr for defining ABIFunctionType
  3. Fixes the ABIFunctionType used for instance calls
  4. Fixes the simulators to recognize the new ABIFunctionType's
  5. Adds an assertion that the SymbolicAddressSignature and ABIFunctionType
    are compatible.

Depends on D50378

Pushed by rhunt@eqrion.net: https://hg.mozilla.org/integration/autoland/rev/911c7062f83f part 1 - Don't lock when acquiring length of shared Wasm memory. r=lth https://hg.mozilla.org/integration/autoland/rev/2a4f82299920 part 2 - Split memCopy/memFill implementations for shared/non-shared modules. r=lth https://hg.mozilla.org/integration/autoland/rev/08f232c06c37 part 3 - Expose WasmArrayRawBuffer from 'ArrayBufferObject.h'. r=lth https://hg.mozilla.org/integration/autoland/rev/6aa480e46a9f part 4 - Track Wasm memory length in WasmArrayRawBuffer. r=lth https://hg.mozilla.org/integration/autoland/rev/756e84749f99 part 5 - Pass heapBase to memCopy/memFill and use that to acquire length. r=lth https://hg.mozilla.org/integration/autoland/rev/3cffff8e9f47 part 6 - Add ABIArgType::Pointer and use it for builtin functions. r=lth
Pushed by lhansen@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/6603dda2bbad part 5 - [mips] Pass heapBase to memCopy/memFill and use that to acquire length. r=lth
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: