Remove the data header from wasm GC inline array allocations
Categories
(Core :: JavaScript: WebAssembly, enhancement, P3)
Tracking
()
People
(Reporter: bvisness, Assigned: bvisness)
References
(Blocks 1 open bug)
Details
Attachments
(3 files, 3 obsolete files)
We allocate a one-word data header at the beginning of each chunk of wasm GC data. This potentially 8-byte value only ever stores a single meaningful bit, meaning we waste anywhere from 31 to 63 bits in each array data area that could be used for elements. In particular, this means that we could have more arrays that are allocated with inline storage, since we already know that GC programs are heavy on small arrays.
We can remove the data header by simply setting a low bit in the array data pointer when the data is out of line, then masking this bit off on access. The cost for this should be quite small, hopefully negligible, since it is a single-cycle op with a small encoding.
| Assignee | ||
Comment 1•1 month ago
|
||
Updated•1 month ago
|
| Assignee | ||
Comment 2•1 month ago
|
||
My initial attempt at this was a mixed bag - Kotlin and Dart workloads got a small win, but j2cl got a substantial loss. To me this suggests that Kotlin and Dart benefited from more arrays using inline storage, but that the cost of accessing arrays was made a fair amount worse, particularly hitting j2cl because that workload is mostly just looping over arrays.
I'm not entirely sure what makes the masking so costly, but one candidate is register allocation - many operations need additional temps. We could tighten this up somewhat, but it seems dubious that we could get much of a win from e.g. some careful use of useRegisterAtStart.
If we can come up with other approaches to save space that don't involve masking the pointer on load, that might be the better choice.
| Assignee | ||
Comment 3•1 month ago
|
||
Updated•1 month ago
|
| Assignee | ||
Comment 4•1 month ago
|
||
Updated•1 month ago
|
Updated•1 month ago
|
Updated•1 month ago
|
| Assignee | ||
Comment 5•1 month ago
|
||
After some experimentation, the masking approach didn't seem to pay off, but Ryan had an idea that allows us to remove the data header from inline arrays regardless. It is simply:
- Don't allocate the data header for inline arrays.
- Change how we interpret the data header: null -> out of line, non-null -> inline.
- Make the out-of-line data header null.
This works because inline elements always follow the array's data pointer, and arrays always have a non-null data pointer. This is nice and simple, simplifies our bookkeeping, and doesn't affect codegen except by slightly simplifying allocation.
This is implemented in D267984.
Updated•1 month ago
|
Comment 6•1 month ago
|
||
I think unfortunately, the comment 5 scheme is going to conflict with the
change in bug 1995834, which is in turn a fix on top of bug 1995106, which is
needed to make the stackmap machinery work properly with BufferAlloc'd OOL
pointers. I tried but failed to think of a way to reconcile the two sets of
changes.
We should revisit this in a month or so and see if there's a way to reconcile
the two.
| Assignee | ||
Comment 7•15 days ago
|
||
We have yet another scheme for this, which may be addressed in bug 1995106: just do "direct forwarding" for both inline and out-of-line arrays.
In short: whenever we move array data, either inline or out of line, write the new data pointer at old_data[-1]. Then, whenever we find an array data pointer on the stack, simply update it to data[-1].
In full:
- Allocate inline array data with no data header.
- Allocate out-of-line array data with a one-word data header.
- When moving an inline WasmArrayObject, write
new_array.dataintoold_array.data. Because the inline array data immediately follows the WAO's data pointer, this places the new data address at old_data[-1]. - When moving out-of-line array data (allocated by the buffer allocator), write the new data pointer into the old trailer's data header, again placing the new data address at old_data[-1].
- Finally, when updating pointers on the stack, just update them like so: data = data[-1].
The one thing I'm not clear on is whether it's ok in the OOL case to write the actual data address instead of the actual allocation address. With the data header in there, alloc[0] is the same as data[-1]. It seems like this should be workable though.
Comment 8•13 days ago
•
|
||
Following further discussion on Matrix, it seems this won't quite work because
of the unclarity mentioned in the previous comment. The GC/BufferAlloc
forwarding machinery really wants to deal with start-of-block pointers. Here's
a revised proposal from Jon/Ben which avoids that problem, and means we don't
need to change the GC/BufferAlloc forwarding machinery at all.
Proposed layout -- outside of GC times. "#e+p" is #elems + 4 bytes padding.
| Shape* | STV* | #e+p | data* | ...payload... | (IL case)
| | |
| \-->-/
|
| | header | ...payload... | (OOL case)
| |
\------->-------/
After obj_move -- OOL case
oldRoot
|
| Shape* | STV* | #e+p | data* | | header | ...payload... | (old)
| | |
\--------->---------/
|
v newRoot
| |
| Shape* | STV* | #e+p | data* | | header | ...payload... | (new)
| |
\--------->---------/
In this case, the GC (BufferAllocator) will write the new address of the
payload block in the first word of the old payload block, as illustrated (the
arrow from old header to new header). This write will be done by
calling Nursery::setForwardingPointer in the object moved hook.
Hence, old payload data pointers that we find on the stack can be updated as
newRoot = oldRoot[-1] + 1 where -1/+1 move backwards/forwards one word.
So now the game is to kludge up the IL case to make it look as much as possible
like the OOL case. In particular, the trick is to update the old data* to
make it be the address of (not the contents of) the new data* field.
After obj_move -- IL case
oldRoot
|
| Shape* | STV* | #e+p | data* | ...payload... | (old)
|
v newRoot
| |
| Shape* | STV* | #e+p | data* | ...payload... | (new)
| |
\-->-/
So we still have newRoot = oldRoot[-1] + 1. And so it is possible to update
payload pointers without having to know whether they are to IL or OOL blocks.
Comment 9•11 days ago
|
||
Commit message pending.
Updated•11 days ago
|
Updated•6 days ago
|
Updated•5 days ago
|
Comment 10•5 days ago
|
||
This patch allows the Nursery's forwarded-buffer machinery (the
direct-forwarding variant only) to forward a machine word that it does not
interpret.
In practice this is trivially done by having two new routines to set and get
the word. Existing forwarding code and uses thereof are unchanged. The new
routines do retain the assertion that the buffer that has been forwarded is
within the Nursery.
The use cases, WasmArrayObject::obj_moved and Instance::updateFrameForMovingGC,
which previously did such forwarding "manually", have been updated to use the
new routines.
Description
•