Consider speeding up post barriers by skipping same-chunk edges
Categories
(Core :: JavaScript: GC, enhancement, P3)
Tracking
()
People
(Reporter: sfink, Unassigned)
References
(Depends on 1 open bug, Blocks 1 open bug)
Details
We could skip recording store buffer edges if (src xor dst) < ChunkSize, because those would be intra-chunk edges (chunks are ChunkSize-aligned) and so cannot go from a tenured chunk to a nursery chunk. The check requires no memory dereferences, though it does add a few instructions to the cases where you do need to insert into the store buffer, as well as to cross-chunk cases where you end up deciding not to insert.
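As a rough illustration of the same-chunk test, here is a minimal sketch. The 1 MiB chunk size and the helper names `inSameChunk` and `mayNeedStoreBufferEntry` are assumptions made for the example, not the engine's actual barrier code.

```cpp
#include <cstdint>

// Assumed for illustration: chunks are 1 MiB and aligned to their own size.
constexpr uintptr_t ChunkSize = uintptr_t(1) << 20;

// Two addresses in the same ChunkSize-aligned chunk agree on every bit at or
// above log2(ChunkSize), so their XOR is strictly less than ChunkSize.
constexpr bool inSameChunk(uintptr_t src, uintptr_t dst) {
  return (src ^ dst) < ChunkSize;
}

// Hypothetical pre-filter: false means the edge is intra-chunk and can be
// skipped; true means the usual (more expensive) checks still have to run.
constexpr bool mayNeedStoreBufferEntry(uintptr_t src, uintptr_t dst) {
  return !inSameChunk(src, dst);
}
```

The filter only uses the two pointer values, so it can run before anything in either chunk is dereferenced.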
(Note that if we controlled virtual memory addresses such that the nursery came earlier in memory than the tenured heap, we could instead skip edges where src <= dst or (src & ChunkMask) < (dst & ChunkMask) which would remove even more unnecessary dereferences, especially if we arranged to allocate tenured chunks in decreasing address order so that the oldest objects were at the highest addresses.)
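One possible reading of the ordered-address idea, sketched under two assumptions: the nursery sits at lower addresses than every tenured chunk, and `ChunkMask` is `ChunkSize - 1` (selecting the offset within a chunk). Under those assumptions, comparing chunk base addresses covers both the `src <= dst` case and the same-chunk case. The helper names are hypothetical.

```cpp
#include <cstdint>

// Assumed for illustration: 1 MiB size-aligned chunks.
constexpr uintptr_t ChunkSize = uintptr_t(1) << 20;
// Assumed meaning: mask selecting the offset of an address within its chunk.
constexpr uintptr_t ChunkMask = ChunkSize - 1;

// Base address of the chunk containing addr.
constexpr uintptr_t chunkBase(uintptr_t addr) { return addr & ~ChunkMask; }

// With the nursery below the tenured heap, a tenured -> nursery edge always
// has its source in a strictly higher chunk than its target. Any edge whose
// source chunk is at or below the target chunk can therefore be skipped.
constexpr bool canSkipOrdered(uintptr_t src, uintptr_t dst) {
  return chunkBase(src) <= chunkBase(dst);
}
```

If tenured chunks were also allocated in decreasing address order, edges from newer tenured objects to older ones would pass this skip test as well, since the older targets would sit at higher addresses.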
Comment 1•4 years ago
(In reply to Steve Fink [:sfink] [:s:] from comment #0)
Note that if we controlled virtual memory addresses such that the nursery came earlier in memory than the tenured heap, we could instead skip edges where src <= dst or (src & ChunkMask) < (dst & ChunkMask) which would remove even more unnecessary dereferences,
I like this a lot. This shouldn't be too hard on 64-bit platforms with plenty of address space? For pointer compression it would also be necessary to better control where (certain) GC chunks are allocated...
Comment 2•4 years ago
Note that if we controlled virtual memory addresses
How much control do we have over virtual addresses? Can we encode nursery/tenured heap in a bit of the address?
Comment 3•4 years ago
Maybe we could do something similar on 64-bit to what we do for JIT code: reserve a large region of GC memory, then allocate chunks from a region within that. This would also pave the way for pointer compression.
Comment 4•4 years ago
(In reply to Jan de Mooij [:jandem] from comment #3)
If we reserve an aligned 8GB region per runtime we can allocate nursery chunks from the bottom half and tenured chunks from the top. Then bit 32 of the address will tell us whether a chunk is in the nursery or not.
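A small sketch of the bit-32 test, assuming the reservation really is 8 GiB in size and 8 GiB-aligned, with nursery chunks only in the bottom 4 GiB and tenured chunks only in the top 4 GiB. The names `TenuredBit`, `isInNursery` and `isTenuredToNurseryEdge` are made up for the example.

```cpp
#include <cstdint>

// In an 8 GiB-aligned region, bit 32 is clear throughout the bottom 4 GiB
// and set throughout the top 4 GiB.
constexpr uint64_t TenuredBit = uint64_t(1) << 32;

// Assumed layout: nursery chunks come from the bottom half of the region.
constexpr bool isInNursery(uint64_t addr) { return (addr & TenuredBit) == 0; }

// Only edges from the top half (tenured) into the bottom half (nursery)
// would need a store buffer entry.
constexpr bool isTenuredToNurseryEdge(uint64_t src, uint64_t dst) {
  return !isInNursery(src) && isInNursery(dst);
}
```

The test is only meaningful for addresses known to lie inside the reserved region; pointers to anything outside it would need a separate check.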
For pointer compression we'd put the nursery chunks at the top of the bottom half, and (halfway point - max nursery size) would be our base address.
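The base-address arithmetic could look roughly like this. Everything concrete here is an assumption for illustration: the region address, the 64 MiB maximum nursery size, and the `compress`/`decompress` helpers, which simply store a 32-bit offset from the base.

```cpp
#include <cstdint>

// Hypothetical 8 GiB-aligned reservation.
constexpr uint64_t RegionBase = 0x7f0000000000ULL;
// Boundary between the nursery half and the tenured half.
constexpr uint64_t Halfway = RegionBase + (uint64_t(4) << 30);
// Assumed maximum nursery size, chosen only for the example.
constexpr uint64_t MaxNurserySize = uint64_t(64) << 20;
// Nursery chunks occupy [CompressionBase, Halfway); tenured chunks follow.
constexpr uint64_t CompressionBase = Halfway - MaxNurserySize;

// A compressed pointer is a 32-bit offset from CompressionBase.
constexpr uint32_t compress(uint64_t addr) {
  return uint32_t(addr - CompressionBase);
}
constexpr uint64_t decompress(uint32_t offset) {
  return CompressionBase + offset;
}

// With this layout, nursery offsets are exactly those below MaxNurserySize.
constexpr bool isNurseryOffset(uint32_t offset) {
  return offset < MaxNurserySize;
}
```

With this layout, 32-bit offsets reach the nursery plus the first (4 GiB - MaxNurserySize) of the tenured half, and bit 32 of the full address still separates the two halves.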