Consider adding batch allocation functions
Categories
(Core :: Memory Allocator, enhancement)
People
(Reporter: pbone, Assigned: pbone)
References
(Blocks 4 open bugs)
Details
(Whiteboard: [sp3])
It'd be nice to amortise some allocation costs when allocating batches of objects: to be able to ask jemalloc "please allocate 25 objects of 32 bytes each" and have it amortise the lookup and locking costs across the batch.
Freeing objects in the same way could also be beneficial; we should consider what invariants we want, such as:
- Do we allow objects that weren't batch allocated to be batch freed? Can we mix objects from different batches?
- Can objects be batch freed across arenas? Different sizes?
Bug 1843932 may have a motivating test case.
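A minimal sketch of what such a batch-allocation entry point could look like. The name, signature, and the `malloc()` stand-in are all hypothetical, not a real mozjemalloc API; a real implementation would take the arena/bin lock once per batch and carve the objects out of the same run.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical API: fill aOut with aCount pointers to aSize-byte
 * objects and return how many were actually allocated. Plain malloc()
 * stands in for the amortised allocation path. */
static size_t alloc_batch(size_t aSize, size_t aCount, void **aOut) {
  size_t i;
  for (i = 0; i < aCount; i++) {
    aOut[i] = malloc(aSize);
    if (!aOut[i]) {
      break; /* partial batches are possible under memory pressure */
    }
  }
  return i;
}
```

The "25 objects of 32 bytes each" request from the description would then be `alloc_batch(32, 25, ptrs)`.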
Comment 1•1 year ago
Continuing discussion from Bug 1845130 comment 8.
(In reply to Emilio Cobos Álvarez (:emilio) from comment #8)
I'd be happy to attend such a call of course.
we could make jemalloc free things quickly if we knew they all belonged to the same run, bin, and arena. But would that restriction be so severe that such a batch API isn't useful? So what rules for a batch API give us the right amount of optimisation?
That seems hard to guarantee from style code at least. Things are usually allocated from the worker threads, which have thread-local arenas, but we don't keep around which worker thread computed the style (and even so we might share some allocations across threads).
That sounds almost as if jemalloc had some kind of visibility into a "free/not free" status of objects and could process them all in a batch. Like mark bits in a GC. Blending normal allocators and some GC-ish features (mark bits, or refcounts) is pretty exciting.
Kinda? I was arguably thinking of something a bit more dumb (just record the free calls, then do them all together, potentially more efficiently, at some point later). I don't think I have a strong opinion on how it's implemented though...
Based on this I think we can do this asynchronously.
For batched frees it sounds like we want no constraints on how the objects were allocated: they can come from different arenas and be of different allocation classes and sizes. What jemalloc can do is sort by address, which will allow it to amortise some lookups/locks. It'll know that if a sequence of addresses is in the same chunk then they belong to the same arena, and if they're in the same "run" then we only need to look up the run/bin information once. Something like:
void free_batch(void **aAddressArray, size_t aNumAddresses);
The array is allocated by the caller and may be modified by the callee. When the call terminates every non-null pointer within the array is freed.
We could do batched allocation too but in that case I would say all the resulting allocations belong to the same arena and are the same size.
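A minimal sketch of the shape described above, with plain `free()` standing in for the per-pointer work. The sort is the proposed optimisation: a real implementation would reuse the chunk/arena/run/bin lookup (and the associated lock) across runs of adjacent addresses, which this sketch does not attempt.

```c
#include <stdlib.h>

/* Order pointers by address so adjacent entries are likely to share a
 * chunk/run in a real allocator. */
static int ptr_cmp(const void *a, const void *b) {
  const void *pa = *(void *const *)a;
  const void *pb = *(void *const *)b;
  return (pa > pb) - (pa < pb);
}

/* Sketch of the proposed API: the caller owns the array, the callee may
 * modify it, and every non-null pointer is freed by the time it returns. */
void free_batch(void **aAddressArray, size_t aNumAddresses) {
  qsort(aAddressArray, aNumAddresses, sizeof(void *), ptr_cmp);
  for (size_t i = 0; i < aNumAddresses; i++) {
    if (aAddressArray[i]) { /* nulls are permitted and skipped */
      free(aAddressArray[i]);
      aAddressArray[i] = NULL; /* record that the slot was consumed */
    }
  }
}
```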
Comment 2•1 year ago
I started working on a patch that added the batching code to servo's Arc implementation, i.e. the code that collects the pointers to free before handing them to mozjemalloc. It would work that way, but I've changed my mind: it's better to add this to mozjemalloc itself, for two reasons. First, it's simpler and can capture other pointers from servo that aren't managed by Arc (I don't know if servo has something like C++'s unique_ptr, but if so it'd catch that too). Second, it can be used from other places in Firefox that free things in batches, like the cycle collector.
I'll continue work on it but with this new plan.
Comment 3•1 year ago
For comparison, libpas (WebKit's malloc) uses a fixed-size thread-local deallocation log.
When draining the log it only releases the currently held lock if the next entry needs a different one, which avoids locking and unlocking the same lock over and over again. See process_deallocation_log_with_config.
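A sketch of that lock-batching idea under stated assumptions: the names are mine (not libpas's), an integer "lock id" plus an acquisition counter stand in for a real mutex so the saving is observable, and `free()` stands in for the per-entry deallocation work.

```c
#include <stddef.h>
#include <stdlib.h>

/* One logged deallocation: which lock guards it, and the pointer. */
typedef struct {
  int lock_id;
  void *ptr;
} LogEntry;

/* Drain the log in order, holding each lock across consecutive entries
 * that need it instead of releasing and re-acquiring per entry.
 * Returns the number of lock acquisitions, so the amortisation is
 * visible: equal adjacent lock_ids cost one acquisition total. */
static size_t drain_log(LogEntry *log, size_t n) {
  size_t acquisitions = 0;
  int held = -1; /* -1 means no lock is currently held */
  for (size_t i = 0; i < n; i++) {
    if (log[i].lock_id != held) { /* switch only when a different lock is needed */
      held = log[i].lock_id;
      acquisitions++;
    }
    free(log[i].ptr); /* per-entry deallocation work */
  }
  return acquisitions;
}
```

With five entries guarded by locks `[0, 0, 0, 1, 1]`, draining takes two acquisitions instead of five.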