The chunk allocator currently hides low-level use of the OS allocation functions behind an interface that doesn't expose enough knobs on Windows. We would be far better served by hyper-optimizing the three relevant allocators directly. On Windows this would allow us to use MEM_RESERVE in the first two allocation passes and only use MEM_COMMIT in the last allocation, where we get our final 1MiB chunk. This should be somewhat faster and might alleviate some of the thread contention we currently see.
We can do the same thing in jemalloc (the chunk allocation code is the same in both js and jemalloc).
(In reply to Justin Lebar [:jlebar] from comment #1)
> We can do the same thing in jemalloc (the chunk allocation code is the same
> in both js and jemalloc).

There are two complications related to the race here. The first is that this will race with any allocation in the system using the same path, even allocation that is behind the jemalloc spinlock (which we don't take anymore, for performance reasons). The second is that jemalloc only uses this path for large (>1MiB) allocations that are aligned, and aligned on a boundary greater than the chunk size. So in practice we should not be contending much, even with multiple threads doing normal mallocs like crazy in the background.

Thus: while gross, this isn't really a problem for the JS engine so much as for jemalloc, and only in rare cases (which is probably why it has not seen much love). What we have is clearly suboptimal, but it appears to mostly work okay for the moment. I think we should leave this alone for now, unless we have a specific test case or workload where this is killing us.
I'm not sure I follow, but are you saying that we rarely hit the case where we allocate 2MiB and then re-allocate 1MiB within it? This should be easy to test, no?
...I have no idea how much less expensive a MEM_RESERVE is than a MEM_COMMIT. If the difference is small, then there's not much room for improvement here.