Closed Bug 616310 Opened 14 years ago Closed 14 years ago

JM: reduce fragmentation in ExecutableAllocator

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- beta9+

People

(Reporter: n.nethercote, Assigned: n.nethercote)

References

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(1 file, 2 obsolete files)

Bug 615199 comment 36 indicates that JM's code allocation has a reasonable amount of fragmentation, ie. in one example 74.9MB of code took up 88.4MB of space. And that's on Linux, the situation is probably worse on Windows because the minimum allocation size on Windows is larger. I have some ideas to improve this.
In what follows, I'll talk about two sizes of things: chunks and pages. - Linux: chunk = 16KB, page = 4KB - Windows: chunk = 64KB, page = 64KB (effectively, because VirtualAlloc always 64KB-aligns, apparently) Sources of waste: - "lwaste": When we do a large allocation (> chunk size) we round the size up to the nearest page size. - When we do a non-large allocation which doesn't fit in the small pool, we allocate a new chunk-sized pool. - "swaste": If the space left in the pool is greater than that left in the small pool, we replace the existing small pool with the new pool. This wastes the space in the existing small pool. - "nwaste": Otherwise, the excess space in the new pool is wasted. - "owaste": if a small pool is only half-filled and we stop allocating, there's wasted space at the end. This is the least important, because it could be used later on if we start allocating again. Possible changes: - Keeping N small pools around instead of 1 will help with "swaste" and "nwaste". - Changing the chunk and page size can help with "lwaste". Both of these potentially increase "owaste". I implemented N small pools and did some measurements, outlined below. My conclusion is that we should use 4 small pools and increase the chunk size to 64KB on all platforms. On Linux that reduces the over-allocation factor from 1.12x to 1.03x. On Windows it reduces from 1.19x to 1.09x. In both cases it saves ~400KB per tab. It's a shame the windows "page" size is so big, that hurts some. One thing I haven't managed to do is see changes using 'top'. Not sure why, though there is quite some variability -- subsequent runs often vary by 10s of MBs. (All measurements are on Linux64, opening 40 tabs, each containing a random cad-comic.com comic.) ----------------------------------------------------------------------------- Linux current (chunk = 16KB, page = 4KB) ----------------------------------------------------------------------------- nPools=1: - askedFor: 183,232,046 - alloc'd: 205,996,032 (1.12x) - lwaste: 5,683,824 - swaste: 12,842,208 - nwaste: 2,889,560 - owaste: 1,348,394 nPools=4: - askedFor: 183,471,893 - alloc'd: 193,081,344 (1.05x) - lwaste: 5,653,303 - swaste: 2,406,968 - nwaste: 195,002 - owaste: 1,354,178 nPools=16: - askedFor: 183,827,960 - alloc'd: 191,299,584 (1.04x) - lwaste: 5,650,322 - swaste: 398,320 - nwaste: 2,286 - owaste: 1,420,696 ----------------------------------------------------------------------------- Windows current (chunk = 64KB, page = 64KB) ----------------------------------------------------------------------------- nPools=1: - askedFor: 183,663,492 - alloc'd: 218,955,776 (1.19x) - lwaste: 12,207,391 - swaste: 13,741,528 - nwaste: 5,471,857 - owaste: 3,871,508 nPools=4: - askedFor: 184,612,482 (1.09x) - alloc'd: 201,719,808 - lwaste: 12,207,391 - swaste: 1,128,904 - nwaste: 0 - owaste: 3,771,031 nPools=16: - askedFor: 183,506,524 - alloc'd: 199,688,192 (1.09x) - lwaste: 12,207,391 - swaste: 44,584 - nwaste: 0 - owaste: 3,929,693 ----------------------------------------------------------------------------- Linux improved (chunk = 64KB, page = 4KB) ----------------------------------------------------------------------------- nPools=4: - askedFor: 183,216,900 - alloc'd: 188,682,240 (1.03x) - lwaste: 677,151 - swaste: 1,014,128 - nwaste: 0 - owaste: 3,774,061
Attachment #494946 - Flags: review?(jseward)
Looks good. One comment: + // Try to fit in an existing small allocator + for (size_t i = 0; i < m_smallAllocationPools.length(); i++) { + ExecutablePool *pool = m_smallAllocationPools[i]; + if (n < pool->available()) { + pool->addRef(); + return pool; + } + } Here you've got a first-fit strategy. A bit of casual web surfing suggests that worst-fit (check all chunks, use the one with the largest free space) results in less fragmentation overall. Did you try that?
(In reply to comment #2) > > Here you've got a first-fit strategy. A bit of casual web surfing > suggests that worst-fit (check all chunks, use the one with the > largest free space) results in less fragmentation overall. > Did you try that? No, good suggestion. Thinking about it now, I would have thought best-fit (use the one with the least free space) would work better, the idea being that big allocations cause the most problems, so catering for them gives the best results. This situation is a little different to the usual situation because it's a pool and so memory isn't being freed again... I'll try both.
Best-fit is the easy winner: first-fit - askedFor: 183,975,061 - alloc'd: 189,534,208 (1.03x) - lwaste: 644,671 - swaste: 1,027,288 - nwaste: 0 - owaste: 3,887,188 best-fit - askedFor: 183,942,189 - alloc'd: 188,354,560 (1.02x) - lwaste: 644,671 - swaste: 30,992 - nwaste: 0 - owaste: 3,736,708 worst-fit - askedFor: 183,838,374 - alloc'd: 196,546,560 (1.07x) - lwaste: 644,671 - swaste: 7,972,376 - nwaste: 0 - owaste: 4,091,139 This doesn't surprise me. In this allocator, we waste the most space when we throw away a small pool that still has lots of space left in it, which happens when we do a moderately large allocation. By using best-fit, we find the smallest hole that'll fit our required size, leaving bigger holes open for later. Whereas in a normal allocator, unallocated space is never discarded, so small holes become a problem, so worst-fit works well because it avoids creating small holes.
The same patch, this time using best-fit to choose a small pool.
Attachment #494946 - Attachment is obsolete: true
Attachment #495444 - Flags: review?(jseward)
Attachment #494946 - Flags: review?(jseward)
This patch has a more concise formulation of the best-fit search. It also changes |n < pool->available()| to |n <= pool->available()| in the best-fit search, which seems reasonable.
Attachment #495444 - Attachment is obsolete: true
Attachment #495446 - Flags: review?(jseward)
Attachment #495444 - Flags: review?(jseward)
Attachment #495446 - Flags: review?(jseward) → review+
blocking2.0: --- → beta9+
Whiteboard: fixed-in-tracemonkey
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: