Closed Bug 616310 Opened 9 years ago Closed 9 years ago

JM: reduce fragmentation in ExecutableAllocator

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- beta9+

People

(Reporter: njn, Assigned: njn)

References

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(1 file, 2 obsolete files)

Bug 615199 comment 36 indicates that JM's code allocation has a reasonable amount of fragmentation, ie. in one example 74.9MB of code took up 88.4MB of space.  And that's on Linux, the situation is probably worse on Windows because the minimum allocation size on Windows is larger.

I have some ideas to improve this.
In what follows, I'll talk about two sizes of things: chunks and pages.
- Linux: chunk = 16KB, page = 4KB
- Windows: chunk = 64KB, page = 64KB (effectively, because VirtualAlloc
  always 64KB-aligns, apparently)

Sources of waste:
- "lwaste":  When we do a large allocation (> chunk size) we round the size
  up to the nearest page size.

- When we do a non-large allocation which doesn't fit in the
  small pool, we allocate a new chunk-sized pool.  

  - "swaste":  If the space left in the pool is greater than that left in
    the small pool, we replace the existing small pool with the new pool.
    This wastes the space in the existing small pool.

  - "nwaste":  Otherwise, the excess space in the new pool is wasted.

- "owaste": if a small pool is only half-filled and we stop allocating,
  there's wasted space at the end.  This is the least important, because
  it could be used later on if we start allocating again.
  
Possible changes:
- Keeping N small pools around instead of 1 will help with "swaste" and
  "nwaste".

- Changing the chunk and page size can help with "lwaste".

Both of these potentially increase "owaste".

I implemented N small pools and did some measurements, outlined below.  My
conclusion is that we should use 4 small pools and increase the chunk size
to 64KB on all platforms.  On Linux that reduces the over-allocation factor
from 1.12x to 1.03x.  On Windows it reduces from 1.19x to 1.09x.  In both
cases it saves ~400KB per tab.  It's a shame the windows "page" size is so
big, that hurts some.

One thing I haven't managed to do is see changes using 'top'.  Not sure why, though there is quite some variability -- subsequent runs often vary by 10s of MBs.

(All measurements are on Linux64, opening 40 tabs, each containing a random
cad-comic.com comic.)

-----------------------------------------------------------------------------
Linux current (chunk = 16KB, page = 4KB)
-----------------------------------------------------------------------------
nPools=1:
- askedFor: 183,232,046
- alloc'd:  205,996,032 (1.12x)
- lwaste:     5,683,824
- swaste:    12,842,208
- nwaste:     2,889,560
- owaste:     1,348,394

nPools=4:
- askedFor: 183,471,893
- alloc'd:  193,081,344 (1.05x)
- lwaste:     5,653,303
- swaste:     2,406,968
- nwaste:       195,002
- owaste:     1,354,178

nPools=16:
- askedFor: 183,827,960
- alloc'd:  191,299,584 (1.04x)
- lwaste:     5,650,322
- swaste:       398,320
- nwaste:         2,286
- owaste:     1,420,696

-----------------------------------------------------------------------------
Windows current (chunk = 64KB, page = 64KB)
-----------------------------------------------------------------------------
nPools=1:
- askedFor: 183,663,492
- alloc'd:  218,955,776 (1.19x)
- lwaste:    12,207,391
- swaste:    13,741,528
- nwaste:     5,471,857
- owaste:     3,871,508

nPools=4:
- askedFor: 184,612,482 (1.09x)
- alloc'd:  201,719,808
- lwaste:    12,207,391
- swaste:     1,128,904
- nwaste:             0
- owaste:     3,771,031

nPools=16:
- askedFor: 183,506,524
- alloc'd:  199,688,192 (1.09x)
- lwaste:    12,207,391
- swaste:        44,584
- nwaste:             0
- owaste:     3,929,693

-----------------------------------------------------------------------------
Linux improved (chunk = 64KB, page = 4KB)
-----------------------------------------------------------------------------
nPools=4:
- askedFor: 183,216,900
- alloc'd:  188,682,240 (1.03x)
- lwaste:       677,151
- swaste:     1,014,128
- nwaste:             0
- owaste:     3,774,061
Attachment #494946 - Flags: review?(jseward)
Looks good.  One comment:

+        // Try to fit in an existing small allocator
+        for (size_t i = 0; i < m_smallAllocationPools.length(); i++) {
+            ExecutablePool *pool = m_smallAllocationPools[i];
+            if (n < pool->available()) {
+                pool->addRef();
+                return pool;
+            }
+        }

Here you've got a first-fit strategy.  A bit of casual web surfing
suggests that worst-fit (check all chunks, use the one with the
largest free space) results in less fragmentation overall.  Did you
try that?
(In reply to comment #2)
> 
> Here you've got a first-fit strategy.  A bit of casual web surfing
> suggests that worst-fit (check all chunks, use the one with the
> largest free space) results in less fragmentation overall.
> Did you try that?

No, good suggestion.  Thinking about it now, I would have thought best-fit (use the one with the least free space) would work better, the idea being that big allocations cause the most problems, so catering for them gives the best results.  This situation is a little different to the usual situation because it's a pool and so memory isn't being freed again... I'll try both.
Best-fit is the easy winner:

first-fit
- askedFor: 183,975,061
- alloc'd:  189,534,208 (1.03x)
- lwaste:       644,671
- swaste:     1,027,288
- nwaste:             0
- owaste:     3,887,188

best-fit
- askedFor: 183,942,189
- alloc'd:  188,354,560 (1.02x)
- lwaste:       644,671
- swaste:        30,992
- nwaste:             0
- owaste:     3,736,708

worst-fit
- askedFor: 183,838,374
- alloc'd:  196,546,560 (1.07x)
- lwaste:       644,671
- swaste:     7,972,376
- nwaste:             0
- owaste:     4,091,139

This doesn't surprise me.  In this allocator, we waste the most space when we throw away a small pool that still has lots of space left in it, which happens when we do a moderately large allocation.  By using best-fit, we find the smallest hole that'll fit our required size, leaving bigger holes open for later.

Whereas in a normal allocator, unallocated space is never discarded, so small holes become a problem, so worst-fit works well because it avoids creating small holes.
The same patch, this time using best-fit to choose a small pool.
Attachment #494946 - Attachment is obsolete: true
Attachment #495444 - Flags: review?(jseward)
Attachment #494946 - Flags: review?(jseward)
This patch has a more concise formulation of the best-fit search.  It also changes |n < pool->available()| to |n <= pool->available()| in the best-fit search, which seems reasonable.
Attachment #495444 - Attachment is obsolete: true
Attachment #495446 - Flags: review?(jseward)
Attachment #495444 - Flags: review?(jseward)
Attachment #495446 - Flags: review?(jseward) → review+
blocking2.0: --- → beta9+
http://hg.mozilla.org/tracemonkey/rev/8921e3faccd2
Whiteboard: fixed-in-tracemonkey
http://hg.mozilla.org/mozilla-central/rev/8921e3faccd2
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.