Last Comment Bug 414088 - Investigate FrameArena sizing
: Investigate FrameArena sizing
Status: RESOLVED DUPLICATE of bug 676457
: footprint, perf
Product: Core
Classification: Components
Component: Layout (show other bugs)
: Trunk
: All All
: P2 normal (vote)
: ---
Assigned To: Robin Bate Boerop
:
: Jet Villegas (:jet)
Mentors:
Depends on: 419131
Blocks:
  Show dependency treegraph
 
Reported: 2008-01-25 17:59 PST by Boris Zbarsky [:bz] (still a bit busy)
Modified: 2011-08-06 02:36 PDT (History)
16 users (show)
roc: blocking1.9-
roc: wanted1.9+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
wip (4.30 KB, patch)
2008-01-26 22:12 PST, Mats Palmgren (:mats)
no flags Details | Diff | Splinter Review
Draft patch, for reading only. (3.82 KB, patch)
2008-03-09 22:47 PDT, Robin Bate Boerop
no flags Details | Diff | Splinter Review
Reviewable patch, to be applied after patch from bug 419131 (5.19 KB, patch)
2008-03-11 16:00 PDT, Robin Bate Boerop
no flags Details | Diff | Splinter Review

Description Boris Zbarsky [:bz] (still a bit busy) 2008-01-25 17:59:11 PST
It looks like we pass 4096 as the size to PLArena in FrameArena.  But PLArena has some sort of allocation overhead, so this might not be the optimal size to use.
Comment 1 Mike Schroepfer 2008-01-26 09:16:19 PST
Worth answering one way or the other before ship
Comment 2 Mats Palmgren (:mats) 2008-01-26 22:09:39 PST
Yep, it leads to a PR_MALLOC of 4131 bytes.  The overhead varies though:
http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/nsprpub/lib/ds/plarena.c&rev=3.14&root=/cvsroot&mark=126-129,227#107
Comment 3 Mats Palmgren (:mats) 2008-01-26 22:12:01 PST
Created attachment 299524 [details] [diff] [review]
wip

This is a naive fix that would suit arena pool consumers that
already specifies a target size that is optimal for malloc and
that doesn't require that the arena actually has the requested size.
(I think some consumers requires that though, nsFixedSizeAllocator?)

With this patch the internal frame arena size will be 4061
which leads to a PR_MALLOC of 4096.
Comment 4 Robin Bate Boerop 2008-01-29 17:45:07 PST
The non-naive fix is to:
1. Arrange for PLArena to keep header data separate from the arena; and then
2. Run some tests to see what the best appropriate size for FrameArenas are.
3. Take out this fix and set the arena size of FrameArenas appropriately.

I am working on the first task, and can also do the latter two if someone wants to open bug for it.
Comment 5 Robert O'Callahan (:roc) (email my personal email if necessary) 2008-02-28 18:18:33 PST
We won't block on this, but it would be great if Robin or someone else can come up with some answers for 1.9.
Comment 6 Robin Bate Boerop 2008-02-29 08:43:39 PST
PLArena code is very similar to JSArena code.  One was a cut-and-paste of the other at one point, I expect.  There are two bugs for a problem similar to this one in the JSArena code: bug 408921 and bug 415967.  I can port whatever solutions are devised in those to PLArena code.  Assign this bug to me if you want me to do that, please.
Comment 7 Robert O'Callahan (:roc) (email my personal email if necessary) 2008-02-29 13:29:55 PST
With pleasure!
Comment 8 Robin Bate Boerop 2008-03-07 11:15:43 PST
Increasing priority from P4 to P2 to match wanted1.9+ flag.
Comment 9 Robin Bate Boerop 2008-03-09 20:40:26 PDT
The FrameArena has a simple histogram of memory allocations sizes.

Start-and-stop the browser test:
(4115,4608) 61

Tdhtml:
(4115,4608) 2212

The wasted space is 61*(4608-4115) = 30073 bytes, in the first case.  For Tdhtml, it is 2212*(4608-4115) = 1090516.  Yikes!  1MB!

The effect of this is made much less severe by my patch for bug 419131.  The above stats were collected without it.  Nonetheless, this problem begs a solution.  I intend to solve it the same way as I did bug 420476.  In other words, I prefer to modify NSPR for all pools, not merely FrameArena.

Patch coming soon....
Comment 10 Robin Bate Boerop 2008-03-09 20:43:16 PDT
FrameArena folks: does it make sense to use 4K as the arena size for the FrameArena pool when merely starting the browser requires 61 4K pages?  (Not meant to be a rhetorical question.)  Maybe your arena size should be larger?
Comment 11 Robert O'Callahan (:roc) (email my personal email if necessary) 2008-03-09 20:48:55 PDT
There's one per rendered document (we usually create 6 just starting Firefox) and we don't want each document to hit us too hard. I'm not sure how document sizes vary in real life. I have a feeling there are quite a few small ones (e.g. documents are framesets containing a few FRAMEs or IFRAMEs).
Comment 12 Robin Bate Boerop 2008-03-09 21:11:14 PDT
(In reply to comment #9)
> The wasted space is 61*(4608-4115) = 30073 bytes, in the first case.  For
> Tdhtml, it is 2212*(4608-4115) = 1090516.  Yikes!  1MB!

The above portion of my post is misleading.  The posted histograms were not high water marks for the FrameArena pool.  Please disregard the conclusion that my post leads you to infer; namely, that the FrameArena consumes 2212 * 4K during the Tdhtml test.  It does not necessarily.

What the numbers mean is more complicated, but who cares - this bug is about fixing the malloc's so that they are not requests for 4115 bytes, but rather requests for 4096 bytes.  I'm fixing that, now...
Comment 13 Robin Bate Boerop 2008-03-09 21:13:01 PDT
(In reply to comment #11)
> There's one per rendered document (we usually create 6 just starting Firefox)
> and we don't want each document to hit us too hard. I'm not sure how document
> sizes vary in real life. I have a feeling there are quite a few small ones
> (e.g. documents are framesets containing a few FRAMEs or IFRAMEs).

When you say there's "one per rendered document", do you mean "one FrameArena pool" or "one 4K allocation from the FrameArena pool"?
Comment 14 Robin Bate Boerop 2008-03-09 22:02:08 PDT
I implemented a solution similar to that found in bug 420476.  I did this in NSPR, so it applies to all allocators using PLArenaPools, not just to FrameArena.

Here are my results.  For the 'start and stop the browser test' without my patch:
numMallocs:    900
numReallocs:   0
numFrees:      895
hwmRequested:  830744
hwmReal:       977168

For the same test, with my patch:
numMallocs:    900
numReallocs:   0
numFrees:      894
hwmRequested:  820480
hwmReal:       849888

So, the number of calls to malloc is entirely unchanged, but the amount of real memory consumed at the high water mark is reduced by over 100K, a reduction of 15%.

Again, I give histograms like so:
(x,y) z
This means that there were z allocations requesting x bytes from malloc, which really reserved y bytes for each of those allocations.

Histogram for start-stop test without the patch:
(279,288) 305
(287,288) 1
(291,304) 20
(303,304) 21
(535,1024) 2
(547,1024) 1
(559,1024) 30
(599,1024) 1
(887,1024) 16
(1043,1536) 37
(1047,1536) 83
(1095,1536) 3
(2067,2560) 50
(2071,2560) 21
(2083,2560) 14
(2595,3072) 4
(3619,4096) 164
(4115,4608) 66
(4871,5120) 41
(8211,8704) 5
(8215,8704) 15

With the patch:
(256,256) 313
(272,272) 20
(288,288) 21
(512,512) 2
(528,1024) 1
(544,1024) 30
(592,1024) 1
(864,1024) 16
(1024,1024) 114
(1072,1536) 3
(2048,2048) 70
(2064,2560) 14
(2576,3072) 4
(3600,4096) 164
(4096,4096) 66
(4848,5120) 41
(8192,8192) 20
Comment 15 Robin Bate Boerop 2008-03-09 22:11:01 PDT
With the malloc_usable_size functionality from bug 419131 activated, the numbers for the start-stop test are:

With patch (with 419131):
numMallocs:    888
numReallocs:   0
numFrees:      882
hwmRequested:  805104
hwmReal:       825824

Without patch (with 419131):
numMallocs:    797
numReallocs:   0
numFrees:      793
hwmRequested:  716884
hwmReal:       831504

So, the smaller arenas are increasing the number of calls to malloc, but the high water mark of real memory use is slightly less with both patches applied.

This means that investigation into the use of larger arena sizes is advised.
Comment 16 Robin Bate Boerop 2008-03-09 22:39:19 PDT
The cumulative effects of the patch from this bug and the patch from bug 419131 are favourable on Tdhtml memory use:

Before either patch:
numMallocs:    35090
numReallocs:   0
numFrees:      35085
hwmRequested:  2145505
hwmReal:       2421520

After both:
numMallocs:    34519
numReallocs:   0
numFrees:      34513
hwmRequested:  2047232
hwmReal:       2137824

Real memory savings at the high water mark of memory use: 13%.
Comment 17 Robin Bate Boerop 2008-03-09 22:47:05 PDT
Created attachment 308378 [details] [diff] [review]
Draft patch, for reading only.

Please comment on the patch.

Can someone suggest a reviewer for this patch?  The code changes are all in NSPR.
Comment 18 Robin Bate Boerop 2008-03-11 16:00:53 PDT
Created attachment 308744 [details] [diff] [review]
Reviewable patch, to be applied after patch from bug 419131

This patch cannot be applied as-is, because it depends on the patch from bug 419131.  However, all of the changes that I want to make are visible in this attachment.
Comment 19 Robin Bate Boerop 2008-03-12 16:49:29 PDT
Bug 166701 is causing problems with the FrameArena's arena sizes.

Basically, the PLArena code maintains a global freelist of unused arenas.  The arenas on that list are not all of the same size.  The algorithm which selects an arena from the freelist only considers whether the current allocation will fit in the free arena, not whether the PLArenaPool's arena size matches the size of the free arena.  So, some pool's steal free arenas that are of the "wrong" size.  This is potentially VERY bad, and could result in enormous fragmentation and wastage.

So, I investigated.  I found that while the potential for some serious overconsumption of memory is there, things are not that bad.  That's for a current build of Firefox.

At the high water mark of memory use during a 'start and stop the browser test', there were 4 PLArenaPools named "FrameArena".  One of them looked like this:
            arenaSize: 4096
            arenas:
               "32d1000" (4115,4608)
               "344ba00" (1043,1536)
               "344b400" (1043,1536)
               "343e000" (1043,1536)
               "343f000" (4115,4608)
               "3422200" (4115,4608)
               "3421000" (4115,4608)
               "3397600" (1043,1536)
               "33c2400" (4115,4608)
               "331d800" (4115,4608)
               "3322c00" (4115,4608)
               "32dbe00" (1043,1536)
               "32ae400" (4115,4608)
               ... a few dozen more arenas ...

One can see that a bunch of arenas of size (1043,1536) have snuck into the arena pool which is supposed to have arenas of size 4096.

The magnitude of this problem is not severe, but be aware that FrameArena and others are at risk of appearing to overconsume.  Fragmentation is also a problem.

I hope to quell worry with this post, but also want to apply some pressure on bug 166701.
Comment 20 Wan-Teh Chang 2008-03-13 13:50:14 PDT
Comment on attachment 308744 [details] [diff] [review]
Reviewable patch, to be applied after patch from bug 419131

The idea of this patch isn't clear.  Could you explain it?  For example,
why is MALLOC_MIN_ALIGNMENT a minimum alignment when you
allow 'align' to be <= MALLOC_MIN_ALIGNMENT?
Comment 21 Robin Bate Boerop 2008-03-13 13:56:40 PDT
MALLOC_MIN_ALIGNMENT is the minimum alignment that we expect from malloc.  malloc will always give at least that alignment on all platforms, for the size of allocation that we make.  Assertions are there to confirm that.

MALLOC_MIN_ALIGNMENT could perhaps have a name which better conveys that.  Suggestions?

The idea of the patch is to take advantage of the fact that the memory returned by malloc is always aligned to MALLOC_MIN_ALIGNMENT bytes, and then base the remaining alignment math on that.
Comment 22 Jason Evans 2008-03-13 14:52:25 PDT
If you need alignment guarantees, use posix_memalign() rather than malloc().  As written, FlexibleMalloc() will fail for requestedSize <= 4 when jemalloc is enabled.
Comment 23 Robin Bate Boerop 2008-03-13 18:06:39 PDT
(In reply to comment #22)
> If you need alignment guarantees, use posix_memalign() rather than malloc(). 
> As written, FlexibleMalloc() will fail for requestedSize <= 4 when jemalloc is
> enabled.

Jason, thanks for the suggestion.  The MALLOC_MIN_ALIGNMENT stuff is not so much asking for alignment as it is there to show that the alignment that is already provided by malloc suffices.  The alignment code that was already there was not taking advantage of that, and was over-allocating by asking for too much.

posix_memalign could allow us to provide greater guarantees of alignment, but there are no callers who require it.

requestedSize will never be less than MALLOC_MIN_ALIGNMENT.  The assertion that says:
+    PR_ASSERT(pool->arenasize >= MALLOC_MIN_ALIGNMENT);
reminds us of that.  (Arenas are never smaller than arenasize.)  Though, I could easily add another assertion, closer to the malloc.  I think that that would be useful.

Thanks!
Comment 24 Robin Bate Boerop 2008-03-13 21:42:05 PDT
Comment on attachment 308744 [details] [diff] [review]
Reviewable patch, to be applied after patch from bug 419131

Canceling review until I find a better solution.
Comment 25 Robin Bate Boerop 2008-03-13 21:43:15 PDT
(In reply to comment #23)
> (In reply to comment #22)
> posix_memalign could allow us to provide greater guarantees of alignment, but
> there are no callers who require it.

This may be untrue.  NSPR has not only mozilla code as its user.  We don't know if all of the callers do not require alignment larger than MALLOC_MIN_ALIGNMENT.  So, my solution may not be feasible, in part because NSPR may not easily support posix_memalign.

I may devise another solution, outside of NSPR, for this bug.
Comment 26 Nicholas Nethercote [:njn] 2011-08-04 18:37:47 PDT
I'm dup'ing this forward to bug 676457 to avoid this bug's baggage.

*** This bug has been marked as a duplicate of bug 676457 ***

Note You need to log in before you can comment on or make changes to this bug.