Closed Bug 414088 Opened 12 years ago Closed 9 years ago

Investigate FrameArena sizing

Categories

(Core :: Layout, defect, P2)

defect

Tracking

()

RESOLVED DUPLICATE of bug 676457

People

(Reporter: bzbarsky, Assigned: moz)

References

(Depends on 1 open bug)

Details

(Keywords: memory-footprint, perf)

Attachments

(1 file, 2 obsolete files)

It looks like we pass 4096 as the size to PLArena in FrameArena.  But PLArena has some sort of allocation overhead, so this might not be the optimal size to use.
Flags: blocking1.9?
Worth answering one way or the other before ship
Flags: blocking1.9? → blocking1.9+
Priority: -- → P2
Attached patch wip (obsolete) — Splinter Review
This is a naive fix that would suit arena pool consumers that
already specifies a target size that is optimal for malloc and
that doesn't require that the arena actually has the requested size.
(I think some consumers requires that though, nsFixedSizeAllocator?)

With this patch the internal frame arena size will be 4061
which leads to a PR_MALLOC of 4096.
The non-naive fix is to:
1. Arrange for PLArena to keep header data separate from the arena; and then
2. Run some tests to see what the best appropriate size for FrameArenas are.
3. Take out this fix and set the arena size of FrameArenas appropriately.

I am working on the first task, and can also do the latter two if someone wants to open bug for it.
We won't block on this, but it would be great if Robin or someone else can come up with some answers for 1.9.
Flags: wanted1.9+
Flags: blocking1.9-
PLArena code is very similar to JSArena code.  One was a cut-and-paste of the other at one point, I expect.  There are two bugs for a problem similar to this one in the JSArena code: bug 408921 and bug 415967.  I can port whatever solutions are devised in those to PLArena code.  Assign this bug to me if you want me to do that, please.
With pleasure!
Assignee: nobody → moz
Status: NEW → ASSIGNED
Flags: tracking1.9+
Increasing priority from P4 to P2 to match wanted1.9+ flag.
Priority: P4 → P2
The FrameArena has a simple histogram of memory allocations sizes.

Start-and-stop the browser test:
(4115,4608) 61

Tdhtml:
(4115,4608) 2212

The wasted space is 61*(4608-4115) = 30073 bytes, in the first case.  For Tdhtml, it is 2212*(4608-4115) = 1090516.  Yikes!  1MB!

The effect of this is made much less severe by my patch for bug 419131.  The above stats were collected without it.  Nonetheless, this problem begs a solution.  I intend to solve it the same way as I did bug 420476.  In other words, I prefer to modify NSPR for all pools, not merely FrameArena.

Patch coming soon....
FrameArena folks: does it make sense to use 4K as the arena size for the FrameArena pool when merely starting the browser requires 61 4K pages?  (Not meant to be a rhetorical question.)  Maybe your arena size should be larger?
There's one per rendered document (we usually create 6 just starting Firefox) and we don't want each document to hit us too hard. I'm not sure how document sizes vary in real life. I have a feeling there are quite a few small ones (e.g. documents are framesets containing a few FRAMEs or IFRAMEs).
(In reply to comment #9)
> The wasted space is 61*(4608-4115) = 30073 bytes, in the first case.  For
> Tdhtml, it is 2212*(4608-4115) = 1090516.  Yikes!  1MB!

The above portion of my post is misleading.  The posted histograms were not high water marks for the FrameArena pool.  Please disregard the conclusion that my post leads you to infer; namely, that the FrameArena consumes 2212 * 4K during the Tdhtml test.  It does not necessarily.

What the numbers mean is more complicated, but who cares - this bug is about fixing the malloc's so that they are not requests for 4115 bytes, but rather requests for 4096 bytes.  I'm fixing that, now...
(In reply to comment #11)
> There's one per rendered document (we usually create 6 just starting Firefox)
> and we don't want each document to hit us too hard. I'm not sure how document
> sizes vary in real life. I have a feeling there are quite a few small ones
> (e.g. documents are framesets containing a few FRAMEs or IFRAMEs).

When you say there's "one per rendered document", do you mean "one FrameArena pool" or "one 4K allocation from the FrameArena pool"?
I implemented a solution similar to that found in bug 420476.  I did this in NSPR, so it applies to all allocators using PLArenaPools, not just to FrameArena.

Here are my results.  For the 'start and stop the browser test' without my patch:
numMallocs:    900
numReallocs:   0
numFrees:      895
hwmRequested:  830744
hwmReal:       977168

For the same test, with my patch:
numMallocs:    900
numReallocs:   0
numFrees:      894
hwmRequested:  820480
hwmReal:       849888

So, the number of calls to malloc is entirely unchanged, but the amount of real memory consumed at the high water mark is reduced by over 100K, a reduction of 15%.

Again, I give histograms like so:
(x,y) z
This means that there were z allocations requesting x bytes from malloc, which really reserved y bytes for each of those allocations.

Histogram for start-stop test without the patch:
(279,288) 305
(287,288) 1
(291,304) 20
(303,304) 21
(535,1024) 2
(547,1024) 1
(559,1024) 30
(599,1024) 1
(887,1024) 16
(1043,1536) 37
(1047,1536) 83
(1095,1536) 3
(2067,2560) 50
(2071,2560) 21
(2083,2560) 14
(2595,3072) 4
(3619,4096) 164
(4115,4608) 66
(4871,5120) 41
(8211,8704) 5
(8215,8704) 15

With the patch:
(256,256) 313
(272,272) 20
(288,288) 21
(512,512) 2
(528,1024) 1
(544,1024) 30
(592,1024) 1
(864,1024) 16
(1024,1024) 114
(1072,1536) 3
(2048,2048) 70
(2064,2560) 14
(2576,3072) 4
(3600,4096) 164
(4096,4096) 66
(4848,5120) 41
(8192,8192) 20
With the malloc_usable_size functionality from bug 419131 activated, the numbers for the start-stop test are:

With patch (with 419131):
numMallocs:    888
numReallocs:   0
numFrees:      882
hwmRequested:  805104
hwmReal:       825824

Without patch (with 419131):
numMallocs:    797
numReallocs:   0
numFrees:      793
hwmRequested:  716884
hwmReal:       831504

So, the smaller arenas are increasing the number of calls to malloc, but the high water mark of real memory use is slightly less with both patches applied.

This means that investigation into the use of larger arena sizes is advised.
The cumulative effects of the patch from this bug and the patch from bug 419131 are favourable on Tdhtml memory use:

Before either patch:
numMallocs:    35090
numReallocs:   0
numFrees:      35085
hwmRequested:  2145505
hwmReal:       2421520

After both:
numMallocs:    34519
numReallocs:   0
numFrees:      34513
hwmRequested:  2047232
hwmReal:       2137824

Real memory savings at the high water mark of memory use: 13%.
Attached patch Draft patch, for reading only. (obsolete) — Splinter Review
Please comment on the patch.

Can someone suggest a reviewer for this patch?  The code changes are all in NSPR.
Attachment #299524 - Attachment is obsolete: true
Attachment #308378 - Flags: review?
Blocks: 422245
This patch cannot be applied as-is, because it depends on the patch from bug 419131.  However, all of the changes that I want to make are visible in this attachment.
Attachment #308378 - Attachment is obsolete: true
Attachment #308744 - Flags: review?(wtc)
Attachment #308378 - Flags: review?
Depends on: 419131
OS: Linux → All
Hardware: PC → All
Bug 166701 is causing problems with the FrameArena's arena sizes.

Basically, the PLArena code maintains a global freelist of unused arenas.  The arenas on that list are not all of the same size.  The algorithm which selects an arena from the freelist only considers whether the current allocation will fit in the free arena, not whether the PLArenaPool's arena size matches the size of the free arena.  So, some pool's steal free arenas that are of the "wrong" size.  This is potentially VERY bad, and could result in enormous fragmentation and wastage.

So, I investigated.  I found that while the potential for some serious overconsumption of memory is there, things are not that bad.  That's for a current build of Firefox.

At the high water mark of memory use during a 'start and stop the browser test', there were 4 PLArenaPools named "FrameArena".  One of them looked like this:
            arenaSize: 4096
            arenas:
               "32d1000" (4115,4608)
               "344ba00" (1043,1536)
               "344b400" (1043,1536)
               "343e000" (1043,1536)
               "343f000" (4115,4608)
               "3422200" (4115,4608)
               "3421000" (4115,4608)
               "3397600" (1043,1536)
               "33c2400" (4115,4608)
               "331d800" (4115,4608)
               "3322c00" (4115,4608)
               "32dbe00" (1043,1536)
               "32ae400" (4115,4608)
               ... a few dozen more arenas ...

One can see that a bunch of arenas of size (1043,1536) have snuck into the arena pool which is supposed to have arenas of size 4096.

The magnitude of this problem is not severe, but be aware that FrameArena and others are at risk of appearing to overconsume.  Fragmentation is also a problem.

I hope to quell worry with this post, but also want to apply some pressure on bug 166701.
No longer blocks: 422245
Comment on attachment 308744 [details] [diff] [review]
Reviewable patch, to be applied after patch from bug 419131

The idea of this patch isn't clear.  Could you explain it?  For example,
why is MALLOC_MIN_ALIGNMENT a minimum alignment when you
allow 'align' to be <= MALLOC_MIN_ALIGNMENT?
MALLOC_MIN_ALIGNMENT is the minimum alignment that we expect from malloc.  malloc will always give at least that alignment on all platforms, for the size of allocation that we make.  Assertions are there to confirm that.

MALLOC_MIN_ALIGNMENT could perhaps have a name which better conveys that.  Suggestions?

The idea of the patch is to take advantage of the fact that the memory returned by malloc is always aligned to MALLOC_MIN_ALIGNMENT bytes, and then base the remaining alignment math on that.
If you need alignment guarantees, use posix_memalign() rather than malloc().  As written, FlexibleMalloc() will fail for requestedSize <= 4 when jemalloc is enabled.
(In reply to comment #22)
> If you need alignment guarantees, use posix_memalign() rather than malloc(). 
> As written, FlexibleMalloc() will fail for requestedSize <= 4 when jemalloc is
> enabled.

Jason, thanks for the suggestion.  The MALLOC_MIN_ALIGNMENT stuff is not so much asking for alignment as it is there to show that the alignment that is already provided by malloc suffices.  The alignment code that was already there was not taking advantage of that, and was over-allocating by asking for too much.

posix_memalign could allow us to provide greater guarantees of alignment, but there are no callers who require it.

requestedSize will never be less than MALLOC_MIN_ALIGNMENT.  The assertion that says:
+    PR_ASSERT(pool->arenasize >= MALLOC_MIN_ALIGNMENT);
reminds us of that.  (Arenas are never smaller than arenasize.)  Though, I could easily add another assertion, closer to the malloc.  I think that that would be useful.

Thanks!
Comment on attachment 308744 [details] [diff] [review]
Reviewable patch, to be applied after patch from bug 419131

Canceling review until I find a better solution.
Attachment #308744 - Flags: review?(wtc)
(In reply to comment #23)
> (In reply to comment #22)
> posix_memalign could allow us to provide greater guarantees of alignment, but
> there are no callers who require it.

This may be untrue.  NSPR has not only mozilla code as its user.  We don't know if all of the callers do not require alignment larger than MALLOC_MIN_ALIGNMENT.  So, my solution may not be feasible, in part because NSPR may not easily support posix_memalign.

I may devise another solution, outside of NSPR, for this bug.
I'm dup'ing this forward to bug 676457 to avoid this bug's baggage.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 676457
You need to log in before you can comment on or make changes to this bug.