Last Comment Bug 676457 - Avoid wasted space in PLArenaPools due to jemalloc rounding up
: Avoid wasted space in PLArenaPools due to jemalloc rounding up
Status: RESOLVED FIXED
[MemShrink:P2][clownshoes]
:
Product: NSPR
Classification: Components
Component: NSPR (show other bugs)
: other
: All All
: P1 normal with 2 votes (vote)
: 4.9
Assigned To: Nicholas Nethercote [:njn] (on vacation until July 11)
:
Mentors:
: 414088 (view as bug list)
Depends on:
Blocks: DarkMatter
  Show dependency treegraph
 
Reported: 2011-08-03 21:08 PDT by Nicholas Nethercote [:njn] (on vacation until July 11)
Modified: 2011-08-20 06:22 PDT (History)
21 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
instrumentation patch (3.12 KB, patch)
2011-08-03 21:08 PDT, Nicholas Nethercote [:njn] (on vacation until July 11)
no flags Details | Diff | Review
plarena fix (1.55 KB, patch)
2011-08-04 00:02 PDT, Nicholas Nethercote [:njn] (on vacation until July 11)
no flags Details | Diff | Review
plarena fix, v2 (2.71 KB, patch)
2011-08-08 17:47 PDT, Nicholas Nethercote [:njn] (on vacation until July 11)
no flags Details | Diff | Review
plarena fix, v3, by Nicholas Nethercote (2.40 KB, patch)
2011-08-13 12:58 PDT, Wan-Teh Chang
n.nethercote: review+
Details | Diff | Review

Description Nicholas Nethercote [:njn] (on vacation until July 11) 2011-08-03 21:08:08 PDT
Created attachment 550598 [details] [diff] [review]
instrumentation patch

This is much the same as bug 675150 -- we try to allocate power-of-two-sized blocks, but then the header is added and we end up rounding up again to the next power-of-two size, wasting almost half the space.

This bug is a duplicate of bug 414088, but I want to avoid that bug's baggage.  This bug will also render bug 419131 obsolete when it's fixed.

The attached instrumentation patch prints out the total size (requested and actual) of all PLArenas.  Starting the browser and opening Gmail, at the peak PLArena usage point we have 3,380,402 bytes requested for PLArenas, and 6,504,512 actually allocated.  I.e. we're allocating 1.92x more than we need to.

I plan to fix this in the simplest possible way -- just reduce pool->arenasize by the header+slop size so that the allocation requests are all powers-of-two.
Comment 1 Justin Lebar (not reading bugmail) 2011-08-03 21:16:03 PDT
It would be awesome if you made PL_ArenaAllocate warn in debug builds if it wastes a lot of space due to clownshoes.  Maybe you can only do this when we have malloc_usable_size.
Comment 2 Nicholas Nethercote [:njn] (on vacation until July 11) 2011-08-03 21:21:58 PDT
Well, most of the existing PLArenaPools use a power-of-two as their arena size, it's just that the plarena.c code then adds to that.  So once plarena.c is fixed I don't think there's much of a problem.
Comment 3 Nicholas Nethercote [:njn] (on vacation until July 11) 2011-08-03 21:48:33 PDT
BTW, in the JSArena version of this bug I'm asserting that the pool constructor's pass in a power-of-two arena size, thus guaranteeing no space is wasted.  But I don't know if I can do that for NSPR -- since Mozilla isn't its only user, I'm uncertain whether API changes like that are acceptable.
Comment 4 Nicholas Nethercote [:njn] (on vacation until July 11) 2011-08-04 00:02:33 PDT
Created attachment 550615 [details] [diff] [review]
plarena fix

A simple patch.  With this patch applied I re-measured gmail startup.    3,182,126 bytes are requested, and 3,252,128 bytes are allocated.  The two numbers don't match exactly because some plarenas in Gecko pass in a non-power-of-two size to PL_InitArenaPool, and so suffer some rounding up even with the patch.
Comment 5 Nicholas Nethercote [:njn] (on vacation until July 11) 2011-08-04 00:17:38 PDT
Some numbers from about:memory, after loading Gmail and then hitting "minimize memory usage" several times:

                      pre-patch         post-patch
                      ---------         ----------
explicit              116.2 MB          113.9MB
heap-unclassified      42.1 MB (36.3%)  37.3 MB (32.7%)


I only did each one once and there's probably some noise, but that ~3MB drop sure is nice.  It looks like having a slightly smaller netsize in each plarena hasn't hurt at all.
Comment 6 Nicholas Nethercote [:njn] (on vacation until July 11) 2011-08-04 18:37:47 PDT
*** Bug 414088 has been marked as a duplicate of this bug. ***
Comment 7 Nicholas Nethercote [:njn] (on vacation until July 11) 2011-08-04 18:38:20 PDT
*** Bug 419131 has been marked as a duplicate of this bug. ***
Comment 8 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-08-08 12:34:35 PDT
What if size <= sizeof(PLArena) - pool->mask? I think that we should only use this new allocation strategy when size is much larger than the overhead (e.g. size <= PR_MAX(128, size of overhead).

Also, the savings is for all platforms, right? If so, it would be great to get this into the next Aurora.
Comment 9 Nicholas Nethercote [:njn] (on vacation until July 11) 2011-08-08 13:54:47 PDT
(In reply to Brian Smith (:bsmith) from comment #8)
> What if size <= sizeof(PLArena) - pool->mask? I think that we should only
> use this new allocation strategy when size is much larger than the overhead
> (e.g. size <= PR_MAX(128, size of overhead).

|sizeof(PLArena)| is 4 words -- 16 bytes on 32-bit platforms, 32 bytes on 64-bit platforms.  |pool->mask| is usually 7.  So you're asking "what if size <= 25?"  Maybe you meant |+| instead of |-|?  Either way, I'm fine to do something like that because it's an unusual case -- it's pretty silly to pass a |size| that small.  For example, in Firefox I think all the instances have |size| of 1024 or greater.


> Also, the savings is for all platforms, right? If so, it would be great to
> get this into the next Aurora.

AFAIK the saving is for all platforms, but it might be smaller on 32-bit platforms due to the smaller pointers.

I'd also like to land this ASAP, but I have no idea what the expectations are in terms of review turnaround time for NSPR.
Comment 10 Justin Lebar (not reading bugmail) 2011-08-08 13:59:40 PDT
Not only does it have to get reviewed and then landed into NSPR's CVS, but we then have to pull the new version of NSPR.
Comment 11 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-08-08 14:49:08 PDT
(In reply to Nicholas Nethercote [:njn] from comment #9)
> (In reply to Brian Smith (:bsmith) from comment #8)
> > What if size <= sizeof(PLArena) - pool->mask? I think that we should only
> > use this new allocation strategy when size is much larger than the overhead
> > (e.g. size <= PR_MAX(128, size of overhead).
> 
> |sizeof(PLArena)| is 4 words -- 16 bytes on 32-bit platforms, 32 bytes on
> 64-bit platforms.  |pool->mask| is usually 7.  So you're asking "what if
> size <= 25?"  Maybe you meant |+| instead of |-|?  Either way, I'm fine to
> do something like that because it's an unusual case -- it's pretty silly to
> pass a |size| that small.  For example, in Firefox I think all the instances
> have |size| of 1024 or greater.

In your patch, you have:
    
    pool->netsize = size - sizeof(PLArena) - pool->mask;

It isn't obvious how we guarantee that this won't underflow in the (unreasonable but possible) case where size is small. We must ensure that size >= sizeof(PLArena) and that size - sizeof(PLArena) >= pool->mask. We cannot return an error in those cases (though maybe we should warn?) for backward compatibility.

> I'd also like to land this ASAP, but I have no idea what the expectations
> are in terms of review turnaround time for NSPR.

Wan-Teh is the best reviewer for this code. 

> Not only does it have to get reviewed and then landed into NSPR's CVS,
> but we then have to pull the new version of NSPR.

I think we still have time to get this into the next release of Firefox. I will ask Wan-Teh whether he would prefer to release 4.8.9 as is or whether we can delay that to take an (updated) patch, if we get that patch soon.
Comment 12 Nicholas Nethercote [:njn] (on vacation until July 11) 2011-08-08 17:47:12 PDT
Created attachment 551643 [details] [diff] [review]
plarena fix, v2

This patch addresses Brian's concern about arenas with a very small |size|.
Comment 13 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-08-09 01:35:28 PDT
Instead of documenting that |size| is usually a power of two, we could just do this optimization only if we find |size| is a power of two. This would minimize disruption for callers that implement some hack to solve this problem themselves (e.g. passing size == 2^n - 64 for some n).
Comment 14 Nicholas Nethercote [:njn] (on vacation until July 11) 2011-08-09 02:22:33 PDT
I'm happy with any solution that avoids all the waste.  Brian, do you want me to implement your suggestion?
Comment 15 Kyle Huey [:khuey] (khuey@mozilla.com) 2011-08-11 18:21:25 PDT
Ted, is this something you could help with reviews on?  We have evidence that we're potentially wasting *a lot* of space with this.
Comment 16 Nicholas Nethercote [:njn] (on vacation until July 11) 2011-08-11 19:10:10 PDT
Brian Smith has been emailing me about this.  He spoke to Wan-teh:

"I spoke with Wan-Teh today. He doesn't have time to review the patches immediately and he doesn't want to coordinate an NSPR release with just this fix."

So, spot-fixes like the one in bug 678422 seem the best thing for now.
Comment 17 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-08-11 20:14:42 PDT
I filed bug 678434 to track wallpaper fixes in Gecko like bug 678422. Let's keep this bug focused on fixes internal to NSPR.
Comment 18 Wan-Teh Chang 2011-08-11 20:17:26 PDT
Comment on attachment 551643 [details] [diff] [review]
plarena fix, v2

I can understand why the allocator rounds up 17 to 32,
but I am surprised that the allocator rounds up 8K + 1
to 16K, as opposed to 9K (the next multiple of 1K) or
12K (the next multiple of 4K).

In any case I will review this and check this in, and
this will be pushed to mozilla-central within a few
weeks.  (I push NSPR updates to mozilla-central regularly.)
Comment 19 Boris Zbarsky [:bz] (Out June 25-July 6) 2011-08-11 20:25:38 PDT
The allocator rounds 8K+1 to 12K.

But the allocator rounds 1K+1 to 2K and 256+1 to 272.

That is to say, the rounding behavior of the allocator is not particularly simple.  But allocations of size 2^N are always satisfied exactly, so those are safe no matter what N is.
Comment 20 Nicholas Nethercote [:njn] (on vacation until July 11) 2011-08-11 20:36:56 PDT
And jemalloc rounds 4K up to 8K.  Here are jemalloc's size classes, from memory/jemalloc/jemalloc.c:

 *   |=====================================|
 *   | Category | Subcategory    |    Size |
 *   |=====================================|
 *   | Small    | Tiny           |       2 |
 *   |          |                |       4 |
 *   |          |                |       8 |
 *   |          |----------------+---------|
 *   |          | Quantum-spaced |      16 |
 *   |          |                |      32 |
 *   |          |                |      48 |
 *   |          |                |     ... |
 *   |          |                |     480 |
 *   |          |                |     496 |
 *   |          |                |     512 |
 *   |          |----------------+---------|
 *   |          | Sub-page       |    1 kB |
 *   |          |                |    2 kB |
 *   |=====================================|
 *   | Large                     |    4 kB |
 *   |                           |    8 kB |
 *   |                           |   12 kB |
 *   |                           |     ... |
 *   |                           | 1012 kB |
 *   |                           | 1016 kB |
 *   |                           | 1020 kB |
 *   |=====================================|
 *   | Huge                      |    1 MB |
 *   |                           |    2 MB |
 *   |                           |    3 MB |
 *   |                           |     ... |
 *   |=====================================|

Certain values get doubled if you go 1 byte over:  512, 1kB, 2kB, 4kB, 1MB.

Wan-teh, when you push the change to m-c will you mention it in this bug?  We want to undo spot-fixes like bug 678422 when that happens.
Comment 21 Wan-Teh Chang 2011-08-12 09:00:46 PDT
Thank you very much for the info on jemalloc's behavior.

I will try to review the patch this weekend.  I will post
the mozilla-central patchset push URL in this bug.
Comment 22 Wan-Teh Chang 2011-08-13 12:58:04 PDT
Created attachment 552897 [details] [diff] [review]
plarena fix, v3, by Nicholas Nethercote

I didn't rename the 'arenasize' member of the PLArena structure
because the comment already says it is the "net exact size of a
new arena".

I updated the comment for PL_InitArenaPool to point out that the
'size' argument now means the gross size and should be a power of
two to avoid the heap allocator rounding the size up.

Nicholas, please review the comments to make sure they are accurate.

Patch checked in on the NSPR CVS trunk (NSPR 4.9).

Checking in plarena.c;
/cvsroot/mozilla/nsprpub/lib/ds/plarena.c,v  <--  plarena.c
new revision: 3.19; previous revision: 3.18
done
Checking in plarenas.h;
/cvsroot/mozilla/nsprpub/lib/ds/plarenas.h,v  <--  plarenas.h
new revision: 3.9; previous revision: 3.8
done
Comment 23 Wan-Teh Chang 2011-08-13 13:11:21 PDT
(In reply to Brian Smith (:bsmith) from comment #13)
> Instead of documenting that |size| is usually a power of two, we could just
> do this optimization only if we find |size| is a power of two. This would
> minimize disruption for callers that implement some hack to solve this
> problem themselves (e.g. passing size == 2^n - 64 for some n).

This is a good idea, but probably not worth the effort.
A caller that passes size = 2^n - 64 will get a net size
of 2^n - 64 - (16 or 32), so it will lose only 16 or 32
bytes (the header consists of four pointers) per arena.
Comment 24 Wan-Teh Chang 2011-08-19 09:30:26 PDT
The patch was pushed to mozilla-central yesterday:
http://hg.mozilla.org/mozilla-central/rev/4ab18c9e38b1

Note You need to log in before you can comment on or make changes to this bug.