Closed Bug 676457 Opened 13 years ago Closed 13 years ago

Avoid wasted space in PLArenaPools due to jemalloc rounding up

Categories

(NSPR :: NSPR, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: n.nethercote, Assigned: n.nethercote)

References

(Blocks 1 open bug)

Details

(Whiteboard: [MemShrink:P2][clownshoes])

Attachments

(2 files, 2 obsolete files)

This is much the same as bug 675150 -- we try to allocate power-of-two-sized blocks, but then the header is added and we end up rounding up again to the next power-of-two size, wasting almost half the space.

This bug is a duplicate of bug 414088, but I want to avoid that bug's baggage.  This bug will also render bug 419131 obsolete when it's fixed.

The attached instrumentation patch prints out the total size (requested and actual) of all PLArenas.  Starting the browser and opening Gmail, at the peak PLArena usage point we have 3,380,402 bytes requested for PLArenas, and 6,504,512 bytes actually allocated.  That is, we're allocating 1.92x as much as we need.

I plan to fix this in the simplest possible way -- just reduce pool->arenasize by the header+slop size so that the allocation requests are all powers-of-two.
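A minimal sketch of that idea (illustrative only, not the attached patch; field names follow plarena.h and details may differ):

    /* In PL_InitArenaPool: store the net usable size so the later malloc
     * request stays at the caller's power-of-two |size|. */
    pool->arenasize = size - sizeof(PLArena) - pool->mask;

    /* In PL_ArenaAllocate: the header and slop are added back, so the total
     * handed to the allocator is the original |size| again. */
    a = (PLArena *)PR_MALLOC(sizeof(PLArena) + pool->arenasize + pool->mask);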
Assignee: wtc → nnethercote
Blocks: DarkMatter
Summary: Avoid wasted space in JSArenaPools due to jemalloc rounding up → Avoid wasted space in PLArenaPools due to jemalloc rounding up
Whiteboard: [MemShrink]
It would be awesome if you made PL_ArenaAllocate warn in debug builds if it wastes a lot of space due to clownshoes.  Maybe you can only do this when we have malloc_usable_size.
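Something like the following could sit right after the malloc in PL_ArenaAllocate -- a sketch only; |a| is the newly allocated arena and |sz| the number of bytes requested (names assumed here), and malloc_usable_size isn't available on every platform:

    #ifdef DEBUG
        /* Warn about clownshoes waste: the heap allocator returned much more
         * than the arena asked for.  Needs malloc_usable_size (<malloc.h>)
         * and fprintf (<stdio.h>). */
        size_t usable = malloc_usable_size((void *)a);
        if (usable >= sz + sz / 4) {
            fprintf(stderr, "PL_ArenaAllocate: requested %lu bytes, got %lu usable\n",
                    (unsigned long)sz, (unsigned long)usable);
        }
    #endif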
Well, most of the existing PLArenaPools use a power of two as their arena size; it's just that the plarena.c code then adds the header on top of that.  So once plarena.c is fixed I don't think there's much of a problem.
BTW, in the JSArena version of this bug I'm asserting that the pool constructors pass in a power-of-two arena size, thus guaranteeing no space is wasted.  But I don't know if I can do that for NSPR -- since Mozilla isn't its only user, I'm uncertain whether API changes like that are acceptable.
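For reference, the JSArena-side check is essentially a one-liner (a sketch, not the exact code):

    /* Require a power-of-two gross arena size at pool-construction time. */
    JS_ASSERT(size != 0 && (size & (size - 1)) == 0);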
Attached patch plarena fix (obsolete) — Splinter Review
A simple patch.  With this patch applied I re-measured Gmail startup: 3,182,126 bytes are requested, and 3,252,128 bytes are allocated.  The two numbers don't match exactly because some plarenas in Gecko pass a non-power-of-two size to PL_InitArenaPool, and so suffer some rounding up even with the patch.
Attachment #550615 - Flags: review?(wtc)
Some numbers from about:memory, after loading Gmail and then hitting "minimize memory usage" several times:

                      pre-patch          post-patch
                      ---------          ----------
explicit              116.2 MB           113.9 MB
heap-unclassified      42.1 MB (36.3%)    37.3 MB (32.7%)


I only did each one once and there's probably some noise, but that ~3MB drop sure is nice.  It looks like having a slightly smaller netsize in each plarena hasn't hurt at all.
Whiteboard: [MemShrink] → [MemShrink][clownshoes]
What if size <= sizeof(PLArena) - pool->mask? I think that we should only use this new allocation strategy when size is much larger than the overhead (e.g. size <= PR_MAX(128, size of overhead).

Also, the savings is for all platforms, right? If so, it would be great to get this into the next Aurora.
(In reply to Brian Smith (:bsmith) from comment #8)
> What if size <= sizeof(PLArena) - pool->mask? I think that we should only
> use this new allocation strategy when size is much larger than the overhead
> (e.g. size <= PR_MAX(128, size of overhead).

|sizeof(PLArena)| is 4 words -- 16 bytes on 32-bit platforms, 32 bytes on 64-bit platforms.  |pool->mask| is usually 7.  So you're asking "what if size <= 25?"  Maybe you meant |+| instead of |-|?  Either way, I'm fine to do something like that because it's an unusual case -- it's pretty silly to pass a |size| that small.  For example, in Firefox I think all the instances have |size| of 1024 or greater.


> Also, the savings is for all platforms, right? If so, it would be great to
> get this into the next Aurora.

AFAIK the saving is for all platforms, but it might be smaller on 32-bit platforms due to the smaller pointers.

I'd also like to land this ASAP, but I have no idea what the expectations are in terms of review turnaround time for NSPR.
Not only does it have to get reviewed and then landed into NSPR's CVS, but we then have to pull the new version of NSPR.
(In reply to Nicholas Nethercote [:njn] from comment #9)
> (In reply to Brian Smith (:bsmith) from comment #8)
> > What if size <= sizeof(PLArena) - pool->mask? I think that we should only
> > use this new allocation strategy when size is much larger than the overhead
> > (e.g. size <= PR_MAX(128, size of overhead).
> 
> |sizeof(PLArena)| is 4 words -- 16 bytes on 32-bit platforms, 32 bytes on
> 64-bit platforms.  |pool->mask| is usually 7.  So you're asking "what if
> size <= 25?"  Maybe you meant |+| instead of |-|?  Either way, I'm fine to
> do something like that because it's an unusual case -- it's pretty silly to
> pass a |size| that small.  For example, in Firefox I think all the instances
> have |size| of 1024 or greater.

In your patch, you have:
    
    pool->netsize = size - sizeof(PLArena) - pool->mask;

It isn't obvious how we guarantee that this won't underflow in the (unreasonable but possible) case where size is small. We must ensure that size >= sizeof(PLArena) and that size - sizeof(PLArena) >= pool->mask. We cannot return an error in those cases (though maybe we should warn?) for backward compatibility.
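One possible guard, sketched with the names from the patch excerpt above (illustrative only):

    /* Avoid unsigned underflow when |size| is smaller than the overhead. */
    PRUint32 overhead = sizeof(PLArena) + pool->mask;
    if (size > overhead)
        pool->netsize = size - overhead;   /* normal case */
    else
        pool->netsize = size;              /* tiny |size|: keep the old behaviour */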

> I'd also like to land this ASAP, but I have no idea what the expectations
> are in terms of review turnaround time for NSPR.

Wan-Teh is the best reviewer for this code. 

> Not only does it have to get reviewed and then landed into NSPR's CVS,
> but we then have to pull the new version of NSPR.

I think we still have time to get this into the next release of Firefox. I will ask Wan-Teh whether he would prefer to release 4.8.9 as is or whether we can delay that to take an (updated) patch, if we get that patch soon.
Attached patch plarena fix, v2 (obsolete) — Splinter Review
This patch addresses Brian's concern about arenas with a very small |size|.
Attachment #550615 - Attachment is obsolete: true
Attachment #550615 - Flags: review?(wtc)
Attachment #551643 - Flags: review?(wtc)
Instead of documenting that |size| is usually a power of two, we could just do this optimization only if we find |size| is a power of two. This would minimize disruption for callers that implement some hack to solve this problem themselves (e.g. passing size == 2^n - 64 for some n).
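The check itself is cheap -- a sketch (not part of the attached patches):

    /* A nonzero |size| is a power of two iff exactly one bit is set. */
    if (size != 0 && (size & (size - 1)) == 0)
        pool->netsize = size - sizeof(PLArena) - pool->mask;  /* new strategy */
    else
        pool->netsize = size;  /* hand-tuned sizes (e.g. 2^n - 64) are left alone */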
I'm happy with any solution that avoids all the waste.  Brian, do you want me to implement your suggestion?
Whiteboard: [MemShrink][clownshoes] → [MemShrink:P2][clownshoes]
Ted, is this something you could help with reviews on?  We have evidence that we're potentially wasting *a lot* of space with this.
Brian Smith has been emailing me about this.  He spoke to Wan-teh:

"I spoke with Wan-Teh today. He doesn't have time to review the patches immediately and he doesn't want to coordinate an NSPR release with just this fix."

So, spot-fixes like the one in bug 678422 seem the best thing for now.
I filed bug 678434 to track wallpaper fixes in Gecko like bug 678422. Let's keep this bug focused on fixes internal to NSPR.
Comment on attachment 551643 [details] [diff] [review]
plarena fix, v2

I can understand why the allocator rounds up 17 to 32,
but I am surprised that the allocator rounds up 8K + 1
to 16K, as opposed to 9K (the next multiple of 1K) or
12K (the next multiple of 4K).

In any case I will review this and check this in, and
this will be pushed to mozilla-central within a few
weeks.  (I push NSPR updates to mozilla-central regularly.)
The allocator rounds 8K+1 to 12K.

But the allocator rounds 1K+1 to 2K and 256+1 to 272.

That is to say, the rounding behavior of the allocator is not particularly simple.  But allocations of size 2^N are always satisfied exactly, so those are safe no matter what N is.
And jemalloc rounds 4K+1 up to 8K.  Here are jemalloc's size classes, from memory/jemalloc/jemalloc.c:

 *   |=====================================|
 *   | Category | Subcategory    |    Size |
 *   |=====================================|
 *   | Small    | Tiny           |       2 |
 *   |          |                |       4 |
 *   |          |                |       8 |
 *   |          |----------------+---------|
 *   |          | Quantum-spaced |      16 |
 *   |          |                |      32 |
 *   |          |                |      48 |
 *   |          |                |     ... |
 *   |          |                |     480 |
 *   |          |                |     496 |
 *   |          |                |     512 |
 *   |          |----------------+---------|
 *   |          | Sub-page       |    1 kB |
 *   |          |                |    2 kB |
 *   |=====================================|
 *   | Large                     |    4 kB |
 *   |                           |    8 kB |
 *   |                           |   12 kB |
 *   |                           |     ... |
 *   |                           | 1012 kB |
 *   |                           | 1016 kB |
 *   |                           | 1020 kB |
 *   |=====================================|
 *   | Huge                      |    1 MB |
 *   |                           |    2 MB |
 *   |                           |    3 MB |
 *   |                           |     ... |
 *   |=====================================|

Certain values get doubled if you go 1 byte over:  512, 1kB, 2kB, 4kB, 1MB.
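
The rounding is easy to observe directly with something like this (assuming malloc_usable_size is exported, as it is by jemalloc and glibc; exact numbers depend on the allocator):

    #include <malloc.h>   /* malloc_usable_size */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        size_t sizes[] = { 512, 513, 1024, 1025, 4096, 4097, 8192, 8193 };
        size_t i;
        for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
            void *p = malloc(sizes[i]);
            /* Requested size vs. the size class actually handed back. */
            printf("%6lu -> %6lu\n", (unsigned long)sizes[i],
                   (unsigned long)malloc_usable_size(p));
            free(p);
        }
        return 0;
    }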

Wan-teh, when you push the change to m-c will you mention it in this bug?  We want to undo spot-fixes like bug 678422 when that happens.
Thank you very much for the info on jemalloc's behavior.

I will try to review the patch this weekend.  I will post
the mozilla-central patchset push URL in this bug.
I didn't rename the 'arenasize' member of the PLArenaPool structure
because the comment already says it is the "net exact size of a
new arena".

I updated the comment for PL_InitArenaPool to point out that the
'size' argument now means the gross size and should be a power of
two to avoid the heap allocator rounding the size up.
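
Roughly, the updated plarenas.h comment now reads like this (paraphrased here, not the exact CVS text):

    /*
     * PL_InitArenaPool: |size| is the gross size of each arena, including
     * the PLArena header and alignment padding.  Pass a power of two so
     * that the heap allocator does not round the request up to a larger
     * size class.
     */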

Nicholas, please review the comments to make sure they are accurate.

Patch checked in on the NSPR CVS trunk (NSPR 4.9).

Checking in plarena.c;
/cvsroot/mozilla/nsprpub/lib/ds/plarena.c,v  <--  plarena.c
new revision: 3.19; previous revision: 3.18
done
Checking in plarenas.h;
/cvsroot/mozilla/nsprpub/lib/ds/plarenas.h,v  <--  plarenas.h
new revision: 3.9; previous revision: 3.8
done
Attachment #551643 - Attachment is obsolete: true
Attachment #551643 - Flags: review?(wtc)
(In reply to Brian Smith (:bsmith) from comment #13)
> Instead of documenting that |size| is usually a power of two, we could just
> do this optimization only if we find |size| is a power of two. This would
> minimize disruption for callers that implement some hack to solve this
> problem themselves (e.g. passing size == 2^n - 64 for some n).

This is a good idea, but probably not worth the effort.
A caller that passes size = 2^n - 64 will get a net size
of 2^n - 64 - (16 or 32), so it will lose only 16 or 32
bytes (the header consists of four pointers) per arena.
Status: NEW → ASSIGNED
OS: Linux → All
Priority: -- → P1
Hardware: x86_64 → All
Target Milestone: --- → 4.9
Attachment #552897 - Flags: review+
The patch was pushed to mozilla-central yesterday:
http://hg.mozilla.org/mozilla-central/rev/4ab18c9e38b1
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED