Bug 683597 - Change jemalloc's accounting for huge allocations to round up to the next page boundary, not to the next MB
Status: RESOLVED FIXED
Whiteboard: [MemShrink:P2][inbound]
Product: Core
Classification: Components
Component: Memory Allocator
Version: unspecified
Platform: All
Importance: -- normal
Target Milestone: mozilla10
Assigned To: Justin Lebar (not reading bugmail)
Blocks: 681183
Reported: 2011-08-31 08:31 PDT by Justin Lebar (not reading bugmail)
Modified: 2011-10-11 02:28 PDT


Attachments
Patch v1 (7.03 KB, patch)
2011-10-07 11:58 PDT, Justin Lebar (not reading bugmail)
khuey: review+

Description Justin Lebar (not reading bugmail) 2011-08-31 08:31:06 PDT
jemalloc rounds "huge" allocations (1MB and up) up to megabyte boundaries.  But the physical memory actually used is only the allocation size rounded up to a page boundary -- if you allocate a page but never touch it, the OS never attaches physical memory to it.

This is only a problem for huge allocations, because large allocations are rounded up to page boundaries.

This is important, e.g. for bug 682195.  There, we want to account for how much memory is being used by media elements.  If we allocate 1MB + 1B, we use 1MB + 4KB.  I'd like to report that, not 2MB, in about:memory.  But right now, we have to report 2MB, because that's how much jemalloc counts against heap-allocated, which is used to compute heap-unclassified.
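For concreteness, a minimal sketch of the two roundings (the PAGE_CEILING/CHUNK_CEILING macros here just mirror jemalloc's style for illustration, assuming 4KB pages and 1MB chunks):

  #include <stdio.h>
  #include <stddef.h>

  #define PAGE_SIZE   ((size_t)4096)            /* assumed 4KB pages */
  #define CHUNK_SIZE  ((size_t)(1024 * 1024))   /* jemalloc's 1MB huge-allocation chunk */

  /* Round a request up to the next page / chunk boundary (power-of-two sizes). */
  #define PAGE_CEILING(s)  (((s) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))
  #define CHUNK_CEILING(s) (((s) + CHUNK_SIZE - 1) & ~(CHUNK_SIZE - 1))

  int main(void)
  {
      size_t request = 1024 * 1024 + 1;        /* 1MB + 1B */
      size_t psize = PAGE_CEILING(request);    /* 1MB + 4KB: pages we can actually touch */
      size_t csize = CHUNK_CEILING(request);   /* 2MB: what jemalloc currently counts */
      printf("request=%zu psize=%zu csize=%zu\n", request, psize, csize);
      return 0;
  }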
Comment 1 Justin Lebar (not reading bugmail) 2011-08-31 08:33:06 PDT
This is [MemShrink] inasmuch as it influences accounting in about:memory.
Comment 2 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2011-08-31 08:36:03 PDT
(In reply to Justin Lebar [:jlebar] from comment #0)
> jemalloc rounds "huge" allocations (1MB and up) up to megabyte boundaries. 
> But the physical memory used is the allocation size rounded up to a page
> boundary -- if you allocate but never touch the page, the OS never attaches
> physical memory to that page.

So, you're saying that if we ask for 1 MB + 1 byte we end up using 1 MB + 1 page of actual memory (but reserving 2MB of address space)?

Is that true on all OSs, or just Linux (where I'm presuming you tested)?
Comment 3 Justin Lebar (not reading bugmail) 2011-08-31 08:53:38 PDT
> So, you're saying that if we ask for 1 MB + 1 byte we end up using 1 MB + 1 page of 
> actual memory (but reserving 2MB of address space)?

Right.

I haven't tested specifically in this case with jemalloc, although I've tested this in general on Linux.  I'm pretty sure it works the same way on Mac.

I don't know how it works on Windows, but in jemalloc, we set decommit on Windows, which explicitly decommits the slop pages.
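For reference, a rough sketch of what decommitting the slop pages means (illustrative only, not jemalloc's actual code; on Windows the pages are returned with VirtualFree/MEM_DECOMMIT, on POSIX systems the rough equivalent is madvise):

  #include <stddef.h>
  #ifdef _WIN32
  #  include <windows.h>
  #else
  #  include <sys/mman.h>
  #endif

  /* Give the pages in [addr, addr+size) back to the OS so they stop counting
   * against commit charge / RSS, while keeping the address range reserved. */
  static void decommit_pages(void *addr, size_t size)
  {
  #ifdef _WIN32
      VirtualFree(addr, size, MEM_DECOMMIT);
  #else
      madvise(addr, size, MADV_DONTNEED);   /* "drop these pages" on Linux */
  #endif
  }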
Comment 4 Randell Jesup [:jesup] 2011-08-31 09:41:46 PDT
But it still eats VM space even if decommitted.  The original design goals for jemalloc probably didn't consider the exhaustion of VM space on 32-bit machines, but it's very much a reality nowadays, and it was the primary cause of crashes or forced browser restarts for me on WinXP with several hundred (mostly not loaded!) tabs.
Comment 5 Justin Lebar (not reading bugmail) 2011-08-31 09:55:20 PDT
Right now 'explicit' is almost directly comparable to 'resident' (minus code size, which is constant), but it's not at all comparable to 'vsize'.  I understand that vmem exhaustion is a problem, but I don't think that making 'explicit' count a hybrid between resident and non-resident memory is a useful step towards improving that situation.
Comment 6 Randell Jesup [:jesup] 2011-08-31 11:08:25 PDT
Agreed; that's really a different bug than this one.  (We could keep a separate count of this sort of slop, though it's semi-calculable from those values.)

When tbpl.allizom.org/?usebuildbot=1 was leaking and I'd been running for a week or so in my 'huge' profile, I had 3.1GB explicit, 3.1GB resident, 2.9GB heap-allocated, 3.7GB heap-committed - and 5GB vsize (Linux x64).  That's a big difference.
Comment 7 Nicholas Nethercote [:njn] 2011-08-31 20:37:53 PDT
In the 1MB+1 case, does jemalloc's "heap-allocated" measurement count 2MB or 1MB+4KB?  I've been assuming it's the former.

Furthermore, "heap-unclassified" is computed in about:memory to be "heap-allocated" minus everything reported by heap memory reporters.  If my assumption is correct, then the heap reporters should report 2MB, as malloc_usable_size does, otherwise there'll be a portion (1MB-4KB) of dark matter that we'll never be able to avoid, which is bad.
Comment 8 Nicholas Nethercote [:njn] 2011-08-31 20:46:36 PDT
(In reply to Nicholas Nethercote [:njn] from comment #7)
> In the 1MB+1 case, does jemalloc's "heap-allocated" measurement count 2MB or
> 1MB+4KB?  I've been assuming it's the former.

Oh god, it appears to depend on whether MALLOC_DECOMMIT is defined.  From huge_malloc():

#  ifdef MALLOC_DECOMMIT
        huge_allocated += psize;
#  else
        huge_allocated += csize;
#  endif

|psize| is the request rounded up to the nearest page (i.e. 1MB+4KB); |csize| is the request rounded up to the nearest "chunk ceiling" (i.e. 2MB).

Remind me if we have MALLOC_DECOMMIT defined, and on which platforms?
Comment 9 Matheus Kerschbaum 2011-08-31 20:54:46 PDT
MALLOC_DECOMMIT is only defined on Windows (see http://mxr.mozilla.org/mozilla-central/source/memory/jemalloc/jemalloc.c#269)
Comment 10 Justin Lebar (not reading bugmail) 2011-08-31 20:59:28 PDT
(In reply to Nicholas Nethercote [:njn] from comment #7)
> In the 1MB+1 case, does jemalloc's "heap-allocated" measurement count 2MB or
> 1MB+4KB?  I've been assuming it's the former.

The proposal is to make it always be the latter, regardless of whether DECOMMIT is enabled.
Comment 11 Nicholas Nethercote [:njn] 2011-08-31 21:35:52 PDT
In general, I want memory reporters to use malloc_usable_size.  If we don't do that, we'll always miss the 10%+ of heap memory that is slop.

For non-huge (< 1MB) allocations, the meaning of "slop" is clear, and we can use malloc_usable_size without any confusion.  But for huge (1MB+) allocations, "slop" can have one of two meanings, which I'll call "psize" and "csize" (as per comment 8).  psize corresponds to RSS, csize corresponds to vsize.

malloc_usable_size uses the "csize" meaning.  And you're proposing that "heap-allocated" be changed to always use the "psize" meaning.  But if we do that, then we can't use malloc_usable_size in memory reporters if we want our measurements to be consistent.

I suppose we could have an alternative version of malloc_usable_size that is just used in memory reporters and gives us the "psize" meaning.  Ugh.  And I don't know how all this would work on non-jemalloc platforms... but (apart from Mac, which will use jemalloc soon anyway) we don't compute "heap-allocated" or "heap-unclassified" on non-jemalloc platforms so maybe it doesn't matter.
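To make the consistency problem concrete, a toy calculation for a single 1MB+1B huge block, assuming "heap-allocated" switches to the psize meaning while malloc_usable_size keeps returning csize:

  #include <stdio.h>
  #include <stddef.h>

  int main(void)
  {
      /* One huge allocation of 1MB + 1B, with 4KB pages and 1MB chunks. */
      size_t psize = 1024 * 1024 + 4096;   /* page-rounded: proposed heap-allocated */
      size_t csize = 2 * 1024 * 1024;      /* chunk-rounded: what malloc_usable_size returns */

      /* A reporter using malloc_usable_size reports csize, so about:memory's
       * heap-unclassified (heap-allocated minus everything reported) goes negative. */
      printf("heap-unclassified = %lld bytes\n",
             (long long)psize - (long long)csize);   /* prints -1044480 */
      return 0;
  }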
Comment 12 Justin Lebar (not reading bugmail) 2011-08-31 21:43:09 PDT
I don't know how hard it would be to change malloc_usable_size so it reports the psize instead of the csize, but we might be able to do that...

Do we even have malloc_usable_size on non-jemalloc platforms?
Comment 13 Nicholas Nethercote [:njn] 2011-08-31 22:01:49 PDT
> Do we even have malloc_usable_size on non-jemalloc platforms?

Yes.  From memory/mozalloc/mozalloc.cpp:

  size_t
  moz_malloc_usable_size(void *ptr)
  {
      if (!ptr)
          return 0;    

  #if defined(XP_MACOSX)
      return malloc_size(ptr);
  #elif defined(MOZ_MEMORY)
      return malloc_usable_size(ptr);
  #elif defined(XP_WIN)
      return _msize(ptr);
  #else
      return 0;
  #endif
  }
Comment 14 Justin Lebar (not reading bugmail) 2011-10-07 11:58:56 PDT
Created attachment 565602 [details] [diff] [review]
Patch v1
Comment 15 Justin Lebar (not reading bugmail) 2011-10-07 12:06:11 PDT
I tested (in gdb on Linux) that if you malloc(1024 * 1024 + 1024) bytes, you get something with malloc_usable_size = 1MB + 4KB.  If you then realloc to 1024 * 1024 + 5000, the malloc_usable_size is now 1MB + 8KB, as expected.
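That test, written out as a standalone program (it assumes jemalloc's malloc_usable_size is the one linked in; a stock glibc heap will print different numbers):

  #include <stdio.h>
  #include <stdlib.h>
  #include <malloc.h>   /* malloc_usable_size on Linux */

  int main(void)
  {
      void *p = malloc(1024 * 1024 + 1024);   /* 1MB + 1KB */
      printf("usable after malloc:  %zu\n", malloc_usable_size(p));   /* expect 1MB + 4KB */

      p = realloc(p, 1024 * 1024 + 5000);     /* grows past the next page boundary */
      printf("usable after realloc: %zu\n", malloc_usable_size(p));   /* expect 1MB + 8KB */

      free(p);
      return 0;
  }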
Comment 16 Justin Lebar (not reading bugmail) 2011-10-07 12:31:20 PDT
Comment on attachment 565602 [details] [diff] [review]
Patch v1

Canceling the review; I may have a clever way of solving this and bug 670967 with one simple patch.
Comment 17 Justin Lebar (not reading bugmail) 2011-10-07 12:47:11 PDT
Comment on attachment 565602 [details] [diff] [review]
Patch v1

Hm, no, it's not so simple.

(My idea was to define MALLOC_DECOMMIT everywhere, but only *actually* decommit when MALLOC_DECOMMIT_FOR_REAL is defined, on Windows.  Then you'd get all the bookkeeping for free.

But jemalloc assumes that commit() zeroes memory, which of course isn't the case if decommit and commit are nops.

It was a nice idea.)
Comment 18 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2011-10-10 08:25:15 PDT
Comment on attachment 565602 [details] [diff] [review]
Patch v1

Review of attachment 565602 [details] [diff] [review]:
-----------------------------------------------------------------

This looks fine, even though it's pretty scary.  jlebar is going to verify that the assumptions about lazy physical page allocation hold on 10.6 before landing.
Comment 19 Justin Lebar (not reading bugmail) 2011-10-10 08:47:55 PDT
> jlebar is going to verify that the assumptions about lazy physical page allocation hold on 10.6 
> before landing.

Just did this.  I mmap'ed 8GB with

  mmap(NULL, 8L * 1024 * 1024 * 1024, PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0)

(which is what jemalloc uses).  Then I observed that the activity monitor's "real mem" column reflected not how much memory I'd mapped but how much of the mapping I'd written to.
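A standalone version of that check, scaled down (watch "real mem" in Activity Monitor on Mac, or VmRSS in /proc/self/status on Linux, at each pause):

  #include <stdio.h>
  #include <string.h>
  #include <sys/mman.h>

  int main(void)
  {
      size_t len = 256UL * 1024 * 1024;   /* 256MB of address space */
      char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_ANON | MAP_PRIVATE, -1, 0);
      if (p == MAP_FAILED)
          return 1;

      puts("mapped; real mem / RSS should barely move -- press enter");
      getchar();

      memset(p, 1, len / 4);   /* touch only the first 64MB */
      puts("touched 64MB; real mem / RSS should grow by ~64MB -- press enter");
      getchar();

      munmap(p, len);
      return 0;
  }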
Comment 20 Justin Lebar (not reading bugmail) 2011-10-10 12:13:10 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/21df455d5083
Comment 21 Marco Bonardo [::mak] 2011-10-11 02:28:55 PDT
https://hg.mozilla.org/mozilla-central/rev/21df455d5083
