Open Bug 819769 Opened 12 years ago Updated 2 years ago

Add dark dark matter (memory used by anonymous mmap'ed pages that's not the heap or JS) to about:memory

Categories

(Toolkit :: about:memory, defect)

x86
macOS
defect

Tracking


REOPENED

People

(Reporter: justin.lebar+bug, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [MemShrink:P2])

The first step in resolving dark dark matter (memory used by anonymous mmap'ed pages that's not the heap or JS) is establishing whether it even exists.

Note that even if we look at all of Gecko and establish that all our anonymous mmaps are accounted for, libraries can mmap stuff into our address space.

On Linux where we have smaps, this is relatively easy to do.  We can parse smaps to find out how much vmem and rss is used by anonymous mappings.  Then we can ask JS and malloc how much vmem and rss is used by their mappings.  (To be totally correct, we'd have to use mincore here.)  Do some subtraction and we know how much dark dark matter we have.

I don't know how to do this on Windows or Mac.  Which is a shame, because I think there's considerable scope for platform-specific weirdness with this stuff.  But just having it on Linux/Android (esp. B2G) will be a good start.
On Windows, we could leverage the VirtualAlloc / VirtualFree interception we already have.  We'd of course have to make sure that we don't add too much overhead.

On Mac, perhaps we could do the same thing and intercept mmap / munmap.
Assignee: nobody → n.nethercote
Whiteboard: [MemShrink] → [MemShrink:P2]
I thought I'd start looking at vsize, because it's easier and doesn't need mincore.

Here are some excerpts from about:memory just after startup:

67,175,456 B (100.0%) ++ explicit

114,528,256 B (100.0%) -- rss
├───73,003,008 B (63.74%) -- anonymous
│   ├──72,372,224 B (63.19%) ── anonymous, outside brk() [rw-p] [60]
│   └─────630,784 B (00.55%) ── anonymous, outside brk() [rwxp] [9]
├───39,718,912 B (34.68%) ++ shared-libraries
├────1,658,880 B (01.45%) ++ other-files
├──────143,360 B (00.13%) ── main thread's stack [rw-p]
└────────4,096 B (00.00%) ── vdso [r-xp]

651,980,800 B (100.0%) -- size
├──315,240,448 B (48.35%) -- anonymous
│  ├──314,408,960 B (48.22%) ── anonymous, outside brk() [rw-p] [65]
│  ├──────720,896 B (00.11%) ── anonymous, outside brk() [rwxp] [9]
│  └──────110,592 B (00.02%) ── anonymous, outside brk() [---p] [27]
├──303,624,192 B (46.57%) ++ shared-libraries
├───32,960,512 B (05.06%) ++ other-files
├──────151,552 B (00.02%) ── main thread's stack [rw-p]
└────────4,096 B (00.00%) ── vdso [r-xp]

20,971,520 B (100.0%) -- js-main-runtime-gc-heap-committed
├──11,923,992 B (56.86%) -- used
│  ├──11,345,312 B (54.10%) ── gc-things
│  ├─────311,296 B (01.48%) ── chunk-admin
│  └─────267,384 B (01.27%) ── arena-admin
└───9,047,528 B (43.14%) -- unused
    ├──7,020,008 B (33.47%) ── gc-things
    ├──1,048,576 B (05.00%) ── chunks
    └────978,944 B (04.67%) ── arenas

 45,540,384 B ── heap-allocated
 50,499,584 B ── heap-committed
  4,912,600 B ── heap-committed-unused
       10.75% ── heap-committed-unused-ratio
  3,010,560 B ── heap-dirty
 15,232,456 B ── heap-unused

112,439,296 B ── resident
104,402,944 B ── resident-unique

651,845,632 B ── vsize


jemalloc's committed heap is ~50 MB.  The JS heap is ~21 MB.  Together they are ~71 MB, which is close to the ~72 MB we have for "rss/anonymous/anonymous, outside brk() [rw-p]", which is good.  

But the corresponding vsize amount ("size/anonymous/anonymous, outside brk() [rw-p]") is ~314 MB!  That's ~242 MB unaccounted for.  Huh?

What we really want is to get the committed size from smaps.  Here's a sample anonymous mapping entry:

 01fe6000-02007000 rw-p 00000000 00:00 0                                  [heap]
 Size:                132 kB
 Rss:                  24 kB
 Pss:                  24 kB
 Shared_Clean:          0 kB
 Shared_Dirty:          0 kB
 Private_Clean:         0 kB
 Private_Dirty:        24 kB
 Referenced:           24 kB
 Anonymous:            24 kB
 AnonHugePages:         0 kB
 Swap:                  0 kB
 KernelPageSize:        4 kB
 MMUPageSize:           4 kB
 Locked:                0 kB

Size (i.e. vsize) is way bigger than any of the others.  http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=blob;f=Documentation/filesystems/proc.txt documents some of these entries, but left me none the wiser.
> But the corresponding vsize amount ("size/anonymous/anonymous, outside brk()
> [rw-p]") is ~314 MB!  That's ~242 MB unaccounted for.  Huh?

Well, jemalloc-heap-committed is not counting mapped pages, but instead "wired" pages (pages in RSS or swap).  I dunno about the js-heap reporter.

I guess we might care to investigate further for the purposes of OOM prevention on Windows.  I don't know how common out-of-vsize OOMs are.
> Well, jemalloc-heap-committed is not counting mapped pages, but instead
> "wired" pages (pages in RSS or swap).

A.k.a. "committed", right?

The point is that getting vsize or RSS from smaps isn't much help.  Which is why I wrote:

>> What we really want is to get the committed size from smaps.

Without that, I don't see how to make progress here.  (Even then, we're assuming that everything under "explicit" is committed, which is probably true most of the time, except for the JS decommitted stuff...)
> A.k.a. "committed", right?

Yes.

Sorry, it's late, but why can't you add RSS + Swap from smaps to get committed?
> why can't you add RSS + Swap from smaps to get committed?

Because that would be far too easy!  /me slaps forehead.
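(Concretely, the per-mapping arithmetic would be something like the following — `parse_smaps_entry` is a hypothetical helper, not anything we ship:)

```python
import re

def parse_smaps_entry(entry_text):
    """Turn one smaps entry's 'Key:  N kB' lines into a dict of kB values."""
    fields = {}
    for line in entry_text.splitlines():
        m = re.match(r'^\s*(\w+):\s+(\d+) kB', line)
        if m:
            fields[m.group(1)] = int(m.group(2))
    return fields

def committed_kb(fields):
    # Committed = pages that are resident or swapped out; pages that were
    # mapped but never touched show up only in Size.
    return fields.get('Rss', 0) + fields.get('Swap', 0)
```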
Now I'm getting weird numbers.  Viz:


136,708,096 B (100.0%) -- rss
├───58,527,744 B (42.81%) ++ shared-libraries
├───42,274,816 B (30.92%) -- anonymous
│   ├──41,938,944 B (30.68%) ── anonymous, outside brk() [rw-p] [32]
│   └─────335,872 B (00.25%) ── anonymous, outside brk() [rwxp] [6]
├───35,672,064 B (26.09%) -- other-files
│   ├───9,408,512 B (06.88%) ── [stack:7732] [rw-p]
│   ├───4,952,064 B (03.62%) ── [stack:7726] [rw-p]
│   ├───4,440,064 B (03.25%) ── [stack:7737] [rw-p]
│   ├───3,153,920 B (02.31%) ── [stack:7741] [rw-p]
│   ├───3,153,920 B (02.31%) ── [stack:7721] [rw-p]
│   ├───1,880,064 B (01.38%) ── [stack:7739] [rw-p]
│   ├───1,744,896 B (01.28%) ── [stack:7728] [rw-p]
│   ├───1,708,032 B (01.25%) ── [stack:7733] [rw-p]
│   ├───1,589,248 B (01.16%) ── [stack:7743] [rw-p]
│   ├─────917,504 B (00.67%) ── [stack:7730] [rw-p]
│   ├─────913,408 B (00.67%) ── [stack:7736] [rw-p]

20,946,944 B (100.0%) -- js-main-runtime-gc-heap-committed

 46,419,968 B ── heap-committed


Anonymous mappings are only 42 MiB, but "heap-committed" plus the JS GC heap is 67 MiB.  And I have these weird "[stack:<nnn>]" entries.  I suspect they are really anonymous but something has changed in the /proc/<pid>/smaps accounting.

Nb: I updated my Ubuntu distribution, and hence my kernel, between comment 3 and now, which probably explains why things have changed, if not the meaning of the change.
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=blob;f=Documentation/filesystems/proc.txt says this:

 If the mapping is not associated with a file:

  [heap]                   = the heap of the program
  [stack]                  = the stack of the main process
  [stack:1001]             = the stack of the thread with tid 1001
  [vdso]                   = the "virtual dynamic shared object",
                             the kernel system call handler

Which seems fair enough.  But having 30+ MiB of thread stacks seems rather odd.  And the anonymous mapping counts are still clearly too low.

(I guess these stack entries should not be reported under "other-files", i.e. the smaps parsing needs tweaking so that "[stack:1234]" is treated like "[stack]".)
One thing which might be helpful would be to get jemalloc and js to dump all of their mappings and which parts they believe are committed.  Then we can cross-check that with the smaps data.

An alternative would be to modify jemalloc and JS so that they always allocate memory in a certain address range.  I don't think that would be too hard.  Then you can simply look at the addresses in smaps to determine who allocated what.

30+MiB of virtual address space for stacks doesn't seem so bad.  But 30+MiB of RSS for thread stacks would certainly be surprising to me.  As I read comment 8, you're saying that it's the latter, right?
[stack:nnnn] entries count more than stack, actually, because the kernel only knows about the start of each thread stack, not its end. So what each number for [stack:nnnn] represents is some thread stack *and* anonymous memory. All in all, it would be better to count all that as anonymous memory, instead of pretending it's stack. (and it's what would happen with older kernels, that don't show [stack:nnnn] entries anyways)
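(A sketch of that reclassification — `classify_pathname` is a made-up name, and the real smaps reporter's buckets are richer than this:)

```python
import re

def classify_pathname(path):
    """Bucket an smaps pathname.  Per the comments above, [stack:1234]
    entries cover a thread stack *plus* adjacent anonymous memory, so
    count them as anonymous rather than as stack or other-files."""
    if path == '' or re.fullmatch(r'\[stack:\d+\]', path):
        return 'anonymous'
    if path.startswith('['):           # [heap], [stack], [vdso], ...
        return path.strip('[]')
    if '.so' in path.rsplit('/', 1)[-1]:
        return 'shared-libraries'
    return 'other-files'
```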
Depends on: 827691
Another interesting obstacle:  I'm currently computing committed-anonymous-unclassified as:

  anonymous-committed - heap-committed - (sum of all NONHEAP reports)

I'm getting negative values.  I can see at least two reasons:

- "explicit/startup-cache/mapping" measures memory that is (probably) committed, but non-anonymous, and reported as a NONHEAP report.

- decommitted GC memory is reported as a NONHEAP report(s).

One possibility is to move both of these cases out of the "explicit" tree and say that "explicit" is only for memory that is both anonymous and probably committed.  (It's impossible in general to say if a memory report is definitely committed, AFAICT.)

The other possibility is to split NONHEAP into sub-categories that indicate anonymity/non-anonymity and probably committedness/decommittedness.

Hmm.
> (It's impossible in general to say if a memory report is definitely committed, AFAICT.)

You can of course look up the relevant addresses in smaps.  But I don't mean to beg the question of whether we'd want to do this.
> > (It's impossible in general to say if a memory report is definitely committed, AFAICT.)
> 
> You can of course look up the relevant addresses in smaps.  But I don't mean
> to beg the question of whether we'd want to do this.

That works on Linux, but by "in general" I meant "on all platforms".
> Another interesting obstacle:  I'm currently computing
> committed-anonymous-unclassified as:
> 
>   anonymous-committed - heap-committed - (sum of all NONHEAP reports)
> 
> I'm getting negative values.  I can see at least two reasons:
> 
> - "explicit/startup-cache/mapping" measures memory that is (probably)
> committed, but non-anonymous, and reported as a NONHEAP report.
> 
> - decommitted GC memory is reported as a NONHEAP report(s).
> 
> One possibility is to move both of these cases out of the "explicit" tree
> and say that "explicit" is only for memory that is both anonymous and
> probably committed.

I have local patches moving these two things out of "explicit", and my result is still very negative.  For example:

  anonymous-committed = 181.8 MiB
  heap-committed = 147.9 MiB
  (sum of all NONHEAP reports) = 72.0 MiB

Which gives a committed-anonymous-unclassified of -38.1 MiB.  This is with a resident value of 251.5 MiB.

Hmm.  Mind you, it's not always negative, but often is, and by a lot.
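(Plugging the quoted figures into the formula makes the problem concrete; MiB throughout:)

```python
anonymous_committed = 181.8   # from smaps: anonymous Rss + Swap
heap_committed = 147.9        # jemalloc's committed figure
nonheap_total = 72.0          # sum of all NONHEAP reports
unclassified = anonymous_committed - heap_committed - nonheap_total
# unclassified comes out at about -38.1: the reporters claim more
# committed anonymous memory than the OS says exists.
```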

----

I looked at the [rwxp] segments, to narrow it down.  They all come from the JS JIT, and I know for sure that we allocate in 64 KiB chunks.  The JS reporter says we have 1,507,328 bytes allocated, which is 23 x 64 KiB chunks, which seems reasonable.  But the "rss" tree says we only have 1,343,488 bytes in [rwxp] segments.  Looking at /proc/<pid>/smaps, I see some segments like this:

7f4dc01a9000-7f4dc01b9000 rwxp 00000000 00:00 0
Size:                 64 kB
Rss:                  60 kB
Pss:                  60 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:        60 kB
Referenced:           60 kB
Anonymous:            60 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB

Size is 64 KiB, but Rss+Swap is only 60 KiB.  There are lots of segments like this, and one where Size is 1,280 KiB and Rss+Swap is 1,252 KiB, and another with 576 vs. 496 KiB, etc.
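(Tabulating the reserved-but-not-committed portion of each of those quoted segments, i.e. Size minus Rss+Swap, in KiB:)

```python
# (Size, Rss, Swap) in KiB for the three JIT segments quoted above.
segments = [(64, 60, 0), (1280, 1252, 0), (576, 496, 0)]
untouched = [size - (rss + swap) for size, rss, swap in segments]
# untouched == [4, 28, 80]: each segment has pages that were allocated
# but never written, so they count toward Size but not toward committed.
```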

Similarly, I have this:

56,623,104 B (100.0%) -- js-main-runtime-gc-heap-committed
├──32,365,248 B (57.16%) -- unused
│  ├──14,503,936 B (25.61%) ── arenas
│  ├──12,618,432 B (22.28%) ── gc-things
│  └───5,242,880 B (09.26%) ── chunks

I can imagine some of those arenas haven't been touched yet.

In other words, we sometimes over-allocate, and parts of the allocation don't get touched (or haven't been touched *yet*, at least), and so those parts aren't committed.  But we report the full allocation, so the dark dark matter calculation goes awry.

Right now I can't see how to make this work.  The numbers reported by the reporters and the numbers from /proc/pid/smaps just don't line up sufficiently...
> Right now I can't see how to make this work.

...except by using mincore all over the place?

I agree that's a very large hammer, and I know it doesn't work on Windows.

OTOH if we can't get /some/ handle on dark dark matter, we will have no way to evaluate things like bug 828886.  :-/
> I agree that's a very large hammer, and I know it doesn't work on Windows.

...and mincore doesn't work properly if pages are swapped out.

I mean, if we really wanted to be crazy, we could read from each page before we ran mincore.  I'd hope that if we read from a CoW zero page that mincore would still report "false" for that page.
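(For what it's worth, mincore is easy enough to poke at from a script; a Linux/Mac sketch via ctypes, purely illustrative — a real reporter would do this in C++:)

```python
import ctypes
import mmap

def resident_pages(buf):
    """Per-page residency vector for an mmap'ed buffer, via mincore(2)."""
    libc = ctypes.CDLL(None, use_errno=True)
    npages = (len(buf) + mmap.PAGESIZE - 1) // mmap.PAGESIZE
    vec = (ctypes.c_ubyte * npages)()
    addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))
    if libc.mincore(ctypes.c_void_p(addr), ctypes.c_size_t(len(buf)), vec):
        raise OSError(ctypes.get_errno(), "mincore failed")
    return [bool(v & 1) for v in vec]

m = mmap.mmap(-1, 4 * mmap.PAGESIZE)  # fresh anonymous mapping
m[0] = 1                              # fault in the first page only
```

As noted above, this undercounts pages that have been swapped out, so it's only a partial answer even where mincore exists.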
> I mean, if we really wanted to be crazy, we could read from each page before
> we ran mincore.  I'd hope that if we read from a CoW zero page that mincore
> would still report "false" for that page.

I was contemplating forcing every mapping to be fully committed, by touching every word just after allocating it :)


> OTOH if we can't get /some/ handle on dark dark matter, we will have no way
> to evaluate things like bug 828886. :-/

Bug 828886 looks like a combination of system malloc (which bug 828844 will hopefully cover) and mappings done within system graphics libraries, which I don't think we'll ever get insight into.

We can make some progress on dark dark matter just by auditing the code.  E.g. by searching for "MAP_ANON" yesterday I found JSRuntime::bumpAlloc_ (bug 832026) and nsPresArena.cpp's poison arena (which is only a single page and so not worth reporting).

FWIW, here are the files containing MAP_ANON that aren't covered by an existing reporter (excluding nsPresArena.cpp):

> ./toolkit/crashreporter/google-breakpad/src/common/memory.h
> ./other-licenses/snappy/src/snappy-stubs-internal.h
> ./mozglue/linker/CustomElf.cpp
> ./mozglue/linker/ElfLoader.cpp
> ./mozglue/linker/Mappable.cpp
> ./js/src/ctypes/libffi/src/dlmalloc.c
> ./js/src/ctypes/libffi/src/closures.c

The crashreporter one seems to be involved with minidump writing, which presumably doesn't happen unless we crash.  And even then, it looks like the mmap() call is unreachable.

The snappy definitions appear to be unused.

Maybe bug 828845 will cover the mozglue/linker/* cases?

I don't know why ctypes/libffi has dlmalloc in it.
> mappings done within system graphics libraries, which I don't think we'll ever get 
> insight into.

We could at least have insight into how much dark-dark matter we have, which we don't meaningfully count now...  But yours is a reasonable point; why put a lot of time into measuring dark-dark matter if it's going to be an opaque number?
(In reply to Nicholas Nethercote [:njn] from comment #18)
> Maybe bug 828845 will cover the mozglue/linker/* cases?

yes
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WONTFIX
We agreed to reopen this because we have a way we think we could fix this: Wrap mmap(), push a TLS variable every time Gecko allocates or frees something (jemalloc, js, yarr, probably a few other places).  Everywhere else is dark dark matter.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
This isn't a simple bug, but it might be an interesting project for a strong systems programmer.  It's a relatively low-level issue, so I think it shouldn't require a lot of understanding of Gecko internals.
Whiteboard: [MemShrink:P2] → [MemShrink:P2][mentor=jlebar]
Assignee: n.nethercote → nobody
Whiteboard: [MemShrink:P2][mentor=jlebar] → [MemShrink:P2]
Severity: normal → S3