Open Bug 1446519 Opened 6 years ago Updated 2 years ago

Need a tool for measuring non-heap process memory

Categories

(Core :: Memory Allocator, enhancement, P3)

People

(Reporter: bzbarsky, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [overhead:noted])

For bug 1436250 we will care about non-heap process memory.  This includes whatever TEXT and DATA segment bits we can't share with other processes, for example.

Right now we don't have a measurement for this in about:memory.  Do we have some other tool that does the job?  If not, we probably want to create one.

A bare minimum output for such a tool is a "total" number; that will at least allow one to test the impact of a change.  Better would be some sort of breakdown, or some guidance as to what might be taking a lot of non-heap memory.
resident-unique is in about:memory, and I think it sounds like what you're interested in (well, you'd have to subtract explicit from the total), but there's no breakdown of any kind.
resident-unique may not cover everything we care about here.  For example, if there's per-content-process shared memory that is only shared with the parent, then we'd want to include it in this metric, right?

But yes, "resident-unique - explicit" is at least a start.
Also, "resident-unique" is smaller than "explicit" for me, probably because some things are swapped out?
decommitted-arenas might account for some of that. (When an arena isn't being used, but it is in a JS GC chunk that still has live arenas, we decommit to release the physical memory.)
So for my main process right now I have:

1,905.98 MB (100.0%) -- explicit
  701.14 MB ── resident-unique
   47.96 MB (100.0%) -- decommitted

For one of the web content processes, I have:

 461.17 MB (100.0%) -- explicit
 217.75 MB ── resident-unique
 134.25 MB (100.0%) -- decommitted

So yes, it can account for some of it.  More so for the second case than the first one... ;)

My point is that we should have a tool that, like the main heap measurement tree in about:memory, can be used by non-experts.  Even something as simple as putting all the relevant numbers in one spot would be a big help.
The script attached to bug 1254777 is quite useful. It analyzes Linux libraries and binaries.

Bloaty looks like a more advanced take on the same basic idea. It's available here: https://github.com/google/bloaty.
It appears this is being worked on, and it is not blocking a release.  Emma indicated that I should put this in Memory Allocator.
Component: General → Memory Allocator
Priority: -- → P3
(In reply to Boris Zbarsky [:bz] (no decent commit message means r-) from comment #2)
> resident-unique may not cover everything we care about here.  For example,
> if there's per-content-process shared memory that is only shared with the
> parent, then we'd want to include it in this metric, right?

On Linux we have access to the Proportional Set Size, which is the sum over resident pages of (page size / n) where n is the number of places the page is mapped, and that's reported for each virtual memory area.
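As a rough illustration of that per-mapping accounting, here is a minimal sketch (assuming the standard Linux /proc/&lt;pid&gt;/smaps format, where each virtual memory area carries a "Pss:" line giving its proportional share in kB):

```python
def total_pss_kb(smaps_text):
    """Sum the Pss: fields (in kB) across all mappings in smaps output.

    Each mapping in /proc/<pid>/smaps has a line like:
        Pss:                 123 kB
    where the value is already (resident size / number of mappers)
    summed over the mapping's resident pages.
    """
    total = 0
    for line in smaps_text.splitlines():
        if line.startswith("Pss:"):
            total += int(line.split()[1])
    return total

if __name__ == "__main__":
    # On Linux, measure the current process; elsewhere the file won't exist.
    try:
        with open("/proc/self/smaps") as f:
            print(total_pss_kb(f.read()), "kB PSS")
    except FileNotFoundError:
        pass
```

This is just the aggregation step; a real tool would also want the per-mapping breakdown (pathname, permissions) that smaps already provides.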

We could also do different things for different types of memory: USS for data/relro, RSS for shared memory.

Also on Linux, we could add support for tagging IPC shared memory segments with names (that can be read out from procfs) if that turns out to be something we need more visibility into.
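For illustration, one hedged way to do that tagging on Linux is memfd_create (exposed as os.memfd_create in Python 3.8+): the name is visible to external tools as "/memfd:&lt;name&gt;" in /proc/&lt;pid&gt;/maps and /proc/&lt;pid&gt;/fd, i.e. readable out of procfs as suggested above. This is a sketch of the idea, not Gecko's actual IPC shared-memory code:

```python
import os

def create_tagged_segment(name, size):
    """Create an anonymous shared-memory segment carrying a readable tag.

    With memfd_create the tag shows up as "/memfd:<name>" in the
    process's /proc/<pid>/maps and /proc/<pid>/fd entries, so memory
    tools can attribute the segment.  Returns the file descriptor, or
    None where memfd_create is unavailable (non-Linux, or Python < 3.8).
    """
    if not hasattr(os, "memfd_create"):
        return None
    fd = os.memfd_create(name)
    os.ftruncate(fd, size)  # size the segment before mapping it
    return fd
```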
(In reply to Boris Zbarsky [:bz] (no decent commit message means r-) from comment #3)
> Also, "resident-unique" is smaller than "explicit" for me, probably because
> some things are swapped out?

|explicit| can contain non-heap entries. The delta you're interested in is probably something like |resident-unique| - |heap-allocated|.
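To show the shape of that calculation (the 150 MB heap-allocated figure below is hypothetical; only the 217.75 MB resident-unique number comes from the measurements earlier in this bug):

```python
def non_heap_estimate_mb(resident_unique_mb, heap_allocated_mb):
    """Rough non-heap figure per the comment above: resident-unique
    minus the jemalloc-reported heap-allocated total."""
    return resident_unique_mb - heap_allocated_mb

# A content process with 217.75 MB resident-unique and a hypothetical
# 150.0 MB heap-allocated would have roughly 67.75 MB of non-heap
# resident memory.
```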

(In reply to Boris Zbarsky [:bz] (no decent commit message means r-) from comment #2)
> resident-unique may not cover everything we care about here.  For example,
> if there's per-content-process shared memory that is only shared with the
> parent, then we'd want to include it in this metric, right?

Shared memory *should* be reported (possibly only in the parent process, I'd have to take a look at that reporter again) [1].

(In reply to Jed Davis [:jld] (⏰UTC-6) from comment #8)
> We could also do different things for different types of memory: USS for
> data/relro, RSS for shared memory.

Seems like we should just resurrect the system memory reporter; I think that covered a fair amount of this. I can probably add part of that back, though I might tag you or glandium in to flesh out the handling of smaps info.

[1] https://searchfox.org/mozilla-central/rev/78dbe34925f04975f16cb9a5d4938be714d41897/ipc/glue/SharedMemory.cpp#31-39
Whiteboard: [overhead:noted]
A quick update on where we're at:
  - Section sizes are now being tracked as build metrics as of bug 1463296
  - Committed stack sizes are being worked on in bug 1446519
  - Shared memory should already be reported

I'm not sure what else we want to add at this point.
> I'm not sure what else we want to add at this point.

System allocator-allocated memory? bug 828844 did it for Linux, bug 1194061 for Windows, but AFAIK we're still short on Android and Mac.

I'm also not sure we track GPU memory on all platforms if at all.
In theory we could have random mmap(MAP_ANONYMOUS) calls that are happening behind jemalloc's back.  In practice, it's not clear how we'd detect those.

What might be interesting is comparing the sum of all the bits we know about with what the OS thinks is going on, if we can ask the OS for the information we actually care about here.  If they're close enough, we're done.  If not, we need to think about what could be causing the discrepancy...
(In reply to Boris Zbarsky [:bz] (no decent commit message means r-) from comment #12)
> In theory we could have random mmap(MAP_ANONYMOUS) calls that are happening
> behind jemalloc's back.  In practice, it's not clear how we'd detect those.

I've been considering the possibility of interposing malloc calls in third-party libraries so we can get some handle on how much is being allocated by things like fontconfig. In theory, we could do the same for mmap.

That's a non-trivial but doable problem on Linux. I don't know enough about Windows or Mach linkers to know how doable it is on those platforms.

> What might be interesting is comparing the sum of all the bits we know about
> with what the OS thinks is going on, if we can ask the OS for the
> information we actually care about here.  If they're close enough, we're
> done.  If not, we need to think about what could be causing the
> discrepancy...

We already do that, to various degrees on various platforms. Windows apparently has the concept of multiple heaps, and we have accounting for how much space the non-jemalloc heaps use.

On other platforms, we have accounting for how much virtual memory is allocated. The extra allocations are basically the difference between the sum of explicit allocations and the resident-unique numbers. I suppose having a separate reporter for that, similar to heap-unclassified, might make sense...
We *are* interposing malloc calls from third-party libraries on mac and linux. We just can't tell them apart.
(In reply to Mike Hommey [:glandium] from comment #14)
> We *are* interposing malloc calls from third-party libraries on mac and
> linux. We just can't tell them apart.

I mean specifically interposing calls from specific libraries, like we do for our bundled Hunspell.
(In reply to Kris Maglione [:kmag] from comment #13)
> (In reply to Boris Zbarsky [:bz] (no decent commit message means r-) from
> comment #12)
> > In theory we could have random mmap(MAP_ANONYMOUS) calls that are happening
> > behind jemalloc's back.  In practice, it's not clear how we'd detect those.
> 
> I've been considering the possibility of interposing malloc calls in
> third-party libraries so we can get some handle on how much is being
> allocated by things like fontconfig. In theory, we could do the same for
> mmap.
> 
> That's a non-trivial but doable problem on Linux. I don't know enough about
> Windows or mach linkers to know how doable it is on those platforms.

It seems like DMD is good enough for this; do we need an always-on thing?
(In reply to Eric Rahm [:erahm] from comment #16)
> It seems like DMD is good enough for this; do we need an always-on thing?

So, there are two problems with DMD:

1) It requires a special build, which basically means that it's easy to use it to find information about our own configurations, but extremely difficult to get information about what happens in the wild. What kind of memory are random graphics drivers using? How much memory are fontconfig and GTK using for ordinary users, compared to stock Ubuntu, or computers of people like me or jld?

2) It's kind of easy to ignore things that only show up in DMD. Even those of us who run it don't run it that often, and we generally have to do different ad-hoc analyses when we do. It's way easier to ignore (to use the same examples) the megabytes of data that GTK and fontconfig use when they only show up in obscure DMD reports than when they show up at the top of about:memory every time you open it.
It actually doesn't require a special build. We just need to finish bug 1409739.
> I mean specifically interposing calls from specific libraries, like we do for our bundled Hunspell.

To expand on that: we have `CountingAllocatorBase`. For third-party libraries that let you plug in your own allocator, we use it to count the memory allocated in that library. It's used by Hunspell, ICU, some media stuff, and (on Android) Freetype.
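CountingAllocatorBase itself is C++ in mozilla-central; as a language-neutral illustration of the idea only, assuming a library that accepts pluggable allocation hooks (the hook names here are hypothetical, since Hunspell, ICU, etc. each have their own registration API), a counting wrapper looks roughly like:

```python
class CountingAllocator:
    """Illustrative counting-allocator shim in the spirit of
    CountingAllocatorBase: wrap the allocation entry points a library
    lets you plug in, keep a running byte total, and a memory reporter
    can later ask how much the library currently holds."""

    def __init__(self):
        self._bytes = 0
        self._sizes = {}  # allocation id -> size, so frees are credited

    def malloc(self, size):
        buf = bytearray(size)  # stand-in for a real allocation
        self._bytes += size
        self._sizes[id(buf)] = size
        return buf

    def free(self, buf):
        self._bytes -= self._sizes.pop(id(buf), 0)

    def memory_allocated(self):
        """What a memory reporter would publish for this library."""
        return self._bytes
```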
Severity: normal → S3