Going to use this to track various specific bits.

We have a lot of heap-unclassified (almost 40% of the heap) in a vanilla content process with nothing really loaded.

Also, I haven't even looked at the non-heap-allocated overhead.
I have a dump of memory allocated by content processes using ASAN's __sanitizer_print_memory_profile(); it's a little confused because both content processes interleave their dumps (I'll need to find a way to separate them, or only dump 1).  The two processes have allocated (still live) ~17MB and ~21.5MB; for reference the two processes show a size in System Monitor (on Fedora) of ~24.5 and 28MB when I looked in a different run with the same profile.  

Probably the 17MB is the 'warm' process that hasn't loaded any content, and the other is showing a blank page.  Comparing the two will also be interesting.

The dump is large (since I told it to dump the allocation stacks of *all* allocations; total was ~77K for the small, and 100K for the larger).  I'll upload the raw files, but the highlights:

We're spending a lot on alignment(?)
We have a LOT of power-of-2-sized buffers -- IIRC jemalloc isn't efficient on powers-of-two (not unusual)  -- Glandium?
The profiler is allocating a bunch of memory up front in case it needs it when turned on (I presume)
Lots of HashTables - many probably far from filled, and some are static once created
Prefs.... (njn is working on this!)
fontconfig is a PIG!!!!
Telemetry is a non-0 %-age
Quite a bit (scattered) of script data/source/etc

~8% is in posix_memalign from slab_allocator_alloc_chunk() in gslice.c (2500+ allocations)
~3% (~850K) in ~25 allocations from ThreadInfo::ThreadInfo in tools/profiler/core/ThreadInfo.cpp, allocated when the threads are registered with the profiler, or 32808 bytes per thread.  That's a lot to spend for the profiler when I haven't installed it in that profile, let alone used it.  Lazy allocation, perhaps?
~3% in many allocations from g_realloc() (no further backtrace)
~3% (650K) in 21 allocations from performXDR<> called from js::XDRScript<> 
~2% in 5 allocations (of ~88K each) from js::DuplicateString(), called from js::ScriptSource::setSourceCopy()
~1% (262144) in 1 allocation from PLDHashTable::ChangeTable from Preferences(!) (SetLatePreferences)
~1% (262144) in 16 allocations from DoInterfaceDescriptor(XPTArena...), called a ways above from DoRegisterXPT()
~1% in ~8000 allocations from FcCharSetFindLeafCreate() (fontconfig)
~1% in ~7700 allocations from FcValueListCreate/
~1% in 621 allocations from  JSScript::createScriptData() (from XDRScript<>)
~0.5% in 2 allocations from __strtof_l()
~0.5% in FcPatternObjectInsertElt()
~0.5% (131072) from js::detail::HashTable<>changeTableSize()
~0.5% (131072) in 2 allocations from PLDHashTable::Add() (an XPTInterfaceInfoManager table)
~0.5% (131072) in 1 allocation from PLDHashTable::Add() from GetAtomHashEntry() when in RegisterStaticAtoms
~0.5% (131072) in PLDHashTable::Add() called from TelemetryHistogram::InitializeGlobalState()
(perhaps a couple more 131072 or 262144 allocations)
~0.5% (122K) in XPT_DoCString() from XPTInterfaceInfoManager::RegisterBuffer()
~0.4% (113K) in 95 allocations from _dl_new_object()
~0.4% (111K) from FcCharSetPutLeaf
~0.4% in 17xx allocations from PLDHashTable::Add for strings from TelemetryHistogram::InitializeGlobalState()
~0.4% (102K) in 50 allocations from  nsPersistentProperties::SetStringProperty()
~0.4% (101K) in many allocations from FcValueSave()
~0.4% (98K) in 3 allocations from ThreadInfo::ThreadInfo()
~0.4% (98304) in 6 allocations from xptiInterfaceEntry::Create()
~0.4% (98304) in 2 allocations from PLDHashTable::Add() 
98K in 12 allocs from FcConfigAllocExpr
81920 (10*8192!) in 10 allocations from DuplicateString<char, 8192ul, 1ul> from Pref::Pref()
Bunch more allocs from ThreadInfo::ThreadInfo() (profiler)
65536 in 1 alloc from HashTable<>::createTable() from AtomizeAndCopyChars<>
65536 in 1 allocation from nsAtomFriend::RegisterStaticAtoms()
65536 in 2 allocations from gfxFcPlatformFontList::AddPatternToFontList()/InitFontListForPlatform()
65536 in 2 allocs from js::LifoAlloc::newChunkWithCapacity()
65536 in 1 alloc from nsComponentManagerImpl::RegisterCIDEntryLocked()
65520 in 2 allocs from nsPurpleBuffer::Put()
60K in 65 allocs from ft_mem_qalloc() (freetype)
60K in ~2500 allocs from nsAtomFriend::RegisterStaticAtoms()
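
A minimal sketch of the lazy-allocation idea raised above for the per-thread profiler memory. This is not the actual ThreadInfo code: LazyBuffer and HypotheticalThreadRecord are made-up names, and I'm only assuming the 32808 bytes per thread is a single buffer that isn't needed until the profiler is turned on.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical helper: remembers the size at construction time but only
// allocates the storage the first time someone asks for it.
class LazyBuffer {
 public:
  explicit LazyBuffer(std::size_t aSize) : mSize(aSize) {}

  bool IsAllocated() const { return mData != nullptr; }
  std::size_t Size() const { return mSize; }

  // Allocates (zero-initialized) on first use; later calls return the
  // same pointer. Not thread-safe; a real version would need locking.
  char* Get() {
    if (!mData) {
      mData = std::make_unique<char[]>(mSize);
    }
    return mData.get();
  }

 private:
  std::size_t mSize;
  std::unique_ptr<char[]> mData;
};

// Hypothetical stand-in for a per-thread record. 32808 is the per-thread
// figure from the dump; what those bytes hold is an assumption here.
struct HypotheticalThreadRecord {
  LazyBuffer mBuffer{32808};
};
```

With something along these lines, registering a thread would cost only the small record, and the 32808 bytes would be paid per thread only once Get() is first called.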
> We have a LOT of power-of-2-sized buffers -- IIRC jemalloc isn't efficient on powers-of-two (not unusual)  -- Glandium?

No, powers-of-two are the best case, along with everything that's exactly matching a class size, or is a multiple of the page size for larger sizes.
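
To illustrate why exact class sizes and page multiples are the best case, here is a rough sketch of rounding a request up to a size class and measuring the slop. The class boundaries below (16-byte steps up to 512, powers of two up to 2048, 4096-byte pages above that) are a simplified assumption for illustration, not jemalloc's actual tables, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Assumed, simplified size-class model (NOT the real allocator tables).
constexpr std::size_t kQuantum = 16;           // step for small classes
constexpr std::size_t kMaxQuantumClass = 512;  // largest quantum-spaced class
constexpr std::size_t kMaxSubPageClass = 2048; // largest sub-page class
constexpr std::size_t kPageSize = 4096;        // assumed page size

constexpr std::size_t RoundUpToMultiple(std::size_t aValue, std::size_t aStep) {
  return (aValue + aStep - 1) / aStep * aStep;
}

// Returns the number of bytes actually reserved for a request under the
// assumed model above.
std::size_t RoundUpToClass(std::size_t aRequest) {
  if (aRequest == 0) {
    aRequest = 1;
  }
  if (aRequest <= kMaxQuantumClass) {
    return RoundUpToMultiple(aRequest, kQuantum);
  }
  if (aRequest <= kMaxSubPageClass) {
    std::size_t cls = 1024;
    while (cls < aRequest) {
      cls *= 2;
    }
    return cls;
  }
  return RoundUpToMultiple(aRequest, kPageSize);
}

// Slop: bytes reserved but not requested.
std::size_t Slop(std::size_t aRequest) {
  return RoundUpToClass(aRequest) - aRequest;
}
```

Under this model the 131072- and 262144-byte tables have zero slop, while a 32808-byte request would round up to 36864 bytes (9 pages), wasting 4056 bytes per allocation.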
> No, powers-of-two are the best case, along with everything that's exactly
> matching a class size, or is a multiple of the page size for larger sizes.

Good.  (IIRC at one point it was better to be power-of-2-minus-n; though perhaps I'm thinking of some other system/allocator)
You might be thinking about things like nsTAutoArray, which have an embedded header, so a better size for it is jemalloc_class_size - header_size.
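
A small sketch of that arithmetic: pick the element capacity so that header plus elements exactly fills a class size. The function names are hypothetical, and the 8-byte header and 4096-byte class used in the example are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstddef>

// Largest element count such that header + elements fits in aClassSize.
constexpr std::size_t CapacityForClass(std::size_t aClassSize,
                                       std::size_t aHeaderSize,
                                       std::size_t aElemSize) {
  return aClassSize > aHeaderSize ? (aClassSize - aHeaderSize) / aElemSize : 0;
}

// Total bytes requested from the allocator for a given capacity.
constexpr std::size_t BytesRequested(std::size_t aHeaderSize,
                                     std::size_t aElemSize,
                                     std::size_t aCapacity) {
  return aHeaderSize + aElemSize * aCapacity;
}

// Example with an assumed 8-byte header and 8-byte elements:
// 511 elements fill a 4096-byte class exactly, whereas a "round" 512
// elements would need 4104 bytes and spill into the next class.
static_assert(CapacityForClass(4096, 8, 8) == 511, "fits exactly");
static_assert(BytesRequested(8, 8, 511) == 4096, "no slop");
static_assert(BytesRequested(8, 8, 512) == 4104, "overflows the class");
```

So the "power-of-2-minus-n" rule of thumb applies to the payload, with n being the embedded header, so that the total request still lands on a class size.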
(In reply to Randell Jesup [:jesup] from comment #1)
> I have a dump of memory allocated by content processes using ASAN's
> __sanitizer_print_memory_profile()

You'll want to be careful with that -- I'm pretty sure ASAN will be using its own allocator instead of jemalloc, so it's not a representative run.

You can use DMD for vanilla heap profiling. It works with jemalloc so will give representative results. See the docs about "live mode" at
So, DMD results (I was using ASAN): similar of course, since I don't care much about a few bytes - biggest difference would be in slop and alignment I imagine.

Comments above are still valid; we can now see that gtk is using a moderate amount, and no surprise, the fontconfig stuff is called from InitFontList.
Lots in total in js::ScriptSource, various things involving Atoms, and XPTInterfaceInfoManager::RegisterBuffer() is a big hotspot
LifoAlloc then comes in with a lot of different little allocations (probably not surprising)

6% (811K) in ThreadInfo::ThreadInfo (note: another couple % below with different stacks)
4.5% in js::ScriptSource::performXDR<>
3.6% (491K) from XPTInterfaceInfoManager::RegisterBuffer() (a few % more below)
2.3% in js::ScriptSource::setSourceCopy()
2% in js::SharedScriptData::new_ from JSScript::createScriptData()
1.9% (262144) in PLDHashTable::ChangeTable() from SetLatePreferences()
1.7% in js::SharedScriptData::new_ from JSScript::fullyInitFromEmitter
1.6% in FcPatternObjectAddWithBinding() (from InitFontList())
1.5% in gtk_css_selector_tree_builder_build from (near top) gtk_settings_get_for_display()
1.4% from js::ScriptSource::setSourceCopy()
1.2% in glibc _dl_new_object()
1% (~30% cumulative) in nsPersistentProperties::SetStringProperty() from nsStringBundle::LoadProperties()
1% from XPTInterfaceInfoManager::RegisterBuffer() (again, different stack slightly)
1% in PLDHashTable::Add() from nsAtomFriend::RegisterStaticAtoms()
1% (131072, 1 alloc) in gtk_css_provider_load_internal() from gtk_settings_get_for_display
1% (ditto) in PLDHashTable::Add() from TelemetryHistogram::InitializeGlobalState()
1% (ditto) in js::AtomizeChars from the frontend::GeneralParser<>
1% (ditto) in js::AtomizeChars from js::XDRAtom<>/js::XDRScript<>
1% in FcPatternObjectInsertElt from InitFontList()
0.85% in PLDHashTable::Add from XPTInterfaceInfoManager::RegisterBuffer() (different stack)
0.8% in gtk_css_ruleset_add() from gtk_settings_get_for_display
0.8% in js::SharedScriptData::new_() from JSScript::fullyInitFromEmitter()
0.8% DuplicateString() from pref_SetPref()
0.7% in FcCharSetPutLeaf from InitFontList
0.7% in js::LifoAlloc::newChunkWithCapacity() from js::frontend::PerHandlerParser
0.6% from js::SharedScriptData::new_ from JSScript::createScriptData (different stack)
0.6% from nsAtomFriend::RegisterStaticAtoms()
0.5% from FcPatternObjectAddWithBinding
0.5% from ThreadInfo::ThreadInfo (different stack)
0.5% from XPTInterfaceInfoManager::RegisterBuffer() (different stack)
0.5% from call_init() (dl_init.c) in glibc
0.5% (45% cumulative) in js::AtomizeChars (different stack)

<bunch of 65536 byte allocs from Component Manager, HashTables for StaticAtoms, JSScript::shareScriptData()>

<several 61440-byte totals (15 allocs) from LifoAlloc, and a bunch in the 50K region with 13 allocs from LifoAlloc>

<4 ~36K alloc stacks from ThreadInfo::ThreadInfo -- different callers - HangMonitor, WatchdogMain, BackgroundHangManager -- I wonder if there's some duplication here that could be eliminated>
Bug 1436179 tracks the ThreadInfo/profiler bits.
Raw data.  Note lsan4_xaa is the first 100K lines (which goes down to ~1K total allocation/stack); the tail is LONG: xaa is only about 1/15th of the full file.  Also note that lsan has a mix of two content processes: one that is displaying a blank page, and one that hasn't been used yet.
Alias: memshrink-content
See Also: → 1350472
cc'ing felipe who might want to be in the loop on this.
