Open Bug 1436250 (memshrink-content) Opened 2 years ago Updated Last month

[meta] Reduce content process memory overhead

Categories: Core :: General, enhancement
People: Reporter: bzbarsky; Assignee: Unassigned
References: Depends on 55 open bugs, Blocks 1 open bug
Details: Keywords: meta; Whiteboard: [MemShrink:meta]
User Story: https://treeherder.mozilla.org/perf.html#/graphs?timerange=604800&series=mozilla-inbound,1684808,1,4&series=mozilla-inbound,1684802,1,4
Attachments: 1 file

Going to use this to track various specific bits.

We have a lot of heap-unclassified (almost 40% of the heap) in a vanilla content process with nothing really loaded.

Also, I haven't even looked at the non-heap-allocated overhead.
Depends on: 1436179
Whiteboard: [memshrink]
Keywords: meta
Summary: Reduce content process memory overhead → [meta] Reduce content process memory overhead
I have a dump of memory allocated by content processes using ASAN's __sanitizer_print_memory_profile(); it's a little confused because both content processes interleave their dumps (I'll need to find a way to separate them, or only dump one). The two processes have allocated (still live) ~17MB and ~21.5MB; for reference, the two processes showed sizes of ~24.5MB and ~28MB in System Monitor (on Fedora) when I looked in a different run with the same profile.

Probably the 17MB one is the 'warm' process that hasn't loaded any content, and the other is showing a blank page. Comparing the two will also be interesting.

The dump is large (since I told it to dump the allocation stacks of *all* allocations; the total was ~77K for the smaller process and ~100K for the larger one). I'll upload the raw files, but here are the highlights:

We're spending a lot on alignment(?)
We have a LOT of power-of-2-sized buffers -- IIRC jemalloc isn't efficient on powers-of-two (not unusual)  -- Glandium?
The profiler is allocating a bunch of memory up front in case it needs it when turned on (I presume)
Lots of HashTables - many probably far from filled, and some are static once created
Prefs.... (njn is working on this!)
fontconfig is a PIG!!!!
Telemetry is a non-0 %-age
Quite a bit (scattered) of script data/source/etc
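On the point that many hash tables are probably far from filled: under a simplified model of an open-addressing table whose capacity is a power of two and is kept at or below a 75% load factor (an assumption loosely modeled on PLDHashTable's growth policy, not its actual code), the memory a table occupies and how full it is can be estimated from its entry count and entry size. The function names, the minimum capacity of 8, and the 16-byte entry size used in the example are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed policy (simplification): capacity is a power of two, at least
// kMinCapacity, and large enough that entries <= 3/4 * capacity.
constexpr uint32_t kMinCapacity = 8;

// Smallest power-of-two capacity that holds `entries` at <= 75% load.
uint32_t CapacityFor(uint32_t entries) {
  assert(entries <= (1u << 30) && "capacity would overflow uint32_t");
  // ceil(entries * 4 / 3), computed in 64 bits to avoid overflow.
  uint64_t needed = (uint64_t(entries) * 4 + 2) / 3;
  uint64_t cap = kMinCapacity;
  while (cap < needed) {
    cap *= 2;
  }
  return static_cast<uint32_t>(cap);
}

// Bytes for the entry storage alone (ignores any table header).
size_t TableBytes(uint32_t entries, size_t entrySize) {
  return size_t(CapacityFor(entries)) * entrySize;
}

// Fraction of slots actually in use.
double LoadFactor(uint32_t entries) {
  return double(entries) / double(CapacityFor(entries));
}
```

Under these assumptions, 1,700 sixteen-byte entries need a capacity of 4096 (65,536 bytes) and sit at about 41% load. If entry sizes are also powers of two, table sizes come out as powers of two, which is consistent with (though not proof of) the 131072- and 262144-byte PLDHashTable allocations listed below.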



~8% is in posix_memalign from slab_allocator_alloc_chunk() in gslice.c (2500+ allocations)
~3% (~850K) in ~25 allocations from ThreadInfo::ThreadInfo in tools/profiler/core/ThreadInfo.cpp, allocated when the threads are registered with the profiler, or 32808 bytes per thread.  That's a lot to spend for the profiler when I haven't installed it in that profile, let alone used it.  Lazy allocation, perhaps?
~3% in many allocations from g_realloc() (no further backtrace)
~3% (650K) in 21 allocations from performXDR<> called from js::XDRScript<> 
~2% in 5 allocations (of ~88K each) from js::DuplicateString(), called from js::ScriptSource::setSourceCopy()
~1% (262144) in 1 allocation from PLDHashTable::ChangeTable from Preferences(!) (SetLatePreferences)
~1% (262144) in 16 allocations from DoInterfaceDescriptor(XPTArena...), called further up the stack from DoRegisterXPT()
~1% in ~8000 allocations from FcCharSetFindLeafCreate() (fontconfig)
~1% in ~7700 allocations from FcValueListCreate()
~1% in 621 allocations from JSScript::createScriptData() (from XDRScript<>)
~0.5% in 2 allocations from __strtof_l()
~0.5% in FcPatternObjectInsertElt()
~0.5% (131072) from js::detail::HashTable<>::changeTableSize()
~0.5% (131072) in 2 allocations from PLDHashTable::Add() (an XPTInterfaceInfoManager table)
~0.5% (131072) in 1 allocation from PLDHashTable::Add() from GetAtomHashEntry() when in RegisterStaticAtoms
~0.5% (131072) in PLDHashTable::Add() called from TelemetryHistogram::InitializeGlobalState()
(perhaps a couple more 131072 or 262144 allocations)
~0.5% (122K) in XPT_DoCString() from XPTInterfaceInfoManager::RegisterBuffer()
~0.4% (113K) in 95 allocations from _dl_new_object()
~0.4% (111K) from FcCharSetPutLeaf
~0.4% in 17xx allocations from PLDHashTable::Add for strings from TelemetryHistogram::InitializeGlobalState()
~0.4% (102K) in 50 allocations from  nsPersistentProperties::SetStringProperty()
~0.4% (101K) in many allocations from FcValueSave()
~0.4% (98K) in 3 allocations from ThreadInfo::ThreadInfo()
~0.4% (98304) in 6 allocations from xptiInterfaceEntry::Create()
~0.4% (98304) in 2 allocations from PLDHashTable::Add() 
98K in 12 allocs from FcConfigAllocExpr
81920 (10*8192!) in 10 allocations from DuplicateString<char, 8192ul, 1ul> from Pref::Pref()
Bunch more allocs from ThreadInfo::ThreadInfo() (profiler)
65536 in 1 alloc from HashTable<>::createTable() from AtomizeAndCopyChars<>
65536 in 1 allocation from nsAtomFriend::RegisterStaticAtoms()
65536 in 2 allocations from gfxFcPlatformFontList::AddPatternToFontList()/InitFontListForPlatform()
65536 in 2 allocs from js::LifoAlloc::newChunkWithCapacity()
65536 in 1 alloc from nsComponentManagerImpl::RegisterCIDEntryLocked()
65520 in 2 allocs from nsPurpleBuffer::Put()
60K in 65 allocs from ft_mem_qalloc() (freetype)
60K in ~2500 allocs from nsAtomFriend::RegisterStaticAtoms()
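The lazy-allocation idea for the profiler's per-thread data (about 32808 bytes per registered thread, per the ThreadInfo::ThreadInfo entry above) might look roughly like the following. This is a hypothetical sketch, not the profiler's actual ThreadInfo code; the class name LazyThreadBuffer is invented for illustration. The buffer is only allocated the first time it is needed, so a thread registered while the profiler is idle costs a null pointer plus a size field.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical per-thread buffer that defers its allocation until first use.
class LazyThreadBuffer {
 public:
  explicit LazyThreadBuffer(size_t size) : mSize(size) { assert(size > 0); }

  // Returns the buffer, allocating (and zeroing) it on the first call.
  char* Get() {
    if (!mBuffer) {
      mBuffer = std::make_unique<char[]>(mSize);  // value-initialized
    }
    return mBuffer.get();
  }

  // Frees the memory again, e.g. when profiling stops.
  void Release() { mBuffer.reset(); }

  bool IsAllocated() const { return mBuffer != nullptr; }
  size_t Size() const { return mSize; }

 private:
  size_t mSize;
  std::unique_ptr<char[]> mBuffer;
};
```

Registering a thread would then just construct the wrapper (for example `LazyThreadBuffer buf(32808);`), and only code that runs while profiling calls Get(). The sketch assumes Get() is called only from the owning thread; if it could race, it would need its own synchronization.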
Flags: needinfo?(mh+mozilla)
> We have a LOT of power-of-2-sized buffers -- IIRC jemalloc isn't efficient on powers-of-two (not unusual)  -- Glandium?

No, powers-of-two are the best case, along with everything that's exactly matching a class size, or is a multiple of the page size for larger sizes.
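To make this concrete, "slop" is the gap between the requested size and the size class the allocator actually hands out. The sketch below uses a simplified, assumed class layout (multiples of 16 bytes up to 512, then powers of two up to 2048, then multiples of a 4096-byte page). mozjemalloc's real table differs in its details (tiny classes, exact thresholds, huge allocations), so treat the numbers as illustrative only.

```cpp
#include <cassert>
#include <cstddef>

// Assumed, simplified size-class layout (not mozjemalloc's exact table).
constexpr size_t kQuantum = 16;
constexpr size_t kQuantumMax = 512;
constexpr size_t kSubPageMax = 2048;
constexpr size_t kPageSize = 4096;

// Rounds a request up to the size class that would actually be allocated.
size_t RoundUpToSizeClass(size_t n) {
  assert(n > 0);
  if (n <= kQuantumMax) {
    return (n + kQuantum - 1) / kQuantum * kQuantum;  // multiple of 16
  }
  if (n <= kSubPageMax) {
    size_t c = kQuantumMax;
    while (c < n) {
      c *= 2;  // 1024 or 2048
    }
    return c;
  }
  return (n + kPageSize - 1) / kPageSize * kPageSize;  // page multiple
}

// Bytes wasted because the request did not match a class exactly.
size_t Slop(size_t n) { return RoundUpToSizeClass(n) - n; }
```

Under this model a 4096-byte request wastes nothing while a 4097-byte one wastes 4095 bytes, which is why sizes that are powers of two (or that exactly hit a class) are the best case.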
Flags: needinfo?(mh+mozilla)
> No, powers-of-two are the best case, along with everything that's exactly
> matching a class size, or is a multiple of the page size for larger sizes.

Good.  (IIRC at one point it was better to be power-of-2-minus-n; though perhaps I'm thinking of some other system/allocator)
You might be thinking about things like nsTAutoArray, which have an embedded header, so a better size for it is jemalloc_class_size - header_size.
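The header adjustment described here can be sketched as choosing an element count so that header plus elements lands exactly on a size class instead of spilling just past it. Both the 8-byte header and the size-class rounding (all powers of two, for simplicity) are assumptions for illustration; nsTArray's real header layout and mozjemalloc's class table are not reproduced, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Assumed header size for an array with an embedded header (illustrative).
constexpr size_t kHeaderSize = 8;

// Simplification: treat every size class as a power of two.
size_t NextPowerOfTwo(size_t n) {
  assert(n > 0);
  size_t p = 1;
  while (p < n) {
    p *= 2;
  }
  return p;
}

// Naive: bytes actually allocated for `count` elements plus the header.
size_t NaiveAllocBytes(size_t count, size_t elemSize) {
  return NextPowerOfTwo(kHeaderSize + count * elemSize);
}

// Better: the most elements that fit in `classSize` once the header is
// subtracted, i.e. (jemalloc_class_size - header_size) / elemSize.
size_t ElementsFittingClass(size_t classSize, size_t elemSize) {
  assert(classSize > kHeaderSize && elemSize > 0);
  return (classSize - kHeaderSize) / elemSize;
}
```

With these assumptions, asking for 256 eight-byte elements needs 2056 bytes and rounds up to 4096, while capping the capacity at 255 elements fits exactly in 2048.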
(In reply to Randell Jesup [:jesup] from comment #1)
> I have a dump of memory allocated by content processes using ASAN's
> __sanitizer_print_memory_profile()

You'll want to be careful with that -- I'm pretty sure ASAN will be using its own allocator instead of jemalloc, so it's not a representative run.

You can use DMD for vanilla heap profiling. It works with jemalloc so will give representative results. See the docs about "live mode" at https://developer.mozilla.org/en-US/docs/Mozilla/Performance/DMD.
So, DMD results (previously I was using ASAN): similar, of course, since I don't care much about a few bytes; the biggest difference would be in slop and alignment, I imagine.

Comments above are still valid; we can now see that gtk is using a moderate amount, and, no surprise, the fontconfig stuff is called from InitFontList().
Lots in total in js::ScriptSource and various things involving Atoms, and XPTInterfaceInfoManager::RegisterBuffer() is a big hotspot.
LifoAlloc then comes in with a lot of different little allocations (probably not surprising)

6% (811K) in ThreadInfo::ThreadInfo (note: another couple % below with different stacks)
4.5% in js::ScriptSource::performXDR<>
3.6% (491K) from XPTInterfaceInfoManager::RegisterBuffer() (a few % more below)
2.3% in js::ScriptSource::setSourceCopy()
2% in js::SharedScriptData::new_ from JSScript::createScriptData()
1.9% (262144) in PLDHashTable::ChangeTable() from SetLatePreferences()
1.7% in js::SharedScriptData::new_ from JSScript::fullyInitFromEmitter()
1.6% in FcPatternObjectAddWithBinding() (from InitFontList())
1.5% in gtk_css_selector_tree_builder_build from (near top) gtk_settings_get_for_display()
1.4% from js::ScriptSource::setSourceCopy()
1.2% in glibc _dl_new_object()
1% (~30% cumulative) in nsPersistentProperties::SetStringProperty() from nsStringBundle::LoadProperties()
1% from XPTInterfaceInfoManager::RegisterBuffer() (again, different stack slightly)
1% in PLDHashTable::Add() from nsAtomFriend::RegisterStaticAtoms()
1% (131072, 1 alloc) in gtk_css_provider_load_internal() from gtk_settings_get_for_display
1% (ditto) in PLDHashTable::Add() from TelemetryHistogram::InitializeGlobalState()
1% (ditto) in js::AtomizeChars from the frontend::GeneralParser<>
1% (ditto) in js::AtomizeChars from js::XDRAtom<>/js::XDRScript<>
1% in FcPatternObjectInsertElt from InitFontList()
0.85% in PLDHashTable::Add from XPTInterfaceInfoManager::RegisterBuffer() (different stack)
0.8% in gtk_css_ruleset_add() from gtk_settings_get_for_display
0.8% in js::SharedScriptData::new_() from JSScript::fullyInitFromEmitter()
0.8% DuplicateString() from pref_SetPref()
0.7% in FcCharSetPutLeaf from InitFontList
0.7% in js::LifoAlloc::newChunkWithCapacity() from js::frontend::PerHandlerParser
0.6% from js::SharedScriptData::new_ from JSScript::createScriptData() (different stack)
0.6% from nsAtomFriend::RegisterStaticAtoms()
0.5% from FcPatternObjectAddWithBinding
0.5% from ThreadInfo::ThreadInfo (different stack)
0.5% from XPTInterfaceInfoManager::RegisterBuffer() (different stack)
0.5% from call_init() (dl_init.c) in glibc
0.5% (45% cumulative) in js::AtomizeChars (different stack)
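The per-stack percentages and running cumulative figures above ("~30% cumulative", "45% cumulative") come from sorting allocation stacks by live bytes. A minimal sketch of that aggregation follows, assuming the raw records have already been parsed into (stack, bytes) pairs; the StackRecord type and Summarize function are hypothetical and do not reflect DMD's actual output format or tooling.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical parsed record: one live allocation and the stack that made it.
struct StackRecord {
  std::string stack;  // e.g. a symbolized, joined call stack
  size_t bytes;
};

struct StackSummary {
  std::string stack;
  size_t bytes;       // total live bytes for this stack
  size_t allocCount;  // number of allocations from this stack
  double percent;     // share of all live bytes
  double cumulative;  // running total of `percent`, largest first
};

std::vector<StackSummary> Summarize(const std::vector<StackRecord>& records) {
  std::map<std::string, std::pair<size_t, size_t>> byStack;  // bytes, count
  size_t total = 0;
  for (const StackRecord& r : records) {
    auto& entry = byStack[r.stack];
    entry.first += r.bytes;
    entry.second += 1;
    total += r.bytes;
  }

  std::vector<StackSummary> out;
  for (const auto& [stack, bc] : byStack) {
    out.push_back({stack, bc.first, bc.second, 0.0, 0.0});
  }
  // Largest stacks first, as in the hand-written list.
  std::sort(out.begin(), out.end(),
            [](const StackSummary& a, const StackSummary& b) {
              return a.bytes > b.bytes;
            });

  double running = 0.0;
  for (StackSummary& s : out) {
    s.percent = total ? 100.0 * double(s.bytes) / double(total) : 0.0;
    running += s.percent;
    s.cumulative = running;
  }
  // Sanity check: the percentages of all stacks should sum to ~100%.
  assert(total == 0 || running > 99.999);
  return out;
}
```

Merging records by stack before sorting is what makes "N allocations from X" lines possible; entries with identical byte totals may come out in either order, since std::sort is not stable.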

<bunch of 65536-byte allocs from Component Manager, HashTables for StaticAtoms, JSScript::shareScriptData()>

<several 61440-byte totals (15 allocs) from LifoAlloc, and a bunch in the 50K region with 13 allocs from LifoAlloc>

<4 ~36K alloc stacks from ThreadInfo::ThreadInfo -- different callers: HangMonitor, WatchdogMain, BackgroundHangManager -- I wonder if there's some duplication here that could be eliminated>
Bug 1436179 tracks the ThreadInfo/profiler bits.
Raw data. Note that lsan4_xaa is the first 100K lines (which go down to ~1K total allocation per stack); the tail is LONG, and xaa is only about 1/15th of the full file. Also note that the lsan output mixes two content processes: one displaying a blank page and one that hasn't been used yet.

https://app.box.com/folder/46288716831
Assignee: nobody → rjesup
Status: NEW → ASSIGNED
Assignee: rjesup → nobody
Status: ASSIGNED → NEW
Depends on: 1438088
Depends on: 1438287
Whiteboard: [memshrink] → [MemShrink:meta]
Depends on: 1441290
Depends on: 1441292
Depends on: 1441736
No longer depends on: 1441290
Depends on: 1441754
Depends on: 1442433
Depends on: 1442737
Depends on: 1443930, 1443932
Duplicate of this bug: 1444751
Depends on: 1446519
Depends on: 1254777
Depends on: 1447744
Depends on: 1448034
Depends on: 1448040
Depends on: 1448060
Depends on: 1449288
Depends on: MinGCMem
Depends on: 786819
Depends on: 1451524
Depends on: 1451568
Depends on: 1452786
Depends on: 1452862
Depends on: 1455178
Depends on: 833098
Depends on: 1458339
Depends on: 1460304
Alias: memshrink-content
Depends on: 1460416
Depends on: 1460002
Depends on: 1460674
Depends on: 1463587
Depends on: 1463908
Depends on: 1464542
See Also: → 1350472
Depends on: 1464548
Depends on: 1464552
cc'ing felipe who might want to be in the loop on this.
Depends on: angle-62
No longer depends on: angle-62
Depends on: 1443077
Depends on: 648417
No longer depends on: 1439412
Depends on: 1470324
Depends on: 1470333
Depends on: 1470339
Depends on: 1470365
Depends on: 1470591
Depends on: 1470783
Depends on: 1470793
Depends on: 1470983
Depends on: 1471025
Depends on: 1471062
Depends on: 1471091
Depends on: 1471102
Depends on: 1472491
Depends on: 1472523
Depends on: 1473414
Depends on: 1473631
Depends on: 1473634
Depends on: 1474130
Depends on: 1474139
Depends on: 1474140
Depends on: 1474143
Depends on: 1474155
Depends on: 1474163
Depends on: 1258781
Depends on: 1474400
Depends on: 1240547
Depends on: 1474793
Depends on: 1474918
Depends on: 1446831
Depends on: 1471535
Depends on: 645563
Depends on: 1475290
Depends on: 1475518
Depends on: 1475700
Depends on: 1475899
Depends on: 1476403
Depends on: 1476405
Depends on: 1476416
Depends on: 1476432
Depends on: 1477393
Depends on: 1477576
Depends on: 1477579
Depends on: 1478124
Depends on: 1416723
Depends on: 1479236
Depends on: 1479241
Depends on: 1479245
Depends on: 1479250
Depends on: 1479309
Depends on: 1479310
Depends on: 1479312
Depends on: 1479313
Depends on: 1479318
Depends on: 1479450
Depends on: 1446940
Depends on: 1480244
Depends on: 1480319
Depends on: 1480327
Depends on: 1471878
Depends on: 1479569
User Story: (updated)
Depends on: 1481321
Depends on: 1475571
Depends on: 1481975
Depends on: 1481998
Depends on: 1483363
Depends on: 1483414
Depends on: 1483664
Depends on: 1483738
Depends on: 1484373
Depends on: 1484413
Depends on: 1484415
Depends on: 1484466
Depends on: 1484496
Depends on: 1485347
Depends on: 1486182
Depends on: 1477213
Depends on: 1487137
Depends on: 1487135
Depends on: 1487146
Depends on: 1487198
Depends on: 1487212
Depends on: 1487214
Depends on: 1487216
Depends on: 1487217
Depends on: 1487221
Depends on: 1487223
Depends on: 1487228
Depends on: 1487234
Depends on: 1487235
Depends on: 1487237
Depends on: 1489315
Depends on: 1475556
Depends on: 1257388
Depends on: 1497729
No longer depends on: 1497729
Depends on: 1482153
Depends on: 1498278
Depends on: 1501438
Depends on: 1419091
Depends on: 1482091
Depends on: 1505522
Depends on: 1503496
Depends on: 1344428
Depends on: 1502284
No longer depends on: 1502284
Depends on: 1507434
Depends on: 1477432
Depends on: 1508873
Depends on: ipc-devirt
Depends on: 1514869
Blocks: fission
Depends on: 1523749
Depends on: 1524687
Depends on: 1524688
Depends on: 1529551
No longer depends on: 1529551
Depends on: 1527532
Depends on: 1543777
Depends on: 1544371
Depends on: 1556539
Depends on: 1543407
Depends on: 1561739
Depends on: 1561937
Depends on: 1549975
Depends on: 1507287
Depends on: 1564412
Depends on: 1565040
Depends on: 1510569
Depends on: 1569526