Open Bug 1436250 (memshrink-content) Opened 6 years ago Updated 1 year ago

[meta] Reduce content process memory overhead

Categories

(Core :: General, enhancement)

People

(Reporter: bzbarsky, Unassigned)

References

(Depends on 56 open bugs, Blocks 1 open bug)

Details

(Keywords: meta, Whiteboard: [MemShrink:meta])

User Story

https://treeherder.mozilla.org/perf.html#/graphs?timerange=604800&series=mozilla-inbound,1684808,1,4&series=mozilla-inbound,1684802,1,4

Attachments

(1 file)

Going to use this to track various specific bits.

We have a lot of heap-unclassified (almost 40% of the heap) in a vanilla content process with nothing really loaded.

Also, I haven't even looked at the non-heap-allocated overhead.
Depends on: 1436179
Whiteboard: [memshrink]
Keywords: meta
Summary: Reduce content process memory overhead → [meta] Reduce content process memory overhead
I have a dump of memory allocated by content processes using ASAN's __sanitizer_print_memory_profile(); it's a little confused because both content processes interleave their dumps (I'll need to find a way to separate them, or only dump 1).  The two processes have allocated (still live) ~17MB and ~21.5MB; for reference the two processes show a size in System Monitor (on Fedora) of ~24.5 and 28MB when I looked in a different run with the same profile.  

Probably the 17MB is the 'warm' process that hasn't loaded any content, and the other is showing a blank page.  Comparing the two will also be interesting.

The dump is large (since I told it to dump the allocation stacks of *all* allocations; the total was ~77K allocations for the smaller process and ~100K for the larger).  I'll upload the raw files, but the highlights:

We're spending a lot on alignment(?)
We have a LOT of power-of-2-sized buffers -- IIRC jemalloc isn't efficient on powers-of-two (not unusual)  -- Glandium?
The profiler is allocating a bunch of memory up front in case it needs it when turned on (I presume)
Lots of HashTables - many probably far from filled, and some are static once created
Prefs.... (njn is working on this!)
fontconfig is a PIG!!!!
Telemetry is a non-0 %-age
Quite a bit (scattered) of script data/source/etc
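
To put a rough number on the "far from filled" hash table point, here is a minimal sketch. It assumes a table whose capacity is a power of two and which grows once the load exceeds 3/4; the minimum capacity of 8, the entry size, and the entry count are illustrative values, not measurements from this dump, and the names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: capacity is always a power of two and the table grows
// as soon as the entry count would exceed 3/4 of the capacity.
constexpr std::size_t CapacityFor(std::size_t aEntries) {
  std::size_t cap = 8;  // assumed minimum capacity
  while (cap * 3 / 4 < aEntries) {
    cap *= 2;
  }
  return cap;
}

// Bytes occupied by slots that hold no entry.
constexpr std::size_t UnusedSlotBytes(std::size_t aEntries,
                                      std::size_t aEntryBytes) {
  return (CapacityFor(aEntries) - aEntries) * aEntryBytes;
}

// Usage example with illustrative numbers: 1700 entries of 16 bytes each.
inline void DemoHashTableWaste() {
  assert(CapacityFor(1700) == 4096);
  assert(UnusedSlotBytes(1700, 16) == 38336);
}
```

For a table that never changes after startup, a sorted static array or a perfect hash would avoid the empty slots entirely.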



~8% is in posix_memalign from slab_allocator_alloc_chunk() in gslice.c (2500+ allocations)
~3% (~850K) in ~25 allocations from ThreadInfo::ThreadInfo in tools/profiler/core/ThreadInfo.cpp, allocated when the threads are registered with the profiler, or 32808 bytes per thread.  That's a lot to spend for the profiler when I haven't installed it in that profile, let alone used it.  Lazy allocation, perhaps?
~3% in many allocations from g_realloc() (no further backtrace)
~3% (650K) in 21 allocations from performXDR<> called from js::XDRScript<> 
~2% in 5 allocations (of ~88K each) from js::DuplicateString(), called from js::ScriptSource::setSourceCopy()
~1% (262144) in 1 allocation from PLDHashTable::ChangeTable from Preferences(!) (SetLatePreferences)
~1% (262144) in 16 allocations from DoInterfaceDescriptor(XPTArena...), called a ways above from DoRegisterXPT()
~1% in ~8000 allocations from FcCharSetFindLeafCreate() (fontconfig)
~1% in ~7700 allocations from FcValueListCreate/
~1% in 621 allocations from  JSScript::createScriptData() (from XDRScript<>)
~0.5% in 2 allocations from __strtof_l()
~0.5% in FcPatternObjectInsertElt()
~0.5% (131072) from js::detail::HashTable<>changeTableSize()
~0.5% (131072) in 2 allocations from PLDHashTable::Add() (an XPTInterfaceInfoManager table)
~0.5% (131072) in 1 allocation from PLDHashTable::Add() from GetAtomHashEntry() when in RegisterStaticAtoms
~0.5% (131072) in PLDHashTable::Add() called from TelemetryHistogram::InitializeGlobalState()
(perhaps a couple more 131072 or 262144 allocations)
~0.5% (122K) in XPT_DoCString() from XPTInterfaceInfoManager::RegisterBuffer()
~0.4% (113K) in 95 allocations from _dl_new_object()
~0.4% (111K) from FcCharSetPutLeaf
~0.4% in 17xx allocations from PLDHashTable::Add for strings from TelemetryHistogram::InitializeGlobalState()
~0.4% (102K) in 50 allocations from  nsPersistentProperties::SetStringProperty()
~0.4% (101K) in many allocations from FcValueSave()
~0.4% (98K) in 3 allocations from ThreadInfo::ThreadInfo()
~0.4% (98304) in 6 allocations from xptiInterfaceEntry::Create()
~0.4% (98304) in 2 allocations from PLDHashTable::Add() 
98K in 12 allocs from FcConfigAllocExpr
81920 (10*8192!) in 10 allocations from DuplicateString<char, 8192ul, 1ul> from Pref::Pref()
Bunch more allocs from ThreadInfo::ThreadInfo() (profiler)
65536 in 1 alloc from HashTable<>::createTable() from AtomizeAndCopyChars<>
65536 in 1 allocation from nsAtomFriend::RegisterStaticAtoms()
65536 in 2 allocations from gfxFcPlatformFontList::AddPatternToFontList()/InitFontListForPlatform()
65536 in 2 allocs from js::LifoAlloc::newChunkWithCapacity()
65536 in 1 alloc from nsComponentManagerImpl::RegisterCIDEntryLocked()
65520 in 2 allocs from nsPurpleBuffer::Put()
60K in 65 allocs from ft_mem_qalloc() (freetype)
60K in ~2500 allocs from nsAtomFriend::RegisterStaticAtoms()
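
A minimal sketch of the lazy-allocation idea raised above for the per-thread profiler memory. `LazyThreadRecord` is a hypothetical stand-in, not the real `ThreadInfo` class; the only number taken from the dump is the 32808 bytes per thread.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a per-thread profiler record.
class LazyThreadRecord {
 public:
  // Per-thread size quoted in the dump above.
  static constexpr std::size_t kBufferBytes = 32808;

  // Registering a thread allocates nothing.
  LazyThreadRecord() = default;

  // The buffer is created on first use, i.e. when profiling is turned on.
  unsigned char* Buffer() {
    if (!mBuffer) {
      mBuffer = std::make_unique<unsigned char[]>(kBufferBytes);
    }
    return mBuffer.get();
  }

  bool HasBuffer() const { return mBuffer != nullptr; }

 private:
  std::unique_ptr<unsigned char[]> mBuffer;
};

// Heap bytes currently held by buffers across all registered threads.
inline std::size_t BufferBytesInUse(
    const std::vector<LazyThreadRecord>& aThreads) {
  std::size_t total = 0;
  for (const auto& thread : aThreads) {
    if (thread.HasBuffer()) {
      total += LazyThreadRecord::kBufferBytes;
    }
  }
  return total;
}

// Usage example: 25 registered threads cost nothing until one is profiled.
inline void DemoLazyRegistration() {
  std::vector<LazyThreadRecord> threads(25);
  assert(BufferBytesInUse(threads) == 0);
  threads[0].Buffer();
  assert(BufferBytesInUse(threads) == LazyThreadRecord::kBufferBytes);
}
```

The trade-off is an extra null check on the first access, and the first allocation happens when the profiler starts rather than at thread registration.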
Flags: needinfo?(mh+mozilla)
> We have a LOT of power-of-2-sized buffers -- IIRC jemalloc isn't efficient on powers-of-two (not unusual)  -- Glandium?

No, powers-of-two are the best case, along with everything that's exactly matching a class size, or is a multiple of the page size for larger sizes.
Flags: needinfo?(mh+mozilla)
> No, powers-of-two are the best case, along with everything that's exactly
> matching a class size, or is a multiple of the page size for larger sizes.

Good.  (IIRC at one point it was better to be power-of-2-minus-n; though perhaps I'm thinking of some other system/allocator)
You might be thinking about things like nsTAutoArray, which have an embedded header, so a better size for it is jemalloc_class_size - header_size.
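
To illustrate the embedded-header point, a small sketch under simplifying assumptions: size classes are modelled as plain powers of two (the real allocator has more classes than that), and the 8-byte header and 4-byte elements are hypothetical values chosen for the arithmetic.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: a simplified allocator whose size classes are powers of two.
constexpr std::size_t RoundUpToClass(std::size_t aBytes) {
  std::size_t cls = 16;  // assumed smallest class
  while (cls < aBytes) {
    cls *= 2;
  }
  return cls;
}

// Hypothetical size of a header stored in front of the elements.
constexpr std::size_t kHeaderBytes = 8;

// Bytes allocated but not requested.
constexpr std::size_t Slop(std::size_t aRequestedBytes) {
  return RoundUpToClass(aRequestedBytes) - aRequestedBytes;
}

// Largest element count such that header + elements still fit in the class.
constexpr std::size_t CapacityFillingClass(std::size_t aClassBytes,
                                           std::size_t aElemBytes) {
  return (aClassBytes - kHeaderBytes) / aElemBytes;
}

// Usage example: 1024 4-byte elements plus the header spill into the next
// class, while 1022 elements plus the header fill a 4096-byte class exactly.
inline void DemoHeaderSlop() {
  assert(Slop(kHeaderBytes + 1024 * 4) == 4088);
  assert(CapacityFillingClass(4096, 4) == 1022);
  assert(Slop(kHeaderBytes + 1022 * 4) == 0);
}
```

So a power-of-two element count is the worst case for such a container, even though a power-of-two request size is the best case for the allocator.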
(In reply to Randell Jesup [:jesup] from comment #1)
> I have a dump of memory allocated by content processes using ASAN's
> __sanitizer_print_memory_profile()

You'll want to be careful with that -- I'm pretty sure ASAN will be using its own allocator instead of jemalloc, so it's not a representative run.

You can use DMD for vanilla heap profiling. It works with jemalloc so will give representative results. See the docs about "live mode" at https://developer.mozilla.org/en-US/docs/Mozilla/Performance/DMD.
So, DMD results (I had been using ASAN): similar, of course, since I don't care much about a few bytes; I imagine the biggest difference would be in slop and alignment.

Comments above are still valid; we can now see that gtk is using a moderate amount, and no surprise the fontconfig stuff is called from InitFontList.
Lots in total in js::ScriptSource, various things involving Atoms, and XPTInterfaceInfoManager::RegisterBuffer() is a big hotspot
LifoAlloc then comes in with a lot of different little allocations (probably not surprising)

6% (811K) in ThreadInfo::ThreadInfo (note: another couple % below with different stacks)
4.5% in js::ScriptSource::performXDR<>
3.6% (491K) from XPTInterfaceInfoManager::RegisterBuffer() (a few % more below)
2.3% in js::ScriptSource::setSourceCopy()
2% in js::SharedScriptData::new_ from JSScript::createScriptData()
1.9% (262144) in PLDHashTable::ChangeTable() from SetLatePreferences()
1.7% in js::SharedScriptData::new_ from JSScript::fullyInitFromEmitter()
1.6% in FcPatternObjectAddWithBinding() (from InitFontList())
1.5% in gtk_css_selector_tree_builder_build from (near top) gtk_settings_get_for_display()
1.4% from js::ScriptSource::setSourceCopy()
1.2% in glibc _dl_new_object()
1% (~30% cumulative) in nsPersistentProperties::SetStringProperty() from nsStringBundle::LoadProperties()
1% from XPTInterfaceInfoManager::RegisterBuffer() (again, different stack slightly)
1% in PLDHashTable::Add() from nsAtomFriend::RegisterStaticAtoms()
1% (131072, 1 alloc) in gtk_css_provider_load_internal() from gtk_settings_get_for_display
1% (ditto) in PLDHashTable::Add() from TelemetryHistogram::InitializeGlobalState()
1% (ditto) in js::AtomizeChars from the frontend::GeneralParser<>
1% (ditto) in js::AtomizeChars from js::XDRAtom<>/js::XDRScript<>
1% in FcPatternObjectInsertElt from InitFontlist()
0.85% in PLDHashTable::Add from XPTInterfaceInfoManager::RegisterBuffer() (different stack)
0.8% in gtk_css_ruleset_add() from gtk_settings_get_for_display
0.8% in js::SharedScriptData::new_() from JSScript::fullyInitFromEmitter()
0.8% DuplicateString() from pref_SetPref()
0.7% in FcCharSetPutLeaf from InitFontList
0.7% in js::LifoAlloc::newChunkWithCapacity() from js::frontend::PerHandlerParser
0.6% from js::SharedScriptData::new_ from JSScript::createScriptData() (different stack)
0.6% from nsAtomFriend::RegisterStaticAtoms()
0.5% from FcPatternObjectAddWithBinding
0.5% from ThreadInfo::ThreadInfo (different stack)
0.5% from XPTInterfaceInfoManager::RegisterBuffer() (different stack)
0.5% from call_init() (dl_init.c) in glibc
0.5% (45% cumulative) in js::AtomizeChars (different stack)

<bunch of 65536 byte allocs from Component Manager, HashTables for StaticAtoms, JSScript::shareScriptData()>

<several 61440-byte totals (15 allocs) from LifoAlloc, and a bunch in the 50K region with 13 allocs from LifoAlloc>

<4 ~36K alloc stacks from ThreadInfo::ThreadInfo -- different callers - HangMonitor, WatchdogMain, BackgroundHangManager -- I wonder if there's some duplication here that could be eliminated>
Bug 1436179 tracks the ThreadInfo/profiler bits.
Raw data.  Note lsan4_xaa is the first 100K lines (which goes down to ~1K total allocation per stack); the tail is LONG: xaa is only about 1/15th of the full file.  Also note that lsan has a mix of two content processes: one that is displaying a blank page and one that hasn't been used yet.

https://app.box.com/folder/46288716831
Assignee: nobody → rjesup
Status: NEW → ASSIGNED
Assignee: rjesup → nobody
Status: ASSIGNED → NEW
Depends on: 1437168
Depends on: 1438088
Depends on: 1438287
Whiteboard: [memshrink] → [MemShrink:meta]
Depends on: 1441290
Depends on: 1441292
Depends on: 529808
Depends on: 1441736
No longer depends on: 1441290
Depends on: 1441754
Depends on: 1442361
Depends on: 1442433
Depends on: 1442737
Depends on: 1425524
Depends on: 1443930, 1443932
Depends on: 1446519
Depends on: 1254777
Depends on: 1447744
Depends on: 1448034
Depends on: 1448040
Depends on: 1448060
Depends on: 1449288
Depends on: MinGCMem
Depends on: 1440336
Depends on: 786819
Depends on: 1451568
Depends on: 1452786
Depends on: 1452862
Depends on: 1455178
Depends on: 833098
Depends on: 1458339
Depends on: 1460304
Alias: memshrink-content
Depends on: 1460416
Depends on: 1460674
Depends on: 1463569
Depends on: 1463587
Depends on: 1463908
Depends on: 1464542
See Also: → 1350472
Depends on: 1464548
Depends on: 1464552
cc'ing felipe who might want to be in the loop on this.
Depends on: angle-62
No longer depends on: angle-62
Depends on: 1443077
Depends on: 1469719
Depends on: 648417
No longer depends on: 1439412
Depends on: 1470023
Depends on: 1470324
Depends on: 1470333
Depends on: 1470339
Depends on: 1470365
Depends on: 1470591
Depends on: 1470783
Depends on: 1470793
Depends on: 1470983
Depends on: 1471025
Depends on: 1471062
Depends on: 1471091
Depends on: 1471102
Depends on: 1472491
Depends on: 1472523
Depends on: 1473414
Depends on: 1473631
Depends on: 1473634
Depends on: 1474130
Depends on: 1474139
Depends on: 1474140
Depends on: 1474143
Depends on: 1474155
Depends on: 1474163
Depends on: 1258781
Depends on: 1474400
Depends on: 1240547
Depends on: 1474793
Depends on: 1474918
Depends on: 1446831
Depends on: 1471535
Depends on: 1475091
Depends on: 645563
Depends on: 1475290
Depends on: 1475518
Depends on: 1475700
Depends on: 1475899
Depends on: 1476403
Depends on: 1476405
Depends on: 1476416
Depends on: 1476432
Depends on: 1477393
Depends on: 1477576
Depends on: 1477579
Depends on: 1478124
Depends on: 1416723
Depends on: 1479236
Depends on: 1479241
Depends on: 1479245
Depends on: 1479250
Depends on: 1479309
Depends on: 1479310
Depends on: 1479312
Depends on: 1479313
Depends on: 1479318
Depends on: 1479450
Depends on: 1446940
Depends on: 1480244
Depends on: 1480319
Depends on: 1480327
Depends on: 1471878
Depends on: 1479569
User Story: (updated)
Depends on: 1481321
Depends on: 1475571
Depends on: 1481975
Depends on: 1481998
Depends on: 1483363
Depends on: 1483414
Depends on: 1483664
Depends on: 1483738
Depends on: 1484373
Depends on: 1484413
Depends on: 1484415
Depends on: 1484466
Depends on: 1484496
Depends on: 1485347
Depends on: 1486182
Depends on: 1486444
Depends on: 1477213
Depends on: 1487137
Depends on: 1487135
Depends on: 1487146
Depends on: 1487198
Depends on: 1487212
Depends on: 1487214
Depends on: 1487216
Depends on: 1487217
Depends on: 1487221
Depends on: 1487223
Depends on: 1487228
Depends on: 1487234
Depends on: 1487235
Depends on: 1487237
Depends on: 1489315
Depends on: 1475556
Depends on: 1257388
Depends on: 1497729
No longer depends on: 1497729
Depends on: 1482153
Depends on: 1498278
Depends on: 1501438
Depends on: 1419091
Depends on: 1482091
Depends on: 1505522
Depends on: 1503496
Depends on: 1505689
Depends on: 1505690
Depends on: 1344428
Depends on: 1502284
Depends on: 1506763
No longer depends on: 1502284
Depends on: 1507434
Depends on: 1477432
Depends on: 1508873
Depends on: ipc-devirt
Depends on: 1514869
Blocks: fission
Depends on: 1523749
Depends on: 1524687
Depends on: 1524688
Depends on: 1529551
Depends on: 1430810
No longer depends on: 1529551
Depends on: 1527532
Depends on: 1540301
Depends on: 1540824
Depends on: 1541208
Depends on: 1543777
Depends on: 1544371
Depends on: 1556539
Depends on: 1543407
Depends on: 1561739
Depends on: 1561937
Depends on: 1549975
Depends on: 1507287
Depends on: 1564412
Depends on: 1565040
Depends on: 1566191
Depends on: 1510569
Depends on: 1564674
Depends on: 1569526
No longer depends on: 1583362
Depends on: 1596420
Depends on: 1600705
Depends on: WarpBuilder
Depends on: 1616511
Depends on: 1614933
Depends on: 1619803
Depends on: 1626127
Depends on: 1634469
Blocks: fission-perf
No longer blocks: fission
Depends on: 1439412, 1496583, 1533462
Depends on: 1639398
Depends on: 1639719

There are a few instances of nsStaticCaseInsensitiveNameTable, which is a class that could clearly be generated at compile time. However, the three instances of this class in each content process occupy a total of about 7000 bytes, so it would not be worth the time to convert it.

gPropertyIDLNameTable also looks to contain only static data. It uses about 28720 bytes, which is better but still maybe too small to bother with.
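
For reference, a rough sketch of what a compile-time, case-insensitive name table could look like. This is not how nsStaticCaseInsensitiveNameTable or gPropertyIDLNameTable work today; the entries, ids, and function names are hypothetical. The table lives in read-only data, so it needs no heap allocation at startup.

```cpp
#include <cassert>
#include <cstddef>

struct NameEntry {
  const char* name;
  int id;
};

// Hypothetical entries; they must be lowercase and sorted by name.
constexpr NameEntry kNames[] = {
    {"auto", 0}, {"block", 1}, {"inline", 2}, {"none", 3}};
constexpr std::size_t kNameCount = sizeof(kNames) / sizeof(kNames[0]);

constexpr char ToLowerAscii(char aChar) {
  return (aChar >= 'A' && aChar <= 'Z')
             ? static_cast<char>(aChar - 'A' + 'a')
             : aChar;
}

// Returns a negative, zero, or positive value, ignoring ASCII case.
constexpr int CompareIgnoreCase(const char* aLeft, const char* aRight) {
  while (*aLeft && ToLowerAscii(*aLeft) == ToLowerAscii(*aRight)) {
    ++aLeft;
    ++aRight;
  }
  return static_cast<unsigned char>(ToLowerAscii(*aLeft)) -
         static_cast<unsigned char>(ToLowerAscii(*aRight));
}

// Binary search over the sorted table; returns -1 when the name is unknown.
constexpr int Lookup(const char* aName) {
  std::size_t lo = 0;
  std::size_t hi = kNameCount;
  while (lo < hi) {
    const std::size_t mid = lo + (hi - lo) / 2;
    const int cmp = CompareIgnoreCase(kNames[mid].name, aName);
    if (cmp == 0) {
      return kNames[mid].id;
    }
    if (cmp < 0) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return -1;
}

// Usage example: lookups also work at runtime and ignore ASCII case.
inline void DemoStaticNameTable() {
  assert(Lookup("BLOCK") == 1);
  assert(Lookup("missing") == -1);
}
```

Lookup becomes O(log n) string comparisons instead of a hash probe, which may or may not matter for the callers involved.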

Depends on: 1639922
Depends on: 1640309
Depends on: 1641090
Depends on: 1641614
Depends on: 1643170
Depends on: 1643368
Depends on: 1623557
Depends on: 1643732
Depends on: 1645237
Depends on: 1645500
Depends on: 1645510
Depends on: 1645862
Depends on: 1646145
Depends on: 1647943
Depends on: 1648000
Depends on: 1648178
Depends on: 1649181
Depends on: 1649221
Depends on: 1649554
Depends on: 1649578
Depends on: 1649844
Depends on: 1649879
Depends on: 1650151
Depends on: 1650707
Depends on: 1650709
Depends on: 1462841
Depends on: 1651941
Depends on: 1656155
Depends on: 1656582
Depends on: 1649843
Depends on: 1655438
Depends on: 1659460
Depends on: 1659585
Depends on: 1660737
Depends on: 1661888
Depends on: 1662345

Here's 12KB of slop/clownshoes allocated by a 3rd party rust library: https://github.com/crossbeam-rs/crossbeam/issues/551

Depends on: 1665258
Depends on: 1669392
No longer depends on: 1669392
Depends on: 1669392
Depends on: 1675554
Depends on: 1685801
Depends on: 1627111
Depends on: 1690956
No longer depends on: 1690956
Depends on: 1694174
Depends on: 1582687
Depends on: 1689413
Fission Milestone: --- → Future
Depends on: ReShape
Depends on: 1703326
Depends on: 1713100
Depends on: 1714585
Depends on: 1732168
Depends on: 1708243
Fission Milestone: Future → ---
Depends on: 1713960
Depends on: 1054671
Depends on: 1529336
Depends on: 1774806
Depends on: 1776826
Severity: normal → S3