802446 - B2G memshrink brainstorming bug

Reporter

Description

•

12 years ago

In today's MemShrink meeting we decided we wanted a place to dissect memory dumps and find opportunities for improvement. That's this bug. I'll post some about:memory dumps from the device, but feel free to ask for specific use cases or provide your own. To generate an about:memory dump, do the following: * Update your B2G checkout (git fetch and merge; ./repo sync isn't sufficient) * Run ./get-about-memory.py * gunzip merged-reports.gz * Open nightly on your desktop and load the file into about:memory (see button at the bottom of about:memory) * Copy-paste the text into a file and attach it here.

Justin Lebar (not reading bugmail)

Reporter

Comment 1

•

12 years ago

Attached file about:memory DUMP 1 — Details

== DUMP 1 == I loaded a few apps but didn't interact with them much. I loaded mozilla.org into the browser.

Justin Lebar (not reading bugmail)

Reporter

Comment 2

•

12 years ago

Attached file about:memory?verbose DUMP 1 — Details

Justin Lebar (not reading bugmail)

Reporter

Comment 3

•

12 years ago

Attached file merged-reports DUMP 1 — Details

You can load this file into about:memory on your machine.

Andrew McCreight [:mccr8]

Comment 4

•

12 years ago

One idea I had was that we could take that hugetastical list of compartments and post it on dev.platform and see if people have ideas of things we could get rid of. It would reach a broad number of people. But I don't know how many people are going to dig through that list, so maybe such a scattershot approach won't be effective.

Nicholas Nethercote [inactive]

Comment 5

•

12 years ago

jlebar, can you mail dev.platform and point them to this bug?

Justin Lebar (not reading bugmail)

Reporter

Comment 6

•

12 years ago

(In reply to Nicholas Nethercote [:njn] from comment #5) > jlebar, can you mail dev.platform and point them to this bug? Can we get a few days' analysis here first? We're nominally the experts in understanding what these numbers mean.

Nicholas Nethercote [inactive]

Comment 7

•

12 years ago

Initial thoughts: - Shared libraries are big. Fortunately the PSS numbers are substantially lower than the RSS numbers. - JS dominates among the Gecko stuff. DOM and layout hardly matter in comparison. - 1 MiB of xpti-working-set per process is terrible. Bug 799658 is open for that. - heap-unclassified continues to be annoyingly high. Recent reporter fixes (esp. bug 799796) should help a bit.

Justin Lebar (not reading bugmail)

Reporter

Comment 8

•

12 years ago

Another idea I had was to basically grep through the processes' private address spaces to see whether there's a lot of other memory we might be able to share. But the trick would be identifying the owner of a page once we've found a candidate for sharing.

Thinker Li [:sinker]

Comment 9

•

12 years ago

I find analysis-temporary get a big ratio of memory usage. I had done some simple tests. By cutting off default chunk size of LifoAlloc (LIFO_ALLOC_PRIMARY_CHUNK_SIZE) from 128K to 32K, it cuts off a lot of memory (5+%). If we free analysis-temporary more aggressively, it cuts more, 8~10% I guess.

Thinker Li [:sinker]

Updated

•

12 years ago

Depends on: 804891

Nicholas Nethercote [inactive]

Comment 10

•

12 years ago

Does the nsEffectiveTLDService need to be running in all processes? Judging from DMDV's output it is, and on 64-bit builds it's slightly more than 128 KiB per process.

Thinker Li [:sinker]

Comment 11

•

12 years ago

Chrome process spend a lot of space on huge string. Most of them are used for data URI for 7.*MB. Following are the list of compartments that use data URI. - BrowserElementParent.js (1.6MB) - contentSecurityPolicy.js (1.41MB) - CSPUtils.js (1.1MB) - system app (2.81MB) Near all of them are image data.

Justin Lebar (not reading bugmail)

Reporter

Comment 12

•

12 years ago

> Chrome process spend a lot of space on huge string. > Near all of them are image data. Some of these at least are screenshots, which we're tracking in bug 798002 and dependencies. But CSPUtils.js using screenshots sounds unlikely to me, so I dunno what that is. It would be relatively easy to get a dump of all large strings and their associated compartments. If you still see a lot of huge strings after we fix bug 802647, let me know and I'll work on this. > Does the nsEffectiveTLDService need to be running in all processes? We ought to be able to proxy those calls to the parent process; I can't imagine we make many calls into it. I also have to imagine that we could compress its data structures. (I say this without having ever looked at this code, but just in general... :)

Nicholas Nethercote [inactive]

Comment 13

•

12 years ago

I'm seeing this a lot: 1 block(s) in record 1 of 12897 262,144 bytes (262,112 requested / 32 slop) 1.61% of the heap (1.61% cumulative unreported) malloc (vg_replace_malloc.c:270) moz_xmalloc (mozalloc.cpp:54) operator new[](unsigned long) (mozalloc.h:200) std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16) std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16) std::string::reserve(unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16) std::string::append(char const*, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16) IPC::Channel::ChannelImpl::ProcessIncomingMessages() (ipc_channel_posix.cc:496) IPC::Channel::ChannelImpl::OnFileCanReadWithoutBlocking(int) (ipc_channel_posix.cc:747) base::MessagePumpLibevent::OnLibeventNotification(int, short, void*) (message_pump_libevent.cc:213) event_process_active (event.c:385) event_base_loop (event.c:522) base::MessagePumpLibevent::Run(base::MessagePump::Delegate*) (message_pump_libevent.cc:331) MessageLoop::RunInternal() (message_loop.cc:215) MessageLoop::RunHandler() (message_loop.cc:208) MessageLoop::Run() (message_loop.cc:182) base::Thread::ThreadMain() (thread.cc:156) ThreadFunc(void*) (platform_thread_posix.cc:39) start_thread (pthread_create.c:308) clone (clone.S:112) This looks like an IPC buffer. Is it expected that it's (often) this big? Unfortunately it uses std::string which means we can't really measure it in memory reports.

Nicholas Nethercote [inactive]

Comment 14

•

12 years ago

I'm also seeing lots of variants on this within a single process: Unreported: 28 block(s) in record 3 of 12897 114,688 bytes (60,256 requested / 54,432 slop) 0.70% of the heap (3.15% cumulative unreported) at 0x402C2AF: malloc (vg_replace_malloc.c:270) by 0x418E03B: moz_xmalloc (mozalloc.cpp:54) by 0x54C1611: operator new[](unsigned long) (mozalloc.h:200) by 0x57ABE9E: nsJAR::nsJAR() (nsJAR.cpp:92) by 0x57B00E8: nsZipReaderCache::GetZip(nsIFile*, nsIZipReader**) (nsJAR.cpp:1092) by 0x57B409E: nsJARChannel::CreateJarInput(nsIZipReaderCache*) (nsJARChannel.cpp:276) by 0x57B4845: nsJARChannel::EnsureJarInput(bool) (nsJARChannel.cpp:357) by 0x57B55FA: nsJARChannel::AsyncOpen(nsIStreamListener*, nsISupports*) (nsJARChannel.cpp:702) by 0x582F077: imgLoader::LoadImage(nsIURI*, nsIURI*, nsIURI*, nsIPrincipal*, nsILoadGroup*, imgINotificationObserver*, nsISupports*, unsigned int, nsISupp orts*, imgIRequest*, nsIChannelPolicy*, imgIRequest**) (imgLoader.cpp:1716) by 0x5C5959D: nsContentUtils::LoadImage(nsIURI*, nsIDocument*, nsIPrincipal*, nsIURI*, imgINotificationObserver*, int, imgIRequest**) (nsContentUtils.cpp: 2764) by 0x5CF992A: nsImageLoadingContent::LoadImage(nsIURI*, bool, bool, nsIDocument*, unsigned int) (nsImageLoadingContent.cpp:664) by 0x5CF9475: nsImageLoadingContent::LoadImage(nsAString_internal const&, bool, bool) (nsImageLoadingContent.cpp:578) by 0x5EDED9A: nsHTMLImageElement::SetAttr(int, nsIAtom*, nsIAtom*, nsAString_internal const&, bool) (nsHTMLImageElement.cpp:378) by 0x5E94D7E: nsGenericHTMLElement::SetAttr(int, nsIAtom*, nsAString_internal const&, bool) (nsGenericHTMLElement.h:245) by 0x5E9DC7C: nsGenericHTMLElement::SetAttrHelper(nsIAtom*, nsAString_internal const&) (nsGenericHTMLElement.cpp:2871) by 0x5EDE3E5: nsHTMLImageElement::SetSrc(nsAString_internal const&) (nsHTMLImageElement.cpp:114) by 0x66ED8A2: nsIDOMHTMLImageElement_SetSrc(JSContext*, JS::Handle<JSObject*>, JS::Handle<jsid>, int, JS::MutableHandle<JS::Value>) (dom_quickstubs.cpp:13 179) by 0x7BD13A3: js::CallJSPropertyOpSetter(JSContext*, int (*)(JSContext*, JS::Handle<JSObject*>, JS::Handle<jsid>, int, JS::MutableHandle<JS::Value>), JS:: Handle<JSObject*>, JS::Handle<jsid>, int, JS::MutableHandle<JS::Value>) (jscntxtinlines.h:450) by 0x7BD26D3: js::Shape::set(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSObject*>, bool, JS::MutableHandle<JS::Value>) (jsscopeinlines.h:333) by 0x7BE624D: js_NativeSet(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSObject*>, js::Shape*, bool, bool, JS::Value*) (jsobj.cpp:4284) Lots of images are being loaded from JARs, and the unzipping requires memory(?) I don't see ones like this on desktop.

Thinker Li [:sinker]

Comment 15

•

12 years ago

(In reply to Justin Lebar [:jlebar] from comment #12) > > Chrome process spend a lot of space on huge string. > > Near all of them are image data. > > Some of these at least are screenshots, which we're tracking in bug 798002 It seems working for me. The huge strings are dramatically dropped. Only system app still use data URI for images.

[:fabrice] Fabrice Desré

Comment 16

•

12 years ago

(In reply to Thinker Li [:sinker] from comment #15) > > Some of these at least are screenshots, which we're tracking in bug 798002 > It seems working for me. The huge strings are dramatically dropped. Only > system app still use data URI for images. The default background is stored as a data: URI in a setting. This is probably the one you're seeing.

Thinker Li [:sinker]

Comment 17

•

12 years ago

heap-dirty is 2.2~3.5MB for every process. I had try to reduce it by reducing opt_dirty_max to 256 from default value 1024. Then heap-dirty is dropped dramatically to 0.5MB~0.8xMB. I also measure boot time of otoro, I can not tell any change before and after the change (25s for both). opt_dirty_max can be changed to 256 by add a line of |export MALLOC_OPTIONS="ff"| in b2g.sh.

Gabriele Svelto [:gsvelto]

Comment 18

•

12 years ago

(In reply to Thinker Li [:sinker] from comment #17) > heap-dirty is 2.2~3.5MB for every process. I had try to reduce it by > reducing opt_dirty_max to 256 from default value 1024. Then heap-dirty is > dropped dramatically to 0.5MB~0.8xMB. I also measure boot time of otoro, I > can not tell any change before and after the change (25s for both). > > opt_dirty_max can be changed to 256 by add a line of |export > MALLOC_OPTIONS="ff"| in b2g.sh. We're trying to tackle this issue in bug 805855, I'm currently working on a patch that will reduce opt_dirty_max as you suggest as well as clear it completely when apps are sent to the background.

Gabriele Svelto [:gsvelto]

Comment 19

•

12 years ago

(In reply to Nicholas Nethercote [:njn] from comment #14) > Lots of images are being loaded from JARs, and the unzipping requires > memory(?) Yes, that should be expected, nsJAR creates an instance of nsZipArchive which in turn uses zlib for decompression, the comment here states that this requires 9520 + 32768 bytes per each decompression: http://mxr.mozilla.org/mozilla-central/source/modules/libjar/nsZipArchive.cpp#73 BTW this seems inconsistent with zlib's own documentation which states 11520 + 32768 (see under the Memory Footprint section): http://www.zlib.net/zlib_tech.html

Justin Lebar (not reading bugmail)

Reporter

Updated

•

12 years ago

Depends on: 806374

Justin Lebar (not reading bugmail)

Reporter

Updated

•

12 years ago

Depends on: 806377

Justin Lebar (not reading bugmail)

Reporter

Updated

•

12 years ago

Depends on: 806379

Justin Lebar (not reading bugmail)

Reporter

Updated

•

12 years ago

Depends on: 806383

Justin Lebar (not reading bugmail)

Reporter

Comment 20

•

12 years ago

(In reply to Fabrice Desré [:fabrice] from comment #16) > The default background is stored as a data: URI in a setting. This is > probably the one you're seeing. I filed bug 806374. > This looks like an IPC buffer. Is it expected that it's (often) this big? Unfortunately > it uses std::string which means we can't really measure it in memory reports. We should be able to use a custom allocator? I filed bug 806377. > Lots of images are being loaded from JARs, and the unzipping requires memory(?) Compressing images in JARs sounds pretty dumb. I wonder if we store the images in the jar with zero compression whether that will cause us to spin up the gzip instances. I filed bug 806379 for the dark matter and bug 806383 for reducing the memory usage here somehow.

Nicholas Nethercote [inactive]

Comment 21

•

12 years ago

The per-process overhead is non-trivial. Some examples (warning, 64-bit build which overstates things somewhat): ├─────415,464 B (02.27%) -- layout │ ├──365,320 B (01.99%) ── style-sheet-cache │ └───50,144 B (00.27%) ── style-sheet-service ├─────407,232 B (02.22%) -- xpcom │ ├──231,296 B (01.26%) ── component-manager │ ├──135,264 B (00.74%) ── effective-TLD-service │ └───40,672 B (00.22%) ── category-manager ├─────350,096 B (01.91%) ── atom-tables ├─────171,576 B (00.94%) ── xpconnect ├─────165,760 B (00.91%) ── script-namespace-manager ├─────165,264 B (00.90%) ── preferences ├──────36,864 B (00.20%) ── cycle-collector/collector-object ├──────21,712 B (00.12%) ── telemetry This is from the clock app, where (presumably) a lot of this stuff isn't exactly necessary.

Nicholas Nethercote [inactive]

Comment 22

•

12 years ago

(In reply to Nicholas Nethercote [:njn] from comment #21) > The per-process overhead is non-trivial. Some examples (warning, 64-bit > build which overstates things somewhat): > > ├─────415,464 B (02.27%) -- layout > │ ├──365,320 B (01.99%) ── style-sheet-cache I dug into this some more. Here are the sizes of each of the seven sheets within the cache: mFormsSheet: 66888 mFullScreenOverrideSheet: 752 mQuirkSheet: 48472 mScrollbarsSheet: 21152 mUASheet: 222504 mUserChromeSheet: 0 mUserContentSheet: 0 The UASheet is easily the biggest. I wonder if it can be made smaller?

Nicholas Nethercote [inactive]

Comment 23

•

12 years ago

(I forgot to mention that the style-sheet-cache numbers are the same for every process.)

Boris Zbarsky [:bzbarsky]

Comment 24

•

12 years ago

> I wonder if it can be made smaller? I wonder how much of the space is ua.css itself vs html.css and xul.css (which it imports). I'll bet money xul.css is the main reason this is taking so much space. :(

Justin Lebar (not reading bugmail)

Reporter

Comment 25

•

12 years ago

> I'll bet money xul.css is the main reason this is taking so much space. :( We don't have any xul in B2G content processes, and we have very little xul in the B2G main process. Could we coalesce these files and then remove the unnecessary bits, or do you think that's a losing game?

Boris Zbarsky [:bzbarsky]

Comment 26

•

12 years ago

> We don't have any xul in B2G content processes, No scrollbars? No video controls? I think getting data on whether my hunch is right would be good. If it is, we might be able to come up with a smaller xul.css for b2g, possibly.

Gregor Wagner [:gwagner]

Comment 27

•

12 years ago

We could also try to disable the system/user chunk separation for content processes. During app startup we allocate 1MB for the chrome and 1MB for content JS heap (4MB if we consider the alignment code) but maybe we never allocate more than 1MB JS objects in the first place for common apps.

Justin Lebar (not reading bugmail)

Reporter

Comment 28

•

12 years ago

> During app startup we allocate 1MB for the chrome and 1MB for content JS heap (4MB if we consider > the alignment code) but maybe we never allocate more than 1MB JS objects in the first place for > common apps. We should be careful not to conflate virtual memory usage and RSS. We allocate up to 4MB of virtual memory for these chunks, but much of that will not be committed. In fact, if different compartments can't share arenas (pages, in the JS engine), I don't see how merging the chunks would make a difference in RSS.

Ting-Yuan Huang

Updated

•

12 years ago

Depends on: 811671

Ting-Yuan Huang

Comment 29

•

12 years ago

There are 4 coefficient tables computed at runtime: Num: Value Size Type Bind Vis Ndx Name 194068: 012f3170 65536 OBJECT LOCAL DEFAULT 24 jpeg_nbits_table 180068: 012e0be4 65536 OBJECT LOCAL DEFAULT 24 _ZL17sPremultiplyTable 180066: 012d0be4 65536 OBJECT LOCAL DEFAULT 24 _ZL19sUnpremultiplyTable 14744: 012bb15c 41984 OBJECT LOCAL DEFAULT 24 _ZL18gUnicodeToGBKTable http://mxr.mozilla.org/mozilla-central/source/media/libjpeg/jchuff.c#24 http://mxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxUtils.cpp#24 http://mxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxUtils.cpp#25 http://mxr.mozilla.org/mozilla-central/source/intl/uconv/ucvcn/nsGBKConvUtil.cpp#18 They can be easily converted to constants to save 233KB .bss at the expense of elf size. Is it worth it? The following two dynamically allocated tables are actually redundant of the above, although they are in quite different source trees. http://mxr.mozilla.org/mozilla-central/source/content/canvas/src/CanvasRenderingContext2D.cpp#3558 http://mxr.mozilla.org/mozilla-central/source/content/canvas/src/CanvasRenderingContext2D.cpp#3362

Justin Lebar (not reading bugmail)

Reporter

Comment 30

•

12 years ago

> They can be easily converted to constants to save 233KB .bss at the expense of elf size. > Is it worth it? Probably, yes! Let's figure out the details in a new bug?

Ting-Yuan Huang

Updated

•

12 years ago

Depends on: 815473

Nicholas Nethercote [inactive]

Comment 31

•

12 years ago

I think this bug has served its purpose. Current B2G memory consumption excitement is over in bug 837187. Come join the party.

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → WORKSFORME

about:memory DUMP 1 12 years ago Justin Lebar (not reading bugmail) 33.12 KB, text/plain		Details
about:memory?verbose DUMP 1 12 years ago Justin Lebar (not reading bugmail) 170.38 KB, text/plain		Details
merged-reports DUMP 1 12 years ago Justin Lebar (not reading bugmail) 41.80 KB, application/x-xz		Details