Closed Bug 802446 Opened 12 years ago Closed 12 years ago

B2G memshrink brainstorming bug

Categories: Firefox OS Graveyard :: General, defect
Platform: x86_64 Linux
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: justin.lebar+bug, Unassigned)

References

(Depends on 2 open bugs)

Details

Attachments

(3 files)

In today's MemShrink meeting we decided we wanted a place to dissect memory dumps and find opportunities for improvement.  That's this bug.

I'll post some about:memory dumps from the device, but feel free to ask for specific use cases or provide your own.

To generate an about:memory dump, do the following:

* Update your B2G checkout (git fetch and merge; ./repo sync isn't sufficient)
* Run ./get-about-memory.py
* gunzip merged-reports.gz
* Open Nightly on your desktop and load the file into about:memory (see the button at the bottom of about:memory)
* Copy-paste the text into a file and attach it here.
Attached file about:memory DUMP 1
== DUMP 1 ==

I loaded a few apps but didn't interact with them much.  I loaded mozilla.org into the browser.
Attached file merged-reports DUMP 1
You can load this file into about:memory on your machine.
One idea I had was that we could take that hugetastical list of compartments, post it on dev.platform, and see whether people have ideas about things we could get rid of.  It would reach a broad audience.  But I don't know how many people are going to dig through that list, so maybe such a scattershot approach won't be effective.
jlebar, can you mail dev.platform and point them to this bug?
(In reply to Nicholas Nethercote [:njn] from comment #5)
> jlebar, can you mail dev.platform and point them to this bug?

Can we get a few days' analysis here first?  We're nominally the experts in understanding what these numbers mean.
Initial thoughts:

- Shared libraries are big.  Fortunately the PSS numbers are substantially lower than the RSS numbers.

- JS dominates among the Gecko stuff.  DOM and layout hardly matter in comparison.

- 1 MiB of xpti-working-set per process is terrible.  Bug 799658 is open for that.

- heap-unclassified continues to be annoyingly high.  Recent reporter fixes (esp. bug 799796) should help a bit.
Another idea I had was to basically grep through the processes' private address spaces to see whether there's a lot of other memory we might be able to share.  But the trick would be identifying the owner of a page once we've found a candidate for sharing.
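In case anyone wants to try that, here's the rough kind of tool I have in mind (a hypothetical, untested sketch; nothing like this is in the tree).  It walks one process's private anonymous mappings from /proc/<pid>/maps, hashes each readable page via /proc/<pid>/mem, and prints the hashes; run it against two processes and diff the output to find byte-identical pages that are candidates for sharing.  It needs root, and on older kernels reading another process's /proc/<pid>/mem may additionally require a ptrace attach.

    // pagehash.cpp -- hypothetical helper, not in the tree.  Usage: pagehash <pid>
    // Compile with -D_FILE_OFFSET_BITS=64 so off_t covers the whole address space.
    #include <cstdio>
    #include <cstdint>
    #include <fcntl.h>
    #include <unistd.h>
    #include <string>
    #include <vector>

    static uint64_t HashPage(const unsigned char* p, size_t len) {
      uint64_t h = 1469598103934665603ULL;                 // FNV-1a
      for (size_t i = 0; i < len; i++) { h ^= p[i]; h *= 1099511628211ULL; }
      return h;
    }

    int main(int argc, char** argv) {
      if (argc != 2) { fprintf(stderr, "usage: %s <pid>\n", argv[0]); return 1; }
      std::string maps = std::string("/proc/") + argv[1] + "/maps";
      std::string mem  = std::string("/proc/") + argv[1] + "/mem";
      FILE* mf = fopen(maps.c_str(), "r");
      int memfd = open(mem.c_str(), O_RDONLY);
      if (!mf || memfd < 0) { perror("open"); return 1; }

      const size_t kPageSize = (size_t)sysconf(_SC_PAGESIZE);
      std::vector<unsigned char> buf(kPageSize);
      char line[512];
      while (fgets(line, sizeof(line), mf)) {
        unsigned long start, end;
        char perms[8];
        char path[256] = "";
        if (sscanf(line, "%lx-%lx %7s %*s %*s %*s %255s", &start, &end, perms, path) < 3)
          continue;
        // Private anonymous mappings only; named ones (files, [heap], [stack]) are
        // skipped for simplicity.
        if (perms[3] != 'p' || path[0] != '\0')
          continue;
        for (unsigned long addr = start; addr < end; addr += kPageSize) {
          if (pread(memfd, buf.data(), kPageSize, (off_t)addr) != (ssize_t)kPageSize)
            continue;                                      // region not readable
          printf("%s %lx %016llx\n", argv[1], addr,
                 (unsigned long long)HashPage(buf.data(), kPageSize));
        }
      }
      fclose(mf);
      close(memfd);
      return 0;
    }

Identifying the owner of a flagged page is still the hard part, as noted above.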
I find that analysis-temporary accounts for a large share of memory usage.  I did some simple tests: cutting the default chunk size of LifoAlloc (LIFO_ALLOC_PRIMARY_CHUNK_SIZE) from 128K to 32K saves a lot of memory (5+%).  If we free analysis-temporary more aggressively, it saves more, 8~10% I'd guess.
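For anyone not familiar with the allocator in question: LifoAlloc is a bump allocator that hands out memory from fixed-size chunks, so each live arena wastes part of its last chunk, and a smaller chunk size means less of that slop per arena (at the cost of more malloc calls).  A minimal sketch of the idea, not the real js::LifoAlloc:

    // Sketch of a LIFO/bump allocator; the point is that "committed" memory is a
    // whole number of chunks, so the chunk size bounds the per-arena slop.
    #include <cstddef>
    #include <cstdlib>
    #include <vector>

    class BumpAlloc {
     public:
      explicit BumpAlloc(size_t chunkSize)
        : mChunkSize(chunkSize), mCur(nullptr), mEnd(nullptr) {}
      ~BumpAlloc() { for (char* c : mChunks) free(c); }

      void* alloc(size_t bytes) {
        bytes = (bytes + 7) & ~size_t(7);                  // 8-byte alignment
        if (bytes > mChunkSize) return nullptr;            // oversize allocs omitted here
        if (!mCur || size_t(mEnd - mCur) < bytes) {
          char* chunk = static_cast<char*>(malloc(mChunkSize));
          if (!chunk) return nullptr;
          mChunks.push_back(chunk);
          mCur = chunk;
          mEnd = chunk + mChunkSize;
        }
        void* p = mCur;
        mCur += bytes;
        return p;
      }

      // Bytes requested from the heap vs. bytes actually handed out: the difference
      // is the slop that shrinks when the chunk size shrinks.
      size_t committed() const { return mChunks.size() * mChunkSize; }

     private:
      size_t mChunkSize;
      char* mCur;
      char* mEnd;
      std::vector<char*> mChunks;
    };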
Depends on: 804891
Does the nsEffectiveTLDService need to be running in all processes?  Judging from DMDV's output it is, and on 64-bit builds it's slightly more than 128 KiB per process.
The chrome process spends a lot of space on huge strings.  Most of them are data URIs, about 7.x MB in total.  The following compartments use data URIs:
 - BrowserElementParent.js (1.6MB)
 - contentSecurityPolicy.js (1.41MB)
 - CSPUtils.js (1.1MB)
 - system app (2.81MB)
Nearly all of them are image data.
> The chrome process spends a lot of space on huge strings.
> Nearly all of them are image data.

Some of these at least are screenshots, which we're tracking in bug 798002 and dependencies.  But CSPUtils.js using screenshots sounds unlikely to me, so I dunno what that is.

It would be relatively easy to get a dump of all large strings and their associated compartments.  If you still see a lot of huge strings after we fix bug 802647, let me know and I'll work on this.

> Does the nsEffectiveTLDService need to be running in all processes?

We ought to be able to proxy those calls to the parent process; I can't imagine we make many calls into it.

I also have to imagine that we could compress its data structures.  (I say this without having ever looked at this code, but just in general...  :)
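On the "compress its data structures" point, the kind of thing I'm imagining (purely a sketch; I still haven't looked at how nsEffectiveTLDService actually stores the suffix list, and the names below are made up) is baking the list into one sorted, read-only blob plus an offset table.  That puts it in .rodata, which the kernel shares between processes for free, and it avoids per-process heap allocations entirely:

    // Sketch: a flat, read-only representation of a suffix list.
    // Everything lives in .rodata, so the pages are shared across processes.
    #include <cstring>
    #include <cstdint>

    // One string blob, entries separated by '\0', sorted lexicographically.
    // Both tables would be generated at build time from effective_tld_names.dat.
    static const char kSuffixBlob[] = "co.uk\0com\0net\0org\0";
    static const uint32_t kSuffixOffsets[] = { 0, 6, 10, 14 };
    static const uint32_t kSuffixCount = 4;

    // Binary search over the offset table; no heap allocation at runtime.
    static bool IsKnownSuffix(const char* aSuffix) {
      uint32_t lo = 0, hi = kSuffixCount;
      while (lo < hi) {
        uint32_t mid = (lo + hi) / 2;
        int cmp = strcmp(aSuffix, kSuffixBlob + kSuffixOffsets[mid]);
        if (cmp == 0) return true;
        if (cmp < 0) hi = mid; else lo = mid + 1;
      }
      return false;
    }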
I'm seeing this a lot:

1 block(s) in record 1 of 12897
262,144 bytes (262,112 requested / 32 slop)
1.61% of the heap (1.61% cumulative unreported)
 malloc (vg_replace_malloc.c:270) 
 moz_xmalloc (mozalloc.cpp:54)
 operator new[](unsigned long) (mozalloc.h:200)
 std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16)
 std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16)
 std::string::reserve(unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16)
 std::string::append(char const*, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16)
 IPC::Channel::ChannelImpl::ProcessIncomingMessages() (ipc_channel_posix.cc:496)
 IPC::Channel::ChannelImpl::OnFileCanReadWithoutBlocking(int) (ipc_channel_posix.cc:747)
 base::MessagePumpLibevent::OnLibeventNotification(int, short, void*) (message_pump_libevent.cc:213)
 event_process_active (event.c:385)
 event_base_loop (event.c:522)
 base::MessagePumpLibevent::Run(base::MessagePump::Delegate*) (message_pump_libevent.cc:331)
 MessageLoop::RunInternal() (message_loop.cc:215)
 MessageLoop::RunHandler() (message_loop.cc:208)
 MessageLoop::Run() (message_loop.cc:182)
 base::Thread::ThreadMain() (thread.cc:156)
 ThreadFunc(void*) (platform_thread_posix.cc:39)
 start_thread (pthread_create.c:308)
 clone (clone.S:112)

This looks like an IPC buffer.  Is it expected that it's (often) this big?  Unfortunately it uses std::string which means we can't really measure it in memory reports.
I'm also seeing lots of variants on this within a single process:

 Unreported: 28 block(s) in record 3 of 12897 
  114,688 bytes (60,256 requested / 54,432 slop)
  0.70% of the heap (3.15% cumulative unreported)
    at 0x402C2AF: malloc (vg_replace_malloc.c:270)
    by 0x418E03B: moz_xmalloc (mozalloc.cpp:54)
    by 0x54C1611: operator new[](unsigned long) (mozalloc.h:200)
    by 0x57ABE9E: nsJAR::nsJAR() (nsJAR.cpp:92)
    by 0x57B00E8: nsZipReaderCache::GetZip(nsIFile*, nsIZipReader**) (nsJAR.cpp:1092)
    by 0x57B409E: nsJARChannel::CreateJarInput(nsIZipReaderCache*) (nsJARChannel.cpp:276)
    by 0x57B4845: nsJARChannel::EnsureJarInput(bool) (nsJARChannel.cpp:357)
    by 0x57B55FA: nsJARChannel::AsyncOpen(nsIStreamListener*, nsISupports*) (nsJARChannel.cpp:702)
    by 0x582F077: imgLoader::LoadImage(nsIURI*, nsIURI*, nsIURI*, nsIPrincipal*, nsILoadGroup*, imgINotificationObserver*, nsISupports*, unsigned int, nsISupports*, imgIRequest*, nsIChannelPolicy*, imgIRequest**) (imgLoader.cpp:1716)
    by 0x5C5959D: nsContentUtils::LoadImage(nsIURI*, nsIDocument*, nsIPrincipal*, nsIURI*, imgINotificationObserver*, int, imgIRequest**) (nsContentUtils.cpp:2764)
    by 0x5CF992A: nsImageLoadingContent::LoadImage(nsIURI*, bool, bool, nsIDocument*, unsigned int) (nsImageLoadingContent.cpp:664)
    by 0x5CF9475: nsImageLoadingContent::LoadImage(nsAString_internal const&, bool, bool) (nsImageLoadingContent.cpp:578)
    by 0x5EDED9A: nsHTMLImageElement::SetAttr(int, nsIAtom*, nsIAtom*, nsAString_internal const&, bool) (nsHTMLImageElement.cpp:378)
    by 0x5E94D7E: nsGenericHTMLElement::SetAttr(int, nsIAtom*, nsAString_internal const&, bool) (nsGenericHTMLElement.h:245)
    by 0x5E9DC7C: nsGenericHTMLElement::SetAttrHelper(nsIAtom*, nsAString_internal const&) (nsGenericHTMLElement.cpp:2871)
    by 0x5EDE3E5: nsHTMLImageElement::SetSrc(nsAString_internal const&) (nsHTMLImageElement.cpp:114)
    by 0x66ED8A2: nsIDOMHTMLImageElement_SetSrc(JSContext*, JS::Handle<JSObject*>, JS::Handle<jsid>, int, JS::MutableHandle<JS::Value>) (dom_quickstubs.cpp:13179)
    by 0x7BD13A3: js::CallJSPropertyOpSetter(JSContext*, int (*)(JSContext*, JS::Handle<JSObject*>, JS::Handle<jsid>, int, JS::MutableHandle<JS::Value>), JS::Handle<JSObject*>, JS::Handle<jsid>, int, JS::MutableHandle<JS::Value>) (jscntxtinlines.h:450)
    by 0x7BD26D3: js::Shape::set(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSObject*>, bool, JS::MutableHandle<JS::Value>) (jsscopeinlines.h:333)
    by 0x7BE624D: js_NativeSet(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSObject*>, js::Shape*, bool, bool, JS::Value*) (jsobj.cpp:4284)

Lots of images are being loaded from JARs, and the unzipping requires memory(?)  I don't see ones like this on desktop.
(In reply to Justin Lebar [:jlebar] from comment #12)
> > The chrome process spends a lot of space on huge strings.
> > Nearly all of them are image data.
> 
> Some of these at least are screenshots, which we're tracking in bug 798002
It seems to be working for me.  The huge strings have dropped dramatically.  Only the system app still uses data URIs for images.
(In reply to Thinker Li [:sinker] from comment #15)

> > Some of these at least are screenshots, which we're tracking in bug 798002
> It seems to be working for me.  The huge strings have dropped dramatically.
> Only the system app still uses data URIs for images.

The default background is stored as a data: URI in a setting. This is probably the one you're seeing.
heap-dirty is 2.2~3.5MB for every process.  I tried to reduce it by lowering opt_dirty_max from its default of 1024 to 256, and heap-dirty then dropped dramatically, to 0.5~0.8MB.  I also measured the boot time of the otoro; I couldn't see any difference before and after the change (25s for both).

opt_dirty_max can be set to 256 by adding |export MALLOC_OPTIONS="ff"| to b2g.sh (each 'f' halves the default, so "ff" takes 1024 down to 256).
(In reply to Thinker Li [:sinker] from comment #17)
> heap-dirty is 2.2~3.5MB for every process.  I tried to reduce it by lowering
> opt_dirty_max from its default of 1024 to 256, and heap-dirty then dropped
> dramatically, to 0.5~0.8MB.  I also measured the boot time of the otoro; I
> couldn't see any difference before and after the change (25s for both).
> 
> opt_dirty_max can be set to 256 by adding |export MALLOC_OPTIONS="ff"| to
> b2g.sh.

We're trying to tackle this issue in bug 805855.  I'm currently working on a patch that will reduce opt_dirty_max as you suggest, as well as clear it completely when apps are sent to the background.
(In reply to Nicholas Nethercote [:njn] from comment #14)
> Lots of images are being loaded from JARs, and the unzipping requires
> memory(?)

Yes, that should be expected: nsJAR creates an instance of nsZipArchive, which in turn uses zlib for decompression.  The comment here states that this requires 9520 + 32768 bytes per decompression:

http://mxr.mozilla.org/mozilla-central/source/modules/libjar/nsZipArchive.cpp#73

BTW this seems inconsistent with zlib's own documentation, which states 11520 + 32768 (see the Memory Footprint section):

http://www.zlib.net/zlib_tech.html
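The per-stream cost is easy to check empirically, since zlib lets you plug in your own allocator.  A small sketch using plain zlib (this isn't how nsZipArchive drives it, but the memory behaviour is the same):

    // Sketch: measure what one zlib inflate stream actually allocates.
    // ZIP/JAR entries use raw deflate, hence the negative windowBits.
    #include <cstdio>
    #include <cstdlib>
    #include <cstring>
    #include <zlib.h>

    static size_t gZlibBytes = 0;

    static void* CountingAlloc(void*, unsigned items, unsigned size) {
      gZlibBytes += (size_t)items * size;
      return malloc((size_t)items * size);
    }
    static void CountingFree(void*, void* p) { free(p); }

    int main() {
      z_stream strm;
      memset(&strm, 0, sizeof(strm));
      strm.zalloc = CountingAlloc;
      strm.zfree  = CountingFree;
      // -15 = raw deflate with a 32 KiB (1 << 15) window, as used for ZIP entries.
      if (inflateInit2(&strm, -15) != Z_OK) return 1;
      printf("allocated up front: %zu bytes\n", gZlibBytes);
      // (The window itself may be allocated lazily on the first inflate() call,
      //  so feeding some real data through gives the full figure.)
      inflateEnd(&strm);
      return 0;
    }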
Depends on: 806374
Depends on: 806377
Depends on: 806379
Depends on: 806383
(In reply to Fabrice Desré [:fabrice] from comment #16)
> The default background is stored as a data: URI in a setting. This is
> probably the one you're seeing.

I filed bug 806374.

> This looks like an IPC buffer.  Is it expected that it's (often) this big?  Unfortunately 
> it uses std::string which means we can't really measure it in memory reports.

We should be able to use a custom allocator?  I filed bug 806377.
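Something along these lines is what I mean (a sketch only; it assumes the buffer stays a std::basic_string, that the standard library understands C++11 minimal allocators, and the names here are made up -- the real work is in bug 806377): give the buffer a stateless allocator that bumps a global counter, and a memory reporter can then read that counter.

    // Sketch of a counting allocator for the IPC input buffer.
    #include <atomic>
    #include <cstddef>
    #include <string>

    static std::atomic<size_t> gIPCBufferBytes(0);   // read by a memory reporter

    template <typename T>
    struct IPCBufferAllocator {
      using value_type = T;
      IPCBufferAllocator() = default;
      template <typename U> IPCBufferAllocator(const IPCBufferAllocator<U>&) {}

      T* allocate(size_t n) {
        gIPCBufferBytes += n * sizeof(T);
        return static_cast<T*>(::operator new(n * sizeof(T)));
      }
      void deallocate(T* p, size_t n) {
        gIPCBufferBytes -= n * sizeof(T);
        ::operator delete(p);
      }
    };
    template <typename T, typename U>
    bool operator==(const IPCBufferAllocator<T>&, const IPCBufferAllocator<U>&) { return true; }
    template <typename T, typename U>
    bool operator!=(const IPCBufferAllocator<T>&, const IPCBufferAllocator<U>&) { return false; }

    // Hypothetical drop-in type for the channel's input buffer.
    using IPCInputString =
        std::basic_string<char, std::char_traits<char>, IPCBufferAllocator<char>>;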

> Lots of images are being loaded from JARs, and the unzipping requires memory(?)

Compressing images in JARs sounds pretty dumb.  I wonder whether, if we store the images in the JAR with zero compression, we would still spin up the zlib instances.

I filed bug 806379 for the dark matter and bug 806383 for reducing the memory usage here somehow.
The per-process overhead is non-trivial.  Some examples (warning, 64-bit build which overstates things somewhat):

├─────415,464 B (02.27%) -- layout
│     ├──365,320 B (01.99%) ── style-sheet-cache
│     └───50,144 B (00.27%) ── style-sheet-service
├─────407,232 B (02.22%) -- xpcom
│     ├──231,296 B (01.26%) ── component-manager
│     ├──135,264 B (00.74%) ── effective-TLD-service
│     └───40,672 B (00.22%) ── category-manager
├─────350,096 B (01.91%) ── atom-tables
├─────171,576 B (00.94%) ── xpconnect
├─────165,760 B (00.91%) ── script-namespace-manager
├─────165,264 B (00.90%) ── preferences
├──────36,864 B (00.20%) ── cycle-collector/collector-object
├──────21,712 B (00.12%) ── telemetry

This is from the clock app, where (presumably) a lot of this stuff isn't exactly necessary.
(In reply to Nicholas Nethercote [:njn] from comment #21)
> The per-process overhead is non-trivial.  Some examples (warning, 64-bit
> build which overstates things somewhat):
> 
> ├─────415,464 B (02.27%) -- layout
> │     ├──365,320 B (01.99%) ── style-sheet-cache

I dug into this some more.  Here are the sizes (in bytes) of each of the seven sheets within the cache:

 mFormsSheet:              66888
 mFullScreenOverrideSheet:   752
 mQuirkSheet:              48472
 mScrollbarsSheet:         21152
 mUASheet:                222504
 mUserChromeSheet:             0
 mUserContentSheet:            0

The UASheet is easily the biggest.  I wonder if it can be made smaller?
(I forgot to mention that the style-sheet-cache numbers are the same for every process.)
> I wonder if it can be made smaller?

I wonder how much of the space is ua.css itself vs html.css and xul.css (which it imports).  I'll bet money xul.css is the main reason this is taking so much space.  :(
> I'll bet money xul.css is the main reason this is taking so much space.  :(

We don't have any xul in B2G content processes, and we have very little xul in the B2G main process.  Could we coalesce these files and then remove the unnecessary bits, or do you think that's a losing game?
> We don't have any xul in B2G content processes,

No scrollbars?  No video controls?

I think getting data on whether my hunch is right would be good.  If it is, we might be able to come up with a smaller xul.css for b2g, possibly.
We could also try to disable the system/user chunk separation for content processes.

During app startup we allocate 1MB for the chrome JS heap and 1MB for the content JS heap (4MB if we consider the alignment code), but maybe we never allocate more than 1MB of JS objects in the first place for common apps.
> During app startup we allocate 1MB for the chrome JS heap and 1MB for the content JS heap 
> (4MB if we consider the alignment code), but maybe we never allocate more than 1MB of JS 
> objects in the first place for common apps.

We should be careful not to conflate virtual memory usage and RSS.  We allocate up to 4MB of virtual memory for these chunks, but much of that will not be committed.

In fact, if different compartments can't share arenas (pages, in the JS engine), I don't see how merging the chunks would make a difference in RSS.
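To make the reserve-vs-commit distinction concrete, here's a minimal POSIX sketch (not the actual JS GC chunk code): reserving address space with PROT_NONE costs virtual address space only, and RSS grows only for the pages we commit and touch.

    // Sketch: 4 MiB of reserved address space vs. committed/touched pages.
    #include <sys/mman.h>
    #include <cstdio>
    #include <cstring>

    int main() {
      const size_t kReserve = 4 * 1024 * 1024;
      // Reserve: shows up in vsize, but contributes nothing to RSS.
      void* base = mmap(nullptr, kReserve, PROT_NONE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if (base == MAP_FAILED) { perror("mmap"); return 1; }

      // Commit and touch just 1 MiB: only these pages become resident.
      const size_t kCommit = 1024 * 1024;
      if (mprotect(base, kCommit, PROT_READ | PROT_WRITE) != 0) { perror("mprotect"); return 1; }
      memset(base, 0xAB, kCommit);

      printf("reserved %zu bytes, touched %zu bytes; compare VmSize and VmRSS in /proc/self/status\n",
             kReserve, kCommit);
      munmap(base, kReserve);
      return 0;
    }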
Depends on: 811671
There are 4 coefficient tables computed at runtime:

   Num:    Value  Size Type    Bind   Vis      Ndx Name
194068: 012f3170 65536 OBJECT  LOCAL  DEFAULT   24 jpeg_nbits_table
180068: 012e0be4 65536 OBJECT  LOCAL  DEFAULT   24 _ZL17sPremultiplyTable
180066: 012d0be4 65536 OBJECT  LOCAL  DEFAULT   24 _ZL19sUnpremultiplyTable
 14744: 012bb15c 41984 OBJECT  LOCAL  DEFAULT   24 _ZL18gUnicodeToGBKTable

http://mxr.mozilla.org/mozilla-central/source/media/libjpeg/jchuff.c#24
http://mxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxUtils.cpp#24
http://mxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxUtils.cpp#25
http://mxr.mozilla.org/mozilla-central/source/intl/uconv/ucvcn/nsGBKConvUtil.cpp#18

They can easily be converted to constants, saving 233KB of .bss at the expense of ELF size.  Is it worth it?

The following two dynamically allocated tables are actually redundant with the above, although they live in quite different source trees.
http://mxr.mozilla.org/mozilla-central/source/content/canvas/src/CanvasRenderingContext2D.cpp#3558
http://mxr.mozilla.org/mozilla-central/source/content/canvas/src/CanvasRenderingContext2D.cpp#3362
> They can easily be converted to constants, saving 233KB of .bss at the expense of ELF size.
> Is it worth it?

Probably, yes!  Let's figure out the details in a new bug?
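For the record, the mechanical version of "convert to constants", sketched for sPremultiplyTable (this assumes C++17 constexpr -- older toolchains would need a build-time generator script instead -- and the rounding formula here is illustrative rather than copied from gfxUtils): generating the table at compile time moves it from per-process writable .bss into .rodata, which is backed by the binary and shared.

    // Sketch: compile-time generation of a 64 KiB premultiply lookup table.
    #include <array>
    #include <cstdint>

    constexpr std::array<uint8_t, 256 * 256> MakePremultiplyTable() {
      std::array<uint8_t, 256 * 256> table{};
      for (int a = 0; a < 256; ++a) {
        for (int v = 0; v < 256; ++v) {
          // Premultiply component v by alpha a, with rounding.  A real patch should
          // match gfxUtils' exact formula so results don't change.
          table[a * 256 + v] = static_cast<uint8_t>((a * v + 127) / 255);
        }
      }
      return table;
    }

    // One read-only copy in the binary, shared by every process that maps it.
    constexpr auto sPremultiplyTable = MakePremultiplyTable();

    static inline uint8_t Premultiply(uint8_t value, uint8_t alpha) {
      return sPremultiplyTable[alpha * 256 + value];
    }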
Depends on: 815473
I think this bug has served its purpose.  Current B2G memory consumption excitement is over in bug 837187.  Come join the party.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME