Closed Bug 801580 Opened 12 years ago Closed 12 years ago

Add support for dumping a heap profile using bionic's trace-malloc equivalent on B2G

Categories

(Firefox OS Graveyard :: General, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: alan.yenlin.huang, Assigned: justin.lebar+bug)

References

(Blocks 1 open bug)

Details

Attachments

(6 files, 5 obsolete files)

Android provide a way to monitor/profile memory allocation of native heap. Setting property "libc.debug.malloc" to 1/5/10 and using DDMS with certain config could monitor/profile memory allocations in native heap. 

Setting property "libc.debug.malloc" to 1/5/10 (and disable jemalloc) to collect certain information is still available. But there's no mechanism to dump those information. We are trying to let b2g process react to certain signal, and using "/proc/[pid]/maps" and get_malloc_leak_info() to these information.
What information does this Android native heap dump provide that about:memory doesn't currently provide.

Would it be possible to add that information to about:memory?  (We can do so if we can access this information from Gecko.)
(In reply to Justin Lebar [:jlebar] from comment #1)
> What information does this Android native heap dump provide that
> about:memory doesn't currently provide.
> 
> Would it be possible to add that information to about:memory?  (We can do so
> if we can access this information from Gecko.)

I think it would be possible to add more information in about:memory. Currently, I have a patch to dump all malloc backtrace. By doing so, we could know all anonymous page's allocation. I'll attach patch later.

Also, if partners met memory leak in their loaded library in hal, it would be much easier for them to get those info.

Could we also get "dumpstate" while doing "get-about-memory.py"? procmem/procrank/librank/showmap/... etc. are also important for memory profiling.
Assignee: nobody → ahuang
What information does this Android native heap dump provide that about:memory doesn't 
currently provide?

> Could we also get "dumpstate" while doing "get-about-memory.py"? procmem/procrank/librank
> /showmap/... etc. are also important for memory profiling.

I made that tool to work for B2G -- I had no idea it worked for Fennec on Android!

I'm fine capturing whatever additional information you want, so long as doing so doesn't break B2G.  We already capture procrank.  Although I think all of that info should already be in the memory report.
This patch let b2g uses bionic's get_malloc_leak_info() to retrieve all malloc's back trace. This would be helpful to dig out anonymous page's memory consumption, including b2g hal loaded library. 

The other concern is that b2g uses jemalloc (we may need use gecko/tools/trace-memory to get back trace.) But partners' hal loaded library will use dlmalloc provided by bionic. To let correct information aligned, we need to disable jemalloc when debugging this.
(DISABLE_JEMALLOC = 1)

Since enable B2G optimize may cause (unexpected) incorrect message when parsing symbols, we also need to disable optimization when debugging.
(B2G_NOOPT = 1)

After patching b2g, we need below command to dump b2g's malloc back trace and b2g's VMA mapping:

adb shell "echo "libc.debug.malloc=1" >> /data/local.prop"
(Using libc's debugging library)
adb reboot
adb shell kill -12 [b2g's pid]
(This patch makes b2g react to signal 12 and dump back trace)
adb shell cat /proc/[b2g's pid]/maps > pid-[b2g's pid].maps
(Dump VMA maps)
adb pull /data/misc/pid-[b2g's pid].nhprof
(Pull back traces out)

We could use *.nhprof and *.maps with addr2line/symbol files to translate address.

I think a tool to parse *.nhprof and *.maps with addr2line/symbol files is also needed. I'm working on it, too.
Attachment #676527 - Flags: feedback?(justin.lebar+bug)
Correct typo.
Attachment #676527 - Attachment is obsolete: true
Attachment #676527 - Flags: feedback?(justin.lebar+bug)
Attachment #676924 - Flags: feedback?(justin.lebar+bug)
Okay...  I thought this bug was about Firefox on Android, but clearly I was mistaken!  That explains a lot.  :)

I have some questions about this which I'd appreciate your help answering.

> partners' hal loaded library will use dlmalloc provided by bionic.

1) Are you sure about that?  How do we prevent allocator mismatches when hal sends Gecko a pointer to free, or vice versa?  I thought we explicitly prevent other libraries from using anything other than jemalloc.  For example, libc uses jemalloc for things like strdup.

2) How is this information gathered here different from what Gecko's trace-malloc gathers?

3) Can you please attach an example of the output that's generated (translated as best you can with your current tools)?
Flags: needinfo?(ahuang)
Isn't jemalloc LD_PRELOADed on b2g? If it is, then there should be no native heap except maybe libc's own allocations.
(In reply to Mike Hommey [:glandium] from comment #7)
> Isn't jemalloc LD_PRELOADed on b2g? If it is, then there should be no native
> heap except maybe libc's own allocations.

"export LD_PRELOAD=/system/b2g/libmozglue.so" in /system/bin/b2g.sh tells it is.
(In reply to Alan Huang from comment #4)
> The other concern is that b2g uses jemalloc (we may need use
> gecko/tools/trace-malloc to get back trace.)

It seems I misunderstood purpose of "trace-malloc". It is not used to get back trace of jemalloc. For b2g using mozjemalloc, it seems we cannot get back trace now.
Flags: needinfo?(ahuang)
> It seems I misunderstood purpose of "trace-malloc". It is not used to get back trace of jemalloc.

I think right now trace-malloc always uses the system allocator, but I don't think it would be too hard to get it to use jemalloc, if we wanted.
(In reply to Justin Lebar [:jlebar] from comment #10)
> > It seems I misunderstood purpose of "trace-malloc". It is not used to get back trace of jemalloc.
> 
> I think right now trace-malloc always uses the system allocator, but I don't
> think it would be too hard to get it to use jemalloc, if we wanted.

It would actually be tricky on b2g/linux. Less so after bug 804303, but that will only support jemalloc 3.
Comment on attachment 676924 [details] [diff] [review]
Patch for b2g to dump all (dl)malloc's backtrace.

Clearing f? until the questions in comment 6 are answered.
Attachment #676924 - Flags: feedback?(justin.lebar+bug)
Attached image ddms_native_heap_dump (obsolete) —
This is a screen shot of android ddms. This native heap dump uses ddm protocol, so we cannot use this tool.
(In reply to Justin Lebar [:jlebar] from comment #6)
> Okay...  I thought this bug was about Firefox on Android, but clearly I was
> mistaken!  That explains a lot.  :)
> 
> I have some questions about this which I'd appreciate your help answering.
> 
> > partners' hal loaded library will use dlmalloc provided by bionic.
> 
> 1) Are you sure about that?  How do we prevent allocator mismatches when hal
> sends Gecko a pointer to free, or vice versa?  I thought we explicitly
> prevent other libraries from using anything other than jemalloc.  For
> example, libc uses jemalloc for things like strdup.
> 
As bug 801571 which is the case of comment 7. But I think the problem is we have no mechanism to dump back trace when debugging possible memory leak in b2g, which is question partners keep asking ... . So I'm trying to use bionic libc's memory leak debugging mechanism, which could include bug 801571's case.

> 2) How is this information gathered here different from what Gecko's
> trace-malloc gathers?
> 
> 3) Can you please attach an example of the output that's generated
> (translated as best you can with your current tools)?

I attached a screen shot as an example.
> As bug 801571 which is the case of comment 7.

Okay, but it sounds like we all agree bug 801571 is something we need to fix.

> But I think the problem is we have no mechanism to dump back trace when debugging 
> possible memory leak in b2g, which is question partners keep asking

Well, we have trace-malloc, but this thing looks nicer than that to me.

glandium, would you mind taking this f?/review?  I don't feel like I know enough to make an informed decision here.
...although I'd probably want this code to live inside nsMemoryInfoDumper.cpp, where we already have glue for listening to signals, etc.  (It's not clear to me that the bionic malloc would be happy if we dumped its heap while it was inside a call to malloc() on that thread; the code in nsMemoryInfoDumper.cpp could insure that we handle the signal at a safe point in time.)
Comment on attachment 676924 [details] [diff] [review]
Patch for b2g to dump all (dl)malloc's backtrace.

Putting in my queue not to forget about it.
Attachment #676924 - Flags: feedback?(mh+mozilla)
Comment on attachment 676924 [details] [diff] [review]
Patch for b2g to dump all (dl)malloc's backtrace.

Review of attachment 676924 [details] [diff] [review]:
-----------------------------------------------------------------

It's still not clear to me what this is supposed to help with, considering b2g uses jemalloc on gonk.

::: b2g/app/malloc_profile.h
@@ +1,4 @@
> +#ifndef __DUTILS_H__
> +#define __DUTILS_H__
> +
> +__BEGIN_DECLS

You need #include <sys/cdefs.h> before using __BEGIN_DECLS. But we usually do a "manual"
#ifdef __cplusplus
extern "C" {
#endif

::: b2g/app/nsBrowserApp.cpp
@@ +158,5 @@
>  int main(int argc, char* argv[])
>  {
> +
> +#ifdef MOZ_WIDGET_GONK
> +  enableDumpMallocProfile(12);

12 is SIGUSR2. SIGUSR2 is used for the profiler.
Attachment #676924 - Flags: feedback?(mh+mozilla) → feedback-
(In reply to Mike Hommey [:glandium] from comment #18)
> Comment on attachment 676924 [details] [diff] [review]
> Patch for b2g to dump all (dl)malloc's backtrace.
> 
> Review of attachment 676924 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> It's still not clear to me what this is supposed to help with, considering
> b2g uses jemalloc on gonk.
> 
> ::: b2g/app/malloc_profile.h
> @@ +1,4 @@
> > +#ifndef __DUTILS_H__
> > +#define __DUTILS_H__
> > +
> > +__BEGIN_DECLS
> 
> You need #include <sys/cdefs.h> before using __BEGIN_DECLS. But we usually
> do a "manual"
> #ifdef __cplusplus
> extern "C" {
> #endif

Note we also have MOZ_BEGIN_EXTERN_C/MOZ_END_EXTERN_C in mozilla/Types.h
Depends on: 808932
This may be our best bet for finding dark matter on devices, short of native DMD.  I'm going to see if I have any luck with it for bug 810105.
Blocks: slim-fast
Summary: There's no mechanism to dump memory allocation of native heap. → Add support for dumping a heap profile using bionic's trace-malloc equivalent on B2G
> This is a screen shot of android ddms. This native heap dump uses ddm protocol, so we cannot use 
> this tool.

Okay...given a dump collected using this tool, how do I open it?
Oh, I see:

> I think a tool to parse *.nhprof and *.maps with addr2line/symbol files is also needed. I'm working > on it, too.
tools/rb/fix-linux-stack.pl might be of use.  If so, also see bug 812070.
I looked through the relevant source in bionic here ($B2G_ROOT/bionic/libc/bionic/malloc_debug*), and bionic is calling _Unwind_Backtrace to get its backtraces.

I think we've established that doesn't work for our builds, for some mysterious reason.  I'll see if I can double-check this conclusion.

If we cannot get _Unwind_Backtrace to work, we should close this bug, as it won't be useful.
Actually, the stacks I'm getting from _Unwind_Backtrace with -funwind-tables seem pretty sane (at first glance, anyway).
Hooray, I finally got this to work.  I'm getting a reasonable-looking set of stacktraces and malloc blocks.

We still want to fix bug 717853, but this should suffice in a pinch.  I'll post patches once they get into a less-embarrassing state.
Assignee: ahuang → justin.lebar+bug
One problem we have is that the stacks I'm getting out of _Unwind_Backtrace are pretty crappy.  See bug 810105 comment 33, for example.

I'm currently compiling with -funwind-tables; that's the only change I've made to facilitate _Unwind_Backtrace.

I have a number of ideas for trying to improve our stacks.

1) Use --no-merge-exidx-entries.

Perhaps the stack weirdness we're seeing is due to arm/thumb transitions.  I understand that we must pass --no-merge-exidx-entries in order to handle these correctly (1?), but our prebuilt linker is apparently too old to support that flag.  So I think I'd have to upgrade our toolchain in order to add this flag.

2) Use -marm.

If arm/thumb transitions are messing us up, perhaps we could build with ARM
instructions only.  I build with --with-thumb=no and my build seems to start up fine, but it crashes before the screen turns on.  Here's the logcat tail:

> I/nsVolumeService( 9073): UpdateVolume: 'sdcard' state Mounted @ '/mnt/sdcard'
> E/GeckoConsole( 9073): [JavaScript Error: "lock._transaction.objectStore is not a function" {file: "jar:file:///system/b2g/omni.ja!/components/SettingsService.js" line: 44}]
> F/libc    ( 9073): Fatal signal 4 (SIGILL) at 0x4170ee72 (code=1)
> I/DEBUG   ( 9074): debuggerd committing suicide to free the zombie!

In GDB (a separate run, so the addresses aren't the same), I saw us SIGILL at

   0x416bae70 <+128>:	b	0x416bae64 <js::XDRState<(js::XDRMode)0>::codeScript(JS::MutableHandle<JSScript*>)+116>
   0x416bae74 <+132>:	mov	r1, #4

and the pc is 0x416bae72, i.e., right between the instructions.  So...that's not very good.  It's unclear how we're getting here, because the stack is of course unreadable by GDB.

Full GDB output is at the bottom of this comment.

I'm not sure how to proceed here.

3) Use -mtpcs-frame, -mtpcs-leaf-frame.

Again from [1], maybe this will fix the arm/thumb problem.

4) Compile with -fno-omit-frame-pointer and write our own backtrace function.

If we got a consistent stack frame, this might not be too hard.  But I don't know what the ABI says about how the stack frames are supposed to look.

5) Fix breakpad (bug 779291).


glandium, do you have any thoughts about what's likely to be the most expeditious way forward here (among the options listed, or not), or tips on what I'm doing wrong?


[1] http://readlist.com/lists/gcc.gnu.org/gcc-help/3/19270.html

Full GDB output of -marm SIGILL (from (2) above):

> Program received signal SIGILL, Illegal instruction.
> 0x416bae72 in js::XDRState<(js::XDRMode)0>::codeScript (this=0x43d55070, scriptp=<value optimized out>)
>     at ../../../../ff-git2/src/js/src/vm/Xdr.cpp:146
> 146	        return false;
> (gdb) bt
> #0  0x416bae72 in js::XDRState<(js::XDRMode)0>::codeScript (this=0x43d55070, 
>     scriptp=<value optimized out>) at ../../../../ff-git2/src/js/src/vm/Xdr.cpp:146
> #1  0x43415320 in ?? ()
> Cannot access memory at address 0x37ffa
> (gdb) disassemble
> Dump of assembler code for function _ZN2js8XDRStateILNS_7XDRModeE0EE10codeScriptEN2JS13MutableHandleIP8JSScriptEE:
>    0x416badf0 <+0>:	ldr	r3, [r0, #8]
>    0x416badf4 <+4>:	ldr	r2, [r0, #12]
>    0x416badf8 <+8>:	ldr	r1, [r1]
>    0x416badfc <+12>:	rsb	r2, r3, r2
>    0x416bae00 <+16>:	push	{r4, lr}
>    0x416bae04 <+20>:	cmp	r2, #3
>    0x416bae08 <+24>:	sub	sp, sp, #16
>    0x416bae0c <+28>:	mov	r4, r0
>    0x416bae10 <+32>:	str	r1, [sp, #12]
>    0x416bae14 <+36>:	bls	0x416bae74 <_ZN2js8XDRStateILNS_7XDRModeE0EE10codeScriptEN2JS13MutableHandleIP8JSScriptEE+132>
>    0x416bae18 <+40>:	cmp	r3, #0
>    0x416bae1c <+44>:	add	r2, r3, #4
>    0x416bae20 <+48>:	str	r2, [r4, #8]
>    0x416bae24 <+52>:	beq	0x416bae6c <_ZN2js8XDRStateILNS_7XDRModeE0EE10codeScriptEN2JS13MutableHandleIP8JSScriptEE+124>
>    0x416bae28 <+56>:	add	r1, sp, #16
>    0x416bae2c <+60>:	movw	r2, #49241	; 0xc059
>    0x416bae30 <+64>:	movt	r2, #47475	; 0xb973
>    0x416bae34 <+68>:	mov	r0, r3
>    0x416bae38 <+72>:	str	r2, [r1, #-8]!
>    0x416bae3c <+76>:	mov	r2, #4
>    0x416bae40 <+80>:	bl	0x40583038
>    0x416bae44 <+84>:	add	r3, sp, #12
>    0x416bae48 <+88>:	str	r3, [sp]
>    0x416bae4c <+92>:	mov	r0, r4
>    0x416bae50 <+96>:	ldr	r3, [pc, #52]	; 0x416bae8c <_ZN2js8XDRStateILNS_7XDRModeE0EE10codeScriptEN2JS13MutableHandleIP8JSScriptEE+156>
>    0x416bae54 <+100>:	ldr	r1, [pc, r3]
>    0x416bae58 <+104>:	mov	r2, r1
>    0x416bae5c <+108>:	mov	r3, r1
>    0x416bae60 <+112>:	bl	0x415b6aec <_ZN2js9XDRScriptILNS_7XDRModeE0EEEbPNS_8XDRStateIXT_EEEN2JS6HandleIP8JSObjectEENS6_IP8JSScriptEENS6_IP10JSFunctionEENS5_13MutableHandleISB_EE>
>    0x416bae64 <+116>:	add	sp, sp, #16
>    0x416bae68 <+120>:	pop	{r4, pc}
>    0x416bae6c <+124>:	mov	r0, #0
>    0x416bae70 <+128>:	b	0x416bae64 <_ZN2js8XDRStateILNS_7XDRModeE0EE10codeScriptEN2JS13MutableHandleIP8JSScriptEE+116>
>    0x416bae74 <+132>:	mov	r1, #4
>    0x416bae78 <+136>:	bl	0x416ba854 <_ZN2js9XDRBuffer4growEj>
>    0x416bae7c <+140>:	cmp	r0, #0
>    0x416bae80 <+144>:	beq	0x416bae6c <_ZN2js8XDRStateILNS_7XDRModeE0EE10codeScriptEN2JS13MutableHandleIP8JSScriptEE+124>
>    0x416bae84 <+148>:	ldr	r3, [r4, #8]
>    0x416bae88 <+152>:	b	0x416bae18 <_ZN2js8XDRStateILNS_7XDRModeE0EE10codeScriptEN2JS13MutableHandleIP8JSScriptEE+40>
>    0x416bae8c <+156>:	rsbeq	r4, r0, r8, lsl lr
> End of assembler dump.
> (gdb) info registers
> r0             0x20	32
> r1             0x42cce1e0	1120723424
> r2             0x43d55070	1138053232
> r3             0x430ae000	1124786176
> r4             0x43d55070	1138053232
> r5             0x42cce1e0	1120723424
> r6             0xffff	65535
> r7             0x42afc034	1118814260
> r8             0x1	1
> r9             0xfffffa68	-1432
> r10            0x42cce1e0	1120723424
> r11            0x42cce1e0	1120723424
> r12            0x0	0
> sp             0xbec0d190	0xbec0d190
> lr             0x416baf58	1097576280
> pc             0x416bae72	0x416bae72 <js::XDRState<(js::XDRMode)0>::codeScript(JS::MutableHandle<JSScript*>)+130>
> cpsr           0x20000030	536870960
> One problem we have is that the stacks I'm getting out of _Unwind_Backtrace
> are pretty crappy. 

More specifically:

- Some of them look fine.

- Some of them look ok but rotated in that abovementioned fashion.

- Some of them are bizarre -- they look like several calls to _Unwind_Backtrace were racing to write into a single stack trace buffer.  E.g. several contiguous frames will make a sensible sequence, then there's a jump to some unrelated place, then a few more sensibly-sequenced frames, then another jump, etc.  But jlebar assures me that _Unwind_Backtrace is serialized via a lock.


> I have a number of ideas for trying to improve our stacks.

You could also try a debug build.  The chance of it helping mightn't be high, but it's easy to try.
Just so nobody needs to take my word for it, here's the code from bionic's
malloc_debug_leak.c.  (I tried and couldn't find a linkable copy online, but
you can git clone https://android.googlesource.com/platform/bionic if you
want.)

>void* leak_malloc(size_t bytes) 
>{   
>    void* base = dlmalloc(bytes + sizeof(AllocationEntry));
>    if (base != NULL) {
>        pthread_mutex_lock(&gAllocationsMutex);
>
>            intptr_t backtrace[BACKTRACE_SIZE];
>            size_t numEntries = get_backtrace(backtrace, BACKTRACE_SIZE); // calls _Unwind_Backtrace
>
>            AllocationEntry* header = (AllocationEntry*)base;
>            header->entry = record_backtrace(backtrace, numEntries, bytes);
>            header->guard = GUARD;
>            base = (AllocationEntry*)base + 1;
>
>        pthread_mutex_unlock(&gAllocationsMutex);
>    }
>
>    return base;
>}
> Just so nobody needs to take my word for it

I believed you, but I thought it worth mentioning how we considered and dismissed this possibility :)
> 1) Use --no-merge-exidx-entries.

I just did some archaeology in the binutils repo, and I'm pretty sure that the default in old linkers was to merge these entries.  So it's possible that updating our linker and passing this flag will help.

> 3) Use -mtpcs-frame, -mtpcs-leaf-frame.

This SIGILL'ed.  I didn't get a stack, but I'm happy to if anyone is curious.
> I just did some archaeology in the binutils repo

--no-merge-exidx-entries was introduced in binutils 2.21, and our prebuilt binaries are from binutils 2.20.

How crazy would it be to try to update the prebuilt binaries and see if that helps our stacks?
Glandium comes to the rescue a year ago: http://glandium.org/blog/?p=2146 has a prebuilt ld binary which has --no-merge-exidx-entries!
Getting there...

Now (with a 4.6.1 toolchain) linking libxul fails with

> /home/jlebar/code/moz/ff-git2/src/modules/freetype2/src/gzip/ftgzip.c:295: error: undefined reference to 'inflateInit2_'
> /home/jlebar/code/moz/ff-git2/src/modules/freetype2/src/gzip/ftgzip.c:310: error: undefined reference to 'inflateEnd'
> /home/jlebar/code/moz/ff-git2/src/modules/freetype2/src/gzip/ftgzip.c:339: error: undefined reference to 'inflateReset'
> /home/jlebar/code/moz/ff-git2/src/modules/freetype2/src/gzip/ftgzip.c:413: error: undefined reference to 'inflate'

I'd expect this to be part of zlib, which it seems we intend to build as modules/zlib/src/libmozz.a.  We pass that path to ld, but that library does not exist.

> $ find . -name 'libmozz*'
> ./modules/zlib/src/.deps/libmozz.a.desc.pp
> ./modules/zlib/src/libmozz.a.desc

Her's our invocation of ld:

/home/jlebar/code/moz/B2G/objdir-gecko/_virtualenv/bin/python /home/jlebar/code/moz/ff-git2/src/config/pythonpath.py -I../../config /home/jlebar/code/moz/ff-git2/src/config/expandlibs_exec.py --depend .deps/libxul.so.pp --target libxul.so --uselist --  /usr/local/bin/ccache /home/jlebar/code/moz/B2G/prebuilt/linux-x86/toolchain/arm-linux-androideabi-4.6.1/bin/arm-linux-androideabi-g++ -DANDROID -isystem /home/jlebar/code/moz/B2G/bionic/libc/arch-arm/include -isystem /home/jlebar/code/moz/B2G/bionic/libc/include/ -isystem /home/jlebar/code/moz/B2G/bionic/libc/kernel/common -isystem /home/jlebar/code/moz/B2G/bionic/libc/kernel/arch-arm -isystem /home/jlebar/code/moz/B2G/bionic/libm/include -I/home/jlebar/code/moz/B2G/frameworks/base/opengl/include -I/home/jlebar/code/moz/B2G/frameworks/base/native/include -I/home/jlebar/code/moz/B2G/hardware/libhardware/include -I/home/jlebar/code/moz/B2G/hardware/libhardware_legacy/include -I/home/jlebar/code/moz/B2G/system -I/home/jlebar/code/moz/B2G/system/core/include -isystem /home/jlebar/code/moz/B2G/bionic -I/home/jlebar/code/moz/B2G/frameworks/base/include -I/home/jlebar/code/moz/B2G/external/dbus -I/home/jlebar/code/moz/B2G/external/bluetooth/bluez/lib  -I/home/jlebar/code/moz/B2G/frameworks/base/services/sensorservice -I/home/jlebar/code/moz/B2G/frameworks/base/services/camera -I/home/jlebar/code/moz/B2G/system/media/wilhelm/include -I/home/jlebar/code/moz/B2G/frameworks/base/include/media/stagefright -I/home/jlebar/code/moz/B2G/frameworks/base/include/media/stagefright/openmax -I/home/jlebar/code/moz/B2G/frameworks/base/media/libstagefright/rtsp -I/home/jlebar/code/moz/B2G/frameworks/base/media/libstagefright/include -I/home/jlebar/code/moz/B2G/dalvik/libnativehelper/include/nativehelper -pedantic -Wall -Wpointer-arith -Woverloaded-virtual -Werror=return-type -Wtype-limits -Wempty-body -Wno-ctor-dtor-privacy -Wno-overlength-strings -Wno-invalid-offsetof -Wno-variadic-macros -Wno-long-long -mandroid -fno-short-enums -fno-exceptions -Wno-psabi -DMOZ_ENABLE_JS_DUMP -include /home/jlebar/code/moz/B2G/gonk-misc/Unicode.h -funwind-tables -I/home/jlebar/code/moz/B2G/ndk/sources/cxx-stl/stlport/stlport/ -I/home/jlebar/code/moz/B2G/external/stlport/stlport/ -march=armv7-a -mthumb -mfpu=neon -mfloat-abi=softfp -fno-exceptions -fno-strict-aliasing -fno-rtti -ffunction-sections -fdata-sections -fno-exceptions -std=gnu++0x -pipe  -DNDEBUG -DTRIMMED -g -Os -freorder-blocks -fno-reorder-functions -fomit-frame-pointer -fPIC -shared -Wl,-z,defs -Wl,-h,libxul.so -o libxul.so  nsStaticXULComponents.o nsUnicharUtils.o nsBidiUtils.o nsSpecialCasingData.o nsUnicodeProperties.o nsRDFResource.o    -mandroid -L/home/jlebar/code/moz/B2G/out/target/product/otoro/obj/lib -Wl,-rpath-link=/home/jlebar/code/moz/B2G/out/target/product/otoro/obj/lib --sysroot=/home/jlebar/code/moz/B2G/out/target/product/otoro/obj/ -Wl,--no-merge-exidx-entries -mthumb -Wl,-z,noexecstack -Wl,--icf=safe   -Wl,-rpath-link,/home/jlebar/code/moz/B2G/objdir-gecko/dist/bin -Wl,-rpath-link,/usr/local/lib  -L/home/jlebar/code/moz/B2G/objdir-gecko/dist/lib -lmozglue -Wl,--wrap=getaddrinfo,--wrap=freeaddrinfo,--wrap=gai_strerror -Wl--wrap=PR_GetEnv,--wrap=PR_SetEnv ../../toolkit/components/osfile/libosfile_s.a ../../toolkit/xre/libxulapp_s.a  ../../staticlib/components/libnecko.a ../../staticlib/components/libuconv.a ../../staticlib/components/libi18n.a ../../staticlib/components/libchardet.a ../../staticlib/components/libjar50.a ../../staticlib/components/libstartupcache.a ../../staticlib/components/libpref.a ../../staticlib/components/libhtmlpars.a ../../staticlib/components/libidentity.a ../../staticlib/components/libimglib2.a ../../staticlib/components/libmediasniffer.a ../../staticlib/components/libgkgfx.a ../../staticlib/components/libgklayout.a ../../staticlib/components/libdocshell.a ../../staticlib/components/libembedcomponents.a ../../staticlib/components/libwebbrwsr.a ../../staticlib/components/libnsappshell.a ../../staticlib/components/libtxmgr.a ../../staticlib/components/libcommandlines.a ../../staticlib/components/libtoolkitcomps.a ../../staticlib/components/libpipboot.a ../../staticlib/components/libpipnss.a ../../staticlib/components/libappcomps.a ../../staticlib/components/libjsreflect.a ../../staticlib/components/libcomposer.a ../../staticlib/components/libtelemetry.a ../../staticlib/components/libjsinspector.a ../../staticlib/components/libjsdebugger.a ../../staticlib/components/libstoragecomps.a ../../staticlib/components/librdf.a ../../staticlib/components/libwindowds.a ../../staticlib/components/libjsctypes.a ../../staticlib/components/libjsperf.a ../../staticlib/components/libgkplugin.a ../../staticlib/components/libjsd.a ../../staticlib/components/libautoconfig.a ../../staticlib/components/libauth.a ../../staticlib/components/libcookie.a ../../staticlib/components/libpermissions.a ../../staticlib/components/libuniversalchardet.a ../../staticlib/components/libtkautocomplete.a ../../staticlib/components/libsatchel.a ../../staticlib/components/libpippki.a ../../staticlib/components/libwidget_gonk.a ../../staticlib/components/libprofiler.a ../../staticlib/components/libaccessibility.a ../../staticlib/components/libspellchecker.a ../../staticlib/components/libzipwriter.a ../../staticlib/components/libservices-crypto.a ../../staticlib/libjsipc_s.a ../../staticlib/libdomipc_s.a ../../staticlib/libdomplugins_s.a ../../staticlib/libmozipc_s.a ../../staticlib/libmozipdlgen_s.a ../../staticlib/libipcshell_s.a ../../staticlib/libgfxipc_s.a ../../staticlib/libhal_s.a ../../staticlib/libdombindings_s.a ../../staticlib/libmozril_s.a ../../staticlib/libmozdbus_s.a ../../staticlib/libmozipcunixsocket_s.a ../../staticlib/libmoznetd_s.a ../../staticlib/libxpcom_core.a ../../staticlib/libucvutil_s.a ../../staticlib/libchromium_s.a ../../staticlib/libsnappy_s.a ../../staticlib/libthebes.a ../../staticlib/libgl.a ../../staticlib/libycbcr.a  -L../../dist/bin -L../../dist/lib /home/jlebar/code/moz/B2G/objdir-gecko/dist/lib/libjs_static.a -L../../dist/bin -L../../dist/lib -lcrmf -lsmime3 -lssl3 -lnss3 -lnssutil3   ../../dist/lib/libmozsqlite3.a -lsoundtouch  /home/jlebar/code/moz/B2G/objdir-gecko/modules/zlib/src/libmozz.a ../../dist/lib/libgkmedias.a   -L../../dist/bin -L../../dist/lib  -L/home/jlebar/code/moz/B2G/objdir-gecko/dist/lib -lnspr4 -lplc4 -lplds4 ../../dist/lib/libmozalloc.a  -llog -lstlport -lstagefright -lstagefright_omx -lui -lmedia -lhardware_legacy -lhardware -lutils -lcutils -lsysutils -lcamera_client -lbinder -lsensorservice -ldbus -lstagefright -lstagefright_omx -lbinder
> I'd expect this to be part of zlib, which it seems we intend to build as modules/zlib
> /src/libmozz.a.  We pass that path to ld, but that library does not exist.

Oh, I see, this is the expandlibs business.

But still:

$ nm objdir-gecko/modules/zlib/src/*.o | grep inflate | grep 'T '
[...]
00000001 T MOZ_Z_inflate
[...]

Doesn't look like there's an "inflate" symbol there.
Okay, we're passing -DFT_CONFIG_OPTION_SYSTEM_ZLIB, but only with gcc 4.6.1, for some reason...
You may want to try _Unwind_Backtrace with -fno-omit-frame-pointer, in case bionic does something smart in that case (which might be far fetched).
I fixed the Freetype issue by passing --with-zlib=no to its configure (by modifying Gecko's configure.in).  I have no idea why Freetype's configure was deciding to do something different with the new toolchain.

I think we're getting much better stacks with this new linker and --no-merge-exidx-entries.  I'll attach some stacks in a moment.
Comment on attachment 683641 [details]
Main-process stacks with --no-merge-exidx-entries.

Upon closer inspection, these stacks aren't any prettier than the ones in bug 810105.  :(
Attachment #683641 - Attachment description: Pretty main-process stacks → Main-process stacks with --no-merge-exidx-entries.
Oh jeez.

If you pass -i to addr2line, it will occasionally print four lines instead of the expected two (because it's giving you the line of the real function and the inlined function).  I'd only read two lines out of addr2line, so the extra lines would slowly build up, causing the stacks to appear rotated.
Attached file Actually good stacks, I think. (obsolete) —
This was generated with --no-merge-exidx-entries, but I bet I don't need that, in the end...
Attachment #683641 - Attachment is obsolete: true
Attachment 683738 [details] shows 272 KiB taken up by the gzipper for the memory dumping, under stacks like this:

  MOZ_Z_deflateInit2_ modules/zlib/src/deflate.c:297
  gz_init src/modules/zlib/src/gzwrite.c:44
  MOZ_Z_gzwrite src/modules/zlib/src/gzwrite.c:197
  nsGZFileWriter::Write(nsACString_internal const&) xpcom/base/nsGZFileWriter.cpp:73
  nsIGZFileWriter::Write(char const*, unsigned int) B2G/objdir-gecko/dist/include/nsIGZFileWriter.h:51
  nsMemoryInfoDumper::DumpMemoryReportsToFileImpl(nsAString_internal const&) xpcom/base/nsMemoryInfoDumper.cpp:651
  nsMemoryInfoDumper::DumpMemoryReportsToFile(nsAString_internal const&, bool, bool) xpcom/base/nsMemoryInfoDumper.cpp:367

That's big enough that it would be worth having a reporter for it if it's not too difficult.  zlib is unmodified 3rd party code, but it looks like you can provide a custom allocator (z_stream_s.{zalloc,zfree}) so it shouldn't be too hard.
Note we use system zlib on android (and maybe b2g), and Linux distros usually use it as well, which makes the z_stream_s fields better anyways.
I filed 813771 about the zlib reporting.
I intend to make this easier to use, but if anyone wants to use the patches I'm about to post, here are instructions:

Apply B2G, gecko, and gonk-misc patches.
$ rm -rf objdir-gecko
$ ./build.sh gecko && ./flash.sh gecko
$ adb shell setprop libc.malloc.debug 1
$ adb shell stop b2g
$ adb shell start b2g
... do whatever on the phone
$ ./get_about_memory.py --keep
Ignore errors about being unable to merge reports.
$ ./parse_malloc_profile.py about-memory-N/memory-report-XXX-YYY.json.gz
To apply B2G patch,

  (From root B2G checkout)
  $ git remote add jlebar git://github.com/jlebar/B2G.git
  $ git fetch jlebar
  $ git checkout jlebar/malloc-profile
To apply, start at your B2G root checkout.  Then

  $ cd gonk-misc
  $ git apply <(curl -L http://url/of/this/attachment)
Attachment #676924 - Attachment is obsolete: true
Attachment #677335 - Attachment is obsolete: true
Attachment #683738 - Attachment is obsolete: true
To apply gecko patch, start at your B2G root checkout, then

  $ git remote add jlebar git://github.com/jlebar/mozilla-central.git
  $ git fetch jlebar
  $ git checkout jlebar/bionic-heap-dump

or alternatively, apply the patch that I'll post in a moment.
> $ adb shell setprop libc.malloc.debug 1

Gah, it's libc.debug.malloc 1, sorry.
In attachment 683798 [details] there are 611,304 bytes in stacks like this one:

  malloc /home/jlebar/code/moz/B2G/bionic/libc/bionic/malloc_debug_common.c:221
  nsTArray_base<nsTArrayInfallibleAllocator>::EnsureCapacity(unsigned int, unsigned int) objdir-gecko/dist/include/nsTArray-inl.h:120
  mozilla::layers::Layer::GetNextSibling() objdir-gecko/dist/include/Layers.h:790
  mozilla::layers::Layer::GetNextSibling() objdir-gecko/dist/include/Layers.h:790
  mozilla::layers::CompositorParent::TransformShadowTree(mozilla::TimeStamp) gfx/layers/ipc/CompositorParent.cpp:828
  mozilla::layers::CompositorParent::Composite() gfx/layers/ipc/CompositorParent.cpp:552
  RunnableMethod<mozilla::layers::ImageContainerChild, void (mozilla::layers::ImageContainerChild::*)(), Tuple0>::Run()ipc/chromium/src/base/task.h:308
  MessageLoop::RunTask(Task*) /home/jlebar/code/moz/ff-git2/src/ipc/chromium/src/base/message_loop.cc:334

I don't entirely understand this stack trace -- GetNextSibling() just looks like this:

  Layer* GetNextSibling() { return mNextSibling; }

so I don't know how the EnsureCapacity() call can be above it on the stack.  But it looks like the CompositorParent should be measured.  Maybe sCurrentCompositor in gfx/layers/ipc/CompositorParent.cpp?  Or sCompositorParent in widget/android/nsWindow.h?
The gallery app is in the background.
> Heap profile of browser process with nytimes.com loaded

How annoying -- the top stack listed, which accounts for almost 1.2 MiB, is completely unhelpful.
jlebar, did your build have bug 811596's patch in it?  I hope not, because I think this stack should be much smaller (4 or 8 KiB) when that patch is present.

131,073B in one alloc
  get_backtrace bionic/libc/bionic/malloc_debug_leak.c:258
  malloc bionic/libc/bionic/malloc_debug_common.c:221
  operator new(unsigned int) bionic/libstdc++/src/new.cpp:18
  __stl_new ndk/sources/cxx-stl/stlport/stlport/stl/_new.h:134
  std::priv::_String_base<char, std::allocator<char> >::_M_Start() const ndk/sources/cxx-stl/stlport/stlport/stl/_string_base.h:65
  IPC::Channel::ChannelImpl::OnFileCanReadWithoutBlocking(int) chromium/src/chrome/common/ipc_channel_posix.cc:747
  base::MessagePumpLibevent::OnLibeventNotification(int, short, void*) ipc/chromium/src/base/message_pump_libevent.cc:216
This is only semi-related, but I had to mention it because I learned about it while investigating these profiles...

PLArenaPool has a global free list.  If you call PL_FreeArenaPool() (as opposed to PL_FinishArenaPool()) then any in-use chunks get put onto the free list, regardless of their size!  I.e. the free list contains chunks of varying sizes.

Furthermore, any pool allocation request which requires a new chunk will, before calling malloc, check the free list to see if there's a big-enough chunk.  So when you create an arena pool with chunks of, say, 8 KiB, there's no guarantee that you'll actually be getting 8 KiB chunks.  The chunks you get depend entirely on what other arena pools are doing.  Fantastic.
> jlebar, did your build have bug 811596's patch in it?

No.  I checked by doing

  $ git log --grep 811596 jlebar/bionic-heap-dump

(see comment 47 for pre-conditions).

> PLArenaPool has a global free list.

We should probably just remove this unconditionally, since jemalloc keeps a global free-list.  But if we can't do so, we should disable this free-list for our profiling builds, for sure.

I'd love just to remove PLArenaPool altogether.  In fact, didn't we have a bug on this?
> We should probably just remove this unconditionally, since jemalloc keeps a
> global free-list.  But if we can't do so, we should disable this free-list
> for our profiling builds, for sure.

I just filed bug 814312 which avoids using it in the two significant cases within Gecko.


> I'd love just to remove PLArenaPool altogether.  In fact, didn't we have a
> bug on this?

Nope.  It's used all over the place.  Besides, arena-style allocation is quite reasonable in many contexts... are you thinking of replacing it with a better arena allocator, or avoiding arena allocation?
This is subsumed by bug 717853, which we're getting ready to land.

Alan, thanks a lot for your work here.  In particular, it was really helpful that you demonstrated that _Unwind_Backtrace worked on ARM.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: