Last Comment Bug 801580 - Add support for dumping a heap profile using bionic's trace-malloc equivalent on B2G
: Add support for dumping a heap profile using bionic's trace-malloc equivalent...
Status: RESOLVED WONTFIX
:
Product: Firefox OS
Classification: Client Software
Component: General (show other bugs)
: unspecified
: ARM Gonk (Firefox OS)
: -- normal (vote)
: ---
Assigned To: Justin Lebar (not reading bugmail)
:
:
Mentors:
Depends on: 808932
Blocks: slim-fast 813771
  Show dependency treegraph
 
Reported: 2012-10-15 03:46 PDT by Alan Huang
Modified: 2013-06-06 20:29 PDT (History)
14 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
Patch for b2g to dump all (dl)malloc's backtrace. (3.67 KB, patch)
2012-10-30 02:41 PDT, Alan Huang
no flags Details | Diff | Splinter Review
Patch for b2g to dump all (dl)malloc's backtrace. (3.67 KB, patch)
2012-10-31 00:01 PDT, Alan Huang
mh+mozilla: feedback-
Details | Diff | Splinter Review
ddms_native_heap_dump (361.23 KB, image/png)
2012-11-01 03:38 PDT, Alan Huang
no flags Details
Main-process stacks with --no-merge-exidx-entries. (630.06 KB, text/plain)
2012-11-20 09:48 PST, Justin Lebar (not reading bugmail)
no flags Details
Actually good stacks, I think. (869.73 KB, text/plain)
2012-11-20 14:06 PST, Justin Lebar (not reading bugmail)
no flags Details
gonk-misc patch, turns on -funwind-tables, and disables jemalloc (1.38 KB, patch)
2012-11-20 16:29 PST, Justin Lebar (not reading bugmail)
no flags Details | Diff | Splinter Review
Usable WIP Gecko patch for heap dumping, v1 (11.09 KB, patch)
2012-11-20 16:34 PST, Justin Lebar (not reading bugmail)
no flags Details | Diff | Splinter Review
Heap profile of gallery app (349.54 KB, text/plain)
2012-11-20 16:41 PST, Justin Lebar (not reading bugmail)
no flags Details
Heap profile of main process with gallery app in foreground (876.03 KB, text/plain)
2012-11-20 16:41 PST, Justin Lebar (not reading bugmail)
no flags Details
Heap profile of main process with browser app (nytimes.com) in foreground (921.21 KB, text/plain)
2012-11-20 22:22 PST, Justin Lebar (not reading bugmail)
no flags Details
Heap profile of browser process with nytimes.com loaded (620.81 KB, text/plain)
2012-11-20 22:24 PST, Justin Lebar (not reading bugmail)
no flags Details

Description Alan Huang 2012-10-15 03:46:05 PDT
Android provide a way to monitor/profile memory allocation of native heap. Setting property "libc.debug.malloc" to 1/5/10 and using DDMS with certain config could monitor/profile memory allocations in native heap. 

Setting property "libc.debug.malloc" to 1/5/10 (and disable jemalloc) to collect certain information is still available. But there's no mechanism to dump those information. We are trying to let b2g process react to certain signal, and using "/proc/[pid]/maps" and get_malloc_leak_info() to these information.
Comment 1 Justin Lebar (not reading bugmail) 2012-10-15 06:09:52 PDT
What information does this Android native heap dump provide that about:memory doesn't currently provide.

Would it be possible to add that information to about:memory?  (We can do so if we can access this information from Gecko.)
Comment 2 Alan Huang 2012-10-27 00:52:09 PDT
(In reply to Justin Lebar [:jlebar] from comment #1)
> What information does this Android native heap dump provide that
> about:memory doesn't currently provide.
> 
> Would it be possible to add that information to about:memory?  (We can do so
> if we can access this information from Gecko.)

I think it would be possible to add more information in about:memory. Currently, I have a patch to dump all malloc backtrace. By doing so, we could know all anonymous page's allocation. I'll attach patch later.

Also, if partners met memory leak in their loaded library in hal, it would be much easier for them to get those info.

Could we also get "dumpstate" while doing "get-about-memory.py"? procmem/procrank/librank/showmap/... etc. are also important for memory profiling.
Comment 3 Justin Lebar (not reading bugmail) 2012-10-27 07:13:55 PDT
What information does this Android native heap dump provide that about:memory doesn't 
currently provide?

> Could we also get "dumpstate" while doing "get-about-memory.py"? procmem/procrank/librank
> /showmap/... etc. are also important for memory profiling.

I made that tool to work for B2G -- I had no idea it worked for Fennec on Android!

I'm fine capturing whatever additional information you want, so long as doing so doesn't break B2G.  We already capture procrank.  Although I think all of that info should already be in the memory report.
Comment 4 Alan Huang 2012-10-30 02:41:02 PDT
Created attachment 676527 [details] [diff] [review]
Patch for b2g to dump all (dl)malloc's backtrace.

This patch let b2g uses bionic's get_malloc_leak_info() to retrieve all malloc's back trace. This would be helpful to dig out anonymous page's memory consumption, including b2g hal loaded library. 

The other concern is that b2g uses jemalloc (we may need use gecko/tools/trace-memory to get back trace.) But partners' hal loaded library will use dlmalloc provided by bionic. To let correct information aligned, we need to disable jemalloc when debugging this.
(DISABLE_JEMALLOC = 1)

Since enable B2G optimize may cause (unexpected) incorrect message when parsing symbols, we also need to disable optimization when debugging.
(B2G_NOOPT = 1)

After patching b2g, we need below command to dump b2g's malloc back trace and b2g's VMA mapping:

adb shell "echo "libc.debug.malloc=1" >> /data/local.prop"
(Using libc's debugging library)
adb reboot
adb shell kill -12 [b2g's pid]
(This patch makes b2g react to signal 12 and dump back trace)
adb shell cat /proc/[b2g's pid]/maps > pid-[b2g's pid].maps
(Dump VMA maps)
adb pull /data/misc/pid-[b2g's pid].nhprof
(Pull back traces out)

We could use *.nhprof and *.maps with addr2line/symbol files to translate address.

I think a tool to parse *.nhprof and *.maps with addr2line/symbol files is also needed. I'm working on it, too.
Comment 5 Alan Huang 2012-10-31 00:01:27 PDT
Created attachment 676924 [details] [diff] [review]
Patch for b2g to dump all (dl)malloc's backtrace.

Correct typo.
Comment 6 Justin Lebar (not reading bugmail) 2012-10-31 00:17:15 PDT
Okay...  I thought this bug was about Firefox on Android, but clearly I was mistaken!  That explains a lot.  :)

I have some questions about this which I'd appreciate your help answering.

> partners' hal loaded library will use dlmalloc provided by bionic.

1) Are you sure about that?  How do we prevent allocator mismatches when hal sends Gecko a pointer to free, or vice versa?  I thought we explicitly prevent other libraries from using anything other than jemalloc.  For example, libc uses jemalloc for things like strdup.

2) How is this information gathered here different from what Gecko's trace-malloc gathers?

3) Can you please attach an example of the output that's generated (translated as best you can with your current tools)?
Comment 7 Mike Hommey [:glandium] 2012-10-31 00:46:56 PDT
Isn't jemalloc LD_PRELOADed on b2g? If it is, then there should be no native heap except maybe libc's own allocations.
Comment 8 Mike Hommey [:glandium] 2012-10-31 00:54:02 PDT
(In reply to Mike Hommey [:glandium] from comment #7)
> Isn't jemalloc LD_PRELOADed on b2g? If it is, then there should be no native
> heap except maybe libc's own allocations.

"export LD_PRELOAD=/system/b2g/libmozglue.so" in /system/bin/b2g.sh tells it is.
Comment 9 Alan Huang 2012-10-31 04:03:44 PDT
(In reply to Alan Huang from comment #4)
> The other concern is that b2g uses jemalloc (we may need use
> gecko/tools/trace-malloc to get back trace.)

It seems I misunderstood purpose of "trace-malloc". It is not used to get back trace of jemalloc. For b2g using mozjemalloc, it seems we cannot get back trace now.
Comment 10 Justin Lebar (not reading bugmail) 2012-10-31 09:42:26 PDT
> It seems I misunderstood purpose of "trace-malloc". It is not used to get back trace of jemalloc.

I think right now trace-malloc always uses the system allocator, but I don't think it would be too hard to get it to use jemalloc, if we wanted.
Comment 11 Mike Hommey [:glandium] 2012-10-31 09:51:43 PDT
(In reply to Justin Lebar [:jlebar] from comment #10)
> > It seems I misunderstood purpose of "trace-malloc". It is not used to get back trace of jemalloc.
> 
> I think right now trace-malloc always uses the system allocator, but I don't
> think it would be too hard to get it to use jemalloc, if we wanted.

It would actually be tricky on b2g/linux. Less so after bug 804303, but that will only support jemalloc 3.
Comment 12 Justin Lebar (not reading bugmail) 2012-10-31 10:13:16 PDT
Comment on attachment 676924 [details] [diff] [review]
Patch for b2g to dump all (dl)malloc's backtrace.

Clearing f? until the questions in comment 6 are answered.
Comment 13 Alan Huang 2012-11-01 03:38:27 PDT
Created attachment 677335 [details]
ddms_native_heap_dump

This is a screen shot of android ddms. This native heap dump uses ddm protocol, so we cannot use this tool.
Comment 14 Alan Huang 2012-11-01 03:43:18 PDT
(In reply to Justin Lebar [:jlebar] from comment #6)
> Okay...  I thought this bug was about Firefox on Android, but clearly I was
> mistaken!  That explains a lot.  :)
> 
> I have some questions about this which I'd appreciate your help answering.
> 
> > partners' hal loaded library will use dlmalloc provided by bionic.
> 
> 1) Are you sure about that?  How do we prevent allocator mismatches when hal
> sends Gecko a pointer to free, or vice versa?  I thought we explicitly
> prevent other libraries from using anything other than jemalloc.  For
> example, libc uses jemalloc for things like strdup.
> 
As bug 801571 which is the case of comment 7. But I think the problem is we have no mechanism to dump back trace when debugging possible memory leak in b2g, which is question partners keep asking ... . So I'm trying to use bionic libc's memory leak debugging mechanism, which could include bug 801571's case.

> 2) How is this information gathered here different from what Gecko's
> trace-malloc gathers?
> 
> 3) Can you please attach an example of the output that's generated
> (translated as best you can with your current tools)?

I attached a screen shot as an example.
Comment 15 Justin Lebar (not reading bugmail) 2012-11-01 05:36:17 PDT
> As bug 801571 which is the case of comment 7.

Okay, but it sounds like we all agree bug 801571 is something we need to fix.

> But I think the problem is we have no mechanism to dump back trace when debugging 
> possible memory leak in b2g, which is question partners keep asking

Well, we have trace-malloc, but this thing looks nicer than that to me.

glandium, would you mind taking this f?/review?  I don't feel like I know enough to make an informed decision here.
Comment 16 Justin Lebar (not reading bugmail) 2012-11-01 05:40:43 PDT
...although I'd probably want this code to live inside nsMemoryInfoDumper.cpp, where we already have glue for listening to signals, etc.  (It's not clear to me that the bionic malloc would be happy if we dumped its heap while it was inside a call to malloc() on that thread; the code in nsMemoryInfoDumper.cpp could insure that we handle the signal at a safe point in time.)
Comment 17 Mike Hommey [:glandium] 2012-11-01 12:09:47 PDT
Comment on attachment 676924 [details] [diff] [review]
Patch for b2g to dump all (dl)malloc's backtrace.

Putting in my queue not to forget about it.
Comment 18 Mike Hommey [:glandium] 2012-11-05 00:02:18 PST
Comment on attachment 676924 [details] [diff] [review]
Patch for b2g to dump all (dl)malloc's backtrace.

Review of attachment 676924 [details] [diff] [review]:
-----------------------------------------------------------------

It's still not clear to me what this is supposed to help with, considering b2g uses jemalloc on gonk.

::: b2g/app/malloc_profile.h
@@ +1,4 @@
> +#ifndef __DUTILS_H__
> +#define __DUTILS_H__
> +
> +__BEGIN_DECLS

You need #include <sys/cdefs.h> before using __BEGIN_DECLS. But we usually do a "manual"
#ifdef __cplusplus
extern "C" {
#endif

::: b2g/app/nsBrowserApp.cpp
@@ +158,5 @@
>  int main(int argc, char* argv[])
>  {
> +
> +#ifdef MOZ_WIDGET_GONK
> +  enableDumpMallocProfile(12);

12 is SIGUSR2. SIGUSR2 is used for the profiler.
Comment 19 Mike Hommey [:glandium] 2012-11-06 10:28:42 PST
(In reply to Mike Hommey [:glandium] from comment #18)
> Comment on attachment 676924 [details] [diff] [review]
> Patch for b2g to dump all (dl)malloc's backtrace.
> 
> Review of attachment 676924 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> It's still not clear to me what this is supposed to help with, considering
> b2g uses jemalloc on gonk.
> 
> ::: b2g/app/malloc_profile.h
> @@ +1,4 @@
> > +#ifndef __DUTILS_H__
> > +#define __DUTILS_H__
> > +
> > +__BEGIN_DECLS
> 
> You need #include <sys/cdefs.h> before using __BEGIN_DECLS. But we usually
> do a "manual"
> #ifdef __cplusplus
> extern "C" {
> #endif

Note we also have MOZ_BEGIN_EXTERN_C/MOZ_END_EXTERN_C in mozilla/Types.h
Comment 20 Justin Lebar (not reading bugmail) 2012-11-12 14:01:54 PST
This may be our best bet for finding dark matter on devices, short of native DMD.  I'm going to see if I have any luck with it for bug 810105.
Comment 21 Justin Lebar (not reading bugmail) 2012-11-15 09:37:45 PST
> This is a screen shot of android ddms. This native heap dump uses ddm protocol, so we cannot use 
> this tool.

Okay...given a dump collected using this tool, how do I open it?
Comment 22 Justin Lebar (not reading bugmail) 2012-11-15 09:39:30 PST
Oh, I see:

> I think a tool to parse *.nhprof and *.maps with addr2line/symbol files is also needed. I'm working > on it, too.
Comment 23 Nicholas Nethercote [:njn] 2012-11-15 10:09:14 PST
tools/rb/fix-linux-stack.pl might be of use.  If so, also see bug 812070.
Comment 24 Justin Lebar (not reading bugmail) 2012-11-18 04:59:43 PST
I looked through the relevant source in bionic here ($B2G_ROOT/bionic/libc/bionic/malloc_debug*), and bionic is calling _Unwind_Backtrace to get its backtraces.

I think we've established that doesn't work for our builds, for some mysterious reason.  I'll see if I can double-check this conclusion.

If we cannot get _Unwind_Backtrace to work, we should close this bug, as it won't be useful.
Comment 25 Justin Lebar (not reading bugmail) 2012-11-18 06:52:09 PST
Actually, the stacks I'm getting from _Unwind_Backtrace with -funwind-tables seem pretty sane (at first glance, anyway).
Comment 26 Justin Lebar (not reading bugmail) 2012-11-18 17:32:44 PST
Hooray, I finally got this to work.  I'm getting a reasonable-looking set of stacktraces and malloc blocks.

We still want to fix bug 717853, but this should suffice in a pinch.  I'll post patches once they get into a less-embarrassing state.
Comment 27 Justin Lebar (not reading bugmail) 2012-11-19 17:35:35 PST
One problem we have is that the stacks I'm getting out of _Unwind_Backtrace are pretty crappy.  See bug 810105 comment 33, for example.

I'm currently compiling with -funwind-tables; that's the only change I've made to facilitate _Unwind_Backtrace.

I have a number of ideas for trying to improve our stacks.

1) Use --no-merge-exidx-entries.

Perhaps the stack weirdness we're seeing is due to arm/thumb transitions.  I understand that we must pass --no-merge-exidx-entries in order to handle these correctly (1?), but our prebuilt linker is apparently too old to support that flag.  So I think I'd have to upgrade our toolchain in order to add this flag.

2) Use -marm.

If arm/thumb transitions are messing us up, perhaps we could build with ARM
instructions only.  I build with --with-thumb=no and my build seems to start up fine, but it crashes before the screen turns on.  Here's the logcat tail:

> I/nsVolumeService( 9073): UpdateVolume: 'sdcard' state Mounted @ '/mnt/sdcard'
> E/GeckoConsole( 9073): [JavaScript Error: "lock._transaction.objectStore is not a function" {file: "jar:file:///system/b2g/omni.ja!/components/SettingsService.js" line: 44}]
> F/libc    ( 9073): Fatal signal 4 (SIGILL) at 0x4170ee72 (code=1)
> I/DEBUG   ( 9074): debuggerd committing suicide to free the zombie!

In GDB (a separate run, so the addresses aren't the same), I saw us SIGILL at

   0x416bae70 <+128>:	b	0x416bae64 <js::XDRState<(js::XDRMode)0>::codeScript(JS::MutableHandle<JSScript*>)+116>
   0x416bae74 <+132>:	mov	r1, #4

and the pc is 0x416bae72, i.e., right between the instructions.  So...that's not very good.  It's unclear how we're getting here, because the stack is of course unreadable by GDB.

Full GDB output is at the bottom of this comment.

I'm not sure how to proceed here.

3) Use -mtpcs-frame, -mtpcs-leaf-frame.

Again from [1], maybe this will fix the arm/thumb problem.

4) Compile with -fno-omit-frame-pointer and write our own backtrace function.

If we got a consistent stack frame, this might not be too hard.  But I don't know what the ABI says about how the stack frames are supposed to look.

5) Fix breakpad (bug 779291).


glandium, do you have any thoughts about what's likely to be the most expeditious way forward here (among the options listed, or not), or tips on what I'm doing wrong?


[1] http://readlist.com/lists/gcc.gnu.org/gcc-help/3/19270.html

Full GDB output of -marm SIGILL (from (2) above):

> Program received signal SIGILL, Illegal instruction.
> 0x416bae72 in js::XDRState<(js::XDRMode)0>::codeScript (this=0x43d55070, scriptp=<value optimized out>)
>     at ../../../../ff-git2/src/js/src/vm/Xdr.cpp:146
> 146	        return false;
> (gdb) bt
> #0  0x416bae72 in js::XDRState<(js::XDRMode)0>::codeScript (this=0x43d55070, 
>     scriptp=<value optimized out>) at ../../../../ff-git2/src/js/src/vm/Xdr.cpp:146
> #1  0x43415320 in ?? ()
> Cannot access memory at address 0x37ffa
> (gdb) disassemble
> Dump of assembler code for function _ZN2js8XDRStateILNS_7XDRModeE0EE10codeScriptEN2JS13MutableHandleIP8JSScriptEE:
>    0x416badf0 <+0>:	ldr	r3, [r0, #8]
>    0x416badf4 <+4>:	ldr	r2, [r0, #12]
>    0x416badf8 <+8>:	ldr	r1, [r1]
>    0x416badfc <+12>:	rsb	r2, r3, r2
>    0x416bae00 <+16>:	push	{r4, lr}
>    0x416bae04 <+20>:	cmp	r2, #3
>    0x416bae08 <+24>:	sub	sp, sp, #16
>    0x416bae0c <+28>:	mov	r4, r0
>    0x416bae10 <+32>:	str	r1, [sp, #12]
>    0x416bae14 <+36>:	bls	0x416bae74 <_ZN2js8XDRStateILNS_7XDRModeE0EE10codeScriptEN2JS13MutableHandleIP8JSScriptEE+132>
>    0x416bae18 <+40>:	cmp	r3, #0
>    0x416bae1c <+44>:	add	r2, r3, #4
>    0x416bae20 <+48>:	str	r2, [r4, #8]
>    0x416bae24 <+52>:	beq	0x416bae6c <_ZN2js8XDRStateILNS_7XDRModeE0EE10codeScriptEN2JS13MutableHandleIP8JSScriptEE+124>
>    0x416bae28 <+56>:	add	r1, sp, #16
>    0x416bae2c <+60>:	movw	r2, #49241	; 0xc059
>    0x416bae30 <+64>:	movt	r2, #47475	; 0xb973
>    0x416bae34 <+68>:	mov	r0, r3
>    0x416bae38 <+72>:	str	r2, [r1, #-8]!
>    0x416bae3c <+76>:	mov	r2, #4
>    0x416bae40 <+80>:	bl	0x40583038
>    0x416bae44 <+84>:	add	r3, sp, #12
>    0x416bae48 <+88>:	str	r3, [sp]
>    0x416bae4c <+92>:	mov	r0, r4
>    0x416bae50 <+96>:	ldr	r3, [pc, #52]	; 0x416bae8c <_ZN2js8XDRStateILNS_7XDRModeE0EE10codeScriptEN2JS13MutableHandleIP8JSScriptEE+156>
>    0x416bae54 <+100>:	ldr	r1, [pc, r3]
>    0x416bae58 <+104>:	mov	r2, r1
>    0x416bae5c <+108>:	mov	r3, r1
>    0x416bae60 <+112>:	bl	0x415b6aec <_ZN2js9XDRScriptILNS_7XDRModeE0EEEbPNS_8XDRStateIXT_EEEN2JS6HandleIP8JSObjectEENS6_IP8JSScriptEENS6_IP10JSFunctionEENS5_13MutableHandleISB_EE>
>    0x416bae64 <+116>:	add	sp, sp, #16
>    0x416bae68 <+120>:	pop	{r4, pc}
>    0x416bae6c <+124>:	mov	r0, #0
>    0x416bae70 <+128>:	b	0x416bae64 <_ZN2js8XDRStateILNS_7XDRModeE0EE10codeScriptEN2JS13MutableHandleIP8JSScriptEE+116>
>    0x416bae74 <+132>:	mov	r1, #4
>    0x416bae78 <+136>:	bl	0x416ba854 <_ZN2js9XDRBuffer4growEj>
>    0x416bae7c <+140>:	cmp	r0, #0
>    0x416bae80 <+144>:	beq	0x416bae6c <_ZN2js8XDRStateILNS_7XDRModeE0EE10codeScriptEN2JS13MutableHandleIP8JSScriptEE+124>
>    0x416bae84 <+148>:	ldr	r3, [r4, #8]
>    0x416bae88 <+152>:	b	0x416bae18 <_ZN2js8XDRStateILNS_7XDRModeE0EE10codeScriptEN2JS13MutableHandleIP8JSScriptEE+40>
>    0x416bae8c <+156>:	rsbeq	r4, r0, r8, lsl lr
> End of assembler dump.
> (gdb) info registers
> r0             0x20	32
> r1             0x42cce1e0	1120723424
> r2             0x43d55070	1138053232
> r3             0x430ae000	1124786176
> r4             0x43d55070	1138053232
> r5             0x42cce1e0	1120723424
> r6             0xffff	65535
> r7             0x42afc034	1118814260
> r8             0x1	1
> r9             0xfffffa68	-1432
> r10            0x42cce1e0	1120723424
> r11            0x42cce1e0	1120723424
> r12            0x0	0
> sp             0xbec0d190	0xbec0d190
> lr             0x416baf58	1097576280
> pc             0x416bae72	0x416bae72 <js::XDRState<(js::XDRMode)0>::codeScript(JS::MutableHandle<JSScript*>)+130>
> cpsr           0x20000030	536870960
Comment 28 Nicholas Nethercote [:njn] 2012-11-19 18:26:16 PST
> One problem we have is that the stacks I'm getting out of _Unwind_Backtrace
> are pretty crappy. 

More specifically:

- Some of them look fine.

- Some of them look ok but rotated in that abovementioned fashion.

- Some of them are bizarre -- they look like several calls to _Unwind_Backtrace were racing to write into a single stack trace buffer.  E.g. several contiguous frames will make a sensible sequence, then there's a jump to some unrelated place, then a few more sensibly-sequenced frames, then another jump, etc.  But jlebar assures me that _Unwind_Backtrace is serialized via a lock.


> I have a number of ideas for trying to improve our stacks.

You could also try a debug build.  The chance of it helping mightn't be high, but it's easy to try.
Comment 29 Justin Lebar (not reading bugmail) 2012-11-19 18:37:09 PST
Just so nobody needs to take my word for it, here's the code from bionic's
malloc_debug_leak.c.  (I tried and couldn't find a linkable copy online, but
you can git clone https://android.googlesource.com/platform/bionic if you
want.)

>void* leak_malloc(size_t bytes) 
>{   
>    void* base = dlmalloc(bytes + sizeof(AllocationEntry));
>    if (base != NULL) {
>        pthread_mutex_lock(&gAllocationsMutex);
>
>            intptr_t backtrace[BACKTRACE_SIZE];
>            size_t numEntries = get_backtrace(backtrace, BACKTRACE_SIZE); // calls _Unwind_Backtrace
>
>            AllocationEntry* header = (AllocationEntry*)base;
>            header->entry = record_backtrace(backtrace, numEntries, bytes);
>            header->guard = GUARD;
>            base = (AllocationEntry*)base + 1;
>
>        pthread_mutex_unlock(&gAllocationsMutex);
>    }
>
>    return base;
>}
Comment 30 Nicholas Nethercote [:njn] 2012-11-19 18:50:55 PST
> Just so nobody needs to take my word for it

I believed you, but I thought it worth mentioning how we considered and dismissed this possibility :)
Comment 31 Justin Lebar (not reading bugmail) 2012-11-19 20:12:36 PST
> 1) Use --no-merge-exidx-entries.

I just did some archaeology in the binutils repo, and I'm pretty sure that the default in old linkers was to merge these entries.  So it's possible that updating our linker and passing this flag will help.

> 3) Use -mtpcs-frame, -mtpcs-leaf-frame.

This SIGILL'ed.  I didn't get a stack, but I'm happy to if anyone is curious.
Comment 32 Justin Lebar (not reading bugmail) 2012-11-19 20:23:21 PST
> I just did some archaeology in the binutils repo

--no-merge-exidx-entries was introduced in binutils 2.21, and our prebuilt binaries are from binutils 2.20.

How crazy would it be to try to update the prebuilt binaries and see if that helps our stacks?
Comment 33 Justin Lebar (not reading bugmail) 2012-11-19 20:41:06 PST
Glandium comes to the rescue a year ago: http://glandium.org/blog/?p=2146 has a prebuilt ld binary which has --no-merge-exidx-entries!
Comment 34 Justin Lebar (not reading bugmail) 2012-11-19 22:04:11 PST
Getting there...

Now (with a 4.6.1 toolchain) linking libxul fails with

> /home/jlebar/code/moz/ff-git2/src/modules/freetype2/src/gzip/ftgzip.c:295: error: undefined reference to 'inflateInit2_'
> /home/jlebar/code/moz/ff-git2/src/modules/freetype2/src/gzip/ftgzip.c:310: error: undefined reference to 'inflateEnd'
> /home/jlebar/code/moz/ff-git2/src/modules/freetype2/src/gzip/ftgzip.c:339: error: undefined reference to 'inflateReset'
> /home/jlebar/code/moz/ff-git2/src/modules/freetype2/src/gzip/ftgzip.c:413: error: undefined reference to 'inflate'

I'd expect this to be part of zlib, which it seems we intend to build as modules/zlib/src/libmozz.a.  We pass that path to ld, but that library does not exist.

> $ find . -name 'libmozz*'
> ./modules/zlib/src/.deps/libmozz.a.desc.pp
> ./modules/zlib/src/libmozz.a.desc

Her's our invocation of ld:

/home/jlebar/code/moz/B2G/objdir-gecko/_virtualenv/bin/python /home/jlebar/code/moz/ff-git2/src/config/pythonpath.py -I../../config /home/jlebar/code/moz/ff-git2/src/config/expandlibs_exec.py --depend .deps/libxul.so.pp --target libxul.so --uselist --  /usr/local/bin/ccache /home/jlebar/code/moz/B2G/prebuilt/linux-x86/toolchain/arm-linux-androideabi-4.6.1/bin/arm-linux-androideabi-g++ -DANDROID -isystem /home/jlebar/code/moz/B2G/bionic/libc/arch-arm/include -isystem /home/jlebar/code/moz/B2G/bionic/libc/include/ -isystem /home/jlebar/code/moz/B2G/bionic/libc/kernel/common -isystem /home/jlebar/code/moz/B2G/bionic/libc/kernel/arch-arm -isystem /home/jlebar/code/moz/B2G/bionic/libm/include -I/home/jlebar/code/moz/B2G/frameworks/base/opengl/include -I/home/jlebar/code/moz/B2G/frameworks/base/native/include -I/home/jlebar/code/moz/B2G/hardware/libhardware/include -I/home/jlebar/code/moz/B2G/hardware/libhardware_legacy/include -I/home/jlebar/code/moz/B2G/system -I/home/jlebar/code/moz/B2G/system/core/include -isystem /home/jlebar/code/moz/B2G/bionic -I/home/jlebar/code/moz/B2G/frameworks/base/include -I/home/jlebar/code/moz/B2G/external/dbus -I/home/jlebar/code/moz/B2G/external/bluetooth/bluez/lib  -I/home/jlebar/code/moz/B2G/frameworks/base/services/sensorservice -I/home/jlebar/code/moz/B2G/frameworks/base/services/camera -I/home/jlebar/code/moz/B2G/system/media/wilhelm/include -I/home/jlebar/code/moz/B2G/frameworks/base/include/media/stagefright -I/home/jlebar/code/moz/B2G/frameworks/base/include/media/stagefright/openmax -I/home/jlebar/code/moz/B2G/frameworks/base/media/libstagefright/rtsp -I/home/jlebar/code/moz/B2G/frameworks/base/media/libstagefright/include -I/home/jlebar/code/moz/B2G/dalvik/libnativehelper/include/nativehelper -pedantic -Wall -Wpointer-arith -Woverloaded-virtual -Werror=return-type -Wtype-limits -Wempty-body -Wno-ctor-dtor-privacy -Wno-overlength-strings -Wno-invalid-offsetof -Wno-variadic-macros -Wno-long-long -mandroid -fno-short-enums -fno-exceptions -Wno-psabi -DMOZ_ENABLE_JS_DUMP -include /home/jlebar/code/moz/B2G/gonk-misc/Unicode.h -funwind-tables -I/home/jlebar/code/moz/B2G/ndk/sources/cxx-stl/stlport/stlport/ -I/home/jlebar/code/moz/B2G/external/stlport/stlport/ -march=armv7-a -mthumb -mfpu=neon -mfloat-abi=softfp -fno-exceptions -fno-strict-aliasing -fno-rtti -ffunction-sections -fdata-sections -fno-exceptions -std=gnu++0x -pipe  -DNDEBUG -DTRIMMED -g -Os -freorder-blocks -fno-reorder-functions -fomit-frame-pointer -fPIC -shared -Wl,-z,defs -Wl,-h,libxul.so -o libxul.so  nsStaticXULComponents.o nsUnicharUtils.o nsBidiUtils.o nsSpecialCasingData.o nsUnicodeProperties.o nsRDFResource.o    -mandroid -L/home/jlebar/code/moz/B2G/out/target/product/otoro/obj/lib -Wl,-rpath-link=/home/jlebar/code/moz/B2G/out/target/product/otoro/obj/lib --sysroot=/home/jlebar/code/moz/B2G/out/target/product/otoro/obj/ -Wl,--no-merge-exidx-entries -mthumb -Wl,-z,noexecstack -Wl,--icf=safe   -Wl,-rpath-link,/home/jlebar/code/moz/B2G/objdir-gecko/dist/bin -Wl,-rpath-link,/usr/local/lib  -L/home/jlebar/code/moz/B2G/objdir-gecko/dist/lib -lmozglue -Wl,--wrap=getaddrinfo,--wrap=freeaddrinfo,--wrap=gai_strerror -Wl--wrap=PR_GetEnv,--wrap=PR_SetEnv ../../toolkit/components/osfile/libosfile_s.a ../../toolkit/xre/libxulapp_s.a  ../../staticlib/components/libnecko.a ../../staticlib/components/libuconv.a ../../staticlib/components/libi18n.a ../../staticlib/components/libchardet.a ../../staticlib/components/libjar50.a ../../staticlib/components/libstartupcache.a ../../staticlib/components/libpref.a ../../staticlib/components/libhtmlpars.a ../../staticlib/components/libidentity.a ../../staticlib/components/libimglib2.a ../../staticlib/components/libmediasniffer.a ../../staticlib/components/libgkgfx.a ../../staticlib/components/libgklayout.a ../../staticlib/components/libdocshell.a ../../staticlib/components/libembedcomponents.a ../../staticlib/components/libwebbrwsr.a ../../staticlib/components/libnsappshell.a ../../staticlib/components/libtxmgr.a ../../staticlib/components/libcommandlines.a ../../staticlib/components/libtoolkitcomps.a ../../staticlib/components/libpipboot.a ../../staticlib/components/libpipnss.a ../../staticlib/components/libappcomps.a ../../staticlib/components/libjsreflect.a ../../staticlib/components/libcomposer.a ../../staticlib/components/libtelemetry.a ../../staticlib/components/libjsinspector.a ../../staticlib/components/libjsdebugger.a ../../staticlib/components/libstoragecomps.a ../../staticlib/components/librdf.a ../../staticlib/components/libwindowds.a ../../staticlib/components/libjsctypes.a ../../staticlib/components/libjsperf.a ../../staticlib/components/libgkplugin.a ../../staticlib/components/libjsd.a ../../staticlib/components/libautoconfig.a ../../staticlib/components/libauth.a ../../staticlib/components/libcookie.a ../../staticlib/components/libpermissions.a ../../staticlib/components/libuniversalchardet.a ../../staticlib/components/libtkautocomplete.a ../../staticlib/components/libsatchel.a ../../staticlib/components/libpippki.a ../../staticlib/components/libwidget_gonk.a ../../staticlib/components/libprofiler.a ../../staticlib/components/libaccessibility.a ../../staticlib/components/libspellchecker.a ../../staticlib/components/libzipwriter.a ../../staticlib/components/libservices-crypto.a ../../staticlib/libjsipc_s.a ../../staticlib/libdomipc_s.a ../../staticlib/libdomplugins_s.a ../../staticlib/libmozipc_s.a ../../staticlib/libmozipdlgen_s.a ../../staticlib/libipcshell_s.a ../../staticlib/libgfxipc_s.a ../../staticlib/libhal_s.a ../../staticlib/libdombindings_s.a ../../staticlib/libmozril_s.a ../../staticlib/libmozdbus_s.a ../../staticlib/libmozipcunixsocket_s.a ../../staticlib/libmoznetd_s.a ../../staticlib/libxpcom_core.a ../../staticlib/libucvutil_s.a ../../staticlib/libchromium_s.a ../../staticlib/libsnappy_s.a ../../staticlib/libthebes.a ../../staticlib/libgl.a ../../staticlib/libycbcr.a  -L../../dist/bin -L../../dist/lib /home/jlebar/code/moz/B2G/objdir-gecko/dist/lib/libjs_static.a -L../../dist/bin -L../../dist/lib -lcrmf -lsmime3 -lssl3 -lnss3 -lnssutil3   ../../dist/lib/libmozsqlite3.a -lsoundtouch  /home/jlebar/code/moz/B2G/objdir-gecko/modules/zlib/src/libmozz.a ../../dist/lib/libgkmedias.a   -L../../dist/bin -L../../dist/lib  -L/home/jlebar/code/moz/B2G/objdir-gecko/dist/lib -lnspr4 -lplc4 -lplds4 ../../dist/lib/libmozalloc.a  -llog -lstlport -lstagefright -lstagefright_omx -lui -lmedia -lhardware_legacy -lhardware -lutils -lcutils -lsysutils -lcamera_client -lbinder -lsensorservice -ldbus -lstagefright -lstagefright_omx -lbinder
Comment 35 Justin Lebar (not reading bugmail) 2012-11-19 22:16:54 PST
> I'd expect this to be part of zlib, which it seems we intend to build as modules/zlib
> /src/libmozz.a.  We pass that path to ld, but that library does not exist.

Oh, I see, this is the expandlibs business.

But still:

$ nm objdir-gecko/modules/zlib/src/*.o | grep inflate | grep 'T '
[...]
00000001 T MOZ_Z_inflate
[...]

Doesn't look like there's an "inflate" symbol there.
Comment 36 Justin Lebar (not reading bugmail) 2012-11-19 23:05:20 PST
Okay, we're passing -DFT_CONFIG_OPTION_SYSTEM_ZLIB, but only with gcc 4.6.1, for some reason...
Comment 37 Mike Hommey [:glandium] 2012-11-20 04:35:13 PST
You may want to try _Unwind_Backtrace with -fno-omit-frame-pointer, in case bionic does something smart in that case (which might be far fetched).
Comment 38 Justin Lebar (not reading bugmail) 2012-11-20 09:43:10 PST
I fixed the Freetype issue by passing --with-zlib=no to its configure (by modifying Gecko's configure.in).  I have no idea why Freetype's configure was deciding to do something different with the new toolchain.

I think we're getting much better stacks with this new linker and --no-merge-exidx-entries.  I'll attach some stacks in a moment.
Comment 39 Justin Lebar (not reading bugmail) 2012-11-20 09:48:17 PST
Created attachment 683641 [details]
Main-process stacks with --no-merge-exidx-entries.
Comment 40 Justin Lebar (not reading bugmail) 2012-11-20 12:41:02 PST
Comment on attachment 683641 [details]
Main-process stacks with --no-merge-exidx-entries.

Upon closer inspection, these stacks aren't any prettier than the ones in bug 810105.  :(
Comment 41 Justin Lebar (not reading bugmail) 2012-11-20 14:05:23 PST
Oh jeez.

If you pass -i to addr2line, it will occasionally print four lines instead of the expected two (because it's giving you the line of the real function and the inlined function).  I'd only read two lines out of addr2line, so the extra lines would slowly build up, causing the stacks to appear rotated.
Comment 42 Justin Lebar (not reading bugmail) 2012-11-20 14:06:56 PST
Created attachment 683738 [details]
Actually good stacks, I think.

This was generated with --no-merge-exidx-entries, but I bet I don't need that, in the end...
Comment 43 Nicholas Nethercote [:njn] 2012-11-20 15:06:42 PST
Attachment 683738 [details] shows 272 KiB taken up by the gzipper for the memory dumping, under stacks like this:

  MOZ_Z_deflateInit2_ modules/zlib/src/deflate.c:297
  gz_init src/modules/zlib/src/gzwrite.c:44
  MOZ_Z_gzwrite src/modules/zlib/src/gzwrite.c:197
  nsGZFileWriter::Write(nsACString_internal const&) xpcom/base/nsGZFileWriter.cpp:73
  nsIGZFileWriter::Write(char const*, unsigned int) B2G/objdir-gecko/dist/include/nsIGZFileWriter.h:51
  nsMemoryInfoDumper::DumpMemoryReportsToFileImpl(nsAString_internal const&) xpcom/base/nsMemoryInfoDumper.cpp:651
  nsMemoryInfoDumper::DumpMemoryReportsToFile(nsAString_internal const&, bool, bool) xpcom/base/nsMemoryInfoDumper.cpp:367

That's big enough that it would be worth having a reporter for it if it's not too difficult.  zlib is unmodified 3rd party code, but it looks like you can provide a custom allocator (z_stream_s.{zalloc,zfree}) so it shouldn't be too hard.
Comment 44 Mike Hommey [:glandium] 2012-11-20 15:16:09 PST
Note we use system zlib on android (and maybe b2g), and Linux distros usually use it as well, which makes the z_stream_s fields better anyways.
Comment 45 Nicholas Nethercote [:njn] 2012-11-20 15:31:47 PST
I filed 813771 about the zlib reporting.
Comment 46 Justin Lebar (not reading bugmail) 2012-11-20 16:25:39 PST
I intend to make this easier to use, but if anyone wants to use the patches I'm about to post, here are instructions:

Apply B2G, gecko, and gonk-misc patches.
$ rm -rf objdir-gecko
$ ./build.sh gecko && ./flash.sh gecko
$ adb shell setprop libc.malloc.debug 1
$ adb shell stop b2g
$ adb shell start b2g
... do whatever on the phone
$ ./get_about_memory.py --keep
Ignore errors about being unable to merge reports.
$ ./parse_malloc_profile.py about-memory-N/memory-report-XXX-YYY.json.gz
Comment 47 Justin Lebar (not reading bugmail) 2012-11-20 16:28:25 PST
To apply B2G patch,

  (From root B2G checkout)
  $ git remote add jlebar git://github.com/jlebar/B2G.git
  $ git fetch jlebar
  $ git checkout jlebar/malloc-profile
Comment 48 Justin Lebar (not reading bugmail) 2012-11-20 16:29:47 PST
Created attachment 683792 [details] [diff] [review]
gonk-misc patch, turns on -funwind-tables, and disables jemalloc

To apply, start at your B2G root checkout.  Then

  $ cd gonk-misc
  $ git apply <(curl -L http://url/of/this/attachment)
Comment 49 Justin Lebar (not reading bugmail) 2012-11-20 16:33:33 PST
To apply gecko patch, start at your B2G root checkout, then

  $ git remote add jlebar git://github.com/jlebar/mozilla-central.git
  $ git fetch jlebar
  $ git checkout jlebar/bionic-heap-dump

or alternatively, apply the patch that I'll post in a moment.
Comment 50 Justin Lebar (not reading bugmail) 2012-11-20 16:34:18 PST
Created attachment 683795 [details] [diff] [review]
Usable WIP Gecko patch for heap dumping, v1
Comment 51 Justin Lebar (not reading bugmail) 2012-11-20 16:38:41 PST
> $ adb shell setprop libc.malloc.debug 1

Gah, it's libc.debug.malloc 1, sorry.
Comment 52 Justin Lebar (not reading bugmail) 2012-11-20 16:41:05 PST
Created attachment 683797 [details]
Heap profile of gallery app
Comment 53 Justin Lebar (not reading bugmail) 2012-11-20 16:41:48 PST
Created attachment 683798 [details]
Heap profile of main process with gallery app in foreground
Comment 54 Nicholas Nethercote [:njn] 2012-11-20 20:25:10 PST
In attachment 683798 [details] there are 611,304 bytes in stacks like this one:

  malloc /home/jlebar/code/moz/B2G/bionic/libc/bionic/malloc_debug_common.c:221
  nsTArray_base<nsTArrayInfallibleAllocator>::EnsureCapacity(unsigned int, unsigned int) objdir-gecko/dist/include/nsTArray-inl.h:120
  mozilla::layers::Layer::GetNextSibling() objdir-gecko/dist/include/Layers.h:790
  mozilla::layers::Layer::GetNextSibling() objdir-gecko/dist/include/Layers.h:790
  mozilla::layers::CompositorParent::TransformShadowTree(mozilla::TimeStamp) gfx/layers/ipc/CompositorParent.cpp:828
  mozilla::layers::CompositorParent::Composite() gfx/layers/ipc/CompositorParent.cpp:552
  RunnableMethod<mozilla::layers::ImageContainerChild, void (mozilla::layers::ImageContainerChild::*)(), Tuple0>::Run()ipc/chromium/src/base/task.h:308
  MessageLoop::RunTask(Task*) /home/jlebar/code/moz/ff-git2/src/ipc/chromium/src/base/message_loop.cc:334

I don't entirely understand this stack trace -- GetNextSibling() just looks like this:

  Layer* GetNextSibling() { return mNextSibling; }

so I don't know how the EnsureCapacity() call can be above it on the stack.  But it looks like the CompositorParent should be measured.  Maybe sCurrentCompositor in gfx/layers/ipc/CompositorParent.cpp?  Or sCompositorParent in widget/android/nsWindow.h?
Comment 55 Justin Lebar (not reading bugmail) 2012-11-20 22:22:05 PST
Created attachment 683892 [details]
Heap profile of main process with browser app (nytimes.com) in foreground

The gallery app is in the background.
Comment 56 Justin Lebar (not reading bugmail) 2012-11-20 22:24:31 PST
Created attachment 683893 [details]
Heap profile of browser process with nytimes.com loaded
Comment 57 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-11-20 23:24:28 PST
That stack doesn't make much sense.  My only guess is that you might be hitting

http://mxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/CompositorParent.cpp#697
Comment 58 Nicholas Nethercote [:njn] 2012-11-21 14:29:21 PST
> Heap profile of browser process with nytimes.com loaded

How annoying -- the top stack listed, which accounts for almost 1.2 MiB, is completely unhelpful.
Comment 59 Nicholas Nethercote [:njn] 2012-11-21 16:33:23 PST
jlebar, did your build have bug 811596's patch in it?  I hope not, because I think this stack should be much smaller (4 or 8 KiB) when that patch is present.

131,073B in one alloc
  get_backtrace bionic/libc/bionic/malloc_debug_leak.c:258
  malloc bionic/libc/bionic/malloc_debug_common.c:221
  operator new(unsigned int) bionic/libstdc++/src/new.cpp:18
  __stl_new ndk/sources/cxx-stl/stlport/stlport/stl/_new.h:134
  std::priv::_String_base<char, std::allocator<char> >::_M_Start() const ndk/sources/cxx-stl/stlport/stlport/stl/_string_base.h:65
  IPC::Channel::ChannelImpl::OnFileCanReadWithoutBlocking(int) chromium/src/chrome/common/ipc_channel_posix.cc:747
  base::MessagePumpLibevent::OnLibeventNotification(int, short, void*) ipc/chromium/src/base/message_pump_libevent.cc:216
Comment 60 Nicholas Nethercote [:njn] 2012-11-21 20:06:48 PST
This is only semi-related, but I had to mention it because I learned about it while investigating these profiles...

PLArenaPool has a global free list.  If you call PL_FreeArenaPool() (as opposed to PL_FinishArenaPool()) then any in-use chunks get put onto the free list, regardless of their size!  I.e. the free list contains chunks of varying sizes.

Furthermore, any pool allocation request which requires a new chunk will, before calling malloc, check the free list to see if there's a big-enough chunk.  So when you create an arena pool with chunks of, say, 8 KiB, there's no guarantee that you'll actually be getting 8 KiB chunks.  The chunks you get depend entirely on what other arena pools are doing.  Fantastic.
Comment 61 Justin Lebar (not reading bugmail) 2012-11-21 20:55:49 PST
> jlebar, did your build have bug 811596's patch in it?

No.  I checked by doing

  $ git log --grep 811596 jlebar/bionic-heap-dump

(see comment 47 for pre-conditions).

> PLArenaPool has a global free list.

We should probably just remove this unconditionally, since jemalloc keeps a global free-list.  But if we can't do so, we should disable this free-list for our profiling builds, for sure.

I'd love just to remove PLArenaPool altogether.  In fact, didn't we have a bug on this?
Comment 62 Nicholas Nethercote [:njn] 2012-11-21 21:37:27 PST
> We should probably just remove this unconditionally, since jemalloc keeps a
> global free-list.  But if we can't do so, we should disable this free-list
> for our profiling builds, for sure.

I just filed bug 814312 which avoids using it in the two significant cases within Gecko.


> I'd love just to remove PLArenaPool altogether.  In fact, didn't we have a
> bug on this?

Nope.  It's used all over the place.  Besides, arena-style allocation is quite reasonable in many contexts... are you thinking of replacing it with a better arena allocator, or avoiding arena allocation?
Comment 63 Justin Lebar (not reading bugmail) 2012-12-10 08:22:02 PST
This is subsumed by bug 717853, which we're getting ready to land.

Alan, thanks a lot for your work here.  In particular, it was really helpful that you demonstrated that _Unwind_Backtrace worked on ARM.

Note You need to log in before you can comment on or make changes to this bug.