Bug 823354 - DMD reports on Fennec don't have stack traces
Status: RESOLVED FIXED
[MemShrink:P2]
Product: Core
Classification: Components
Component: General
Version: 19 Branch
Platform: ARM Android
Importance: -- normal
Target Milestone: ---
Assigned To: Nobody; OK to take it and work on it
Mentors:
Depends on:
Blocks: 769761 976984
 
Reported: 2012-12-19 18:45 PST by Kartikaya Gupta (email:kats@mozilla.com)
Modified: 2014-12-29 12:13 PST (History)


Attachments
DMD dump from recent fennec (6.53 KB, application/x-gzip)
2014-12-24 08:35 PST, Kartikaya Gupta (email:kats@mozilla.com)
Human-readable DMD output (12.31 KB, text/plain)
2014-12-25 15:03 PST, Nicholas Nethercote [:njn]

Description Kartikaya Gupta (email:kats@mozilla.com) 2012-12-19 18:45:44 PST
I built and ran DMD on fennec using the instructions on the wiki page at https://wiki.mozilla.org/Performance/MemShrink/DMD, but the file that was dumped out looked like the one at http://people.mozilla.com/~kgupta/bug/769761-dmd.txt (that is, no stack traces).
Comment 1 Justin Lebar (not reading bugmail) 2012-12-19 19:12:29 PST
Interesting.  Can you check whether you're building with -funwind-tables?  That should be turned on as I read configure.in, but maybe it's not for some reason.

(Just do a clean build and look at one of the invocations of g++.)
Comment 2 Kartikaya Gupta (email:kats@mozilla.com) 2012-12-20 06:56:21 PST
Yup, -funwind-tables is being passed to g++
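The check from comments 1 and 2 can also be scripted. This is a minimal sketch, assuming the compiler command line has been copied out of the build log as a single string. `missing_flags` is a hypothetical helper, not part of the Mozilla build system, and the example command is made up for illustration.

```python
import shlex


def missing_flags(command_line, required_flags):
    """Return the required flags that do not appear as arguments in command_line."""
    args = set(shlex.split(command_line))
    return [flag for flag in required_flags if flag not in args]


# Illustrative g++ invocation (an assumption, not copied from a real build log).
example = "g++ -o Foo.o -c -fno-exceptions -funwind-tables -O2 Foo.cpp"
print(missing_flags(example, ["-funwind-tables"]))
```

An empty list means every required flag was found. The same helper can be given more than one flag if several need to be verified at once.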
Comment 3 Justin Lebar (not reading bugmail) 2012-12-20 08:27:49 PST
If I were looking into this, I'd use gdb to see what happens inside NS_StackWalk, and then possibly step into _Unwind_Backtrace, assuming that's called.
Comment 4 Joe Cheng [:jcheng] 2013-01-04 02:18:01 PST
*** Bug 826561 has been marked as a duplicate of this bug. ***
Comment 5 Mike Hommey [:glandium] 2013-01-04 02:24:49 PST
Try adding --es env2 MOZ_LINKER_EXTRACT=1
Comment 6 Kartikaya Gupta (email:kats@mozilla.com) 2013-01-04 09:26:01 PST
I still get the same results with that.
Comment 7 Jeff Muizelaar [:jrmuizel] 2013-03-07 05:36:57 PST
It should be possible to reuse the breakpad unwinding infrastructure that we're using for profiling to fix this (bug 779291)
Comment 8 Nicholas Nethercote [:njn] 2013-03-07 17:03:19 PST
(In reply to Jeff Muizelaar [:jrmuizel] from comment #7)
> It should be possible to reuse the breakpad unwinding infrastructure that
> we're using for profiling to fix this (bug 779291)

DMD just uses NS_StackWalk.  Does NS_StackWalk need to be hooked up to the new unwinder?
Comment 9 Jeff Muizelaar [:jrmuizel] 2013-03-08 04:29:06 PST
(In reply to Nicholas Nethercote [:njn] from comment #8)
> (In reply to Jeff Muizelaar [:jrmuizel] from comment #7)
> > It should be possible to reuse the breakpad unwinding infrastructure that
> > we're using for profiling to fix this (bug 779291)
> 
> DMD just uses NS_StackWalk.  Does NS_StackWalk need to be hooked up to the
> new unwinder?

I would expect so.
Comment 10 Nicholas Nethercote [:njn] 2014-12-17 16:50:57 PST
kats, are both -fno-omit-frame-pointer and -funwind-tables being used? I ask because I just discovered that using --enable-profiling fixed the problems we had with stack unwinding on Mac opt builds.
Comment 11 Kartikaya Gupta (email:kats@mozilla.com) 2014-12-18 11:30:47 PST
The instructions on the wiki page at https://developer.mozilla.org/en-US/docs/Mozilla/Performance/DMD#Fennec_2 no longer seem sufficient to enable DMD. I built with --enable-dmd and started fennec with the right environment variables but didn't see the "DMD is enabled" output anywhere, and dumping memory reports doesn't dump any DMD reports. I'm not sure what changed, do I need to do something special to build the replace-malloc code in?
Comment 12 Nicholas Nethercote [:njn] 2014-12-18 14:06:23 PST
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #11)
> The instructions on the wiki page at
> https://developer.mozilla.org/en-US/docs/Mozilla/Performance/DMD#Fennec_2 no
> longer seem sufficient to enable DMD. I built with --enable-dmd and started
> fennec with the right environment variables but didn't see the "DMD is
> enabled" output anywhere, and dumping memory reports doesn't dump any DMD
> reports. I'm not sure what changed, do I need to do something special to
> build the replace-malloc code in?

glandium tweaked things a bit... setting $DMD to 1 is no longer necessary at startup; just setting MOZ_REPLACE_MALLOC_LIB to libdmd should suffice. So I don't know what's wrong. glandium, any ideas?
Comment 13 Mike Hommey [:glandium] 2014-12-18 23:18:28 PST
It could be a number of things, but without more details, I can't tell. Does logcat say something? Does it say something about libdmd if you run with MOZ_DEBUG_LINKER=1? Try to see if replace_init is called?
Comment 14 Kartikaya Gupta (email:kats@mozilla.com) 2014-12-22 11:25:56 PST
As far as I can tell replace_malloc.c is getting compiled but the init() function in that file is never getting run. I added __android_log_print calls there that never show up in the logcat.
Comment 15 Kartikaya Gupta (email:kats@mozilla.com) 2014-12-22 11:28:33 PST
Wait, that might not be right. I had linker debugging enabled as well, and it might just have flooded the logcat so that the print statement got dropped. When I disable linker debugging I see the log. Digging further...
Comment 16 Kartikaya Gupta (email:kats@mozilla.com) 2014-12-22 11:44:30 PST
Ah, what appears to be happening is that the MOZ_REPLACE_MALLOC_LIB env var is set at some point after the replace_malloc code is initialized. The env vars are set from Java in setupGeckoEnvironment at http://mxr.mozilla.org/mozilla-central/source/mobile/android/base/GeckoThread.java?rev=b4628cb58bb8#112 before any libraries are loaded from java, but I guess this is too late already.
Comment 17 Kartikaya Gupta (email:kats@mozilla.com) 2014-12-22 11:52:05 PST
Appears to be a catch-22. Calling putenv requires loading mozglue, but mozglue assumes that the environment variables are already in place when it is initialized. The only way to break this I can think of is to move the native putenv implementation (http://mxr.mozilla.org/mozilla-central/source/mozglue/android/nsGeckoUtils.cpp#17) into a separate library that gets loaded even before mozglue. sigh.
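The ordering problem described in comments 16 and 17 can be illustrated with a toy model. This is only a sketch: `ToyAllocatorHooks` is a hypothetical stand-in for a library that reads MOZ_REPLACE_MALLOC_LIB once during its own initialization, it is not mozglue's actual code, and the value "libdmd.so" is an assumed placeholder.

```python
import os

VAR = "MOZ_REPLACE_MALLOC_LIB"


class ToyAllocatorHooks:
    """Toy stand-in for a library that reads its configuration once, at load time."""

    def __init__(self):
        # Snapshot the variable during initialization; later changes are never seen.
        self.replace_lib = os.environ.get(VAR)


def load_then_set(lib_name):
    """Initialize first and set the variable afterwards (the failing order)."""
    os.environ.pop(VAR, None)
    hooks = ToyAllocatorHooks()
    os.environ[VAR] = lib_name
    return hooks.replace_lib


def set_then_load(lib_name):
    """Set the variable first and initialize afterwards (the working order)."""
    os.environ[VAR] = lib_name
    hooks = ToyAllocatorHooks()
    return hooks.replace_lib


print(load_then_set("libdmd.so"))  # None: the variable arrived too late
print(set_then_load("libdmd.so"))  # libdmd.so
```

In the toy model the fix is simply to reorder two lines. The catch-22 in the real case is that the code that sets the variable lives in the same library that needs it to be set already.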
Comment 18 Kartikaya Gupta (email:kats@mozilla.com) 2014-12-22 14:00:16 PST
The only other idea I had was to use the wrapper hook [1] that Android provides to set the environment variable before fennec even starts, but when I tried that it didn't seem to work. I'm not sure why; maybe it's just busted in the version of Android I have on my phone.

[1] see for example the latter half of https://staktrace.com/spout/entry.php?id=762 which describes how to use it in the context of using valgrind
Comment 19 Kartikaya Gupta (email:kats@mozilla.com) 2014-12-24 08:35:17 PST
Created attachment 8541265
DMD dump from recent fennec

Oh! Apparently I need to do the setprop stuff as root, not as a regular user. With that I can start Fennec with DMD. I don't have time at the moment, but I'll update the wiki instructions with this later (ni to myself so I don't forget).

I pulled a DMD log (this is just with the default DMD option; the only env var I set was MOZ_REPLACE_MALLOC_LIB) and am attaching it. It has some symbol information but the format of the file is different from what I remember so I'll need to figure out if it contains all the info we want (or njn, maybe you can tell just by looking at it).
Comment 20 Nicholas Nethercote [:njn] 2014-12-25 15:03:57 PST
Created attachment 8541521
Human-readable DMD output

Thanks, kats. The output file is now JSON and you pass it to $OBJDIR/dist/bin/dmd.py to get human-readable output. (The docs are now at https://developer.mozilla.org/en-US/docs/Mozilla/Performance/DMD.)
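For a quick look at a dump before running it through dmd.py, something like the following may help. It is a sketch that assumes only what the thread states: the dump is JSON, and it may be gzip-compressed (the attachment above is application/x-gzip). `load_dmd_dump` is a hypothetical helper and makes no assumptions about the JSON schema.

```python
import gzip
import json


def load_dmd_dump(path):
    """Load a DMD dump, assuming it is JSON that may be gzip-compressed."""
    # gzip files start with the two magic bytes 0x1f 0x8b.
    with open(path, "rb") as f:
        magic = f.read(2)
    opener = gzip.open if magic == b"\x1f\x8b" else open
    with opener(path, "rt", encoding="utf-8") as f:
        return json.load(f)
```

Calling `load_dmd_dump` with the path of a pulled dump returns the parsed JSON value. Interpreting it is still the job of dmd.py.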

The output looks so-so. Here's one example stack trace:

> #01: replace_malloc[libdmd.so +0x272a]
> #02: malloc[libmozglue.so +0x24c3c]

It's uselessly short, and there are no source locations -- it looks like it needs to be passed through fix_linux_stack.py or a similar script, but I don't know if such a thing exists for Fennec.

Here's a representative one:

> #01: replace_malloc[libdmd.so +0x272a]
> #02: malloc[libmozglue.so +0x24c3c]
> #03: Java_org_mozilla_gecko_mozglue_DirectBufferAllocator_nativeAllocateDirectBuffer[libmozglue.so +0x32446]
> #04: dvmPlatformInvoke[libdvm.so +0x1e294]
> #05: _Z16dvmCallJNIMethodPKjP6JValuePK6MethodP6Thread[libdvm.so +0x4d414]
> #06: ???[libdvm.so +0x276a4]
> #07: _Z12dvmInterpretP6ThreadPK6MethodP6JValue[libdvm.so +0x2b580]
> #08: _Z14dvmCallMethodVP6ThreadPK6MethodP6ObjectbP6JValueSt9__va_list[libdvm.so +0x5fc34]
> #09: _Z13dvmCallMethodP6ThreadPK6MethodP6ObjectP6JValuez[libdvm.so +0x5fc5e]
> #10: ???[libdvm.so +0x547da]
> #11: __thread_entry[libc.so +0xe3dc]
> #12: pthread_create[libc.so +0xdac8]

This one is long enough to be useful, but again no source locations, and the function names are mangled.

So, things are better than they were when the bug was opened and stack traces were entirely empty, but the stack traces still aren't all that useful.
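The frame lines quoted above have a regular shape (index, function name, library, hex offset), so they can be split into fields for further processing. This is a minimal sketch based only on the lines shown in this comment. `Frame` and `parse_frame` are hypothetical names, and other DMD output may use formats this pattern does not cover.

```python
import re
from typing import NamedTuple, Optional

# Matches lines such as "#02: malloc[libmozglue.so +0x24c3c]",
# with or without a leading "> " quote marker.
FRAME_RE = re.compile(r"#(\d+): (.+)\[(\S+) \+(0x[0-9a-fA-F]+)\]\s*$")


class Frame(NamedTuple):
    index: int
    function: str  # may be mangled, or "???" when the name is unknown
    library: str
    offset: int    # offset within the library, as printed in the trace


def parse_frame(line: str) -> Optional[Frame]:
    """Parse one stack-frame line; return None if the line is not a frame."""
    match = FRAME_RE.search(line)
    if match is None:
        return None
    index, function, library, offset = match.groups()
    return Frame(int(index), function, library, int(offset, 16))


frame = parse_frame("> #02: malloc[libmozglue.so +0x24c3c]")
print(frame.library, hex(frame.offset))
```

Grouping the parsed frames by library gives the list of offsets that a symbolication tool would need for each .so file.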
Comment 21 Mike Hommey [:glandium] 2014-12-25 15:11:24 PST
fix_linux_stack.py should work, as long as it can feed the right .so files to addr2line, and as long as the addr2line in the PATH understands arm.
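To show what "feeding the right .so files to addr2line" might look like, here is a sketch that only builds the command line and does not run it. It is not how fix_linux_stack.py is implemented. `addr2line_command` is a hypothetical helper, the directory is a made-up placeholder, and it assumes the offsets printed in the trace can be passed to addr2line unchanged. The `-C`, `-f` and `-e` options are standard binutils addr2line options (demangle, show function names, select the object file).

```python
import os


def addr2line_command(library_dir, library, offsets, tool="addr2line"):
    """Build an addr2line invocation that resolves several offsets in one library.

    `tool` may need to name an ARM-aware addr2line for Android libraries;
    which binary that is depends on the local toolchain (an assumption here).
    """
    library_path = os.path.join(library_dir, library)
    return [tool, "-C", "-f", "-e", library_path] + [hex(o) for o in offsets]


# Placeholder directory; the offsets are taken from the traces quoted in comment 20.
cmd = addr2line_command("/path/to/libs", "libmozglue.so", [0x24C3C, 0x32446])
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run` once the library directory and tool name match the local setup.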
Comment 22 Kartikaya Gupta (email:kats@mozilla.com) 2014-12-29 06:50:29 PST
Running dmd.py from my $OBJDIR/dist/bin seemed to work in that it found the libraries and symbolicated stuff. It couldn't do things from libdvm.so and other android libraries, but it did libxul and libmozglue just fine. I've updated the instructions at https://developer.mozilla.org/en-US/docs/Mozilla/Performance/DMD to describe the Android setup. I think that means there's nothing left to do in this bug.
Comment 23 Nicholas Nethercote [:njn] 2014-12-29 12:13:40 PST
Thanks, kats!
