Closed Bug 823354 Opened 9 years ago Closed 7 years ago
DMD reports on Fennec don't have stack traces
I built and ran DMD on fennec using the instructions on the wiki page at https://wiki.mozilla.org/Performance/MemShrink/DMD, but the file that was dumped out looked like the one at http://people.mozilla.com/~kgupta/bug/769761-dmd.txt (that is, no stack traces).
9 years ago
Interesting. Can you check whether you're building with -funwind-tables? That should be turned on as I read configure.in, but maybe it's not for some reason. (Just do a clean build and look at one of the invocations of g++.)
Yup, -funwind-tables is being passed to g++
If I was looking into this, I'd use gdb to look at what happens inside NS_StackWalk and then possibly step into _Unwind_Backtrace, assuming that's called.
Try adding --es env2 MOZ_LINKER_EXTRACT=1
I still get the same results with that.
Assignee: nobody → bugmail.mozilla
Whiteboard: [MemShrink] → [MemShrink:P2]
It should be possible to reuse the breakpad unwinding infrastructure that we're using for profiling to fix this (bug 779291)
(In reply to Jeff Muizelaar [:jrmuizel] from comment #7)
> It should be possible to reuse the breakpad unwinding infrastructure that
> we're using for profiling to fix this (bug 779291)

DMD just uses NS_StackWalk. Does NS_StackWalk need to be hooked up to the new unwinder?
(In reply to Nicholas Nethercote [:njn] from comment #8)
> (In reply to Jeff Muizelaar [:jrmuizel] from comment #7)
> > It should be possible to reuse the breakpad unwinding infrastructure that
> > we're using for profiling to fix this (bug 779291)
>
> DMD just uses NS_StackWalk. Does NS_StackWalk need to be hooked up to the
> new unwinder?

I would expect so.
8 years ago
7 years ago
Assignee: bugmail.mozilla → nobody
kats, are both -fno-omit-frame-pointer and -funwind-tables being used? I ask because I just discovered that using --enable-profiling fixed the problems we had with stack unwinding on Mac opt builds.
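The flag check asked about in this thread (looking at a g++ invocation from a clean build) can be scripted. The following is a minimal sketch, assuming one compiler invocation has been copied out of the build log as a string; the function name `missing_unwind_flags` and the sample command line are hypothetical and not taken from the Mozilla build system.

```python
import shlex

# The two flags this thread names as relevant to stack unwinding.
REQUIRED_FLAGS = ("-funwind-tables", "-fno-omit-frame-pointer")


def missing_unwind_flags(command_line):
    """Return the required flags that do not appear in one compiler invocation."""
    args = shlex.split(command_line)
    return [flag for flag in REQUIRED_FLAGS if flag not in args]


# Hypothetical g++ invocation, standing in for a line copied from a build log.
sample = "g++ -o nsStackWalk.o -c -funwind-tables -O2 nsStackWalk.cpp"
print(missing_unwind_flags(sample))
```

An empty list means both flags were passed. The sketch only checks for presence; it does not account for a later flag on the same command line overriding an earlier one.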
The instructions on the wiki page at https://developer.mozilla.org/en-US/docs/Mozilla/Performance/DMD#Fennec_2 no longer seem sufficient to enable DMD. I built with --enable-dmd and started fennec with the right environment variables but didn't see the "DMD is enabled" output anywhere, and dumping memory reports doesn't dump any DMD reports. I'm not sure what changed, do I need to do something special to build the replace-malloc code in?
(In reply to Kartikaya Gupta (email:firstname.lastname@example.org) from comment #11)
> The instructions on the wiki page at
> https://developer.mozilla.org/en-US/docs/Mozilla/Performance/DMD#Fennec_2 no
> longer seem sufficient to enable DMD. I built with --enable-dmd and started
> fennec with the right environment variables but didn't see the "DMD is
> enabled" output anywhere, and dumping memory reports doesn't dump any DMD
> reports. I'm not sure what changed, do I need to do something special to
> build the replace-malloc code in?

glandium tweaked things a bit... setting $DMD to 1 is no longer necessary at startup; just setting MOZ_REPLACE_MALLOC_LIB to libdmd should suffice. So I don't know what's wrong. glandium, any ideas?
It could be a number of things, but without more details, I can't tell. Does logcat say something? Does it say something about libdmd if you run with MOZ_DEBUG_LINKER=1? Try to see if replace_init is called?
As far as I can tell replace_malloc.c is getting compiled but the init() function in that file is never getting run. I added __android_log_print calls there that never show up in the logcat.
Wait that might not be right. I had linker debugging enabled as well and it might just have flooded the logcat so the print statement got dropped. When I disable linker debugging I see the log. Digging further...
Ah, what appears to be happening is that the MOZ_REPLACE_MALLOC_LIB env var is set at some point after the replace_malloc code is initialized. The env vars are set from Java in setupGeckoEnvironment at http://mxr.mozilla.org/mozilla-central/source/mobile/android/base/GeckoThread.java?rev=b4628cb58bb8#112 before any libraries are loaded from Java, but I guess this is already too late.
Appears to be a catch-22. Calling putenv requires loading mozglue, but mozglue assumes that the environment variables are already in place when it is initialized. The only way to break this I can think of is to move the native putenv implementation (http://mxr.mozilla.org/mozilla-central/source/mozglue/android/nsGeckoUtils.cpp#17) into a separate library that gets loaded even before mozglue. sigh.
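The ordering problem described above can be illustrated outside of Fennec. This is a toy sketch, not mozglue code: the class `ReplaceMallocSim` is a hypothetical stand-in for a library that reads MOZ_REPLACE_MALLOC_LIB once during initialization, and the value "libdmd.so" is only an example.

```python
import os


class ReplaceMallocSim:
    """Toy stand-in for a library that reads its configuration once, at load time."""

    def __init__(self):
        # Snapshot the variable during initialization; later changes are never seen.
        self.replace_lib = os.environ.get("MOZ_REPLACE_MALLOC_LIB")


# Case 1: the library initializes first and the variable is set afterwards.
os.environ.pop("MOZ_REPLACE_MALLOC_LIB", None)
loaded_first = ReplaceMallocSim()
os.environ["MOZ_REPLACE_MALLOC_LIB"] = "libdmd.so"
seen_when_set_after_load = loaded_first.replace_lib

# Case 2: the variable is already in the environment when the library initializes.
loaded_second = ReplaceMallocSim()
seen_when_set_before_load = loaded_second.replace_lib

print(seen_when_set_after_load, seen_when_set_before_load)
```

In the first case the library never sees the variable, which matches the symptom in this bug. Any fix has to get the variable into the process environment before that initialization runs.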
The only other idea I had was to use the wrapper hook that Android provides to set the environment variable before even starting up Fennec, but when I tried that it didn't seem to work. Not sure why; maybe it's just busted in the version of Android I have on my phone. See, for example, the latter half of https://staktrace.com/spout/entry.php?id=762, which describes how to use it in the context of running valgrind.
Oh! Apparently I need to do the setprop stuff as root, not as a regular user. With that I can start Fennec with DMD. I don't have time at the moment but I'll update the wiki instructions with this later (ni to myself so I don't forget). I pulled a DMD log (this is just with the default DMD option; the only env var I set was MOZ_REPLACE_MALLOC_LIB) and am attaching it. It has some symbol information but the format of the file is different from what I remember, so I'll need to figure out if it contains all the info we want (or njn, maybe you can tell just by looking at it).
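For readers unfamiliar with the wrapper hook, the following is a rough sketch of what the setup might look like. It is not the exact procedure used in this bug. Assumptions: the hook is Android's `wrap.<package>` system property, the device's `su` accepts `-c`, and the package name, library path, and script path shown are all hypothetical. The helpers only build the script text and the adb argument list; they do not run anything.

```python
def wrapper_script(lib_path):
    """Shell script text that exports the variable, then runs the wrapped command."""
    return (
        "#!/system/bin/sh\n"
        f"export MOZ_REPLACE_MALLOC_LIB={lib_path}\n"
        'exec "$@"\n'
    )


def setprop_command(package, script_path):
    """adb argument list that sets wrap.<package> as root (assumes `su -c` works)."""
    setprop = f"setprop wrap.{package} {script_path}"
    # The inner quotes keep the setprop command together for the device shell.
    return ["adb", "shell", "su", "-c", f"'{setprop}'"]


# Hypothetical package name and device paths, for illustration only.
script = wrapper_script("/data/local/tmp/libdmd.so")
command = setprop_command("org.mozilla.fennec_user", "/data/local/tmp/dmd_wrap.sh")
print(script)
print(" ".join(command))
```

The script would need to be pushed to the device and made executable before the property is set. The point made in this comment still applies: the setprop step has to run as root.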
Thanks, kats. The output file is now JSON and you pass it to $OBJDIR/dist/bin/dmd.py to get human-readable output. (The docs are now at https://developer.mozilla.org/en-US/docs/Mozilla/Performance/DMD.)

The output looks so-so. Here's one example stack trace:

> #01: replace_malloc[libdmd.so +0x272a]
> #02: malloc[libmozglue.so +0x24c3c]

It's uselessly short, and there are no source locations -- it looks like it needs to be passed through fix_linux_stack.py or a similar script, but I don't know if such a thing exists for Fennec.

Here's a representative one:

> #01: replace_malloc[libdmd.so +0x272a]
> #02: malloc[libmozglue.so +0x24c3c]
> #03: Java_org_mozilla_gecko_mozglue_DirectBufferAllocator_nativeAllocateDirectBuffer[libmozglue.so +0x32446]
> #04: dvmPlatformInvoke[libdvm.so +0x1e294]
> #05: _Z16dvmCallJNIMethodPKjP6JValuePK6MethodP6Thread[libdvm.so +0x4d414]
> #06: ???[libdvm.so +0x276a4]
> #07: _Z12dvmInterpretP6ThreadPK6MethodP6JValue[libdvm.so +0x2b580]
> #08: _Z14dvmCallMethodVP6ThreadPK6MethodP6ObjectbP6JValueSt9__va_list[libdvm.so +0x5fc34]
> #09: _Z13dvmCallMethodP6ThreadPK6MethodP6ObjectP6JValuez[libdvm.so +0x5fc5e]
> #10: ???[libdvm.so +0x547da]
> #11: __thread_entry[libc.so +0xe3dc]
> #12: pthread_create[libc.so +0xdac8]

This one is long enough to be useful, but again no source locations, and the function names are mangled.

So, things are better than they were when the bug was opened and stack traces were entirely empty, but the stack traces still aren't all that useful.
fix_linux_stack.py should work, as long as it can feed the right .so files to addr2line, and as long as the addr2line in the PATH understands arm.
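To make the post-processing step concrete, here is a minimal sketch of how a frame line in the format quoted in this bug could be split into a library and an offset and turned into an addr2line command line. This is not how fix_linux_stack.py is implemented. The function names, the `lib_dir` layout, and the assumption that the printed offset can be handed to addr2line unchanged are all hypothetical; `-C`, `-f`, and `-e` are standard addr2line options.

```python
import re

# Matches lines such as "#02: malloc[libmozglue.so +0x24c3c]".
FRAME_RE = re.compile(r"#(\d+): (.+?)\[(\S+) \+(0x[0-9a-fA-F]+)\]")


def parse_frame(line):
    """Split one stack frame line into its parts, or return None if it doesn't match."""
    match = FRAME_RE.search(line)
    if match is None:
        return None
    index, function, library, offset = match.groups()
    return {
        "index": int(index),
        "function": function,
        "library": library,
        "offset": int(offset, 16),
    }


def addr2line_command(frame, lib_dir, tool="addr2line"):
    """Build (but do not run) an addr2line argument list for one frame.

    -C demangles C++ names, -f prints function names, -e selects the binary.
    `tool` should name an addr2line that understands ARM.
    """
    binary = f"{lib_dir}/{frame['library']}"
    return [tool, "-C", "-f", "-e", binary, hex(frame["offset"])]


frame = parse_frame("> #02: malloc[libmozglue.so +0x24c3c]")
print(addr2line_command(frame, "objdir/dist/bin"))
```

Frames from system libraries such as libdvm.so would also need copies of those libraries pulled from the device before they could be resolved.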
Running dmd.py from my $OBJDIR/dist/bin seemed to work in that it found the libraries and symbolicated stuff. It couldn't do things from libdvm.so and other android libraries, but it did libxul and libmozglue just fine. I've updated the instructions at https://developer.mozilla.org/en-US/docs/Mozilla/Performance/DMD to describe the Android setup. I think that means there's nothing left to do in this bug.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED