Closed
Bug 326594
Opened 19 years ago
Closed 17 years ago
Consider using backtrace() for Unix stack traces
Categories
(Core :: XPCOM, defect)
Tracking
()
RESOLVED
FIXED
mozilla1.9beta2
People
(Reporter: dbaron, Assigned: karlt)
References
Details
Attachments
(2 files, 3 obsolete files)
11.26 KB,
patch
|
benjamin
:
review+
mtschrep
:
approval1.9+
|
Details | Diff | Splinter Review |
985 bytes,
patch
|
benjamin
:
review+
mtschrep
:
approval1.9+
|
Details | Diff | Splinter Review |
There's some work in bug 323853 for using backtrace() for Unix stack traces. It needs some further work (to handle unlimited stack depth), and perhaps some work on getting the tools we have that deal with stack traces to deal with its output.
Reporter | ||
Updated•19 years ago
|
Assignee: dbaron → nobody
Assignee | ||
Comment 1•17 years ago
|
||
Using the symbols in gcc's unwind.h directly instead of through glibc's backtrace would avoid the need for a buffer (of stack depth).
Assignee | ||
Comment 2•17 years ago
|
||
With the current logic in nsStackWalk I get SIGSEGV with *bp, when this walks of the top/base/root of the stack (even though the stack is not corrupted):
http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/xpcom/base/nsStackWalk.cpp&rev=1.10&mark=1168#1154
(gdb) p bp
$15 = (void **) 0x6e09004f4e5f6f6e
(gdb) p *info
$17 = {buffer = 0x523d00, size = 22, entries = 3}
(gdb) p *info->buffer@3
$28 = {0x2adc14856c1e, 0x2adc168ab84a, 0x4f53492e4f4e5f62}
(gdb) bt
#0 0x00002adc1546ad20 in NS_StackWalk (
aCallback=0x2adc14856644 <stack_callback>, aSkipFrames=1,
aClosure=0x2adc14969b88)
at /home/karl/moz/mozilla/xpcom/base/nsStackWalk.cpp:1168
#1 0x00002adc148566e9 in backtrace (t=0x2adc14969b80, skip=1)
at /home/karl/moz/mozilla/tools/trace-malloc/lib/nsTraceMalloc.c:914
#2 0x00002adc14856c1e in calloc (count=1, size=296)
at /home/karl/moz/mozilla/tools/trace-malloc/lib/nsTraceMalloc.c:1108
#3 0x00002adc168ab84a in g_malloc0 () from /usr/lib/libglib-2.0.so.0
#4 0x00002adc16899915 in g_hash_table_resize ()
from /usr/lib/libglib-2.0.so.0
#5 0x00002adc168c8416 in g_get_language_names ()
from /usr/lib/libglib-2.0.so.0
#6 0x00002adc147134b9 in XRE_main (argc=3, argv=0x7fff966152e8,
aAppData=0x5b1200)
at /home/karl/moz/mozilla/toolkit/xre/nsAppRunner.cpp:2675
#7 0x0000000000401249 in main (argc=7, argv=0x7fff966152e8)
at /home/karl/moz/mozilla/browser/app/nsBrowserApp.cpp:153
(gdb) info f
Stack level 0, frame at 0x7fff966147c0:
rip = 0x2adc1546ad20 in NS_StackWalk
(/home/karl/moz/mozilla/xpcom/base/nsStackWalk.cpp:1168);
saved rip 0x2adc148566e9
called by frame at 0x7fff96614810
source language c++.
Arglist at 0x7fff966147b0, args: aCallback=0x2adc14856644 <stack_callback>,
aSkipFrames=1, aClosure=0x2adc14969b88
Locals at 0x7fff966147b0, Previous frame's sp is 0x7fff966147c0
Saved registers:
rbp at 0x7fff966147b0, rip at 0x7fff966147b8
(gdb) p *(void**)0x7fff966147b0
$30 = (void *) 0x7fff96614800
(gdb) p *((void**)0x7fff966147b0 + 1)
$31 = (void *) 0x2adc148566e9
(gdb) p *(void**)0x7fff96614800
$32 = (void *) 0x7fff96614870
(gdb) p *(void**)0x7fff96614870
$33 = (void *) 0x7fff966148f0
(gdb) p *(void**)0x7fff966148f0
$34 = (void *) 0x6e09004f4e5f6f6e
(gdb) p /x (void*)__libc_stack_end
$37 = 0xffffffff966152e0
The algorithm seems to work for most frames but not all.
Not sure if this is specific to x86_64 or gcc-4.1.2 or compilation flags for libglib, with would have been something like
"-O2 -pipe -march=nocona -frename-registers -fweb -funswitch-loops -momit-leaf-frame-pointer".
Assignee | ||
Comment 3•17 years ago
|
||
This solves the problem above for me.
We just need to decide which other platforms should use this method.
As gcc compiled the code, it seems most reliable to ask gcc how the stack should be unwound.
% readelf -s /lib/libgcc_s.so.1| grep Unwind_Backtrace
31: 0000000000007eb5 200 FUNC GLOBAL DEFAULT 11 _Unwind_Backtrace@@GCC_3.3
So I guess we can do this with gcc version >= 3.3.
Assignee: nobody → mozbugz
Status: NEW → ASSIGNED
Assignee | ||
Comment 4•17 years ago
|
||
OK, on __i386, _Unwind_Backtrace is only unwinding one frame. Will look at that.
Assignee | ||
Comment 5•17 years ago
|
||
_Unwind_Backtrace is only effective if the elf file contains unwind
information, the presence of which can be recognised by the .eh_frame_hdr
section (readelf -S).
Assignee | ||
Comment 6•17 years ago
|
||
On i386, this unwind information is included when compiling with -fexceptions
or -funwind-tables, and g++ (4.1.2) turns on -fexceptions by default for C++,
but we explicitly disable with -fno-exceptions.
glibc's i386 |backtrace| function first uses _Unwind_Backtrace to get as far
as it can, then continues with the saved-ebp-register-is-a-frame-pointer
method that we use (plus a couple more safety checks).
According to comments in glibc's i386 backtrace, the assumptions of the
saved-frame-pointer method are valid "with every stack frame if not compiled
without frame pointer" (-fomit-frame-pointer).
On i386, _Unwind_Backtrace will only be helpful for functions in external
libraries compiled from C++ or from C with -funwind-tables/exceptions. gcc
uses -fexceptions when building libgcc_s, and glibc uses -fexceptions for
source files that call user provided callbacks, so libc libm and libpthread
contain unwind info for at least some functions.
However glibc strongly recommends against using -fomit-frame-pointer, so our
saved-frame-pointer method should work fine there, and probably in most other
libraries.
If there are distributions using -fomit-frame-pointer then using
google-breakpad's StackwalkerX86 is likely more versatile than glibc's
combination of _Unwind_Backtrace and the saved-frame-pointer linking.
The main problem with our saved-frame-pointer method on i386 is that it
doesn't notice when its assumptions are invalid, resulting in a SIGSEGV.
google-breakpad's method looks like it safely tests memory before accessing
it. We could set up a SIGSEGV handler or use a syscall like mincore or write
(thanks, roc) to detect whether it is safe to access the memory.
Assignee | ||
Comment 7•17 years ago
|
||
On amd64, even gcc -fno-unwind-tables -fno-exceptions still generates a
.eh_frame_hdr section and _Unwind_Backtrace works fine even for the simplest
of C programs.
glibc's x86_64 backtrace only uses _Unwind_Backtrace, and that is the method I
think we should use on this platform.
On amd64, the saved-frame-pointer linkage assumptions are not safe. It seems
that, even when rbp is a frame pointer and saved, it is not necessarily saved
at the start of the frame, so our current method fails to find the register.
(gdb) info f
Stack level 0, frame at 0x7fff23ff8170:
rip = 0x2b4a87a89de6 in NS_StackWalk
(/home/karl/moz/mozilla/xpcom/base/nsStackWalk.cpp:1226);
saved rip 0x2b4a86e756e9
called by frame at 0x7fff23ff81c0
source language c++.
Arglist at 0x7fff23ff8160, args: aCallback=0x2b4a86e75644 <stack_callback>,
aSkipFrames=1, aClosure=0x2b4a86f88b88
Locals at 0x7fff23ff8160, Previous frame's sp is 0x7fff23ff8170
Saved registers:
rbp at 0x7fff23ff8160, rip at 0x7fff23ff8168
(gdb) p *(void**)0x7fff23ff8160
$4 = (void *) 0x7fff23ff81b0
(gdb) info f 1
Stack frame at 0x7fff23ff81c0:
rip = 0x2b4a86e756e9 in backtrace
(/home/karl/moz/mozilla/tools/trace-malloc/lib/nsTraceMalloc.c:914);
saved rip 0x2b4a86e75c1e
called by frame at 0x7fff23ff8230, caller of frame at 0x7fff23ff8170
source language c.
Arglist at 0x7fff23ff81b0, args: t=0x2b4a86f88b80, skip=1
Locals at 0x7fff23ff81b0, Previous frame's sp is 0x7fff23ff81c0
Saved registers:
rbp at 0x7fff23ff81b0, rip at 0x7fff23ff81b8
(gdb) p *(void**)0x7fff23ff81b0
$5 = (void *) 0x7fff23ff8220
(gdb) info f 2
Stack frame at 0x7fff23ff8230:
rip = 0x2b4a86e75c1e in calloc
(/home/karl/moz/mozilla/tools/trace-malloc/lib/nsTraceMalloc.c:1108);
saved rip 0x2b4a88eca84a
called by frame at 0x7fff23ff8250, caller of frame at 0x7fff23ff81c0
source language c.
Arglist at 0x7fff23ff8220, args: count=1, size=296
Locals at 0x7fff23ff8220, Previous frame's sp is 0x7fff23ff8230
Saved registers:
rbp at 0x7fff23ff8220, rip at 0x7fff23ff8228
(gdb) p *(void**)0x7fff23ff8220
$6 = (void *) 0x7fff23ff82a0
(gdb) info f 3
Stack frame at 0x7fff23ff8250:
rip = 0x2b4a88eca84a in g_malloc0; saved rip 0x2b4a88eb8915
called by frame at 0x7fff23ff8290, caller of frame at 0x7fff23ff8230
Arglist at 0x7fff23ff8230, args:
Locals at 0x7fff23ff8230, Previous frame's sp is 0x7fff23ff8250
Saved registers:
rbx at 0x7fff23ff8238, rbp at 0x7fff23ff8240, rip at 0x7fff23ff8248
(gdb) p *(void**)0x7fff23ff82a0
$7 = (void *) 0x6e09004f4e5f6f6e
(gdb) p *(void**)0x7fff23ff8240
$8 = (void *) 0x7fff23ff82a0
(gdb) info f 4
Stack frame at 0x7fff23ff8290:
rip = 0x2b4a88eb8915 in g_hash_table_resize; saved rip 0x2b4a88ee7416
called by frame at 0x7fff23ff83e0, caller of frame at 0x7fff23ff8250
Arglist at 0x7fff23ff8248, args:
Locals at 0x7fff23ff8248, Previous frame's sp is 0x7fff23ff8290
Saved registers:
rbx at 0x7fff23ff8258, rbp at 0x7fff23ff8260, r12 at 0x7fff23ff8268,
r13 at 0x7fff23ff8270, r14 at 0x7fff23ff8278, r15 at 0x7fff23ff8280,
rip at 0x7fff23ff8288
(gdb) p *(void**)0x7fff23ff8260
$11 = (void *) 0x7fff23ff82a0
(gdb) info f 5
Stack frame at 0x7fff23ff83e0:
rip = 0x2b4a88ee7416 in g_get_language_names; saved rip 0x2b4a86d32601
called by frame at 0x7fff23ff8b30, caller of frame at 0x7fff23ff8290
Arglist at 0x7fff23ff8288, args:
Locals at 0x7fff23ff8288, Previous frame's sp is 0x7fff23ff83e0
Saved registers:
rbx at 0x7fff23ff83a8, rbp at 0x7fff23ff83b0, r12 at 0x7fff23ff83b8,
r13 at 0x7fff23ff83c0, r14 at 0x7fff23ff83c8, r15 at 0x7fff23ff83d0,
rip at 0x7fff23ff83d8
(gdb) p *(void**)0x7fff23ff83b0
$12 = (void *) 0x7fff23ff8b20
(gdb) info f 6
Stack frame at 0x7fff23ff8b30:
rip = 0x2b4a86d32601 in XRE_main
(/home/karl/moz/mozilla/toolkit/xre/nsAppRunner.cpp:2675);
saved rip 0x401249
called by frame at 0x7fff23ff8bf0, caller of frame at 0x7fff23ff83e0
source language c++.
Arglist at 0x7fff23ff8b20, args: argc=3, argv=0x7fff23ff8cc8,
aAppData=0x5b1200
Locals at 0x7fff23ff8b20, Previous frame's sp is 0x7fff23ff8b30
Saved registers:
rbx at 0x7fff23ff8b00, rbp at 0x7fff23ff8b20, r12 at 0x7fff23ff8b08,
r13 at 0x7fff23ff8b10, r14 at 0x7fff23ff8b18, rip at 0x7fff23ff8b28
Assignee | ||
Comment 8•17 years ago
|
||
The ppc unwind code should be safe as the ABI requires it on linux at least.
The easy safety checks on bp for i386 are borrowed from glibc's i386 backtrace.
_Unwind_Backtrace will also be useful on ia64 (if anyone uses that), and this patch tries it out on other platforms that have the function. (It can't be worse than nothing.)
I also intend to swap the sun #ifdef/#else block to tidy things up but it's easier to see the changes like this.
Attachment #279157 -
Attachment is obsolete: true
Attachment #280850 -
Flags: review?(dbaron)
Assignee | ||
Comment 9•17 years ago
|
||
BTW, regarding __libc_stack_end:
Although it is not defined in any header file, it is not a GLIBC_PRIVATE symbol and glibc seems to expect external use. glibc-2.5/elf/Versions has:
GLIBC_2.1 {
# functions used in other libraries
_dl_mcount;
# historically used by Garbage Collectors
__libc_stack_end;
}
Reporter | ||
Comment 10•17 years ago
|
||
What OS+CPU combinations have you tested this on, and what testing did you do?
Assignee | ||
Comment 11•17 years ago
|
||
I've only tested this on Intel/Linux:
amd64 was Gentoo.
i386 was Ubuntu 7.04.
(Assuming I have the #if conditions right:)
* The only ppc change is the "long(next) & 3" which should be unnecessary but
provides half a safety check for corrupted stacks.
* The ppc64 change was from looking at glibc code but is not enabled.
* The other differences are in gcc platforms that previously returned
NS_ERROR_NOT_IMPLEMENTED but now give _Unwind_Backtrace a try.
Testing involved running with
"--trace-malloc malloc.log --shutdown-leaks sdleak.log"
and flicking through the output from uncategorized.pl to check that it looked reasonable.
Assignee | ||
Comment 12•17 years ago
|
||
The line counts in the last hunk of the patch are not right (emacs diff-mode didn't do all that I hoped), so the patch does not apply correctly.
I'll wait to see if there are further comments before updating the patch.
Assignee | ||
Comment 13•17 years ago
|
||
There is an issue with _Unwind_Backtrace that I just came across today:
If the pc jumps (function call) to dynamic library that has been unloaded,
then _Unwind_Backtrace fails with SIGSEGV.
The old code (happened to?) notice that the stack was invalid and so the walk was aborted gracefully in this case.
It still seems that _Unwind_Backtrace is successful a much greater proportion of the time.
Assignee | ||
Comment 14•17 years ago
|
||
This is the patch that I propose we land.
The changes are that the line counts are correct and the __sun-specific code is now first with the generic gcc code later, which makes the #if statements a little cleaner.
Attachment #280850 -
Attachment is obsolete: true
Attachment #284250 -
Flags: review?(dbaron)
Attachment #280850 -
Flags: review?(dbaron)
Reporter | ||
Comment 15•17 years ago
|
||
Comment on attachment 284250 [details] [diff] [review]
safety checks and libgcc_s v3
>Index: xpcom/base/nsStackWalk.cpp
>-/* vim: set shiftwidth=4 tabstop=8 autoindent cindent expandtab: */
>+/* -*- Mode: C; tab-width: 8; indent-tabs-mode: nil; c-basic-offset: 4 -*-
>+ * vim: set shiftwidth=4 tabstop=8 autoindent cindent expandtab: */
Mode: C++
In your changes to the old stack walking code:
>+#if (defined(__ppc__) && defined(XP_MACOSX)) || defined(__ppc64__)
What led you to add the __ppc64__ test? Do you know that it's correct?
>+ void *pc = *(bp+2);
>+#else // i386 or ppc32 linux
And do you know that this path is correct for ppc32 Linux?
>+ void *pc = *(bp+1);
>+#endif
(I'll comment on the new code shortly; I don't trust editing text here for too long...)
Reporter | ||
Comment 16•17 years ago
|
||
Comment on attachment 284250 [details] [diff] [review]
safety checks and libgcc_s v3
r+sr=dbaron with the comments above addressed. My main concern about the comments is that I'd rather say nothing than have comments asserting something that's not actually known or tested to be true -- I didn't see any mention that you'd tested on those systems, so I'm a little surprised to see the comments. (But I missed the above mention of glibc code and ppc64, so that one's ok.)
I'm also a little skeptical about the sanity-checking code in the old Linux NS_StackWalk -- if the stack tracing doesn't work, we may as well crash -- in situations where it doesn't work, people shouldn't be using it. That said, I'm ok with it.
Attachment #284250 -
Flags: superreview+
Attachment #284250 -
Flags: review?(dbaron)
Attachment #284250 -
Flags: review+
Assignee | ||
Comment 17•17 years ago
|
||
In reply to comment #15)
> Mode: C++
Done.
>
> In your changes to the old stack walking code:
> >+#if (defined(__ppc__) && defined(XP_MACOSX)) || defined(__ppc64__)
>
> What led you to add the __ppc64__ test? Do you know that it's correct?
>
> >+ void *pc = *(bp+2);
> >+#else // i386 or ppc32 linux
>
> And do you know that this path is correct for ppc32 Linux?
>
> >+ void *pc = *(bp+1);
> >+#endif
glibc-2.5/sysdeps/powerpc/powerpc32/backtrace.c has:
for ( count = 0;
current != NULL && count < size;
current = BOUNDED_1 (current->next), count++)
array[count] = current->return_address;
with
struct layout
{
struct layout *__unbounded next;
void *__unbounded return_address;
};
and includes the comment "Note that every routine is required by the ABI to lay out the stack like this".
I haven't tested on powerpc but this convinced me that our code was correct for powerpc linux.
The powerpc64 version differs only in:
struct layout
{
struct layout *__unbounded next;
long condition_register;
void *__unbounded return_address;
};
I also made the following changes:
-#if (defined(__ppc__) && defined(XP_MACOSX)) || defined(__ppc64__)
+#if (defined(__ppc__) && defined(XP_MACOSX)) || defined(__powerpc64__)
+ // ppc mac or powerpc64 linux
void *pc = *(bp+2);
-#else // i386 or ppc32 linux
+#else // i386 or powerpc32 linux
void *pc = *(bp+1);
#endif
__powerpc64__ is the preprocessor symbol to use with gcc/Linux.
I don't know how Darwin 64-bit stacks are laid out so I've added the comment to indicate that we only know that path to be correct for "ppc mac or powerpc64 linux".
Assignee | ||
Comment 18•17 years ago
|
||
Comment on attachment 285565 [details] [diff] [review]
safety checks and libgcc_s v4
Benjamin, should I get review from you on this too?
Attachment #285565 -
Flags: review?(benjamin)
Reporter | ||
Comment 19•17 years ago
|
||
comment 17 sounds good.
Comment 20•17 years ago
|
||
Comment on attachment 285565 [details] [diff] [review]
safety checks and libgcc_s v4
Was removing the x86-64 ifdef conditions intentional?
Assignee | ||
Comment 21•17 years ago
|
||
(In reply to comment #20)
> Was removing the x86-64 ifdef conditions intentional?
Yes, the saved-frame-pointer linkage assumptions in the old method are not safe
for x86_64 (comment 7 goes into more detail), and glibc's x86_64 backtrace just uses _Unwind_Backtrace.
Removing the x86_64 ifdef conditions means we use _Unwind_Backtrace also (and also removes "__asm__( "movq %%rbp, %0" : "=g"(bp));" which we no longer use).
HAVE_DLADDR && HAVE__UNWIND_BACKTRACE will be true when defined(linux) && defined(__GNUC__) && defined(__x86_64__).
Updated•17 years ago
|
Attachment #285565 -
Flags: review?(benjamin) → review+
Reporter | ||
Comment 22•17 years ago
|
||
<bsmedberg> dbaron: is there a point in reviewing/taking bug 326594 (use backtrace)?
<dbaron> bsmedberg, I don't think it needs your review separately unless you really want to; I think we should take it; it should help a *lot* on x86_64 Linux and some on i386 Linux.
Assignee | ||
Comment 23•17 years ago
|
||
Comment on attachment 285565 [details] [diff] [review]
safety checks and libgcc_s v4
Requesting aproval1.9 as this enables memory leak analysis and means that our crash handling doesn't crash (any where near as often) on x86_64. Any risk should be confined to debug or memory test builds.
Attachment #285565 -
Flags: approval1.9?
Updated•17 years ago
|
Attachment #285565 -
Flags: approval1.9? → approval1.9+
Assignee | ||
Comment 24•17 years ago
|
||
1.1887 mozilla/configure.in
1.11 mozilla/xpcom/base/nsStackWalk.cpp
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Assignee | ||
Updated•17 years ago
|
Target Milestone: --- → mozilla1.9 M10
Comment 25•17 years ago
|
||
This breaks building on FreeBSD which has no <unwind.h>, but does have an __Unwind_backtrace(), presumably because it uses gcc 4.2. Please add a configure check for the header too. This is on FreeBSD-CURRENT-i386.
Assignee | ||
Comment 26•17 years ago
|
||
(In reply to comment #25)
gcc-4.2.2/gcc/Makefile.in looks to me like it is installs unwind.h into
$(libdir)/gcc/$(target_noncanonical)/$(version)/include
What are the contents of that directory on FreeBSD?
If unwind.h is not there, then can you check whether this patch fixes the issue on FreeBSD, please?
Attachment #284250 -
Attachment is obsolete: true
Comment 27•17 years ago
|
||
(In reply to comment #26)
> gcc-4.2.2/gcc/Makefile.in looks to me like it is installs unwind.h into
>
> $(libdir)/gcc/$(target_noncanonical)/$(version)/include
>
> What are the contents of that directory on FreeBSD?
FreeBSD doesn't use GCC's build infrastructure and doesn't have that directory. There are a bunch of unwind headers in the gcc source tree, but none are installed. I would have to start a thread on the freebsd mailing lists to find out why.
The patch, however, fixes the problem, and it's better style to explicitly check for the header anyway... Sorry about the delay in getting back to you.
Regards,
-Jeremy
Assignee | ||
Comment 28•17 years ago
|
||
Comment on attachment 288388 [details] [diff] [review]
check for unwind.h
I'm not an autoconf expert. Is this an appropriate way to check for unwind.h?
Attachment #288388 -
Flags: review?(benjamin)
Comment 29•17 years ago
|
||
Comment on attachment 288388 [details] [diff] [review]
check for unwind.h
Can I have it all on one line?
Attachment #288388 -
Flags: review?(benjamin) → review+
Assignee | ||
Comment 30•17 years ago
|
||
Comment on attachment 288388 [details] [diff] [review]
check for unwind.h
> Can I have it all on one line?
Sure.
Attachment #288388 -
Flags: approval1.9?
Comment 31•17 years ago
|
||
Reopening this since drivers don't seem to look at closed bugs for approval requests (ugh!). :(
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Updated•17 years ago
|
Attachment #288388 -
Flags: approval1.9? → approval1.9+
Assignee | ||
Comment 32•17 years ago
|
||
Checked in attachment 288388 [details] [diff] [review], all on one line: revision 1.1891
Status: REOPENED → RESOLVED
Closed: 17 years ago → 17 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•