Open Bug 1082276 Opened 10 years ago Updated 2 years ago

NS_StackWalk in frame pointer mode can't walk through system libraries on Linux

Categories

(Core :: XPCOM, defect)

x86_64
Linux
defect

People

(Reporter: jld, Unassigned)

Details

On x86 (and PPC) Linux, NS_StackWalk uses the x86 frame pointer ABI.  If Gecko is a debugging and/or profiling build, this will work for Gecko's own libraries, but it probably won't for system libraries.  This is important if, for example, we're handling a crash in such a library and trying to print the stack of what caused it from inside the signal handler.  (This is especially relevant for system call sandboxing failures, because the forbidden system call is usually issued by a routine in libc.)
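
To make the failure mode concrete, here is a minimal simulation of frame-pointer walking in C. It assumes the x86 convention that %ebp/%rbp holds the address of the caller's saved frame pointer, with the return address in the next word. The names `frame_record` and `walk_frame_pointers` are hypothetical and this is not NS_StackWalk's actual code. The "stack" is an ordinary array so the behavior is deterministic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One frame record under the x86 frame-pointer convention (hypothetical
   type): the frame pointer points at the caller's saved frame pointer,
   and the return address is stored in the word just above it. */
struct frame_record {
    struct frame_record *prev; /* caller's saved frame pointer */
    void *return_address;      /* where execution resumes in the caller */
};

/* Follow the frame-pointer chain from fp, storing up to max return
   addresses in pcs. [stack_lo, stack_hi) bounds the valid stack. The walk
   stops at a record that is out of bounds or misaligned, or whose saved
   frame pointer does not move toward older frames (higher addresses,
   because the x86 stack grows downward). */
static size_t walk_frame_pointers(const struct frame_record *fp,
                                  const void *stack_lo, const void *stack_hi,
                                  void **pcs, size_t max)
{
    const uintptr_t lo = (uintptr_t)stack_lo;
    const uintptr_t hi = (uintptr_t)stack_hi;
    size_t n = 0;

    assert(pcs != NULL || max == 0);
    while (fp != NULL && n < max) {
        const uintptr_t p = (uintptr_t)fp;
        /* Overflow-safe bounds check: p must lie in [lo, hi) with room
           for a whole record, and be pointer-aligned. */
        if (p < lo || p >= hi || hi - p < sizeof *fp ||
            p % sizeof(void *) != 0)
            break; /* junk frame pointer: give up rather than crash */
        pcs[n++] = fp->return_address;
        if ((uintptr_t)fp->prev <= p)
            break; /* end of chain (NULL) or a non-monotonic link */
        fp = fp->prev;
    }
    return n;
}
```

For example, you can link four records youngest-first in an array and walk from the first one, which yields all four return addresses. Now overwrite the second record's `prev` with a junk value. This can happen when a frame-pointer-less libc routine uses %rbp as a scratch register. The walk then stops after two frames and loses every caller above the library call.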

Using exception handling tables (e.g., with _Unwind_Backtrace) might work, but there might be cases where that would break when it gets into Gecko (and frame pointer walking wouldn't).  If some sort of hybrid unwinding approach is called for, then perhaps the Lightweight Unwinding Library from the profiler could be used, at least on platforms where memory usually isn't scarce enough to cause problems for it.
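
For comparison, here is a sketch of the _Unwind_Backtrace approach. It assumes GCC or Clang on Linux with the libgcc unwinder (`<unwind.h>`). The unwinder looks up each frame's unwind entry in the object's .eh_frame tables instead of trusting %rbp. It can therefore step through libraries built without frame pointers, as long as they ship those tables. The names `eh_backtrace` and `backtrace_state` are hypothetical. Also, _Unwind_Backtrace is not documented as async-signal-safe, which matters for the signal-handler case.

```c
#include <assert.h>
#include <stddef.h>
#include <unwind.h>

/* State threaded through the _Unwind_Backtrace callback (hypothetical). */
struct backtrace_state {
    void **pcs;
    size_t count;
    size_t max;
};

/* Called once per frame, innermost first. Returning anything other than
   _URC_NO_REASON makes _Unwind_Backtrace stop. */
static _Unwind_Reason_Code record_frame(struct _Unwind_Context *ctx, void *arg)
{
    struct backtrace_state *st = arg;
    if (st->count >= st->max)
        return _URC_END_OF_STACK;
    _Unwind_Ptr ip = _Unwind_GetIP(ctx);
    if (ip == 0)
        return _URC_END_OF_STACK; /* no more meaningful frames */
    st->pcs[st->count++] = (void *)ip;
    return _URC_NO_REASON;
}

/* Collect up to max program counters using the EH unwind tables rather
   than the frame-pointer chain. Returns the number stored in pcs. */
static size_t eh_backtrace(void **pcs, size_t max)
{
    assert(pcs != NULL || max == 0);
    struct backtrace_state st = { pcs, 0, max };
    _Unwind_Backtrace(record_frame, &st);
    return st.count;
}
```

The first frame reported is normally the code that called _Unwind_Backtrace, followed by its callers. Unlike the frame-pointer walk, this depends on every object on the stack having correct unwind tables, which is the trade-off raised above.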
What OS is this filed for? AFAIK the Windows system libraries always have a frame pointer because walking the stack without symbols is generally useful.

From http://msdn.microsoft.com/en-us/library/windows/desktop/ee416588%28v=vs.85%29.aspx
"Starting with Windows XP Service Pack 2, all Windows DLL and executable files are compiled with FPO disabled, because it makes debugging more accurate. Disabling FPO also allows sampling profilers to walk the stack during run-time, with minimal performance impact."
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #1)
> What OS is this filed for?

Linux.  (32-bit and 64-bit x86, and also seen on x86 B2G.  Probably also x86 Android, but I haven't tried that.)

> AFAIK the Windows system libraries always have a
> frame pointer because walking the stack without symbols is generally useful.

Thanks; that's useful to know.

On Linux we don't need debugging symbols, just the exception handling tables that are already in the library.  (On x86, and probably everything else besides ARM and IA64, that's almost but not quite the same format as debug unwind info: http://www.airs.com/blog/archives/460).

It looks like GCC 4.6, which turned on -fomit-frame-pointer by default, also turned on -fasynchronous-unwind-tables (both gated by the --enable-frame-pointer GCC configure flag).  So we should have full unwind tables on everything, including Firefox release builds (interestingly, libxul.so's EH sections there are 15% of the size of the file and 10% of the total uncompressed size).  For glibc in particular, I've checked Debian and Ubuntu, and both of them have the EH tables.
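
This check can be run against the objects actually loaded into a process. The sketch below assumes glibc's (or musl's) dl_iterate_phdr. It counts how many loaded objects have a PT_GNU_EH_FRAME program header, which locates the .eh_frame_hdr search table that indexes .eh_frame. This is only a rough check, and the helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stddef.h>

/* Tallies filled in by the dl_iterate_phdr callback (hypothetical). */
struct eh_census {
    size_t objects;     /* loaded objects visited */
    size_t with_eh_hdr; /* objects with a PT_GNU_EH_FRAME program header */
};

static int census_callback(struct dl_phdr_info *info, size_t size, void *arg)
{
    struct eh_census *c = arg;
    (void)size;
    c->objects++;
    for (ElfW(Half) i = 0; i < info->dlpi_phnum; i++) {
        if (info->dlpi_phdr[i].p_type == PT_GNU_EH_FRAME) {
            c->with_eh_hdr++;
            break;
        }
    }
    return 0; /* keep iterating over the remaining objects */
}

/* Visit the main program and every loaded shared object. */
static struct eh_census count_eh_frame_hdrs(void)
{
    struct eh_census c = { 0, 0 };
    dl_iterate_phdr(census_callback, &c);
    assert(c.with_eh_hdr <= c.objects);
    return c;
}
```

If you print info->dlpi_name inside the callback, it shows which individual objects have the header.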
Summary: NS_StackWalk in frame pointer mode can't walk through system libraries → NS_StackWalk in frame pointer mode can't walk through system libraries on Linux
I'm surprised that we have EH tables at all, considering that we disable exception handling. Presumably we won't ever need that data at runtime except when profiling. Can you make sure with glandium that our libxul.so preload doesn't bother reading that data in at all?
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #3)
> I'm surprised that we have EH tables at all, considering that we disable
> exception handling. Presumably we won't ever need that data at runtime
> except when profiling. Can you make sure with glandium that our libxul.so
> preload doesn't bother reading that data in at all?

Our readahead stuff only reads PT_LOAD segments, and the EH information lives in a different ELF segment (PT_GNU_EH_FRAME).
(In reply to Nathan Froyd (:froydnj) from comment #4)
> (In reply to Benjamin Smedberg  [:bsmedberg] from comment #3)
> > I'm surprised that we have EH tables at all, considering that we disable
> > exception handling. Presumably we won't ever need that data at runtime
> > except when profiling. Can you make sure with glandium that our libxul.so
> > preload doesn't bother reading that data in at all?
> 
> Our readahead stuff only reads PT_LOAD segments, and the EH information lives
> in a different ELF segment (PT_GNU_EH_FRAME).

Ah, wait, no, that's not entirely true.  The EH information lives in the same PT_LOAD as .text, so we are reading that information when we preload.  There's no way around that.
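
The correction can be verified at runtime. The sketch below tests whether each loaded object's PT_GNU_EH_FRAME range falls inside the file-backed part of some PT_LOAD segment, which decides whether a PT_LOAD-only preload reads it. It assumes glibc's dl_iterate_phdr, and the helper names are hypothetical. PT_GNU_EH_FRAME covers only .eh_frame_hdr. The larger .eh_frame section it indexes has no program header of its own. The check only asks whether the header lies in some PT_LOAD, not which one.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stddef.h>

/* Summary over all loaded objects (hypothetical). */
struct eh_placement {
    size_t checked;        /* objects with a PT_GNU_EH_FRAME segment */
    size_t inside_pt_load; /* ...whose range lies within some PT_LOAD */
};

/* Nonzero if [start, end) lies within the file-backed range
   [p_vaddr, p_vaddr + p_filesz) of some PT_LOAD segment. */
static int inside_some_load(const struct dl_phdr_info *info,
                            ElfW(Addr) start, ElfW(Addr) end)
{
    for (ElfW(Half) i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type == PT_LOAD && start >= ph->p_vaddr &&
            end <= ph->p_vaddr + ph->p_filesz)
            return 1;
    }
    return 0;
}

static int placement_callback(struct dl_phdr_info *info, size_t size, void *arg)
{
    struct eh_placement *r = arg;
    (void)size;
    for (ElfW(Half) i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type != PT_GNU_EH_FRAME)
            continue;
        r->checked++;
        if (inside_some_load(info, ph->p_vaddr, ph->p_vaddr + ph->p_memsz))
            r->inside_pt_load++;
        break; /* at most one PT_GNU_EH_FRAME per object */
    }
    return 0; /* keep iterating */
}

static struct eh_placement check_eh_placement(void)
{
    struct eh_placement r = { 0, 0 };
    dl_iterate_phdr(placement_callback, &r);
    assert(r.inside_pt_load <= r.checked);
    return r;
}
```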
In any case, I assume that the Linux (or other ELF-based Unix) builds where it would be worth removing or reducing the EH information (if possible) are builds that already don't have frame pointers, so switching NS_StackWalk to use EH unwinding wouldn't make them any worse.
Severity: normal → S3