Closed Bug 855466 Opened 11 years ago Closed 7 years ago

Build Linux Nightly to support Breakpad Unwinding

Categories

(Core :: Gecko Profiler, defect)

x86_64
Linux
defect
Not set
normal

RESOLVED WORKSFORME

People

(Reporter: BenWa, Unassigned)

We need Linux Nightly to support Breakpad unwinding. We likely want these builds to ship with CFI information, but anything that makes Breakpad unwinding work efficiently will do.
The x86-64 nightlies should already be fine, since they ship with .eh_frame and function symbols.

I'm not sure what we'd need to do to make the 32-bit nightlies usable. Shipping them completely unstripped isn't workable because libxul is huge, and we don't actually need all that info. I'm not sure if there's a way to either produce the equivalent of .eh_frame on x86 or strip everything but the CFI info.
OS: Mac OS X → Linux
Hardware: x86 → x86_64
x86-64 should be fine, right? It's just x86 that we need to do something special for.
I just tried this out, with both 32- and 64-bit nightlies.  Starting
them thusly:

MOZ_PROFILER_NEW=1 MOZ_PROFILER_INTERVAL=100 MOZ_PROFILER_MODE=native \
  ./firefox/firefox -P dev -no-remote

in both cases they appear to load CFI and start native unwinds, etc.
In particular, in both cases I get

dump_symbols.cc:533: INFO: LoadSymbols: BEGIN   /path/to/firefox/libxul.so
dump_symbols.cc:633: INFO: LoadSymbols:   read CFI from .eh_frame
dump_symbols.cc:713: INFO: LoadSymbols: SUCCESS /path/to/firefox/libxul.so

In the 64-bit case, I can then back up in Cleopatra in the inverted
callstack after a bit of idling, ending up at XRE_Main::XRE_Main,
which is pretty convincing.  Unfortunately not so in the 32-bit case;
the stack seems pretty trashy.  I think the first priority is to
disable stack scanning (bug 855977) so we can see when the "reliable"
schemes (CFI, frame-pointer) are failing.

It might also be really useful to improve breakpad's logging, so
instead of saying just

   read CFI from .eh_frame

it says something like

   read CFI from .eh_frame, covering 1234567 text bytes

so we can get some idea of whether any useful amount of CFI was
obtained, or not.
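
To make the suggested log line concrete, here is a minimal sketch of how a "covering N text bytes" figure could be computed. It is not breakpad code: the function names and the input format (a list of `(start_address, length)` pairs, one per CFI entry) are assumptions made for illustration. Overlapping ranges are counted once so the total is not inflated.

```python
def cfi_text_bytes_covered(ranges):
    """Total bytes covered by (start, length) ranges, counting overlaps once.

    `ranges` is a hypothetical list of address ranges, one per CFI entry.
    """
    covered = 0
    current_end = None
    for start, length in sorted(ranges):
        if length <= 0:
            continue  # ignore empty or malformed entries
        end = start + length
        if current_end is None or start >= current_end:
            # Range starts past everything seen so far: count it fully.
            covered += length
            current_end = end
        elif end > current_end:
            # Range overlaps the previous one: count only the new tail.
            covered += end - current_end
            current_end = end
    return covered


def format_cfi_log(section, ranges):
    """Build a log message in the style proposed in the comment above."""
    return "read CFI from %s, covering %d text bytes" % (
        section, cfi_text_bytes_covered(ranges))


if __name__ == "__main__":
    demo = [(0x1000, 0x20), (0x1010, 0x20), (0x2000, 0x10)]
    print(format_cfi_log(".eh_frame", demo))
```

A near-zero total here would flag a build whose .eh_frame exists but describes almost none of the text section.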
Exciting!

(In reply to Julian Seward from comment #3)
> It might also be really useful to improve breakpad's logging, so
> instead of saying just
> 
>    read CFI from .eh_frame
> 
> it says something like
> 
>    read CFI from .eh_frame, covering 1234567 text bytes
> 
> so we can get some idea of whether any useful amount of CFI was
> obtained, or not.

I think such changes would be readily approved.
Oh, I didn't realize we had a .eh_frame section on x86. On x86-64 that's part of the ABI. I wonder if the compiler just isn't producing CFI in there for all the functions we care about, since it's probably only required to do so for things that could throw C++ exceptions.
Here's readelf -S output for both 64- and 32-bit Linux nightlies:

64-bit
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [14] .eh_frame_hdr     PROGBITS        0000000002359cb0 2359cb0 18b244 00   A  0   0  4
  [15] .eh_frame         PROGBITS        00000000024e4ef8 24e4ef8 6ea8cc 00   A  0   0  8

32-bit
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [14] .eh_frame_hdr     PROGBITS        020886e4 20886e4 0000b4 00   A  0   0  4
  [15] .eh_frame         PROGBITS        02089798 2088798 00040c 00  WA  0   0  4

So (as expected) the 32 bit .eh_frame section is almost empty.  Looks
like we'll need to build it with -fasynchronous-unwind-tables.
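
The comparison above can be automated. The following is a rough sketch that parses single-line section rows (the layout `readelf -S -W` prints) and converts the hex Size column to bytes. The names `SECTION_ROW` and `section_sizes` are made up for this example; the sample rows are the ones pasted above.

```python
import re

# Matches one section row, e.g.
#   [15] .eh_frame  PROGBITS  02089798 2088798 00040c 00  WA  0   0  4
# The "[Nr] Name ..." header row does not match because "Nr" is not a number.
SECTION_ROW = re.compile(
    r"^\s*\[\s*\d+\]\s+(?P<name>\S+)\s+(?P<type>\S+)\s+"
    r"(?P<addr>[0-9a-fA-F]+)\s+(?P<off>[0-9a-fA-F]+)\s+(?P<size>[0-9a-fA-F]+)"
)


def section_sizes(readelf_output):
    """Map section name -> size in bytes, given `readelf -S -W` text."""
    sizes = {}
    for line in readelf_output.splitlines():
        match = SECTION_ROW.match(line)
        if match:
            sizes[match.group("name")] = int(match.group("size"), 16)
    return sizes


NIGHTLY_64 = """\
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [14] .eh_frame_hdr     PROGBITS        0000000002359cb0 2359cb0 18b244 00   A  0   0  4
  [15] .eh_frame         PROGBITS        00000000024e4ef8 24e4ef8 6ea8cc 00   A  0   0  8
"""

NIGHTLY_32 = """\
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [14] .eh_frame_hdr     PROGBITS        020886e4 20886e4 0000b4 00   A  0   0  4
  [15] .eh_frame         PROGBITS        02089798 2088798 00040c 00  WA  0   0  4
"""

if __name__ == "__main__":
    for label, text in (("64-bit", NIGHTLY_64), ("32-bit", NIGHTLY_32)):
        print(label, section_sizes(text)[".eh_frame"], "bytes of .eh_frame")
```

Section size is only a proxy for how much code the CFI describes, but a difference of several orders of magnitude between the two builds is hard to explain any other way.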
I was just about to tell you that you could slip that compiler flag into:
http://mxr.mozilla.org/mozilla-central/source/build/autoconf/frameptr.m4

when I suddenly remembered that we're already building our nightlies with --enable-profiling, which means they should have usable frame pointers which should be enough for unwinding. Are they not working?

I would assume that Linux system libraries would be built with frame pointers as well on x86.
So, the 32-bit story gets stranger.  Here's what there is for libxul
for nightlies, which IIUC are compiled with gcc-4.5:

> 32-bit
> [Nr] Name            Type       Addr     Off     Size   ES Flg Lk Inf Al
> [14] .eh_frame_hdr   PROGBITS   020886e4 20886e4 0000b4 00   A  0  0  4
> [15] .eh_frame       PROGBITS   02089798 2088798 00040c 00  WA  0  0  4

So .eh_frame is pretty much empty.

But for a build on 32-bit Ubuntu 12.04, using
browser/config/mozconfigs/linux32/nightly (is that the right one to
use?)  then I get an .eh_frame size of a6a834, which is way more
convincing.  And the stack traces from the profiler look plausible.

Ubuntu 12.04 uses gcc-4.6.3.  According to
http://gcc.gnu.org/gcc-4.6/changes.html, 4.6 started to use
-fomit-frame-pointer (and CFI).  The implication is that the
nightlies don't unwind properly because they contain no CFI.  They
do contain frame pointers, but breakpad isn't using them for some
reason.
Having dug around some more in this, I am of the view that we can't
properly assess what's going on until we have a way to disable stack
scanning.  The problem is that if the better unwind methods fail,
breakpad always falls back to stack scanning, which produces
semi-random, semi-unreproducible stacks, making it nearly impossible
to assess how well the non-scan methods have done.
Depends on: 855977
FTR, here's what is going on for the unwind data for gcc 4.5 vs 4.6 on
32-bit x86 Linux.

* In both cases, the code is built with frame pointers in.  That is
  the default for gcc 4.5, but only happens with 4.6, IIUC, because
  the nightly configs specify -fno-omit-frame-pointer.


* For both gcc 4.5 and 4.6, the libxul.so created (in the objdir)
  probably contains complete CFI.  But the placement in sections is
  different.  Here's 4.6:

  Name           Type      Addr     Off     Size   ES Flg Lk Inf Al
  .eh_frame_hdr  PROGBITS  01d661e0 1d661e0 14648c 00   A  0   0  4
  .eh_frame      PROGBITS  01ead46c 1ead46c a6a834 00  WA  0   0  4

  and 4.5:

  .eh_frame_hdr  PROGBITS  01e50694 1e50694 0000e4 00   A  0   0  4
  .eh_frame      PROGBITS  01e513c8 1e513c8 0004d0 00  WA  0   0  4
  .debug_frame   PROGBITS  00000000 22456a1c b88ba8 00     0   0  4

  Hence 4.6 puts all the CFI in .eh_frame, whereas 4.5 puts almost all
  of it in .debug_frame and only a tiny bit in .eh_frame.


* 'make package' nukes .debug_frame, leaving only .eh_frame_hdr and
  .eh_frame.  Hence the 4.6 build is left with full CFI and the 4.5
  build is left with almost none:

  4.6:
  .eh_frame_hdr  PROGBITS  01d661e0 1d661e0 14648c 00   A  0   0  4
  .eh_frame      PROGBITS  01ead46c 1ead46c a6a834 00  WA  0   0  4

  4.5:
  .eh_frame_hdr  PROGBITS  01e50694 1e50694 0000e4 00   A  0   0  4
  .eh_frame      PROGBITS  01e513c8 1e513c8 0004d0 00  WA  0   0  4
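
The effect of packaging on the two builds can be put in numbers. The sketch below uses the section sizes from the objdir listings above and models 'make package' as dropping every section whose name starts with `.debug_`. That model is an assumption for illustration (the comment above only states that .debug_frame is removed), and the function names are hypothetical. Raw section size is also only an approximation of how much CFI is present, since .eh_frame and .debug_frame are encoded slightly differently.

```python
# Section sizes in bytes, copied from the objdir listings above.
OBJDIR = {
    "gcc 4.6": {".eh_frame_hdr": 0x14648c, ".eh_frame": 0xa6a834},
    "gcc 4.5": {".eh_frame_hdr": 0x0000e4, ".eh_frame": 0x0004d0,
                ".debug_frame": 0xb88ba8},
}

# Sections that carry call frame information.
CFI_SECTIONS = (".eh_frame", ".debug_frame")


def strip_debug_sections(sections):
    """Assumed model of packaging: drop all .debug_* sections."""
    return {name: size for name, size in sections.items()
            if not name.startswith(".debug_")}


def cfi_bytes(sections):
    """Total size of the CFI-carrying sections that are present."""
    return sum(sections.get(name, 0) for name in CFI_SECTIONS)


def surviving_fraction(sections):
    """Fraction of CFI bytes still present after stripping."""
    before = cfi_bytes(sections)
    after = cfi_bytes(strip_debug_sections(sections))
    return after / before if before else 0.0


if __name__ == "__main__":
    for compiler, sections in OBJDIR.items():
        print("%s keeps %.4f%% of its CFI bytes"
              % (compiler, 100 * surviving_fraction(sections)))
```

Under these assumptions the 4.6 build keeps all of its CFI, while the 4.5 build keeps well under a tenth of a percent.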
Did you try -fasynchronous-unwind-tables on 4.5?
Yeah, -fasynchronous-unwind-tables at least gives plausible .eh_frame
with gcc-4.5:

sewardj@u1204x86:~/MOZ/TEST$ readelf -S -W firefox-GCC462/libxul.so | grep frame
  [14] .eh_frame_hdr     PROGBITS        01d661e0 1d661e0 14648c 00   A  0   0  4
  [15] .eh_frame         PROGBITS        01ead46c 1ead46c a6a834 00  WA  0   0  4

sewardj@u1204x86:~/MOZ/TEST$ readelf -S -W firefox-GCC453/libxul.so | grep frame
  [14] .eh_frame_hdr     PROGBITS        01e50694 1e50694 0000e4 00   A  0   0  4
  [15] .eh_frame         PROGBITS        01e513c8 1e513c8 0004d0 00  WA  0   0  4

sewardj@u1204x86:~/MOZ/TEST$ readelf -S -W firefox-GCC453-AUT/libxul.so | grep frame
  [14] .eh_frame_hdr     PROGBITS        01e50694 1e50694 17892c 00   A  0   0  4
  [15] .eh_frame         PROGBITS        01fca330 1fc9330 b76558 00  WA  0   0  4
At least to a first approximation, native unwind now works on 32 bit
nightlies, although it's hard to tell whether the stack traces are
bogus or not.
I just tried Linux x64. We get roughly 80% unwind (this will vary a lot across machines, system libs and use cases). The quality was excellent, except that I couldn't unwind past: (1) my driver, (2) system libs like gtk2 and libc assembly functions, (3) JS JIT frames. Once we support CFI+frame pointers this should solve (3) at least.
(In reply to Benoit Girard (:BenWa) from comment #14)
> Once we support CFI+frame pointers this should solve (3) at least.

breakpad on Linux x64 appears to support only CFI or stack scanning,
but no frame pointers.  Perhaps unsurprisingly as the ELF x86_64 ABI
doesn't use frame pointers.  So, I don't think we can do much better
here.  You could maybe try incrementally enabling stack-scanning
by setting MOZ_PROFILER_STACK_SCAN=1, =2, etc.
It wouldn't be hard to add frame pointer support to the x64 stackwalker, but I'm not sure how useful it'd be, since -fomit-frame-pointer is the ABI default.
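
For readers unfamiliar with the technique, here is a rough sketch of frame-pointer unwinding on x86-64, run against a simulated stack rather than a real process. It is not breakpad's stackwalker. It assumes the conventional frame layout produced by `push %rbp; mov %rsp,%rbp`, where `[rbp]` holds the caller's saved rbp and `[rbp+8]` holds the return address; the function name and the dict-based memory model are hypothetical.

```python
def walk_frame_pointers(memory, rip, rbp, max_frames=64):
    """Return code addresses, innermost first, by following the rbp chain.

    memory: dict mapping addresses to 64-bit values (a simulated stack).
    rip, rbp: register values at the point where the sample was taken.
    """
    frames = [rip]
    while len(frames) < max_frames:
        saved_rbp = memory.get(rbp)        # caller's frame pointer
        return_addr = memory.get(rbp + 8)  # address we return to
        if saved_rbp is None or not return_addr:
            break  # unreadable memory or null return address: stop
        frames.append(return_addr)
        # The stack grows down, so each caller frame must sit at a
        # higher address. Anything else means the end of the chain
        # (saved rbp of 0) or a corrupt one, so stop instead of looping.
        if saved_rbp <= rbp:
            break
        rbp = saved_rbp
    return frames


if __name__ == "__main__":
    # Three nested frames; the outermost has a saved rbp of 0.
    stack = {
        0x7000: 0x7020, 0x7008: 0x401200,
        0x7020: 0x7040, 0x7028: 0x401100,
        0x7040: 0x0,    0x7048: 0x401000,
    }
    print([hex(a) for a in walk_frame_pointers(stack, 0x401300, 0x7000)])
```

The walk is only as good as the chain it follows: in code built with -fomit-frame-pointer, rbp is used as a general-purpose register, so the values read here would be meaningless, which is the usefulness concern raised above.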
Blocks: 898752
Julian, does any of this apply to LUL? Can this bug be closed because it was about the breakpad unwinder which we're not using for the profiler anymore?
Flags: needinfo?(jseward)
Markus, this can be closed.  This is unrelated to LUL.
Flags: needinfo?(jseward)
And Linux Nightlies support LUL unwinding by default. Closing this.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME