Closed Bug 855466 Opened 12 years ago Closed 8 years ago

Build Linux Nightly to support Breakpad Unwinding

Categories

(Core :: Gecko Profiler, defect)

x86_64
Linux
defect
Not set
normal

RESOLVED WORKSFORME

People

(Reporter: BenWa, Unassigned)

We need Linux Nightly to support Breakpad unwinding. We likely want these builds to ship with CFI information, but anything that makes Breakpad unwinding work efficiently will do.
The x86-64 nightlies should already be fine, since they ship with .eh_frame and function symbols. I'm not sure what we'd need to do to make the 32-bit nightlies usable. Shipping them completely unstripped isn't workable because libxul is huge, and we don't actually need all that info. I'm not sure if there's a way to either produce the equivalent of .eh_frame on x86 or strip everything but the CFI info.
OS: Mac OS X → Linux
Hardware: x86 → x86_64
x86-64 should be fine, right? It's just x86 that we need to do something special for.
I just tried this out, with both 32- and 64-bit nightlies. Starting them thusly:
  MOZ_PROFILER_NEW=1 MOZ_PROFILER_INTERVAL=100 MOZ_PROFILER_MODE=native \
     ./firefox/firefox -P dev -no-remote
in both cases they appear to load CFI and start native unwinds, etc. In particular, in both cases I get
  dump_symbols.cc:533: INFO: LoadSymbols: BEGIN /path/to/firefox/libxul.so
  dump_symbols.cc:633: INFO: LoadSymbols: read CFI from .eh_frame
  dump_symbols.cc:713: INFO: LoadSymbols: SUCCESS /path/to/firefox/libxul.so
In the 64-bit case, I can then back up in Cleopatra in the inverted callstack after a bit of idling, ending up at XRE_Main::XRE_Main, which is pretty convincing. Unfortunately not so in the 32-bit case; the stack seems pretty trashy.
I think the first priority is to disable stack scanning (bug 855977) so we can see when the "reliable" schemes (CFI, frame-pointer) are failing.
It might also be really useful to improve breakpad's logging, so instead of saying just
  read CFI from .eh_frame
it says something like
  read CFI from .eh_frame, covering 1234567 text bytes
so we can get some idea of whether any useful amount of CFI was obtained, or not.
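The coverage figure proposed for the log message is the number of distinct text bytes described by the CFI entries. A minimal sketch of that arithmetic in Python, assuming the entries have already been parsed into (start address, length) pairs. The function names and the input format are hypothetical and this is not Breakpad's code, which is C++:

```python
def cfi_covered_bytes(ranges):
    """Return the number of distinct text bytes covered by (start, length) ranges.

    Overlapping or duplicate ranges are counted only once.
    """
    covered = 0
    current_end = None  # exclusive end of the merged run built so far
    for start, length in sorted(ranges):
        if length <= 0:
            continue  # ignore empty or malformed entries
        end = start + length
        if current_end is None or start >= current_end:
            # Disjoint from everything seen so far: count the whole range.
            covered += length
            current_end = end
        elif end > current_end:
            # Partial overlap: count only the part past the current run.
            covered += end - current_end
            current_end = end
    return covered


def format_cfi_log(section, ranges):
    # Hypothetical message format, modelled on the wording proposed above.
    return "read CFI from %s, covering %d text bytes" % (
        section, cfi_covered_bytes(ranges))
```

Comparing that number with the size of the text section would show whether a useful amount of CFI was loaded, rather than just that some was.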
Exciting!
(In reply to Julian Seward from comment #3)
> It might also be really useful to improve breakpad's logging, so
> instead of saying just
>
>   read CFI from .eh_frame
>
> it says something like
>
>   read CFI from .eh_frame, covering 1234567 text bytes
>
> so we can get some idea of whether any useful amount of CFI was
> obtained, or not.
I think such changes would be readily approved.
Oh, I didn't realize we had a .eh_frame section on x86. On x86-64 that's part of the ABI. I wonder if the compiler just isn't producing CFI in there for all the functions we care about, since it's probably only required to do so for things that could throw C++ exceptions.
Here's readelf -S output for both 64- and 32-bit Linux nightlies:
64-bit
  [Nr] Name          Type     Address          Off     Size   ES Flg Lk Inf Al
  [14] .eh_frame_hdr PROGBITS 0000000002359cb0 2359cb0 18b244 00   A  0   0  4
  [15] .eh_frame     PROGBITS 00000000024e4ef8 24e4ef8 6ea8cc 00   A  0   0  8
32-bit
  [Nr] Name          Type     Addr     Off     Size   ES Flg Lk Inf Al
  [14] .eh_frame_hdr PROGBITS 020886e4 20886e4 0000b4 00   A  0   0  4
  [15] .eh_frame     PROGBITS 02089798 2088798 00040c 00  WA  0   0  4
So (as expected) the 32 bit .eh_frame section is almost empty. Looks like we'll need to build it with -fasynchronous-unwind-tables.
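The same check can be scripted. A rough sketch in Python that pulls section sizes out of single-line `readelf -S -W` rows like the ones above; the helper names and the `min_bytes` threshold are illustrative assumptions, not anything the build system provides:

```python
import re

# One row of `readelf -S -W` output: [Nr] Name Type Addr Off Size ...
# Only rows with a non-empty section name are of interest here.
_SECTION_ROW = re.compile(
    r"^\s*\[\s*\d+\]\s+(\S+)\s+(\S+)\s+"
    r"([0-9a-fA-F]+)\s+([0-9a-fA-F]+)\s+([0-9a-fA-F]+)"
)


def section_sizes(readelf_output):
    """Map section name -> size in bytes, parsed from readelf -S -W text."""
    sizes = {}
    for line in readelf_output.splitlines():
        match = _SECTION_ROW.match(line)
        if match:
            sizes[match.group(1)] = int(match.group(5), 16)
    return sizes


def eh_frame_looks_populated(sizes, min_bytes=0x10000):
    # min_bytes is an arbitrary cut-off chosen for illustration; a libxul
    # with full CFI has an .eh_frame of several megabytes.
    return sizes.get(".eh_frame", 0) >= min_bytes
```

Fed the 32-bit rows above, this reports an .eh_frame of 0x40c bytes, far too small to describe a library the size of libxul.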
I was just about to tell you that you could slip that compiler flag into: http://mxr.mozilla.org/mozilla-central/source/build/autoconf/frameptr.m4 when I suddenly remembered that we're already building our nightlies with --enable-profiling, which means they should have usable frame pointers which should be enough for unwinding. Are they not working? I would assume that Linux system libraries would be built with frame pointers as well on x86.
So, the 32-bit story gets stranger. Here's what there is for libxul for nightlies, which IIUC are compiled with gcc-4.5:
> 32-bit
> [Nr] Name          Type     Addr     Off     Size   ES Flg Lk Inf Al
> [14] .eh_frame_hdr PROGBITS 020886e4 20886e4 0000b4 00   A  0   0  4
> [15] .eh_frame     PROGBITS 02089798 2088798 00040c 00  WA  0   0  4
So .eh_frame is pretty much empty. But for a build on 32-bit Ubuntu 12.04, using browser/config/mozconfigs/linux32/nightly (is that the right one to use?) then I get an .eh_frame size of a6a834, which is way more convincing. And the stack traces from the profiler look plausible.
Ubuntu 12.04 uses gcc-4.6.3. According to http://gcc.gnu.org/gcc-4.6/changes.html, 4.6 started to use -fomit-frame-pointer (and CFI). Implication therefore is that the nightlies don't unwind properly because they don't contain CFI, but they do contain frame pointers, but breakpad isn't using them for some reason.
Having dug around some more in this, I am of the view that we can't properly assess what's going on until we have a way to disable stack scanning. Problem is that if better unwind methods fail, then it always falls back to stack scanning, which provides semi-random semi-unreproducible stacks, which make it nearly impossible to assess how well the non-scan methods have done.
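To illustrate why the fallback hides failures, here is a toy model in Python of an unwinder that tries its methods in order and records which one produced each frame. Everything in it (the strategy names, the callback shape, the `allow_stack_scan` switch) is a hypothetical stand-in, not Breakpad's interface:

```python
def unwind_frame(frame, strategies, allow_stack_scan=True):
    """Try each (name, step) strategy in order.

    Returns (name, caller_frame) for the first strategy that succeeds,
    or (None, None) if none does.
    """
    for name, step in strategies:
        if name == "scan" and not allow_stack_scan:
            continue  # scanning disabled: let the failure show
        caller = step(frame)
        if caller is not None:
            return name, caller
    return None, None


def unwind_stack(frame, strategies, allow_stack_scan=True, max_frames=256):
    """Walk outwards from `frame`, noting the method used for each step."""
    trace = []
    while frame is not None and len(trace) < max_frames:
        name, caller = unwind_frame(frame, strategies, allow_stack_scan)
        if caller is None:
            break
        trace.append((name, caller))
        frame = caller
    return trace
```

With scanning enabled the trace keeps growing past the point where CFI and frame pointers gave up, so a short trace never appears; with it disabled, the trace simply stops there, which is the signal needed to judge the reliable methods.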
Depends on: 855977
FTR, here's what is going on for the unwind data for gcc 4.5 vs 4.6 on 32-bit x86 Linux.
* In both cases, the code is built with frame pointers in. That is the default for gcc 4.5, but only happens with 4.6, IIUC, because the nightly configs specify -fno-omit-frame-pointer.
* For both gcc 4.5 and 4.6, the libxul.so created (in the objdir) probably contains complete CFI. But the placement in sections is different. Here's 4.6:
    Name          Type     Addr     Off      Size   ES Flg Lk Inf Al
    .eh_frame_hdr PROGBITS 01d661e0 1d661e0  14648c 00   A  0   0  4
    .eh_frame     PROGBITS 01ead46c 1ead46c  a6a834 00  WA  0   0  4
  and 4.5:
    .eh_frame_hdr PROGBITS 01e50694 1e50694  0000e4 00   A  0   0  4
    .eh_frame     PROGBITS 01e513c8 1e513c8  0004d0 00  WA  0   0  4
    .debug_frame  PROGBITS 00000000 22456a1c b88ba8 00      0   0  4
  Hence 4.6 puts all the CFI in .eh_frame, whereas 4.5 puts almost all of it in .debug_frame and only a tiny bit in .eh_frame.
* 'make package' nukes .debug_frame, leaving only .eh_frame_hdr and .eh_frame. Hence the 4.6 build is left with full CFI and the 4.5 build is left with almost none:
  4.6:
    .eh_frame_hdr PROGBITS 01d661e0 1d661e0  14648c 00   A  0   0  4
    .eh_frame     PROGBITS 01ead46c 1ead46c  a6a834 00  WA  0   0  4
  4.5:
    .eh_frame_hdr PROGBITS 01e50694 1e50694  0000e4 00   A  0   0  4
    .eh_frame     PROGBITS 01e513c8 1e513c8  0004d0 00  WA  0   0  4
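The effect of packaging on the two builds can be put in numbers. A small Python sketch using the section sizes listed above; it models packaging as dropping every `.debug_*` section, which is an assumption (the comment above only establishes that .debug_frame goes), and the helper names are made up for illustration:

```python
CFI_SECTIONS = (".eh_frame", ".debug_frame")


def cfi_bytes(sections):
    """Total size of the sections that can carry CFI."""
    return sum(sections.get(name, 0) for name in CFI_SECTIONS)


def strip_debug_sections(sections):
    # Assumption: packaging removes all .debug_* sections and leaves
    # allocated sections such as .eh_frame and .eh_frame_hdr alone.
    return {name: size for name, size in sections.items()
            if not name.startswith(".debug_")}


# Section sizes from the objdir builds listed above.
gcc46 = {".eh_frame_hdr": 0x14648C, ".eh_frame": 0xA6A834}
gcc45 = {".eh_frame_hdr": 0xE4, ".eh_frame": 0x4D0, ".debug_frame": 0xB88BA8}

for label, sections in (("4.6", gcc46), ("4.5", gcc45)):
    before = cfi_bytes(sections)
    after = cfi_bytes(strip_debug_sections(sections))
    print("gcc %s: %#x CFI bytes before packaging, %#x after" % (label, before, after))
```

For 4.6 the two figures are identical; for 4.5 only the 0x4d0 bytes in .eh_frame survive.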
Did you try -fasynchronous-unwind-tables on 4.5?
Yeah, -fasynchronous-unwind-tables at least gives plausible .eh_frame with gcc-4.5:
sewardj@u1204x86:~/MOZ/TEST$ readelf -S -W firefox-GCC462/libxul.so | grep frame
  [14] .eh_frame_hdr PROGBITS 01d661e0 1d661e0 14648c 00   A  0   0  4
  [15] .eh_frame     PROGBITS 01ead46c 1ead46c a6a834 00  WA  0   0  4
sewardj@u1204x86:~/MOZ/TEST$ readelf -S -W firefox-GCC453/libxul.so | grep frame
  [14] .eh_frame_hdr PROGBITS 01e50694 1e50694 0000e4 00   A  0   0  4
  [15] .eh_frame     PROGBITS 01e513c8 1e513c8 0004d0 00  WA  0   0  4
sewardj@u1204x86:~/MOZ/TEST$ readelf -S -W firefox-GCC453-AUT/libxul.so | grep frame
  [14] .eh_frame_hdr PROGBITS 01e50694 1e50694 17892c 00   A  0   0  4
  [15] .eh_frame     PROGBITS 01fca330 1fc9330 b76558 00  WA  0   0  4
At least to a first approximation, native unwind now works on 32 bit nightlies, although it's hard to tell whether the stack traces are bogus or not.
I just tried Linux x64. We get about 80% unwind (this will vary a lot across machines, system libs and use cases). The quality was excellent, except that I couldn't unwind past: (1) my driver, (2) system libs like gtk2 and libc assembly functions, (3) JS JIT frames. Once we support CFI+frame pointers this should solve (3) at least.
(In reply to Benoit Girard (:BenWa) from comment #14)
> Once we support CFI+frame pointers this should solve (3) at least.
breakpad on Linux x64 appears to support only CFI or stack scanning, but no frame pointers. Perhaps unsurprisingly as the ELF x86_64 ABI doesn't use frame pointers. So, I don't think we can do much better here. You could maybe try incrementally enabling stack-scanning by setting MOZ_PROFILER_STACK_SCAN=1, =2, etc.
It wouldn't be hard to add frame pointer support to the x64 stackwalker, but I'm not sure how useful it'd be, since -fomit-frame-pointer is the ABI default.
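For reference, a frame-pointer walk is short. In code built with frame pointers on x86-64, the word at the frame pointer holds the caller's saved frame pointer and the next word up holds the return address. A toy Python sketch over a simulated stack; `read_word` is a hypothetical callback standing in for reading the profiled thread's memory, and none of this is the Breakpad stackwalker:

```python
def walk_frame_pointers(read_word, fp, word_size=8, max_frames=256):
    """Follow the chain of saved frame pointers starting at `fp`.

    read_word(addr) must return the integer stored at addr, or None if
    the address is unreadable. Returns the return addresses found,
    innermost caller first. Only meaningful if every frame on the
    chain was compiled with a frame pointer.
    """
    return_addresses = []
    while fp and len(return_addresses) < max_frames:
        saved_fp = read_word(fp)                 # caller's frame pointer
        return_addr = read_word(fp + word_size)  # address to return to
        if saved_fp is None or not return_addr:
            break
        return_addresses.append(return_addr)
        if saved_fp <= fp:
            # The stack grows downwards, so caller frames live at higher
            # addresses; anything else means end of chain or garbage.
            break
        fp = saved_fp
    return return_addresses
```

Passing `word_size=4` gives the 32-bit x86 layout. The weakness mentioned above shows up directly: one frame from code built with -fomit-frame-pointer on the chain and the walk either stops or follows whatever value happens to be in the register.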
Blocks: 898752
Julian, does any of this apply to LUL? Can this bug be closed because it was about the breakpad unwinder which we're not using for the profiler anymore?
Flags: needinfo?(jseward)
Markus, this can be closed. This is unrelated to LUL.
Flags: needinfo?(jseward)
And Linux Nightlies support LUL unwinding by default. Closing this.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WORKSFORME