Closed Bug 803184 Opened 12 years ago Closed 12 years ago

Remove .cfi_sections .debug_frame

Categories

(Core :: JavaScript Engine, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla20

People

(Reporter: espindola, Assigned: espindola)

Details

Attachments

(1 file, 1 obsolete file)

Attached patch patch (obsolete) — Splinter Review
The referenced gcc bug was closed as invalid and debuggers can use .eh_frame. Telling the assembler to only use debug_frame will impact anyone trying to unwind past this (in a crash for example).
Attached patch correct patchSplinter Review
Assignee: general → respindola
Attachment #672848 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #672848 - Flags: review?(jimb)
Attachment #672873 - Flags: review?(jimb)
Which unwinders will see .eh_frame but not .debug_frame? That is, what is the problem that the .cfi_sections directive causes?

I'd certainly be happy to remove it if that can be done safely; but it was needed when I wrote the patch. It would be nice to see Try results for this patch.
(In reply to Jim Blandy :jimb from comment #2)
> Which unwinders will see .eh_frame but not .debug_frame? That is, what is
> the problem that the .cfi_sections directive causes?
> 
> I'd certainly be happy to remove it if that can be done safely; but it was
> needed when I wrote the patch. It would be nice to see Try results for this
> patch.

As far as I know only gdb (and other debuggers) will see .debug_frame. A problem is that .debug_frame is not marked alloc:

[ 6] .eh_frame         PROGBITS        0000000000000000 000078 000038 00   A  0   0  8

[ 8] .debug_frame      PROGBITS        0000000000000000 0000b0 000040 00      0   0  8

So whatever wants to see it needs to load it from a file. Another problem is that it is stripped, so .debug_frame is not in the binaries we ship:

$ readelf -SW  libxul.so  | grep debug_frame
$ readelf -SW  libxul.so  | grep eh_frame
  [15] .eh_frame_hdr     PROGBITS        0000000001acbf00 1443f00 14a9fc 00   A  0   0  4
  [16] .eh_frame         PROGBITS        0000000001c16900 158e900 58e56c 00   A  0   0  8

so breakpad will have to guess frames to unwind past any functions in MethodJIT.cpp.
(In reply to Rafael Ávila de Espíndola (:espindola) from comment #3)
> So whatever wants to see it needs to load it from a file. Another problem is
> that it is stripped, so .debug_frame is not in the binaries we ship:
> 
> $ readelf -SW  libxul.so  | grep debug_frame
> $ readelf -SW  libxul.so  | grep eh_frame
>   [15] .eh_frame_hdr     PROGBITS        0000000001acbf00 1443f00 14a9fc 00 
> A  0   0  4
>   [16] .eh_frame         PROGBITS        0000000001c16900 158e900 58e56c 00 
> A  0   0  8
> 
> so breakpad will have to guess frames to unwind past any functions in
> MethodJIT.cpp.

Breakpad, at least, uses symbol files derived from the non-stripped executables; it doesn't depend on what's in the shipped executable. It does the unwinding on the crash server, using a saved copy of the unwinding data.

SpiderMonkey doesn't use C++ exceptions. You didn't mention any specific component that needs the data to be loaded, so I still don't understand why the data needs to be in the .eh_frame section. I'm sorry to be so dense, but what, specifically, needs it to be there?

If the other assembly code produced by GCC hasn't created a .eh_frame section in one way or another, then the later "CFI(...)" annotations in MethodJIT.cpp will cause an error at assembly time, as they attempt to place directives in a section that doesn't exist.

If I understand correctly, the MethodJIT.cpp comment doesn't reference the GCC bug to say, "Because of this GCC bug, this is needed" --- instead, it cites the GCC bug to show that the .eh_frame section is not always created. So the GCC bug's being marked invalid has no bearing on the need for the .cfi_sections directive.

Are you able submit this patch to Try? If so, please do; I would like to see if we get assembly errors on x86.
In case this is sounding weird, given our previous discussions about unused arguments in breakpad:

I have zero attachment to the MethodJIT.cpp code. If it can go away without introducing build errors, then by all means it should go; why have effect-free junk around? But 1) the motivations given in comment 0 don't make sense to me, and 2) I haven't seen anything that addresses the issues that made it necessary in the first place.
> Breakpad, at least, uses symbol files derived from the non-stripped
> executables; it doesn't depend on what's in the shipped executable. It does
> the unwinding on the crash server, using a saved copy of the unwinding data.

Not for unwinding. On 64 bits it first tries cfi and then guessing. From toolkit/crashreporter/google-breakpad/src/processor/stackwalker_amd64.cc:

-------------------------------------------------
  // If we have DWARF CFI information, use it.
  scoped_ptr<CFIFrameInfo> cfi_frame_info(
      resolver_ ? resolver_->FindCFIFrameInfo(last_frame) : NULL);
  if (cfi_frame_info.get())
    new_frame.reset(GetCallerByCFIFrameInfo(frames, cfi_frame_info.get()));

  // If CFI failed, or there wasn't CFI available, fall back
  // to stack scanning.
  if (!new_frame.get()) {
-------------------------------------------

What we use unstripped binaries for is constructing the .sym files which are then used offline to produce human readable stacks for crash reports.

> If the other assembly code produced by GCC hasn't created a .eh_frame
> section in one way or another, then the later "CFI(...)" annotations in
> MethodJIT.cpp will cause an error at assembly time, as they attempt to place
> directives in a section that doesn't exist.

Incorrect. Gcc will print .cfi_* directives, but use the default that if no .cfi_sections is present than the assembler implicitly creates a .eh_frame. The x86_64 abi requires that every function have a .eh_frame entry.

> Are you able submit this patch to Try? If so, please do; I would like to see
> if we get assembly errors on x86.


https://tbpl.mozilla.org/?tree=Try&rev=20551a5701a5
(In reply to Jim Blandy :jimb from comment #5)
> In case this is sounding weird, given our previous discussions about unused
> arguments in breakpad:
> 
> I have zero attachment to the MethodJIT.cpp code. If it can go away without
> introducing build errors, then by all means it should go; why have
> effect-free junk around? But 1) the motivations given in comment 0 don't
> make sense to me,

What about breakpad will have to guess to get past any frame of a function in MethodJIT.cpp?

> and 2) I haven't seen anything that addresses the issues
> that made it necessary in the first place.

What were the issues? A missing .debug_frame? No current debugger needs one, they can use .eh_frame.
It's good form to have a try push in progress even before you ask for review, especially since you need one before landing anyway:

https://wiki.mozilla.org/Tree_Rules/Inbound
(In reply to Jim Blandy :jimb from comment #8)
> It's good form to have a try push in progress even before you ask for
> review, especially since you need one before landing anyway:
> 
> https://wiki.mozilla.org/Tree_Rules/Inbound

See comment 6.  :-)

Also, that's really what the writing on paper says.  Many people (myself included) push patches which they think are not risky without try runs...
(In reply to Ehsan Akhgari [:ehsan] from comment #9)
> (In reply to Jim Blandy :jimb from comment #8)
> > It's good form to have a try push in progress even before you ask for
> > review, especially since you need one before landing anyway:
> > 
> > https://wiki.mozilla.org/Tree_Rules/Inbound
> 
> See comment 6.  :-)

D'oh! Thanks; sorry for missing that.
(In reply to Rafael Ávila de Espíndola (:espindola) from comment #6)
> > Breakpad, at least, uses symbol files derived from the non-stripped
> > executables; it doesn't depend on what's in the shipped executable. It does
> > the unwinding on the crash server, using a saved copy of the unwinding data.
> 
> Not for unwinding. On 64 bits it first tries cfi and then guessing.

This isn't correct. The dump_syms tool, which produces the .sym files we consume to create a stack trace, checks for both .debug_frame and .eh_frame sections. On Linux:

http://dxr.mozilla.org/mozilla-central/toolkit/crashreporter/google-breakpad/src/common/linux/dump_symbols.cc.html#l550

and on Mac:

http://dxr.mozilla.org/mozilla-central/toolkit/crashreporter/google-breakpad/src/common/mac/dump_syms.mm.html#l379

These are consolidated to create the data that FindCFIFrameInfo returns. So both .debug_frame and .eh_frame work fine for breakpad.

 From
> toolkit/crashreporter/google-breakpad/src/processor/stackwalker_amd64.cc:
> 
> -------------------------------------------------
>   // If we have DWARF CFI information, use it.
>   scoped_ptr<CFIFrameInfo> cfi_frame_info(
>       resolver_ ? resolver_->FindCFIFrameInfo(last_frame) : NULL);
>   if (cfi_frame_info.get())
>     new_frame.reset(GetCallerByCFIFrameInfo(frames, cfi_frame_info.get()));
> 
>   // If CFI failed, or there wasn't CFI available, fall back
>   // to stack scanning.
>   if (!new_frame.get()) {
> -------------------------------------------

This is consuming the data that the dump_syms code I referenced above parsed.

> > If the other assembly code produced by GCC hasn't created a .eh_frame
> > section in one way or another, then the later "CFI(...)" annotations in
> > MethodJIT.cpp will cause an error at assembly time, as they attempt to place
> > directives in a section that doesn't exist.
> 
> Incorrect. Gcc will print .cfi_* directives, but use the default that if no
> .cfi_sections is present than the assembler implicitly creates a .eh_frame.
> The x86_64 abi requires that every function have a .eh_frame entry.

I had misremembered the assembly-time error. Bug 645111 comment 8 explains why we couldn't take out the ".cfi_sections .debug_frame" directive. It seems that the assembler silently dropped the CFI data without that directive. Have you verified that getting a backtrace through a trampoline on x86?
(In reply to Jim Blandy :jimb from comment #11)
> (In reply to Rafael Ávila de Espíndola (:espindola) from comment #6)
> > > Breakpad, at least, uses symbol files derived from the non-stripped
> > > executables; it doesn't depend on what's in the shipped executable. It does
> > > the unwinding on the crash server, using a saved copy of the unwinding data.
> > 
> > Not for unwinding. On 64 bits it first tries cfi and then guessing.

The problem is not what minidump does. That only happens offline. Just like debug_frame, we don't include the .sym files when we ship firefox. This means that we require frame pointers to be able to unwind online in functions created in this file.

> I had misremembered the assembly-time error. Bug 645111 comment 8 explains
> why we couldn't take out the ".cfi_sections .debug_frame" directive. It
> seems that the assembler silently dropped the CFI data without that
> directive. Have you verified that getting a backtrace through a trampoline
> on x86?

Not sure how to trigger a crash in that, but the unwind information is there. Downloaded the opt package from try and run:

$ readelf -sdW firefox/libxul.so  | grep JaegerTrampoline$
178736: 000000000195b344     0 NOTYPE  LOCAL  DEFAULT   10 JaegerTrampoline

$ ~/build/src/tools/linux/dump_syms/dump_syms firefox/libxul.so | grep -A10 195b344
firefox/libxul.so: file contains no debugging information (no ".stab" or ".debug_info" sections)
STACK CFI INIT 195b344 3d .cfa: $rsp 8 + .ra: .cfa -8 + ^
STACK CFI 195b345 $rbp: .cfa -16 + ^ .cfa: $rsp 16 +
STACK CFI 195b348 .cfa: $rbp 16 +
STACK CFI 195b34a $r12: .cfa -24 + ^
STACK CFI 195b34c $r13: .cfa -32 + ^
STACK CFI 195b34e $r14: .cfa -40 + ^
STACK CFI 195b350 $r15: .cfa -48 + ^
STACK CFI 195b351 $rbx: .cfa -56 + ^

The same lines are available in libxul.so.sym.

For the reason explained before this fails in the parent push in mc:
$ readelf -sdW firefox/libxul.so  | grep JaegerTrampoline$
178736: 000000000195b344     0 NOTYPE  LOCAL  DEFAULT   10 JaegerTrampoline
$ ~/build/src/tools/linux/dump_syms/dump_syms firefox/libxul.so | grep -A10 195b344
// no output
Those look like 64-bit addresses and register names to me. My question was about x86 executables, as those are the ones that might lack .eh_frame sections.
Comment on attachment 672873 [details] [diff] [review]
correct patch

I'm going to r- this for now; to approve, I need two things:

1) An explanation of which unwinder is adversely affected by the status quo. (Breakpad is fine with that directive in place; you misunderstand how breakpad works.)

2) Directl verification that the problems described in Bug 645111 comment 8 are not reintroduced by the patch on x86 (not x86_64).
Attachment #672873 - Flags: review?(jimb) → review-
32 bits!?! This is a nop in 32 bits with the gcc we use. Copy and past the command line that the build uses to compile MethodJIT.cpp and add -save-temps. Then run:

$ grep debug_frame *.s
        .cfi_sections   .debug_frame
        .cfi_sections .debug_frame

One line is from the ASM, the other one is produced by gcc itself. Not convinced? Try:

$ rm MethodJIT.o
$ make -s MethodJIT.o
MethodJIT.cpp
$ md5sum  MethodJIT.o
b99894cc0d1db6fc1e4d79e8e923c398  MethodJIT.o

Replace the CFI line with an empty one so that the line info doesn't change:

$ make -s MethodJIT.o
MethodJIT.cpp
$ md5sum  MethodJIT.o 
b99894cc0d1db6fc1e4d79e8e923c398  MethodJIT.o

Identical md5.
I think that counts for 2). We still don't have an explanation for the problems Steve described, but maybe that doesn't matter.

How about 1)?
(In reply to Jim Blandy :jimb from comment #16)
> I think that counts for 2). We still don't have an explanation for the
> problems Steve described, but maybe that doesn't matter.

I believe him that there can be a gcc +gas + foobar unwinder (gdb?) combination that is broken, but I really don't know which one is it.

> How about 1)?

Any one that is running online, where "adversely affected" means: Requires frame pointers. Current examples include: gdb and valgrind. In the near future that will include running breakpad online, which is what we want to replace the unwinder in the gecko profiler with.
Attachment #672873 - Flags: review- → review?(ted)
Hi, Ted. I'm happy to have you review this patch instead of me. Hopefully I made the reasons for my r- clear in the discussion above, but it'll be good for someone to have a second look.
Okay, Rafael asked me to comment on this bug a while ago and I just hadn't found time to devote to it. I think you guys are talking past each other, mostly. Rafael is asserting that that comment is not really correct, and we're always going to have a .eh_frame section on x86-64, so we should just put the CFI there. We are in fact (in bug 779291), moving the in-tree SPS profiler to use Breakpad for unwinding, and thus Breakpad's CFI parsing code against the stripped binaries that we ship. In this situation, the CFI for these methods really should be in .eh_frame, so that we can use them on the release binaries.

My only question here is that given that we run dump_syms on the *unstripped* binaries, and Breakpad will prefer .debug_frame to .eh_frame, will it find the CFI for these methods without consulting .eh_frame?
> My only question here is that given that we run dump_syms on the
> *unstripped* binaries, and Breakpad will prefer .debug_frame to .eh_frame,
> will it find the CFI for these methods without consulting .eh_frame?

It will find the information on the .eh_frame, is that what you mean? Take a look at comment 12, note that the information is in the .sym files produced on try.
Comment on attachment 672873 [details] [diff] [review]
correct patch

Review of attachment 672873 [details] [diff] [review]:
-----------------------------------------------------------------

Ok thanks, there was a lot of info in this bug, it was hard to digest!
Attachment #672873 - Flags: review?(ted) → review+
https://hg.mozilla.org/mozilla-central/rev/7de261e17390
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla20
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #19)
> My only question here is that given that we run dump_syms on the
> *unstripped* binaries, and Breakpad will prefer .debug_frame to .eh_frame,
> will it find the CFI for these methods without consulting .eh_frame?

Breakpad consults both .debug_frame and .eh_frame. If both are present, it merges the data.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: