Closed Bug 670659 Opened 13 years ago Closed 13 years ago

gdb doesn't display line numbers for some functions in debug builds

Categories

(Firefox Build System :: General, defect)

All
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
mozilla8

People

(Reporter: glandium, Assigned: glandium)

Attachments

(1 file, 4 obsolete files)

After the landing of bug 537857, some people reported that they aren't getting file and line numbers in gdb when debugging debug builds.

It turns out some versions of gcc, combined with how ld works, end up creating a wrong .debug_ranges section, which in turn prevents gdb from displaying file and line numbers.

There seem to be different scenarios in which this happens, but an apparently reliable way to reproduce it with a small test case is:
-----8<-----
int foo() {
  return 42;
}

int bar() {
  return 1;
}

int main() {
  return foo();
}
----->8-----

Build with:
gcc -o test test.c -Wl,--gc-sections -ffunction-sections -g

Running under gdb:
$ gdb ./test
(gdb) start
Temporary breakpoint 1 at 0x400461
Starting program: /tmp/test 

Temporary breakpoint 1, 0x0000000000400461 in main ()

# objdump -WR test

test:     file format elf64-x86-64

Contents of the .debug_ranges section:

    Offset   Begin    End
    00000000 0000000000400452 000000000040045d 
    00000000 <End of list>

0x0000000000400461 is not within the only existing range (which is that of foo; main doesn't have a corresponding range), so gdb can't map the address to a file and line.
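The check described above (is the breakpoint address inside any listed range?) can be sketched in a few lines of Python. This is only an illustration, not part of any patch on this bug: the names parse_ranges and covered are made up for the example, and the parser assumes the textual `objdump -WR` layout shown above (offset, begin, end as lowercase hex columns).

```python
import re

# Matches a range line such as
#   "    00000000 0000000000400452 000000000040045d "
# and tolerates a trailing annotation like "(start == end)".
# Header lines and "<End of list>" do not match.
RANGE_LINE = re.compile(r"^\s*([0-9a-f]+)\s+([0-9a-f]+)\s+([0-9a-f]+)\b")


def parse_ranges(objdump_output):
    """Return a list of (begin, end) pairs found in `objdump -WR` text output."""
    ranges = []
    for line in objdump_output.splitlines():
        match = RANGE_LINE.match(line)
        if match:
            begin = int(match.group(2), 16)
            end = int(match.group(3), 16)
            ranges.append((begin, end))
    return ranges


def covered(address, ranges):
    """True if address lies in some range; DWARF ranges are half-open [begin, end)."""
    return any(begin <= address < end for begin, end in ranges)
```

Feeding it the objdump output above and the address 0x400461 from the gdb session should report that the address is not covered, which matches what gdb shows for main.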

Testing with a Fedora 13 chroot, it looks like this problem happens with the Mozilla codebase on debug builds (opt or non-opt) and on non-debug+non-opt builds. Non-debug opt builds seem to be exempt from the problem.

On the buildbots, it doesn't happen. Nor does it happen on gcc 4.4 on debian (fedora 12 and fedora 13 both have gcc 4.4), but with that version, the ranges information is just empty for the testcase above...

With gcc 4.6 and ld 2.21.52, ranges look like this:
$ objdump -WR test

test:     file format elf64-x86-64

Contents of the .debug_ranges section:

    Offset   Begin    End
    00000000 0000000000400492 000000000040049d 
    00000000 0000000000000001 0000000000000001 (start == end)
    00000000 000000000040049d 00000000004004ad 
    00000000 <End of list>

It keeps a range for the gc'ed section, but makes it void.

As a temporary workaround for people who want working debugging symbols, removing the .debug_ranges section from libxul.so with objcopy seems to work, though I'm completely unsure what kind of ill effects this can have.
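For anyone who wants to script that workaround, here is a rough sketch. It relies on objcopy's --remove-section option; the helper names and the output file name are hypothetical, and it writes to a copy rather than modifying libxul.so in place, since the side effects of dropping the section are unknown.

```python
import subprocess


def strip_debug_ranges_cmd(src, dst):
    """Build the objcopy command that copies src to dst without .debug_ranges."""
    return ["objcopy", "--remove-section=.debug_ranges", src, dst]


def strip_debug_ranges(src, dst):
    """Run objcopy; raises CalledProcessError if objcopy fails."""
    subprocess.run(strip_debug_ranges_cmd(src, dst), check=True)


# Example (hypothetical output name):
#   strip_debug_ranges("libxul.so", "libxul.noranges.so")
```

Keeping the original file around makes it easy to go back if gdb behaves worse without the section.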
In the end, this looks like an actual bug in old versions of GNU ld, which very much depends on the debugging information gcc generates. This explains why I was able to reproduce the lack of line numbers in gdb on nsBlockFrame::Reflow with gcc 4.4 + ld from fedora 12, but not with gcc 4.5 + same ld. On the other hand, gcc 4.4 + newer ld doesn't exhibit the problem.

However, it's entirely possible that the gcc 4.5 + same ld variant wasn't actually working, and that some other place didn't get line numbers, as it depends on how the ranges are defined, how inlines are happening, and which functions are gc'ed.

By the way, it turns out gcc 4.5 + ld on our build bots *is* affected on linux64, but not on linux, where ld seems to be putting wrong values in the ranges for functions that have been gc'ed, instead of removing them and stripping the remaining ones in the same compilation unit at the same time (which happens on linux64). These wrong values aren't expected to have any effect on debugging with gdb, contrary to the stripping.

The test case in comment 0 is actually good enough to accurately catch the LD bug, so getting configure to check for the bug should be good enough to decide whether we want to enable --gc-sections or not, with the downside that it will be effectively disabled on our linux64 builds, which is a shame, considering it will mean a 1.3MB bigger libxul.so.

Please note that our symbol dumper doesn't use debug ranges information, which means crash reports are not affected by this bug, but debug symbols downloadable from symbols.mozilla.org are. Yesterday's linux64 opt nightly exhibits the lack of line numbers in gdb on e.g. nsAutoFloatManager::CreateFloatManager. As I wrote in comment 0, though, stripping the .debug_ranges section seems to be a good workaround.

Considering the actual (apparently limited) effect on debugging of our release builds, I'm not entirely sure we'd need to fix this on aurora.

Note that installing a newer binutils on our buildbots should fix the issue.
This is work in progress. The check would be better done by comparing debug_ranges from an object file and from the resulting linked binary.
Attachment #545705 - Attachment is obsolete: true
Comment on attachment 546174 [details] [diff] [review]
Detect GNU ld bug with debugging symbols when using --gc-sections and don't use it then

Oh boy, this looks like fun.

Add in some AC_MSG_RESULTs so that the logs tell whether or not they have a broken ld?
Attachment #546174 - Flags: review?(khuey) → review+
(In reply to comment #4)
> Add in some AC_MSG_RESULTs so that the logs tell whether or not they have a
> broken ld?

AC_CACHE_CHECK displays the variable given as second argument as an AC_MSG_RESULT.

i.e. one would see "Checking whether removing dead symbols breaks debugging... yes/no"

Isn't that enough? Or would you prefer "no, your ld is broken" ?
(In reply to comment #5)
> (In reply to comment #4)
> > Add in some AC_MSG_RESULTs so that the logs tell whether or not they have a
> > broken ld?
> 
> AC_CACHE_CHECK displays the variable given as second argument as an
> AC_MSG_RESULT.
> 
> i.e. one would see "Checking whether removing dead symbols breaks
> debugging... yes/no"
> 
> Isn't that enough? Or would you prefer "no, your ld is broken" ?

That's what I wanted, I just forgot that it does that for you.
Assignee: nobody → mh+mozilla
Backed out because of Android bustage:
http://hg.mozilla.org/integration/mozilla-inbound/rev/3abbd2edc173
Whiteboard: [inbound]
Two problems:
- There was a missing -c when building the object file, which prevented the object file from being built (oops)
- The heuristic used doesn't work on android, because some of the crt object files contain debugging symbols, which end up mixed with the rest in the resulting program.
So, this addresses both issues listed in comment 9. I didn't feel like importing a complete python DWARF parser, or building the one from breakpad during configure just to detect that silly bug, so I'm parsing the (varying) output of objdump in a kind of hackish way.
Attachment #546798 - Flags: review?(ted.mielczarek)
Attachment #546174 - Attachment is obsolete: true
The previous version didn't yield the right result for newer ld, which doesn't actually have the problem: with newer ld, the objdump entries for ranges that were gc'ed contain more than just the three values (offset, begin, end).
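To illustrate the kind of comparison involved, here is a minimal Python sketch. It is not the code from the attachment: the heuristic (fewer range entries in the linked binary than in the object file means ld stripped ranges) and the function names are assumptions made for this example. The point it shows is that an entry carrying an extra annotation such as "(start == end)" still has to count as an entry.

```python
import re

# One range entry: offset, begin, end as hex columns, optionally followed
# by extra text such as "(start == end)" as printed for voided ranges.
# "<End of list>" and header lines do not match.
ENTRY = re.compile(r"^\s*[0-9a-f]+\s+[0-9a-f]+\s+[0-9a-f]+(\s|$)")


def count_range_entries(objdump_output):
    """Count range entries in the text output of `objdump -WR`."""
    return sum(1 for line in objdump_output.splitlines() if ENTRY.match(line))


def ld_strips_ranges(object_dump, linked_dump):
    """Assumed heuristic: the linked binary lost entries the object file had.

    A linked binary with extra entries (for instance from crt objects that
    carry their own debugging symbols) is not reported as broken.
    """
    return count_range_entries(linked_dump) < count_range_entries(object_dump)
```

With the two objdump outputs from comment 0 as the linked side, an object file listing three ranges would be flagged as broken against the single-range output and as fine against the gcc 4.6 + ld 2.21.52 output.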
Attachment #547007 - Flags: review?(ted.mielczarek)
Attachment #546798 - Attachment is obsolete: true
Attachment #546798 - Flags: review?(ted.mielczarek)
Removed a spurious print.
Attachment #547375 - Flags: review?(ted.mielczarek)
Attachment #547007 - Attachment is obsolete: true
Attachment #547007 - Flags: review?(ted.mielczarek)
Comment on attachment 547375 [details] [diff] [review]
Detect GNU ld bug with debugging symbols when using --gc-sections and don't use it then.

Review of attachment 547375 [details] [diff] [review]:
-----------------------------------------------------------------
Attachment #547375 - Flags: review?(ted.mielczarek) → review+
Comment on attachment 547375 [details] [diff] [review]
Detect GNU ld bug with debugging symbols when using --gc-sections and don't use it then.

--gc-sections is used unconditionally on ff7, and while it doesn't matter for our crash stats (they don't use the broken info), downloading the symbols from symbols.mozilla.org and using them in gdb would fail to find line numbers in subtle ways.
Attachment #547375 - Flags: approval-mozilla-aurora?
http://hg.mozilla.org/mozilla-central/rev/631c9b13ec1d
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Whiteboard: [inbound]
Target Milestone: --- → mozilla8
Depends on: 673921
Comment on attachment 547375 [details] [diff] [review]
Detect GNU ld bug with debugging symbols when using --gc-sections and don't use it then.

Denied for mozilla-aurora. We'll wait 3 weeks for the next source migration.
Attachment #547375 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora-
Depends on: 690682
Product: Core → Firefox Build System