Bug 670659 - gdb doesn't display line numbers for some functions in debug builds
Status: RESOLVED FIXED
Product: Core
Classification: Components
Component: Build Config
Version: Trunk
Platform: All Linux
Importance: -- normal
Target Milestone: mozilla8
Assigned To: Mike Hommey [:glandium]
Depends on: 673921 690682
Blocks: 537857
Reported: 2011-07-11 06:57 PDT by Mike Hommey [:glandium]
Modified: 2011-09-30 01:03 PDT (History)


Attachments
Detect GNU ld bug with debugging symbols when using --gc-sections and don't use it then (3.70 KB, patch)
2011-07-13 10:55 PDT, Mike Hommey [:glandium]
no flags
Detect GNU ld bug with debugging symbols when using --gc-sections and don't use it then (4.23 KB, patch)
2011-07-15 09:37 PDT, Mike Hommey [:glandium]
khuey: review+
Detect GNU ld bug with debugging symbols when using --gc-sections and don't use it then. (7.54 KB, patch)
2011-07-19 08:49 PDT, Mike Hommey [:glandium]
no flags
Detect GNU ld bug with debugging symbols when using --gc-sections and don't use it then. (7.55 KB, patch)
2011-07-20 01:42 PDT, Mike Hommey [:glandium]
no flags
Detect GNU ld bug with debugging symbols when using --gc-sections and don't use it then. (7.53 KB, patch)
2011-07-21 05:42 PDT, Mike Hommey [:glandium]
ted: review+
christian: approval-mozilla-aurora-

Description Mike Hommey [:glandium] 2011-07-11 06:57:40 PDT
After the landing of bug 537857, some people reported that they aren't getting file and line numbers in gdb when debugging debug builds.

It turns out some versions of gcc, combined with how ld works, end up creating a wrong .debug_ranges section, which in turn prevents gdb from displaying file and line numbers.

There seem to be different scenarios in which this happens, but an apparently reliable way to reproduce it with a small test case is:
-----8<-----
int foo() {
  return 42;
}

int bar() {
  return 1;
}

int main() {
  return foo();
}
----->8-----

Build with:
gcc -o test test.c -Wl,--gc-sections -ffunction-sections -g

Running under gdb:
$ gdb ./test
(gdb) start
Temporary breakpoint 1 at 0x400461
Starting program: /tmp/test 

Temporary breakpoint 1, 0x0000000000400461 in main ()

$ objdump -WR test

test:     file format elf64-x86-64

Contents of the .debug_ranges section:

    Offset   Begin    End
    00000000 0000000000400452 000000000040045d 
    00000000 <End of list>

0x0000000000400461 is not within the only existing range, which is that of foo; main doesn't have a corresponding range.
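
The check described above can be sketched in a few lines of Python. This is only an illustration, not part of any patch: the helper names (parse_ranges, covered) are made up, and it assumes the Begin/End columns are absolute addresses forming half-open [begin, end) ranges, as in the listing above.

```python
import re

# One "offset begin end" row of `objdump -WR` output (lowercase hex fields).
RANGE_ROW = re.compile(r"^\s*([0-9a-f]+)\s+([0-9a-f]+)\s+([0-9a-f]+)\b")

def parse_ranges(objdump_output):
    """Return (begin, end) address pairs from a .debug_ranges listing."""
    ranges = []
    for line in objdump_output.splitlines():
        m = RANGE_ROW.match(line)
        if m:
            ranges.append((int(m.group(2), 16), int(m.group(3), 16)))
    return ranges

def covered(address, ranges):
    """True if address falls inside some half-open [begin, end) range."""
    return any(begin <= address < end for begin, end in ranges)

# Listing copied from the objdump output above.
SAMPLE = """\
Contents of the .debug_ranges section:

    Offset   Begin    End
    00000000 0000000000400452 000000000040045d 
    00000000 <End of list>
"""

ranges = parse_ranges(SAMPLE)
print(covered(0x400461, ranges))  # breakpoint address in main -> False
print(covered(0x400452, ranges))  # first address of the foo range -> True
```

The header and "<End of list>" rows are skipped because they do not consist of three hex fields.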

Testing in a Fedora 13 chroot, it looks like this problem happens with the Mozilla codebase on debug builds (opt or non-opt) and on non-debug+non-opt builds. Non-debug opt builds seem to be exempt from the problem.

On the buildbots, it doesn't happen. Nor does it happen with gcc 4.4 on Debian (Fedora 12 and Fedora 13 both have gcc 4.4), but with that version, the ranges information is just empty for the test case above.

With gcc 4.6 and ld 2.21.52, ranges look like this:
$ objdump -WR test

test:     file format elf64-x86-64

Contents of the .debug_ranges section:

    Offset   Begin    End
    00000000 0000000000400492 000000000040049d 
    00000000 0000000000000001 0000000000000001 (start == end)
    00000000 000000000040049d 00000000004004ad 
    00000000 <End of list>

It keeps a range for the gc'ed section, but makes it void.
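
To make the "void" entry concrete, here is a small Python sketch that separates live ranges from void ones (begin == end). The function name split_ranges is hypothetical; the input pairs are copied from the listing above.

```python
def split_ranges(rows):
    """Split (begin, end) pairs into live ranges and void (begin == end) ones."""
    live = [(b, e) for b, e in rows if b < e]
    void = [(b, e) for b, e in rows if b == e]
    return live, void

# Begin/End pairs from the gcc 4.6 + ld 2.21.52 listing above.
rows = [(0x400492, 0x40049d), (0x1, 0x1), (0x40049d, 0x4004ad)]

live, void = split_ranges(rows)
print(len(live), len(void))  # 2 1
```

A void entry covers no address at all, so it cannot match any program counter value.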

As a temporary workaround for people who want working debugging symbols, removing the .debug_ranges section from libxul.so with objcopy seems to work, though I'm completely unsure what kind of ill effects this can have.
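
The exact objcopy invocation isn't given here; one plausible form uses objcopy's --remove-section option. The Python sketch below only builds and prints that command. The helper name and the output file name libxul.noranges.so are assumptions for illustration, and writing to a separate file rather than in place is a choice made to keep the original library around.

```python
import shlex

def strip_debug_ranges_cmd(src, dst):
    """Build an objcopy command that copies src to dst without .debug_ranges."""
    return ["objcopy", "--remove-section=.debug_ranges", src, dst]

# Hypothetical file names; adjust to the actual location of libxul.so.
cmd = strip_debug_ranges_cmd("libxul.so", "libxul.noranges.so")
print(shlex.join(cmd))
```

To actually run it, the list can be passed to subprocess.run(cmd, check=True).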
Comment 1 Mike Hommey [:glandium] 2011-07-13 10:53:21 PDT
In the end, this looks like an actual bug in old versions of GNU ld, one that very much depends on the debugging information gcc generates. This explains why I was able to reproduce the lack of line numbers in gdb on nsBlockFrame::Reflow with gcc 4.4 + ld from Fedora 12, but not with gcc 4.5 + the same ld. On the other hand, gcc 4.4 + a newer ld doesn't exhibit the problem. However, it's entirely possible that the gcc 4.5 + same ld variant wasn't actually working, and that some other place didn't get line numbers, as it depends on how the ranges are defined, how inlining happens, and which functions are gc'ed.

By the way, it turns out gcc 4.5 + ld on our build bots *is* affected on linux64, but not on linux. On linux64, ld removes the ranges for functions that have been gc'ed and, at the same time, strips the remaining ranges in the same compilation unit. On linux, ld seems to put wrong values in the ranges for gc'ed functions instead. These wrong values aren't expected to have any effect on debugging with gdb, contrary to the stripping.

The test case in comment 0 is actually good enough to accurately catch the ld bug, so having configure check for it should suffice to decide whether to enable --gc-sections. The downside is that it will effectively be disabled on our linux64 builds, which is a shame, considering it will mean a 1.3MB bigger libxul.so.

Please note that our symbol dumper doesn't use debug ranges information, which means crash reports are not affected by this bug, but debug symbols downloadable from symbols.mozilla.org are. Yesterday's linux64 opt nightly exhibits the lack of line numbers in gdb on e.g. nsAutoFloatManager::CreateFloatManager. As I wrote in comment 0, though, stripping the .debug_ranges section seems to be a good workaround.

Considering the actual (apparently limited) effect on debugging of our release builds, I'm not entirely sure we'd need to fix this on aurora.

Note that installing a newer binutils on our buildbots should fix the issue.
Comment 2 Mike Hommey [:glandium] 2011-07-13 10:55:18 PDT
Created attachment 545705
Detect GNU ld bug with debugging symbols when using --gc-sections and don't use it then

This is work in progress. The check would be better done by comparing debug_ranges from an object file and from the resulting linked binary.
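
One possible reading of that comparison, sketched in Python: count the range entries in the object file and in the linked binary, and treat the linker as suspect if entries vanished instead of being kept (possibly voided). This is a hypothetical heuristic, not the logic of the attached patch, and the object-file rows below are made-up illustrative values; only the linked rows come from the listings in comment 0.

```python
def ld_drops_ranges(object_rows, linked_rows):
    """Hypothetical heuristic: True if the linked binary has fewer range
    entries than the object file it was built from."""
    return len(linked_rows) < len(object_rows)

# Illustrative (made-up) rows for test.o: one entry each for foo, bar, main.
object_rows = [(0x0, 0xb), (0x0, 0xb), (0x0, 0x10)]

# Linked rows copied from the two objdump listings in comment 0.
old_ld_rows = [(0x400452, 0x40045d)]
new_ld_rows = [(0x400492, 0x40049d), (0x1, 0x1), (0x40049d, 0x4004ad)]

print(ld_drops_ranges(object_rows, old_ld_rows))  # True
print(ld_drops_ranges(object_rows, new_ld_rows))  # False
```

A plain count like this can be thrown off when other object files contribute their own ranges to the linked binary, so it should be taken as a starting point only.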
Comment 3 Mike Hommey [:glandium] 2011-07-15 09:37:10 PDT
Created attachment 546174
Detect GNU ld bug with debugging symbols when using --gc-sections and don't use it then
Comment 4 Kyle Huey [:khuey] (khuey@mozilla.com) 2011-07-18 09:02:12 PDT
Comment on attachment 546174
Detect GNU ld bug with debugging symbols when using --gc-sections and don't use it then

Oh boy, this looks like fun.

Add in some AC_MSG_RESULTs so that the logs tell whether or not they have a broken ld?
Comment 5 Mike Hommey [:glandium] 2011-07-18 09:06:43 PDT
(In reply to comment #4)
> Add in some AC_MSG_RESULTs so that the logs tell whether or not they have a
> broken ld?

AC_CACHE_CHECK displays the variable given as second argument as an AC_MSG_RESULT.

i.e. one would see "Checking whether removing dead symbols breaks debugging... yes/no"

Isn't that enough? Or would you prefer "no, your ld is broken" ?
Comment 6 Kyle Huey [:khuey] (khuey@mozilla.com) 2011-07-18 09:08:24 PDT
(In reply to comment #5)
> (In reply to comment #4)
> > Add in some AC_MSG_RESULTs so that the logs tell whether or not they have a
> > broken ld?
> 
> AC_CACHE_CHECK displays the variable given as second argument as an
> AC_MSG_RESULT.
> 
> i.e. one would see "Checking whether removing dead symbols breaks
> debugging... yes/no"
> 
> Isn't that enough? Or would you prefer "no, your ld is broken" ?

That's what I wanted, I just forgot that it does that for you.
Comment 8 Mike Hommey [:glandium] 2011-07-19 00:25:55 PDT
Backed out because of Android bustage:
http://hg.mozilla.org/integration/mozilla-inbound/rev/3abbd2edc173
Comment 9 Mike Hommey [:glandium] 2011-07-19 01:59:40 PDT
Two problems:
- There was a missing -c when building the object file, which prevented the object file from being built (oops)
- The heuristic used doesn't work on android, because some of the crt object files contain debugging symbols, which end up mixed with the rest in the resulting program.
Comment 10 Mike Hommey [:glandium] 2011-07-19 08:49:36 PDT
Created attachment 546798
Detect GNU ld bug with debugging symbols when using --gc-sections and don't use it then.

So, this addresses both issues listed in comment 9. I didn't feel like importing a complete python DWARF parser, or building the one from breakpad during configure just to detect that silly bug, so I'm parsing the (varying) output of objdump in a kind of hackish way.
Comment 11 Mike Hommey [:glandium] 2011-07-20 01:42:16 PDT
Created attachment 547007
Detect GNU ld bug with debugging symbols when using --gc-sections and don't use it then.

The previous version didn't yield the right result for newer ld, which actually doesn't have the problem, because the objdump entries for ranges that were gc'ed contain more than just the three values (offset, begin, end).
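
As an illustration of that pitfall (not the code from the patch), a row parser that only looks at the first three whitespace-separated fields tolerates trailing annotations such as "(start == end)". The function name parse_row is made up for this sketch.

```python
def parse_row(line):
    """Return (begin, end) from an objdump -WR row, or None for other lines.

    Only the first three fields are inspected, so anything after them,
    such as "(start == end)", is ignored.
    """
    fields = line.split()
    if len(fields) < 3:
        return None
    try:
        offset, begin, end = (int(f, 16) for f in fields[:3])
    except ValueError:
        # Header rows and "<End of list>" are not three hex fields.
        return None
    return begin, end

print(parse_row("    00000000 0000000000000001 0000000000000001 (start == end)"))
print(parse_row("    00000000 <End of list>"))
```

A parser that insists on exactly three fields would skip the first row above and undercount the ranges produced by newer ld.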
Comment 12 Mike Hommey [:glandium] 2011-07-21 05:42:48 PDT
Created attachment 547375
Detect GNU ld bug with debugging symbols when using --gc-sections and don't use it then.

Removed a spurious print.
Comment 13 Ted Mielczarek [:ted.mielczarek] 2011-07-21 12:11:49 PDT
Comment on attachment 547375
Detect GNU ld bug with debugging symbols when using --gc-sections and don't use it then.
Comment 15 Mike Hommey [:glandium] 2011-07-21 23:54:51 PDT
Comment on attachment 547375
Detect GNU ld bug with debugging symbols when using --gc-sections and don't use it then.

--gc-sections is used unconditionally on ff7, and while it doesn't matter for our crash stats (they don't use the broken info), downloading the symbols from symbols.mozilla.org and using them in gdb would fail to find line numbers in subtle ways.
Comment 16 :Ehsan Akhgari (busy, don't ask for review please) 2011-07-22 14:19:42 PDT
http://hg.mozilla.org/mozilla-central/rev/631c9b13ec1d
Comment 17 christian 2011-07-26 15:03:30 PDT
Comment on attachment 547375
Detect GNU ld bug with debugging symbols when using --gc-sections and don't use it then.

Denied for mozilla-aurora. We'll wait 3 weeks for the next source migration.
