Closed Bug 363590 Opened 16 years ago Closed 16 years ago

Support DWARF2 debugging

Categories

(Firefox Build System :: General, defect)

PowerPC
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jhpedemonte, Assigned: stanshebs)

Details

Attachments

(2 files, 1 obsolete file)

T Rowley wrote:
> I keep a casual watch on Webkit (mainly to see how their SVG is
> progressing), and have been seeing reports of the benefits of using
> the DWARF debugging format on OS-X.  Apparently the debugging
> information is quite a bit smaller and linking is *much* faster.
>
> Webkit did try switching to DWARF earlier when it was first
> supported, in XCode 2.3, but apparently there were issues actually
> trying to use it for debugging.  Reports are that XCode 2.4.1 has
> fixed those problems, so we might want to experiment with this or
> even make it the default if we detect that the version of XCode is
> recent enough.

DWARF2 debugging info can be enabled in the Mozilla build by adding "--enable-debug=-gdwarf-2" to the .mozconfig file.  The build completes just fine with that flag, but when debugging, gdb spits out errors like this one:

warning: Could not find object file
"/mozilla/trunk/debug-dwarf/intl/uconv/src/nsUnicodeToCP1255.o" - no
debug information available for
"/mozilla/trunk/mozilla/intl/uconv/ucvlatin/nsUnicodeToCP1255.cpp".

The reason for gdb not finding the symbol data seems to be related to how static libs are linked into dylibs.  On other Unix/Linux OSes, when linking a shared library, the static libs are referenced directly.  On Mac, the static libs are first extracted into object files in the local directory, and those object files are then linked together to form the dylib.

This is because Apple's linker doesn't have any of the standard flags to handle static libs (such as "--whole-archive").  Therefore, the configure script sets NO_LD_ARCHIVE_FLAGS to 1 on Mac, forcing the static libs to be extracted.

Did some research and found that Apple's linker does have a similar flag, "-all_load".  This patch adds that flag (and unsets NO_LD_ARCHIVE_FLAGS), and with that, I no longer get the gdb warnings and can debug just fine.

However, this flag is not like "--whole-archive" and its ilk.  The latter has a corresponding flag to unset that behavior ("--no-whole-archive"), such that the two flags are added around the static libs list on the command line.  Not so for "-all_load";  it is all or nothing.  I don't know if this causes any issues, particularly when linking in any of the C or system libs.
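To make the difference in scope concrete, here is a minimal sketch that builds the two argument lists.  It is not the patch attached to this bug; the function name and the library names are hypothetical, and the flags are shown as the linker itself would see them (a compiler driver may need them passed through differently).

```python
def whole_archive_args(static_libs, other_libs, apple_ld=False):
    """Return linker arguments that pull in every member of static_libs."""
    if apple_ld:
        # -all_load has no matching "off" switch, so it applies to every
        # archive on the command line, including those in other_libs.
        return ["-all_load"] + list(static_libs) + list(other_libs)
    # GNU ld style: only the archives between the two flags are affected.
    return (["--whole-archive"] + list(static_libs)
            + ["--no-whole-archive"] + list(other_libs))


# Hypothetical library names, for illustration only.
print(whole_archive_args(["libfoo_s.a"], ["-lm"]))
print(whole_archive_args(["libfoo_s.a"], ["-lm"], apple_ld=True))
```

In the GNU case "-lm" sits outside the bracketed region; in the Apple case there is no way to exclude it, which is the concern raised above.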
Version: 1.8 Branch → Trunk
I'm ideally qualified. :-)
Assignee: nobody → stanshebs
Been meaning to post this, but got sidetracked by the holidays.  The patch here won't work.  It worked for whatever build of Firefox I had, but when I updated to latest trunk, it no longer worked.  And it doesn't work for XULRunner at all.  In both cases, I get many "symbol previously defined" errors.  I'm not sure, though, why some of those messages are errors (as in my case) while others are only warnings (as can be seen when building Mozilla on OS X today).

After rereading the docs, I understand what is going on.  The DWARF2 debug info is kept in the object files.  So when the dylib is built, it uses the object files that were extracted into the current directory.  However, after the dylib is created, these object files are deleted.  So that is why gdb cannot find the debug data.
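For anyone wanting to check which object files gdb is looking for, a small sketch along these lines may help.  It assumes the warning text has the same shape as the one quoted earlier in this bug; the function name is hypothetical, and other gdb versions may word the message differently.

```python
import re

# Matches the warning once its line wrapping has been collapsed.
WARNING_RE = re.compile(
    r'warning: Could not find object file "([^"]+)" - '
    r'no debug information available for "([^"]+)"'
)


def missing_objects(gdb_output):
    """Return (object_path, source_path) pairs named in gdb's warnings."""
    # gdb wraps long warnings across lines, so collapse all whitespace first.
    flat = " ".join(gdb_output.split())
    return WARNING_RE.findall(flat)


sample = '''warning: Could not find object file
"/mozilla/trunk/debug-dwarf/intl/uconv/src/nsUnicodeToCP1255.o" - no
debug information available for
"/mozilla/trunk/mozilla/intl/uconv/ucvlatin/nsUnicodeToCP1255.cpp".'''

for obj, src in missing_objects(sample):
    print(obj, "is missing; it holds the debug info for", src)
```

Each reported object path is a place where gdb expects a .o file to still exist after the dylib has been linked.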

So one solution would be to not delete the object files after they are extracted.  But this means we now have duplicates, greatly increasing the hard drive space necessary for a build.

Another solution is to create a dSYM library for each dylib, using the dsymutil tool.  The dSYM lib would contain all of the debug data pulled from the object files.  I quickly hacked this into the build, and it worked.  The only downside was that creating the dSYM libraries takes a long time, especially for large dylibs such as layout.

It would be best if we had the appropriate linker flags like Linux does.  For now, building dSYM libraries seems to be the best solution.  We could put this into the build for those that want to debug with DWARF2.  But it shouldn't be the default on supported systems until the speed issue is solved.
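As a rough sketch of the dSYM route (not the hack described above, which was done in the build system), the following builds one dsymutil invocation per dylib without running anything.  The function name and the dylib name are hypothetical; the sketch relies on dsymutil's default behavior of writing its bundle next to the input as <input>.dSYM when no output option is given.

```python
def dsymutil_commands(dylibs):
    """Return (command, expected_bundle) pairs, one per dylib."""
    commands = []
    for dylib in dylibs:
        # With no output option, the bundle is named after the input file.
        commands.append((["dsymutil", dylib], dylib + ".dSYM"))
    return commands


# Hypothetical dylib name, for illustration only.
for cmd, bundle in dsymutil_commands(["libexample.dylib"]):
    print(" ".join(cmd), "->", bundle)
```

Each command could be run with subprocess.run(cmd, check=True); as noted above, expect that step to be slow for large dylibs.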
Yeah, it's unfortunate that both the .o and the dsym approaches are flawed; this is a public bug system, so can't say more about why, ahem. The .o way is probably the way to go actually, dsym being designed more for relatively-stable frameworks that don't get recompiled a lot, pretty much the opposite of the Moz model. Making it an option would help the people (like me) willing to take the penalties to get better C++ support.
So is there any way that you can think of that would allow us to do this but not keep around two copies of each .o file?
Could we use symlinks instead of copying the .o's?
I just checked on my dwarf2 build, and yes, symlinks to the .o files work for finding the debug symbols.
As a break from layout code, I'm looking at this again.  The symlinking does seem to work correctly, so I'm looking at two solutions. One is to come up with a way to mass-produce symlinks in all the right places. The other is to figure out how to get GDB to find the object files at their original locations, sort of a dwarf-symbol equivalent to the "path" command for source files.
Here are the numbers for not deleting the .o files: the objdir is 683M with deletion and 898M without. It's not twice the size because many object files are linked directly into dylibs. Both of these compare favorably with the 1781M size of a stabs-using objdir, which is twice as large because there are two copies of all the stabs info.

Usagewise, all seems OK so far, perhaps not winning as much at stepping as I was hoping for.
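The objdir sizes quoted above can be put side by side with a little arithmetic.  The three figures are taken from the comment; the variable names are just labels for this sketch.

```python
with_deletion = 683      # MB, DWARF2 objdir with extracted .o files deleted
without_deletion = 898   # MB, DWARF2 objdir with extracted .o files kept
stabs = 1781             # MB, stabs objdir

extra = without_deletion - with_deletion
print(f"keeping the .o files costs {extra}M ({extra / with_deletion:.0%} more)")
print(f"stabs objdir is {stabs / without_deletion:.2f}x the larger DWARF2 objdir")
print(f"stabs objdir is {stabs / with_deletion:.2f}x the smaller DWARF2 objdir")
```

So keeping the object files adds 215M, roughly a third more, and the stabs objdir is still about twice the size of the larger DWARF2 one.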
Attached patch rules.mk tweak to try out (obsolete)
Here's the change I've been testing with - it simply leaves some .o files lying around where GDB expects to find them. Try it on code that has been troublesome to debug, see if dwarf2 is actually an improvement. Beware, I didn't bother to conditionalize, so it will make stabs objdirs bigger too, but it won't actually break anything. You still need the --enable-debug=-gdwarf-2 in your mozconfig.
Here is a version of the patch that makes symlinks instead of leaving copies around. It's conceptually simple, just sniffs for a .o where the .a was and makes a symlink if found. If that doesn't work, it leaves the unpacked copy of the .o in place, so it's guaranteed to be found by the debugger either way.
Attachment #253823 - Attachment is obsolete: true
Attachment #254364 - Flags: review?(benjamin)
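The symlink-or-keep logic described for the rules.mk patch can be sketched as follows.  This is an illustration in Python, not the actual patch: the function name and its arguments are hypothetical, and it assumes the original .o sits in the same directory as the .a it was archived from.

```python
import os


def link_or_keep(extracted_dir, archive_path, member):
    """Replace an extracted .o with a symlink to the original, if one exists.

    Returns "symlink" or "copy" to say which form was left in extracted_dir.
    """
    extracted = os.path.abspath(os.path.join(extracted_dir, member))
    # Assumption: the original object file lives next to the archive.
    archive_dir = os.path.dirname(os.path.abspath(archive_path))
    original = os.path.join(archive_dir, member)
    if os.path.isfile(original) and original != extracted:
        if os.path.lexists(extracted):
            os.remove(extracted)
        os.symlink(original, extracted)
        return "symlink"
    # No original found: leave the unpacked copy so the debugger can use it.
    return "copy"
```

Either way a .o ends up at the path the debugger will look in, which is the guarantee the comment above describes.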
Comment on attachment 254364
Better patch to rules.mk

Boy, this makes me sad. Do you know why we do the whole unarchive-then-link thing, instead of just linking the lib.a directly?
Attachment #254364 - Flags: review?(benjamin) → review+
I don't know the specific reason, but Mach-O has long had difficulties with multi-stage linking, typically with symbol visibility. Flat namespaces, two-level namespaces, etc, all get into the mix - even after six years of working with it at Apple, I never really understood it all. Probably it's worth retrying skipping the unarchive step to see if it still breaks.
(In reply to comment #12)
It's because there's no --whole-archive option in the Mac OS X ld.
does -all_load do the same thing?

We'd only need it if we're making a static library from SHARED_LIBRARY_LIBS. If making a component as a shared library, we'd probably only need to specify -u _NSGetModule.
-all_load does the same thing, but applies to the entire command line.  With --whole-archive, you can place your static libraries between --whole-archive and --no-whole-archive so that only they are affected.  We might not care.
As I mentioned in comment #3, I had problems when using "-all_load":  many "symbol previously defined" errors.  I was never able to figure it out, though.
landed on trunk
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
I just tried this, and it works great but I get lots of "warning: .o file ... more recent than executable timestamp" in gdb.  Which is odd, since in fact the firefox-bin timestamps seem to be later than the .o symlink timestamps...
Or could that warning have to do with me using --enable-prebinding?
Product: Core → Firefox Build System