Closed Bug 537857 Opened 14 years ago Closed 13 years ago

Remove dead symbols with linker

Categories

(Firefox Build System :: General, defect)

x86
Windows NT
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: taras.mozilla, Assigned: glandium)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [ts])

Attachments

(3 files, 1 obsolete file)

On Linux  -fdata-sections -ffunction-sections  gcc flags and -Wl,--gc-sections results in a 5% smaller libxul.so

Windows has similar options accessible via /Gy compiler flag and /OPT:REF linker flags.
Whiteboard: [ts]
We use -OPT:REF when building with MOZ_DEBUG_SYMBOLS (which our nightlies and release builds use):
http://mxr.mozilla.org/mozilla-central/source/config/config.mk#258
Ted you are right, -Gy and and -opt:ref are turned on in windows. 

This is mostly a linux issue. Mac has -dead_strip enabled, but not -ffunction-sections -fdata-sections.
OS: Windows 7 → Windows NT
Blocks: 447581
Attached patch gc sections (obsolete) — Splinter Review
Assignee: nobody → tglek
Attachment #444922 - Flags: review?(ted.mielczarek)
Attachment #444922 - Flags: review?(ted.mielczarek) → review+
Keywords: checkin-needed
http://hg.mozilla.org/mozilla-central/rev/bfcdd30e82bf
Status: NEW → RESOLVED
Closed: 14 years ago
Flags: in-testsuite-
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9.3a5
Isn't there a footprint key word or bug to put this against? Also perf key word?
So... this patch made my Linux debug builds (--enable-debug="-g3") have no debug symbols.  Why exactly are we doing this symbol removal in debug builds?
backed out http://hg.mozilla.org/mozilla-central/rev/714faf8ec149
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
We probably want to stick the CFLAGS in MOZ_OPTIMIZE_FLAGS instead. If the LDFLAGS aren't safe to use on debug builds, then we'll have to figure something out (we don't have a MOZ_OPTIMIZE_LDFLAGS, but we could make one.)
since we link stuff that matters via CXX we can pass linker flags there too.
How will that work with optimized+debug?  Isn't that what we do for all our official builds, so that we have debug info for breakpad?
No, we do optimized+debug symbols (--enable-optimize --enable-debug-symbols).

If we use my suggestion from comment 8, then these symbols will be stripped even if you do a --enable-debug --enable-optimize build, but I think that's ok. If you --enable-optimize, then you should recognize that the compiler is going to do funny things to your code.
"funny things" and "not able to debug" are not the same thing, though...  Fwiw, all my optimized builds are -g and I actually use that reasonably often, because some bugs only show up in optimized builds.  There's obviously some damage from optimization, but not as extensive as stripping debug symbols completely.
This might well have been a gcc bug.

I did two try runs with this patch applied:
- one with gcc 4.5: http://tbpl.mozilla.org/?tree=Try&rev=dde2135a5c39
- one with gcc 4.3: http://tbpl.mozilla.org/?tree=Try&rev=5957b0abdf30

The resulting opt builds apparently all have all symbols, but debug build with gcc 4.3 lacks symbols for libxul.so while the one with gcc 4.5 doesn't.

I think we can re-enable this patch, provided we find some reliable way to automatically opt-out on broken gcc (which means finding exactly what's going wrong).
Mmmmm it actually only failed on linux x86 debug, not linux x64 debug, and it was dump_syms not supporting some of the dwarf content:

dump_syms: ../../../../../../toolkit/crashreporter/google-breakpad/src/common/dwarf_cu_to_module.cc:770: void google_breakpad::DwarfCUToModule::AssignLinesToFunctions(): Assertion `!line || current < line->address || within(*line, current)' failed.

I'll do another try with gcc 4.3 without PGO...
As for sizes:

                    Before     After     Difference
linux libxul.so    21747668  20809236   938432 (4.31%)
linux .tar.bz2     15389810  14984793   405017 (2.63%)

linux64 libxul.so  27209600  25887424  1322176 (4.85%)
linux64 .tar.bz2   17002585  16545591   456994 (2.68%)
(In reply to comment #14)
> Mmmmm it actually only failed on linux x86 debug, not linux x64 debug, and
> it was dump_syms not supporting some of the dwarf content:
> 
> dump_syms:
> ../../../../../../toolkit/crashreporter/google-breakpad/src/common/
> dwarf_cu_to_module.cc:770: void
> google_breakpad::DwarfCUToModule::AssignLinesToFunctions(): Assertion `!line
> || current < line->address || within(*line, current)' failed.
> 
> I'll do another try with gcc 4.3 without PGO...

It's better to be awake... obviously, this won't change anything, since the build was a debug build, thus without PGO.
--gc-sections creates two problems with elfhack:
- it removes some placeholders that are used to subtly modify test.so's layout to make sure some things are tested
- it makes the version of GNU ld used on the buildbots crash (segv), which is a side effect of the first problem (fwiw, the crash happens when all thread local variables are gc'ed)

Putting everything under the default visibility instead of hidden ensures that nothing is gc'ed.
Attachment #542130 - Flags: review?(tglek)
(In reply to comment #14)
> Mmmmm it actually only failed on linux x86 debug, not linux x64 debug, and
> it was dump_syms not supporting some of the dwarf content:
> 
> dump_syms:
> ../../../../../../toolkit/crashreporter/google-breakpad/src/common/
> dwarf_cu_to_module.cc:770: void
> google_breakpad::DwarfCUToModule::AssignLinesToFunctions(): Assertion `!line
> || current < line->address || within(*line, current)' failed.

From my testing, this is an actual bug in our breakpad code, which is fixed in upstream google-breakpad. The upstream issue is http://breakpad.appspot.com/227001 . I'm not sure how we hit something that is supposedly related to gold with our ancient GNU ld, but we apparently do, and the patch attached there looks like it does the right thing. I'll do some more checks that it does indeed.
Comment on attachment 542205 [details] [diff] [review]
part 3 - Fix assert failure in dump_syms in some cases when the linker discarded symbols

<ted> if it got review upstream, then just land it with r=whoever
Attachment #542205 - Flags: review?(ted.mielczarek) → review+
Attachment #542130 - Flags: review?(tglek) → review+
Backed out part 1 because of mochitest regressions :(
http://hg.mozilla.org/integration/mozilla-inbound/rev/69bdf578a5dd
Whiteboard: [ts] [inbound] → [ts]
The reason we had a mochitest regression is that using --gc-sections when linking the firefox binary was striping the jemalloc_stats function. That's because linking binaries doesn't export symbols, so anything that is not explicitely used is dropped. An jemalloc_stats is only there to be used by libxul.so, so for the linker it's not explicitely used...

Putting the option in DSO_LDOPTS instead of LDFLAGS ensures only shared libraries will be built with --gc-sections.
Attachment #542411 - Flags: review?(ted.mielczarek)
Attachment #444922 - Attachment is obsolete: true
Comment on attachment 542411 [details] [diff] [review]
part 1 - Remove dead symbols in linker.

Review of attachment 542411 [details] [diff] [review]:
-----------------------------------------------------------------
Attachment #542411 - Flags: review?(ted.mielczarek) → review+
(In reply to comment #5)
> Isn't there a footprint key word or bug to put this against? Also perf key
> word?

Since it is blocking a "perf" meta bug, should not this also be a "perf" bug?
Part 1 was backed out of mozilla-inbound. Parts 2 and 3 got merged in to mozilla-central:

http://hg.mozilla.org/mozilla-central/rev/ebb9315861a4
http://hg.mozilla.org/mozilla-central/rev/2ac11e401c56
Whiteboard: [ts] [inbound] → [ts]
Part 1 has been relanded on mozilla-inbound (comment 25)
Whiteboard: [ts] → [ts] [inbound]
http://hg.mozilla.org/mozilla-central/rev/f2a0b73c0bef
Status: REOPENED → RESOLVED
Closed: 14 years ago13 years ago
Resolution: --- → FIXED
Whiteboard: [ts] [inbound] → [ts]
Mike, 
This broke --enable-debug on linux. Gdb can't find line numbers anymore
(In reply to comment #30)
> Mike, 
> This broke --enable-debug on linux. Gdb can't find line numbers anymore

It works for me. Are you sure you weren't using stripped binaries?
It seems to be a problem with some versions of GNU ld, and it seems the one on buildbots is not affected nor the one currently in Debian unstable. The one in Fedora 12 is apparently affected, but I'm going to double check.
In the end, this very much looks like a gcc problem. gcc 4.4 from fc12 fails, while gcc 4.5 from fc14 works. (gcc 4.4 from fc12 with ld from fc14 fails too).
As far as my testing goes with the build bots it looks like gcc 4.3 works too, so this might be a gcc 4.4 only issue, but I'd like to double check that.
And it's not even a gcc 4.4 issue... gcc version 4.4.5 (Debian 4.4.5-8) gets me a working build... :-/
This one however fails:
gcc version 4.4.5 20101112 (Red Hat 4.4.5-2) (GCC) 
(version from fedora 13)
Depends on: 670659
Depends on: 675618
Product: Core → Firefox Build System
You need to log in before you can comment on or make changes to this bug.