Open Bug 521435 Opened 15 years ago Updated 2 years ago

teach gcc builds to use LTO

Categories

(Firefox Build System :: General, defect)

x86
All
defect

Tracking

(Not tracked)

People

(Reporter: graydon, Unassigned)

Attachments

(4 files)

Last Saturday, gcc acquired the ability to do link-time optimization (LTO), the moral equivalent of MSVC's /LTCG option. We should support this -- or at least give it a try -- on our gcc platforms (mac, linux, etc.) as it's likely to give a substantial across-the-board speedup.

http://gcc.gnu.org/wiki/LinkTimeOptimization
(Er, note, this would be a speedup to execution-time. Build time will probably slow down, perhaps significantly.)
I assume this would require a GCC upgrade?
It's only been merged into the GCC trunk so far. AIUI it will be in GCC 4.5.0. Judging from prior releases, 4.5.0 will come out sometime in Q4.

I've seen a few bugs cropping up on the GCC mailing list so it might be worth holding off until 4.5.0 is out, unless you're feeling optimistic.
Moving to future until a stable release of GCC happens.
Component: Release Engineering → Release Engineering: Future
This should go to Core:Build Config to get support in our build system. If we get to the point of wanting to switch to a new stable version of GCC for nightlies/releases then please file a bug against RelEng for that.
Component: Release Engineering: Future → Build Config
Product: mozilla.org → Core
QA Contact: release → build-config
Version: other → Trunk
Graydon, you were unusually optimistic in filing this bug. I've been working on this with Jan Hubicka in my idle time for the past few weeks; he got gcc trunk to link and start up :)
Assignee: nobody → tglek
Blocks: 577813
No longer blocks: 577813
Depends on: 577813
Finally got some talos numbers of gcc trunk with/without lto.
LTO is a:
- 1% win on sunspider and dromaeo_css;
- 1% regression on tp_dist.

Note these are very preliminary: I'm compiling with -O1 to start with (-O2 is broken on x86; -Os seems broken in general). C LTO breaks NSPR, so I'm not using LTO on C code. I didn't run the full Talos suite yet, just what I felt was most interesting.

The LTO libxul is 33 MB; the non-LTO one is 31 MB. This is on 64-bit.
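To put the size cost in proportion, the relative growth can be computed directly from the two figures above. This is only arithmetic on the reported sizes; the helper name is illustrative.

```python
def relative_change_percent(new: float, base: float) -> float:
    """Relative change of `new` with respect to `base`, in percent."""
    return (new - base) / base * 100.0


# libxul sizes reported in the comment above, in MB (64-bit builds).
LTO_MB, NON_LTO_MB = 33, 31
growth = relative_change_percent(LTO_MB, NON_LTO_MB)
print(f"LTO libxul vs. non-LTO: {growth:+.1f}%")  # 2 MB on top of 31 MB
```

The 2 MB difference works out to roughly 6.5% growth for the LTO build.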
Firefox should now mostly work with GCC LTO.

GCC tracking bug is http://gcc.gnu.org/pr45375

paper http://arxiv.org/abs/1010.2196

The most promising option seems to be building with LTO and -O3 --param inline-unit-growth=5, which gives 28.2 MB; the non-LTO -Os build is 28.2 MB, too.
Performance of the -O3 build is about 1% better with LTO, according to Taras's benchmark.
OOPS, tracking bug URL is http://gcc.gnu.org/PR45375
It seems there have been many LTO improvements to GCC in recent years. We should look into this again.
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #10)
> It seems there have been many LTO improvements to GCC in recent years. We
> should look into this again.

When I tried with 5.1, enabling LTO regressed talos.
(In reply to Mike Hommey [:glandium] from comment #11)
> (In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #10)
> > It seems there have been many LTO improvements to GCC in recent years. We
> > should look into this again.
> 
> When I tried with 5.1, enabling LTO regressed talos.

Bummer. Was libxul any smaller?
I didn't look. Note it was desktop, not mobile. And it was PGO+LTO that was slower than PGO alone.
I'd like to share my results with GCC 4.9.3 and Firefox 39.0, benchmarked with Peacekeeper.

No PGO, no LTO: ~4400 points
LTO only: ~4600 points
PGO only: ~5000 points, xul 68MB
LTO + PGO: ~5500 points, xul 64MB

LTO caused a few crashes; the backtraces showed that they all had a common cause.

Compiler options include 64-bit, -O2 and -march=native.
-march=native can't be used on Mozilla's builds.
IMHO, -march=native gives a constant speedup, so the results should be the same minus some constant value.
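A quick way to compare the Peacekeeper configurations is to express each score relative to the plain build. The sketch below only redoes the arithmetic on the approximate (~) figures reported above for GCC 4.9.3 and Firefox 39.0 on one machine; the dictionary and function names are illustrative.

```python
# Approximate Peacekeeper scores reported above (higher is better).
SCORES = {
    "no PGO, no LTO": 4400,
    "LTO only": 4600,
    "PGO only": 5000,
    "LTO + PGO": 5500,
}
# libxul sizes reported above, in MB.
XUL_MB = {"PGO only": 68, "LTO + PGO": 64}


def percent_gain(value: float, base: float) -> float:
    """Relative change of `value` over `base`, in percent."""
    return (value - base) / base * 100.0


baseline = SCORES["no PGO, no LTO"]
for config, score in SCORES.items():
    print(f"{config:<15} {percent_gain(score, baseline):+6.1f}% vs. no PGO/LTO")

# Marginal effect of adding LTO to an existing PGO build.
print(f"LTO on top of PGO: score {percent_gain(SCORES['LTO + PGO'], SCORES['PGO only']):+.1f}%, "
      f"libxul size {percent_gain(XUL_MB['LTO + PGO'], XUL_MB['PGO only']):+.1f}%")
```

This gives roughly +4.5% (LTO only), +13.6% (PGO only) and +25% (LTO + PGO) over the plain build; adding LTO to PGO is worth about +10% while shrinking libxul by about 5.9%. Since the inputs are rounded, these are ballpark figures.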
I got a successful opt build of LTO, with Talos runs, here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=12ce14a5bcac9975b41a1f901bfc3a8dcb2d791b&selectedJob=165424387

I attached the three patches I used to make that happen.

I'm trying to get a PGO run of it for performance comparisons.
Product: Core → Firefox Build System
I got a successful LTO build; it only needed a gcc patch to succeed. Everything needed is in the four attached patches, although they are illustrative: not the actual changes we would apply if we wanted to pursue this.

Perfherder shows a near-universal 3-9% performance win, except for ARES6. I wonder if that test is correctly configured for up/down gain/loss. (Adding Joel just in case.)

https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=7e5bd52e36fcc1703ced01fe87e831a716677295&framework=2&showOnlyImportant=1&selectedTimeRange=172800
Flags: needinfo?(jmaher)
Wrong link? This only shows 3 results. One of which is a > 1000% increase in warnings.
(In reply to Mike Hommey [:glandium] from comment #26)
> Wrong link? This only shows 3 results. One of which is a > 1000% increase in
> warnings.

Right link, bad options. That's the build metrics view showing only important results. (Build metrics are not the normal view but are accessible from the dropdown.)

Here are the unfiltered performance metrics:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=7e5bd52e36fcc1703ced01fe87e831a716677295&framework=1&selectedTimeRange=172800

The warning increase comes from two things:
- LTO emits One Definition Rule (ODR) and LTO type-mismatch warnings, which we didn't have before and which are numerous. They may indicate real issues; I'm not sure.
- I turned on the final suggestions (for Bug 1332680); those are suggestions in the form of warnings.
I am happy we are making progress on this! The benchmark results look quite good. There are some incremental things we could do on top of that. For example, for PGO builds it would be nice to drop the distinction between -Os and -Ofast/-O3: it prevents cross-module inlining of COMDATs, and the compiler optimizes for size anyway all parts that are not executed in the training run.

I am trying to benchmark with talos locally and have issues with the runs sometimes producing results and sometimes not. Anything I could look into?

Concerning the ODR warnings:
I looked into them briefly, and the ones I analyzed are real issues (GCC might report some false positives, and I would like to know about them). The warnings are not the easiest to analyze, even though I tried to make them informative.
ODR mismatches often happen because a named class uses an #ifdef or a type that differs between translation units.
Also note that passing -flto=9 makes the LTO link step always use 9 processes for the final compilation stage.
It would be better to use -flto=jobserver and add "+" to each Makefile rule that performs linking. This lets the top-level make control the GCC sub-processes according to its -j setting.
The "+" is necessary to tell GNU make to pass down the pipe needed to contact the jobserver.
Thanks for mentioning ARES6: that is in fact an improvement, and I already have a patch filed to handle the reverse direction:
https://bugzilla.mozilla.org/show_bug.cgi?id=1443239
Flags: needinfo?(jmaher)
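Honza's -flto=jobserver suggestion only helps if GNU make actually hands its jobserver down to the link step. GNU make advertises the jobserver in the MAKEFLAGS environment variable (--jobserver-auth=R,W since make 4.2, --jobserver-fds=R,W in older releases, and --jobserver-auth=fifo:PATH with the named-pipe style of make 4.4), but it generally keeps the pipe descriptors open only for recipe lines it treats as recursive, which is what the "+" prefix marks. The minimal sketch below, with hypothetical helper names and no relation to GCC's actual lto-wrapper code, shows how a child process can tell whether the jobserver really reached it.

```python
import os
from typing import Optional

# Hypothetical helpers, not GCC's lto-wrapper code: they only show how a child
# of GNU make can tell whether the jobserver was really handed down to it.

_PREFIXES = ("--jobserver-auth=", "--jobserver-fds=")  # make >= 4.2, older make


def jobserver_auth(makeflags: Optional[str] = None) -> Optional[str]:
    """Return the jobserver descriptor advertised in MAKEFLAGS, or None."""
    if makeflags is None:
        makeflags = os.environ.get("MAKEFLAGS", "")
    found = None
    for token in makeflags.split():
        for prefix in _PREFIXES:
            if token.startswith(prefix):
                found = token[len(prefix):]  # the last occurrence wins
    return found


def jobserver_usable(auth: Optional[str]) -> bool:
    """Check that the advertised jobserver is actually reachable from here.

    Recipe lines that make does not treat as recursive (no "+" prefix and no
    $(MAKE)) may still see MAKEFLAGS, but without the inherited descriptors.
    """
    if not auth:
        return False
    if auth.startswith("fifo:"):  # named-pipe style used by GNU make 4.4+
        return os.path.exists(auth[len("fifo:"):])
    try:
        read_fd, write_fd = (int(part) for part in auth.split(",", 1))
    except ValueError:
        return False
    for fd in (read_fd, write_fd):
        try:
            os.fstat(fd)  # raises OSError if the descriptor is not open here
        except OSError:
            return False
    return True


if __name__ == "__main__":
    auth = jobserver_auth()
    print("jobserver:", auth, "usable" if jobserver_usable(auth) else "not usable")
```

Run from a recipe line prefixed with "+", the script should report a usable jobserver; from an unprefixed line it will typically report it as not usable, which is the situation in which -flto=jobserver cannot take part in make's -j accounting.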
Thanks for the patches; I'm going to enable PGO+LTO for Fedora Firefox builds if it's feasible.
Unfortunately, it fails early at cargo-linker:

"/home/komat/tmp676-trunk-gtk3/src2/build/cargo-linker" "-Wl,--as-needed" "-Wl,-z,noexecstack" "-m64" "-L" ...

  = note: /home/komat/tmp676-trunk-gtk3/src2/objdir-optimized/toolkit/library/release/deps/liblibloading-48cd981c731eb1bf.rlib(libloading-48cd981c731eb1bf.libloading0.rcgu.o): In function `core::ptr::drop_in_place':
          libloading0-dae1b3bfe92793ed548dd7814337f5a0.rs:(.text._ZN4core3ptr13drop_in_place17h3cf1af1fed787f33E+0x1): undefined reference to `rust_libloading_dlerror_mutex_unlock'
          /home/komat/tmp676-trunk-gtk3/src2/objdir-optimized/toolkit/library/release/deps/liblibloading-48cd981c731eb1bf.rlib(libloading-48cd981c731eb1bf.libloading0.rcgu.o): In function `libloading::os::unix::DlerrorMutexGuard::new':
          libloading0-dae1b3bfe92793ed548dd7814337f5a0.rs:(.text._ZN10libloading2os4unix17DlerrorMutexGuard3new17h25f0dea1ba8750daE+0x1): undefined reference to `rust_libloading_dlerror_mutex_lock'
          /home/komat/tmp676-trunk-gtk3/src2/objdir-optimized/toolkit/library/release/deps/liblibloading-48cd981c731eb1bf.rlib(libloading-48cd981c731eb1bf.libloading0.rcgu.o): In function `<libloading::os::unix::DlerrorMutexGuard as core::ops::drop::Drop>::drop':
          libloading0-dae1b3bfe92793ed548dd7814337f5a0.rs:(.text._ZN81_$LT$libloading..os..unix..DlerrorMutexGuard$u20$as$u20$core..ops..drop..Drop$GT$4drop17h34dc679f28968fa3E+0x1): undefined reference to `rust_libloading_dlerror_mutex_unlock'
          /home/komat/tmp676-trunk-gtk3/src2/objdir-optimized/toolkit/library/release/deps/liblibloading-48cd981c731eb1bf.rlib(libloading-48cd981c731eb1bf.libloading0.rcgu.o): In function `<libloading::os::unix::Library as core::ops::drop::Drop>::drop':
          libloading0-dae1b3bfe92793ed548dd7814337f5a0.rs:(.text._ZN71_$LT$libloading..os..unix..Library$u20$as$u20$core..ops..drop..Drop$GT$4drop17h4d62911e45826704E+0x9): undefined reference to `rust_libloading_dlerror_mutex_lock'
          libloading0-dae1b3bfe92793ed548dd7814337f5a0.rs:(.text._ZN71_$LT$libloading..os..unix..Library$u20$as$u20$core..ops..drop..Drop$GT$4drop17h4d62911e45826704E+0xcb): undefined reference to `rust_libloading_dlerror_mutex_unlock'
          collect2: error: ld returned 1 exit status
Hello,
The problem here is that the cargo linker does not enable the GCC LTO plugin. Either it needs to be called through the gcc wrapper, the proper plugin parameter needs to be passed to the linker, or ./toolkit/library/release/build/libloading-d78baa5b18daaadf/out/src/os/unix/global_static.o needs to be built without LTO.
I think the last is the easiest to arrange, but I got a bit lost in the build machinery.

Honza
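The undefined references above are consistent with this diagnosis: GCC's default (slim) LTO objects carry only GCC's intermediate representation and no machine code, so a link that does not load the LTO plugin cannot see the C definitions of rust_libloading_dlerror_mutex_lock/unlock in global_static.o. A quick way to check whether an object was compiled with -flto is to look for GCC's ".gnu.lto_" section names. The sketch below is a heuristic with hypothetical function names, not a real ELF parser; it also returns True for objects built with -ffat-lto-objects, which do contain regular machine code and would link fine.

```python
from pathlib import Path

# GCC stores its LTO bytecode in sections whose names start with this prefix,
# e.g. ".gnu.lto_.symtab.<hash>". LLVM bitcode is not detected.
GCC_LTO_SECTION_PREFIX = b".gnu.lto_"


def contains_gcc_lto_sections(data: bytes) -> bool:
    """Heuristic check on raw object-file bytes (not a real ELF parser).

    Section names are stored as plain strings in the section-header string
    table, so a substring search is good enough for a quick diagnosis.
    """
    return GCC_LTO_SECTION_PREFIX in data


def has_gcc_lto_ir(path: str) -> bool:
    """Return True if the object file at `path` seems to carry GCC LTO bytecode."""
    return contains_gcc_lto_sections(Path(path).read_bytes())


if __name__ == "__main__":
    import sys

    for obj in sys.argv[1:]:
        print(obj, "GCC LTO IR" if has_gcc_lto_ir(obj) else "no GCC LTO IR")
```

Passing global_static.o on the command line would show whether the -flto CFLAGS leaked into the libloading build script's C compilation.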
Hi,
As Martin Liska pointed out, one should no longer add -flto=64 to CFLAGS but should use ac_add_options --enable-lto instead.
I just built yesterday's checkout of the Firefox git repository using gcc 8.2 with no problems.

Honza
Assignee: taras.mozilla → nobody
Severity: normal → S3