Investigate turning off intra-crate ThinLTO for Rust code for DEVELOPER_OPTIONS builds
Categories
(Firefox Build System :: General, enhancement)
Tracking
(firefox74 fixed)
Tracking | Status | |
---|---|---|
firefox74 | --- | fixed |
People
(Reporter: mwoerister, Assigned: mwoerister)
Details
Attachments
(1 file)
The Rust compiler by default does "intra-crate" ThinLTO which means that it will perform ThinLTO among object files of the same crate. This improves runtime performance at the cost of additional compile time.
For local, non-release builds the additional runtime performance is likely not worth the additional compile time.
My local testing shows that an incremental -O1
build of the style
crate goes from 18.3 seconds to 16 seconds (i.e. ~13% faster) when disabling ThinLTO. A clobber build goes from 16 minutes to 14,5 minutes.
Assignee | ||
Comment 1•5 years ago
|
||
Upon further inspection it turns out that (through a loophole in the Rust compiler's commandline option logic) the build system already turns off intra-crate ThinLTO for DEVELOPER_OPTIONS
builds: The top-level Cargo.toml
sets codegen-units = 1
for these kinds of builds. Normally this setting has no effect on incremental builds. However, when it comes to deciding if ThinLTO should be enabled rustc
relies on this setting, even though it is overridden by -C incremental
.
In other words we already get the wins described above, it was only due to my specific test environment that I was able to observe the difference.
I'll create a patch that cleans this up. Once the proposal in https://github.com/rust-lang/compiler-team/issues/245 goes through, this loophole will be closed and the codegen-units setting in the Cargo.toml
will need to be removed anyway.
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 2•5 years ago
|
||
By default the Rust compiler will perform a limited kind of ThinLTO on each
crate. For local builds this additional optimization is not worth the
increase in compile time.
Comment 3•5 years ago
|
||
The top-level Cargo.toml sets codegen-units = 1 for these kinds of builds
Maybe we should change that too, and only do codegen-units=1 on non-DEVELOPER_OPTIONS builds.
Assignee | ||
Comment 5•5 years ago
•
|
||
(In reply to Mike Hommey [:glandium] from comment #3)
Maybe we should change that too, and only do codegen-units=1 on non-DEVELOPER_OPTIONS builds.
I've been looking into this some more and I think the situation is even a bit more complicated than I thought initially:
- If
CARGO_INCREMENTAL=0
thenrustc
defaults to 16 codegen units per crate. We definitely want to override this withcodegen-units=1
in non-DEVELOPER_OPTIONS builds. We might also want to override it for non-incremental DEVELOPER_OPTIONS builds (i.e. builds using sccache?), but see 2.i below. - If
CARGO_INCREMENTAL=1
thencargo
will invokerustc
will use one of two settings:- If the crate comes from an external source (e.g. crates.io) then
rustc
is invoked non-incrementally, i.e. defaulting to 16 codegen units (ifcodegen-units
isn't specified explicitly somewhere). Given the high amount of parallel work we already have, it does not really make sense to also have per-crate parallelism. - If the crate comes from a local source then
cargo
will invokerustc
incrementally, using many many codegen units.
- If the crate comes from an external source (e.g. crates.io) then
So the current settings are actually pretty good:
- non-DEVELOPER_OPTIONS builds compile everything with
codegen-units=1
which is the only option that makes sense there. - DEVELOPER_OPTIONS builds compile external crates with
codegen-units=1
. And local crates are compiled with-Cincremental
which makes sense for those since we expect them actually change.
However, once codegen-units will start to have an effect in incremental mode, we'll get a bit of a problem with incremental builds:
- If we keep the
codegen-units=1
setting then the effectiveness of incremental compilation will be reduced by 70-90% because any tiny change will forcerustc
to send the entire crate through LLVM again, which makes up the bulk of compilation time. - If we just remove the
codegen-units=1
setting then all external crates will use rustc's default 16 codegen units setting, which introduces unnecessary overhead for these.
I think the second option is preferable but far from ideal. I'll try it out locally; let's see how it affects compile times.
It would be great if Cargo had a way of using specific settings for external crates in incremental mode (then we could just set codegen-units=1
for just those) but I don't think that is possible at the moment.
Assignee | ||
Comment 6•5 years ago
|
||
In my local tests removing the codegen-units=1
setting from Cargo.toml
increased clobber build times from 14m 55s to 15m 17s, so 22s longer. That's an increase of ~2.5%. Incremental builds are unaffected, as expected.
Overall this doesn't seem to be too much of a problem since it's just local clobber builds that are affected. It would probably be possible to work around the issue by setting a RUSTC_WRAPPER
that adds -Ccodegen-units=1
to each invocation that doesn't have -Cincremental
on it.
Comment 7•5 years ago
|
||
bugherder |
Description
•