Closed Bug 1613649 Opened 2 years ago Closed 2 years ago

Investigate turning off intra-crate ThinLTO for Rust code for DEVELOPER_OPTIONS builds

Categories

(Firefox Build System :: General, enhancement)

enhancement
Not set
normal

Tracking

(firefox74 fixed)

RESOLVED FIXED
mozilla74
Tracking Status
firefox74 --- fixed

People

(Reporter: mwoerister, Assigned: mwoerister)

Details

Attachments

(1 file)

The Rust compiler by default does "intra-crate" ThinLTO which means that it will perform ThinLTO among object files of the same crate. This improves runtime performance at the cost of additional compile time.

For local, non-release builds the additional runtime performance is likely not worth the additional compile time.

My local testing shows that an incremental -O1 build of the style crate goes from 18.3 seconds to 16 seconds (i.e. ~13% faster) when disabling ThinLTO. A clobber build goes from 16 minutes to 14,5 minutes.

Upon further inspection it turns out that (through a loophole in the Rust compiler's commandline option logic) the build system already turns off intra-crate ThinLTO for DEVELOPER_OPTIONS builds: The top-level Cargo.toml sets codegen-units = 1 for these kinds of builds. Normally this setting has no effect on incremental builds. However, when it comes to deciding if ThinLTO should be enabled rustc relies on this setting, even though it is overridden by -C incremental.

In other words we already get the wins described above, it was only due to my specific test environment that I was able to observe the difference.

I'll create a patch that cleans this up. Once the proposal in https://github.com/rust-lang/compiler-team/issues/245 goes through, this loophole will be closed and the codegen-units setting in the Cargo.toml will need to be removed anyway.

Assignee: nobody → mwoerister
Status: NEW → ASSIGNED

By default the Rust compiler will perform a limited kind of ThinLTO on each
crate. For local builds this additional optimization is not worth the
increase in compile time.

The top-level Cargo.toml sets codegen-units = 1 for these kinds of builds

Maybe we should change that too, and only do codegen-units=1 on non-DEVELOPER_OPTIONS builds.

Pushed by dvarga@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/0081ffb08a11
Explicitly opt-out of Rust's local ThinLTO for DEVELOPER_OPTIONS builds. r=firefox-build-system-reviewers,chmanchester

(In reply to Mike Hommey [:glandium] from comment #3)

Maybe we should change that too, and only do codegen-units=1 on non-DEVELOPER_OPTIONS builds.

I've been looking into this some more and I think the situation is even a bit more complicated than I thought initially:

  1. If CARGO_INCREMENTAL=0 then rustc defaults to 16 codegen units per crate. We definitely want to override this with codegen-units=1 in non-DEVELOPER_OPTIONS builds. We might also want to override it for non-incremental DEVELOPER_OPTIONS builds (i.e. builds using sccache?), but see 2.i below.
  2. If CARGO_INCREMENTAL=1 then cargo will invoke rustc will use one of two settings:
    1. If the crate comes from an external source (e.g. crates.io) then rustc is invoked non-incrementally, i.e. defaulting to 16 codegen units (if codegen-units isn't specified explicitly somewhere). Given the high amount of parallel work we already have, it does not really make sense to also have per-crate parallelism.
    2. If the crate comes from a local source then cargo will invoke rustc incrementally, using many many codegen units.

So the current settings are actually pretty good:

  • non-DEVELOPER_OPTIONS builds compile everything with codegen-units=1 which is the only option that makes sense there.
  • DEVELOPER_OPTIONS builds compile external crates with codegen-units=1. And local crates are compiled with -Cincremental which makes sense for those since we expect them actually change.

However, once codegen-units will start to have an effect in incremental mode, we'll get a bit of a problem with incremental builds:

  • If we keep the codegen-units=1 setting then the effectiveness of incremental compilation will be reduced by 70-90% because any tiny change will force rustc to send the entire crate through LLVM again, which makes up the bulk of compilation time.
  • If we just remove the codegen-units=1 setting then all external crates will use rustc's default 16 codegen units setting, which introduces unnecessary overhead for these.

I think the second option is preferable but far from ideal. I'll try it out locally; let's see how it affects compile times.

It would be great if Cargo had a way of using specific settings for external crates in incremental mode (then we could just set codegen-units=1 for just those) but I don't think that is possible at the moment.

In my local tests removing the codegen-units=1 setting from Cargo.toml increased clobber build times from 14m 55s to 15m 17s, so 22s longer. That's an increase of ~2.5%. Incremental builds are unaffected, as expected.

Overall this doesn't seem to be too much of a problem since it's just local clobber builds that are affected. It would probably be possible to work around the issue by setting a RUSTC_WRAPPER that adds -Ccodegen-units=1 to each invocation that doesn't have -Cincremental on it.

Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla74
You need to log in before you can comment on or make changes to this bug.