1353472 - stylo: Poor parallelism in rebuilding rust crates

Reporter

Description

•

8 years ago

When I rustup and then do "mach build binaries", my machine spends: - a few seconds compiling a bunch of misc crates on a lot of cores (great!) - a couple minutes compiling webrender on a single core - a couple minutes compiling gkrust on a single core - and only then do I get to the joy of waiting for the linker I'm not familiar with rust's build rules, but I'm going to guess that gkrust depends on webrender so that has to be built first. But does webrender have to be _fully_ built before we can even start on gkrust? Is there not something analogous to ".h" files where you can build against somebody's interface without needing their generated code?

Nathan Froyd [:froydnj]

Comment 1

•

8 years ago

There's not really anything like .h files; I think there are some ideas at the rustc level (gkrust doesn't have to wait for *everything* from webrender, and could proceed in parallel with parts of it), but nothing yet. Bug 1352815 discusses speeding debug mode Rust compilation up a bit.

(Away)

Reporter

Comment 2

•

8 years ago

Oh, I guess I should mention I was building opt.

(Away)

Reporter

Comment 3

•

8 years ago

Enabling stylo makes this problem stand out much worse. The style crate and gkrust now take six minutes each. So --enable-stylo adds 12 minutes of single-threaded work to my build.

Blocks: stylo-nightly-build

(Away)

Reporter

Comment 4

•

8 years ago

(In reply to David Major [:dmajor] from comment #2) > Oh, I guess I should mention I was building opt. I'll make this statement stronger: I _always_ build opt. (I tend to work on either performance or problematic code-gen, where I need to see the actual bits our users will see.) So debug changes like bug 1356991 don't do anything for me.

(Away)

Reporter

Updated

•

8 years ago

Summary: Poor parallelism in rebuilding rust crates → stylo: Poor parallelism in rebuilding rust crates

Bobby Holley (:bholley)

Comment 5

•

8 years ago

(In reply to David Major [:dmajor] from comment #3) > Enabling stylo makes this problem stand out much worse. The style crate and > gkrust now take six minutes each. So --enable-stylo adds 12 minutes of > single-threaded work to my build. The style crate should at least build in parallel with webrender. > So debug changes like bug 1356991 don't do anything for me. Wrong bug number? In general, this is a hard problem to solve. There has been a lot of look into compilation times in Rust (and incremental compilation), but most of the slowness of opt builds comes from the LLVM optimization passes, and it's not clear to me how much people have looked into that. If you have cycles to do so, probably worth talking to mw about it.

Bobby Holley (:bholley)

Comment 6

•

8 years ago

NI mw for any comments he can add on all this.

Flags: needinfo?(mwoerister)

Bobby Holley (:bholley)

Updated

•

8 years ago

Priority: -- → P5

(Away)

Reporter

Comment 7

•

8 years ago

> > So debug changes like bug 1356991 don't do anything for me. > Wrong bug number? Oops: bug 1352815.

(Away)

Reporter

Comment 8

•

8 years ago

> most of the slowness of opt builds comes from the LLVM optimization passes Hm, that's interesting. I wonder why we don't see huge build times for clang-cl gecko builds, where LLVM is building 10x as much code. Is it because of the difference in source language? Or maybe is it the fact that clang-cl builds still depend on MSVC link.exe for final codegen? (But that's true for stylo builds too... I wonder how much of that LLVM optimization is redundant with what link.exe would do.) Also, is the style crate incorporated into gkrust? (I'm totally just guessing this based on the build logs and times.) Do both need to be fully optimized?

Michael Woerister

Comment 9

•

8 years ago

The problem with building optimized Rust code is that the compilation model is much closer to using something like LTO in the C++ world. Each crate is built as a single, big object file and LLVM cannot process it in parallel. The way to get more parallelism for the LLVM part is to increase the number of codegen units. This will affect runtime performance though, as it limits inlining. I don't know if that is acceptable in your situation. One can also mark functions with #[inline]. Such functions will be placed into each codegen unit that calls them. With some profiling maybe you can spot functions where a strategic placement of #[inline] makes a big difference. Incremental compilation would speed up compilation (and also works in conjunction with optimized builds) but it would have the same impact on runtime performance, since it also splits the code into multiple object files. In the future ThinLTO might change the picture here in that it allows to parallelize LLVM without the performance hit. But nobody has gotten around to implementing it yet. There's also been talk about "MIR-only rlibs" [1] which would allow deferring all LLVM related things to the final executable, staticlib, or cdylib. I'm not sure it would help in this case though. [1] https://github.com/rust-lang/rust/issues/38913

Flags: needinfo?(mwoerister)

Nathan Froyd [:froydnj]

Comment 10

•

8 years ago

(In reply to David Major [:dmajor] from comment #4) > (In reply to David Major [:dmajor] from comment #2) > > Oh, I guess I should mention I was building opt. > > I'll make this statement stronger: I _always_ build opt. (I tend to work on > either performance or problematic code-gen, where I need to see the actual > bits our users will see.) So debug changes like bug 1356991 don't do > anything for me. FWIW, this is why we have --enable-rust-debug, so you can build with --enable-optimize --disable-debug --enable-rust-debug and get faster Rust compilation while still building all the C++ code opt. Maybe that doesn't help with your particular case, though?(In reply to David Major [:dmajor] from comment #8) > Hm, that's interesting. I wonder why we don't see huge build times for > clang-cl gecko builds, where LLVM is building 10x as much code. Is it > because of the difference in source language? My understanding is yes, to some degree. > Also, is the style crate incorporated into gkrust? (I'm totally just > guessing this based on the build logs and times.) Do both need to be fully > optimized? Yes and yes. There's no way to build with selective optimizations in the current compilation model.

Gregory Szorc [:gps]

Comment 11

•

8 years ago

I'm not a rust compiler expert, but if we can change the default build configuration so rust code compiles faster but doesn't sacrifice run-time "purity" significantly, I think that would be a net win for the average Firefox developer. I think "perfect is the enemy of good" applies and we shouldn't penalize the average developer by having them run the full spectrum of rust's optimization passes unless that needs to occur to catch common bugs, etc. There is an analogy to be drawn to PGO: we ship PGO to end users but you have to enable it in the build system because it adds several minutes to the build and offers little benefit to the average developer. I'm not sure how similar the boats are. But Rust is adding enough overhead to the Firefox build that wins from changing the default configuration seem justifiable.

Michael Woerister

Comment 12

•

8 years ago

> we shouldn't penalize the average developer by having them run the full spectrum of rust's optimization passes On a similar note: If your current opt build also enables Rust's LTO then that will probably mean noticeably longer compile times than a regular -O build. Those last 10% of performance that LTO brings might not be necessary testing most things.

Chris Cooper [:coop] (he/him)

Comment 13

•

7 years ago

(In reply to Michael Woerister from comment #12) > > we shouldn't penalize the average developer by having them run the full spectrum of rust's optimization passes > > On a similar note: If your current opt build also enables Rust's LTO then > that will probably mean noticeably longer compile times than a regular -O > build. Those last 10% of performance that LTO brings might not be necessary > testing most things. Is it worth co-mingling the PGO and LTO invocations, i.e. only PGO builds will also run with LTO enabled? Nightlies would still see the benefit from LTO, and developers get faster dep builds.

Michael Woerister

Comment 14

•

7 years ago

> Is it worth co-mingling the PGO and LTO invocations, i.e. only PGO builds will also run with LTO enabled? Nightlies would still see the benefit from LTO, and developers get faster dep builds. Sounds reasonable to me.

Bobby Holley (:bholley)

Comment 15

•

7 years ago

If we do this, please make it straightforward to enable LTO and not PGO in a mozconfig. For stylo performance work I want my Rust code to be maximally optimized, but don't really want to do PGO.

Mike Hommey [:glandium]

Comment 16

•

7 years ago

Android and OSX don't PGO, yet we'd like them to LTO rust...

Chris Peterson [:cpeterson]

Comment 17

•

7 years ago

Rust build parallelism does not need to block building Stylo support in Firefox.

Blocks: stylo-release
No longer blocks: stylo-nightly-build

(Away)

Reporter

Updated

•

7 years ago

Updated

•

7 years ago

Blocks: stylo-tooling

(Away)

Reporter

Comment 18

•

7 years ago

(In reply to David Major [:dmajor] from comment #8) > Also, is the style crate incorporated into gkrust? > Do both need to be fully optimized? I believe https://github.com/rust-lang/rust/issues/43212 should address this. Glandium did a much better job of explaining the problem than me.

Chris Peterson [:cpeterson]

Updated

•

7 years ago

Blocks: rust-great

Bas Schouten (:bas.schouten)

Comment 19

•

7 years ago

This is very damaging to productivity.

Gregory Szorc [:gps]

Comment 20

•

7 years ago

I suspect bug 1386371 will improve matters significantly. The build efficiency regressions from introducing Rust/Cargo have... not been pleasant to see. Various build peers are actively addressing problems. See the "rust-great" bug.

Bas Schouten (:bas.schouten)

Comment 22

•

7 years ago

The LTO removal may have helped a little. Things are still awful. My build times are still doubled to tripled for relatively simple changes to nsRect.h or files like that.

Bas Schouten (:bas.schouten)

Comment 23

•

7 years ago

(Optimized builds, that is)

Nathan Froyd [:froydnj]

Comment 24

•

7 years ago

(In reply to Bas Schouten (:bas.schouten) from comment #22) > The LTO removal may have helped a little. Things are still awful. My build > times are still doubled to tripled for relatively simple changes to nsRect.h > or files like that. This is presumably because nsRect.h triggers bindgen, which triggers some large rebuild of the style crate. Is it possible to do more fine-grained bindgen'ing, so we could only-rebindgen for nsRect.h and dependent things, which might trigger less rebuilding of the style crate?

J. Ryan Stinnett [:jryans] (Use needinfo, replies may be slow)

Comment 25

•

7 years ago

(In reply to Nathan Froyd [:froydnj] from comment #24) > (In reply to Bas Schouten (:bas.schouten) from comment #22) > > The LTO removal may have helped a little. Things are still awful. My build > > times are still doubled to tripled for relatively simple changes to nsRect.h > > or files like that. > > This is presumably because nsRect.h triggers bindgen, which triggers some > large rebuild of the style crate. Is it possible to do more fine-grained > bindgen'ing, so we could only-rebindgen for nsRect.h and dependent things, > which might trigger less rebuilding of the style crate? At the moment, I don't believe there's a way to do this, but let's ask :emilio if there are any plans for an "incremental" bindgen step.

Flags: needinfo?(emilio+bugs)

Bobby Holley (:bholley)

Comment 26

•

7 years ago

(In reply to J. Ryan Stinnett [:jryans] (use ni?) from comment #25) > (In reply to Nathan Froyd [:froydnj] from comment #24) > > (In reply to Bas Schouten (:bas.schouten) from comment #22) > > > The LTO removal may have helped a little. Things are still awful. My build > > > times are still doubled to tripled for relatively simple changes to nsRect.h > > > or files like that. > > > > This is presumably because nsRect.h triggers bindgen, which triggers some > > large rebuild of the style crate. Is it possible to do more fine-grained > > bindgen'ing, so we could only-rebindgen for nsRect.h and dependent things, > > which might trigger less rebuilding of the style crate? > > At the moment, I don't believe there's a way to do this, but let's ask > :emilio if there are any plans for an "incremental" bindgen step. I don't really know what this would look like. The problem is that changes to nsRect cause changes to the generated bindings (as they should), which cause us to rebuild everything because rustc doesn't have incremental compilation yet. Seems like the better approach would be to have the bare minimum needed by bindgen exposed in the headers included by bindgen, and move most of the interesting stuff into a different header. See also bug 1390110.

Flags: needinfo?(emilio+bugs)

Chris Peterson [:cpeterson]

Comment 27

•

7 years ago

status-firefox57=wontfix unless someone thinks this bug should block 57

status-firefox57: --- → wontfix

BMO Automation

Updated

•

7 years ago

Product: Core → Firefox Build System

BMO Automation

Updated

•

2 years ago

Severity: normal → S3