stylo: Poor parallelism in rebuilding rust crates

NEW
Unassigned

Status

()

Core
Build Config
P5
normal
10 months ago
3 months ago

People

(Reporter: dmajor (offline), Unassigned)

Tracking

(Blocks: 3 bugs)

unspecified
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox57 wontfix)

Details

(Reporter)

Description

10 months ago
When I rustup and then do "mach build binaries", my machine spends:
- a few seconds compiling a bunch of misc crates on a lot of cores (great!)
- a couple minutes compiling webrender on a single core
- a couple minutes compiling gkrust on a single core
- and only then do I get to the joy of waiting for the linker

I'm not familiar with rust's build rules, but I'm going to guess that gkrust depends on webrender so that has to be built first.

But does webrender have to be _fully_ built before we can even start on gkrust? Is there not something analogous to ".h" files where you can build against somebody's interface without needing their generated code?

Comment 1

10 months ago
There's not really anything like .h files; I think there are some ideas at the rustc level (gkrust doesn't have to wait for *everything* from webrender, and could proceed in parallel with parts of it), but nothing yet.  Bug 1352815 discusses speeding debug mode Rust compilation up a bit.
(Reporter)

Comment 2

10 months ago
Oh, I guess I should mention I was building opt.
(Reporter)

Comment 3

8 months ago
Enabling stylo makes this problem stand out much worse. The style crate and gkrust now take six minutes each. So --enable-stylo adds 12 minutes of single-threaded work to my build.
Blocks: 1356991
(Reporter)

Comment 4

8 months ago
(In reply to David Major [:dmajor] from comment #2)
> Oh, I guess I should mention I was building opt.

I'll make this statement stronger: I _always_ build opt. (I tend to work on either performance or problematic code-gen, where I need to see the actual bits our users will see.) So debug changes like bug 1356991 don't do anything for me.
(Reporter)

Updated

8 months ago
Summary: Poor parallelism in rebuilding rust crates → stylo: Poor parallelism in rebuilding rust crates
(In reply to David Major [:dmajor] from comment #3)
> Enabling stylo makes this problem stand out much worse. The style crate and
> gkrust now take six minutes each. So --enable-stylo adds 12 minutes of
> single-threaded work to my build.

The style crate should at least build in parallel with webrender.

> So debug changes like bug 1356991 don't do anything for me.

Wrong bug number?

In general, this is a hard problem to solve. There has been a lot of look into compilation times in Rust (and incremental compilation), but most of the slowness of opt builds comes from the LLVM optimization passes, and it's not clear to me how much people have looked into that. If you have cycles to do so, probably worth talking to mw about it.
NI mw for any comments he can add on all this.
Flags: needinfo?(mwoerister)

Updated

8 months ago
Priority: -- → P5
(Reporter)

Comment 7

8 months ago
> > So debug changes like bug 1356991 don't do anything for me.
> Wrong bug number?
Oops: bug 1352815.
(Reporter)

Comment 8

8 months ago
> most of the slowness of opt builds comes from the LLVM optimization passes

Hm, that's interesting. I wonder why we don't see huge build times for clang-cl gecko builds, where LLVM is building 10x as much code. Is it because of the difference in source language? Or maybe is it the fact that clang-cl builds still depend on MSVC link.exe for final codegen? (But that's true for stylo builds too... I wonder how much of that LLVM optimization is redundant with what link.exe would do.)

Also, is the style crate incorporated into gkrust? (I'm totally just guessing this based on the build logs and times.) Do both need to be fully optimized?

Comment 9

8 months ago
The problem with building optimized Rust code is that the compilation model is much closer to using something like LTO in the C++ world. Each crate is built as a single, big object file and LLVM cannot process it in parallel. 

The way to get more parallelism for the LLVM part is to increase the number of codegen units. This will affect runtime performance though, as it limits inlining. I don't know if that is acceptable in your situation. One can also mark functions with #[inline]. Such functions will be placed into each codegen unit that calls them. With some profiling maybe you can spot functions where a strategic placement of #[inline] makes a big difference.

Incremental compilation would speed up compilation (and also works in conjunction with optimized builds) but it would have the same impact on runtime performance, since it also splits the code into multiple object files.

In the future ThinLTO might change the picture here in that it allows to parallelize LLVM without the performance hit. But nobody has gotten around to implementing it yet.

There's also been talk about "MIR-only rlibs" [1] which would allow deferring all LLVM related things to the final executable, staticlib, or cdylib. I'm not sure it would help in this case though.

[1] https://github.com/rust-lang/rust/issues/38913
Flags: needinfo?(mwoerister)
(In reply to David Major [:dmajor] from comment #4)
> (In reply to David Major [:dmajor] from comment #2)
> > Oh, I guess I should mention I was building opt.
> 
> I'll make this statement stronger: I _always_ build opt. (I tend to work on
> either performance or problematic code-gen, where I need to see the actual
> bits our users will see.) So debug changes like bug 1356991 don't do
> anything for me.

FWIW, this is why we have --enable-rust-debug, so you can build with --enable-optimize --disable-debug --enable-rust-debug and get faster Rust compilation while still building all the C++ code opt.  Maybe that doesn't help with your particular case, though?(In reply to David Major [:dmajor] from comment #8)

> Hm, that's interesting. I wonder why we don't see huge build times for
> clang-cl gecko builds, where LLVM is building 10x as much code. Is it
> because of the difference in source language?

My understanding is yes, to some degree.

> Also, is the style crate incorporated into gkrust? (I'm totally just
> guessing this based on the build logs and times.) Do both need to be fully
> optimized?

Yes and yes.  There's no way to build with selective optimizations in the current compilation model.

Comment 11

8 months ago
I'm not a rust compiler expert, but if we can change the default build configuration so rust code compiles faster but doesn't sacrifice run-time "purity" significantly, I think that would be a net win for the average Firefox developer. I think "perfect is the enemy of good" applies and we shouldn't penalize the average developer by having them run the full spectrum of rust's optimization passes unless that needs to occur to catch common bugs, etc.

There is an analogy to be drawn to PGO: we ship PGO to end users but you have to enable it in the build system because it adds several minutes to the build and offers little benefit to the average developer. I'm not sure how similar the boats are. But Rust is adding enough overhead to the Firefox build that wins from changing the default configuration seem justifiable.

Comment 12

8 months ago
> we shouldn't penalize the average developer by having them run the full spectrum of rust's optimization passes

On a similar note: If your current opt build also enables Rust's LTO then that will probably mean noticeably longer compile times than a regular -O build. Those last 10% of performance that LTO brings might not be necessary testing most things.

Comment 13

8 months ago
(In reply to Michael Woerister from comment #12)
> > we shouldn't penalize the average developer by having them run the full spectrum of rust's optimization passes
> 
> On a similar note: If your current opt build also enables Rust's LTO then
> that will probably mean noticeably longer compile times than a regular -O
> build. Those last 10% of performance that LTO brings might not be necessary
> testing most things.

Is it worth co-mingling the PGO and LTO invocations, i.e. only PGO builds will also run with LTO enabled? Nightlies would still see the benefit from LTO, and developers get faster dep builds.

Comment 14

8 months ago
> Is it worth co-mingling the PGO and LTO invocations, i.e. only PGO builds will also run with LTO enabled? Nightlies would still see the benefit from LTO, and developers get faster dep builds.

Sounds reasonable to me.
If we do this, please make it straightforward to enable LTO and not PGO in a mozconfig. For stylo performance work I want my Rust code to be maximally optimized, but don't really want to do PGO.
Android and OSX don't PGO, yet we'd like them to LTO rust...
Rust build parallelism does not need to block building Stylo support in Firefox.
Blocks: 1374034
No longer blocks: 1356991
(Reporter)

Updated

6 months ago
See Also: → bug 1378667
Blocks: 1345321
(Reporter)

Comment 18

6 months ago
(In reply to David Major [:dmajor] from comment #8)
> Also, is the style crate incorporated into gkrust? 
> Do both need to be fully optimized?

I believe https://github.com/rust-lang/rust/issues/43212 should address this. Glandium did a much better job of explaining the problem than me.
Blocks: 1380210
This is very damaging to productivity.

Comment 20

6 months ago
I suspect bug 1386371 will improve matters significantly. The build efficiency regressions from introducing Rust/Cargo have... not been pleasant to see. Various build peers are actively addressing problems. See the "rust-great" bug.

Updated

5 months ago
Duplicate of this bug: 1388438
The LTO removal may have helped a little. Things are still awful. My build times are still doubled to tripled for relatively simple changes to nsRect.h or files like that.
(Optimized builds, that is)
(In reply to Bas Schouten (:bas.schouten) from comment #22)
> The LTO removal may have helped a little. Things are still awful. My build
> times are still doubled to tripled for relatively simple changes to nsRect.h
> or files like that.

This is presumably because nsRect.h triggers bindgen, which triggers some large rebuild of the style crate.  Is it possible to do more fine-grained bindgen'ing, so we could only-rebindgen for nsRect.h and dependent things, which might trigger less rebuilding of the style crate?
(In reply to Nathan Froyd [:froydnj] from comment #24)
> (In reply to Bas Schouten (:bas.schouten) from comment #22)
> > The LTO removal may have helped a little. Things are still awful. My build
> > times are still doubled to tripled for relatively simple changes to nsRect.h
> > or files like that.
> 
> This is presumably because nsRect.h triggers bindgen, which triggers some
> large rebuild of the style crate.  Is it possible to do more fine-grained
> bindgen'ing, so we could only-rebindgen for nsRect.h and dependent things,
> which might trigger less rebuilding of the style crate?

At the moment, I don't believe there's a way to do this, but let's ask :emilio if there are any plans for an "incremental" bindgen step.
Flags: needinfo?(emilio+bugs)
(In reply to J. Ryan Stinnett [:jryans] (use ni?) from comment #25)
> (In reply to Nathan Froyd [:froydnj] from comment #24)
> > (In reply to Bas Schouten (:bas.schouten) from comment #22)
> > > The LTO removal may have helped a little. Things are still awful. My build
> > > times are still doubled to tripled for relatively simple changes to nsRect.h
> > > or files like that.
> > 
> > This is presumably because nsRect.h triggers bindgen, which triggers some
> > large rebuild of the style crate.  Is it possible to do more fine-grained
> > bindgen'ing, so we could only-rebindgen for nsRect.h and dependent things,
> > which might trigger less rebuilding of the style crate?
> 
> At the moment, I don't believe there's a way to do this, but let's ask
> :emilio if there are any plans for an "incremental" bindgen step.

I don't really know what this would look like. The problem is that changes to nsRect cause changes to the generated bindings (as they should), which cause us to rebuild everything because rustc doesn't have incremental compilation yet.

Seems like the better approach would be to have the bare minimum needed by bindgen exposed in the headers included by bindgen, and move most of the interesting stuff into a different header.

See also bug 1390110.
Flags: needinfo?(emilio+bugs)
status-firefox57=wontfix unless someone thinks this bug should block 57
status-firefox57: --- → wontfix
You need to log in before you can comment on or make changes to this bug.