Don't spawn more rustc processes than requested job count

RESOLVED FIXED in Firefox 56

Status

enhancement
RESOLVED FIXED
2 years ago
10 months ago

People

(Reporter: gps, Assigned: glandium)

Tracking

(Blocks 1 bug)

unspecified
mozilla56
Dependency tree / graph

Firefox Tracking Flags

(firefox56 fixed)

Details

Attachments

(2 attachments)

We currently use `cargo` to build Rust code. Cargo isn't interacting with the make jobserver. And, cargo attempts to use all cores by default. So if you have 16 available cores and are running with `make -j16`, you could get in a situation where 15 gcc/clang processes are running and cargo is running up to 16 rustc processes, for a total of 31 processes (likely all CPU bound) competing for scheduling on 16 cores. That results in a lot of thrashing and slows down overall execution.

This has been a longstanding problem. Bug 1340637 just made it worse. The geckodriver crate is quite large. And if make invokes multiple cargo processes simultaneously, each one will attempt to use #cores processes. So we could now have 14+16+16=46 processes competing for 16 cores. Not cool.

We need to better integrate Rust compilation / cargo with the primary build backend's jobserver so we don't spawn more worker jobs than the requested value.

We just talked about this in #build. glandium says he'd like to integrate cargo with the make jobserver. That's a solution. But it may not be the long-term one. We'll eventually support other build backends (tup, bazel, ninja). Unless all of them support make's jobserver "protocol," cargo's ability to use a jobserver won't help us if the backend doesn't implement a make jobserver. Instead, we'll need the build backend to have explicit control over invoked rustc processes. That likely means some kind of cargo feature that spits out an execution plan that can easily be transformed into something the build backend can use. I /think/ I've heard ted saying there are plans for this in cargo land. Not sure what the state is. Of course, we don't have non-make backends today, so make jobserver integration is a path forward for the short to medium term.
As mentioned in #build, all build systems should support the make jobserver "protocol", for GCC's lto.
This is probably the reason why Buck calls directly to `rustc` instead of calling Cargo
Perhaps the build peers are already aware, but just in case it's news...  

Cargo gained support for the make jobserver protocol[1] on 2017-06-02, so I would assume that we need to make use of this when calling Cargo?  Seems likes it should be available starting with Rust 1.18.

[1]: https://github.com/rust-lang/cargo/pull/4110
(In reply to J. Ryan Stinnett [:jryans] (use ni?) from comment #3)
> Perhaps the build peers are already aware, but just in case it's news...  
> 
> Cargo gained support for the make jobserver protocol[1] on 2017-06-02, so I
> would assume that we need to make use of this when calling Cargo?  Seems
> likes it should be available starting with Rust 1.18.

2017-06-02 was just a couple days prior to the 1.18 release, so it would have been committed while "Nightly" Rust was 1.19 and 1.18 was in beta...so I guess that it would be in 1.19, not 1.18?  Unless the change didn't make it into 1.19 for some reason, being so close to the train departure date?

If it is in 1.19, maybe we should be using 1.19 betas in automation now, or at least it would be worth seeing if 1.19 made any significant changes to Windows compilation times?  Ralph, WDYT?
Flags: needinfo?(giles)
(In reply to Nathan Froyd [:froydnj] from comment #4)
> (In reply to J. Ryan Stinnett [:jryans] (use ni?) from comment #3)
> > Perhaps the build peers are already aware, but just in case it's news...  
> > 
> > Cargo gained support for the make jobserver protocol[1] on 2017-06-02, so I
> > would assume that we need to make use of this when calling Cargo?  Seems
> > likes it should be available starting with Rust 1.18.
> 
> 2017-06-02 was just a couple days prior to the 1.18 release, so it would
> have been committed while "Nightly" Rust was 1.19 and 1.18 was in beta...so
> I guess that it would be in 1.19, not 1.18?  Unless the change didn't make
> it into 1.19 for some reason, being so close to the train departure date?

Ah sorry for confusion, I had a hard time finding a clear mapping of landing date to version number.
(In reply to Nathan Froyd [:froydnj] from comment #4)
> Unless the change didn't make> it into 1.19 for some reason[...]?

The jobserver changes landed in cargo 0.20 which will ship with rust 1.19. Rustc itself got similar jobserver support for its internal threaded codegen in 1.20.

> If it is in 1.19, maybe we should be using 1.19 betas in automation now, or
> at least it would be worth seeing if 1.19 made any significant changes to
> Windows compilation times?  Ralph, WDYT?

You're thinking that less contention might speed up the build? I thought in the absence of the job server, both mach and cargo would try to spawn one concurrent task per cpu. We can certainly try it now that stylo is enabled one windows. I worry we'll need to fix bug 1376010 first though.
Flags: needinfo?(giles)
I did a new push with rust 1.19.0-beta.2 so we can compare build times with today's m-c.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=438c016ab3427b2b3030d48a69763b3682c86568
Also, since the final tasks in cargo can take lots of time, and they are not very parallel, we should allow cargo to use as many processors as possible at the beginning.
... assuming you don't hit https://github.com/rust-lang/rust/issues/42867
My try push is somewhat inconclusive:

Windows XP opt 88 minutes vs 94, 97, 95 for recent m-c builds.
Windows XP debug 66 minutes vs 71, 65, 73 for m-c.

Windows 7 x64 opt 85 minutes vs 94, 97, 95 for m-c.
Windows 7 x64 debug 78 minutes *but* 66, 69, 67 for m-c.

So possibly 10% faster except for the x64 debug build. Not sure I believe the numbers though.
Comment on attachment 8886118 [details]
Bug 1367940 - Pass the make jobserver file descriptors down to cargo.

https://reviewboard.mozilla.org/r/156922/#review162270

::: config/rules.mk:910
(Diff revision 1)
>  ifdef MOZ_USING_SCCACHE
>  sccache_wrap := RUSTC_WRAPPER='$(CCACHE)'
>  endif
>  
>  define RUN_CARGO
> -env $(environment_cleaner) $(rust_unlock_unstable) $(rustflags_override) $(sccache_wrap) \
> ++env $(environment_cleaner) $(rust_unlock_unstable) $(rustflags_override) $(sccache_wrap) \

`+` in recipe rules means that the recipe is executed even during `make -n`. Since cargo isn't aware of `make -n`, this change breaks `make -n`.

However, since we rarely use `make -n`, I think it's acceptable to conditionally define `RUN_CARGO` depending on `make -n` and not pass fds to cargo in `make -n` mode. When cargo supports nice things, we can remove the workaround.
Attachment #8886118 - Flags: review?(gps) → review-
Comment on attachment 8886119 [details]
Bug 1367940 - Ensure sccache inherits the make jobserver file descriptors.

https://reviewboard.mozilla.org/r/156924/#review162274

The `+` will make this execute unconditionally. For an automation make file, I'm willing to turn a blind eye. However, I do insist that the --stop-server call also be made unconditionally. I'm not sure if you can combine both `-` and `+` on the same rule. Happy exploring.
Attachment #8886119 - Flags: review?(gps) → review-
Comment on attachment 8886118 [details]
Bug 1367940 - Pass the make jobserver file descriptors down to cargo.

https://reviewboard.mozilla.org/r/156922/#review162336
Attachment #8886118 - Flags: review?(gps) → review+
Comment on attachment 8886119 [details]
Bug 1367940 - Ensure sccache inherits the make jobserver file descriptors.

https://reviewboard.mozilla.org/r/156924/#review162338
Attachment #8886119 - Flags: review?(gps) → review+
We're sorry, Autoland could not rebase your commits for you automatically. Please manually rebase your commits and try again.

hg error in cmd: hg rebase -s 44259a19518d -d 8933624d74c8: rebasing 407114:44259a19518d "Bug 1367940 - Pass the make jobserver file descriptors down to cargo. r=gps"
merging config/rules.mk
warning: conflicts while merging config/rules.mk! (edit, then use 'hg resolve --mark')
unresolved conflicts (see hg resolve, then hg rebase --continue)
Pushed by mh@glandium.org:
https://hg.mozilla.org/integration/autoland/rev/5b7e684c4a73
Pass the make jobserver file descriptors down to cargo. r=gps
https://hg.mozilla.org/integration/autoland/rev/d56a606712f1
Ensure sccache inherits the make jobserver file descriptors. r=gps
Blocks: 1380964
https://hg.mozilla.org/mozilla-central/rev/5b7e684c4a73
https://hg.mozilla.org/mozilla-central/rev/d56a606712f1
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Duplicate of this bug: 1376010
Product: Core → Firefox Build System
Assignee: nobody → mh+mozilla
You need to log in before you can comment on or make changes to this bug.