parallel builds in cc-rs and webrender shader compilation end up starved for jobserver tokens
Categories
(Firefox Build System :: General, defect, P3)
Tracking
(Not tracked)
People
(Reporter: froydnj, Unassigned)
Details
The cc-rs
crate supports a parallel
feature for executing compilations of multiple files in parallel, using a jobserver. The build-parallel
crate, used by our webrender shader compilation, does much the same thing (basically copy-and-pasting cc-rs
code), but generalizing it for more-arbitrary computations.
But both of these crates do something locally good and globally bad: they drop the jobserver token they're using before they start to compile:
https://github.com/alexcrichton/cc-rs/blob/77c94412c68bddd29d3ea86aa00c0aaecaaa4904/src/lib.rs#L991-L997
https://github.com/jrmuizel/build-parallel/blob/fc61f40243364689b093e7eea785f3f903e33115/src/lib.rs#L37-L43
This means that other things get the chance to step in and grab the jobserver token before the build.rs
scripts using cc-rs
and build-parallel
get to continue. And for whatever reason, in practice, this means that there's a really long wait before those scripts reacquire the necessary tokens; I can watch the glslopt-rs
build script reliably sit around for minutes during a build waiting on a jobserver token. Which means that the long pole of the build process -- i.e. compiling Rust code -- gets even longer. (I think -- obviously the glslopt-rs
build script will complete faster once there are jobserver slots available, and it's not like other parts of the build aren't making progress, but I think on the whole the process will take longer we drag out Rust compilation longer than necessary.)
This issue is likely less of a problem for webrender, because by the time webrender gets around to being compiled (in a full build), there's hardly anything competing for jobserver tokens.
I don't know what the right solution is. I guess we could fix the cc-rs
and build-parallel
crates to not release the first jobserver token, so they could at least make progress when the jobserver is very busy. But it's hard to see how they could be modified to take advantage of many available jobserver slots without blocking unnecessarily in the case that those slots are not easily available.
![]() |
Reporter | |
Comment 1•5 years ago
|
||
Bah, Bugzilla reloaded and lost my CC list of people who might be interested or offer commentary on this bug.
Comment 2•5 years ago
|
||
What does make
do?
Updated•5 years ago
|
Comment 3•5 years ago
|
||
Note it's also a problem in cargo itself. See https://github.com/rust-lang/cargo/issues/7786
Comment 4•5 years ago
|
||
Thanks for the cc! I think this should probably be pretty easy to fix in cc
, I was basically just lazy at the time and didn't feel like writing everything to juggle the implicit token.
Otherwise this is one of the major drawbacks of the make
jobserver, there's just no way to have any sort of priority.
![]() |
Reporter | |
Comment 5•5 years ago
|
||
(In reply to Jeff Muizelaar [:jrmuizel] from comment #2)
What does
make
do?
I'm not sure what you want to know here. "How does make handle starting N new jobs and obtaining jobserver tokens for all of them"?
Comment 6•5 years ago
|
||
Does make
hold job server token until a compilation is complete? Does it reuse a token that it has or give it up before starting a new job?
Updated•5 years ago
|
![]() |
Reporter | |
Comment 7•5 years ago
|
||
(In reply to Jeff Muizelaar [:jrmuizel] from comment #6)
Does
make
hold job server token until a compilation is complete? Does it reuse a token that it has or give it up before starting a new job?
If I understand the code at:
http://git.savannah.gnu.org/cgit/make.git/tree/src/job.c#n251
http://git.savannah.gnu.org/cgit/make.git/tree/src/job.c#n1114
http://git.savannah.gnu.org/cgit/make.git/tree/src/job.c#n1825
(no line ranges on savannah, sadface)
correctly, I believe make
will only reuse the implicit token that it starts out with; otherwise, it will relinquish the tokens that it obtains.
I don't know exactly how the kernel operates, but I could see:
http://git.savannah.gnu.org/cgit/make.git/tree/src/posixos.c#n277
constantly bumping make
to the front of the queue waiting on the jobserver acquisition fd, as it constantly resubmits a request to know when the fd is available; compare with jobserver-rs
, which just waits forever, both on the poll
and the subsequent read
:
https://github.com/alexcrichton/jobserver-rs/blob/master/src/unix.rs#L115-L169
I don't know if it'd be possible to make jobserver-rs
aware of which flavor of make
we're dealing with and use a different strategy depending on the blocking-ness of the jobserver fds, or if that would make much of a difference, or if just using ppoll
would help any.
![]() |
Reporter | |
Comment 8•5 years ago
|
||
I think this is how we fix the problem: jobserver-rs
needs to grow another API, Client::acquire_with_timeout(timeout: Duration) -> io::Result<Status>
where Status
is enum Status { TimedOut, Success(Acquired) }
. (I think we can roll being interrupted into TimedOut
, because I don't think clients would actually handle those cases any differently.)
Once we have that, the core of cc-rs
becomes something like:
N = number of jobs we have to do
stuff jobs into some sort of work queue to share between us and threads
done = false (share between us and threads)
spawn N-1 threads whose work will be described later
# we have an implicit jobserver token here
loop {
get work
if there's no work {
break;
}
execute work
}
# signal that threads should stop
done = true;
join all threads
Each thread does:
loop {
if done {
break;
}
acquire jobserver token with short (1sec) timeout
if timed out {
continue;
}
# now we have a jobserver token
get work
if there's no work {
break;
}
execute work
# drop jobserver token here
}
The timeout is just so we don't have to handle threads being forever blocked while trying to get a jobserver token. I'm kind of concerned that if cc-rs
is handling a lot of files that we have a lot of threads constantly waking up and trying to do this, which steals some CPU time from other things trying to get useful work done. Another way to do it would have the core loop look like:
loop {
check for work
if no work {
break;
}
if one task left {
execute work
break;
}
try to acquire jobserver token (with timeout)
if timed out {
execute one task
continue;
}
# do two tasks
spawn thread and send above token + work to it
execute one task
}
wait on outstanding threads
and then individual threads would just do the work they were given, drop the jobserver token, and exit.
The worst-case performance here is pretty bad if we wind up timing out for every jobserver token acquisition attempt and the amount of work in tasks is relatively short.
Continuing to think out loud, maybe the best approach is a hybrid model: we have a central workqueue and the "main" script grabs jobs from that. But we also spawn a "jobserver acquisition" thread who tries to acquire a jobserver token continually, tries to grab work from the central workqueue, and then sends the token + any work to a new thread to do the actual work. Then we're (ideally) not tanking performance and we don't have a thundering herd of threads waking up every second while waiting for a jobserver token. I guess the "jobserver acquisition" thread is just the jobserver::HelperThread
abstraction, so we wouldn't even need a new API for jobserver-rs
! (One wrinkle is that the jobserver-rs
docs warn about HelperThread
because its cleanup mechanism is pretty gross on Unix, and I don't know that we want to introduce that grossness into arbitrary build scripts via cc-rs
. Maybe we could add timeouts under the hood and make it less gross somehow?)
Ideally we can convince the borrow checker that all of this is reasonable. :)
(The alternative to timeouts is to have another "jobserver-like" pipe fd that child threads can wait on and then we write to that to wake everybody up (something similar can be done on Windows, I think; I don't know if it's easily accomplished in wasm). But that would introduce a bunch of goo into the jobserver crate itself and doesn't feel very general.)
Alex, do you have thoughts on the proposed hybrid model?
Comment 9•5 years ago
|
||
Yeah I'm down for whatever works in the cc
crate. What's implemented there currently was just the easiest to land and had the smallest impact on code-size and was something I was relatively sure was correct.
I do think that we want to avoid thundering herd issues and spawning threads before a jobserver token is acquired. In cases where the jobserver is used to limit memory use it's important to not take extra memory before the token is acquired.
Otherwise though one wrinkle is that in theory it's not possible to implement a timeout right now. Windows works just fine and on Unix if the jobserver fds are nonblocking it's also fine, but the problem is that if the jobserver fds are blocking then there's no way to time out a read. I'm not sure how often make
is used with blocking fds still (IIRC it's only older versions doing that).
FWIW the compiler correctly manages the jobserver token, ensuring that it doesn't spawn work until it acquires an extra token and additionally makes use of the implicit token without trying to release it. It's unfortunately some pretty gnarly logic. I'm fine adding that logic to cc
if it seems worth it though, but I unfortunately don't have a ton of time to invest in it right now :(
Description
•