Closed Bug 1357825 Opened 3 years ago Closed 2 years ago

Use sccache for caching Rust compilation

Categories

(Firefox Build System :: General, enhancement)


Tracking

(firefox56 fixed)

RESOLVED FIXED
mozilla56
Tracking Status
firefox56 --- fixed

People

(Reporter: ted, Assigned: ted)

Attachments

(2 files)

My cargo patch to add support for `RUSTC_WRAPPER` landed:
https://github.com/rust-lang/cargo/pull/3887

So you can now use sccache with cargo by just setting `RUSTC_WRAPPER=sccache`. I'd like to update the cargo we're using in automation to a nightly build and try this out. If all goes well it should help our build times in automation.
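As a rough illustration of what the wrapper hook does: when `RUSTC_WRAPPER` is set, cargo executes the wrapper instead of rustc and passes the path to the real rustc as the first argument, followed by the usual rustc arguments. The sketch below only builds that argument list. `wrapped_command` is a hypothetical helper made up for this example, not cargo's actual code.

```rust
/// Build the argument vector that would be executed for one rustc invocation.
/// `wrapper` stands in for the value of RUSTC_WRAPPER (None if it is unset).
/// Hypothetical helper for illustration only; not cargo's implementation.
fn wrapped_command(wrapper: Option<&str>, rustc: &str, args: &[&str]) -> Vec<String> {
    let mut argv = Vec::new();
    if let Some(w) = wrapper {
        // With a wrapper set, the wrapper becomes the program to run and
        // the real rustc path is handed to it as its first argument.
        argv.push(w.to_string());
    }
    argv.push(rustc.to_string());
    argv.extend(args.iter().map(|a| a.to_string()));
    argv
}

fn main() {
    let args = ["--crate-name", "foo", "src/lib.rs"];
    println!("{:?}", wrapped_command(None, "rustc", &args));
    println!("{:?}", wrapped_command(Some("sccache"), "rustc", &args));
}
```

Because the wrapper sees the complete rustc command line, it can decide for each invocation whether to serve a cached result or run the real compiler.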
Depends on: 1359499
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #1)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=f960158956f354eb0e41cca412de055f05af2736

This had some consistently red builds that I don't quite understand. The Linux builds were all green except for the Stylo builds, which failed in configure, but I can't figure out what the difference is. In any event, I'm landing an updated sccache in bug 1357622, so I'm going to do another try push atop that.
The Windows builds here are failing during Rust compilation, with this error:
17:58:32     INFO -  sccache: encountered fatal error
17:58:32     INFO -  sccache: error : Failed to open file for hashing: "z:\\build\\build\\src\\obj-firefox\\toolkit\\library\\gtest\\rust\\Z"

I'm pretty confident that it's failing to parse the dep-info files properly:
https://github.com/mozilla/sccache/blob/b05a88bf4fa2a9b187f8b2175e18da45b44a51e5/src/compiler/rust.rs#L210

That `split(":")` is going to fail badly on lines with absolute Windows file paths...
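To make the failure mode concrete, here is a minimal sketch of a dep-info line in the `target: dep1 dep2` shape with Windows drive-letter paths. The paths are made up, and `parse_dep_line` is a hypothetical helper, not the code in sccache and not necessarily the fix that will land. It assumes paths contain no spaces.

```rust
/// Split one dep-info line ("target: dep1 dep2 ...") into its dependencies.
/// Splitting at the first ": " (colon followed by a space) skips the
/// drive-letter colon, which is followed by a backslash instead.
/// Assumption: paths contain no spaces. Lines without ": " yield None.
fn parse_dep_line(line: &str) -> Option<Vec<&str>> {
    let (_target, deps) = line.split_once(": ")?;
    Some(deps.split_whitespace().collect())
}

fn main() {
    // Hypothetical dep-info line with absolute Windows paths.
    let line = r"z:\obj\libfoo.rlib: z:\src\lib.rs z:\src\bar.rs";

    // Naive approach: every drive-letter colon becomes a split point, so
    // fragments like "z" end up being treated as file names.
    let naive: Vec<&str> = line.split(':').collect();
    println!("naive split: {:?}", naive);

    // Splitting only at the target separator keeps the paths intact.
    println!("deps: {:?}", parse_dep_line(line));
}
```

A stray drive-letter fragment resolved against the build directory would be consistent with the path ending in `\\Z` in the error above.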

I'm still getting failures in configure on the stylo and Android builds, and it's spewing out logs:
[task 2017-05-10T18:11:21.733030Z] 18:11:21     INFO -  DEBUG: | TRACE:sccache::commands: Command::Compile { "/home/worker/workspace/build/src/android-ndk/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/bin/arm-linux-androideabi-gcc", ["-E", "/tmp/conftest.0PruGL.c"], "/home/worker/workspace/build/src/obj-firefox" }
[task 2017-05-10T18:11:21.733556Z] 18:11:21     INFO -  DEBUG: | TRACE:sccache::commands: connect_or_start_server(4226)
[task 2017-05-10T18:11:21.733590Z] 18:11:21     INFO -  DEBUG: | TRACE:sccache::client: connect_to_server(4226)
[task 2017-05-10T18:11:21.733630Z] 18:11:21     INFO -  DEBUG: | TRACE:sccache::commands: run_server_process
[task 2017-05-10T18:11:21.733744Z] 18:11:21     INFO -  DEBUG: | TRACE:sccache::cmdline: parse
[task 2017-05-10T18:11:21.733986Z] 18:11:21     INFO -  DEBUG: | TRACE:sccache::commands: Command::InternalStartServer
[task 2017-05-10T18:11:21.734174Z] 18:11:21     INFO -  DEBUG: | TRACE:sccache::client: connect_with_retry(4226)
[task 2017-05-10T18:11:21.734367Z] 18:11:21     INFO -  DEBUG: | TRACE:sccache::client: connect_to_server(4226)
[task 2017-05-10T18:11:21.734488Z] 18:11:21     INFO -  DEBUG: | TRACE:sccache::client: connect_to_server(4226)
[task 2017-05-10T18:11:21.734636Z] 18:11:21     INFO -  DEBUG: | TRACE:sccache::client: connect_to_server(4226)
[task 2017-05-10T18:11:21.734837Z] 18:11:21     INFO -  DEBUG: | TRACE:sccache::client: connect_to_server(4226)
[task 2017-05-10T18:11:21.735056Z] 18:11:21     INFO -  DEBUG: | TRACE:sccache::client: connect_to_server(4226)
[task 2017-05-10T18:11:21.735273Z] 18:11:21     INFO -  DEBUG: | TRACE:sccache::client: connect_to_server(4226)
[task 2017-05-10T18:11:21.735438Z] 18:11:21     INFO -  DEBUG: | TRACE:sccache::client: connect_to_server(4226)
[task 2017-05-10T18:11:21.735696Z] 18:11:21     INFO -  DEBUG: | TRACE:sccache::client: connect_to_server(4226)
[task 2017-05-10T18:11:21.735898Z] 18:11:21     INFO -  DEBUG: | TRACE:sccache::client: connect_to_server(4226)
[task 2017-05-10T18:11:21.736527Z] 18:11:21     INFO -  DEBUG: | TRACE:sccache::client: connect_to_server(4226)
[task 2017-05-10T18:11:21.736567Z] 18:11:21     INFO -  DEBUG: | error: Connection to server timed out

I don't understand what's going on here yet, and why it's not impacting the other Linux builds.
I did a try push without the patch that adds logging:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=140554f4a8541a1422bded8911c7309066fffe62

...and the stylo builds are green there, so apparently the logging change breaks that somehow?
I keep getting sidetracked by other things, but I tracked down an sccache regression in handling -dep for MSVC and fixed it:
https://github.com/mozilla/sccache/pull/140

I did a try push with just the updated sccache and everything is green:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d9568915bdc02eb1c6a21aed52682dfd878a8eab

The try push from comment 7 is my patch to use `RUSTC_WRAPPER=sccache` atop that sccache update.
I had forgotten that we started building webrender by default, so we really needed bug 1371382 to make this show a useful effect.
Depends on: 1371382
Rebased atop bug 1371382 and pushed to try again. I think this should finally be good...
Try push of the central revision that previous try push was built on for comparison:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b876695c4751bed46f18aed7297cd6af93102d4f
...I think I mucked up my numbers here by pushing a too-new changeset to try and having a lot of cache misses. I'm going to try that again.
OK, I think part of my confusion was the fact that I'm updating sccache to a new build *and* enabling caching for Rust compilation, and as it turns out I think the new sccache build is a small regression on some platforms, so that was muddling things. I did a separate try push with just that patch so I could compare sensibly:

1) base central changeset:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b876695c4751bed46f18aed7297cd6af93102d4f

2) with updated sccache:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=48a41d0d35beab8649f7a8f52b1bb41f5a1ecec5

3) with sccache rust caching:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d38783e43fc74a70b595f32b43a50cfcc182178e


Comparing current mozilla-central against the updated sccache (#1 vs. #2), I'm seeing a 10-20s regression on most platforms:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=b876695c4751bed46f18aed7297cd6af93102d4f&newProject=try&newRevision=48a41d0d35beab8649f7a8f52b1bb41f5a1ecec5&framework=2&filter=build%20times&showOnlyImportant=0

Looking at the subtests, I think most of this is in configure. We fixed sccache to allow caching compile commandlines without an explicit object file name (https://github.com/mozilla/sccache/commit/04136395f252b4a30cd82136ef417da880736a51), which made us start trying to cache configure tests. The configure test filenames have a unique bit in them, though, so we never get cache hits and the change just made us slower. There may also be some other factors at play, but that seems to be the biggest. I may just back that out until I figure out something smarter to work around it.
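A toy model of why the configure tests never hit the cache: if anything that feeds the cache key varies with the source filename, two otherwise identical compiles get different keys. The `cache_key` function below is a stand-in invented for this sketch that simply hashes the command line. sccache computes its real key differently. The first filename is taken from the log earlier in this bug and the second is made up.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy cache key: a hash over the compiler path and its arguments.
/// Assumption: illustration only; this is not how sccache builds its key.
fn cache_key(compiler: &str, args: &[&str]) -> u64 {
    let mut hasher = DefaultHasher::new();
    compiler.hash(&mut hasher);
    args.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    // Two configure-style compiles that differ only in the unique part of
    // the temporary file name.
    let first = cache_key("gcc", &["-E", "/tmp/conftest.0PruGL.c"]);
    let second = cache_key("gcc", &["-E", "/tmp/conftest.Zq81xT.c"]);
    println!("keys match: {}", first == second);

    // Repeating the exact same command line reproduces the key, which is
    // what a cache hit relies on.
    let repeat = cache_key("gcc", &["-E", "/tmp/conftest.0PruGL.c"]);
    println!("repeat matches: {}", first == repeat);
}
```

Under this model every configure test pays the cost of a lookup and a store without ever getting a hit back, which fits the slowdown seen in configure.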


Comparing the updated sccache against enabling rust caching (#2 vs. #3), it's sort of a wash on stock linux builds. I would expect slightly better since we're building webrender by default, so I will have to dig into that. For linux64-stylo debug it's about a 30 second build time win. For linux64-stylo opt, it's almost a 5 minute(!) build time win (284s):
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=48a41d0d35beab8649f7a8f52b1bb41f5a1ecec5&newProject=try&newRevision=d38783e43fc74a70b595f32b43a50cfcc182178e&framework=2&filter=build%20times&showOnlyImportant=0

Given that, I think I can safely say this is in a good place to land!
Comment on attachment 8877280 [details]
bug 1357825 - use sccache for caching Rust compilation.

https://reviewboard.mozilla.org/r/148624/#review153042
Attachment #8877280 - Flags: review?(nfroyd) → review+
Comment on attachment 8877279 [details]
bug 1357825 - Update sccache to 3544d1241a244d8f67e6d90bc5972077d747079c.

https://reviewboard.mozilla.org/r/148622/#review153050

browser/config/tooltool-manifests/macosx64/cross-clang.manifest and browser/config/tooltool-manifests/linux64/clang.manifest.centos6 still have an older sccache rev - should those be updated too?
Attachment #8877279 - Flags: review?(mshal) → review+
Thanks! My script to update tooltool manifests didn't include those... (Can't wait for the toolchain tasks bug...)
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #21)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=ae23d383a1a2057d7027a4a6ffc9cfa4e3e15fcf

I fixed my script to update those extra manifests that mshal pointed out, rebased to latest central, and pushed to try one more time just to make sure nothing broke in the meantime.
Looks green, I'm going to land it.
Blocks: 1373334
https://hg.mozilla.org/mozilla-central/rev/7b26af810b5d
https://hg.mozilla.org/mozilla-central/rev/4e3a5199d4fe
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Depends on: 1373667
A regression in sccache times:
== Change summary for alert #7329 (as of June 15 2017 17:36 UTC) ==

Regressions:

506%  sccache requests_not_cacheable summary linux64-stylo opt taskcluster-c4.4xlarge     15.83 -> 96.00
  0%  sccache requests_not_cacheable summary windows8-64 opt buildbot-c3.4xlarge          0.00 -> 18.33

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=7329

The 0% entry is a manually created alert; see the downstream alert:
https://treeherder.mozilla.org/perf.html#/alerts?id=7341
That's a completely expected outcome that I should have thought of. :) It's not harmful, we're just running more compiles through sccache (all the rust compiles now), and some of them we can't cache.
Depends on: 1376593
Product: Core → Firefox Build System