Closed Bug 1286934 Opened 4 years ago Closed 4 years ago

Switch automation build jobs to use sccache2


(Firefox Build System :: General, defect)



(firefox53 fixed)

Tracking Status
firefox53 --- fixed


(Reporter: ted, Assigned: ted)




(1 file)

I've got sccache2 far enough along that I'm going to start testing it on try. I plan to ensure that everything works OK as well as run comparative builds from the same base changeset to compare build times and cache hit rates.

If that all looks good we'll be able to land my patch to switch automation to use sccache2!
Before deploying, you should add support for SCCACHE_RECACHE. That's the most immediate thing that I found missing, though I still need to go through most of the code (I've read a little of it already).
I should mention why: I've had to use it a couple times in the past because of bugs in sccache causing bad things in the cache and breaking builds permanently on try. The last occurrence was related to MACOSX_DEPLOYMENT_TARGET.
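For illustration, here is a minimal Python sketch of what a recache switch does. It assumes a plain dict-like cache, and the names `cached_compile`, `compile_fn` and `cache` are hypothetical; this is not sccache's actual implementation. When SCCACHE_RECACHE is set, the lookup is skipped, the compile always runs, and the fresh result overwrites whatever was stored, which is how a bad cache entry gets flushed.

```python
import os

def cached_compile(key, compile_fn, cache, environ=None):
    """Return the compile result for key, honouring SCCACHE_RECACHE.

    `cache` is any dict-like store and `compile_fn` is a zero-argument
    callable that runs the real compiler. Both are hypothetical stand-ins.
    """
    environ = os.environ if environ is None else environ
    # Assumption: the variable being set at all (any value) enables recaching.
    recache = "SCCACHE_RECACHE" in environ
    if not recache and key in cache:
        return cache[key]  # normal cache hit
    result = compile_fn()  # cache miss, or forced recompile
    cache[key] = result    # overwrite any stale or bad entry
    return result
```

Because the forced path still writes to the cache, one run with the variable set is enough to replace a poisoned entry for later builds.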
Hi ted, what is the latest for this work?
I was stuck on those burning OS X jobs and I finally realized the problem: the mac builders are the only non-EC2 builders, so they don't have IAM credentials. The code I'm using to locate AWS credentials (borrowed from rusoto) defaults to looking in `~/.aws/credentials`, but the credentials on those builders are located in `~/.boto`. I've got a fix locally; I'll push it to try shortly and that should fix those builds, which should mean everything is green.
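A rough Python sketch of the kind of fallback described here, not the actual fix: try `~/.aws/credentials` first, then `~/.boto`. The function name `find_aws_credentials` and the search order are assumptions. The section names follow the usual file layouts (`[default]` for the AWS credentials file, `[Credentials]` for boto).

```python
import configparser
import os

# (path, section) pairs tried in order. The search order is an assumption.
DEFAULT_SOURCES = [
    ("~/.aws/credentials", "default"),
    ("~/.boto", "Credentials"),
]

def find_aws_credentials(sources=DEFAULT_SOURCES):
    """Return (access_key, secret_key) from the first usable file, else None."""
    for path, section in sources:
        # Disable interpolation so a '%' in a key is not treated specially.
        parser = configparser.ConfigParser(interpolation=None)
        # read() returns the list of files it parsed; missing files are skipped.
        if not parser.read(os.path.expanduser(path)):
            continue
        if not parser.has_section(section):
            continue
        access = parser.get(section, "aws_access_key_id", fallback=None)
        secret = parser.get(section, "aws_secret_access_key", fallback=None)
        if access and secret:
            return access, secret
    return None
```

On an EC2 builder neither file would normally be needed, since credentials come from the instance's IAM role; the file lookup only matters for hosts like the mac builders.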
So after a lot of fiddling (obviously) here's a try job with the original sccache using SCCACHE_RECACHE=1 to force compiling everything instead of reading from the cache:

and here's a try run with sccache2 also using SCCACHE_RECACHE=1:

They're built atop the same base revision.
Blocks: 1284492
Blocks: 1269355
Here's a comparison of two try pushes, similar to comment 33, except I removed the SCCACHE_RECACHE, so this is comparing builds that should be mostly cache hits:

I also rebased on top of gps' patches to split the build metrics out between buildbot and taskcluster, as well as by instance type, which helps.
FYI I summarized the build time comparisons I did on try (comment 39 and comment 40):

I was mostly just trying to ensure that I didn't regress anything, but it looks like sccache2 will actually give us noticeable build time improvements on many platforms.
Blocks: 1318370
This still depends on landing bug 1295937, but that's close enough to landing that I thought I'd get this patch up for review. Most of the useful info should be in the commit message, but here's a few other things:

I'm currently building the binaries using this script on my local Windows/Linux/Mac machines: . I'd like to move to something better but I haven't quite figured that out yet. Taskcluster doesn't have support for mac workers yet, so to build binaries in TC I'd have to cross-compile them from Linux (which is feasible).

I'm using another script to upload the resulting binaries to tooltool:

...and a third script to take those resulting tooltool manifests and merge them into the in-tree manifests:

This whole process is kinda crappy. I would love it if we could fix bug 1313111 and just have this all work in the taskgraph.
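To make the manifest step concrete, here is a hedged Python sketch, not the scripts referred to above. It builds a record with the fields that appear in the in-tree manifests (`size`, `digest`, `algorithm`, `filename`) and swaps it into a manifest list. The helper names `manifest_entry` and `merge_entry` are hypothetical, and matching records by filename is an assumption.

```python
import hashlib
import os

def manifest_entry(path):
    """Build a tooltool-style record for the file at path."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so large archives are not read in one go.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return {
        "size": os.path.getsize(path),
        "digest": digest.hexdigest(),
        "algorithm": "sha512",
        "filename": os.path.basename(path),
    }

def merge_entry(manifest, entry):
    """Return a copy of manifest with entry replacing any same-named record."""
    kept = [e for e in manifest if e.get("filename") != entry["filename"]]
    return kept + [entry]
```

A real merge would also have to carry over any extra fields that existing records use and handle an archive whose filename changes between versions.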
Depends on: 1295937
Comment on attachment 8811801 [details]
bug 1286934 - Switch to using sccache2.

This looks pretty straightforward!

We should get the scripts for building sccache in the tree. Even if they aren't hooked up to the task graph, it is better than them sitting in some random repo elsewhere. I guess you can toss them in `build/build-sccache` or some such and we can refactor things later.
Attachment #8811801 - Flags: review?(gps) → review+
I'm fine with putting them wherever, but the *build* is pretty simple. It's the tooltool bits that are the PITA. :-/
Comment on attachment 8811801 [details]
bug 1286934 - Switch to using sccache2.

::: browser/config/tooltool-manifests/linux32/releng.manifest:36
(Diff revision 1)
>  },
>  {
> -"size": 167175,
> -"digest": "0b71a936edf5bd70cf274aaa5d7abc8f77fe8e7b5593a208f805cc9436fac646b9c4f0b43c2b10de63ff3da671497d35536077ecbc72dba7f8159a38b580f831",
>  "algorithm": "sha512",
> -"filename": "sccache.tar.bz2",
> +"visibility": "public",

in-tree manifests don't need visibility

::: build/
(Diff revision 1)
>  endif
>  preflight_all:
>  	# Terminate any sccache server that might still be around
> -	-python2.7 $(TOPSRCDIR)/sccache/ > /dev/null 2>&1
> +	-$(TOPSRCDIR)/sccache2/sccache --stop-server > /dev/null 2>&1

Should need .exe on Windows, right?

::: build/
(Diff revision 1)
>  endif
>  preflight_all:
>  	# Terminate any sccache server that might still be around
> -	-python2.7 $(TOPSRCDIR)/sccache/ > /dev/null 2>&1
> +	-$(TOPSRCDIR)/sccache2/sccache --stop-server > /dev/null 2>&1

Is the redirection to /dev/null still necessary?
Comment on attachment 8811801 [details]
bug 1286934 - Switch to using sccache2.

> in-tree manifests don't need visibility

Turns out that if you don't specify --visibility=public, the upload gets refused, which is why this is here. I'm not motivated enough to try to fix that just for the sake of making the manifests look nicer.

> Should need .exe on Windows, right?

This works because the msys shell will find it with or without the .exe. (The change in mozconfig.cache is because we were passing the path to `test -e`.)

> Is the redirection to /dev/null still necessary?

Probably not, but I stuck an "echo stats to the log" bit earlier in the build, so it'd be redundant. I haven't yet made sccache2 save its stats to disk, and with the 5 minute inactivity timeout the server would shut itself down in the lull between the compilation phase finishing and the time the build hits `postflight_all`, so the stats were getting lost.

Longer-term I might want to integrate special sccache handling into mach or something.
FYI, I moved the sccache2 repo to mozilla/sccache:
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Product: Core → Firefox Build System