Pervasive sccache cache write errors

RESOLVED FIXED in Firefox 55



Build Config
7 months ago


(Reporter: chmanchester, Assigned: ted)



Firefox Tracking Flags

(firefox55 fixed)



(1 attachment)



7 months ago
I noticed a pretty sizeable build time regression on autoland, inbound, and try starting around Tuesday afternoon:,4044b74c437dfc672f4615a746ea01f6e4c0312d,1,2%5D&series=%5Bautoland,077c454bbb47966e9661e9b00ba7100f14bbd6c9,1,2%5D&series=%5Bmozilla-inbound,4044b74c437dfc672f4615a746ea01f6e4c0312d,1,2%5D&series=%5Bautoland,4044b74c437dfc672f4615a746ea01f6e4c0312d,1,2%5D&series=%5Bmozilla-central,077c454bbb47966e9661e9b00ba7100f14bbd6c9,1,2%5D&series=%5Bmozilla-inbound,077c454bbb47966e9661e9b00ba7100f14bbd6c9,1,2%5D

It starts with seemingly innocuous changesets, and pushing a revision before the regression range to try doesn't improve the build time. Looking at the sccache stats, we stopped getting cache hits and started seeing pervasive cache write errors around this time.

Comment 1

7 months ago
Nothing around sccache in tree has changed for quite a while, so I don't think this is a build config issue. I think it's got to be a network issue or something like that. Maybe our network flows to s3 got screwed up, or some taskcluster-worker change landed that changed things?

If I add Windows buildbot+taskcluster builds to that graph I don't see the same regression with them:,4044b74c437dfc672f4615a746ea01f6e4c0312d,1,2%5D&series=%5Bautoland,077c454bbb47966e9661e9b00ba7100f14bbd6c9,1,2%5D&series=%5Bmozilla-central,4044b74c437dfc672f4615a746ea01f6e4c0312d,1,2%5D&series=%5Bautoland,4044b74c437dfc672f4615a746ea01f6e4c0312d,1,2%5D&series=%5Bmozilla-central,077c454bbb47966e9661e9b00ba7100f14bbd6c9,1,2%5D&series=%5Bmozilla-inbound,077c454bbb47966e9661e9b00ba7100f14bbd6c9,1,2%5D&series=%5Bmozilla-inbound,be7331d4f74b6970f67fe9da80f4d4d90ef60b73,1,2%5D&series=%5Bmozilla-inbound,be465112346255e89dd26003061b01f27cc7fd39,1,2%5D

Comment 2

7 months ago
There's a giant pile of perfherder alerts from this:

That may not be the full set. We should get these alerts to send email to dev-builds so we ensure they get noticed.

Comment 3

7 months ago
I looked at a log from a try push I did and the answer was actually staring me right in the face:
[task 2017-04-06T10:27:47.426667Z] 
[task 2017-04-06T10:27:47.426679Z] if [[ -n ${USE_SCCACHE} ]]; then
[task 2017-04-06T10:27:47.426698Z]     # Point sccache at the Taskcluster proxy for AWS credentials.
[task 2017-04-06T10:27:47.426739Z]     export AWS_IAM_CREDENTIALS_URL="http://taskcluster/auth/v1/aws/s3/read-write/taskcluster-level-${MOZ_SCM_LEVEL}-sccache-${TASKCLUSTER_WORKER_GROUP%?}/?format=iam-role-compat"
[task 2017-04-06T10:27:47.426755Z] fi
[task 2017-04-06T10:27:47.426764Z] + [[ -n 1 ]]
[task 2017-04-06T10:27:47.426810Z] + export 'AWS_IAM_CREDENTIALS_URL=http://taskcluster/auth/v1/aws/s3/read-write/taskcluster-level-1-sccache-us-east-/?format=iam-role-compat'
[task 2017-04-06T10:27:47.426867Z] + AWS_IAM_CREDENTIALS_URL='http://taskcluster/auth/v1/aws/s3/read-write/taskcluster-level-1-sccache-us-east-/?format=iam-role-compat'

The bucket name ends with `us-east-` which isn't right. The `${TASKCLUSTER_WORKER_GROUP%?}` bit is removing the last character from that environment variable, since it used to have a trailing letter, but that is no longer the case per the top of that log file:
[taskcluster 2017-04-06 10:24:01.021Z] Worker Group: us-east-1
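To illustrate the failure mode, here is a minimal sketch of how `${VAR%?}` behaves on both formats. The old-style value `us-east-1a` is an assumed example of a worker group with a trailing availability zone letter; only `us-east-1` appears in the log above.

```shell
#!/usr/bin/env bash
# ${var%?} strips exactly one trailing character, whatever that character is.
old_group="us-east-1a"   # assumed example of the old format (region + zone letter)
new_group="us-east-1"    # value reported at the top of the log

old_region="${old_group%?}"   # drops the zone letter, leaving the region
new_region="${new_group%?}"   # drops the final digit of the region itself

echo "old format: ${old_group} -> ${old_region}"
echo "new format: ${new_group} -> ${new_region}"
```

With the new format the expansion yields `us-east-`, which matches the broken bucket suffix in the credentials URL.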

This was broken by this merge:

Specifically this commit, which removed the availability zone letter from the workerGroup, which is what sets the TASKCLUSTER_WORKER_GROUP environment variable:

On the plus side, this is easy to fix!
Assignee: nobody → ted

Comment 4

7 months ago
garndt said he's going to back that change out and deploy new images.
Assignee: ted → garndt
Component: Build Config → Docker-Worker
Product: Core → Taskcluster


Comment 5

7 months ago
Duplicate of this bug: 1354061

Comment 6

7 months ago
...never mind, we'll fix this in the build system.
Assignee: garndt → ted
Component: Docker-Worker → Build Config
Product: Taskcluster → Core
Comment hidden (mozreview-request)

Comment 8

7 months ago
Comment on attachment 8855355 [details]
bug 1350093 - fix sccache configuration to handle changes in the format of TASKCLUSTER_WORKER_GROUP.

I think this is fine, though I don't think it'd be that burdensome to rewrite the logic and dispense with `availability_zone`, since it's only used in this script, and only to figure out the region, which we already have, right?
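As a rough sketch of what deriving the region directly might look like (this is not the patch under review), the hypothetical helper below strips a trailing lowercase letter only when one is present, so it accepts both the old and new formats of the variable. The function name `region_from_worker_group` and the sample value `us-east-1a` are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return the region for a worker group value.
# ${group%[a-z]} removes one trailing lowercase letter if present and
# leaves the value unchanged otherwise (for example when it ends in a digit).
region_from_worker_group() {
    local group="$1"
    printf '%s\n' "${group%[a-z]}"
}

region_from_worker_group "us-east-1a"   # assumed old-style value
region_from_worker_group "us-east-1"    # new-style value from the log
```

Both calls print `us-east-1`, so the rest of the script would not need to know which format the worker supplied.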

::: build/mozconfig.cache:54
(Diff revision 1)
> +            # here simpler.x
> +            availability_zone="${TASKCLUSTER_WORKER_GROUP}x"

You have an extra character in your comment as well. :)

Maybe the comment should read something like:

"TASKCLUSTER_WORKER_GROUP used to be the region plus the availability zone, but it has since been changed to be only the region. In order to avoid changing all the logic below that depends on the formatting of availability_zone, we simply tack on a character to TASKCLUSTER_WORKER_GROUP to make it mimic the previous semantics."

Or am I overthinking this because all this is new to me?  Not sure!

Since `TASKCLUSTER_WORKER_GROUP` has these semantics now, would it be reasonable to match it against the known regions we use--us-{east,west}-{1,2}?--so we fail faster if we change the syntax of this variable next time?
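A minimal sketch of that kind of fail-fast check, assuming the set of regions really is us-{east,west}-{1,2} as guessed above. The function name `validate_worker_group` is hypothetical and the region list would need to be confirmed against the regions actually in use.

```shell
#!/usr/bin/env bash
# Hypothetical check: accept only the regions assumed to be in use and
# report anything else on stderr, so a format change is noticed immediately.
validate_worker_group() {
    case "$1" in
        us-east-1|us-east-2|us-west-1|us-west-2)
            return 0
            ;;
        *)
            echo "unexpected TASKCLUSTER_WORKER_GROUP: '$1'" >&2
            return 1
            ;;
    esac
}
```

A caller could run `validate_worker_group "${TASKCLUSTER_WORKER_GROUP}"` before building the credentials URL and stop on a non-zero return, rather than silently pointing sccache at a nonexistent bucket.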
Attachment #8855355 - Flags: review?(nfroyd) → review+

Comment 9

7 months ago
I would like to get rid of most of this file at some point, it has always been overly-complicated. For this patch I just wanted to make the smallest changes I could to get things working again.

Ideally we'd find a better place to set these values, but I haven't worked that out yet.

Comment 10

(In reply to Ted Mielczarek [:ted.mielczarek] from comment #9)
> I would like to get rid of most of this file at some point, it has always
> been overly-complicated. For this patch I just wanted to make the smallest
> changes I could to get things working again.

This works for me.

Comment 11

7 months ago
bug 1350093 - fix sccache configuration to handle changes in the format of TASKCLUSTER_WORKER_GROUP. r=froydnj

Comment 12

7 months ago
Last Resolved: 7 months ago
status-firefox55: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55