Enable sccache for linux32/64 builds

RESOLVED FIXED in mozilla53

Status

defect
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: garndt, Assigned: ted)

Tracking

unspecified
mozilla53
Dependency tree / graph

Details

Attachments

(1 attachment)

sccache should be used for taskcluster builds.  This bug is for the linux32/64 opt/debug builds first.
Depends on: 1269357
After some discussion on IRC and in bug 1187257, (note bug 1187257 comment 11), I think we should be able to enable this using sccache2 by making the following changes:
* Enable taskclusterProxy for the build tasks
* Add the proper scopes to the build tasks: auth:aws-s3:read-write:<bucket>/<prefix>. We'll probably need to add scopes for the buckets for all regions that are enabled for the worker types in use. (That's sort of unfortunate, it would be nice to figure out something less fragile here.)
* Get the temp AWS creds from the taskcluster proxy, probably in build-linux.sh we could just do `curl http://taskcluster/auth/v1/aws/s3/read-write/<bucket>/<prefix> > /some/path; export AWS_CREDENTIAL_FILE=/some/path`
* Stop setting CCCACHE_DIR. The in-tree sccache mozconfig won't use sccache if CCACHE_DIR is set (for builds that don't have a bulidprops.json):
https://dxr.mozilla.org/mozilla-central/rev/783356f1476eafd8e4d6fa5f3919cf6167e84f8d/build/mozconfig.cache#45

We apparently do this in multiple places. Both in mozharness:
https://dxr.mozilla.org/mozilla-central/rev/783356f1476eafd8e4d6fa5f3919cf6167e84f8d/testing/mozharness/configs/builds/releng_base_linux_64_builds.py#87
and in a script sourced by build-linux.sh:
https://dxr.mozilla.org/mozilla-central/rev/783356f1476eafd8e4d6fa5f3919cf6167e84f8d/taskcluster/scripts/builder/setup-ccache.sh#6

dustin says we probably want to add a property to the Task description here like 'needs-sccache':
https://dxr.mozilla.org/mozilla-central/rev/783356f1476eafd8e4d6fa5f3919cf6167e84f8d/taskcluster/taskgraph/transforms/task.py#142
persist it through to the job description here:
https://dxr.mozilla.org/mozilla-central/rev/783356f1476eafd8e4d6fa5f3919cf6167e84f8d/taskcluster/taskgraph/transforms/job/__init__.py#57

then in build_docker_worker_payload we'd add the specific bits: taskclusterProxy and the proper scopes:
https://dxr.mozilla.org/mozilla-central/rev/783356f1476eafd8e4d6fa5f3919cf6167e84f8d/taskcluster/taskgraph/transforms/task.py#344
Depends on: 1286934
Assignee: garndt → nobody
Looks good.

If the bucket names end with the region, the scopes could be auth:aws-s3:read-write:something-something-something-sccache-*.
Turns out roles: "assume:project:taskcluster:level-1-sccache-buckets" already existed.

I have now:
 - Granted moz-tree:level-<level> the scope assume:project:taskcluster:level-<level>-sccache-buckets
 - Add an expiration policy to buckets.
 - Added eu-central bucket
 - Added buckets for level 2
 - Set 15 day expiration policy on all buckets
 - Tagged buckets with Application=sccache for cost analysis
 - Consolidate IAM policies (same policy attached to tc-auth as used for generic-worker roles
 - Filed bug 1316410 to further reduce IAM policy

@ted, in-tree we should just give the build tasks the scope:
  assume:project:taskcluster:level-{{level}}-sccache-buckets

That'll expand to all the scopes, and we won't have to update anything in-tree if we add new
regions to the workerType definition as long as we also add new buckets externally.
Note: We document buckets that tc-auth has authority over here:
      https://mana.mozilla.org/wiki/display/TAS/General+Purpose+S3+Buckets
(In reply to Jonas Finnemann Jensen (:jonasfj) from comment #3)
> @ted, in-tree we should just give the build tasks the scope:
>   assume:project:taskcluster:level-{{level}}-sccache-buckets
> 
> That'll expand to all the scopes, and we won't have to update anything
> in-tree if we add new
> regions to the workerType definition as long as we also add new buckets
> externally.

That's great, thanks! I'll take a crack at wiring this up tomorrow.
Assignee: nobody → ted
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #8)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=64e53cece963

This worked. I respun all the Linux builds and verified that they had a high cache hit rate. They were all noticeably faster, too. This patch needs a little more work though, right now it's spewing the AWS credentials all over the logs as environment variables...
jonas and I chatted on IRC and he filed bug 1316698 about making the taskcluster proxy able to return a response that's compatible with the IAM auth endpoint, which would make using it easier (just point existing IAM-auth handling code at a different URL).
Depends on: 1316698
Depends on: 1318411
I don't have a working try build of this revision of the patch (once the fix in bug 1318411 is available I can test) but I figure this patch probably needs another revision anyway, since I don't know that I got all the task definition bits correct.
Comment on attachment 8813113 [details]
bug 1269355 - Enable sccache for taskcluster linux32/64 builds.

https://reviewboard.mozilla.org/r/94644/#review94934

I'm perfectly happy with this approach -- the comments here are minor.  It sounds like you'll need another round of review anyway, so I cleared the flag.

::: taskcluster/scripts/builder/build-linux.sh:56
(Diff revision 1)
>  export LIBRARY_PATH=$LIBRARY_PATH:$WORKSPACE/src/obj-firefox:$WORKSPACE/src/gcc/lib64
>  
> +if [[ -n ${USE_SCCACHE} ]]; then
> +    # Point sccache at the Taskcluster proxy for AWS credentials.
> +    export AWS_IAM_CREDENTIALS_URL="http://taskcluster/auth/v1/aws/s3/read-write/taskcluster-level-${MOZ_SCM_LEVEL}-sccache-${TASKCLUSTER_WORKER_GROUP%?}/?format=iam-role-compat"
> +    #TODO: we could set the bucket here as well and avoid doing it in mozconfig.cache

That seems like a good idea :)

::: taskcluster/taskgraph/transforms/task.py:365
(Diff revision 1)
> -    if worker.get('taskcluster-proxy'):
> +    if worker.get('taskcluster-proxy') or task.get('needs-sccache'):
>          features['taskclusterProxy'] = True

The pattern has been to set `features['taskclusterProxy'] = True` wherever it's needed, rather than adding everythign to the conditional here.  So you can just (re)-set it to True in the `if task.get('needs-sccache')` conditional.

::: taskcluster/taskgraph/transforms/task.py:375
(Diff revision 1)
>          task_def['scopes'].append('docker-worker:feature:allowPtrace')
>  
>      if worker.get('chain-of-trust'):
>          features['chainOfTrust'] = True
>  
> +    if task.get('needs-sccache'):

Is sccache supported for other worker implementations?  It'd be good to handle the case for those implementations -- either error if needs-sccache is set, or implement it.  Then nobody's surprised when they set the flag for OS X and nothing happens.
Attachment #8813113 - Flags: review?(dustin)
Comment on attachment 8813113 [details]
bug 1269355 - Enable sccache for taskcluster linux32/64 builds.

https://reviewboard.mozilla.org/r/94644/#review94956

::: taskcluster/scripts/builder/build-linux.sh:56
(Diff revision 1)
>  export LIBRARY_PATH=$LIBRARY_PATH:$WORKSPACE/src/obj-firefox:$WORKSPACE/src/gcc/lib64
>  
> +if [[ -n ${USE_SCCACHE} ]]; then
> +    # Point sccache at the Taskcluster proxy for AWS credentials.
> +    export AWS_IAM_CREDENTIALS_URL="http://taskcluster/auth/v1/aws/s3/read-write/taskcluster-level-${MOZ_SCM_LEVEL}-sccache-${TASKCLUSTER_WORKER_GROUP%?}/?format=iam-role-compat"
> +    #TODO: we could set the bucket here as well and avoid doing it in mozconfig.cache

I'll file a followup bug on this. It'd be nice to make mozconfig.cache less of a mess.

::: taskcluster/taskgraph/transforms/build.py:22
(Diff revision 1)
>  def set_defaults(config, jobs):
>      """Set defaults, including those that differ per worker implementation"""
>      for job in jobs:
>          job['treeherder'].setdefault('kind', 'build')
>          job['treeherder'].setdefault('tier', 1)
> +        job.setdefault('needs-sccache', True)

This is one bit I actually wasn't sure of--should we set this to `True` by default and then make individual tasks where we don't want it enabled set it to `False`? If we don't default it to `True` then we're going to inevitably add build tasks where we forget to enable it and make ourselves unhappy. If I did want to set it to `False` for some tasks, do I do that in the task definition .yml?

::: taskcluster/taskgraph/transforms/task.py:375
(Diff revision 1)
>          task_def['scopes'].append('docker-worker:feature:allowPtrace')
>  
>      if worker.get('chain-of-trust'):
>          features['chainOfTrust'] = True
>  
> +    if task.get('needs-sccache'):

Yes, it works on the windows generic worker (bug 1316329 is fixing an issue we uncovered during testing it there).

This seems like an unfortunate design pattern though, don't you think? Maybe it would be nicer if we could somehow enforce that the task transforms had handled all of the task attributes or something like that? I just don't see how we prevent that sort of bug from happning in the general case.
Comment on attachment 8813113 [details]
bug 1269355 - Enable sccache for taskcluster linux32/64 builds.

https://reviewboard.mozilla.org/r/94644/#review94956

> This is one bit I actually wasn't sure of--should we set this to `True` by default and then make individual tasks where we don't want it enabled set it to `False`? If we don't default it to `True` then we're going to inevitably add build tasks where we forget to enable it and make ourselves unhappy. If I did want to set it to `False` for some tasks, do I do that in the task definition .yml?

I think this level of defaulting (for all build tasks) is good.  And yes, you could set `needs-sccache: false` in the yaml file to prevent this default from applying.  But ultimately, this is just enabling sccache *access* - it doesn't control whether the job tries to use it, right?

> Yes, it works on the windows generic worker (bug 1316329 is fixing an issue we uncovered during testing it there).
> 
> This seems like an unfortunate design pattern though, don't you think? Maybe it would be nicer if we could somehow enforce that the task transforms had handled all of the task attributes or something like that? I just don't see how we prevent that sort of bug from happning in the general case.

I think it's just something we can be careful about in review: if you add handling for a flag to one implementation, add it to all of them (even if that just means raising an exception when it's used where it's not supported).  In this case, I only see a change to `build_docker_worker_payload` - should there be a similar change for `build_generic_worker_payload`?
Comment on attachment 8813113 [details]
bug 1269355 - Enable sccache for taskcluster linux32/64 builds.

https://reviewboard.mozilla.org/r/94644/#review94956

> I'll file a followup bug on this. It'd be nice to make mozconfig.cache less of a mess.

I filed bug 1319518, but I removed this TODO comment because I'll probably wind up putting the fix in mozharness instead.

> I think this level of defaulting (for all build tasks) is good.  And yes, you could set `needs-sccache: false` in the yaml file to prevent this default from applying.  But ultimately, this is just enabling sccache *access* - it doesn't control whether the job tries to use it, right?

That's correct. OK, I'll leave that as-is for now.

> I think it's just something we can be careful about in review: if you add handling for a flag to one implementation, add it to all of them (even if that just means raising an exception when it's used where it's not supported).  In this case, I only see a change to `build_docker_worker_payload` - should there be a similar change for `build_generic_worker_payload`?

We didn't actually need any changes for `build_generic_worker_payload`, it worked out of the box. The changes in the other bug are solely to setup a symlink to get consistent srcdir paths.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #19)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=43499c982578

Everything seems to be working smoothly in this try push.
Comment on attachment 8813113 [details]
bug 1269355 - Enable sccache for taskcluster linux32/64 builds.

https://reviewboard.mozilla.org/r/94644/#review95098

This looks good to me. I looked at Dustin's original review and you seem to have hit all the important points. Worst case, we just land fixups afterwards :)
Attachment #8813113 - Flags: review+
Pushed by gszorc@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/ed6e2e4ca501
Add sccache to clang.manifest.centos6 tooltool manifest; r=me
Pushed by gszorc@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/26096b7866bb
Insert blank line to appease the flake8 linting overlord; r=me
https://hg.mozilla.org/mozilla-central/rev/c1b55d39dd28
https://hg.mozilla.org/mozilla-central/rev/ed6e2e4ca501
https://hg.mozilla.org/mozilla-central/rev/26096b7866bb
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Attachment #8813113 - Flags: review?(dustin)
You need to log in before you can comment on or make changes to this bug.