Closed
Bug 1316329
Opened 8 years ago
Closed 8 years ago
Fix sccache on taskcluster generic-worker builds so it can get cache hits
Categories
(Firefox Build System :: General, defect)
Firefox Build System
General
Tracking
(firefox53 fixed)
RESOLVED
FIXED
mozilla53
Tracking | Status | |
---|---|---|
firefox53 | --- | fixed |
People
(Reporter: ted, Assigned: ted)
References
Details
Attachments
(2 files)
In bug 1187257 I did some testing and was able to get sccache2 working on the Taskcluster Windows builds in generic-worker. Unfortunately they have zero cache hits because the working directory (under which the srcdir and objdir are placed) contains the Task ID, and sccache uses the full compiler commandline as an input to the hash to generate the cache key. We have plans to fix that (bug 1280640), but it's a fair bit of work. pmoore suggested that maybe we could make the build tasks create a symlink from a fixed location (like Z:\build) to the current task directory, and then run the build from the symlink directory instead. That seems fairly non-invasive. I'm a little worried that something in the build system might break because of symlinks, but it's worth a try to get a quick build time win from sccache.
Comment 1•8 years ago
|
||
Yes, let's do this with some extra commands in the task payload. We might need to adapt the worker type too to provide a fixed location that is writable by all users (this would require a PR in https://github.com/mozilla-releng/OpenCloudConfig). I'm thinking adding something to the start of the task that: 1) Creates the symlink at the chosen frozen location, to point to %CD% (the task user home directory) 2) cd into that directory 3) execute preexisting build steps... This can be implemented as a change to the decision task, i.e. in-tree changes to the decision task creation process. Probably easiest is to try the changes manually first with some manual task submissions in the Task Creator (https://tools.taskcluster.net/task-creator/), before building them into the decision task. CC'ing grenade for advice about worker type changes.
Assignee | ||
Comment 2•8 years ago
|
||
We'd probably want to add the symlink commands here, in the mozharness_on_windows transform: https://dxr.mozilla.org/mozilla-central/rev/783356f1476eafd8e4d6fa5f3919cf6167e84f8d/taskcluster/taskgraph/transforms/job/mozharness.py#205 We should probably do this the same way as proposed in bug 1269355 comment 1, by honoring the 'use-sccache' property of the task and only adding these commands if that's present.
Assignee | ||
Comment 3•8 years ago
|
||
If someone else makes the worker type changes (or points out an existing location we can use that wouldn't require changes) I will do the work to test the symlink commands.
Updated•8 years ago
|
Summary: Fix sccache on taskcluster generic-worker so it can get cache hits → Fix sccache on taskcluster generic-worker builds so it can get cache hits
Comment 4•8 years ago
|
||
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #3) > If someone else makes the worker type changes (or points out an existing > location we can use that wouldn't require changes) I will do the work to > test the symlink commands. Rob probably knows best, in terms of what might already be available. The worker type is defined in e.g. https://github.com/mozilla-releng/OpenCloudConfig/blob/master/userdata/Manifest/gecko-1-b-win2012.json but I'm not sure where the code is that mounts the Z: drive and sets ACLs, as this happens during worker runtime, rather than on AMI creation.
Flags: needinfo?(rthijssen)
Comment 5•8 years ago
|
||
previous attempts to have the payload create symlinks resulted in permissions errors due to the task user not having rights to create symlinks. i believe this is now fixed with https://github.com/mozilla-releng/OpenCloudConfig/commit/7cb5df0506923c79df9edf5b3e5b521685306011 where we grant that access. the z: drive is mounted long before the payload starts being processed and any user can write to the ephemeral drives so it should just be a case of creating the links in the payload.
Flags: needinfo?(rthijssen)
Assignee | ||
Updated•8 years ago
|
Assignee: nobody → ted
Assignee | ||
Comment 6•8 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=15293d42c4f5
Assignee | ||
Comment 7•8 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=5d427db09788
Assignee | ||
Comment 8•8 years ago
|
||
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #7) > https://treeherder.mozilla.org/#/jobs?repo=try&revision=5d427db09788 The first set of builds worked OK, so I retriggered all of the builds and they were able to get cache hits, like in: https://public-artifacts.taskcluster.net/cwCqX111RxWqGXnb72kGIQ/0/public/logs/live_backing.log 22:13:37 INFO - Cache hits 3186 The builds were all significantly faster too, as expected.
Comment hidden (mozreview-request) |
Comment 10•8 years ago
|
||
mozreview-review |
Comment on attachment 8810672 [details] bug 1316329 - build from a symlink dir in generic-worker builds. https://reviewboard.mozilla.org/r/92960/#review93486 ::: taskcluster/taskgraph/transforms/job/mozharness.py:230 (Diff revision 1) > + # cache hash key, so create a symlink to the task dir and build from > + # the symlink dir to get consistent paths. > + if taskdesc.get('needs-sccache'): > + worker['command'].extend([ > + r'if exist z:\build rmdir z:\build', > + r'mklink /d z:\build %cd%', Each task runs as a different user, so I could imagine when we create the folder, we might need to also add an ACL to make it writable by the Everyone group, so that a future task user can delete it. To test this, we'll need a worker that runs two jobs, and since you can't guarantee getting the same worker twice, maybe we should set up a dedicated worker type with maxCapacity of 1 so that we can make a couple of try pushes. This also requires Z:\ to be writable by all - I hope this is the case already - if not, we can tweak the DSC that sets up the image in github.com/mozrelops/OpenCloudConfig repo.
Attachment #8810672 -
Flags: review?(pmoore)
Assignee | ||
Comment 11•8 years ago
|
||
Per comment 5 it sounds like write permissions shouldn't be a problem. I don't actually know what the permission situation is like for removing a symlink--I would assume that if a user has write permission to the containing directory they would have permission to remove the symlink, but I don't actually know! If you want to set up a dedicated worker and run those tests that would be great. You should be able to just duplicate the tasks from my try push and change the worker type.
Comment 12•8 years ago
|
||
As I'm on PTO on Monday, I've made a try push today: https://treeherder.mozilla.org/#/jobs?repo=try&revision=da612b47c461345fc0081115193f75f67ed03050 This just changes the worker type to gecko-1-b-win2012-beta (instead of gecko-1-b-win2012) - and I believe the beta worker type is essentially on the same version as the non-beta type. I set maxCapacity to 1 for the beta worker type, so since this will create multiple builds in serial fashion, I'm hoping we'll see if the tasks have permission to delete the symlink created by the previous task... As I'll be out on Monday, speak to grenade if you have questions about the *-beta worker types, as he kindly set them up.
Comment 13•8 years ago
|
||
Hey Ted, in my try push with your change, I didn't see your new code kicking in: https://public-artifacts.taskcluster.net/e0mo6vEcQYCrQ9fj2HFvOQ/0/public/task-graph.json Did I do something wrong in my push in comment 12? Thanks!
Flags: needinfo?(ted)
Assignee | ||
Comment 14•8 years ago
|
||
Ah yes! You need the patch from bug 1269355 as well, that's what sets the 'needs-sccache' flag on the task definition. (I realize I didn't actually post that patch, I've only pushed it to try.)
Flags: needinfo?(ted)
Assignee | ||
Comment 15•8 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=f89e68716e1d
Assignee | ||
Comment 16•8 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=8095dcbe9a5a
Comment hidden (mozreview-request) |
Assignee | ||
Comment 18•8 years ago
|
||
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #16) > https://treeherder.mozilla.org/#/jobs?repo=try&revision=8095dcbe9a5a pmoore's suspicions were correct, and we did need to add a command to set an ACL on the symlink so that other task users would be able to remove it. With that added this try push seems to be progressing reasonably well.
Assignee | ||
Comment 19•8 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=4c14523a1acc
Assignee | ||
Comment 20•8 years ago
|
||
There is a unit test failure on those jobs that I hadn't paid attention to until pmoore asked about it--it's a legitimate failure and it's pointing out something that would be broken, so I'm glad I wrote that test! We call the Win32 API function `GetFinalPathNameByHandleW` to normalize the case of source file names, but in this case that will give us back the path with the symlink resolved. The script that dumps symbols for crash reporting uses simple prefix matching to check if source files fall within the source repository, and the source repository path is under the symlink, so that fails. It shouldn't be a terribly hard fix--we just need to resolve the srcdir path the same way.
Assignee | ||
Comment 21•8 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ca5405acc96a
Assignee | ||
Comment 22•8 years ago
|
||
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #21) > https://treeherder.mozilla.org/#/jobs?repo=try&revision=ca5405acc96a This push has a patch to fix that issue and the builds are green.
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 25•8 years ago
|
||
mozreview-review |
Comment on attachment 8813475 [details] bug 1316329 - follow symlinks for srcdir path in symbol dumping on windows. https://reviewboard.mozilla.org/r/94888/#review95094
Attachment #8813475 -
Flags: review?(gps) → review+
Comment 26•8 years ago
|
||
mozreview-review |
Comment on attachment 8810672 [details] bug 1316329 - build from a symlink dir in generic-worker builds. https://reviewboard.mozilla.org/r/92960/#review95096
Attachment #8810672 -
Flags: review+
Assignee | ||
Comment 27•8 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/e5a62d4ce95aa7d460b3745332e0878974d12714 bug 1316329 - follow symlinks for srcdir path in symbol dumping on windows. r=gps https://hg.mozilla.org/integration/mozilla-inbound/rev/f4c1cb96ec823030444ad4fd5bf69c2eb2967d6b bug 1316329 - build from a symlink dir in generic-worker builds. r=pmoore,gps
Comment 28•8 years ago
|
||
I've added a try push against gecko-1-b-win2012-beta: * https://treeherder.mozilla.org/#/jobs?repo=try&revision=bbbe871baa51c34311c80e07782f8f036cdddb60 to complement the one against gecko-1-b-win2012: * https://treeherder.mozilla.org/#/jobs?repo=try&revision=2a6db612237d5b31268c5d3447504f7883489b37
Comment 29•8 years ago
|
||
mozreview-review |
Comment on attachment 8810672 [details] bug 1316329 - build from a symlink dir in generic-worker builds. https://reviewboard.mozilla.org/r/92960/#review95188 Looks good, thanks Ted!
Attachment #8810672 -
Flags: review?(pmoore) → review+
Comment 30•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/e5a62d4ce95a https://hg.mozilla.org/mozilla-central/rev/f4c1cb96ec82
Status: NEW → RESOLVED
Closed: 8 years ago
status-firefox53:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Updated•6 years ago
|
Product: Core → Firefox Build System
You need to log in
before you can comment on or make changes to this bug.
Description
•