Closed
Bug 1391476
Opened 7 years ago
Closed 7 years ago
uid/gid mismatch between checkouts can lead to permission failure doing checkout
Categories
(Firefox Build System :: Task Configuration, task)
Firefox Build System
Task Configuration
Tracking
(Not tracked)
RESOLVED
FIXED
mozilla57
People
(Reporter: gps, Assigned: gps)
References
Details
Attachments
(11 files, 2 obsolete files)
59 bytes,
text/x-review-board-request
|
dustin
:
review+
|
Details |
59 bytes,
text/x-review-board-request
|
dustin
:
review+
|
Details |
59 bytes,
text/x-review-board-request
|
dustin
:
review+
|
Details |
59 bytes,
text/x-review-board-request
|
dustin
:
review+
|
Details |
59 bytes,
text/x-review-board-request
|
dustin
:
review+
|
Details |
59 bytes,
text/x-review-board-request
|
dustin
:
review+
|
Details |
59 bytes,
text/x-review-board-request
|
dustin
:
review+
|
Details |
59 bytes,
text/x-review-board-request
|
dustin
:
review+
|
Details |
59 bytes,
text/x-review-board-request
|
dustin
:
review+
|
Details |
59 bytes,
text/x-review-board-request
|
dustin
:
review+
|
Details |
59 bytes,
text/x-review-board-request
|
nalexander
:
review+
mshal
:
review+
|
Details |
From https://public-artifacts.taskcluster.net/f4JHtpnhQJyhTsYEm4G-LQ/0/public/logs/live_backing.log: [taskcluster 2017-08-17 21:53:49.252Z] Task ID: f4JHtpnhQJyhTsYEm4G-LQ [taskcluster 2017-08-17 21:53:49.253Z] Worker ID: i-00638e1e6479b6c06 [taskcluster 2017-08-17 21:53:49.253Z] Worker Group: us-east-1 [taskcluster 2017-08-17 21:53:49.253Z] Worker Node Type: c4.2xlarge [taskcluster 2017-08-17 21:53:49.253Z] Worker Type: gecko-1-b-android [taskcluster 2017-08-17 21:53:49.253Z] Public IP: 54.91.181.90 [taskcluster 2017-08-17 21:53:49.253Z] using cache "tooltool-cache" -> /home/worker/tooltool-cache [taskcluster 2017-08-17 21:53:49.253Z] using cache "level-1-checkouts-v1" -> /home/worker/checkouts [taskcluster 2017-08-17 21:53:50.218Z] Downloading artifact "public/image.tar.zst" from task ID: C1rHajDgRYi2RE5ANX64aQ. [taskcluster 2017-08-17 21:54:10.231Z] Download Progress: 10.21% [taskcluster 2017-08-17 21:54:15.230Z] Download Progress: 33.74% [taskcluster 2017-08-17 21:54:20.230Z] Download Progress: 57.12% [taskcluster 2017-08-17 21:54:25.230Z] Download Progress: 80.34% [taskcluster 2017-08-17 21:54:29.362Z] Downloaded artifact successfully. [taskcluster 2017-08-17 21:54:29.363Z] Downloaded 449.712 mb [taskcluster 2017-08-17 21:54:29.363Z] Decompressing downloaded image [taskcluster 2017-08-17 21:54:31.109Z] Loading docker image from downloaded archive. [taskcluster 2017-08-17 21:55:02.634Z] Image 'public/image.tar.zst' from task 'C1rHajDgRYi2RE5ANX64aQ' loaded. Using image ID sha256:c40afb824e2f6a00b688d856378b6c059bb420c1bde04d654ad86bbc1733fb9b. [taskcluster 2017-08-17 21:55:02.834Z] === Task Starting === [setup 2017-08-17T21:55:03.058941Z] run-task started [chown 2017-08-17T21:55:03.060631Z] recursively changing ownership of /home/worker/workspace to worker:worker [chown 2017-08-17T21:55:03.060708Z] recursively changing ownership of /home/worker/tooltool-cache to worker:worker [setup 2017-08-17T21:55:03.065489Z] running as worker:worker [vcs 2017-08-17T21:55:03.065540Z] executing ['hg', 'robustcheckout', '--sharebase', '/home/worker/checkouts/hg-store', '--purge', '--upstream', 'https://hg.mozilla.org/mozilla-unified', '--revision', '5b394c411b250de133a11512ce97efce97bc045b', 'https://hg.mozilla.org/try', '/home/worker/workspace/build/src'] [vcs 2017-08-17T21:55:03.114611Z] ensuring https://hg.mozilla.org/try@5b394c411b250de133a11512ce97efce97bc045b is available at /home/worker/workspace/build/src [vcs 2017-08-17T21:55:03.114763Z] (cloning from upstream repo https://hg.mozilla.org/mozilla-unified) [vcs 2017-08-17T21:55:03.888652Z] (sharing from existing pooled repository 8ba995b74e18334ab3707f27e9eb8f4e37ba3d29) [vcs 2017-08-17T21:55:03.889747Z] not trusting file /home/worker/checkouts/hg-store/8ba995b74e18334ab3707f27e9eb8f4e37ba3d29/.hg/hgrc from untrusted user 500, group 500 [vcs 2017-08-17T21:55:03.896502Z] abort: could not lock repository /home/worker/workspace/build/src: Permission denied [taskcluster 2017-08-17 21:55:04.077Z] === Task Finished === [taskcluster 2017-08-17 21:55:04.146Z] Artifact "public/build" not found at "/home/worker/artifacts/" [taskcluster 2017-08-17 21:55:04.209Z] Artifact "private/android-sdk" not found at "/home/worker/private/android-sdk" [taskcluster 2017-08-17 21:55:04.309Z] Artifact "private/java_home" not found at "/home/worker/private/java_home" [taskcluster 2017-08-17 21:55:04.646Z] Unsuccessful task run with exit code: 255 completed in 75.395 seconds I reckon this was caused by /home/worker/checkouts being owned by a different uid/gid than what the task ran as. This can happen if 2 tasks with different uid/gid run on the same worker. Ideally we normalize our uid/gid to 1000:1000 to avoid ownership mismatch. But it can happen. Especially on Try. We should fix it by doing a recursive chown of the version control share base.
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Assignee | ||
Comment 4•7 years ago
|
||
Comment on attachment 8898540 [details] Bug 1391476 - Ensure version control repository have proper permissions; This is still buggy in Try.
Attachment #8898540 -
Flags: review?(cmanchester)
Assignee | ||
Updated•7 years ago
|
Attachment #8898539 -
Flags: review?(cmanchester)
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Assignee | ||
Updated•7 years ago
|
Attachment #8898539 -
Attachment is obsolete: true
Assignee | ||
Updated•7 years ago
|
Attachment #8898540 -
Attachment is obsolete: true
Assignee | ||
Comment 15•7 years ago
|
||
mozreview-review |
Comment on attachment 8900044 [details] Bug 1391476 - Require that all cache paths be declared as volumes; https://reviewboard.mozilla.org/r/171384/#review176556 ::: taskcluster/docker/desktop1604-test/Dockerfile:8 (Diff revision 1) > > RUN useradd -d /home/worker -s /bin/bash -m worker > WORKDIR /home/worker > > +VOLUME /home/worker/tooltool-cache > +VOLUME /home/worker/.cache This breaks Docker image building because it attempts to `rm -rf /home/worker/.cache` and this fails because you can't remove a mount point. Good times.
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Assignee | ||
Comment 20•7 years ago
|
||
mozreview-review |
Comment on attachment 8900045 [details] Bug 1391476 - Tell run-task about volumes so it can sanitize them; https://reviewboard.mozilla.org/r/171386/#review176576 ::: taskcluster/docker/recipes/run-task:352 (Diff revision 2) > + if os.listdir(volume): > + print('error: volume %s is not empty; non-cache volumes are ' > + 'supposed to be empty; you likely found a TaskCluster ' > + 'worker bug' % volume) > + return 1 This assertion isn't true. When Docker builds images, a VOLUME will result in copying any files present in the specified path at the time of the VOLUME directive. However, I thought this wasn't getting preserved by the way TaskCluster does things. At least one of the Android images breaks this assumption. So we can: 1) Forbid volumes from having files 2) Allow volumes to have files and chown the root directory 3) Allow volumes to have files and recursively chown I'm not sure what the uid/gid of files copied to the volume will be. Presumably what the Docker image set up. So I would assume they would be properly set and we wouldn't need to chown them. But I wouldn't put anything past Docker.
Assignee | ||
Comment 21•7 years ago
|
||
mozreview-review-reply |
Comment on attachment 8900045 [details] Bug 1391476 - Tell run-task about volumes so it can sanitize them; https://reviewboard.mozilla.org/r/171386/#review176576 > This assertion isn't true. > > When Docker builds images, a VOLUME will result in copying any files present in the specified path at the time of the VOLUME directive. > > However, I thought this wasn't getting preserved by the way TaskCluster does things. At least one of the Android images breaks this assumption. > > So we can: > > 1) Forbid volumes from having files > 2) Allow volumes to have files and chown the root directory > 3) Allow volumes to have files and recursively chown > > I'm not sure what the uid/gid of files copied to the volume will be. Presumably what the Docker image set up. So I would assume they would be properly set and we wouldn't need to chown them. But I wouldn't put anything past Docker. My distrust of Docker is correct: Docker does **NOT** preserve the uid on files in a VOLUME. I tested this by running ``chown <path-in-volume>`` and confirmed from a running container that the files are owned by ``root:root``. And to be extra sure, the ``layer.tar`` file as part of the tarball produced by ``docker save`` confirmed that the uid/gid are ``0:0``. I'm leaning towards banning files in volumes because it is simpler and more closely approximates how caches work.
Comment 22•7 years ago
|
||
> My distrust of Docker is correct
TRUTH
Comment 23•7 years ago
|
||
mozreview-review |
Comment on attachment 8900038 [details] Bug 1391476 - Inline list append; https://reviewboard.mozilla.org/r/171372/#review176838
Attachment #8900038 -
Flags: review?(dustin) → review+
Comment 24•7 years ago
|
||
mozreview-review |
Comment on attachment 8900039 [details] Bug 1391476 - Dedent block; https://reviewboard.mozilla.org/r/171374/#review176840
Attachment #8900039 -
Flags: review?(dustin) → review+
Comment 25•7 years ago
|
||
mozreview-review |
Comment on attachment 8900040 [details] Bug 1391476 - Don't use ~ in paths; https://reviewboard.mozilla.org/r/171376/#review176844
Attachment #8900040 -
Flags: review?(dustin) → review+
Comment 26•7 years ago
|
||
mozreview-review |
Comment on attachment 8900041 [details] Bug 1391476 - Print cache info; https://reviewboard.mozilla.org/r/171378/#review176846
Attachment #8900041 -
Flags: review?(dustin) → review+
Comment 27•7 years ago
|
||
mozreview-review |
Comment on attachment 8900042 [details] Bug 1391476 - Track whether caches should be used in untrusted environments; https://reviewboard.mozilla.org/r/171380/#review176848 Are there any caches without this flag?
Attachment #8900042 -
Flags: review?(dustin) → review+
Comment 28•7 years ago
|
||
mozreview-review |
Comment on attachment 8900043 [details] Bug 1391476 - Capture Docker volumes in docker-worker config; https://reviewboard.mozilla.org/r/171382/#review176852 ::: taskcluster/taskgraph/transforms/task.py:197 (Diff revision 1) > Required('allow-ptrace', default=False): bool, > Required('loopback-video', default=False): bool, > Required('loopback-audio', default=False): bool, > Required('docker-in-docker', default=False): bool, # (aka 'dind') > > + # Docker volumes. Caches common share the same path. But not always. Not sure what this sentence means -- is it OK for them to share the same path? What does it mean? What if they don't share the same path?
Attachment #8900043 -
Flags: review?(dustin) → review+
Comment 29•7 years ago
|
||
I'd like a little feedback from Wander or Greg as Docker experts, too.
Flags: needinfo?(wcosta)
Flags: needinfo?(garndt)
Comment 30•7 years ago
|
||
mozreview-review |
Comment on attachment 8900044 [details] Bug 1391476 - Require that all cache paths be declared as volumes; https://reviewboard.mozilla.org/r/171384/#review176862 ::: commit-message-6a1c5:5 (Diff revision 2) > +Bug 1391476 - Require that all cache paths be declared as volumes; r?dustin > + > +See the inline comment for the rationale here. > + > +Depending on the order transforms run in, it is possible to evade this I don't understand what this comment means. The order's not a mystery.. obviously any check written in the decision task could be evaded by just deleting it. ::: taskcluster/docker/android-gradle-build/Dockerfile:7 (Diff revision 2) > FROM taskcluster/centos6-build-upd:0.1.6.20160329195300 > MAINTAINER Nick Alexander <nalexander@mozilla.com> > > # BEGIN ../desktop-build/Dockerfile > > # TODO remove when base image is updated hm, what's supposed to be removed here?
Attachment #8900044 -
Flags: review?(dustin) → review+
Comment 31•7 years ago
|
||
mozreview-review |
Comment on attachment 8900045 [details] Bug 1391476 - Tell run-task about volumes so it can sanitize them; https://reviewboard.mozilla.org/r/171386/#review176868
Attachment #8900045 -
Flags: review?(dustin) → review+
Comment 32•7 years ago
|
||
mozreview-review |
Comment on attachment 8900046 [details] Bug 1391476 - Automatically set cache/volume permissions in run-task; https://reviewboard.mozilla.org/r/171388/#review176870 I really like this model
Attachment #8900046 -
Flags: review?(dustin) → review+
Comment 33•7 years ago
|
||
mozreview-review |
Comment on attachment 8900047 [details] Bug 1391476 - Add UID and GID to cache parameters; https://reviewboard.mozilla.org/r/171390/#review176876
Attachment #8900047 -
Flags: review?(dustin) → review+
Assignee | ||
Comment 34•7 years ago
|
||
mozreview-review-reply |
Comment on attachment 8900042 [details] Bug 1391476 - Track whether caches should be used in untrusted environments; https://reviewboard.mozilla.org/r/171380/#review176848 There are a handful, actually. The version control cache is the most notable. We trust robustcheckout to do sane things so we don't need to worry about cache poisoning as much. There are definitely a few caches whose configuration I question. It would be worthwhile to audit things. I wouldn't be opposed to changing the default so caches are disabled in "untrusted" environments by default. This was slightly beyond my tolerance for scope bloat in this series.
Comment 35•7 years ago
|
||
Fair enough. Everything is isolated to different workers by level now, so I'm not too worried about *malicious* poisoning.
Assignee | ||
Comment 36•7 years ago
|
||
mozreview-review-reply |
Comment on attachment 8900044 [details] Bug 1391476 - Require that all cache paths be declared as volumes; https://reviewboard.mozilla.org/r/171384/#review176862 > hm, what's supposed to be removed here? These TODO comments are no longer valid given our new VOLUME requirements. I'll mass remove them.
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 55•7 years ago
|
||
If I understand the volume bits correctly, we're basically declaring some paths that are typically used for caches on non-try as volumes in the image's dockerfile so they have the same performance (outside of AUFS) as the caches do while not being persistent like caches are. Makes sense. I think it highlights that this could be made more explicit and cleaner if we moved the concept of caches to being declared as volumes in the task payload, and the type of volume is persistent (for caches) or temporary for things like we're doing here. These volumes should be cleared up by docker worker when the containers are removed: https://github.com/taskcluster/docker-worker/blob/master/src/lib/gc.js#L113-L118
Flags: needinfo?(garndt)
Updated•7 years ago
|
Flags: needinfo?(wcosta)
Comment 56•7 years ago
|
||
yeah, we could definitely support this better in taskcluster-worker, but this implementation is really nice
Comment 57•7 years ago
|
||
At one point I added persistent volumes to taskcluster-worker, but since then, I think it has been removed. Perhaps the code created then was not suitable to where taskcluster-worker has been taken now.
Comment 58•7 years ago
|
||
(In reply to Dustin J. Mitchell [:dustin] from comment #56) > yeah, we could definitely support this better in taskcluster-worker, but > this implementation is really nice Totally agree that this implementation is really nice, wasn't suggesting otherwise :)
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 61•7 years ago
|
||
mozreview-review |
Comment on attachment 8900383 [details] Bug 1391476 - Don't install nexus.xml in a Docker volume; https://reviewboard.mozilla.org/r/171734/#review177114 LGTM, assuming try is happy.
Attachment #8900383 -
Flags: review+
Comment 62•7 years ago
|
||
Pushed by gszorc@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/f6f4c5e1b725 Inline list append; r=dustin https://hg.mozilla.org/integration/autoland/rev/78aac0824b7e Dedent block; r=dustin https://hg.mozilla.org/integration/autoland/rev/88cd6294bef4 Don't use ~ in paths; r=dustin https://hg.mozilla.org/integration/autoland/rev/548978f63439 Print cache info; r=dustin https://hg.mozilla.org/integration/autoland/rev/985a6df057cb Track whether caches should be used in untrusted environments; r=dustin https://hg.mozilla.org/integration/autoland/rev/52a96399c1dc Capture Docker volumes in docker-worker config; r=dustin https://hg.mozilla.org/integration/autoland/rev/84fd52d2832a Require that all cache paths be declared as volumes; r=dustin https://hg.mozilla.org/integration/autoland/rev/047cc1e53f50 Don't install nexus.xml in a Docker volume; r=mshal https://hg.mozilla.org/integration/autoland/rev/1768e5267bcf Tell run-task about volumes so it can sanitize them; r=dustin https://hg.mozilla.org/integration/autoland/rev/06d162a52dcd Automatically set cache/volume permissions in run-task; r=dustin https://hg.mozilla.org/integration/autoland/rev/8e3a645ad94b Add UID and GID to cache parameters; r=dustin
Comment 63•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/f6f4c5e1b725 https://hg.mozilla.org/mozilla-central/rev/78aac0824b7e https://hg.mozilla.org/mozilla-central/rev/88cd6294bef4 https://hg.mozilla.org/mozilla-central/rev/548978f63439 https://hg.mozilla.org/mozilla-central/rev/985a6df057cb https://hg.mozilla.org/mozilla-central/rev/52a96399c1dc https://hg.mozilla.org/mozilla-central/rev/84fd52d2832a https://hg.mozilla.org/mozilla-central/rev/047cc1e53f50 https://hg.mozilla.org/mozilla-central/rev/1768e5267bcf https://hg.mozilla.org/mozilla-central/rev/06d162a52dcd https://hg.mozilla.org/mozilla-central/rev/8e3a645ad94b
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Comment 64•7 years ago
|
||
mozreview-review |
Comment on attachment 8900383 [details] Bug 1391476 - Don't install nexus.xml in a Docker volume; https://reviewboard.mozilla.org/r/171734/#review177526 Thanks for doing this. If `android-dependencies` is green on try, it's good for me.
Attachment #8900383 -
Flags: review?(nalexander) → review+
Assignee | ||
Comment 65•7 years ago
|
||
mozreview-review-reply |
Comment on attachment 8900383 [details] Bug 1391476 - Don't install nexus.xml in a Docker volume; https://reviewboard.mozilla.org/r/171734/#review177526 I did get a green job on Try.
Comment 66•7 years ago
|
||
mozreview-review |
Comment on attachment 8900042 [details] Bug 1391476 - Track whether caches should be used in untrusted environments; https://reviewboard.mozilla.org/r/171380/#review196082 ::: taskcluster/taskgraph/transforms/job/run_task.py (Diff revision 2) > def docker_worker_run_task(config, job, taskdesc): > run = job['run'] > worker = taskdesc['worker'] = job['worker'] > common_setup(config, job, taskdesc) > > - if run.get('cache-dotcache') and int(config.params['level']) > 1: I don't think the cache-dotcache condition should have been removed. For one, it broke using run-task on the desktop-build docker image (because the desktop-build docker image doesn't have that path as a VOLUME)
Comment hidden (Intermittent Failures Robot) |
Updated•6 years ago
|
Product: TaskCluster → Firefox Build System
You need to log in
before you can comment on or make changes to this bug.
Description
•