Closed Bug 1263973 Opened 8 years ago Closed 8 years ago

Port moz.build sandbox to CentOS 7

Categories

(Developer Services :: Mercurial: hg.mozilla.org, defect)

defect
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: gps, Assigned: gps)

References

(Blocks 1 open bug)

Details

Attachments

(4 files)

The containerized moz.build evaluation sandbox implemented in bug 1139218 needs to be ported to CentOS 7.

It currently fails to run on CentOS 7. The error message is "failed to clone" or something like that. I suspect we need to support newer flags added by newer Linux.

Also, Red Hat's docs on cgroup foo for RHEL 7 say to use systemd to launch processes in cgroups. Not sure if we want to tackle that in this bug or what. We already have the C program that starts the container written. I think it might be easier to continue using that. Then again, I /think/ systemd has magic commands that can run processes in a chroot/cgroup pretty turnkey. So that is tempting.
systemd does in fact have the ability to execute processes in chroots and cgroups. I'm going to take a stab at implementing this using systemd.
I think what we'll want to do is port the invocation to a socket activated systemd unit. I'm thinking we'll create a file-based socket and every connection to the socket spawns a new chrooted process via systemd. We'll then pass arguments and response over the socket.

This also means we'll need a way to encode the exit code in a response protocol since we no longer have access to $?.
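A minimal sketch of how an exit code could travel inside a response sent over the socket, since `$?` is no longer available to the caller. The frame layout and the names `frame_encode` and `frame_decode` are assumptions made for illustration; the bug does not define a protocol.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical frame layout (not taken from the bug):
 *   byte 0      exit code of the sandboxed process (0-255)
 *   bytes 1-4   payload length, big-endian
 *   bytes 5..   payload (for example the evaluation output)
 */
#define FRAME_HEADER_LEN 5

/* Encode a response into out. Returns the total frame size, or 0 if out
 * is too small to hold the header plus len payload bytes. */
size_t frame_encode(uint8_t exit_code, const void *payload, uint32_t len,
                    unsigned char *out, size_t cap)
{
    assert(out != NULL);
    if (cap < FRAME_HEADER_LEN || cap - FRAME_HEADER_LEN < len)
        return 0;
    out[0] = exit_code;
    out[1] = (unsigned char)(len >> 24);
    out[2] = (unsigned char)(len >> 16);
    out[3] = (unsigned char)(len >> 8);
    out[4] = (unsigned char)len;
    if (len > 0)
        memcpy(out + FRAME_HEADER_LEN, payload, len);
    return FRAME_HEADER_LEN + (size_t)len;
}

/* Decode a frame. Returns 0 on success, or -1 if the buffer is shorter
 * than the header or shorter than the length the header claims. */
int frame_decode(const unsigned char *in, size_t in_len, uint8_t *exit_code,
                 const unsigned char **payload, uint32_t *len)
{
    uint32_t n;

    if (in_len < FRAME_HEADER_LEN)
        return -1;
    n = ((uint32_t)in[1] << 24) | ((uint32_t)in[2] << 16) |
        ((uint32_t)in[3] << 8) | (uint32_t)in[4];
    if (in_len - FRAME_HEADER_LEN < n)
        return -1;
    *exit_code = in[0];
    *payload = in + FRAME_HEADER_LEN;
    *len = n;
    return 0;
}
```

The server side would call `frame_encode` with the child's exit status once the sandboxed process finishes, and the client would call `frame_decode` on the bytes it reads back. Putting the exit code first keeps the header fixed-size, so a reader knows how many bytes to wait for.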
Reading the systemd man pages (notably systemd.exec), I wasn't getting warm fuzzy feelings that the various CLONE_* flags to clone(2) were getting called when executing processes (I was just seeing a lot of cgroups foo). Looking at the systemd source, I can only find a reference to CLONE_IPC in nspawn.c. And other CLONE_* flags only seem to be called by machinectl foo (in addition to nspawn.c).

So it looks like we'll need to use systemd-nspawn (https://www.freedesktop.org/software/systemd/man/systemd-nspawn.html) to securely spawn sandboxed processes with systemd to the same level of security we were spawning before. This makes things a /bit/ more complicated.

kang: before I put any more work into this, I'm curious if you have any subject matter expertise to share. I should be around the office next week if you wish to discuss IRL.
Flags: needinfo?(gdestuynder)
I'm using nspawn on a regular basis. It's akin to docker containers (and in fact can load docker files).
systemd's init system sets base cgroups and other things via init, hence why it's a good idea to have systemd manage cgroups for you when possible.
systemd also allows for restricting capabilities within the service file

It also allows for *some* namespacing functionality within the service file (i.e. not via nspawn): new filesystem namespace, new network namespace (IIRC there is no IPC namespacing without nspawn, for example, though arguably this may be ok)

Finally, it allows for loading a seccomp filter (system call filter) which is also quite powerful.

Keep in mind though that all this also depends on the version of systemd installed, though I think CentOS7 has most of this.

I would generally hope that we can start the current sandbox'd processes from systemd, with cgroups, and let our process care for namespacing (seems like the most straightforward solution right now)
Flags: needinfo?(gdestuynder)
What's the error you were seeing? I tried the sandbox launching program on a freshly installed CentOS 7 on DigitalOcean and the program ran past clone() without any problem. It stopped at cgroup configuration, but that's expected because I didn't configure one.

The kernel version is:

  Linux centos-512mb-sgp1-01 3.10.0-327.36.1.el7.x86_64 #1 SMP Sun Sep 18 13:04:29 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Flags: needinfo?(gps)
The sandbox launching program is at least getting hung up on manipulating the mounts table. Specifically, it was failing to umount /sys, likely due to /sys/fs/cgroup/systemd and possibly /sys/fs/cgroup. I also recall issues with /dev/pts and /run. I think systemd is somehow "contaminating" the sandboxed environment and preventing us from detaching cleanly.
Flags: needinfo?(gps)
I'm looking at this again today.

/sys/fs/cgroup is marked as "shared" in the mounts table. So when we unmount things from the sandboxed process, they get unmounted in the main environment as well! It looks like we'll need to teach the sandbox launching program about unshare(2).
Err, I think the problem is there is a reference to the original mounts table via /proc or something...
mount(NULL, "/", NULL, MS_SLAVE | MS_REC, NULL) seems to do the trick.
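A rough sketch of the two pieces discussed here: spotting a "shared" mount in `/proc/self/mountinfo`, and the `mount()` call that turns the whole tree into a slave so unmounts in the sandbox stop propagating to the host. The helper names are hypothetical, and the second function assumes it runs in a child that already has its own mount namespace (for example via `CLONE_NEWNS`) and holds `CAP_SYS_ADMIN`.

```c
#include <assert.h>
#include <string.h>
#include <sys/mount.h>

/* Returns 1 if a /proc/self/mountinfo line carries a "shared:N" tag in
 * its optional fields, i.e. between the mount options and the " - "
 * separator, and 0 otherwise. Spaces inside paths are escaped as \040 in
 * mountinfo, so a literal " shared:" can only be an optional field. */
int mountinfo_line_is_shared(const char *line)
{
    const char *sep;
    const char *tag;

    assert(line != NULL);
    sep = strstr(line, " - ");
    tag = strstr(line, " shared:");
    return tag != NULL && sep != NULL && tag < sep;
}

/* Mark every mount under / as a slave of its former peer group. Unmounts
 * made afterwards in this namespace are not propagated back to the host.
 * Returns 0 on success, -1 with errno set on failure. */
int make_mount_tree_slave(void)
{
    return mount(NULL, "/", NULL, MS_SLAVE | MS_REC, NULL);
}
```

In the launcher, `make_mount_tree_slave()` would be called in the child before any `umount2()` call. The parser is only a diagnostic aid for confirming which mounts are shared.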
Comment on attachment 8803413 [details]
hgmo: update moz.build evaluation sandbox to work on CentOS 7 (bug 1263973);

https://reviewboard.mozilla.org/r/87690/#review86662

::: testing/docker/builder-hgweb-chroot/mozbuild-eval.c:167
(Diff revision 1)
> +    /* We can't unmount the cgroup namespace, presumably because this process
> +     * is attached to a cgroup. */
> +    /*
> +    if (umount2("/sys/fs/cgroup", 0)) {
> +        fprintf(stderr, "unable to unmount /sys/fs/cgroup\n");
> +        return 1;
> +    }

This system call fails with EBUSY. However, `lsof` from the host doesn't show any open file handles to this path. I'm not sure exactly why we can't unmount the cgroups filesystem from the child.

If I had to venture a guess, I'd say that somehow the association of the child process with the cgroup (see code above) is keeping a reference somewhere. However, at the point where we do the cgroup association, we should have a shared mount.

Perhaps I'll need to refactor the sandbox code to do cgroup association in the parent process.
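For reference, a minimal sketch of what cgroup association from the parent could look like with cgroup v1: the parent writes the child's PID to the group's `tasks` file. The function name and the idea of passing the path as a parameter are assumptions; the actual code in mozbuild-eval.c is not shown in this bug.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Add pid to a cgroup by writing it to the group's "tasks" file.
 * tasks_path is supplied by the caller (hypothetical example:
 * /sys/fs/cgroup/<controller>/<group>/tasks).
 * Returns 0 on success, -1 on failure. */
int cgroup_add_task(const char *tasks_path, pid_t pid)
{
    FILE *f;
    int ok;

    assert(tasks_path != NULL);
    f = fopen(tasks_path, "w");
    if (f == NULL)
        return -1;
    ok = fprintf(f, "%ld\n", (long)pid) > 0;
    /* fclose() flushes the buffer, so a write rejected by the kernel
     * shows up here rather than at fprintf(). */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

The parent would call this with the PID returned by `clone()`, before signalling the child to continue.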
I tried refactoring things to perform the cgroup association in the parent process. I also tried removing the cgroup association code completely. In both scenarios, /sys/fs/cgroup failed to unmount cleanly in the child process.

However, passing MNT_DETACH to umount2() does allow /sys/fs/cgroup and /sys to be unmounted. So hopefully that's good enough.
Comment on attachment 8803413 [details]
hgmo: update moz.build evaluation sandbox to work on CentOS 7 (bug 1263973);

https://reviewboard.mozilla.org/r/87690/#review86714

I also do not know why it can't umount, but for this path detach seems ok/safe to me
Comment on attachment 8803413 [details]
hgmo: update moz.build evaluation sandbox to work on CentOS 7 (bug 1263973);

https://reviewboard.mozilla.org/r/87690/#review86718
Attachment #8803413 - Flags: review?(gdestuynder) → review+
Pushed by gszorc@mozilla.com:
https://hg.mozilla.org/hgcustom/version-control-tools/rev/51b5fac31814
hgmo: update moz.build evaluation sandbox to work on CentOS 7 ; r=kang
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Still a few issues to address...
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment on attachment 8803514 [details]
hgmo: handle /boot unmount (bug 1263973);

https://reviewboard.mozilla.org/r/87762/#review87460
Attachment #8803514 - Flags: review?(gdestuynder) → review+
Comment on attachment 8803524 [details]
ansible/hg-web: allow mozbuild-eval to set its cgroup (bug 1263973);

https://reviewboard.mozilla.org/r/87766/#review87456

::: ansible/roles/hg-web/files/cgconfig-mozbuild.conf:11
(Diff revision 1)
> +    # the "hg" user access to add tasks to this cgroup.
> +    perm {
> +        task {
> +            uid = hg;
> +            gid = hg;
> +            fperm = 777;

I've not used fperm in cgconfig.conf before, but I assume it's the file creation mode (and not a mask), thus world read/write/exec => could this work with fperm 770?

I saw your comment on it, but mainly this is just for sanity - otherwise 777 seems alright (some other user could change the task cgroups basically)
Comment on attachment 8803515 [details]
hgmo: acquire and drop capabilities (bug 1263973);

https://reviewboard.mozilla.org/r/87764/#review87458

::: ansible/roles/hg-web/tasks/main.yml:348
(Diff revision 3)
>          owner=root
>          group=root
>          mode=0755
>    when: chroot_mozbuild_exists
>  
> +- name: give mozbuild-eval elevated privileges

not sure if necessary, but if only certain users need to run /usr/local/bin/mozbuild-eval we could limit who's able to execute it (mainly because this is a privileged binary with these caps)

that would mean changing lines 340-345 to something like:

owner=root
group=usergrouphere
mode=0750
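Since the binary carries elevated capabilities, one way to limit exposure is to drop them as soon as the privileged setup is done. The sketch below uses the raw `capget`/`capset` system calls and clears every capability set of the calling thread. It is an illustration under assumptions, not the code from the patch, and both function names are hypothetical.

```c
#include <assert.h>
#include <linux/capability.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Clear the effective, permitted and inheritable capability sets of the
 * calling thread. Returns 0 on success, -1 with errno set on failure. */
int drop_all_capabilities(void)
{
    struct __user_cap_header_struct hdr;
    struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];

    memset(&hdr, 0, sizeof hdr);
    memset(data, 0, sizeof data);   /* all three sets empty */
    hdr.version = _LINUX_CAPABILITY_VERSION_3;
    hdr.pid = 0;                    /* 0 means the calling thread */
    return (int)syscall(SYS_capset, &hdr, data);
}

/* Returns 1 if the calling thread holds cap in its effective set,
 * 0 if it does not, and -1 if the capabilities could not be read. */
int has_effective_capability(int cap)
{
    struct __user_cap_header_struct hdr;
    struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];

    assert(cap >= 0 && cap <= CAP_LAST_CAP);
    memset(&hdr, 0, sizeof hdr);
    memset(data, 0, sizeof data);
    hdr.version = _LINUX_CAPABILITY_VERSION_3;
    if (syscall(SYS_capget, &hdr, data) != 0)
        return -1;
    return (data[CAP_TO_INDEX(cap)].effective & CAP_TO_MASK(cap)) != 0;
}
```

A launcher would call `drop_all_capabilities()` after the namespace, mount and cgroup work is finished and before executing untrusted code; `has_effective_capability(CAP_SYS_ADMIN)` can then confirm the drop took effect.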
Comment on attachment 8803524 [details]
ansible/hg-web: allow mozbuild-eval to set its cgroup (bug 1263973);

https://reviewboard.mozilla.org/r/87766/#review87790

::: ansible/roles/hg-web/files/cgconfig-mozbuild.conf:11
(Diff revision 1)
> +    # the "hg" user access to add tasks to this cgroup.
> +    perm {
> +        task {
> +            uid = hg;
> +            gid = hg;
> +            fperm = 777;

I'll change this to 770 in the next version.
Comment on attachment 8803515 [details]
hgmo: acquire and drop capabilities (bug 1263973);

https://reviewboard.mozilla.org/r/87764/#review87804
Attachment #8803515 - Flags: review?(gdestuynder) → review+
Comment on attachment 8803524 [details]
ansible/hg-web: allow mozbuild-eval to set its cgroup (bug 1263973);

https://reviewboard.mozilla.org/r/87766/#review87806
Attachment #8803524 - Flags: review?(gdestuynder) → review+
Pushed by gszorc@mozilla.com:
https://hg.mozilla.org/hgcustom/version-control-tools/rev/f14fb4dc7ae5
hgmo: handle /boot unmount ; r=kang
https://hg.mozilla.org/hgcustom/version-control-tools/rev/5832ce0a5c07
ansible/hg-web: allow mozbuild-eval to set its cgroup ; r=kang
https://hg.mozilla.org/hgcustom/version-control-tools/rev/19052fd49fb4
hgmo: acquire and drop capabilities ; r=kang
Status: REOPENED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
This is deployed and appears to be working!
Status: RESOLVED → VERIFIED
Blocks: 1329738