Closed Bug 1088387 Opened 10 years ago Closed 9 years ago

Start Linux sandboxing when the process is single-threaded (and proxy syscalls as needed until ready to be fully sandboxed)

Categories

(Core :: Security: Process Sandboxing, defect)

All
Linux
defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: jld, Assigned: jld)

References

(Blocks 1 open bug)

Details

Attachments

(2 files, 1 obsolete file)

If we run a child process in its own pid namespace, its attempt to enable seccomp-bpf sandboxing fails because it's reading thread IDs from the /proc that was mounted in the outer namespace and then calling tgkill in the inner namespace to run a signal handler on them.

This might be workable-around, but I have another idea: get rid of BroadcastSetThreadSandbox.  Instead, enable seccomp-bpf earlier when the process is still single-threaded, but first create a thread to run outside of the sandbox and act as a proxy for syscalls that are rejected by the seccomp policy.

Obviously this doesn't work for any syscalls that care what thread they're invoked on or what else is on the stack at the time, or which would block on another syscall that needs to be proxied in this way, but if those are either all allowed by policy (gettid, PR_SET_NAME, thread creation) or can be fixed up somehow, then it should work.  The unsandboxed thread would exit, and policy violation handling would change to its normal handling, at the logical start-of-sandbox point.

I have a proof-of-concept that seems to be working for media plugins on desktop Linux, but this is likely more complicated for content processes and on B2G.

This should make it easier to integrate the Chromium support code that starts the sandbox.  It would still need some changes, because the process would have two threads instead of one when the sandbox starts, but it would be less code change — maybe an “about to sandbox” callback after the thread count check and before the actual prctl.
It's a little more fiddly on B2G, but not insurmountable.  Nuwa is the easy case: there's already a hook at the right point in the fork of AfterNuwaFork().  Regular plugin-container launching needs to start this very early, before initializing the Binder framework in content_process_main, which means making plugin-container link against libmozsandbox.  But it all seems to work on Flame and emulator-x86-kk.
No longer blocks: 1055310
Move process sandboxing bugs to the new Bugzilla component.

(Sorry for the bugspam; filter on 3c21328c-8cfb-4819-9d88-f6e965067350.)
Component: Security → Security: Process Sandboxing
Adjusting the title a bit — having that single-threaded “early initialization” point is also useful for chroot and network/IPC namespace isolation, even if pid namespaces aren't used.
Summary: The BroadcastSetThreadSandbox hack in the Linux sandbox breaks with pid namespace separation → Start Linux sandboxing when the process is single-threaded (and proxy syscalls as needed until ready to be fully sandboxed)
See commit message and comments for more info.

This patch adds gtests for the syscall proxy service, which call it directly and don't use seccomp-bpf, which means that they work on our current build hosts.  Two of the tests are attempts to catch race conditions, and needed multiple seconds to run when I pushed to try; I'm not sure if that's acceptable or if I should reduce the iteration count.

This patch also changes the GMP seccomp policy slightly — sigaltstack is now allowed on non-ASAN builds, because it's used during the time syscalls are being proxied (but not afterwards) and it controls signal delivery for the calling thread.  It should be relatively harmless.

I might try to find an additional reviewer for this, given the delicateness of what's going on in UnsafeSyscallProxy, if there's someone else who's “tall enough to write multithreaded code” and has time for it.
Attachment #8563614 - Flags: review?(gdestuynder)
Slight adjustment: replace a getenv with SandboxInfo, which didn't exist when this was originally written.

Not-so-slight adjustment: regenerate the diff with --patience, so that the big deletion/addition in Sandbox.cpp isn't cluttered up with “unchanged” lines that are only a brace.

r?(bent) because I think I need an IPC peer for the changes to content process startup?
Attachment #8563614 - Attachment is obsolete: true
Attachment #8563614 - Flags: review?(gdestuynder)
Attachment #8563834 - Flags: review?(gdestuynder)
Attachment #8563834 - Flags: review?(bent.mozilla)
Attachment #8563605 - Flags: review?(gdestuynder) → review+
The more I think about this patch, the more I wonder if it's really the right idea — assuming we're still going with unprivileged user namespaces, (1) most of those kernels will have SECCOMP_FILTER_FLAG_TSYNC, and (2) anything besides seccomp-bpf that needs that kind of unsandboxed helper thread, like chrooting the process or opening /proc/net/dev in the outer network namespace, can be implemented in a much less scary way.  Which means that the only thing that really needs this is pid namespace support on a narrow range of kernels, and those could use a local procfs mount if they really needed to (or just not get pid namespaces).

In particular, Ubuntu's 3.13.x kernel series (14.04 LTS and 12.04.5 LTS, but not 12.04.{0,1} LTS; i.e., the oldest supported ones with user namespaces) appears to support SECCOMP_FILTER_FLAG_TSYNC.

B2G gets much less value from pid namespaces compared to desktop, because each content process runs as a separate uid, so they're already limited in what they can do to other processes by the traditional Unix security model.
Comment on attachment 8563834 [details] [diff] [review]
Main patch: Start Linux/B2G sandboxing while single-threaded. [v1.1]

…and now I have working code for applying everything except PID namespaces to desktop media plugins (requiring unprivileged user namespaces) so I'll clear the reviews.  Sorry for the indecisiveness.  (The IPC/XRE part of this patch will be reappearing on another bug at some point, however.)
Attachment #8563834 - Flags: review?(gdestuynder)
Attachment #8563834 - Flags: review?(bent.mozilla)
This is almost certainly not happening.  The current plan is as comment #7 would have it: where removing the thread broadcast is really needed (pid namespaces), require SECCOMP_FILTER_FLAG_TSYNC (bug 1004011); the side benefits of have a sandboxing hook while the child process is single-threaded will be done separately (comment #8 and bug 1151607).
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: