Linux sandboxing: sendmsg(2) considered harmful

NEW
Unassigned

4 years ago
10 months ago

People

(Reporter: jld, Unassigned)

Tracking

(Blocks: 1 bug, {meta})

Firefox Tracking Flags

(Not tracked)

Details

sendmsg(2) exposes a superset of the functionality of sendto(2) — with an extra level of pointer indirection, so seccomp-bpf can't filter it usefully without blocking the entire system call.

In particular: sendmsg(2) can, given a SOCK_DGRAM socket, send datagrams to any address in that socket's address family.  Obviously this is disastrous if that address family is AF_INET or AF_INET6, but this also applies to the unnamed AF_UNIX/SOCK_DGRAM sockets that can be obtained from socketpair(2).  It can't send to other unnamed sockets (aside from the other half of the socket pair), but it can reach named sockets (e.g., the interfaces exposed by services like syslogd and wpa_supplicant) and the Linux-specific "abstract" namespace.
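
To make the AF_UNIX case concrete, here is a minimal sketch (Linux only) of one half of an unnamed SOCK_DGRAM socket pair delivering a datagram to an unrelated socket in the abstract namespace. The name `SERVICE_NAME` and the helper `send_outside_pair` are made up for illustration; a real target would be whatever a service on the system happens to have bound.

```python
import os
import socket

# Hypothetical abstract-namespace name standing in for an unrelated service.
# The leading NUL byte selects the Linux-specific abstract namespace.
SERVICE_NAME = b"\0sendmsg-demo-%d" % os.getpid()


def send_outside_pair(payload):
    """Send payload from a socketpair half to a socket outside the pair."""
    service = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    service.settimeout(2.0)
    service.bind(SERVICE_NAME)

    # What a sandboxed process might hold: an unnamed DGRAM socket pair.
    ours, peer = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        # The destination address is part of the message header, which is
        # passed by pointer, so a seccomp-bpf filter never sees it.
        ours.sendmsg([payload], [], 0, SERVICE_NAME)
        return service.recv(64)
    finally:
        for s in (ours, peer, service):
            s.close()
```

The datagram arrives at `service`, not at `peer`, even though `ours` was created already connected to `peer`.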

We can't really get rid of sendmsg(2) or socketpair(2): they're needed for the crash reporter, and will be needed for the Chromium file-access broker.

The real fix for this is bug 1041885 — chroot(2) should take care of named sockets, while CLONE_NEWNET isolates the abstract local-domain namespace along with real networking.  But, as currently planned, that won't be usable on distributions which don't yet have unprivileged user namespaces (e.g. Ubuntu with 3.2-series kernels) or which have them preffed off (e.g., Debian).  Also, we're not there yet.

What we *can* do is lock down socketpair to SOCK_STREAM and SOCK_SEQPACKET — but on 32-bit x86 this requires filtering on multiple arguments to socketcall(2).  The current policy-building code can't do that, and I'd prefer not to hack it up further if possible, given that it's going to be replaced with Chromium's compiler (bug 1055310) relatively soon.

It might also be worth checking that the child process doesn't have any SOCK_DGRAM sockets when the sandbox is started, and that it isn't passed any with ipc::FileDescriptor later on.  getsockopt(2) with SOL_SOCKET/SO_TYPE exposes that information.  (Some other code could still use sendmsg to pass a DGRAM socket to the child over a non-DGRAM socket, but there's not much to be done about that.)
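
A rough sketch of what that check could look like, assuming we already have a list of candidate fds to inspect. The helper names `socket_type_of` and `find_dgram_fds` are hypothetical, and how the real code would enumerate the process's fds is left out.

```python
import socket


def socket_type_of(fd):
    """Return the SO_TYPE of fd, or None if fd is not a socket."""
    try:
        s = socket.socket(fileno=fd)  # wraps fd; raises OSError for non-sockets
    except OSError:
        return None
    try:
        return s.getsockopt(socket.SOL_SOCKET, socket.SO_TYPE)
    finally:
        s.detach()  # give the fd back without closing it


def find_dgram_fds(fds):
    """Return the subset of fds that are SOCK_DGRAM sockets."""
    return [fd for fd in fds if socket_type_of(fd) == socket.SOCK_DGRAM]
```

A sandbox launcher could refuse to start (or assert in debug builds) if `find_dgram_fds` returns anything for the fds the child is about to inherit.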


(For the sake of completeness: in theory it's possible to restrict sendmsg to specific fds, and prevent changing what those fds refer to with close(2) or dup2(2) or anything else I'm not thinking of at the moment.  But this seems unwise given that it fails if we overlook *anything* that can change the fd table.)
(In reply to Jed Davis [:jld] from comment #0)
> What we *can* do is lock down socketpair to SOCK_STREAM and SOCK_SEQPACKET —
> but on 32-bit x86 this requires filtering on multiple arguments to
> socketcall(2).

It's actually worse than that.  I'll quote from the socketcall(2) man page:

SYNOPSIS
       int socketcall(int call, unsigned long *args);

The actual arguments are passed by pointer, so seccomp-bpf can't filter them.  This invalidates the above-quoted part of comment #0 (at least for the ~40% of Linux desktop Firefox profiles that aren't on newer kernels and/or x86_64).
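
To illustrate the difference, here is a toy model of what a filter gets to look at in each case. This is not real seccomp-bpf: `SeccompView` and `allows_socketpair` are invented for the sketch, and the argument block uses the native `unsigned long` rather than a 32-bit one. The syscall numbers (53 for socketpair on x86_64, 102 for socketcall on i386) and the SYS_SOCKETPAIR call code (8) are the usual Linux values.

```python
import ctypes
import socket
from collections import namedtuple

# Toy stand-in for the data a filter sees: syscall number plus raw
# register values. A filter cannot follow pointers.
SeccompView = namedtuple("SeccompView", "nr args")

NR_SOCKETPAIR_X86_64 = 53
NR_SOCKETCALL_I386 = 102
SYS_SOCKETPAIR = 8


def view_direct(domain, sock_type):
    """x86_64 style: domain and type arrive directly in registers."""
    return SeccompView(NR_SOCKETPAIR_X86_64, (domain, sock_type, 0))


def view_socketcall(domain, sock_type):
    """i386 style: registers hold a call code and a pointer to the args."""
    block = (ctypes.c_ulong * 3)(domain, sock_type, 0)
    view = SeccompView(NR_SOCKETCALL_I386,
                       (SYS_SOCKETPAIR, ctypes.addressof(block)))
    return view, block  # return block too so it stays alive


def allows_socketpair(view):
    """Policy that would like to permit only STREAM and SEQPACKET pairs."""
    if view.nr == NR_SOCKETPAIR_X86_64:
        return view.args[1] in (socket.SOCK_STREAM, socket.SOCK_SEQPACKET)
    if view.nr == NR_SOCKETCALL_I386 and view.args[0] == SYS_SOCKETPAIR:
        # Only an address is visible; the type behind it is not.
        # The policy has to allow or deny socketpair wholesale.
        return True
    return False
```

With the direct syscall the policy can reject SOCK_DGRAM; with socketcall it can only see that some socketpair call is being made.
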
Idea: the crash reporter is fixable.  The only thing it does with its socket pair is write a single byte and close it, to signal that the dumping is done and the child can exit.  This means that a pipe would be fine.  The write is currently done with sendmsg in order to pass MSG_DONTWAIT | MSG_NOSIGNAL, but the equivalent might be achievable with a pipe, or maybe the write could be removed (since the child's read() will return 0 if the parent just closes the pipe, assuming there isn't some obscure edge case where that doesn't work).
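
A small single-process sketch of the pipe variant, with both ends held by the same process for simplicity (in the real thing the read end would be in the child). The function name `signal_done_via_pipe` is made up. Setting O_NONBLOCK on the write end is assumed here as the rough per-fd analogue of MSG_DONTWAIT; for MSG_NOSIGNAL the writer would need SIGPIPE ignored or blocked, which the Python runtime already does by default.

```python
import os


def signal_done_via_pipe(write_byte=True):
    """Return the chunks the reader sees before end-of-file."""
    r, w = os.pipe()
    os.set_blocking(w, False)  # non-blocking writes on this fd

    # "Parent" side: optionally write one byte, then close.
    if write_byte:
        os.write(w, b"\0")
    os.close(w)

    # "Child" side: read until read() returns 0 bytes (end-of-file).
    seen = []
    while True:
        chunk = os.read(r, 1)
        if not chunk:
            break
        seen.append(chunk)
    os.close(r)
    return seen
```

Either way the reader unblocks: with the write it sees one byte and then end-of-file, and without it just sees end-of-file once the last write end is closed.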

And that should take care of media plugins.

This will have to be revisited to use the Chromium open() broker for content processes.  In particular, on B2G we're already running as a per-child uid/gid, and we have root access to potentially use chroot/namespace separation as well, so there's an answer for this (post-bug 930258).  Desktop is a bigger problem, but I'd suggest that this isn't really worth worrying about until after other forms of filesystem/network access are removed from the syscall policy.
Depends on: 1068410
So there might be some difficulties as far as fixing this for content processes:

 1  libxul.so!IPC::Channel::ChannelImpl::CreatePipe(std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> > const&, IPC::Channel::Mode) [ipc_channel_posix.cc : 329 + 0x12]
 2  libxul.so!IPC::Channel::ChannelImpl::ChannelImpl(std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> > const&, IPC::Channel::Mode, IPC::Channel::Listener*) [ipc_channel_posix.cc : 267 + 0xf]
 3  libxul.so!IPC::Channel::Channel(std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> > const&, IPC::Channel::Mode, IPC::Channel::Listener*) [ipc_channel_posix.cc : 970 + 0x6]
 4  libxul.so!mozilla::ipc::CreateTransport(int, int, mozilla::ipc::TransportDescriptor*, mozilla::ipc::TransportDescriptor*) [Transport_posix.cpp : 32 + 0x11]
 5  libxul.so!mozilla::ipc::Open(mozilla::ipc::PrivateIPDLInterface const&, mozilla::ipc::MessageChannel*, int, IPC::Channel::Mode, IPCMessageStart, IPCMessageStart) [ProtocolUtils.cpp : 157 + 0x5]
 6  libxul.so!mozilla::ipc::PBackground::Open(mozilla::dom::PContentChild*) [PBackground.cpp : 17 + 0x29]
 7  libxul.so!ChildImpl::OpenProtocolOnMainThread [BackgroundImpl.cpp : 2020 + 0x9]
 8  libxul.so!ChildImpl::CreateActorRunnable::Run [BackgroundImpl.cpp : 1945 + 0xb]
 9  libxul.so!nsThread::ProcessNextEvent(bool, bool*) [nsThread.cpp : 823 + 0x14]

tl;dr: Creating a PBackground calls socketpair, apparently.
Move process sandboxing bugs to the new Bugzilla component.

(Sorry for the bugspam; filter on 3c21328c-8cfb-4819-9d88-f6e965067350.)
Component: Security → Security: Process Sandboxing
This bug isn't hugely actionable by itself.  For GMP it's already taken care of, by removing socketpair (although we're not guarding against a misusable socket somehow being explicitly passed to the process).  For content processes, removing socketpair isn't possible (comment #3: creating a PBackground calls it), so we'll have to control the process's access to the filesystem and network namespaces, not just syscalls.
Depends on: 1151632, 1213998
Keywords: meta
Assignee: jld → nobody
See Also: → bug 1355274
Depends on: 1430949