Closed Bug 935111 Opened 6 years ago Closed 6 years ago

Enable seccomp-bpf for linux

Categories

(Core :: Security, defect)

x86_64
Linux
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla28

People

(Reporter: ckerschb, Assigned: ckerschb)

References

Details

Attachments

(1 file, 3 obsolete files)

We can enable sandboxing in linux by making use of seccomp-bpf, similar to what B2G currently is doing. We have to define a new policy and separate it from the B2G policy.

How to get started:
  a) Make sure to have a linux kernel >= 3.5, since this is the first kernel version that supports seccomp-bpf.
  b) update your .mozconfig and add those two lines:
     ac_add_options --enable-content-sandbox
     ac_add_options --enable-content-sandbox-reporter
  c) Enable e10s. Do this in about:config by setting:
     browser.tabs.remote = true;
     then restart your browser.
Blocks: 925570
Adding a WIP patch which allows to run a sandboxed Firefox on Linux Desktop (whitelisting 88 syscalls). Kang suggested we could eventually add the needed syscalls for desktop to the B2G whitelist. Probably worth pointing out that FF on Linux (as far as I was able to test) does not make use of the following syscalls (which B2G however uses):
  gettimeofday, clock_gettime, getpid,
  rt_sigreturn, epoll_wait, nanosleep,
  getdents64,  sched_setscheduler, restart_syscall.

Following the principle of least privilege it probably makes sense to separate the whitelists for B2G and Linux Desktop and ifdef the cases.

As Kang also pointed out on IRC, the current process model allows the parent to make resource access decisions, where childs can only communicate with the parent and kill themselves. If we would allow all the syscalls in the sandbox we would just add a performance degradation layer. 88 syscalls sounds like a lot, but some are used way more frequent than others. E.g. collecting 450,000 syscalls, the top 10 are:
  1) recvfrom: 160174
  2) poll:     131081
  3) futex:     67511
  4) writev:    39115
  5) read:      17798
  6) write:      7900
  7) stat:       4487
  8) madvise:    3978
  9) getdents:   2121
  10) munmap:    2110

Kang, can you please ellaborate a little more on the process model and how we could optimize the sandbox?
Flags: needinfo?(gdestuynder)
Comment on attachment 828325 [details] [diff] [review]
wip_bug_742434_seccomp_bpf_for_linux.patch

Review of attachment 828325 [details] [diff] [review]:
-----------------------------------------------------------------

Christoph - the whitelist is already set up to differentiate based on platform (e.g. arm for b2g, x86_64 for linux desktop, etc.). Can the new syscalls not be added to the SECCOMP_WHITELIST_ADD_x86_64_HIGH list? (or added to a new SECCOMP_WHITELIST_ADD_x86_64_LOW list)?
Comment on attachment 828325 [details] [diff] [review]
wip_bug_742434_seccomp_bpf_for_linux.patch

Review of attachment 828325 [details] [diff] [review]:
-----------------------------------------------------------------

See inline comments.
This is probably the best description of the current process model: https://wiki.mozilla.org/Security/B2G/FirefoxOSCommsHardening
Performance / calls test for B2G: https://wiki.mozilla.org/B2G/Architecture/System_Security/Seccomp#Whitelist_performance_optimizations
B2G Sandboxing (and more) doc: https://developer.mozilla.org/en-US/docs/Mozilla/Firefox_OS/Security/System_security

Mainly I would recommend to classify the current linux whitelist in a similar manner to the B2G whitelist (keep in mind that B2G has some i386/x64/ARM ifdefs that you also don't have but might need. Maybe we'll need something nicer to add support for more platforms too. The issue here is that some platform use some syscalls that don't exist on other platforms, or exist but have different names).

Once the whitelist is classed, this can help identifying what is to be modified/fixed to make the desktop multiprocess firefox fit into the same process model as B2G (or change the process model if necessary, but since that's a huge amount of work, I would discourage that right now)

::: security/sandbox/linux/Sandbox.cpp
@@ +37,5 @@
>  
>  struct sock_filter seccomp_filter[] = {
>    VALIDATE_ARCHITECTURE,
>    EXAMINE_SYSCALL,
> +// TODO: we need to prperly ifdef that part

typo: properly
and yes, if we ifdef desktop vs B2G this is not the right way. Either whitelist should be enabled if MOZ_CONTENT_SANDBOX is enabled.

Maybe you can idef on GONK or B2G. See the configure script for exact variable names.

Something like:
#idef MOZ_CONTENT_SANDBOX
#idef <B2G> <= look up which one we can use
  SECCOMMP_WHITELIST_B2G
#else
  SECCOMP_WHITELIST_LINUX
#endif
#endif

@@ +188,5 @@
>       * See bug 880797 for reversal
>       */
>  
>      /* _exit(127); */
>    }

no empty line

::: security/sandbox/linux/seccomp_filter.h
@@ +186,5 @@
> + *  9) getdents:   2121
> + *  10) munmap:    2110
> + */
> +#define SECCOMP_WHITELIST_LINUX \
> +  /* These are calls we're ok to allow */ \

We're not OK to allow all of those actually :)
stuff like open and so on should eventually be removed. Check how it's done for B2G to classify them, and example of some of them being already classified.

man <syscall> is my best friend.
Attachment #828325 - Flags: feedback+
Oh also once classed, you can probably add the calls to this for desktop: https://wiki.mozilla.org/FoxInABox#Status (Check with Sid)
Flags: needinfo?(gdestuynder)
Assignee: nobody → mozilla
Depends on: 936145
Blocks: 936274
This patch allows to enable MOZ_CONTENT_SANDBOX and separates B2G and LINUX Desktop syscalls. Even though the macro SECCOMP_WHITELIST_ARCH_DESKTOP_LINUX is left empty, I think we should keep it and start filling it once we support other architectures.

Also, the syscalls |nanosleep| and |resart_syscall| are not used by Desktop, but of course we need them all the time for debugging. Should we leave them in SECCOMP_WHITELIST_B2G_AND_DESKTOP_LINUX, or should we duplicate them, having them in B2G and DESKTOP_LINUX so we can remove them from desktop once we finalize sandboxing work?
Attachment #8334735 - Flags: review?(jld)
Attachment #8334735 - Flags: review?(gdestuynder)
Comment on attachment 8334735 [details] [diff] [review]
bug_935111_enable_seccomp_bpf_for_linux.patch

Review of attachment 8334735 [details] [diff] [review]:
-----------------------------------------------------------------

r+ modulo the parts about gettimeofday and the duplicated syscalls.

::: security/sandbox/linux/seccomp_filter.h
@@ -111,3 @@
>    /* These are calls we're ok to allow */ \
>    SECCOMP_WHITELIST_ARCH_HIGH \
> -  ALLOW_SYSCALL(gettimeofday), \

My understanding is that gettimeofday is here because it's called frequently; this change would move it farther down the list, which could impact performance.

@@ +178,5 @@
> +#endif
> +
> +/* Desktop Linux specific syscalls */
> +#if defined(MOZ_CONTENT_SANDBOX) && !defined(MOZ_B2G) && defined(XP_UNIX) && !defined(XP_MACOSX)
> +#define SECCOMP_WHITELIST_DEKSTOP_LINUX \

Typo.

@@ +192,5 @@
> +  ALLOW_SYSCALL(getsockname), \
> +  ALLOW_SYSCALL(recvmsg), \
> +  ALLOW_SYSCALL(uname), \
> +  ALLOW_SYSCALL(rt_sigaction), \
> +  ALLOW_SYSCALL(sendmsg), \

Some of these (sendmsg, socketpair, fsync, uname; maybe more) are also in the shared list.  I don't know if this is a merge issue (I added them recently to get mochitests to pass on b2g, and so the electrolyzed crash reporter would work) or if the duplication is intentional.

I also don't know if some of these syscalls have different uses on b2g and desktop — e.g., if we change the crash reporter to not use socketpair/sendmsg in the child then we can remove them for b2g, but they may or may not still be needed on desktop.  I guess we can deal with problems like that when we get to them.

@@ +198,5 @@
> +  ALLOW_SYSCALL(socket), \
> +  ALLOW_SYSCALL(getuid), \
> +  ALLOW_SYSCALL(geteuid), \
> +  ALLOW_SYSCALL(mkdir), \
> +  ALLOW_SYSCALL(connect), \

At the risk of stating the obvious, some of these are syscalls we'll clearly want to remove when we can — socket/connect is obviously bad, sendto might be, quotactl, rename, kill, execve, symlink, and so on.  I don't know if we want to separate them into their own group with a comment.

@@ +251,5 @@
> +
> +#define SECCOMP_WHITELIST \
> +  SECCOMP_WHITELIST_B2G_AND_DESKTOP_LINUX \
> +  SECCOMP_WHITELIST_B2G \
> +  SECCOMP_WHITELIST_DEKSTOP_LINUX

(As above.)
Attachment #8334735 - Flags: review?(jld) → review+
(In reply to Jed Davis [:jld] from comment #6)
> ::: security/sandbox/linux/seccomp_filter.h
> @@ -111,3 @@
> >    /* These are calls we're ok to allow */ \
> >    SECCOMP_WHITELIST_ARCH_HIGH \
> > -  ALLOW_SYSCALL(gettimeofday), \
> 
> My understanding is that gettimeofday is here because it's called
> frequently; this change would move it farther down the list, which could
> impact performance.

I guess performance is critical on B2G - I changed to layout of the macros to accomdodate for that.
 
> > +#define SECCOMP_WHITELIST_DEKSTOP_LINUX \
> Typo.

Fixed!

> 
> @@ +192,5 @@
> > +  ALLOW_SYSCALL(getsockname), \
> > +  ALLOW_SYSCALL(recvmsg), \
> > +  ALLOW_SYSCALL(uname), \
> > +  ALLOW_SYSCALL(rt_sigaction), \
> > +  ALLOW_SYSCALL(sendmsg), \
> 
> Some of these (sendmsg, socketpair, fsync, uname; maybe more) are also in
> the shared list.  I don't know if this is a merge issue (I added them
> recently to get mochitests to pass on b2g, and so the electrolyzed crash
> reporter would work) or if the duplication is intentional.

The duplication was not intentional, I created the list before I pulled your update.
I verified that there are no more duplicate syscalls, besides rt_sigaction, which I marked with a comment.
 
> At the risk of stating the obvious, some of these are syscalls we'll clearly
> want to remove when we can — socket/connect is obviously bad, sendto might
> be, quotactl, rename, kill, execve, symlink, and so on.  I don't know if we
> want to separate them into their own group with a comment.

That is a good point - I created a list of the topmost dangerours syscalls - probably should/have to add move more syscalls over to that list. I guess we can do that once we get that there.
 
> @@ +251,5 @@
> > +
> > +#define SECCOMP_WHITELIST \
> > +  SECCOMP_WHITELIST_B2G_AND_DESKTOP_LINUX \
> > +  SECCOMP_WHITELIST_B2G \
> > +  SECCOMP_WHITELIST_DEKSTOP_LINUX
> 
> (As above.)

Fixed the typo.


Carrying over r+ from jld.
Attachment #828325 - Attachment is obsolete: true
Attachment #8334735 - Attachment is obsolete: true
Attachment #8334735 - Flags: review?(gdestuynder)
Attachment #8334958 - Flags: review+
Christoph - we also need to call SetCurrentProcessSandbox() somewhere to actually start the sandbox. Were you intending to do that in this bug or another one?
Flags: needinfo?(mozilla)
Already happens for Linux Desktop [1].
If you want to test yourself, just follow my instructions from Comment 0 (Description).

[1] https://mxr.mozilla.org/mozilla-central/source/dom/ipc/ContentChild.cpp#584
Flags: needinfo?(mozilla)
(In reply to Christoph Kerschbaumer [:ckerschb] from comment #9)
> Already happens for Linux Desktop [1].

Ah - looks like bug 921817 changed this (I thought it was b2g only). Cool!
Finalized the patch and pushed to try to verify it does not break B2G:
  https://tbpl.mozilla.org/?tree=Try&rev=c851333365d8

Also, carrying over r+ from jld.
This is ready to be checked in.
Attachment #8334958 - Attachment is obsolete: true
Attachment #8335276 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/b83c7ab4ced2
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla28
You need to log in before you can comment on or make changes to this bug.