Closed Bug 1259508 Opened 8 years ago Closed 8 years ago

Seccomp sandbox violation: sys_clone called in content process of Firefox desktop

Categories

(Core :: Security: Process Sandboxing, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla49
Tracking Status
firefox49 --- fixed

People

(Reporter: tedd, Assigned: tedd)

References

Details

(Whiteboard: sblc1)

Attachments

(4 files, 4 obsolete files)

Attached file Seccomp error log (obsolete) —
When enabling content sandboxing on Linux desktop (using ac_add_options --enable-content-sandbox), the content process crashes early on due to a seccomp violation when calling sys_clone.

We currently have a policy for clone in place [1]. It seems that non whitelisted clone flags were used. (clone flag: 0x1200011)

A more readable form of the flags would be:

> CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID | SIGCHLD

used in the ARCH_FORK of libc [2]. Only CLONE_CHILD_CLEARTID is part of our whitelist, CLONE_CHILD_SETTID and SIGCHLD are not whitelisted.

The backtrace suggests that it is caused by libgconf, frame #16 is part of nsGConfService::GetString.

How do we want to handle this, if possible we should avoid adding to the whitelist, maybe we can remote that code somehow?

As far as the missing symbols are concerned, currently I would have to look them up manually with gdb (maybe there is a better way?)... Since those offsets in the log are specific to my compiled version, I can supply more information if required, just ni? me.

[1] https://dxr.mozilla.org/mozilla-central/rev/6202ade0e6d688ffb67932398e56cfc6fa04ceb3/security/sandbox/linux/SandboxFilter.cpp#90
[2] https://github.com/lattera/glibc/blob/59ba27a63ada3f46b71ec99a314dfac5a38ad6d2/nptl/sysdeps/unix/sysv/linux/x86_64/fork.c#L27
The last time I tried this I ran into a problem with fork() and gstreamer, but it looks like I didn't file a bug about it.

fork() isn't hugely bad by itself, but it's probably going to execve.  The biggest problem with that is probably that the seccomp-bpf filter would be preserved, but not the SIGSYS handler.  execve means filesystem access, but there's no way to broker it.  (There's an fexecve(), but on Linux it's implemented on top of execve via procfs.)

If there's some way to pre-perform whatever this is before starting the sandbox, that's one solution.  Changing the clone() policy to just return an error for this kind of thing might also work (but maybe via a trap function in order to optionally log it?), depending on what the library is trying to do.  And of course there's ELF symbol interposition.

The backtrace would help figure out what to do.  mozilla-central has tools/rb/fix_linux_stack.py but I don't know if that knows how to look up system libraries (or how hard it would be to add that, if not).
Thanks :jld, I think the pre-perform way would be ideal, I think in Bug 942698, they were looking for a way to turn on the sandbox once most of the initialization was completed, although the patches are outdated I think.

On a different note, isn't audio only started when required? So if the user browses the web without ever needing audio, this wouldn't be triggered right? 

I looked up the symbols in gdb for libxul
Attachment #8734436 - Attachment is obsolete: true
Ah, never mind my last comment about audio, shouldn't have anything to do with this bug, I confused it with a different bug.
Whiteboard: sb? → sb+
I did some further investigation and as you mentioned :jld, the fork() is followed by a execve(), the function in question is: _read_subprocess_line_argv() inside dbus/dbus-sysdeps-unix.c, of the dbus repository.

Now, libgconf eventually calls ensure_dbus_connection() which triggers the fork/execve if dbus-launch isn't running (or it doesn't have a connection to it).

Since we can't do much about libdbus nor libgconf, I started looking into our code that leads there. It seems that we use the GConfClient API by calling gconf_client_get_string() in nsGConfService, which in return goes into the libgconf library.

This all seems to be part of the initialization of nsProtocolProxyService, which is probably triggered by some events (I haven't looked into that).

I patched the RecvSetProcessSandbox() function to trigger the init code of nsProtocolProxyService, before seccomp is applied (I will add the patch later) to see if the clone call can be avoided by letting it spawn the process before seccomp is running.

The attached file is a strace log the child, which clearly shows the sys_clone call (which would have hit seccomp) followed by sys_execve() of dbus-launch, followed by sys_prctl() for NO_NEW_PRIVS and seccomp.

The patch is quick and dirty, just to illustrate the point, but it may be worth considering to trigger certain functions before applying seccomp, at least this way we would avoid adding sys_execve to the whitelist. 

But I don't know if this would expose us to additional risks, at least in this case I don't see it being a problem, but then again I don't know the code too well.
Patch to trigger init code before seccomp is applied.
(In reply to Julian Hector [:tedd] [:jhector] from comment #4)
> The patch is quick and dirty, just to illustrate the point, but it may be
> worth considering to trigger certain functions before applying seccomp, at
> least this way we would avoid adding sys_execve to the whitelist. 

We'll probably need to do that.  This has come up before, and I think I suggested having a “PreloadUnsandboxableThings” or something like that (cf. the PreloadSlowThings in ContentChild and TabChild for the B2G preallocated process), but whatever it was wound up being worked around differently.  I think Chromium takes this approach for some things, but I don't remember the details offhand.
Attached file libpulse Seccomp error log (obsolete) —
Another instance where sys_clone() is called, this time it seems to be out of libpulse.
I looked up some symbol names in libxul.
Attachment #8742374 - Attachment is obsolete: true
Made some modifications to also solve the libpulse problem.
Attachment #8738705 - Attachment is obsolete: true
:padenot, :badger, I ni? you two because I saw you in the git blame of the related code. If you aren't the right person to talk to and you know someone how can help out, please put them on ni?.

In the attached patch, I called |CubebUtils::GetCubebContext();| and the constructor of |NS_PROTOCOLPROXYSERVICE_CONTRACTID| before activating seccomp, to trigger the initialization code which does some fork/execve. 

Now this seems to fix my problem, but the question is, does this expose us to any additional security risks? I.e. does the code handling the initialization, use untrusted content in some way (like content that is provided by the webserver)

I think for the GetCubebContext part, calling it before seccomp is enabled, means that code is initialized which may not be needed during the session (I believe it is audio related, so I think browsing the web without the need for audio would avoid this code path). It probably isn't a big problem, but what are you thoughts on that? (as for ProtocolProxyService, I have not idea what this code does).

Thanks.
Flags: needinfo?(padenot)
Flags: needinfo?(daniel)
Whiteboard: sb+ → sblc1
(In reply to Julian Hector [:tedd] [:jhector] from comment #10)
> :padenot, :badger, I ni? you two because I saw you in the git blame of the
> related code. If you aren't the right person to talk to and you know someone
> how can help out, please put them on ni?.
> 
> In the attached patch, I called |CubebUtils::GetCubebContext();| and the
> constructor of |NS_PROTOCOLPROXYSERVICE_CONTRACTID| before activating
> seccomp, to trigger the initialization code which does some fork/execve. 
> 
> Now this seems to fix my problem, but the question is, does this expose us
> to any additional security risks? I.e. does the code handling the
> initialization, use untrusted content in some way (like content that is
> provided by the webserver)
> 
> I think for the GetCubebContext part, calling it before seccomp is enabled,
> means that code is initialized which may not be needed during the session (I
> believe it is audio related, so I think browsing the web without the need
> for audio would avoid this code path). It probably isn't a big problem, but
> what are you thoughts on that? (as for ProtocolProxyService, I have not idea
> what this code does).

As you correctly determined, GetCubebContext loads a bunch of audio-related things. Depending on the platform, it can be almost nothing (a malloc and a mutex init on OSX), or more involved (a bunch of syscalls, opening DLL and fetching symbols, on windows and linux). The execution of these methods does not depend on web content at all, we just call it the first time we need to output some audio. 

Last we checked, calling it early meant a longer startup time, but maybe it's better to have seccomp enabled, yeah.
(In reply to Paul Adenot (:padenot) from comment #11)
> Last we checked, calling it early meant a longer startup time, but maybe
> it's better to have seccomp enabled, yeah.

This will become a non-issue if audio gets remoted (which will trigger lazy init in the chrome process).
Flags: needinfo?(padenot)
Thanks Paul and Gian-Carlo, so for the time being (until Bug 1104619 is resolved) we would probably do this early initialization.
> as for ProtocolProxyService, I have not idea what this code does

ProtocolProxyService handles proxies, like PAC and related stuff. Sorry for being slow but I don't quite understand what you're saying ProtocolProxyService does wrong or what exactly the problem is there.

I'll ni mcmanus here to see if we can bring some insights.
Flags: needinfo?(daniel) → needinfo?(mcmanus)
(In reply to Daniel Stenberg [:bagder] from comment #14)
> ProtocolProxyService handles proxies, like PAC and related stuff. Sorry for
> being slow but I don't quite understand what you're saying
> ProtocolProxyService does wrong or what exactly the problem is there.

As I understand it, from the attacked stack trace and earlier comments: it's trying to read the systemwide proxy configuration, which calls into some external frameworks (gconf, dbus), and that's what's trying to fork/exec dbus-launch.
1 - do you have a stack that has more xul symbols in it?

2 - as you say it would be ok to remote this, I'm assuming the problem is that this happens in the child and it would be ok in the parent?

2a depending on exactly what the stack is (see 1), its possible that this class isn't needed at all in the child.. so we could just skip init of it.
Flags: needinfo?(mcmanus)
(In reply to Patrick McManus [:mcmanus] from comment #16)
> 1 - do you have a stack that has more xul symbols in it?


found it :)

Sandbox: frame #21: nsProtocolProxyServiceConstructor[/home/user/build/dist/bin/libxul.so +0xd32e84]
Sandbox: frame #22: nsComponentManagerImpl::CreateInstanceByContractID[/home/user/build/dist/bin/libxul.so +0xb29747]
Sandbox: frame #23: nsComponentManagerImpl::GetServiceByContractID[/home/user/build/dist/bin/libxul.so +0xb2a91f]
Sandbox: frame #24: nsGetServiceByContractIDWithError::operator()[/home/user/build/dist/bin/libxul.so +0xb4deeb]
Sandbox: frame #25: nsCOMPtr_base::assign_from_gs_contractid_with_error[/home/user/build/dist/bin/libxul.so +0xb46806]
Sandbox: frame #26: nsIOService::SpeculativeConnectInternal[/home/user/build/dist/bin/libxul.so +0xba0be9]
Sandbox: frame #27: nsDocument::MaybePreconnect[/home/user/build/dist/bin/libxul.so +0x141aecb]
Daniel, I think you can fix this.. the path being called is into nsIOService::SpeculativeConnect() rather than nsHttphandler::SpeculativeConnect().. there is an intentional difference, but it eludes me at the moment.

but one unintentional difference is that you'll see the nsHttpHandler version does indeed remote this.. https://mxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/nsHttpHandler.cpp#2200

I think the IO service would be able to call the same remoting code..
Thanks Patrick, if we can remote this with a couple of lines code, that would probably be the preferred solution.

As for question 2 in Comment 16, the problem is that it happens in the child after we enabled seccomp, so simply calling the code before seccomp is enabled (see attached "DO NOT COMMIT" patch) solves this problem.

I am just not sure if that leads to any security problems, because an attacker could leverage that code that is triggered, to his advantage. So I was hoping someone who knows the code could answer that.
this bug is currently a blocker for enabling the seccomp sandbox on nightly

can you try and work up a patch? I don't think it will be very involved?
Flags: needinfo?(daniel)
Depends on: 1270147
necko bit now in 1270147
Flags: needinfo?(daniel)
With Daniel taking care of Bug 1270147, the only thing left here is audio. According to Comment 11, a early initialization could cause a performance impact.

Which won't be a problem anymore when audio is remoted (Comment 12).

As a workaround, until Bug 1104619 is resolved, I tried to do the early initialization in a not so hackish way by moving it into a separate function located in security/sandbox/linux/, but I couldn't get it to work, because mozilla::CubebUtils::GetCubebContext() is defined in libxul and not exported.

It probably isn't a problem to make it somehow available (export it etc), but I think it is even more hackish as a workaround.

Paul: is there maybe a way to trigger the cubeb init code that can be easily reached from security/sandbox/linux? I looked at some of the functions in the backtrace, but all of them require arguments and I don't know the code.

Jed: if there isn't an 'easy' way to trigger the init code from within the sandbox code, would it be acceptable to add the GetCubebContext() call inside the RecvSetProcessSandbox() function, until Bug 1104619 is resolved? (I know it is ugly, but we could mark it as a workaround with a comment)

Of course there is also the point of having more code that needs to be initialized before seccomp, but so far I only encountered these two instances.

So what do you both think?
Flags: needinfo?(padenot)
Flags: needinfo?(jld)
(In reply to Julian Hector [:tedd] [:jhector] from comment #23)
> Jed: if there isn't an 'easy' way to trigger the init code from within the
> sandbox code, would it be acceptable to add the GetCubebContext() call
> inside the RecvSetProcessSandbox() function, until Bug 1104619 is resolved?
> (I know it is ugly, but we could mark it as a workaround with a comment)

I don't have a problem with that.  But, yes, a comment that we're planning to eventually remove it (and mentioning the bug number) would be good.

If we're concerned about performance, it should be fairly easy to measure the time that that call takes.  Also, there might already be telemetry for content process startup time.
Flags: needinfo?(jld)
Maybe you can call `cubeb_init()` from the sandbox, and hand off the pointer to Gecko right after ?

You just need a pointer to a `cubeb*` (which is an opaque type), and a c-string.
Flags: needinfo?(padenot)
Thanks for the feedback, I decided to place the call inside the RecvSetProcessSandbox() function, because the context passed to cubeb_init() is used inside CubebUtils to check if it is already initialized or not.
So I wanted to avoid passing around too many pointers.

Try push for build: https://treeherder.mozilla.org/#/jobs?repo=try&revision=fa412e1d72b7
Attachment #8742960 - Attachment is obsolete: true
Attachment #8751730 - Flags: review?(jld)
Assignee: nobody → julian.r.hector
Attachment #8751730 - Flags: review?(jld) → review+
FYI, bug 1234092 removed gstreamer support, so comment #1 and anything I've mentioned about it trying to run gst-plugin-scanner should no longer be an issue.
Try run seems good and given Comment 11, the potential performance impact is bearable for the sake of enabling seccomp. After all, this is a temporary solution and will no longer be required once Bug 1104619 is resolved.
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/e5e3a65e8220
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
See Also: → 1466593
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: