Closed Bug 1439057 Opened 2 years ago Closed 2 years ago

Remove blanket access to /dev/shm from Linux content processes

Categories

(Core :: Security: Process Sandboxing, enhancement, P1)

60 Branch
Unspecified
Linux
enhancement

Tracking

()

RESOLVED FIXED
mozilla61
Tracking Status
firefox61 --- wontfix
firefox62 --- wontfix
firefox63 --- fixed

People

(Reporter: jld, Assigned: jld)

References

(Blocks 1 open bug)

Details

Attachments

(1 file, 2 obsolete files)

Along similar lines to bug 1386404, we should do something about direct access to /dev/shm — this can allow the content process to interfere with (or spy on) interprocess communication that it shouldn't be touching.

We'll need to allow it if audio remoting is disabled, but hopefully that will continue to ride the trains to 60.

Our own IPC uses it for shared memory; on newer kernels we should be able to use memfd_create instead, but for older kernels we'll need to add an "anonymous temporary file" command to the broker.  Either IPC could be modified to call into libmozsandbox, or we could use seccomp-bpf to polyfill memfd_create if it's not implemented, so that IPC just needs to know about memfd_create.  (In the latter case, to avoid a race condition when the sandbox is started, IPC should try memfd_create a second time if memfd_create fails and then direct FS access fails.)

As a convenient side-effect, using the real memfd_create should increase graphics performance vs. brokered access to /dev/shm, although the effect may not be significant.

There may also be libraries calling shm_open/shm_unlink in libc; we could interpose them, but in general we can't allow non-O_EXCL opens.
Summary: Stop letting Linux content processes write to anything in /dev/shm → Remove blanket access to /dev/shm from Linux content processes
Will this change affect bug 1438678?
(In reply to Masatoshi Kimura [:emk] from comment #1)
> Will this change affect bug 1438678?

Not directly — that's reading the file before sandboxing restricts filesystem access, so if the named case in IPC isn't changed it would still work.  However, I would like to remove named shared memory if possible, because nothing else is using it.  In any case, bug 1438678 is easier in some ways with anonymous shared memory anyway.
For reasons passing understanding, memfd_create has no C wrapper in glibc until 2.27… but also we need to be able to build on glibc versions that predate memfd_create being added to the kernel anyway.

So, new idea: have IPC call into libmozsandbox with a regular function call to use the broker client if available, and fall back to the old code if that fails; the broker, which is in libxul, can just use base::SharedMemory itself as a backend (maybe with some refactoring to allow consuming the object and taking the fd).

As for memfd_create… libmozsandbox already has a modern version of the syscall numbers header, and this would allow inserting memfd_create in that part of the call chain.  But that's is a little weird, since it doesn't really belong in sandboxing, and we'd need to have the unsandboxed parent process call the sandbox library in order to benefit.  Another way to do that would be to just drop the syscall numbers into shared_memory_posix.cc under some ifdefs.  But it wouldn't be necessary for sandboxing, just as a performance optimization, so should be a separate bug.
See Also: → 1440203
Priority: -- → P2
Assignee: nobody → jld
Priority: P2 → P1
Blocks: 1440203
See Also: 1440203
Comment on attachment 8962572 [details]
Bug 1439057 - Tighten /dev/shm access in Linux content sandbox policy.

https://reviewboard.mozilla.org/r/231386/#review237108

Makes sense to me.

::: ipc/chromium/src/base/shared_memory_posix.cc:137
(Diff revision 1)
>  }
>  
>  bool SharedMemory::ShareToProcessCommon(ProcessId processId,
>                                          SharedMemoryHandle *new_handle,
>                                          bool close_self) {
> -  const int new_fd = dup(mapped_file_);
> +  DCHECK(mapped_file_ >= 0);

This check seems like something the code should have been doing all along...or should we just return `false` here?  I guess we'll return `false` below, since `new_fd` will either receive the invalid `mapped_file_` or a `dup(mapped_file_)`, which is presumably also -1.
Attachment #8962572 - Flags: review?(nfroyd) → review+
Comment on attachment 8962573 [details]
Bug 1439057 - Use the sandbox broker directly for shared memory file creation on Linux.

https://reviewboard.mozilla.org/r/231388/#review237118

r=me on the ipc/ parts.

::: ipc/chromium/src/base/shared_memory_posix.cc:109
(Diff revision 1)
> +    // Handle a possible race condition: CreateViaSandbox fails
> +    // because the sandbox isn't started, then the sandbox is started,
> +    // then filesystem access is attemped and fails.  In that case a
> +    // second CreateViaSandbox should succeed.

Just to be clear here, because I have zero sandboxing experience: the sandbox is being started on a different thread here and that's why we're racing?  Assuming this is the case, what guarantees that the sandbox startup has necessarily finished by the time we attempt this `CreateViaSandbox` call?  IOW, isn't this sort of a bandaid fix?
Attachment #8962573 - Flags: review?(nfroyd) → review+
(In reply to Nathan Froyd [:froydnj] from comment #7)
> Comment on attachment 8962572 [details]
> > +  DCHECK(mapped_file_ >= 0);
> 
> This check seems like something the code should have been doing all
> along...or should we just return `false` here?  I guess we'll return `false`
> below, since `new_fd` will either receive the invalid `mapped_file_` or a
> `dup(mapped_file_)`, which is presumably also -1.

This was sort of checked before, because dup(-1) will fail with EBADF and set off the DCHECK.  (Note that DCHECK is broken at the moment; see bug 1436156.  I'm assuming it will eventually be fixed.)  The situation before and after this patch:

No mapped_file_, before: fail DCHECK, yield -1, return true.
Other failure, before: fail DCHECK, yield -1, return true.

No mapped_file_, after: fail DCHECK, return false.
Other failure, after: return false.


(In reply to Nathan Froyd [:froydnj] from comment #8)
> Comment on attachment 8962573 [details]
> > +    // Handle a possible race condition: CreateViaSandbox fails
> > +    // because the sandbox isn't started, then the sandbox is started,
> > +    // then filesystem access is attemped and fails.  In that case a
> > +    // second CreateViaSandbox should succeed.
> 
> Just to be clear here, because I have zero sandboxing experience: the
> sandbox is being started on a different thread here and that's why we're
> racing?  Assuming this is the case, what guarantees that the sandbox startup
> has necessarily finished by the time we attempt this `CreateViaSandbox`
> call?  IOW, isn't this sort of a bandaid fix?

This can be called from any thread, and SetContentProcessSandbox runs on the main thread[*], but it doesn't need to have finished.  Setting gSandboxBrokerClient (and thus enabling SandboxSharedMemoryCreate) happens-before any thread applies the seccomp-bpf policy, so there are three phases:

1. Accessing /dev/shm works; SandboxSharedMemoryCreate fails.
2. Accessing /dev/shm works; SandboxSharedMemoryCreate works.
3. Accessing /dev/shm fails; SandboxSharedMemoryCreate works.

But there is — and I didn't realize this until just now — a bug: the seccomp-bpf policy can be applied after the open() and before the unlink(), which leaks a zero-length file.  (This can also happen if the process crashes in that window.)  If I pass the broker fd at process startup then all of this goes away, but then we have even more hard-coded file descriptors (bug 1440207).  But the alternative is to… introduce some kind of reader/writer lock to avoid the race?  Which might be worse.

I suppose there's one other alternative: desupport content sandboxing on kernels < 3.11 and use O_TMPFILE.  This isn't completely absurd now that Ubuntu 12.04 is post-EOL, but I don't like it.


[*] More detail: on newer kernels the main thread makes one system call that atomically applies the seccomp-bpf policy to all threads; on older kernels it has to send async signals and get them each to apply the policy to themselves one by one.
Comment on attachment 8962573 [details]
Bug 1439057 - Use the sandbox broker directly for shared memory file creation on Linux.

https://reviewboard.mozilla.org/r/231388/#review237894
Attachment #8962573 - Flags: review?(gpascutto) → review+
Comment on attachment 8962574 [details]
Bug 1439057 - Stop allowing access to /dev/shm in the Linux content sandbox.

https://reviewboard.mozilla.org/r/231390/#review237926
Attachment #8962574 - Flags: review?(gpascutto) → review+
Comment on attachment 8962574 [details]
Bug 1439057 - Stop allowing access to /dev/shm in the Linux content sandbox.

https://reviewboard.mozilla.org/r/231390/#review237960

::: security/sandbox/linux/broker/SandboxBrokerPolicyFactory.cpp
(Diff revision 1)
>  {
>    // Policy entries that are the same in every process go here, and
>    // are cached over the lifetime of the factory.
>  #if defined(MOZ_CONTENT_SANDBOX)
>    SandboxBroker::Policy* policy = new SandboxBroker::Policy;
> -  policy->AddDir(rdwrcr, "/dev/shm");

This needs to be conditional on `media.cubeb.sandbox`, because PulseAudio uses `/dev/shm` (and there are, as I type this, people on IRC `#media` flipping that pref to troubleshoot a bug, so we should try not to break it).
(In reply to Jed Davis [:jld] (⏰UTC-6) from comment #9)
> If I pass the broker fd at process startup then
> all of this goes away, but then we have even more hard-coded file
> descriptors (bug 1440207).  But the alternative is to… introduce some kind
> of reader/writer lock to avoid the race?  Which might be worse.

So, even if I make some much-needed improvements to LaunchOptions to dynamically allocate the fd, there's another problem: the broker policy depends on the remote type (because of file processes), which is in ContentParent and would have to be propagated down through GeckoChildProcessHost and into SandboxLaunchPrepare.  This also complicates the broker's lifetime — it used to be held by the ContentParent, but now it might need to delete itself via runnable.  

> I suppose there's one other alternative: desupport content sandboxing on
> kernels < 3.11 and use O_TMPFILE.  This isn't completely absurd now that
> Ubuntu 12.04 is post-EOL, but I don't like it.

Alternately: remove /dev/shm access if the kernel supports memfd_create (after bug 1440203), and otherwise limit it to a prefix including the pid (my work-in-progress patches for bug 1447867 already give us such a prefix, as a side effect of reducing the possibility of name collision) which I can't think of any useful ways to exploit at the moment.
Attachment #8962573 - Attachment is obsolete: true
Attachment #8962574 - Attachment is obsolete: true
New approach, based on the name prefixes introduced in bug 1447867; no broker changes needed.
Comment on attachment 8962572 [details]
Bug 1439057 - Tighten /dev/shm access in Linux content sandbox policy.

https://reviewboard.mozilla.org/r/231386/#review243330
Attachment #8962572 - Flags: review?(gpascutto) → review+
Pushed by jedavis@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/baeab3bff807
Tighten /dev/shm access in Linux content sandbox policy. r=froydnj,gcp
https://hg.mozilla.org/mozilla-central/rev/baeab3bff807
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
Backed out: https://hg.mozilla.org/mozilla-central/rev/7793bcae69f6
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Pushed by jedavis@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/02395f0e8074
Tighten /dev/shm access in Linux content sandbox policy. r=gcp
https://hg.mozilla.org/mozilla-central/rev/02395f0e8074
Status: REOPENED → RESOLVED
Closed: 2 years ago2 years ago
Resolution: --- → FIXED
Sorry; forgot to flip the status-61 flag when backing out.
You need to log in before you can comment on or make changes to this bug.