Closed Bug 1401062 Opened 8 years ago Closed 7 years ago

Use clone() instead of fork() for sandboxed Linux processes and remove SandboxEarlyInit etc.

Tracking

Status:

RESOLVED FIXED

Milestone:

mozilla60

Tracking Flags:

Tracking

Status

firefox60

---

fixed

People

(Reporter: jld, Assigned: jld)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Whiteboard: sb+)

Attachments

(3 files)

Bug 1401062 - Delete the old namespace/chroot code and reorganize sandbox init. 7 years ago Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧ 59 bytes, text/x-review-board-request	gcp : review+	Details
Bug 1401062 - Create Linux child processes with clone() for namespace/chroot sandboxing. 7 years ago Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧ 59 bytes, text/x-review-board-request	gcp : review+	Details
Bug 1401062 - Avoid doing sandbox-related things to unsandboxed child processes. 7 years ago Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧ 59 bytes, text/x-review-board-request	gcp : review+	Details

Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧

Assignee

Description

•

8 years ago

Instead of starting child processes with fork() and then using unshare() early in startup while single-threaded to deal with namespace isolation (and gaining local capabilities for chroot()), we should create the child process and unshare all necessary namespaces in one step using clone().

Advantages:

* This enables pid namespace isolation (bug 1151624), while still having the child process as a direct child of the parent as Gecko IPC expects.

* It lets us remove the kUnexpectedThreads workaround from bug 1222500, as well as any other IsSingleThreaded-related problems (e.g., bug 1291553).

* We can decide what namespace-related actions to take in a process that has all the XPCOM/Gecko services running, rather than a limited context that needs to be fed information via environment variables. (For example, bug 1376910 will need an exception for a specific graphics driver, and as we enable more features we'll usually want to be able to control them with prefs rather than env vars.)

Disadvantages:

* We're bypassing fork() and therefore pthread_atfork(), so the post-clone path needs to be async signal safe and not cheat by using malloc. However, we're already doing this: it's required by POSIX, and apparently we've had run-ins with broken pthread_atfork()s that forced the issue.

* If we just use syscall(__NR_clone, ...), we're bypassing the part of glibc that updates its cached copy of the pid and tid. In principle, this shouldn't be a problem as long as we don't call anything like raise(), which means blocking signals across clone() and resetting signal handlers in the child before unblocking. Alternatively, Chromium uses glibc's clone() wrapper (which insists on calling a function on an alternate stack instead of behaving like fork()) and gets fork()-like behavior by longjmp'ing out of the callback; we could borrow their code if we run into problems using the syscall directly.

Non-Features / Alternatives:

* This isn't useful in the absence of unprivileged user namespace support. To handle that case, we'd need to run the child process indirectly via a setuid root wrapper. In this case, namespaces could also be handled by a wrapper executable (which would avoid the multithreaded-fork issues).

* We're still doing a copy-on-write of the parent's entire address space; i.e., making the entire thing read-only and causing page faults on write accesses. This is a fundamentally expensive operation, but as far as I know we don't have data on how significant it might be for Firefox. If we wanted to avoid it, we'd need to run the child process indirectly via a small server process (the Chromium developers use the term “zygote”, and seem to consider direct use of fork/clone too expensive to even consider, for their use case).

For both of these, we'd need IPC's base::LaunchApp, or a replacement for it, to do blocking I/O to get the process's actual pid. This can't wait until it gets the pid in the IPC Hello message; the launch-time pid will have already escaped into Endpoint<> objects by then. Basically, that pid needs to match what the child gets from base::GetCurrentProcessId to keep the IPC framework happy, and both of those have to match the child's actual pid (in the parent's pid namespace, as observed via SCM_CREDENTIALS) for the crash reporter to work. This isn't insurmountable, but we'd like to do *less* blocking in this path: see bug 1348361. And there needs to be a story for waitpid() when the “child process” isn't an actual child process.

So my plan is to set things up to use clone() directly, and most of the related work (see “Advantages”, above) will carry over even if the core of the process creation winds up being rewritten in the future. I already have this working; it mostly just needs to be cleaned up and reviewed.
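For illustration, the raw-syscall variant described under “Disadvantages” might look roughly like the sketch below. This is not the code from the patches: raw_clone_fork and raw_clone_demo are hypothetical names, and the argument order of the raw clone syscall is assumed to be the x86-64/arm64 one (flags first), which does not hold on every architecture.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork()-like process creation that also unshares the
 * namespaces named in ns_flags (e.g. CLONE_NEWUSER | CLONE_NEWPID).
 * Returns 0 in the child, the child's pid in the parent, -1 on error. */
pid_t raw_clone_fork(unsigned long ns_flags)
{
    /* With no separate child stack, sharing memory or threads would be fatal. */
    assert((ns_flags & (CLONE_VM | CLONE_THREAD)) == 0);

    /* Block every signal first, so that no inherited handler can run in the
     * child while libc's idea of the pid/tid may be stale. On Linux this
     * changes the mask of the calling thread only. */
    sigset_t all, old;
    sigfillset(&all);
    if (sigprocmask(SIG_BLOCK, &all, &old) != 0)
        return -1;

    /* Assumption: flags are the first raw argument and the remaining ones
     * may be zero (true on x86-64 and arm64, not on s390). */
    long ret = syscall(SYS_clone, ns_flags | SIGCHLD, 0UL, 0UL, 0UL, 0UL);
    int saved_errno = errno;

    if (ret == 0) {
        /* Child: reset inherited handlers before unblocking. Failures for
         * SIGKILL, SIGSTOP and libc-internal signals are ignored. */
        for (int sig = 1; sig < NSIG; sig++)
            signal(sig, SIG_DFL);
    }

    /* Parent and child both restore the caller's original signal mask. */
    sigprocmask(SIG_SETMASK, &old, NULL);
    errno = saved_errno;
    return (pid_t)ret;
}

/* Hypothetical demo: the child exits with `code`, the parent reaps it and
 * returns the observed exit status (or -1 on failure). */
int raw_clone_demo(unsigned long ns_flags, int code)
{
    pid_t pid = raw_clone_fork(ns_flags);
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code); /* async-signal-safe; no atexit handlers, no malloc */

    int status = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling raw_clone_demo(0, 7) behaves like a plain fork(). Passing CLONE_NEWUSER | CLONE_NEWPID instead would request the namespaces at creation time, but that only succeeds where unprivileged user namespaces are available, as noted above.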

Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧

Assignee

Updated

•

8 years ago

Depends on: 1401786

Jim Mathies [:jimm]

Updated

•

8 years ago

Priority: -- → P1

Whiteboard: sb+

Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧

Assignee

Updated

•

8 years ago

Priority: P1 → P2

Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧

Assignee

Comment 1

•

8 years ago

A few notes:

We can't completely get rid of SandboxEarlyInit: for non-tsync kernels we need to know the signal number we're going to use for the signal “broadcast” before libraries like glib have a chance to block it. (That could also be done in a static initializer, but we're trying to have fewer of those where possible.) However, it's no longer necessary to be strictly single-threaded there.

It's possible to retain the old SandboxEarlyInit and dynamically control whether to use it or clone() with a pref, but this doesn't provide enough benefit to be worth it. For troubleshooting, setting MOZ_ASSUME_USER_NS=0 will force fork(); that will affect content the same way as the hypothetical pref, although for GMP it will remove the existing namespace/chroot layer. If we encounter regressions then we should fix them and uplift, and if it's too close to release for that (I don't think this will happen) we can just back out the patches on the branch.

And I'm going to use the longjmp() trick to use glibc's clone() wrapper, instead of doing the syscall directly: it's not much code, the approach seems to be sound given that it's been used for years by Chromium, and it avoids a class of footguns that would probably be difficult to debug.
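As a rough sketch of the longjmp() trick mentioned above (not Chromium's code and not the code in the patches; fork_with_flags, clone_and_longjmp, clone_trampoline and longjmp_clone_demo are hypothetical names, and the stack is assumed to grow downward):

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <setjmp.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs in the child on the scratch stack. It jumps straight back onto the
 * child's copy of the original stack, so the child resumes in
 * fork_with_flags() as if fork() had returned 0. */
int clone_trampoline(void *arg)
{
    longjmp(*(jmp_buf *)arg, 1);
    return 1; /* not reached */
}

/* Kept out of line so that the scratch stack lives in its own frame, below
 * the frame that called setjmp(). */
__attribute__((noinline))
pid_t clone_and_longjmp(int flags, jmp_buf *env)
{
    /* A small buffer is enough: the callback does nothing but longjmp(). */
    char stack_buf[16384] __attribute__((aligned(16)));
    void *stack_top = stack_buf + sizeof(stack_buf); /* assumes downward growth */
    return clone(clone_trampoline, stack_top, flags, env);
}

/* Hypothetical fork()-like wrapper around glibc's clone().
 * Returns 0 in the child, the child's pid in the parent, -1 on error. */
pid_t fork_with_flags(int ns_flags)
{
    /* The trick relies on the child getting a private copy of the stack. */
    assert((ns_flags & (CLONE_VM | CLONE_THREAD)) == 0);

    jmp_buf env;
    if (setjmp(env) == 0) {
        /* First return from setjmp(): create the child. Only the parent
         * returns from this call. */
        return clone_and_longjmp(ns_flags | SIGCHLD, &env);
    }
    /* Second return, reached only in the child via longjmp(). */
    return 0;
}

/* Hypothetical demo: the child exits with `code`, the parent reaps it and
 * returns the observed exit status (or -1 on failure). */
int longjmp_clone_demo(int ns_flags, int code)
{
    pid_t pid = fork_with_flags(ns_flags);
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);

    int status = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Because the process is created through the libc wrapper rather than a bare syscall, libc gets the chance to do its own per-process bookkeeping, which is the class of footguns this approach is meant to avoid. With ns_flags set to 0 the result is equivalent to fork(); namespace flags such as CLONE_NEWUSER | CLONE_NEWPID would be OR'd in by the caller.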

Comment hidden (mozreview-request)

Gian-Carlo Pascutto [:gcp]

Comment 5

•

7 years ago

mozreview-review

Comment on attachment 8941332 [details]
Bug 1401062 - Delete the old namespace/chroot code and reorganize sandbox init.

https://reviewboard.mozilla.org/r/211642/#review217768

::: security/sandbox/linux/Sandbox.cpp:301
(Diff revision 1)
>    syscall(__NR_futex, reinterpret_cast<int*>(&gSetSandboxDone),
>            FUTEX_WAKE, 1);
>  }
>
>  static void
>  EnterChroot()

I guess this can go now?

Attachment #8941332 - Flags: review?(gpascutto) → review+

Gian-Carlo Pascutto [:gcp]

Comment 6

•

7 years ago

mozreview-review

Comment on attachment 8941333 [details] Bug 1401062 - Create Linux child processes with clone() for namespace/chroot sandboxing. https://reviewboard.mozilla.org/r/211644/#review217790

Attachment #8941333 - Flags: review?(gpascutto) → review+

Gian-Carlo Pascutto [:gcp]

Comment 7

•

7 years ago

mozreview-review

Comment on attachment 8941334 [details] Bug 1401062 - Avoid doing sandbox-related things to unsandboxed child processes. https://reviewboard.mozilla.org/r/211646/#review217792

Attachment #8941334 - Flags: review?(gpascutto) → review+

Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧

Assignee

Comment 8

•

7 years ago

(In reply to Gian-Carlo Pascutto [:gcp] from comment #5)
> Comment on attachment 8941332 [details]
> >  static void
> >  EnterChroot()
>
> I guess this can go now?

It could, but the function becomes non-empty again in the second patch.

Comment hidden (mozreview-request)

Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧

Assignee

Comment 12

•

7 years ago

Updated the commit message in the main patch to mention the force-namespace pref. We're in the 59 soft code freeze, so I think I'm going to defer landing this until it's branched and the 60 cycle starts next Monday.

Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧

Assignee

Updated

•

7 years ago

Blocks: 1430756

Pulsebot

Comment 13

•

7 years ago

Pushed by jedavis@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/0a64770aace0
Delete the old namespace/chroot code and reorganize sandbox init. r=gcp
https://hg.mozilla.org/integration/autoland/rev/400800683ab6
Create Linux child processes with clone() for namespace/chroot sandboxing. r=gcp
https://hg.mozilla.org/integration/autoland/rev/bd7ff5744eb2
Avoid doing sandbox-related things to unsandboxed child processes. r=gcp

Raul Gurzau (:RaulG)

Comment 14

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/0a64770aace0
https://hg.mozilla.org/mozilla-central/rev/400800683ab6
https://hg.mozilla.org/mozilla-central/rev/bd7ff5744eb2

Status: NEW → RESOLVED

Closed: 7 years ago

status-firefox60: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla60

Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧

Assignee

Updated

•

7 years ago

status-firefox57: affected → ---

Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧

Assignee

Updated

•

7 years ago

Depends on: 1433288

Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧

Assignee

Updated

•

7 years ago

Depends on: 1434328

Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧

Assignee

Updated

•

7 years ago

Depends on: 1434528

Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧

Assignee

Updated

•

7 years ago

Depends on: 1436882

Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧

Assignee

Updated

•

7 years ago

Depends on: 1443078

Mike Hommey [:glandium]

Updated

•

7 years ago

Depends on: 1480401

Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧

Assignee

Updated

•

7 years ago

No longer depends on: 1480401
