59 bytes, text/x-review-board-request
I'm not entirely sure why this hasn't been a massive cause of problems in the past 6 months. But here we are. Somehow, compiling with clang makes the problems happen *very* reliably. https://treeherder.mozilla.org/#/jobs?repo=try&revision=101fb8d5b1d692e43b6a83209b825b291702e268 The problem is that between the fake-fork and the execv in LaunchApp, the line that does CloseSuperfluousFds(shuffle.MapsToFunc()); allocates memory. Guess what happens when another thread was in the middle of allocating memory when the clone() system call happens? An allocator lock is held. Guess what happens when the created process wants to allocate memory? It's stuck waiting for that lock. The way this is normally handled is that the allocator is setting pthread atfork handlers to run before and after fork(), in parent and child processes, to lock everything before fork, and unlock everything after to ensure that no lock is left in the wrong state. And without that happening, well, we're doomed. Whatever is done to fix this should probably be backported up to esr60.
CloseSuperfluousFds itself should be safe (see comments in the func and bug 622992), so I'm guessing this was the FD shuffling that was changed in bug 1456911?
Yes, this is probably bug 1456911 — there's a std::function constructor. This should be fixable by using std::ref on a stack-allocated closure.
Assignee: nobody → jld
Priority: -- → P1
pthread_atfork not working in the fake fork is a known “feature”. (Android's fork() had the same one, although that might not be the case anymore in versions we support.) There was a glibc bug about exposing a nonstandard fork_with_flags but it didn't go anywhere. I'll see about making the comments in LaunchApp more explicit about this.
Summary: Sandbox "fork" doesn't run pthread atfork handlers, leading to dead locks → Linux base::LaunchApp allocates memory in the cloned child via std::function ctor
I disassembled a local build and I don't see the allocation; it appears to be writing the closure contents to the stack and passing a stack-based pointer into CloseSuperfluousFds. I'm using clang-6.0 on Debian unstable; addr2line resolves that part of the code to header files from the libstdc++-8-dev package, so I tried again with libc++, but still no allocation. Maybe some older Clang / LLVM / stdlib doesn't optimize this? In any case, it's still a bug; this instance of the constructor may allocate. The instance for std::reference_wrapper must not allocate (https://en.cppreference.com/w/cpp/utility/functional/function/function, and [func.wrap.func.con] in the standard at least requires that it can't throw bad_alloc).
Component: Security: Process Sandboxing → IPC
pthread_atfork didn't exist in older versions of Android, but we've had our own for a long while in mozglue (and we wrap fork). While it may be possible to get away with this right now by ensuring we don't allocate between fork and exec, further down the road, project fission is going to require content processes to be fork()ed and not executed, and we'll have to address the fact that atfork handlers have to run somehow.
(In reply to Jed Davis [:jld] (⏰UTC-6) from comment #4) > I disassembled a local build and I don't see the allocation; it appears to > be writing the closure contents to the stack and passing a stack-based > pointer into CloseSuperfluousFds. > > I'm using clang-6.0 on Debian unstable; addr2line resolves that part of the > code to header files from the libstdc++-8-dev package, so I tried again with > libc++, but still no allocation. Maybe some older Clang / LLVM / stdlib > doesn't optimize this? The disassembly for the try build linked previously has a call to moz_xmalloc in LaunchApp. And so do, by the way, nightlies.
Ah, actually, the moz_xmalloc is before the fork on nightlies. And there's an additional one between the fork and the execve in the clang build.
The fork server will be single-threaded, so it should be safe from anything that the atfork handlers were trying to prevent. (We may need to interpose pthread_create and/or filter known problems out of LD_PRELOAD, but it *will* be single-threaded.) It doesn't seem too hard to interpose pthread_atfork and emulate it if we really need to, but that's beyond the scope of this bug. (Also, on the topic of getting away with things, touching pthread mutexes in a pthread_atfork child handler invokes undefined behavior: http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_atfork.html#tag_16_402_07 )
Comment on attachment 8997257 [details] Bug 1480401 - Avoid heap-allocated closures in async signal safe part of LaunchApp. https://reviewboard.mozilla.org/r/261122/#review268230 r=me, I guess, but I don't know about the below... ::: ipc/chromium/src/base/process_util_linux.cc:61 (Diff revision 1) > + // Constructing a std::function from a reference_wrapper won't allocate. > + CloseSuperfluousFds(std::ref(fdIsUsed)); That is, uh, quite an assumption. Is that justified by the standard?
Attachment #8997257 - Flags: review?(nfroyd) → review+
(In reply to Nathan Froyd [:froydnj] from comment #10) > Comment on attachment 8997257 [details] > Bug 1480401 - Avoid heap-allocated closures in async signal safe part of > LaunchApp. > > https://reviewboard.mozilla.org/r/261122/#review268230 > > r=me, I guess, but I don't know about the below... > > ::: ipc/chromium/src/base/process_util_linux.cc:61 > (Diff revision 1) > > + // Constructing a std::function from a reference_wrapper won't allocate. > > + CloseSuperfluousFds(std::ref(fdIsUsed)); > > That is, uh, quite an assumption. Is that justified by the standard? See comment #4; the standard doesn't seem to want to specify things like “won't call new/malloc”, understandably, but it does say that it can't throw bad_alloc. (For values of “the standard” that are drafts before and after C++11, because the actual standards are paywalled.) If that's not convincing enough I can convert this to C-style function pointers and casting from void*, but if I don't have to I'd rather not.
Is my justification in comment #11 convincing enough to land this?
Yes, that seems fine. Thanks for the explanation!
Pushed by email@example.com: https://hg.mozilla.org/integration/autoland/rev/1e65391c7583 Avoid heap-allocated closures in async signal safe part of LaunchApp. r=froydnj
The large amount of failure that happened on landing of bug 1480631 suggests this is either not fixed or something else added some allocations between fake-fork and exec.
Is this something you'd consider uplifting to 62 beta or is the fix not solid yet?
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #17) > Is this something you'd consider uplifting to 62 beta or is the fix not > solid yet? It's not too useful without the patch for bug 1481978, which just landed. (There were two instances of the same mistake.) As for uplifting them both: I could try, but it doesn't appear to affect Mozilla's own builds (it depends on the C++ toolchain, and both glandium (comment #7) and I have disassembled Nightlies before this landed to confirm that they weren't affected), and I suspect that “it may affect some downstream Linux distributions” wouldn't be a strong enough justification this close to release.
You need to log in before you can comment on or make changes to this bug.