Apparently, ignoring the return value of ContentParent::SendSetProcessPrivileges — as I did when fixing bug 921817 (failure to sandbox non-preallocated content processes) — is not okay. We're explicitly ignoring the return value of message sends with mozilla::unused elsewhere in the constructor, so I guess we can do that here as well? It makes me a little uneasy given the potential security implications. I'm assuming that this didn't break in my testing (or anyone else's) because I'm compiling with GCC 4.4.3, which is the compiler for ICS B2G. I'd expect that aurora and beta would be similarly affected, but I don't know if we care about non-b2g seccomp on those branches enough to want to uplift.
The obvious one-line patch. Verified with try builds.
Revised after feedback on IRC to do something slightly less scary if the message send should fail.
Comment on attachment 829570 [details] [diff] [review] bug936169-sandbox-send-unused-hg1.diff Review of attachment 829570 [details] [diff] [review]: ----------------------------------------------------------------- This looks good, I think. Can you test it though to make sure that KillHard works as expected when called from the destructor? It does a delayed method call with 'this' as an argument so I just want to make sure we aren't causing more problems here.
(In reply to ben turner [:bent] (needinfo? encouraged) from comment #3) > This looks good, I think. Can you test it though to make sure that KillHard > works as expected when called from the destructor? It does a delayed method > call with 'this' as an argument so I just want to make sure we aren't > causing more problems here. Sometimes it works, although there seems to be some confusion between the waitpid in base::KillProcess and whatever we're doing to child processes. Sometimes it... doesn't work. I added a KillHard if (mSubprocess->GetChildProcessHandle() & 256), and sometimes this causes the parent process itself to be sent SIGTERM, and I can't figure out why. It will reproduce with gdb attached, but not with a conditional breakpoint on kill(), suggesting it's timing-sensitive. I know it's not trying to call kill(0, 15), pid 0 indicating the current process group, because I modified kill() to segfault instead in that case and it doesn't. No thread in the process is caught in the act of suicide (and I'd think it would be, unless it's tkill()ing a different thread). In at least one case there was a thread in ContentParent::CreateBrowserOrApp → ContentParent::Init → nsObserverService::AddObserver when the signal was delivered, but that's not the thread gdb switched to (if that even means anything).
I added a printk to the send_signal routine in the kernel. Here's what happens (the "group" parameter is whether the signal is for the thread group == process, rather than the specific thread): kill -15: 45.45 => 262.262 (group=1) kill -15: 262.262 => 45.94 (group=0) kill -15: 45.94 => 45.94 (group=0) kill -17: 45.108 => 1.1 (group=1) The parent's main thread sends SIGTERM to the child process, which responds by sending SIGTERM to the parent's I/O thread. The parent's SIGTERM handler then re-raises the signal, and finally the process dies and init gets SIGCHLD. What's really happening here is that the signal is delivered before the child has called exec. So it gets SIGTERM, and runs the same signal handler (in nsProfileLock.cpp, for reference), and calls raise(). Specifically, it calls the raise() in mozglue, which is pthread_kill(pthread_self(), sig), and those pthread routines use a copy of the tid stored in TLS, which hasn't been updated and still has the tid of the thread that called fork() in the parent. (This, incidentally, may be a bug in Bionic: pthread_kill and pthread_self are async signal safe, so they should work correctly in the forked child of a multithreaded process, and this appears to not be the case.) We have that shim because, according to bug 741272, Bionic's raise() sometimes signaled the wrong thread. There's a commit upstream, which seems to present in ICS and up, which looks like a fix: https://android.googlesource.com/platform/bionic/+/56faf66fd7a90 — but I think it's still not right, because it's calling kill rather than tkill.
(In reply to Jed Davis [:jld] from comment #4) > there seems to be some confusion between the > waitpid in base::KillProcess and whatever we're doing to child processes. That's actually from DidProcessCrash in process_util_posix.cc, which we're almost certainly misusing in IsProcessDead in process_watcher_posix_sigchld.cc. We call it after something has already waited on the child, and it indicates that the process hasn't died. Then, 3000 ms later, the delayed call to ContentParent::ShutDownProcess happens, and calls IsProcessDead *again*, and sees that the process apparently isn't dead, and sends SIGKILL to the process that stopped existing 3 seconds ago. The comment in DidProcessCrash suggests that, in the Nuwa case, we can wind up trying to waitpid on a grandchild process, which we shouldn't be trying to do in the first place, but in that case we get ECHILD on a process that *does* still exist (maybe). So this is complicated.
And! The homescreen app seems to get confused when it tries to start an app that doesn't actually start, because then it stops responding to clicks on any icon in the same page (or the strip at the bottom, if it was one of those) as the failed app. Locking/unlocking the phone resets it; switching to and from a running app might also.
Comment on attachment 829570 [details] [diff] [review] bug936169-sandbox-send-unused-hg1.diff Review of attachment 829570 [details] [diff] [review]: ----------------------------------------------------------------- The patch itself looks fine. Let's file followups for anything that looks broken?
Attachment #829570 - Flags: review?(bent.mozilla) → review+
→ mozilla-inbound; it's desktop that needs this.
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla28
I've created bug 943170, bug 943174, and bug 943181, for comments 5 through 7 respectively. I didn't want to mark them as blocking this bug because it's already resolved. I considered making a tracker for them (and potential future bugs with sudden/unexpected app exit?), but I wasn't sure if that made sense, so I haven't yet.
You need to log in before you can comment on or make changes to this bug.