seccomp sandboxing doesn't affect non-main threads already started


Seccomp state is per-thread, not per-process: it's copied by clone() but not shared, so our prctl() doesn't affect other threads that already exist by the time we call it.
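
A minimal sketch of that behavior in C, not the code from our sandbox: one thread is started before the filter is installed and one after, and each then tries a filtered syscall. The names (install_filter, probe_thread, seccomp_thread_demo) are made up for illustration, and getppid is used as the filtered syscall only because it can't otherwise fail. The filter skips the architecture check that a real one needs.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: install a filter on the CALLING THREAD that makes
 * getppid() fail with EPERM and allows everything else.
 * A production filter must also validate seccomp_data.arch. */
static int install_filter(void)
{
    struct sock_filter insns[] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_getppid, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(insns) / sizeof(insns[0])),
        .filter = insns,
    };
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

struct probe {
    sem_t *gate;   /* if non-NULL, wait here before probing */
    long result;   /* raw return value of the getppid syscall */
};

static void *probe_thread(void *arg)
{
    struct probe *p = arg;
    if (p->gate) {
        while (sem_wait(p->gate) != 0 && errno == EINTR)
            ;  /* retry if interrupted by a signal */
    }
    p->result = syscall(SYS_getppid);  /* -1 only if the filter applies */
    return NULL;
}

/* Returns 0 on success. *old_result comes from a thread created before
 * the filter was installed, *new_result from one created afterwards. */
int seccomp_thread_demo(long *old_result, long *new_result)
{
    sem_t gate;
    struct probe old_probe = { &gate, 0 }, new_probe = { NULL, 0 };
    pthread_t old_thread, new_thread;

    if (sem_init(&gate, 0, 0) != 0)
        return -1;
    if (pthread_create(&old_thread, NULL, probe_thread, &old_probe) != 0)
        return -1;

    int rc = install_filter();   /* affects only the calling thread */
    sem_post(&gate);             /* release the old thread even on failure */
    pthread_join(old_thread, NULL);
    if (rc != 0)
        return -1;

    /* A thread cloned now inherits a copy of the caller's filter. */
    if (pthread_create(&new_thread, NULL, probe_thread, &new_probe) != 0)
        return -1;
    pthread_join(new_thread, NULL);
    sem_destroy(&gate);

    *old_result = old_probe.result;
    *new_result = new_probe.result;
    return 0;
}
```

Built with -pthread and called from main(), this should leave the parent pid in *old_result (the pre-existing thread is unfiltered) and -1 in *new_result (the later thread inherited the filter).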

If Nuwa is enabled, there's a solution of sorts: the child process returns from fork with a single thread and has to create threads that setcontext themselves into duplicates of the parent's non-main threads, which gives us a window where a single prctl can affect everything.

For non-Nuwa (or for Nuwa if that doesn't work; e.g., if it means we'd have to add something dangerous to the whitelist) it's harder.  We could find a signal that isn't already being used by the profiler or about:memory or something else and have its handler call prctl() on each thread, but we'd also need to iterate threads in a way that can't race with thread creation.  Nuwa managed to accomplish this, so we might be able to do whatever it does.
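
A rough sketch of the signal idea in C, under stated assumptions: it lists /proc/self/task, sends each other thread a signal with tgkill, and runs a callback in the handler (for us that callback would be the prctl call). broadcast_to_threads, demo_action and demo_hits are hypothetical names, and the choice of a free signal is left to the caller. It does NOT solve the race with thread creation described above, and it gives up after about a second if a thread has the signal blocked or exits before acknowledging.

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <errno.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

static atomic_int acks;                 /* threads that ran the handler */
static void (*per_thread_action)(void); /* set before any signal is sent */

static void on_broadcast(int sig)
{
    int saved_errno = errno;            /* handlers must preserve errno */
    (void)sig;
    per_thread_action();
    atomic_fetch_add(&acks, 1);
    errno = saved_errno;
}

/* Placeholder action for illustration; a sandbox would call prctl() here. */
static atomic_int demo_hits;
static void demo_action(void)
{
    atomic_fetch_add(&demo_hits, 1);
}

/* Runs action() once on every thread visible in /proc/self/task.
 * Returns the number of threads covered, or -1 on error or timeout.
 * Threads created while this runs can be missed. */
int broadcast_to_threads(int sig, void (*action)(void))
{
    struct sigaction sa = {0}, old_sa;
    sa.sa_handler = on_broadcast;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;

    per_thread_action = action;
    atomic_store(&acks, 0);
    if (sigaction(sig, &sa, &old_sa) != 0)
        return -1;

    DIR *dir = opendir("/proc/self/task");
    if (dir == NULL) {
        sigaction(sig, &old_sa, NULL);
        return -1;
    }

    pid_t self = (pid_t)syscall(SYS_gettid);
    int sent = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        pid_t tid = (pid_t)atoi(entry->d_name);
        if (tid <= 0)
            continue;                   /* skips "." and ".." */
        if (tid == self) {
            action();                   /* handle the calling thread directly */
            continue;
        }
        if (syscall(SYS_tgkill, getpid(), tid, sig) == 0)
            sent++;
    }
    closedir(dir);

    /* Wait up to roughly one second for every signalled thread. */
    for (int i = 0; i < 1000 && atomic_load(&acks) < sent; i++)
        usleep(1000);

    int done = atomic_load(&acks);
    sigaction(sig, &old_sa, NULL);
    return done < sent ? -1 : sent + 1;
}
```

For example, broadcast_to_threads(SIGUSR2, demo_action) in a process with four threads should return 4 and leave demo_hits at 4, assuming SIGUSR2 is unused and unblocked everywhere.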

Proof of concept (syscall 14 is mknod):

(gdb) thread 2
[Switching to thread 2 (Thread 1243.1244)]#0  epoll_wait () at bionic/libc/arch-arm/syscalls/epoll_wait.S:10
10          ldmfd   sp!, {r4, r7}
(gdb) call syscall(14)
$1 = -1
(gdb) thread 12
[Switching to thread 12 (Thread 1243.1416)]#0  __futex_syscall3 () at bionic/libc/arch-arm/bionic/atomics_arm.S:182
182         swi     #0
(gdb) call syscall(14)

Program received signal SIGSYS, Bad system call.
syscall () at bionic/libc/arch-arm/bionic/syscall.S:50
50          ldmfd   sp!, {r4, r5, r6, r7}
The program being debugged was signaled while in a function called from GDB.

Fixed via bug 970676.
