Closed Bug 970676 Opened 10 years ago Closed 10 years ago

bionic doesn't setuid() on threads

Categories

(Firefox OS Graveyard :: General, defect)

x86
macOS
defect
Not set
normal

Tracking

(blocking-b2g:1.3+, b2g-v1.3 fixed, b2g-v1.3T fixed, b2g-v1.4 fixed)

RESOLVED FIXED
1.4 S2 (28feb)
blocking-b2g 1.3+
Tracking Status
b2g-v1.3 --- fixed
b2g-v1.3T --- fixed
b2g-v1.4 --- fixed

People

(Reporter: pauljt, Assigned: jld)

References

Details

(Keywords: sec-high)

Attachments

(3 files, 10 obsolete files)

393 bytes, patch
Details | Diff | Splinter Review
17.12 KB, patch
jld
: review+
dveditz
: sec-approval+
Details | Diff | Splinter Review
16.09 KB, patch
jld
: review+
Details | Diff | Splinter Review
From looking at bug 969040, I believe jld has discovered that bionic does not setuid correctly on threads. The implication of this is that threads inside child processes run as root instead of the child process. Jld - do I have this correct and can you provide more information please?
Flags: needinfo?(jld)
note this only happens for preallocated processes

So for example, this doesn't happen for homescreen, but it happens for apps you start afterwards.

Also, ps doesn't report this correctly.

Here's an example for clock (tasks=threads, 2072):

root@android:/ # cat /proc/2072/task/*/status|grep Uid                         
Uid:    12072   12072   12072   12072
Uid:    0       0       0       0
Uid:    0       0       0       0
Uid:    0       0       0       0
Uid:    0       0       0       0
Uid:    0       0       0       0
Uid:    0       0       0       0
Uid:    0       0       0       0
Uid:    0       0       0       0
Uid:    0       0       0       0
Uid:    0       0       0       0
Uid:    0       0       0       0
Uid:    0       0       0       0
Uid:    12072   12072   12072   12072
Uid:    12072   12072   12072   12072
Uid:    12072   12072   12072   12072

Now homescreen (388):

root@android:/ # cat /proc/388/task/*/status|grep Uid                          
Uid:    10388   10388   10388   10388
Uid:    10388   10388   10388   10388
Uid:    10388   10388   10388   10388
Uid:    10388   10388   10388   10388
Uid:    10388   10388   10388   10388
Uid:    10388   10388   10388   10388
Uid:    10388   10388   10388   10388
Uid:    10388   10388   10388   10388
Uid:    10388   10388   10388   10388
Uid:    10388   10388   10388   10388
Uid:    10388   10388   10388   10388
Uid:    10388   10388   10388   10388
Uid:    10388   10388   10388   10388
Uid:    10388   10388   10388   10388

Also this is weird, since homescreen is not preallocated:
app_388   401   388   79872  31824 00000000 400d05f8 S (Preallocated a
CC'ing some Nuwa people in case it is related.
note that, :jld also found out that ps on b2g doesn't correctly report the thread uid. It just takes the process uid and displays it as the thread uid. This is probably why this wasn't previously caught.
[I was in the middle of filing a bug; I'll copy the text I was writing here instead.]

Bionic (Android libc) has an implementation of setuid() (and setgid() and so on) that simply invokes the corresponding Linux system call — which affects only the calling thread and any child threads it later creates, not the entire thread group (process).

The result is that, in any preallocated (or, for 1.4 and up, Nuwa-created) process on b2g, an attacker with control over the main thread of a content process can escalate to root by taking control of one of these privileged threads.

For what little it's worth, this appeas to be a bug in Bionic — the POSIX spec calls for the entire process's permissions to be changed.

glibc's solution to this problem is… complicated.  It uses signals (sort of like an inter-processor interrupt) to make each other thread call setuid (or setgid or whatever), the same way I was trying to do with seccomp — and to make that work, it #define's SIGRTMIN to a function call (!) so that it can steal signals from the bottom of the SIGRTx range without interfering with other code.

As for the Android toolbox ps, it gets the USER value by doing stat() on the /proc directory for the pid rather than the tid (if applicable); this has a one-line fix, thusly:

USER     PID   PPID  VSIZE  RSS     WCHAN    PC         NAME
app_512   512   358   64780  23360 ffffffff 400b85e0 S /system/b2g/plugin-container
root      513   512   64780  23360 00000000 400b85e0 S Chrome_ChildThr
root      514   512   64780  23360 00000000 400b87ec S JS GC Helper
root      515   512   64780  23360 00000000 ffff0520 S Socket Thread
root      516   512   64780  23360 00000000 400b87ec S BgHangManager
root      517   512   64780  23360 00000000 400b87ec S (Nuwa)
root      518   512   64780  23360 00000000 400b87ec S ImageBridgeChil
root      519   512   64780  23360 00000000 400b78b0 S Binder Thread #
root      520   512   64780  23360 00000000 400b78b0 S Binder Thread #
root      530   512   64780  23360 00000000 400b87ec S Timer
app_512   554   512   64780  23360 00000000 400b87ec S Network Seer
app_512   556   512   64780  23360 00000000 400b87ec S HTML5 Parser
app_512   559   512   64780  23360 00000000 400b87ec S Analysis Helper
app_512   560   512   64780  23360 00000000 400b87ec S Analysis Helper
app_512   561   512   64780  23360 00000000 400b87ec S Built-in Keyboa
app_512   562   512   64780  23360 00000000 400b87ec S Media State
Flags: needinfo?(jld)
This sounds like a blocker to me (I think sec-high is appropriate), but not sure what we can block security wise at this point. I've nominated 1.3, but triage please correct to 1.4 if that is more appropriate. Note this likely affects all production devices - I'm flashing a buri to test 1.1 at the moment.
blocking-b2g: --- → 1.3?
Whiteboard: sec-high
This is a patch to system/core/ to show the per-thread uid when doing `ps -t`; it might be useful for people investigating.

Note that b2g-ps runs /system/bin/toolbox directly, not via the /system/bin/ps link, if you want to update the toolbox on a device you can't (or choose not to) fully flash.
Keywords: sec-high
Whiteboard: sec-high
blocking-b2g: 1.3? → 1.3+
Confirmed this on 1.1 Venezuela build:

pc:hamachi-feb10 ptheriault$ adb shell b2g-ps
APPLICATION      USER     PID   PPID  VSIZE  RSS     WCHAN    PC         NAME
b2g              root      140   1     226152 64540 ffffffff 00000000 R /system/b2g/b2g
Homescreen       app_450   450   140   74156  26824 ffffffff 00000000 S /system/b2g/plugin-container
Usage            app_499   499   140   68552  23972 ffffffff 00000000 S /system/b2g/plugin-container
Communications   app_516   516   140   70020  24408 ffffffff 00000000 S /system/b2g/plugin-container
Settings         app_530   530   140   75244  29508 ffffffff 00000000 S /system/b2g/plugin-container
Gallery          app_564   564   140   104744 28040 ffffffff 00000000 R /system/b2g/plugin-container
plugin-containe  root      640   140   30868  2480  ffffffff 00000000 R /system/b2g/plugin-container

pc:hamachi-feb10 ptheriault$ adb shell cat /proc/564/task/*/status|grep Uid
Uid:	10564	10564	10564	10564
Uid:	0	0	0	0
Uid:	0	0	0	0
Uid:	0	0	0	0
Uid:	0	0	0	0
Uid:	0	0	0	0
Uid:	0	0	0	0
Uid:	0	0	0	0
Uid:	0	0	0	0
Uid:	0	0	0	0
Uid:	0	0	0	0
Uid:	0	0	0	0
Uid:	10564	10564	10564	10564
Attached patch bug970676-threadsec-wip0.diff (obsolete) — Splinter Review
I took the code I'd written for bug 969040 and transplanted it into ContentChild, to apply it to the IPC uid/gid code.  There's a lot of FIXME and printf_stderr at the moment and I don't know if this belongs where it does (and the whitelist change should be looked at more carefully).  But it does seem to work:

APPLICATION    SEC USER     PID   PPID  VSIZE  RSS     WCHAN    PC         NAME
Homescreen       2 app_137   137   123   75936  29716 ffffffff 4005b3c0 S /system/b2g/plugin-container
Chrome_ChildThr  2 app_137   138   137   75936  29716 c00bb45c 4005b3c0 S Chrome_ChildThr
JS GC Helper     2 app_137   139   137   75936  29716 c005deb8 4005b5cc S JS GC Helper
Socket Thread    2 app_137   140   137   75936  29716 c009e6d4 4005b438 S Socket Thread
BgHangManager    2 app_137   141   137   75936  29716 c005deb8 4005b5cc S BgHangManager
(Nuwa)           2 app_137   142   137   75936  29716 c005deb8 4005b5cc S (Nuwa)
ImageBridgeChil  2 app_137   143   137   75936  29716 c005deb8 4005b5cc S ImageBridgeChil
Binder Thread #  2 app_137   144   137   75936  29716 c01ad2ac 4005a690 S Binder Thread #
Binder Thread #  2 app_137   145   137   75936  29716 c01ad2ac 4005a690 S Binder Thread #
Timer            2 app_137   153   137   75936  29716 c005dfbc 4005b5cc S Timer
Network Seer     2 app_137   157   137   75936  29716 c005deb8 4005b5cc S Network Seer
HTML5 Parser     2 app_137   159   137   75936  29716 c005deb8 4005b5cc S HTML5 Parser
Assignee: nobody → jld
Attachment #8373826 - Flags: feedback?
Comment on attachment 8373826 [details] [diff] [review]
bug970676-threadsec-wip0.diff

Review of attachment 8373826 [details] [diff] [review]:
-----------------------------------------------------------------

awesome. going through proc is indeed straightforward and avoids dealing with signaling madness.

while i think we should fix this differently for bionic/setuid as well and provide an upstream patch, we should probably go with this meanwhile.

We do need to have proper error handling tho.
Attachment #8373826 - Flags: feedback? → feedback+
Attached patch bug970676-threadsec-hg0.diff (obsolete) — Splinter Review
Cleaned up the FIXMEs, did some general cleanup, tried to add some comments, and backed out the msgget removal (which should be a separate bug).
Attachment #8373826 - Attachment is obsolete: true
Attached patch bug970676-threadsec-hg1.diff (obsolete) — Splinter Review
…and now without breakage for the i386 sandbox build.  (recvmsg != recvmmsg.)
Attachment #8374525 - Attachment is obsolete: true
So currently, I think that only reason we even launch the content process with root is so that we can assign specific group privs
http://dxr.mozilla.org/mozilla-central/source/ipc/chromium/src/base/process_util_linux.cc#317

It seems to me that we could flip the logic around. Rather than launching the content process with root just so that it can set the group, if needed, we should launch the content process without root, but assign all of the needed groups.

And then have the process drop the group privs that aren't needed.

This would reduce the amount of escalation that could be accomplished.
(In reply to Dave Hylands [:dhylands] from comment #12)
> It seems to me that we could flip the logic around. Rather than launching
> the content process with root just so that it can set the group, if needed,
> we should launch the content process without root, but assign all of the
> needed groups.
> 
> And then have the process drop the group privs that aren't needed.

Apparently setgroups can't do that.  Even the same group list the process already has is rejected:

getgroups(7, [0, 4, 6, 27, 50, 106, 180]) = 7
setgroups(7, [0, 4, 6, 27, 50, 106, 180]) = -1 EPERM (Operation not permitted)

The man page mentions CAP_SETGID, but that seems to be permission to set *any* gid, and that seems dangerous (and dropping it with capset would also be per-thread, I think?).
Sigh.

So I think we should figure out exactly which threads need the elevated priviledge for the camera/video stuff.

If the threads needing the privs are fairly well isolated, then we could drop the privs for any other thread at fork time (we already intercept the fork call in mozglue in order to fix a bug in bionic's atfork). Then we'd have alot less threads to worry about (and dropping privs at fork time seems safer to me).
(In reply to Dave Hylands [:dhylands] from comment #14)
> So I think we should figure out exactly which threads need the elevated
> priviledge for the camera/video stuff.

I was talking about this with mwu, and apparently camera access is in mediaserver now, but SD card access is an issue: http://dxr.mozilla.org/mozilla-central/source/ipc/chromium/src/base/process_util_linux.cc#305

...but is the content process really opening SD card files directly?  Isn't DeviceStorage remoted over IPC?

One thing to keep in mind is that, whatever we do, it needs to work on 1.3 (and maybe earlier branches, but I'm not sure about that).
So, the Gallery app can access files regardless of group membership (see above re IPC), but the Camera app needs to be in group 1015 or else it can't access the camera, and it also eventually decides that the SD card is full.  Communication with the mediaserver seems to be via the Android Binder, which I know little about.  I don't yet know what the actual check is and whether we could easily use something besides group membership.
On the nexus 4, no app  had access to the sdcard unless they were root. All of the files were already remoted, except opening a video file, which used to happen in the content app. Bug 910498 landed on m-c Jan 21. It looks like that bug didn't land on 1.3 (which would have meant opening video files would have been remoted as well). So it looks like we still needs to be able to write to sdcard.
Well, if it helps enough, we could ask for uplift on bug 910498.
So I guess we can test on master and remove the calls to setgroups and see if camera still works. If we can remove the call to setgroups, then we should be able to lower root priviledge after the initial fork and before the exec of plugin container.
So I found another syscall that needs whitelisting I think: epoll_ctl 

See bug 971635 for details of error.
Adding epoll_ctl to seccomp_filter.h fixed the camera bug in 971635. Attached is an attempt an updated patch. Check it well since its the first one I've made :)
(In reply to Dave Hylands [:dhylands] from comment #19)
> So I guess we can test on master and remove the calls to setgroups and see
> if camera still works.

The camera does *not* work if I remove the setgroups, on m-c, on keon.  I haven't tried other devices.
Attached patch bug970676-threadsec-hg2.diff (obsolete) — Splinter Review
Squash.
Attachment #8374531 - Attachment is obsolete: true
Attachment #8374762 - Attachment is obsolete: true
Comment on attachment 8374829 [details] [diff] [review]
bug970676-threadsec-hg2.diff

Assuming that we don't want to try to change the underlying security model (and backport that to at least 1.3) to avoid the multithreaded setuid, how do we feel about this patch?
Attachment #8374829 - Flags: review?(dhylands)
Attached patch bug970676-threadsec-hg3.diff (obsolete) — Splinter Review
Rebase; conflict with bug 971370.
Attachment #8374829 - Attachment is obsolete: true
Attachment #8374829 - Flags: review?(dhylands)
Attachment #8376001 - Flags: review?(dhylands)
Comment on attachment 8376001 [details] [diff] [review]
bug970676-threadsec-hg3.diff

Review of attachment 8376001 [details] [diff] [review]:
-----------------------------------------------------------------

We talked about this offline, let's keep the security code in Sandbox.cpp and just fix the build system to always build this function regardless of the MOZ_CONTENT_SANDBOX build flag. Spreading our security stuff throughout the tree makes me nervous.
Attachment #8376001 - Flags: feedback-
Comment on attachment 8376001 [details] [diff] [review]
bug970676-threadsec-hg3.diff

Review of attachment 8376001 [details] [diff] [review]:
-----------------------------------------------------------------

This has the right flavor, I just think we need to close up the holes.

r- since I think I should look at it again after its fixed.

::: dom/ipc/ContentChild.cpp
@@ +596,5 @@
>  }
>  
> +// See bug 970676 and bug 969040 for why we need all of this.
> +static void
> +SetThreadPrivileges(ChildPrivileges aPrivs, bool isMainThread)

Why do we need to pass isMainThread? Can't this code just figure it out?

i.e. getpid() == gettid() implies isMainThread or just use NS_IsMainThread()

@@ +678,5 @@
> +                continue;
> +            }
> +            char *endptr;
> +            tid = strtol(de->d_name, &endptr, 10);
> +            if (tid <= 0 || endptr == de->d_name) {

endptr will never be == de->d_name since you checked that de->d_name[0] is a digit.

You should instead check that *endptr == '\0' so that somthing like 123abc won't be considered to be a tid.

We also shouldn't crash. We should just ignore things that don't look like a tid. At the very least whatever we do here should be consistent with what we do if !isdigit(de->d_name[0]).

I think you can simply this whole thing (from line 677 thru 686) to just be:

tid = strtol(de->d_name, &endptr, 10);
if (tid <= 0 || *endptr != '\0') {
  continue;
}

@@ +691,5 @@
> +            }
> +            if (syscall(__NR_tgkill, pid, tid, sSetPrivilegesSignum) != 0) {
> +                printf_stderr("SetProcessPrivileges: tgkill(%d,%d): %s\n",
> +                              pid, tid, strerror(errno));
> +                MOZ_CRASH();

This seems bad. If that thread goes away between the time that you call readdir and the time that you call tgkill then the tgkill call will fail with errno == ESRCH. That's an expected result. We shouldn't crash. We should just keep going.

@@ +693,5 @@
> +                printf_stderr("SetProcessPrivileges: tgkill(%d,%d): %s\n",
> +                              pid, tid, strerror(errno));
> +                MOZ_CRASH();
> +            }
> +            unused << sem_wait(&sSetPrivilegesDone);

This also seems like it could deadlock. If we signal the child process and it dies or goes away before it can signal the semaphore, then we'll wind up waiting forever here.

We should at the very least use sem_timedwait, and check to see if the thread we're waiting for is still running. I think you can use waitid with WNOHANG and WNOWAIT to determine if a child thread is still running.

This code also needs to deal with EINTR return code from sem_wait.

::: security/sandbox/linux/seccomp_filter.h
@@ +22,5 @@
>    ALLOW_SYSCALL(mmap2),
>  #elif defined(__x86_64__)
> +#define SECCOMP_WHITELIST_ARCH_HIGH \
> +  ALLOW_SYSCALL(recvmsg), \
> +  ALLOW_SYSCALL(sendmsg),

Why would these be needed on 64-bit and not on 32-bit?

Shouldn't these be enabled for all architectures (esp since Geekphone will have an x86 phone running FxOS)
Attachment #8376001 - Flags: review?(dhylands) → review-
(In reply to Dave Hylands [:dhylands] from comment #27)
> ::: dom/ipc/ContentChild.cpp
> @@ +596,5 @@
> >  }
> >  
> > +// See bug 970676 and bug 969040 for why we need all of this.
> > +static void
> > +SetThreadPrivileges(ChildPrivileges aPrivs, bool isMainThread)
> 
> Why do we need to pass isMainThread? Can't this code just figure it out?
> 
> i.e. getpid() == gettid() implies isMainThread or just use NS_IsMainThread()

It could, but the caller knows whether it's on the main thread or not, so it seemed simpler to pass it in.  (Also, for NS_IsMainThread in particular, I wasn't sure if it was safe from non-NSPR threads.)

> We should at the very least use sem_timedwait, and check to see if the
> thread we're waiting for is still running. I think you can use waitid with
> WNOHANG and WNOWAIT to determine if a child thread is still running.

tgkill with signal 0 might also work.
 
> ::: security/sandbox/linux/seccomp_filter.h
> @@ +22,5 @@
> >    ALLOW_SYSCALL(mmap2),
> >  #elif defined(__x86_64__)
> > +#define SECCOMP_WHITELIST_ARCH_HIGH \
> > +  ALLOW_SYSCALL(recvmsg), \
> > +  ALLOW_SYSCALL(sendmsg),
> 
> Why would these be needed on 64-bit and not on 32-bit?
> 
> Shouldn't these be enabled for all architectures (esp since Geekphone will
> have an x86 phone running FxOS)

It's already allowed on i386 — there's an "ipc" syscall that multiplexes all the socket operations (also SysV message queues and semaphores, I think?) instead of separate syscalls the way the other architectures have.  (Yes, this is overly permissive; there's a bug for it.)
It occurs to me that if a thread we've already sandboxed/setuid'ed forks while we're signaling threads, we'll try to re-sandbox/setuid its child, which will fail (the setuid/setgid would succeed if not sandboxed, but setgroups won't).

The original version of the patch, which was just for sandboxing, used prctl(PR_GET_SANDBOX).  The corresponding check for permissions might be something like (privs != PRIVILEGES_INHERIT && geteuid() == 0), maybe?  And we'd need one or the other (or both?) depending on which combination of {bionic, seccomp} is involved.
We also need to consider the updater. Which is fork/exec'd from b2g, but which needs to stay as root.
(In reply to Dave Hylands [:dhylands] from comment #30)
> We also need to consider the updater. Which is fork/exec'd from b2g, but
> which needs to stay as root.

If it isn't a content process (plugin-container, PContent IPC protocol, etc.), or if it's a content process that hasn't been sent a SetProcessPrivileges message, then the patch for this bug shouldn't affect it.
This lets us deal with our Linux security sandboxing issues (uid/group and seccomp) in one place in security/sandbox/linux.  Broken out from the main patch for, I hope, easier review.
Attachment #8379449 - Flags: review?(bent.mozilla)
Attachment #8379449 - Flags: feedback?(gdestuynder)
…and this is the new version of the thread-iteration patch.

I don't know if it needs to happen in this bug, but I'd feel more comfortable with the edge-case code if I had a "stress option" to try to provoke it — maybe start a thread that continuously creates a new copy of itself and exits (being careful to make sure it's not a fork bomb, etc.).

Also, I might need to adjust some includes/ifdefs in this patch series to not make the seccomp desktop build more broken than it already is.
Attachment #8376001 - Attachment is obsolete: true
Attachment #8379490 - Flags: review?(gdestuynder)
Attachment #8379490 - Flags: review?(dhylands)
Attachment #8379449 - Flags: feedback?(gdestuynder) → feedback+
Attachment #8379490 - Flags: review?(gdestuynder) → review+
Comment on attachment 8379490 [details] [diff] [review]
bug970676-p1-sandbox-threads-hg0.diff

Review of attachment 8379490 [details] [diff] [review]:
-----------------------------------------------------------------

So I thinkwe just need a couple more cleanups. Sorry about my mis-guidance on using sem_timedwait from the previous review.

::: security/sandbox/linux/Sandbox.cpp
@@ +295,5 @@
> +    LOG_ERROR("opendir /proc/self/task: %s\n", strerror(errno));
> +    MOZ_CRASH();
> +  }
> +  unused << /* Can't fail with these parameters. */
> +    sem_init(&sSetSandboxDone, /* shared = */ 0, /* value = */ 0);

I think that we need to reinitialize the semaphore each iteration through the loop.

Otherwise there are probably weird races where the child exits prematurely, and we don't know if it happened before or after it posted to the semaphore.

So if a child dies, then the semaphore state is unknown.

@@ +334,5 @@
> +      // after receiving the signal but before entering the signal
> +      // handler, we need to avoid blocking forever.
> +      do {
> +        struct timespec timeLimit;
> +        clock_gettime(CLOCK_REALTIME, &timeLimit);

Using CLOCK_REALTIME raises a red flag for me.

I now realize that POSIX mandates the use of CLOCK_REALTIME for this API, but I'm pretty sure that if the time changes between our call to clock_gettime and sem_timedwait's call to clock_gettime then we can wind up waiting for arbitrarily long amounts of time.

I think we need to use something like: pthread_cond_timedwait_monotonic_np (bionic seems to implement both pthread_cond_timedwait_monotonic_np and pthread_cond_timedwait_monotonic) instead of using a semaphore.
(sorry about suggesting sem_timedwait)

@@ +342,5 @@
> +          break;
> +        }
> +        // Note: it could be EINTR.  Check the thread's existence anyway.
> +      } while (syscall(__NR_tgkill, pid, tid, 0) == 0);
> +      if (errno == ESRCH) {

I think the logic around errno might work, but it seems very odd to me.

I would remove the if check around the call to sem_timedwait and just blindly set errno to 0 before calling tgkill.
Attachment #8379490 - Flags: review?(dhylands) → feedback+
I just added a use of pthread_cond_timedwait_monotonic_np in bug 974054.
Attachment #8379449 - Flags: review?(bent.mozilla) → review+
(In reply to Dave Hylands [:dhylands] from comment #34)
> I think we need to use something like: pthread_cond_timedwait_monotonic_np
> (bionic seems to implement both pthread_cond_timedwait_monotonic_np and
> pthread_cond_timedwait_monotonic) instead of using a semaphore.
> (sorry about suggesting sem_timedwait)

pthread_cond_{signal,broadcast} aren't async signal safe; sem_post is.

The other way to avoid dealing with this kind of race is to intercept pthread_create and make it block while we're doing this, but do we have anything like libmozglue on desktop?
Another alternative: get rid of the semaphore entirely and loop on checking an atomic value (and probing the thread's existence), plus some combination of sched_yield and an exponential-backoff nanosleep.  That will catch an exiting thread, is unaffected by clock changes (this is in POSIX and the Linux nanosleep(2) man page specifically mentions it), and the part that takes place in the signal handler is async signal safe.  With a little tuning it should be possible to do this with reasonable overhead.
That sounds like a reasonable compromise.

Under normal situations, the child should be responding to the signal fairly quickly, all of the rest if just to deal wih edge cases that *shouldn't* happen, but which we still need to be prepared to deal with.

I don't even think you really need to do the exponential backoff. We should only be giving the thread a few seconds to respond to the signal, and we can limit that just by putting a limit on the number of times we'll loop.


What if you did this:
- Do a loop with something like a 100 msec delay, and have some max number of iterations that we'll loop, say 50 (which would imply about 5 seconds total).

- have the child send a signal back to the parent when its done. This will interrupt the sleep. The signal handler doesn't have to do anything, we're just using it to break the sleep.

- We need the loop since there are many things that could send a signal to the parent, and they would all wake up the sleep.

That keeps things responsive in normal situations and still gives us the timeout.
{n+1}th idea: futex() uses a relative timeout, and (despite the warnings in the man page) using it for a one-shot sleep/wakeup like this should be simple.

The limit on how long we wait makes me uneasy, because we need to do something if we run out and the thread still exists.  I don't like the idea of just skipping the thread, because it seems within the realm of possibility that an attacker could try to provoke this case (e.g., by consuming large amounts of CPU and maybe RAM, while launching an app via Web Activities).  Are we willing to accept killing the process in that case?
(In reply to Jed Davis [:jld] from comment #39)
> {n+1}th idea: futex() uses a relative timeout, and (despite the warnings in
> the man page) using it for a one-shot sleep/wakeup like this should be
> simple.
> 
> The limit on how long we wait makes me uneasy, because we need to do
> something if we run out and the thread still exists.  I don't like the idea
> of just skipping the thread, because it seems within the realm of
> possibility that an attacker could try to provoke this case (e.g., by
> consuming large amounts of CPU and maybe RAM, while launching an app via Web
> Activities).  Are we willing to accept killing the process in that case?

Seems reasonable to me (killing a thread that doesn't respond quickly enough).

Hopefully, this is just a stopgap.

In the long run, I'd like to "fix" this by downgrading the permissions when the parent forks and before it execs the plugin-container. Then all of this becomes moot.
Blocks: 969040
(In reply to Dave Hylands [:dhylands] from comment #40)
> In the long run, I'd like to "fix" this by downgrading the permissions when
> the parent forks and before it execs the plugin-container. Then all of this
> becomes moot.

Not quite — there's still bug 969040, the seccomp version of this problem.  In that case we might be able to move the sandbox start before the process becomes multithreaded — for comparison, Chromium's seccomp-bpf integration doesn't even try to handle sandboxing an already-multithreaded process.  But I expect it will be significantly harder than fixing the uid/group stuff.
(In reply to Dave Hylands [:dhylands] from comment #40)
> (In reply to Jed Davis [:jld] from comment #39)
> > The limit on how long we wait makes me uneasy, because we need to do
> > something if we run out and the thread still exists.  I don't like the idea
> > of just skipping the thread, because it seems within the realm of
> > possibility that an attacker could try to provoke this case (e.g., by
> > consuming large amounts of CPU and maybe RAM, while launching an app via Web
> > Activities).  Are we willing to accept killing the process in that case?
>
> Seems reasonable to me (killing a thread that doesn't respond quickly
> enough).

Sorry, I was unclear — I meant the POSIX process, not the Linux process.

I don't think it's possible to kill a thread by itself.  The thread can exit itself, or I can take down the entire thread group (with exit_group or an uncaught fatal signal), and I think those are the options.

Also, even if we could kill off a thread, there's no telling what that would do to whatever code was relying on its existence.  It might be better just to block forever.

Furthermore, there's another failure mode we may need to consider: some library creates a thread and blocks all signals on it (sigfillset, sigprocmask).  The tgkill will succeed, the thread will continue to exist indefinitely, and signal handler will never be called.
OK - killing the process also seems reasonable to me.
Redone with futex, and a 5 second timeout (per thread).  (Do I need to check for clock_gettime failure?  Is there a library routine I should be using for that instead?)
Attachment #8379490 - Attachment is obsolete: true
Attachment #8382550 - Flags: review?(dhylands)
…and let's try that with actually killing the process, instead of not.

My earlier comment that all the edge case stuff still hasn't been actively tested remains.
Attachment #8382550 - Attachment is obsolete: true
Attachment #8382550 - Flags: review?(dhylands)
Attachment #8382551 - Flags: review?(dhylands)
Comment on attachment 8382551 [details] [diff] [review]
bug970676-p1-sandbox-threads-hg3.diff

Review of attachment 8382551 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, although I'm not a fan of the way goto was used here.

::: security/sandbox/linux/Sandbox.cpp
@@ +237,5 @@
> +static base::ChildPrivileges sSetPrivilegesTo;
> +// The communication channel from the signal handler back to the main thread.
> +static mozilla::Atomic<int> sSetSandboxDone;
> +// about:memory has the first 3 RT signals.  (We should allocate
> +// signals centrally instead of hard-coding them like this.)

Lets file a followup bug about this.

@@ +333,5 @@
> +      }
> +      // Reset the futex cell and signal.
> +      sSetSandboxDone = 0;
> +      if (syscall(__NR_tgkill, pid, tid, sSetSandboxSignum) != 0) {
> +      sigfail:

nit: I think that goto labels shouldn't be indented. I couldn't find it until I did a search.

@@ +357,5 @@
> +      // If a thread doesn't respond within a reasonable amount of
> +      // time, but still exists, we crash -- the alternative is either
> +      // blocking forever or silently losing security, and it
> +      // shouldn't actually happen.
> +      static const int crashDelay = 5; // seconds

Since we're going to crash the main process if we hit this, we should maybe increase this to 10 seconds.

@@ +380,5 @@
> +          break;
> +        }
> +        // Has the thread ceased to exist?
> +        if (syscall(__NR_tgkill, pid, tid, 0) != 0) {
> +          goto sigfail;

These feels really cheesy to me (even though it is technically correct).

I'd think I would just set sandboxProgress = true and continue; or perhaps just put the whole send tgkill crahs logic into a subroutine.
Attachment #8382551 - Flags: review?(dhylands) → review+
Fixed comments; squashed; carrying over r+.
Attachment #8379449 - Attachment is obsolete: true
Attachment #8382551 - Attachment is obsolete: true
Attachment #8382678 - Flags: review+
Comment on attachment 8382678 [details] [diff] [review]
bug970676-threadsec-hg4.diff

sec-approval=dveditz
Attachment #8382678 - Flags: sec-approval+
…and here's the b2g28_v1_3 backport. The merge conflicts were minor enough to carry over r+, I think.

Note: effectively includes bug 948620 (allows disabling seccomp at runtime with an env var) because this bug changed that bug's code enough that it would be a bigger change to the patch to edit it out.  This should be completely irrelevant for 1.3 purposes, but I mention it for the sake of completeness.
Attachment #8383251 - Flags: review+
https://hg.mozilla.org/releases/mozilla-b2g28_v1_3/rev/8a81645151fe with a=1.3+
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Per the changes announced on dev-b2g, dev-gaia, et al last week, 1.3+ blocking status does *NOT* grant automatic approval for uplift to b2g28. You need to explicitly request approval on the patch. Please backout.
Flags: needinfo?(jld)
Attachment #8383251 - Flags: approval-mozilla-b2g28+
Jed rightfully points out that this is a bit of a grey area given our "sec-high/crit have auto-approval" policy. Will clarify with B2G RelMan for future occasions.
Flags: needinfo?(jld)
Target Milestone: --- → 1.4 S2 (28feb)
Group: b2g-core-security → core-security
Group: core-security
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: