Closed Bug 943170 Opened 11 years ago Closed 11 years ago

Fix Android mozglue for raise() to avoid Android pthread bug.

Categories

(Core :: mozglue, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla28

People

(Reporter: jld, Assigned: jld)

References

Details

Attachments

(1 file)

Copying bug 936169 comment 5:


I added a printk to the send_signal routine in the kernel.  Here's what happens (the "group" parameter is whether the signal is for the thread group == process, rather than the specific thread):

kill -15: 45.45 => 262.262 (group=1)
kill -15: 262.262 => 45.94 (group=0)
kill -15: 45.94 => 45.94 (group=0)
kill -17: 45.108 => 1.1 (group=1)

The parent's main thread sends SIGTERM to the child process, which responds by sending SIGTERM to the parent's I/O thread.  The parent's SIGTERM handler then re-raises the signal, and finally the process dies and init gets SIGCHLD.

What's really happening here is that the signal is delivered before the child has called exec.  So it gets SIGTERM, and runs the same signal handler (in nsProfileLock.cpp, for reference), and calls raise().  Specifically, it calls the raise() in mozglue, which is pthread_kill(pthread_self(), sig), and those pthread routines use a copy of the tid stored in TLS, which hasn't been updated and still has the tid of the thread that called fork() in the parent.

(This, incidentally, may be a bug in Bionic: pthread_kill and pthread_self are async signal safe, so they should work correctly in the forked child of a multithreaded process, and this appears to not be the case.)

We have that shim because, according to bug 741272, Bionic's raise() sometimes signaled the wrong thread.  There's a commit upstream, which seems to present in ICS and up, which looks like a fix: https://android.googlesource.com/platform/bionic/+/56faf66fd7a90 ­— but I think it's still not right, because it's calling kill rather than tkill.
I think what we want to do is `tgkill(getpid(), gettid(), sig)`.  This is what breakpad does to try to re-raise a signal in cases where returning from the signal handler won't work, and it would be possible to allow it under seccomp sandboxing (without allowing the process to send signals outside itself) by restricting the value of the first argument.
Observed under gdb that this successfully raises the intended signal.

Trying, to make sure this builds on older Android: https://tbpl.mozilla.org/?tree=Try&rev=e39d2509d853
Comment on attachment 8339065 [details] [diff] [review]
bug943170-mozglue-raise-nopthread-hg0.diff

Review of attachment 8339065 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozglue/build/BionicGlue.cpp
@@ +134,5 @@
> +  // return "successfully" from raising a fatal signal).
> +  //
> +  // Bug 943170: POSIX specifies pthread_kill(pthread_self(), sig) as
> +  // equivalent to raise(sig), but Bionic also has a bug with these
> +  // functions, where a forked child will kill its parent instead.

What a pile of junk.
Attachment #8339065 - Flags: review?(mh+mozilla) → review+
→ b2g-inbound.
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/02bb0a9faed6
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Whiteboard: [fixed-in-fx-team]
Target Milestone: --- → mozilla28
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: