Fix Android mozglue for raise() to avoid Android pthread bug.

RESOLVED FIXED in mozilla28

Status

()

Core
mozglue
RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: jld, Assigned: jld)

Tracking

Trunk
mozilla28
ARM
Gonk (Firefox OS)
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

(Assignee)

Description

4 years ago
Copying bug 936169 comment 5:


I added a printk to the send_signal routine in the kernel.  Here's what happens (the "group" parameter is whether the signal is for the thread group == process, rather than the specific thread):

kill -15: 45.45 => 262.262 (group=1)
kill -15: 262.262 => 45.94 (group=0)
kill -15: 45.94 => 45.94 (group=0)
kill -17: 45.108 => 1.1 (group=1)

The parent's main thread sends SIGTERM to the child process, which responds by sending SIGTERM to the parent's I/O thread.  The parent's SIGTERM handler then re-raises the signal, and finally the process dies and init gets SIGCHLD.

What's really happening here is that the signal is delivered before the child has called exec.  So it gets SIGTERM, and runs the same signal handler (in nsProfileLock.cpp, for reference), and calls raise().  Specifically, it calls the raise() in mozglue, which is pthread_kill(pthread_self(), sig), and those pthread routines use a copy of the tid stored in TLS, which hasn't been updated and still has the tid of the thread that called fork() in the parent.

(This, incidentally, may be a bug in Bionic: pthread_kill and pthread_self are async signal safe, so they should work correctly in the forked child of a multithreaded process, and this appears to not be the case.)

We have that shim because, according to bug 741272, Bionic's raise() sometimes signaled the wrong thread.  There's a commit upstream, which seems to present in ICS and up, which looks like a fix: https://android.googlesource.com/platform/bionic/+/56faf66fd7a90 ­— but I think it's still not right, because it's calling kill rather than tkill.
(Assignee)

Comment 1

4 years ago
I think what we want to do is `tgkill(getpid(), gettid(), sig)`.  This is what breakpad does to try to re-raise a signal in cases where returning from the signal handler won't work, and it would be possible to allow it under seccomp sandboxing (without allowing the process to send signals outside itself) by restricting the value of the first argument.
(Assignee)

Updated

4 years ago
Blocks: 943181
(Assignee)

Comment 2

4 years ago
Created attachment 8339065 [details] [diff] [review]
bug943170-mozglue-raise-nopthread-hg0.diff
Attachment #8339065 - Flags: review?(mh+mozilla)
(Assignee)

Comment 3

4 years ago
Observed under gdb that this successfully raises the intended signal.

Trying, to make sure this builds on older Android: https://tbpl.mozilla.org/?tree=Try&rev=e39d2509d853
Comment on attachment 8339065 [details] [diff] [review]
bug943170-mozglue-raise-nopthread-hg0.diff

Review of attachment 8339065 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozglue/build/BionicGlue.cpp
@@ +134,5 @@
> +  // return "successfully" from raising a fatal signal).
> +  //
> +  // Bug 943170: POSIX specifies pthread_kill(pthread_self(), sig) as
> +  // equivalent to raise(sig), but Bionic also has a bug with these
> +  // functions, where a forked child will kill its parent instead.

What a pile of junk.
Attachment #8339065 - Flags: review?(mh+mozilla) → review+
(Assignee)

Comment 5

4 years ago
→ b2g-inbound.
Keywords: checkin-needed
https://hg.mozilla.org/integration/fx-team/rev/02bb0a9faed6
Keywords: checkin-needed
Whiteboard: [fixed-in-fx-team]
https://hg.mozilla.org/mozilla-central/rev/02bb0a9faed6
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
Whiteboard: [fixed-in-fx-team]
Target Milestone: --- → mozilla28
You need to log in before you can comment on or make changes to this bug.