jemalloc can deadlock during IPC fork on Android

RESOLVED FIXED in mozilla9

Status

()

Core
IPC
RESOLVED FIXED
6 years ago
6 years ago

People

(Reporter: jchen, Assigned: jchen)

Tracking

Trunk
mozilla9
ARM
Android
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [inbound])

Attachments

(1 attachment, 1 obsolete attachment)

(Assignee)

Description

6 years ago
The problem is Android didn't have pthread_atfork until 2.3, so we specifically don't call it. (http://mxr.mozilla.org/mozilla-central/source/memory/jemalloc/jemalloc.c#5577)

If another thread locks jemalloc during the IPC fork, the new child process will have jemalloc permanently locked. This is fine if we call exec immediately afterwards. However, before exec, we do stuff with file descriptors which involves stl containers that can potentially call jemalloc (http://mxr.mozilla.org/mozilla-central/source/ipc/chromium/src/base/process_util_linux.cc#56), thus causing a deadlock in the child process if all the conditions are met.

The symptom is a blank page when loading. The parent is probably still responsive, but the child is not. I've only seen it when running talos on a tegra. It's possible that one of the talos bug is caused by this (also I think up until now this bug has been overshadowed by bug 662936)

On a side note, mxr says we can call pthread_atlock in NSS too (http://mxr.mozilla.org/mozilla-central/source/security/nss/lib/softoken/pkcs11.c). But I don't know the effects of this.
(Assignee)

Comment 1

6 years ago
Created attachment 554161 [details] [diff] [review]
Possible fix

This is one possible fix. It's pretty ugly though because it exposes two private functions from jemalloc.
bionic strikes again!  Nice analysis, and boo on folks blindly #ifdef'ing stuff out.

I'm not sure even this hacky fix this right; it appears to me that the jemalloc mutexes aren't initialized to be reentrant on linux, so if fork() itself malloc's, then we'll deadlock there.  fork() is specified to be async-signal-safe, so theoretically we can assume fork() doesn't malloc, but this being bionic who knows.  Worth checking bionic, or implementing our own fork() ... not out of the question.
(Assignee)

Comment 3

6 years ago
(In reply to Chris Jones [:cjones] [:warhammer] from comment #2)
> bionic strikes again!  Nice analysis, and boo on folks blindly #ifdef'ing
> stuff out.
> 
> I'm not sure even this hacky fix this right; it appears to me that the
> jemalloc mutexes aren't initialized to be reentrant on linux, so if fork()
> itself malloc's, then we'll deadlock there.  fork() is specified to be
> async-signal-safe, so theoretically we can assume fork() doesn't malloc, but
> this being bionic who knows.  Worth checking bionic, or implementing our own
> fork() ... not out of the question.

I think we will be okay? If fork mallocs it will be using the bionic malloc, not jemalloc. It is kind of scary though. fork() is at http://android.git.kernel.org/?p=platform/bionic.git;a=blob_plain;f=libc/bionic/fork.c;hb=HEAD (the __bionic_atfork_ lines are added in 2.3 and don't do us any good)

jmaher is seeing this on talos now, so we should get this fixed soon.
Assignee: nobody → nchen
Status: NEW → ASSIGNED
(Assignee)

Updated

6 years ago
Blocks: 678381
(In reply to Jim Chen [:jchen] (mobile intern :) from comment #3)
> (In reply to Chris Jones [:cjones] [:warhammer] from comment #2)
> > bionic strikes again!  Nice analysis, and boo on folks blindly #ifdef'ing
> > stuff out.
> > 
> > I'm not sure even this hacky fix this right; it appears to me that the
> > jemalloc mutexes aren't initialized to be reentrant on linux, so if fork()
> > itself malloc's, then we'll deadlock there.  fork() is specified to be
> > async-signal-safe, so theoretically we can assume fork() doesn't malloc, but
> > this being bionic who knows.  Worth checking bionic, or implementing our own
> > fork() ... not out of the question.
> 
> I think we will be okay? If fork mallocs it will be using the bionic malloc,
> not jemalloc.

As long as it only calls malloc() (or things that call malloc()) directly.

> It is kind of scary though. fork() is at
> http://android.git.kernel.org/?p=platform/bionic.git;a=blob_plain;f=libc/
> bionic/fork.c;hb=HEAD (the __bionic_atfork_ lines are added in 2.3 and don't
> do us any good)

Android's cpuacct_add() calls fopen, fclose, and fprintf(), none of which are async-signal-safe.  (Totally unnecessarily; could have equivalently used open/close/write.)  That means we really do have to guarantee that malloc() in bionic always resolves to bionic malloc().  If it doesn't, and resolves to jemalloc malloc(), then we *will* deadlock.  On the other hand, that's nice too, because we'll immediately know if we mess up.

> jmaher is seeing this on talos now, so we should get this fixed soon.

Want to keep driving this patch in?
(Assignee)

Comment 5

6 years ago
Comment on attachment 554161 [details] [diff] [review]
Possible fix

(In reply to Chris Jones [:cjones] [:warhammer] from comment #4)
> (In reply to Jim Chen [:jchen] (mobile intern :) from comment #3)
> 
> > It is kind of scary though. fork() is at
> > http://android.git.kernel.org/?p=platform/bionic.git;a=blob_plain;f=libc/
> > bionic/fork.c;hb=HEAD (the __bionic_atfork_ lines are added in 2.3 and don't
> > do us any good)
> 
> Android's cpuacct_add() calls fopen, fclose, and fprintf(), none of which
> are async-signal-safe.  (Totally unnecessarily; could have equivalently used
> open/close/write.)  That means we really do have to guarantee that malloc()
> in bionic always resolves to bionic malloc().  If it doesn't, and resolves
> to jemalloc malloc(), then we *will* deadlock.  On the other hand, that's
> nice too, because we'll immediately know if we mess up.
> 
Good find. So far I've only seen bionic functions using bionic malloc. Hope it stays this way.

> > jmaher is seeing this on talos now, so we should get this fixed soon.
> 
> Want to keep driving this patch in?

Yeah, although a custom pthread_atfork() and fork() wrapper shouldn't be much work either. Asking for review, and feedback from Jason on exposing the jemalloc functions.
Attachment #554161 - Flags: review?(jones.chris.g)
Attachment #554161 - Flags: feedback?(jasone)
Comment on attachment 554161 [details] [diff] [review]
Possible fix

This is pretty horrible, but probably less work than trying to maintain our own fork() that integrates properly with bionic.  Three things

+#ifdef ANDROID
+  ::_malloc_prefork();
+#endif

(1) Add a comment explaining this hack here.

(2) These calls need to be conditional on MOZ_MEMORY.

(3) File a followup bug for removing this, after we only care about 2.3 and above.  (And 3.[N] and above, if the bug exists in earlier 3.x.)
Attachment #554161 - Flags: review?(jones.chris.g) → review+

Comment 7

6 years ago
Comment on attachment 554161 [details] [diff] [review]
Possible fix

Looks fine to me.
Attachment #554161 - Flags: feedback?(jasone) → feedback+
Please leave the _malloc_* functions static on !android.
(Assignee)

Comment 9

6 years ago
Created attachment 556921 [details] [diff] [review]
Temporary fix

Addressed comments from cjones and glandium. Carried over cjones' r+.
Attachment #554161 - Attachment is obsolete: true
Attachment #556921 - Flags: review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/157511556e27
Whiteboard: [inbound]
http://hg.mozilla.org/mozilla-central/rev/157511556e27
Status: ASSIGNED → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla9
Blocks: 683799
You need to log in before you can comment on or make changes to this bug.