Closed Bug 680190 Opened 13 years ago Closed 13 years ago

jemalloc can deadlock during IPC fork on Android

Categories

(Core :: IPC, defect)

ARM
Android
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla9

People

(Reporter: jchen, Assigned: jchen)

References

Details

(Whiteboard: [inbound])

Attachments

(1 file, 1 obsolete file)

The problem is Android didn't have pthread_atfork until 2.3, so we specifically don't call it. (http://mxr.mozilla.org/mozilla-central/source/memory/jemalloc/jemalloc.c#5577) If another thread locks jemalloc during the IPC fork, the new child process will have jemalloc permanently locked. This is fine if we call exec immediately afterwards. However, before exec, we do stuff with file descriptors which involves stl containers that can potentially call jemalloc (http://mxr.mozilla.org/mozilla-central/source/ipc/chromium/src/base/process_util_linux.cc#56), thus causing a deadlock in the child process if all the conditions are met. The symptom is a blank page when loading. The parent is probably still responsive, but the child is not. I've only seen it when running talos on a tegra. It's possible that one of the talos bug is caused by this (also I think up until now this bug has been overshadowed by bug 662936) On a side note, mxr says we can call pthread_atlock in NSS too (http://mxr.mozilla.org/mozilla-central/source/security/nss/lib/softoken/pkcs11.c). But I don't know the effects of this.
Attached patch Possible fix (obsolete) — Splinter Review
This is one possible fix. It's pretty ugly though because it exposes two private functions from jemalloc.
bionic strikes again! Nice analysis, and boo on folks blindly #ifdef'ing stuff out. I'm not sure even this hacky fix this right; it appears to me that the jemalloc mutexes aren't initialized to be reentrant on linux, so if fork() itself malloc's, then we'll deadlock there. fork() is specified to be async-signal-safe, so theoretically we can assume fork() doesn't malloc, but this being bionic who knows. Worth checking bionic, or implementing our own fork() ... not out of the question.
(In reply to Chris Jones [:cjones] [:warhammer] from comment #2) > bionic strikes again! Nice analysis, and boo on folks blindly #ifdef'ing > stuff out. > > I'm not sure even this hacky fix this right; it appears to me that the > jemalloc mutexes aren't initialized to be reentrant on linux, so if fork() > itself malloc's, then we'll deadlock there. fork() is specified to be > async-signal-safe, so theoretically we can assume fork() doesn't malloc, but > this being bionic who knows. Worth checking bionic, or implementing our own > fork() ... not out of the question. I think we will be okay? If fork mallocs it will be using the bionic malloc, not jemalloc. It is kind of scary though. fork() is at http://android.git.kernel.org/?p=platform/bionic.git;a=blob_plain;f=libc/bionic/fork.c;hb=HEAD (the __bionic_atfork_ lines are added in 2.3 and don't do us any good) jmaher is seeing this on talos now, so we should get this fixed soon.
Assignee: nobody → nchen
Status: NEW → ASSIGNED
Blocks: 678381
(In reply to Jim Chen [:jchen] (mobile intern :) from comment #3) > (In reply to Chris Jones [:cjones] [:warhammer] from comment #2) > > bionic strikes again! Nice analysis, and boo on folks blindly #ifdef'ing > > stuff out. > > > > I'm not sure even this hacky fix this right; it appears to me that the > > jemalloc mutexes aren't initialized to be reentrant on linux, so if fork() > > itself malloc's, then we'll deadlock there. fork() is specified to be > > async-signal-safe, so theoretically we can assume fork() doesn't malloc, but > > this being bionic who knows. Worth checking bionic, or implementing our own > > fork() ... not out of the question. > > I think we will be okay? If fork mallocs it will be using the bionic malloc, > not jemalloc. As long as it only calls malloc() (or things that call malloc()) directly. > It is kind of scary though. fork() is at > http://android.git.kernel.org/?p=platform/bionic.git;a=blob_plain;f=libc/ > bionic/fork.c;hb=HEAD (the __bionic_atfork_ lines are added in 2.3 and don't > do us any good) Android's cpuacct_add() calls fopen, fclose, and fprintf(), none of which are async-signal-safe. (Totally unnecessarily; could have equivalently used open/close/write.) That means we really do have to guarantee that malloc() in bionic always resolves to bionic malloc(). If it doesn't, and resolves to jemalloc malloc(), then we *will* deadlock. On the other hand, that's nice too, because we'll immediately know if we mess up. > jmaher is seeing this on talos now, so we should get this fixed soon. Want to keep driving this patch in?
Comment on attachment 554161 [details] [diff] [review] Possible fix (In reply to Chris Jones [:cjones] [:warhammer] from comment #4) > (In reply to Jim Chen [:jchen] (mobile intern :) from comment #3) > > > It is kind of scary though. fork() is at > > http://android.git.kernel.org/?p=platform/bionic.git;a=blob_plain;f=libc/ > > bionic/fork.c;hb=HEAD (the __bionic_atfork_ lines are added in 2.3 and don't > > do us any good) > > Android's cpuacct_add() calls fopen, fclose, and fprintf(), none of which > are async-signal-safe. (Totally unnecessarily; could have equivalently used > open/close/write.) That means we really do have to guarantee that malloc() > in bionic always resolves to bionic malloc(). If it doesn't, and resolves > to jemalloc malloc(), then we *will* deadlock. On the other hand, that's > nice too, because we'll immediately know if we mess up. > Good find. So far I've only seen bionic functions using bionic malloc. Hope it stays this way. > > jmaher is seeing this on talos now, so we should get this fixed soon. > > Want to keep driving this patch in? Yeah, although a custom pthread_atfork() and fork() wrapper shouldn't be much work either. Asking for review, and feedback from Jason on exposing the jemalloc functions.
Attachment #554161 - Flags: review?(jones.chris.g)
Attachment #554161 - Flags: feedback?(jasone)
Comment on attachment 554161 [details] [diff] [review] Possible fix This is pretty horrible, but probably less work than trying to maintain our own fork() that integrates properly with bionic. Three things +#ifdef ANDROID + ::_malloc_prefork(); +#endif (1) Add a comment explaining this hack here. (2) These calls need to be conditional on MOZ_MEMORY. (3) File a followup bug for removing this, after we only care about 2.3 and above. (And 3.[N] and above, if the bug exists in earlier 3.x.)
Attachment #554161 - Flags: review?(jones.chris.g) → review+
Comment on attachment 554161 [details] [diff] [review] Possible fix Looks fine to me.
Attachment #554161 - Flags: feedback?(jasone) → feedback+
Please leave the _malloc_* functions static on !android.
Attached patch Temporary fixSplinter Review
Addressed comments from cjones and glandium. Carried over cjones' r+.
Attachment #554161 - Attachment is obsolete: true
Attachment #556921 - Flags: review+
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla9
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: