Closed
Bug 772734
Opened 12 years ago
Closed 12 years ago
LaunchApp may hang
Categories
(Firefox OS Graveyard :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: dhylands, Assigned: dhylands)
References
Details
Attachments
(2 files, 4 obsolete files)
3.58 KB,
text/plain
|
Details | |
6.70 KB,
patch
|
dhylands
:
review+
|
Details | Diff | Splinter Review |
The ipc/chromium/src/base/process_util_linux.cc version of LaunchApp can hang. This can happen since it calls setenv after the fork and before the exec, and setenv may allocate memory. If setenv happens to want to allocate memory and the heap was locked at the time of the fork, then setenv will hang. The code needs to be changed to use execvpe or execve. Android doesn't seem to have an execvpe, so that means that we'd need to use execve and do the path search ourselves. execve is an actual syscall, so there shouldn't be an runtime library allocations.
Assignee | ||
Updated•12 years ago
|
Assignee: nobody → dhylands
Assignee | ||
Comment 1•12 years ago
|
||
Actually, I'd be inclined to make the code call execvpe and implement execvpe for android by copying/modifying the appropriate code from the bionic runtime.
That would break a handful of little debugging tools that rely on env vars :/. Thinking.
Assignee | ||
Comment 3•12 years ago
|
||
I don't follow. We'd still be passing in the exact same environment. It's just that with the e versions of the exec functions you pass in the array already formatted. We can still export the entire environment, it's just that you pass it in explicitly rather than implicity. We can do any required allocation before the fork.
Assignee | ||
Comment 4•12 years ago
|
||
This backtrace proves my hypothesis of the problem.
Yes, but technically we're not supposed to touch |environ|. With things like binary extensions and plugins loaded, that could lead to security bugs. I can't think of any other (correct) way to do this that won't subtly break one of our platform-specific hacks, so maybe that's the best we can do :/. Will think some more.
Assignee | ||
Comment 6•12 years ago
|
||
Yeah - but if we only touch environ from within the implementation of execvpe, I don't see any harm done. The execvpe function that we provide would only be provided with bionic platforms and will only use stuff that could be used from bionic. For glibc based builds, we'd just use the execvpe that's provided.
Assignee | ||
Comment 7•12 years ago
|
||
Never mind - I see the problem. The other option is to do the setenv's before the fork. I'll poke around some more too.
That has the same kind of problem, unfortunately :(. If some other thread dlopen()s while we have LD_LIBRARY_PATH temporarily set, e.g., that could cause a security problem too.
Assignee | ||
Comment 9•12 years ago
|
||
Actually, the problem is bigger. If *anybody* calls setenv or getenv in a multi-threaded environment, then there are issues. Any call to setenv will potentially invalidate any pointers retrieved from getenv. The bionic implementations of getenv and setenv are not thread safe.
Gecko code uses PR_SetEnv()/PR_GetEnv() for just that reason.
Assignee | ||
Comment 11•12 years ago
|
||
Doing an MXR search on getenv and setenv says otherwise. If it is in fact true, that PR_GetEnv and PR_SetEnv are used everywhere, then there's should be no reason why we can't add a PR_DuplicateEnvironment function which returns something like an a copy of envron, and a version of PR_SetEnv which can set a variable into an environment instance. This could then be passed to execvpe/execve and you'd have no security holes.
Assignee | ||
Comment 12•12 years ago
|
||
Another way to workaround this would be to acquire the heap mutex prior to the fork and release it in both the child and the parent. Then the forked heap will be in a guaranteed consistent state.
Comment 13•12 years ago
|
||
bug 771765 would lift the need to setenv at all...
(In reply to Dave Hylands [:dhylands] from comment #11) > Doing an MXR search on getenv and setenv says otherwise. > Those are bugs. > If it is in fact true, that PR_GetEnv and PR_SetEnv are used everywhere, > then there's should be no reason why we can't add a PR_DuplicateEnvironment > function which returns something like an a copy of envron Yeah, that sounds pretty good.
(In reply to Dave Hylands [:dhylands] from comment #12) > Another way to workaround this would be to acquire the heap mutex prior to > the fork and release it in both the child and the parent. Then the forked > heap will be in a guaranteed consistent state. Aha! This is indeed the issue I remembered. jemalloc does this when pthread_atfork works; bionic doesn't implement pthread_atfork though. We ended up shimming our impl. (I don't think we need to release the locks in the child since we're about to exec().) So we have two problems here - not using jemalloc, which lets us get away with this code being broken - this code being broken I think I'd rather fix the brokenness.
Comment 16•12 years ago
|
||
PR_SetEnv uses putenv, which may well malloc just as much as setenv does.
Comment 17•12 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #16) > PR_SetEnv uses putenv, which may well malloc just as much as setenv does. In fact, in bionic, it calls strdup and setenv.
The "real bug" here is that we're setenv()'ing after fork(). Pre-forking while there's only one thread might work around that, and jemalloc is kind enough to work around that for us, but it's still wrong. I like the proposed solution of comment 11.
Comment 19•12 years ago
|
||
(In reply to Chris Jones [:cjones] [:warhammer] from comment #18) > I like the proposed solution of comment 11. I don't like the idea of adding an API to NSPR for just that. Especially when the "real bug" is that bionic is just broken in so many ways, and that the "workaround" from comment 11 is just really needed for bionic. Why not just fix bionic instead of adding workarounds over workarounds over workarounds for it? Or, before setenv'ing, actually checking what's in LD_LIBRARY_PATH, because, as it stands, LD_LIBRARY_PATH *already* contains /system/b2g due to /init.b2g.rc.
Comment 20•12 years ago
|
||
For the little story, to avoid having to set LD_LIBRARY_PATH, one could use rpath. Except the bionic linker doesn't support rpath.
Gecko is violating POSIX programming rules. This bug could manifest in any configuration where we're not using jemalloc. Adding assumptions on top of assumptions on top of assumptions for minor inconveniences seems worse than just fixing what gecko is doing wrong. The NSPR patch required to fix this would be small, simple, and generally useful.
Comment 22•12 years ago
|
||
(In reply to Chris Jones [:cjones] [:warhammer] from comment #21) > This bug could manifest in any configuration where we're not using jemalloc. I don't see how, except for bionic brokenness, this could happen.
Comment 23•12 years ago
|
||
Another option is to make plugin-container just work without LD_LIBRARY_PATH being required to be properly set beforehand, like the firefox and b2g binaries do.
Comment 24•12 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #23) > Another option is to make plugin-container just work without LD_LIBRARY_PATH > being required to be properly set beforehand, like the firefox and b2g > binaries do. It turns out that wouldn't be enough, because there's more than LD_LIBRARY_PATH we set. On Android, we also set ANDROID_PROPERTY_WORKSPACE, and on mac, we set DYLD_LIBRARY_PATH and DYLD_INSERT_LIBRARIES. So yeah, getting a copy of the environment and augmenting it before forking (doing it in the forked process would still hit the malloc problem) is about the only solution, however much it sucks to add an API to NSPR for that. That being said, it would still be worthwhile to make plugin-container work without LD_LIBRARY_PATH/DYLD_LIBRARY_PATH
Assignee | ||
Comment 25•12 years ago
|
||
Assignee | ||
Comment 26•12 years ago
|
||
Removed execvpe function and modified LauncApp to use execve instead. cjones says he thinks that we always use absolute pathnames.
Attachment #641991 -
Attachment is obsolete: true
I'm 100% sure GeckoChildProcessHost always uses absolute paths, because gecko binaries are never in the PATH. I'm 99.9% sure we don't have other LaunchApp() consumers.
Assignee | ||
Comment 28•12 years ago
|
||
Try build of this patch and the PR_DuplicateEnvironment patch https://tbpl.mozilla.org/?tree=Try&rev=cd68233ec698
Assignee | ||
Updated•12 years ago
|
Attachment #642034 -
Flags: review?(jones.chris.g)
Comment on attachment 642034 [details] [diff] [review] Bug 772734 - Fixes LaunchApp hang when launching plugin-container - v2 >diff --git a/ipc/chromium/src/base/process_util_linux.cc b/ipc/chromium/src/base/process_util_linux.cc >+class EnvironmentEnvp >+{ >+public: >+ EnvironmentEnvp() >+ : mEnvp(PR_DuplicateEnvironment()) {} Note: this allocation can fail. >+ >+ EnvironmentEnvp(const environment_map &em) >+ { >+ mEnvp = (char **)PR_Malloc(sizeof(char *) * (em.size() + 1)); This allocation can fail. If it does, this impl will end up crashing the parent process. Let's just |return;| here if it fails. AsEnvp() will return null and most likely the content process won't start up. >+ ~EnvironmentEnvp() >+ { >+ for (char **e = mEnvp; *e; ++e) { |mEnvp| can be null here, as a result of either ctor. r=me with those changes.
Attachment #642034 -
Flags: review?(jones.chris.g) → review+
Assignee | ||
Comment 30•12 years ago
|
||
So then the question becomes, what do you want to happen when mEnvp becomes NULL? I took a look at the kernel sources, and it seems to accept a NULL envp in the execve call. So we have 2 options: 1 - Pass the NULL through to execve. This seems bad because it might just happen that the execve will work even though the envp allocation failed (unlikely, but possible). Then the child will run successfully, but with the wrong environment. 2 - Detect that envp fails, log an error and return false to the caller of LaunchApp. I'm going to say we should go with option 2.
That works for me.
Assignee | ||
Comment 32•12 years ago
|
||
Added #ifdef so that the new code is only included for android (which is the only env that adds PR_DuplicateEnvironment in the mozglue patch) Did a try run and got clean builds (at least most of them had completed by the time I wrote this). https://tbpl.mozilla.org/?tree=Try&rev=a96f24ef0eb3
Attachment #642034 -
Attachment is obsolete: true
Attachment #643255 -
Flags: review?(jones.chris.g)
Comment on attachment 643255 [details] [diff] [review] Now works for android and linux >diff --git a/ipc/chromium/src/base/process_util_linux.cc b/ipc/chromium/src/base/process_util_linux.cc >+/* Tempoaray until we have PR_DuplicateEnvironment in prenv.h */ Temporary >+}; >+#endif /* HAVE_PR_DUPLICATE_ENVIRONMENT */ // <-- use a line comment Let's get this landed and some more apps out of process! :)
Attachment #643255 -
Flags: review?(jones.chris.g) → review+
Assignee | ||
Comment 34•12 years ago
|
||
Cleaned up nits
Attachment #643255 -
Attachment is obsolete: true
Attachment #643257 -
Flags: review+
Comment 35•12 years ago
|
||
Push backed out for XUL Android bustage: https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=41f4bdd4b6bd https://hg.mozilla.org/integration/mozilla-inbound/rev/7dd75469589b [And yeah I wish we could just ditch those builds as much as you... :-) ]
Comment 36•12 years ago
|
||
(In reply to Ed Morley [:edmorley] from comment #35) > Push backed out for XUL Android bustage: > https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=41f4bdd4b6bd > > https://hg.mozilla.org/integration/mozilla-inbound/rev/7dd75469589b > > [And yeah I wish we could just ditch those builds as much as you... :-) ] Actually, that reveals a problem that would happen on native android builds as well when they start using child processes.
Assignee | ||
Comment 37•12 years ago
|
||
Had to move PR_DuplicateEnvironment into process_util_linux.cc so that the heap function alloc/free functions get wrapped/matched properly on android.
Attachment #643257 -
Attachment is obsolete: true
Assignee | ||
Updated•12 years ago
|
Attachment #643485 -
Flags: review+
Comment 38•12 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/b54897b245b5
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•