Closed
Bug 473629
Opened 16 years ago
Closed 15 years ago
crash on exit in glibc memalign with jemalloc statically linked
Categories
(Toolkit :: Startup and Profile System, defect)
Tracking
()
RESOLVED
DUPLICATE
of bug 555894
People
(Reporter: fta+bugzilla, Unassigned)
References
Details
(Keywords: crash)
Attachments
(1 file, 1 obsolete file)
833 bytes,
patch
|
timeless
:
review-
|
Details | Diff | Splinter Review |
I'm getting a crash on exit using trunk or 1.9.1.
It's both with the main window (with plenty of tabs) and when firefox is called from another app to open a new link (in a tab).
It's 100% reproducible here.
Here is the not very helpful trace:
$ firefox -g http://www.mozilla.org/
/usr/bin/gdb /usr/lib/firefox-3.2a1pre/firefox -x /tmp/mozargs.GkMszR
GNU gdb 6.8-debian
...
This GDB was configured as "i486-linux-gnu"...
(no debugging symbols found)
(gdb) r
Starting program: /usr/lib/firefox-3.2a1pre/firefox http://www.mozilla.org/
(no debugging symbols found)
...
(no debugging symbols found)
(no debugging symbols found)
[New Thread 0xb7ce26d0 (LWP 5863)]
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb7ce26d0 (LWP 5863)]
0xb7d5d3a1 in ptmalloc_init () from /lib/tls/i686/cmov/libc.so.6
(gdb) bt
#0 0xb7d5d3a1 in ptmalloc_init () from /lib/tls/i686/cmov/libc.so.6
#1 0xb7d60cc3 in memalign_hook_ini () from /lib/tls/i686/cmov/libc.so.6
#2 0xb7d60ae2 in memalign () from /lib/tls/i686/cmov/libc.so.6
#3 0xb8083d82 in ?? () from /lib/ld-linux.so.2
#4 0xb808422b in ___tls_get_addr () from /lib/ld-linux.so.2
#5 0xb5e40962 in ?? () from /lib/libselinux.so.1
#6 0xb5e39b7e in ?? () from /lib/libselinux.so.1
#7 0xb5e321d8 in ?? () from /lib/libselinux.so.1
#8 0xb5e42340 in _fini () from /lib/libselinux.so.1
#9 0xb8081a53 in ?? () from /lib/ld-linux.so.2
#10 0xb7d1bb89 in exit () from /lib/tls/i686/cmov/libc.so.6
#11 0xb7d0377d in __libc_start_main () from /lib/tls/i686/cmov/libc.so.6
#12 0x080496f1 in ?? ()
could it be a jemalloc issue?
Comment 1•16 years ago
|
||
Better debugging symbols would be nice, but I'll toss this to jemalloc to see if they have any ideas.
Component: General → jemalloc
Product: Firefox → Core
QA Contact: general → jemalloc
Reporter | ||
Comment 2•16 years ago
|
||
I have most possible debugging symbols installed on my Ubuntu/Jaunty box, including libc, firefox but apparently, i still miss some. /me looking.
Reporter | ||
Comment 3•16 years ago
|
||
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb7c906d0 (LWP 1554)]
0xb7d0b3a1 in ptmalloc_init () from /lib/tls/i686/cmov/libc.so.6
(gdb) bt
#0 0xb7d0b3a1 in ptmalloc_init () from /lib/tls/i686/cmov/libc.so.6
#1 0xb7d0ecc3 in memalign_hook_ini () from /lib/tls/i686/cmov/libc.so.6
#2 0xb7d0eae2 in memalign () from /lib/tls/i686/cmov/libc.so.6
#3 0xb8031d82 in tls_get_addr_tail () from /lib/ld-linux.so.2
#4 0xb803222b in ___tls_get_addr_internal () from /lib/ld-linux.so.2
#5 0xb5dee962 in fini_context_translations () at setrans_client.c:211
#6 0xb5de7b7e in fini_lib () at init.c:127
#7 0xb5de01d8 in __do_global_dtors_aux () from /lib/libselinux.so.1
#8 0xb5df0340 in _fini () from /lib/libselinux.so.1
#9 0xb802fa53 in _dl_fini () from /lib/ld-linux.so.2
#10 0xb7cc9b89 in exit () from /lib/tls/i686/cmov/libc.so.6
#11 0xb7cb177d in __libc_start_main () from /lib/tls/i686/cmov/libc.so.6
#12 0x080496f1 in ?? ()
(gdb)
Comment 4•16 years ago
|
||
For some reason the TLS code is calling into glibc's malloc implementation (via memalign()). Since ptmalloc_init() is being called as a result, we can tell that glibc's internal malloc data structures have never been initialized. I don't know why this is causing a crash, but it isn't even touching jemalloc. Maybe there's some strange interaction with SELinux...
Updated•16 years ago
|
Summary: crash on exit → crash on exit in glibc memalign with jemalloc statically linked
Comment 5•16 years ago
|
||
ubuntu bug: https://bugs.launchpad.net/bugs/319480
Comment 6•16 years ago
|
||
leaking env in nsAppRunner.cpp seems to fix this.
Comment 7•16 years ago
|
||
previous wasn't a legal diff
Attachment #383038 -
Attachment is obsolete: true
so um, are you saying it isn't safe to call PR_SetEnv w/ a null pointer?
Comment 9•16 years ago
|
||
I think it is the strdup call rather than the null pointer
check that fixes the crash. The string you pass to PR_SetEnv,
which simply passes it to putenv, needs to stay around after
PR_SetEnv returns because the string "shall become part of the
environment":
http://www.opengroup.org/onlinepubs/009695399/functions/putenv.html
The putenv man page also says:
A potential error is to call putenv() with an automatic
variable as the argument, then return from the calling
function while string is still part of the environment.
Comment 10•16 years ago
|
||
Comment on attachment 383040 [details] [diff] [review]
leaking env in nsAppRunner.cpp (non garbage patch)
ok, here's a free r-,
// We intentionally leak |expr| here since it is required by PR_SetEnv.
the code is already trying to leak, but it's trying to leak an nspr pointer. If you're going to do this, just fix SaveWordToEnv to do:
SaveWordToEnv(const char *name, const nsACString & word)
{
char *expr = PR_smprintf("%s=%s", name, PromiseFlatCString(word).get());
if (!expr)
return;
/* this copy is intentionally leaked per setenv requirements */
PR_SetEnv (strdup (env));
PR_smprintf_free(env);
}
Attachment #383040 -
Flags: review-
Comment 11•16 years ago
|
||
and note that there shouldn't be a space between foo and () if foo is a function.
Comment 12•16 years ago
|
||
(In reply to comment #10)
> (From update of attachment 383040 [details] [diff] [review])
> ok, here's a free r-,
> // We intentionally leak |expr| here since it is required by PR_SetEnv.
> the code is already trying to leak, but it's trying to leak an nspr pointer. If
> you're going to do this, just fix SaveWordToEnv to do:
>
Yeah. SaveWordToEnv is already properly leaked. But not all places are using that func to set their env. This patch ensures that also the direct invocations to PR_SetEnv are getting leaked.
Comment 13•16 years ago
|
||
well, please find them and identify the right one.
Comment 14•16 years ago
|
||
nsAppRunner.cpp is linked in libxul.so so I wonder whether it has been unloaded at this stage so that it's static strings inserted into the environment are no longer valid pointers.
Making sure the static strings are copied to the heap would fix this.
Setting environ to NULL on return might also be a hacky way to fix this.
The stack here highlights another issue too though:
readelf --reloc /lib32/ld-linux.so.2 includes this
Relocation section '.rel.plt' at offset 0x8d4 contains 5 entries:
Offset Info Type Sym.Value Sym. Name
0001afd0 00000b07 R_386_JUMP_SLOT 00014b40 __libc_memalign
0001afd4 00000407 R_386_JUMP_SLOT 00014c60 malloc
0001afd8 00001007 R_386_JUMP_SLOT 00014d30 calloc
0001afdc 00000807 R_386_JUMP_SLOT 00014c90 realloc
0001afe0 00000507 R_386_JUMP_SLOT 00014ae0 free
I don't know why only memalign calls use the libc-specific symbol name but dl-tls.c looks like it calls free on pointers from __libc_malloc.
That doesn't seem to explain the crash in ptmalloc_init but is probably something that should be fixed.
Attachment 386469 [details] [diff] would resolve the inconsistency here by making __libc_memalign use jemalloc's memalign. That would also avoid the call to memalign_hook_ini, avoiding this crash.
Depends on: 493541
Updated•16 years ago
|
Component: jemalloc → Startup and Profile System
Product: Core → Toolkit
QA Contact: jemalloc → startup
Comment 15•16 years ago
|
||
(In reply to comment #14)
> nsAppRunner.cpp is linked in libxul.so so I wonder whether it has been unloaded
> at this stage so that it's static strings inserted into the environment are no
> longer valid pointers.
>
right. This matches my evaluation. hence almost all PR_SetEnv need to be leaked in this file which is why i used this macro. We can move it further down or go through the list and explicitly strcpy all but one PR_Set env things.
Comment 16•16 years ago
|
||
(In reply to comment #14)
>
> Attachment 386469 [details] [diff] would resolve the inconsistency here by making
> __libc_memalign use jemalloc's memalign. That would also avoid the call to
> memalign_hook_ini, avoiding this crash.
I can check your patch as well. however, we also had other problems because of environ getting trashed after unloading libxul.so (namely, the libatk bridge made things crash when it tried to access environ in exit_hook); so leaking is probably still needed.
Comment 17•16 years ago
|
||
(In reply to comment #16)
> however, we also had other problems because of environ getting trashed after
> unloading libxul.so (namely, the libatk bridge made things crash when it
> tried to access environ in exit_hook); so leaking is probably still needed.
Yes, sounds like leaking would still be needed.
Here too:
http://hg.mozilla.org/mozilla-central/file/0471fa063e43/accessible/src/atk/nsAppRootAccessible.cpp#l567
I'm not a toolkit peer, but I think it would be better to change only the
instances that are not already leaked to use a function name that is not
PR_SetEnv. SaveWordToEnv() is probably fine for this.
Maybe the attempts to clear environment values could check that they are set
before needing to leak:
http://hg.mozilla.org/mozilla-central/file/f46e6aee1335/toolkit/xre/nsAppRunner.cpp#l3303
Comment 18•16 years ago
|
||
Just my notes on various environment APIs:
glibc's setenv (which copies) will reuse strings (from previous setenv calls)
if the value matches an old value for a variable. I don't know why it never
modifies or frees old strings. Perhaps this is to make the result of getenv
persistant but that is not guaranteed anyway because of putenv.
POSIX Programmer's Manual page for putenv says "The space used by string is no
longer used once a new string which defines name is passed to putenv()."
PR_SetEnv() docs say:
"The caller must ensure that the string passed to PR_SetEnv() is persistent."
nsIEnvironment deletes strings passed to PR_SetEnv once the value has been
replaced with another.
http://mxr.mozilla.org/mozilla-central/source/xpcom/threads/nsIEnvironment.idl
IEEE Std 1003.1, 2003 Edition POSIX does not have const in the prototype
definition for putenv, but I don't know why this is because there doesn't seem
to be any reason why putenv should modify the string:
int putenv(char *string);
/home/karl/moz/dev/nsprpub/pr/src/misc/prenv.c: In function 'PR_SetEnv':
/home/karl/moz/dev/nsprpub/pr/src/misc/prenv.c:96: warning: passing argument 1 of 'putenv' discards qualifiers from pointer target type
Comment 19•16 years ago
|
||
(In reply to comment #17)
> (In reply to comment #16)
> > however, we also had other problems because of environ getting trashed after
> > unloading libxul.so (namely, the libatk bridge made things crash when it
> > tried to access environ in exit_hook); so leaking is probably still needed.
>
> Yes, sounds like leaking would still be needed.
Actually, on further thought, the atk crash is probably caused by something else.
If dependent libraries were unloaded before atexit hooks run then I expect bigger problems.
It's probably worth checking our theory here that libxul has been unloaded even for the crash in comment 3. "info shared" at the time of the crash will show the loaded shared libraries.
Comment 20•16 years ago
|
||
atk is generally broken, there are bugs, you can find them.
putenv isn't const because on some platforms you can do this:
char x[]="test=hello cruel world";
putenv(x);
char *evil = getenv("test");
evil[11]='d';
evil[13]='o';
evil[14]='l';
Updated•15 years ago
|
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → DUPLICATE
See Also: → https://launchpad.net/bugs/319480
You need to log in
before you can comment on or make changes to this bug.
Description
•