Closed
Bug 423334
Opened 17 years ago
Closed 17 years ago
crash at startup in [@ NS_CompareVersions] when using --with-libxul-sdk
Categories
(Core :: XPCOM, defect, P2)
Tracking
()
RESOLVED
FIXED
mozilla1.9
People
(Reporter: fta+bugzilla, Assigned: jasone)
References
Details
(Keywords: crash, regression)
Crash Data
Attachments
(1 file)
995 bytes,
patch
This is with trunk 20080316 0643, built with --with-libxul-sdk.
20080314 1505 was ok.
[Switching to Thread 0xb7cea6c0 (LWP 27665)]
free (ptr=0x805e9d8) at jemalloc.c:4222
4222 jemalloc.c: No such file or directory.
in jemalloc.c
(gdb) bt
#0 free (ptr=0x805e9d8) at jemalloc.c:4222
#1 0xb78d847a in NS_CompareVersions (A=0x805e998 "1.9b5pre", B=0xb7b9af1c "1.9b5pre") at nsVersionComparator.cpp:220
#2 0xb714be01 in XRE_main (argc=1, argv=0xbf917b34, aAppData=0x805dfe8) at nsAppRunner.cpp:2568
#3 0x08049033 in ?? ()
#4 0x00000001 in ?? ()
#5 0xbf917b34 in ?? ()
#6 0x0805dfe8 in ?? ()
#7 0x00000000 in ?? ()
(gdb) bt f
#0 free (ptr=0x805e9d8) at jemalloc.c:4222
No locals.
#1 0xb78d847a in NS_CompareVersions (A=0x805e998 "1.9b5pre", B=0xb7b9af1c "1.9b5pre") at nsVersionComparator.cpp:220
va = {numA = 9, strB = 0x805e9db "b5pre", strBlen = 1, numC = 5, extraD = 0x805e9dd "pre"}
vb = {numA = 9, strB = 0x805e9eb "b5pre", strBlen = 1, numC = 5, extraD = 0x805e9ed "pre"}
A2 = 0x805e9d8 "1"
B2 = 0x805e9e8 "1"
result = 0
a = 0x0
b = 0x0
#2 0xb714be01 in XRE_main (argc=1, argv=0xbf917b34, aAppData=0x805dfe8) at nsAppRunner.cpp:2568
rv = 0
ar = <value optimized out>
override = 0x0
appData = {<nsXREAppData> = {size = 56, directory = 0x805e118, vendor = 0x805ea28 "Mozilla",
name = 0x805ea08 "Firefox", version = 0x805ea18 "3.0b5pre", buildID = 0x805e988 "2008031618",
ID = 0x805e208 "{ec8030f7-c20a-464f-9b0e-13a3a9e97384}", copyright = 0x805e238 "Copyright (c) 1998 - 2008 mozilla.org",
flags = 6, xreDirectory = 0x805e190, minVersion = 0x805e998 "1.9b5pre", maxVersion = 0x805e9a8 "1.9b5pre",
crashReporterURL = 0x805e268 "https://crash-reports.mozilla.com/submit", profile = 0x0}, <No data fields>}
iniFile = {<nsCOMPtr_base> = {mRawPtr = 0x805e298}, <No data fields>}
localIniFile = {<nsCOMPtr_base> = {mRawPtr = 0x805e298}, <No data fields>}
parser = {
mSections = {<nsBaseHashtable<nsDepCharHashKey,nsAutoPtr<nsINIParser_internal::INIValue>,nsINIParser_internal::INIValue*>> = {<nsTHashtable<nsBaseHashtableET<nsDepCharHashKey, nsAutoPtr<nsINIParser_internal::INIValue> > >> = {mTable = {
ops = 0xb7b99cd4, data = 0x0, hashShift = 28, maxAlphaFrac = 192 '\300', minAlphaFrac = 64 '@', entrySize = 12,
entryCount = 1, removedCount = 0, generation = 0,
entryStore = 0x81050c0 ""}}, <No data fields>}, <No data fields>}, mFileContents = {mRawPtr = 0x805e478 "[Build"}}
i = <value optimized out>
#3 0x08049033 in ?? ()
No symbol table info available.
#4 0x00000001 in ?? ()
No symbol table info available.
#5 0xbf917b34 in ?? ()
No symbol table info available.
#6 0x0805dfe8 in ?? ()
No symbol table info available.
#7 0x00000000 in ?? ()
No symbol table info available.
You'll need symbols, but this should mean that someone corrupted the heap. Start by looking for binary extensions. (Please do get debug symbols with --enable-debugger-info-modules.)
Reporter | ||
Comment 2•17 years ago
|
||
Building both xulrunner and firefox with --enable-debugger-info-modules=yes doesn't add anything to the traceback.
Building with --enable-debug as well doesn't work with --with-libxul-sdk, because firefox wants -lxpcom_core, which xulrunner does not provide (it builds it as a .a linked inside libxul.so).
If I drop --with-libxul-sdk, it builds fine with --enable-debug but it doesn't crash, which is expected since the crash happens when NS_CompareVersions() compares the GRE versions.
whose libxul are you using?
--disable-strip ?
Component: General → XPCOM
Product: Firefox → Core
QA Contact: general → xpcom
Summary: crash at startup in NS_CompareVersions → crash at startup in [@ NS_CompareVersions]
Reporter | ||
Comment 4•17 years ago
|
||
I've identified the commit that triggered this crash.
It started when bug 418016 landed.
http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2008-03-14+08%3A30%3A00&maxdate=2008-03-14+08%3A40%3A00&cvsroot=%2Fcvsroot
If I back this one out, it no longer crashes.
Comment 5•17 years ago
|
||
This is breaking users of --with-libxul-sdk (which includes the Linux distros). This needs to be fixed for b5 since Ubuntu (at least) actually ships our betas.
Flags: blocking1.9?
Priority: -- → P1
Summary: crash at startup in [@ NS_CompareVersions] → crash at startup in [@ NS_CompareVersions] when using --with-libxul-sdk
Comment 6•17 years ago
|
||
For some reason free()ing the pointer we just allocated with strdup() earlier in this method fails. Seems odd, unless we suddenly started using jemalloc in between.
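For readers following along, the pattern being described can be sketched in C. This is a minimal illustration, not the code in nsVersionComparator.cpp: count_version_parts is a hypothetical helper, and the only things taken from the backtrace are that the version string is duplicated, split in place (A2 points at "1" inside the copy), and then handed to free().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, NOT the real NS_CompareVersions: counts the
 * dot-separated parts of a version string by splitting a private copy. */
int count_version_parts(const char *version)
{
    assert(version != NULL);

    /* The copy is allocated by whichever malloc strdup() resolves to. */
    char *copy = strdup(version);
    if (copy == NULL)
        return -1;

    int parts = 1;
    for (char *dot = strchr(copy, '.'); dot != NULL; dot = strchr(dot + 1, '.')) {
        *dot = '\0';   /* split in place, so the copy now starts with "1" */
        parts++;
    }

    /* This must reach the same allocator that served strdup() above. */
    free(copy);
    return parts;
}
```

The sketch is only correct if strdup() and free() are bound to the same malloc implementation. If the two calls end up resolved to different allocators, free() is handed a pointer it never gave out.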
Reporter | ||
Updated•17 years ago
|
Priority: P1 → --
Updated•17 years ago
|
Priority: -- → P1
Assignee | ||
Comment 7•17 years ago
|
||
There is insufficient context provided in this bug report for me to understand what the actual report is (or maybe I'm lacking assumed background knowledge). Is the crash in Firefox? In another application that links against libxul? If another application, is libxul linked against, or is dlopen() used?
The fact that gdb was unable to find sources for jemalloc.c in the backtrace makes me think that the wrong source tree was being used while examining the program state. Lacking more specific information, my guess is that the crash is due to a program loading libxul via dlopen(), which causes the system malloc implementation to be used by the main application, and jemalloc to be used by libxul. If this guess is right, and using dlopen() to load libxul is considered a supported mode of operation, then we have no choice but to keep jemalloc separate from libxul.
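Why a pointer from one allocator cannot safely be passed to another allocator's free() can be shown with a toy. The sketch below assumes nothing about jemalloc or glibc internals: toy_arena, toy_alloc, toy_owns and toy_free are hypothetical names, and the bump allocator is only a stand-in for "an allocator with its own bookkeeping".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy bump allocator standing in for "one malloc implementation". */
typedef struct {
    unsigned char storage[256];
    size_t used;
} toy_arena;

/* Hands out the next `size` bytes of the arena, or NULL when it is full. */
void *toy_alloc(toy_arena *arena, size_t size)
{
    assert(arena != NULL);
    if (size > sizeof arena->storage - arena->used)
        return NULL;
    void *block = arena->storage + arena->used;
    arena->used += size;
    return block;
}

/* True only if ptr lies inside this arena's own storage. */
bool toy_owns(const toy_arena *arena, const void *ptr)
{
    uintptr_t p = (uintptr_t)ptr;
    uintptr_t base = (uintptr_t)arena->storage;
    return p >= base && p < base + sizeof arena->storage;
}

/* Refuses foreign pointers instead of touching its bookkeeping.
 * A bump allocator releases nothing per block, so ownership is all
 * this toy checks. */
bool toy_free(toy_arena *arena, const void *ptr)
{
    return toy_owns(arena, ptr);
}
```

A real free() is not required to perform an ownership check like toy_free does. Handed a foreign pointer, it may simply interpret bookkeeping data that the other allocator never wrote, which would be consistent with a crash inside free() as in the backtrace.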
Assignee | ||
Updated•17 years ago
|
Status: NEW → ASSIGNED
Comment 8•17 years ago
|
||
This bug is about the firefox package in Ubuntu being built --with-libxul-sdk. I am pretty sure it uses dlopen to load libxul.so in some way (otherwise I wouldn't be able to load different libxul.so builds using the gre.d conf mechanism, right?).
BTW, every application using the standalone glue uses dlopen as well.
Comment 9•17 years ago
|
||
Jason, yes, loading libxul via dlopen needs to be a supported method... this is how the standalone glue works:
1) Start up without any dynamic dependencies on XPCOM/libxul
2) read configuration files (or the registry on windows) to find a compatible libxul
3) load it with dlopen
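The three steps above can be sketched in C with the POSIX dlopen interface. This is an illustration under stated assumptions rather than the standalone glue itself: load_symbol is a hypothetical helper, the library path is assumed to come from step 2, the dlopen flags are a guess, and the real glue resolves far more than one symbol.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: loads `library_path` at run time and looks up
 * `symbol` in it. On success returns the symbol's address and stores the
 * library handle in *handle_out. On failure returns NULL and stores NULL. */
void *load_symbol(const char *library_path, const char *symbol, void **handle_out)
{
    assert(library_path != NULL && symbol != NULL && handle_out != NULL);

    /* Step 3: the program has no link-time dependency on the library;
     * it is mapped only now. The flags here are an assumption. */
    *handle_out = dlopen(library_path, RTLD_NOW | RTLD_LOCAL);
    if (*handle_out == NULL)
        return NULL;

    void *address = dlsym(*handle_out, symbol);
    if (address == NULL) {
        dlclose(*handle_out);
        *handle_out = NULL;
    }
    return address;
}
```

Depending on the C library version, linking this may require -ldl. The point relevant to this bug is the timing: by the time dlopen runs, the host program has already been using whatever malloc it was started with.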
Updated•17 years ago
|
Target Milestone: --- → mozilla1.9beta5
Comment 10•17 years ago
|
||
+'ing so we resolve this one way or another re: Linux compat.
Flags: blocking1.9? → blocking1.9+
Updated•17 years ago
|
Assignee: nobody → jasone
Status: ASSIGNED → NEW
Comment 11•17 years ago
|
||
Per shaver, and I agree, this doesn't need to be a beta5 blocker.
Comment 12•17 years ago
|
||
I wonder what else is needed to reproduce this, because I ship nightly --with-libxul-sdk builds in Fedora, and haven't hit this yet...
Updated•17 years ago
|
Keywords: regression
Comment 13•17 years ago
|
||
This patch will simply leak the INIParser that is being used across the potential change in allocators. Alexander, can you try applying this to your packages to see if that fixes the problem?
Reporter | ||
Comment 14•17 years ago
|
||
here are my build options:
xulrunner:
--enable-system-cairo
--disable-system-sqlite
--with-system-nspr
--with-system-nss
--enable-application=xulrunner
--enable-extensions=xmlrpc,venkman,inspector,irc,gnomevfs,cview,tasks,reporter,python/xpcom
--enable-webservices
--enable-safe-browsing
--with-default-mozilla-five-home=/usr/lib/xulrunner-1.9b5pre
--enable-startup-notification
--with-user-appdir=.mozilla
--with-system-jpeg=/usr
--with-system-zlib=/usr
--enable-system-hunspell
--disable-javaxpcom
--disable-crashreporter
--disable-elf-dynstr-gc
--disable-installer
--disable-strip
--disable-strip-libs
--disable-install-strip
--disable-tests
--disable-mochitest
--disable-updater
--enable-optimize
--with-distribution-id=com.ubuntu
and firefox:
--enable-system-cairo
--disable-system-sqlite
--disable-debug
--with-user-appdir=.mozilla
--with-system-jpeg=/usr
--with-system-zlib=/usr
--with-libxul-sdk=/usr/lib/xulrunner-devel-1.9b5pre
--disable-crashreporter
--disable-elf-dynstr-gc
--disable-gtktest
--disable-install-strip
--disable-installer
--disable-profilesharing
--disable-strip
--disable-strip-libs
--disable-tests
--disable-mochitest
--disable-updater
--disable-xprint
--enable-application=browser
--enable-canvas
--enable-default-toolkit=cairo-gtk2
--enable-gnomevfs
--enable-optimize
--enable-pango
--enable-postscript
--enable-svg
--enable-mathml
--enable-xft
--enable-xinerama
--enable-extensions=default,-reporter
--enable-single-profile
--enable-system-myspell
--enable-official-branding
--with-distribution-id=com.ubuntu
Updated•17 years ago
|
Blocks: 418016
Keywords: regression
Updated•17 years ago
|
Flags: blocking1.9? → blocking1.9+
Reporter | ||
Comment 15•17 years ago
|
||
(In reply to comment #13)
> Created an attachment (id=310581) [details]
> Just leak the INIParser, rev. 1
>
> This patch will simply leak the INIParser that is being used across the
> potential change in allocators. Alexander, can you try applying this to your
> packages to see if that fixes the problem?
I just did.
No change: it's still crashing in the exact same place (in the free()).
Updated•17 years ago
|
Assignee: jasone → benjamin
Comment 16•17 years ago
|
||
it sounds like we really want jemalloc to be a shared library that apps on linux just link to. we want the linkage to happen super crazy early, not via dlopen.
Comment 17•17 years ago
|
||
i think we want to move jemalloc from libxul to *-bin.
i spoke w/ stuart and he couldn't come up w/ any arguments against it.
this means that anyone who loads libxul and isn't using jemalloc directly doesn't get the win for it, but in theory they should get a binary that doesn't crash at startup (macosx is an exception, but we aren't supporting jemalloc there anyway).
Comment 18•17 years ago
|
||
The statically-linked library will only work if the posix_memalign stuff continues to work in either configuration. Are there Linux distros which don't have posix_memalign?
Reporter | ||
Comment 19•17 years ago
|
||
It turns out that in Ubuntu, and maybe in Debian too, the builder sets LDFLAGS to -Wl,-Bsymbolic-functions by default, unless it is told otherwise.
I've dropped that and the crash disappeared (with the leak patch applied).
I will retry without the patch.
Thanks to Alexander for noticing this.
Flags: blocking1.9+ → blocking1.9?
Updated•17 years ago
|
Flags: blocking1.9? → blocking1.9+
Reporter | ||
Comment 20•17 years ago
|
||
it works without the patch, just by dropping -Wl,-Bsymbolic-functions.
Comment 21•17 years ago
|
||
(In reply to comment #20)
> it works without the patch, just by dropping -Wl,-Bsymbolic-functions.
>
Is that an acceptable workaround?
Reporter | ||
Comment 22•17 years ago
|
||
This -Wl,-Bsymbolic-functions has been set to improve performance of the whole distribution. See https://wiki.ubuntu.com/DistCompilerFlags
So here, it's a trade-off between two performance improvements. I don't have figures to compare which one brings the better results, but it feels slower now.
It would be nice to find another solution, such as the one proposed by timeless in comment #17.
Comment 23•17 years ago
|
||
Can we get the expedient fix for b5, then leave this open and blocking final for the more correct fix?
Reporter | ||
Comment 24•17 years ago
|
||
We can ship b5 without that ld flag in Ubuntu but please consider a more correct fix before final. I'd be happy to test whatever solution comes up.
Reporter | ||
Updated•17 years ago
|
Priority: P1 → P2
Target Milestone: mozilla1.9beta5 → mozilla1.9
Comment 25•17 years ago
|
||
I don't think this needs to block our release.
We definitely are trying to interpose malloc/free from libxul. According to my understanding of -Bsymbolic-functions, you can't compile libc with -Bsymbolic-functions, but I don't see an obvious reason why it shouldn't work to link libxul with -Bsymbolic-functions.
In any case, it's not something I'm going to have time to fix. Well-tested patches welcome.
Assignee: benjamin → nobody
Comment 26•17 years ago
|
||
On the availability of posix_memalign, it's part of LSB 3.0 and later, and according to https://www.linux-foundation.org/lsb-cert/productdir.php?by_lsb , all of the Linux versions we intend to support ( http://developer.mozilla.org/en/docs/Linux_compatibility_reference ) are LSB 3.
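For readers unfamiliar with the call under discussion, here is a minimal C sketch of how posix_memalign is used. The helper name aligned_block is hypothetical, and nothing here reflects how jemalloc itself provides or wraps the function.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns `size` bytes whose address is a multiple of
 * `alignment`, or NULL on failure. POSIX requires `alignment` to be a power
 * of two and a multiple of sizeof(void *). */
void *aligned_block(size_t alignment, size_t size)
{
    void *block = NULL;

    /* posix_memalign reports failure through its return value, not errno. */
    if (posix_memalign(&block, alignment, size) != 0)
        return NULL;

    return block;   /* release with free() */
}
```

Because the result is released with plain free(), posix_memalign has to come from the same allocator as free(), which is why its behaviour matters for the statically linked configuration raised in comment 18.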
Comment 27•17 years ago
|
||
http://bugs.opensolaris.org/view_bug.do;jsessionid=9e22e2018476faffffffffca050f38d727151?bug_id=6493264
while libxul contains jemalloc, isn't it essentially a libc, and hence something that can't be built w/ -Bsymbolic-functions?
Comment 28•17 years ago
|
||
Taking the advice in Comment #25 and -'ing.
Flags: blocking1.9+ → blocking1.9-
Assignee | ||
Updated•17 years ago
|
Status: NEW → ASSIGNED
Assignee | ||
Updated•17 years ago
|
Assignee: nobody → jasone
Status: ASSIGNED → NEW
Assignee | ||
Comment 29•17 years ago
|
||
This should be fixed by the reversion of bug #418016.
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Updated•13 years ago
|
Crash Signature: [@ NS_CompareVersions]