Closed Bug 423334 Opened 17 years ago Closed 17 years ago

crash at startup in [@ NS_CompareVersions] when using --with-libxul-sdk

Categories

(Core :: XPCOM, defect, P2)

x86
Linux
defect

Tracking

()

RESOLVED FIXED
mozilla1.9

People

(Reporter: fta+bugzilla, Assigned: jasone)

References

Details

(Keywords: crash, regression)

Crash Data

Attachments

(1 file)

This is with trunk 20080316 0643 built with --with-libxul-sdk. 20080314 1505 was ok.

[Switching to Thread 0xb7cea6c0 (LWP 27665)]
free (ptr=0x805e9d8) at jemalloc.c:4222
4222 jemalloc.c: No such file or directory.
        in jemalloc.c
(gdb) bt
#0 free (ptr=0x805e9d8) at jemalloc.c:4222
#1 0xb78d847a in NS_CompareVersions (A=0x805e998 "1.9b5pre", B=0xb7b9af1c "1.9b5pre") at nsVersionComparator.cpp:220
#2 0xb714be01 in XRE_main (argc=1, argv=0xbf917b34, aAppData=0x805dfe8) at nsAppRunner.cpp:2568
#3 0x08049033 in ?? ()
#4 0x00000001 in ?? ()
#5 0xbf917b34 in ?? ()
#6 0x0805dfe8 in ?? ()
#7 0x00000000 in ?? ()
(gdb)
(gdb) bt f
#0 free (ptr=0x805e9d8) at jemalloc.c:4222
No locals.
#1 0xb78d847a in NS_CompareVersions (A=0x805e998 "1.9b5pre", B=0xb7b9af1c "1.9b5pre") at nsVersionComparator.cpp:220
        va = {numA = 9, strB = 0x805e9db "b5pre", strBlen = 1, numC = 5, extraD = 0x805e9dd "pre"}
        vb = {numA = 9, strB = 0x805e9eb "b5pre", strBlen = 1, numC = 5, extraD = 0x805e9ed "pre"}
        A2 = 0x805e9d8 "1"
        B2 = 0x805e9e8 "1"
        result = 0
        a = 0x0
        b = 0x0
#2 0xb714be01 in XRE_main (argc=1, argv=0xbf917b34, aAppData=0x805dfe8) at nsAppRunner.cpp:2568
        rv = 0
        ar = <value optimized out>
        override = 0x0
        appData = {<nsXREAppData> = {size = 56, directory = 0x805e118, vendor = 0x805ea28 "Mozilla", name = 0x805ea08 "Firefox", version = 0x805ea18 "3.0b5pre", buildID = 0x805e988 "2008031618", ID = 0x805e208 "{ec8030f7-c20a-464f-9b0e-13a3a9e97384}", copyright = 0x805e238 "Copyright (c) 1998 - 2008 mozilla.org", flags = 6, xreDirectory = 0x805e190, minVersion = 0x805e998 "1.9b5pre", maxVersion = 0x805e9a8 "1.9b5pre", crashReporterURL = 0x805e268 "https://crash-reports.mozilla.com/submit", profile = 0x0}, <No data fields>}
        iniFile = {<nsCOMPtr_base> = {mRawPtr = 0x805e298}, <No data fields>}
        localIniFile = {<nsCOMPtr_base> = {mRawPtr = 0x805e298}, <No data fields>}
        parser = { mSections = {<nsBaseHashtable<nsDepCharHashKey,nsAutoPtr<nsINIParser_internal::INIValue>,nsINIParser_internal::INIValue*>> = {<nsTHashtable<nsBaseHashtableET<nsDepCharHashKey, nsAutoPtr<nsINIParser_internal::INIValue> > >> = {mTable = { ops = 0xb7b99cd4, data = 0x0, hashShift = 28, maxAlphaFrac = 192 'À', minAlphaFrac = 64 '@', entrySize = 12, entryCount = 1, removedCount = 0, generation = 0, entryStore = 0x81050c0 ""}}, <No data fields>}, <No data fields>}, mFileContents = {mRawPtr = 0x805e478 "[Build"}}
        i = <value optimized out>
#3 0x08049033 in ?? ()
No symbol table info available.
#4 0x00000001 in ?? ()
No symbol table info available.
#5 0xbf917b34 in ?? ()
No symbol table info available.
#6 0x0805dfe8 in ?? ()
No symbol table info available.
#7 0x00000000 in ?? ()
No symbol table info available.
Severity: normal → critical
Keywords: crash
you'll need symbols, but this should mean that someone corrupted the heap. start by looking for binary extensions. (please do get debug symbols --enable-debugger-info-modules.)
building both xulrunner and firefox with --enable-debugger-info-modules=yes doesn't add anything to the traceback. building also with --enable-debug doesn't work with --with-libxul-sdk as firefox wants -lxpcom_core which is not provided by xulrunner (which builds it as a .a linked inside libxul.so). If I drop --with-libxul-sdk, it builds fine with --enable-debug but it doesn't crash, which is expected as the crash is when NS_CompareVersions() compares the GRE versions.
whose libxul are you using? --disable-strip ?
Component: General → XPCOM
Product: Firefox → Core
QA Contact: general → xpcom
Summary: crash at startup in NS_CompareVersions → crash at startup in [@ NS_CompareVersions]
This is breaking users of --with-libxul-sdk (which includes the Linux distros). This needs to be fixed for b5 since Ubuntu (at least) actually ships our betas.
Flags: blocking1.9?
Priority: -- → P1
Summary: crash at startup in [@ NS_CompareVersions] → crash at startup in [@ NS_CompareVersions] when using --with-libxul-sdk
For some reason, free()ing the pointer we just allocated with strdup() earlier in this method fails. Seems odd, unless we suddenly started using jemalloc in between.
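For readers unfamiliar with the pattern in frame #1, here is a minimal sketch of it in C. It is not the actual nsVersionComparator.cpp code; `first_version_component` is a hypothetical helper made up for illustration. It copies the version string with strdup(), edits the copy, and releases it with free(). This is only safe if both calls resolve to the same allocator: if strdup() ends up calling one malloc implementation while free() is bound to another, free() is handed a pointer it never allocated.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not the real Mozilla code.
 * Returns the leading numeric component of a version string
 * such as "1.9b5pre", or -1 if the copy could not be allocated. */
long first_version_component(const char *version)
{
    /* Memory comes from whichever malloc strdup() is bound to. */
    char *copy = strdup(version);
    if (!copy)
        return -1;

    /* Cut the copy at the first dot, so "1.9b5pre" becomes "1"
     * (compare A2 = "1" in the backtrace). */
    char *dot = strchr(copy, '.');
    if (dot)
        *dot = '\0';

    long value = strtol(copy, NULL, 10);

    /* Must resolve to the same allocator that strdup() used. */
    free(copy);
    return value;
}
```

In a single-allocator process this is unremarkable code, which is why the crash location alone says little about the cause.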
Priority: P1 → --
Priority: -- → P1
There is insufficient context provided in this bug report for me to understand what the actual report is (or maybe I'm lacking assumed background knowledge). Is the crash in Firefox? In another application that links against libxul? If another application, is libxul linked against, or is dlopen() used? The fact that gdb was unable to find sources for jemalloc.c in the backtrace makes me think that the wrong source tree was being used while examining the program state. Lacking more specific information, my guess is that the crash is due to a program loading libxul via dlopen(), which causes the system malloc implementation to be used by the main application, and jemalloc to be used by libxul. If this guess is right, and using dlopen() to load libxul is considered a supported mode of operation, then we have no choice but to keep jemalloc separate from libxul.
Status: NEW → ASSIGNED
This bug is about the firefox package in Ubuntu being built --with-libxul-sdk. I am pretty sure it uses dlopen to load libxul.so in some way (otherwise I wouldn't be able to load different libxul.so builds using the gre.d conf mechanism, right?). BTW, every application using the standalone glue uses dlopen as well.
Jason, yes, loading libxul via dlopen needs to be a supported method... this is how the standalone glue works:
1) Start up without any dynamic dependencies on XPCOM/libxul
2) read configuration files (or the registry on windows) to find a compatible libxul
3) load it with dlopen
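Step 3 can be sketched roughly as follows. This is an illustration only, not the standalone glue's real code: `load_symbol` is a hypothetical helper, and the library path and symbol name are supplied by the caller. The relevant point for this bug is that by the time dlopen() runs, the host binary has already started with the system allocator, so any malloc/free shipped inside the loaded library arrives late.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: open a shared library at runtime and look up
 * one symbol in it. Returns NULL if the library cannot be loaded or
 * the symbol is not found. */
void *load_symbol(const char *library_path, const char *symbol_name)
{
    void *handle = dlopen(library_path, RTLD_LAZY);
    if (!handle)
        return NULL;

    void *symbol = dlsym(handle, symbol_name);
    if (!symbol)
        dlclose(handle);

    /* On success the handle is deliberately left open: the returned
     * address is only valid while the library stays loaded. */
    return symbol;
}
```

On older glibc versions this needs to be linked with -ldl.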
Target Milestone: --- → mozilla1.9beta5
+'ing so we resolve this one way or another re: Linux compat
Flags: blocking1.9? → blocking1.9+
Assignee: nobody → jasone
Status: ASSIGNED → NEW
per shaver and I agree this doesn't need to be a beta5 blocker
I wonder what else is needed to reproduce this, because I ship nightly --with-libxul-sdk builds in Fedora, and haven't hit this yet...
Blocks: 418016
Keywords: regression
This patch will simply leak the INIParser that is being used across the potential change in allocators. Alexander, can you try applying this to your packages to see if that fixes the problem?
here are my build options:

xulrunner:
--enable-system-cairo --disable-system-sqlite --with-system-nspr --with-system-nss --enable-application=xulrunner --enable-extensions=xmlrpc,venkman,inspector,irc,gnomevfs,cview,tasks,reporter,python/xpcom --enable-webservices --enable-safe-browsing --with-default-mozilla-five-home=/usr/lib/xulrunner-1.9b5pre --enable-startup-notification --with-user-appdir=.mozilla --with-system-jpeg=/usr --with-system-zlib=/usr --enable-system-hunspell --disable-javaxpcom --disable-crashreporter --disable-elf-dynstr-gc --disable-installer --disable-strip --disable-strip-libs --disable-install-strip --disable-tests --disable-mochitest --disable-updater --enable-optimize --with-distribution-id=com.ubuntu

and firefox:
--enable-system-cairo --disable-system-sqlite --disable-debug --with-user-appdir=.mozilla --with-system-jpeg=/usr --with-system-zlib=/usr --with-libxul-sdk=/usr/lib/xulrunner-devel-1.9b5pre --disable-crashreporter --disable-elf-dynstr-gc --disable-gtktest --disable-install-strip --disable-installer --disable-profilesharing --disable-strip --disable-strip-libs --disable-tests --disable-mochitest --disable-updater --disable-xprint --enable-application=browser --enable-canvas --enable-default-toolkit=cairo-gtk2 --enable-gnomevfs --enable-optimize --enable-pango --enable-postscript --enable-svg --enable-mathml --enable-xft --enable-xinerama --enable-extensions=default,-reporter --enable-single-profile --enable-system-myspell --enable-official-branding --with-distribution-id=com.ubuntu
No longer blocks: 418016
Flags: blocking1.9+ → blocking1.9?
Keywords: regression
Blocks: 418016
Keywords: regression
Flags: blocking1.9? → blocking1.9+
(In reply to comment #13)
> Created an attachment (id=310581) [details]
> Just leak the INIParser, rev. 1
>
> This patch will simply leak the INIParser that is being used across the
> potential change in allocators. Alexander, can you try applying this to your
> packages to see if that fixes the problem?

I just did it. No change, it's still crashing, exact same place (in the free()).
Assignee: jasone → benjamin
it sounds like we really want jemalloc to be a shared library that apps on linux just link to. we want the linkage to happen super crazy early, not via dlopen.
i think we want to move jemalloc from libxul to *-bin. i spoke w/ stuart and he couldn't come up w/ any arguments against it. this means that anyone who loads libxul and isn't using jemalloc directly doesn't get the win for it, but in theory they should get a binary that doesn't crash at startup (macosx is an exception, but we aren't supporting jemalloc there anyway).
The statically-linked library will only work if the posix_memalign stuff continues to work in either configuration. Are there Linux distros which don't have posix_memalign?
It occurs that in Ubuntu, and maybe in Debian too, the builder sets LDFLAGS to -Wl,-Bsymbolic-functions by default, unless it is told otherwise. I've dropped that and the crash disappeared (with the leak patch applied). I will retry without the patch. Thanks to Alexander for noticing this.
Flags: blocking1.9+ → blocking1.9?
Flags: blocking1.9? → blocking1.9+
it works without the patch, just by dropping -Wl,-Bsymbolic-functions.
(In reply to comment #20)
> it works without the patch, just by dropping -Wl,-Bsymbolic-functions.
>

Is that an acceptable workaround?
This -Wl,-Bsymbolic-functions has been set to improve performance of the whole distribution. See https://wiki.ubuntu.com/DistCompilerFlags So here, it's a trade-off between two performance improvements. I don't have figures to compare which one brings the best results, but it feels slower now. It would be nice to find another solution, such as the one proposed by timeless in comment #17.
Can we get the expedient fix for b5, then leave this open and blocking final for the more correct fix?
We can ship b5 without that ld flag in Ubuntu but please consider a more correct fix before final. I'd be happy to test whatever solution comes up.
Priority: P1 → P2
Target Milestone: mozilla1.9beta5 → mozilla1.9
I don't think this needs to block our release. We definitely are trying to interpose malloc/free from libxul. According to my understanding of -Bsymbolic-functions, you can't compile libc with symbolic-functions, but I don't see an obvious reason why it shouldn't work to link libxul with Bsymbolic-functions. In any case, it's not something I'm going to have time to fix. Well-tested patches welcome.
Assignee: benjamin → nobody
On the availability of posix_memalign, it's part of LSB 3.0 and later, and according to https://www.linux-foundation.org/lsb-cert/productdir.php?by_lsb , all of the Linux versions we intend to support ( http://developer.mozilla.org/en/docs/Linux_compatibility_reference ) are LSB 3.
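For reference, a minimal sketch of the posix_memalign call being discussed. The helper names `alloc_aligned` and `is_aligned` are made up for illustration and do not come from jemalloc or the Mozilla tree. The function returns 0 on success and an error number otherwise, and memory obtained this way is released with free().

```c
#define _POSIX_C_SOURCE 200112L
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate `size` bytes aligned to `alignment`,
 * which must be a power of two and a multiple of sizeof(void *).
 * Returns NULL on failure; release the result with free(). */
void *alloc_aligned(size_t alignment, size_t size)
{
    void *ptr = NULL;
    if (posix_memalign(&ptr, alignment, size) != 0)
        return NULL;
    return ptr;
}

/* Hypothetical helper: non-zero if `ptr` is a multiple of `alignment`. */
int is_aligned(const void *ptr, size_t alignment)
{
    return ((uintptr_t)ptr % alignment) == 0;
}
```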
http://bugs.opensolaris.org/view_bug.do;jsessionid=9e22e2018476faffffffffca050f38d727151?bug_id=6493264 while libxul contains jemalloc, isn't it essentially a libc, and hence something that can't be built w/ -Bsymbolic-functions?
Taking the advice in Comment #25 and -'ing.
Flags: blocking1.9+ → blocking1.9-
Status: NEW → ASSIGNED
Assignee: nobody → jasone
Status: ASSIGNED → NEW
This should be fixed by the reversion of bug #418016.
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Crash Signature: [@ NS_CompareVersions]