Closed Bug 29182 Opened 25 years ago Closed 24 years ago

If built with --disable-debug Mozilla crashes while registering components

Categories

(Core :: XPCOM, defect, P3)

x86
Linux
defect

Tracking

RESOLVED WORKSFORME

People

(Reporter: vesuri, Assigned: shaver)

Details

(Keywords: crash, relnote, Whiteboard: [dogfood-][nsbeta3-])

Overview Description:

If built with --disable-debug Mozilla crashes while registering its
components.

Steps to Reproduce: Build with --disable-debug, run.

Actual Results: A segmentation fault. GDB output follows, here TestXPC
is run but the same applies to the browser itself:

(gdb) run
Starting program: /home/build/mozilla-build/dist/bin/./TestXPC
(no debugging symbols found)...(no debugging symbols found)...
(no debugging symbols found)...(no debugging symbols found)...
Program received signal SIGSEGV, Segmentation fault.
0x40008714 in _dl_relocate_object (l=0x80ee4d8, scope=0x80ee6dc, lazy=1,
    consider_profiling=0) at ../sysdeps/i386/dl-machine.h:326
326     ../sysdeps/i386/dl-machine.h: No such file or directory.
(gdb) bt
#0  0x40008714 in _dl_relocate_object (l=0x80ee4d8, scope=0x80ee6dc, lazy=1,
    consider_profiling=0) at ../sysdeps/i386/dl-machine.h:326
#1  0x402a8df4 in dl_open_worker (a=0xbffff0bc) at dl-open.c:182
#2  0x40009bde in _dl_catch_error (errstring=0xbffff0b8,
    operate=0x402a8b84 <dl_open_worker>, args=0xbffff0bc) at dl-error.c:141
#3  0x402a8f55 in _dl_open (
    file=0x80c1a18 "/home/build/mozilla-build/dist/bin/components/libnspng.so",
mode=1, caller=0x4011f62b) at dl-open.c:232
#4  0x40178ffd in dlopen_doit (a=0xbffff20c) at dlopen.c:41
#5  0x40009bde in _dl_catch_error (errstring=0x804ec28,
    operate=0x40178fd0 <dlopen_doit>, args=0xbffff20c) at dl-error.c:141
#6  0x40179642 in _dlerror_run (operate=0x40178fd0 <dlopen_doit>,
    args=0xbffff20c) at dlerror.c:125
#7  0x4017903e in __dlopen_check (
    file=0x80c1a18 "/home/build/mozilla-build/dist/bin/components/libnspng.so",
mode=1) at dlopen.c:53
#8  0x4011f62b in pr_LoadLibraryByPathname ()
   from /home/build/mozilla-build/dist/bin/libnspr4.so
#9  0x4011f550 in PR_LoadLibraryWithFlags ()
   from /home/build/mozilla-build/dist/bin/libnspr4.so
#10 0x4011f589 in PR_LoadLibrary ()
   from /home/build/mozilla-build/dist/bin/libnspr4.so
#11 0x400c8668 in nsLocalFile::Load ()
   from /home/build/mozilla-build/dist/bin/libxpcom.so
#12 0x400e2a32 in nsDll::Load ()
   from /home/build/mozilla-build/dist/bin/libxpcom.so
#13 0x400dcb75 in nsNativeComponentLoader::SelfRegisterDll ()
   from /home/build/mozilla-build/dist/bin/libxpcom.so
#14 0x400dd2f8 in nsNativeComponentLoader::AutoRegisterComponent ()
   from /home/build/mozilla-build/dist/bin/libxpcom.so
#15 0x400dc97e in nsNativeComponentLoader::RegisterComponentsInDir ()
   from /home/build/mozilla-build/dist/bin/libxpcom.so
#16 0x400dc889 in nsNativeComponentLoader::AutoRegisterComponents ()
   from /home/build/mozilla-build/dist/bin/libxpcom.so
#17 0x400daf9a in nsComponentManagerImpl::AutoRegister ()
   from /home/build/mozilla-build/dist/bin/libxpcom.so
#18 0x400dffe2 in nsComponentManager::AutoRegister ()
   from /home/build/mozilla-build/dist/bin/libxpcom.so
#19 0x8049702 in JS_PushArguments ()
#20 0x804aea1 in JS_PushArguments ()
#21 0x401f7711 in __libc_start_main (main=0x804ae68 <JS_PushArguments+6260>,
    argc=1, argv=0xbffffb94, init=0x80492d0 <_init>, fini=0x804b594 <_fini>,
    rtld_fini=0x40009df4 <_dl_fini>, stack_end=0xbffffb8c)
    at ../sysdeps/generic/libc-start.c:90
(gdb)

Expected results: Successful component registration and startup.

Build Date & Platform: I first encountered the bug in November and it has
followed me all the way. The latest I've tried is

-rw-rw-r--   1 22       21770161 Feb 24 09:51 mozilla-source.tar.gz

I'm running on i686-pc-linux-gnu (kernel 2.2.14, glibc 2.1.2, gcc 2.95.2),
a self-built system that runs all other programs without any problems.

Additional information: Mozilla works just fine if built without
--disable-debug. I successfully built the same sources with --enable-optimize
--enable-strip-libs --enable-x11-shm, and I'm writing this bug report with the
resulting build. However, if I use --disable-debug it just won't work, and it
never has worked here.
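Assuming the options are passed through a .mozconfig (as mentioned later in this thread), the two configurations being compared might look like the fragment below. It is a sketch only: the reporter does not say whether the other options were kept when adding --disable-debug, so leaving them unchanged is an assumption.

```shell
# .mozconfig fragment (sketch). Options the reporter says produce a working build:
ac_add_options --enable-optimize
ac_add_options --enable-strip-libs
ac_add_options --enable-x11-shm

# Adding this line is what reportedly leads to the crash during component
# registration (assumption: every other option above stays the same).
ac_add_options --disable-debug
```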

Don't hesitate to ask if you've got questions.
Now that glibc 2.1.3 is officially out, I compiled both GCC (for building
libstdc++) and Mozilla against it. However, the bug remains. The backtrace is a
little different now: it no longer dies in _dl_relocate_object but in
_dl_lookup_symbol instead:

(gdb) run
Starting program: /home/build/mozilla-build/dist/bin/TestXPC 

Program received signal SIGSEGV, Segmentation fault.
0x40007538 in _dl_lookup_symbol (
    undef_name=0x7da5d989 <Address 0x7da5d989 out of bounds>, ref=0xbfffee9c, 
    symbol_scope=0x80f03b4, 
    reference_name=0x8076940 "./components/libnspng.so", reloc_type=18)
    at ../sysdeps/i386/i686/dl-hash.h:76
76      ../sysdeps/i386/i686/dl-hash.h: No such file or directory.
(gdb) bt
#0  0x40007538 in _dl_lookup_symbol (
    undef_name=0x7da5d989 <Address 0x7da5d989 out of bounds>, ref=0xbfffee9c, 
    symbol_scope=0x80f03b4, 
    reference_name=0x8076940 "./components/libnspng.so", reloc_type=18)
    at ../sysdeps/i386/i686/dl-hash.h:76
#1  0x400092a3 in _dl_relocate_object (l=0x80f01b0, scope=0x80f03b4, lazy=1, 
    consider_profiling=0) at ../sysdeps/i386/dl-machine.h:326
#2  0x402fae64 in dl_open_worker (a=0xbffff03c) at dl-open.c:182

-Vesa
The loader on older Linux versions wasn't thread-safe and we had to use. We got
it fixed in the Red Hat 6.0 distribution. This looks very much like the same bug:
a crash when loading DLLs.

Shaver, does this look like the same bug? Any suggestions on what needs to be
installed?
Assignee: dp → shaver
marking new and cc'ing blizzard. vesuri, are you still seeing the problem?
Status: UNCONFIRMED → NEW
Ever confirmed: true
Yes, the problem is still there. I just downloaded the latest sources (Apr 23
19:03) and built with --disable-debug. Backtrace goes through
pr_LoadLibraryByPathname() -> __dlopen_check -> _dlerror_run -> _dl_catch_error
-> dlopen_doit -> _dl_open -> _dl_catch_error -> dl_open_worker -> 
_dl_relocate_object and dies there:

#0  0x40009234 in _dl_relocate_object (l=0x808cad8, scope=0x808ccdc, lazy=1, 
    consider_profiling=0) at ../sysdeps/i386/dl-machine.h:326
326     ../sysdeps/i386/dl-machine.h: No such file or directory.

And again, this is GCC 2.95.2 with binutils 2.9.5.0.35 and glibc-2.1.3. Hmm,
reloc fails. This is weird, indeed.
Adding crash keyword.
Keywords: crash
reassigning to the (hopefully) correct instance of shaver
Assignee: shaver → shaver
--disable-debug shouldn't matter, if it's the same bug, and I don't think
TestXPC uses more than one thread.

It's always libnspng.so, though, which is interesting.  If you remove that
library, what happens?  Does your build use the system libpng, or build the one
out of the Mozilla tree?
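One way to take XPCOM out of the picture when asking whether libnspng.so itself is the problem is a tiny standalone loader. The sketch below is not from this thread: `probe_component` is a hypothetical helper that calls `dlopen()` with `RTLD_LAZY` (on glibc `RTLD_LAZY` is 1, matching the `mode=1` and `lazy=1` values in the backtraces) and prints `dlerror()` on failure. If the library still crashes inside the same glibc loader frames here, that would point at the library or the loader rather than the component manager.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: load one component library directly, the way the
 * backtrace shows NSPR doing it (dlopen with RTLD_LAZY), without going
 * through nsNativeComponentLoader. Returns 0 on success, -1 on failure. */
int probe_component(const char *path)
{
    void *handle = dlopen(path, RTLD_LAZY);
    if (handle == NULL) {
        /* dlerror() describes the most recent dynamic-loading failure. */
        const char *err = dlerror();
        fprintf(stderr, "dlopen(%s) failed: %s\n", path,
                err != NULL ? err : "unknown error");
        return -1;
    }
    printf("dlopen(%s) succeeded\n", path);
    dlclose(handle);
    return 0;
}
```

Wrapped in a small `main` that passes `argv[1]` (for example the components/libnspng.so path from the backtrace), this can be built with something like `cc probe.c -o probe -ldl`; glibc releases of that era keep `dlopen` in libdl, so `-ldl` is required there.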
Status: NEW → ASSIGNED
Assignee: shaver → vesuri
Status: ASSIGNED → NEW
I need answers to those questions to fix this, so I'm reassigning to Vesa until
I can get them.
                                         
Status: NEW → ASSIGNED
This bug is the same as 41414.
worksforme
And SuSE is going to ship 7.0 with a debugging build because it wouldn't
work for hundreds of thousands of people using SuSE Linux.
Maybe I could do you a favour and send you a SuSE Linux distribution
to reproduce it yourself?

BTW: It's not a SuSE problem, because I'm having the same problem on my
home-brewed distribution, which is completely different. The only thing
the two systems have in common is the latest glibc 2.1.3.
Didn't the reporter of this bug (Vesa Halttunen, vesuri@jormas.com)
say he was using glibc 2.1.2?
A few lines above, he stated that the bug is still there for him with
glibc 2.1.3. What are the people who do NOT see the bug using?
I did use glibc 2.1.2 when I first reported the bug but if you read my comments
you'll notice the bug is also present when using glibc 2.1.3. I haven't had
time to look into this and I'm sorry for that.

To me it seems as if it might be a bug in either
glibc or in the dynamic loader (ld.so). Another thing popped into my mind as
well; is it possible that this comes up if Mozilla gets linked against the
libpng distributed with Mozilla but the shared library in the system is a
different version? I did have some strange crashing problems when I updated my
libpng a few weeks ago. This is pure speculation, since I'm currently not able
to test it (I'm in the US, not at home), but I think I will when I get
back. As for Daniel Egger's comment, I'd like to emphasize that I have a
home-brewed distribution as well: all binaries on my system have been built by
me, and all of them work just fine. If someone else is willing to look into
this, please go ahead, but I do have it on my mind, at a lower priority =)
I reported bug 41414.  I'm using Redhat 6.2 + all the errata/security updates +
some of the Rawhide (redhat beta) rpms.  But no one on redhat6.2 has been able
to reproduce.  I dunno why.  I'm using glibc 2.1.3 and I compile without any
options regarding libpng in my .mozconfig.  I see this bug in the nightlies also
but I don't know what options regarding libpng are included there.  If someone
wants I can send them an "rpm -qa".
I'm on Red Hat 6.2 with glibc 2.1.3.  Pretty vanilla system.
*** Bug 41414 has been marked as a duplicate of this bug. ***
Last night I downloaded a talkback build for the first time and generated around
42 talkback reports for this bug.  How do I go about getting these connected to
this bug? Thanks.
*** Bug 47046 has been marked as a duplicate of this bug. ***
Shaver, please see my talkback reports in bug 47046.
Assignee: vesuri → shaver
Severity: normal → critical
Status: ASSIGNED → NEW
GetNewOrUsedProxy?  Smells like something jband would know more about.

It's not clear to me why this bug is assigned to me, but I'm pretty sure I'm not
going to have time to fix it.
Putting on [dogfood-] radar.  Not critical to everyday use. 
Whiteboard: [dogfood-]
GetNewOrUsedProxy would be dougt's world.
Boy, this bug has morphed.  The first stack looks like it is the JS component
loader.  Then Vesa Halttunen adds another stack that looks different (maybe a
build problem), then states it was from source that was pulled on Apr 23!  Lastly,
Daniel Egger and Trudelle believe that this is a dup of 41414 and 47046. Ugh.

I am going to reopen 47046 and take a look at it.  
I'm seeing this bug on a glibc 2.1.3 box. Added myself to the cc list.
I don't follow, dougt: how does the first stack look like the JS component
loader?  I see the native component loader on the stack, trying to load our PNG
plugin -- and possibly barfing on the entrained libpng.so, though I never really
got an answer to that question -- but there's no JS component loader anywhere.

I don't know what to do with this bug: I don't have enough information to fix
it, it doesn't happen for me, and I'm not motivated to go install SuSE 7.x -- or
duplicate Vesa's frankensystem.  I'm pretty tempted to mark it WONTFIX, since I
won't fix it.

(It would be interesting to see if just turning off symbolic debugging
information, versus turning off all the -DDEBUG stuff, makes a difference. 
Also, what happens if you rebuild just libnspng.so without debugging?)
Ok, new info: after a recompile of glibc it finally works on my system without a
debugging build of glibc, though it still does not work on a standard SuSE 7.0 system.

Mike: No, recompiling libns* with debugging doesn't help.

mike - I don't know what I was smoking.
Marking nsbeta3- per pdt review.
Whiteboard: [dogfood-] → [dogfood-][nsbeta3-]
Daniel: Let me make sure I get this straight: you recompiled the same version of
glibc, and now it works?  What options do you use with glibc?  SuSE?
It is unclear to me whether this bug requires a "developer" or "user"
release note for Netscape 6 RTM. If anyone feels it does, can they please
draft one and then nominate it with the relnote-user or relnote-devel strings in
the Status Whiteboard.

Thanks :-)

Gerv
Keywords: relnote2
Options? I haven't used any special options for glibc ... and I use
--disable-debug --enable-optimize for mozilla normally.
leger@netscape.com is no longer a valid email. reassigning qa contact to the
component's default.
QA Contact: leger → rayw
Can anyone please verify this bug or close it? I haven't seen it for ages now
so I guess it has been fixed in the meantime.
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → WORKSFORME
I'm going to mark this as WORKSFORME since shaver is MIA right now and I am his 
stunt double.
Component: XPCOM Registry → XPCOM
QA Contact: rayw → xpcom