Closed Bug 4303 Opened 25 years ago Closed 25 years ago

[PP]Running the first time (without ~/.mozilla/registry) crashes both viewer and apprunner


(NSPR :: NSPR, defect, P1)



(Not tracked)



(Reporter: ramiro, Assigned: alecf)


I get the following error:

*** SilentDownload is being registered
Load(/builds3/ramiro/s/mozilla/dist/bin/components/ FAILED with
error: /builds3/ramiro/s/mozilla/dist/bin/components/ undefined
symbol: ZIP_CloseArchive
nsComponentManager: Load( FAILED with error:
/builds3/ramiro/s/mozilla/dist/bin/components/ undefined symbol:
Aborted (core dumped)
There are two separate problems here. The first is a problem with the XPinstall
build. But XPCOM should not dump core because it can't load one of the files in
the components directory.
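To illustrate the second point, here is a minimal sketch of a loader loop that skips a component it cannot load instead of aborting. This is not the XPCOM or nsComponentManager code; `load_components` and its parameters are hypothetical names, and plain `dlopen()` stands in for whatever NSPR call the real loader uses.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not XPCOM code: try to load every path, skip the
 * ones that fail, and return how many handles were stored in `handles`.
 * `handles` must have room for `count` entries. */
static size_t load_components(const char *const *paths, size_t count,
                              void **handles)
{
    size_t loaded = 0;
    size_t i;

    assert(paths != NULL && handles != NULL);

    for (i = 0; i < count; i++) {
        /* RTLD_NOW resolves all symbols up front, so a library with an
         * undefined symbol fails here rather than later at call time. */
        void *handle = dlopen(paths[i], RTLD_NOW);
        if (handle == NULL) {
            /* Reading dlerror() reports the failure and clears it. */
            const char *msg = dlerror();
            fprintf(stderr, "skipping %s: %s\n", paths[i],
                    msg ? msg : "(no message)");
            continue; /* one bad component must not stop the others */
        }
        handles[loaded++] = handle;
    }
    return loaded;
}
```

A caller would pass the list of files found in the components directory and carry on with whatever subset loaded.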
dp is working on the dlopen problem.

I see the same behavior in both dual and single cpu machines, fyi.
This looks related to 4306.
Assignee: dveditz → larryh
Component: XPInstall → NSPR
For the record, the problem is that Linux seems to core dump on dlopen() if a
previous dlopen() failed.

It is suspected that PR_LoadLibrary() isn't clearing the error if there was one,
and that is causing this weirdness.
Ccing shaver in.
dp / ramiro are you sure it core dumps?  For me, it was just exiting with 1.
I ran the dlltest.c test case again on both RedHat 5.2 and RH52 with kernel
2.2.1. I was unable to reproduce the symptom. ... OK, dlltest.c is pretty
simple. So, I hacked dlltest.c to do a PR_LoadLibrary() on a known nonexistent
library, then did another PR_LoadLibrary() on a library known to exist. The test

Absent a core dump, stack trace, other diagnostic data, I'm stuck. Somebody got
more data?
Today, 3-29-99, I am not seeing this problem.

I removed my registry and ran apprunner, and it did not exit 1.

I'll double check and report back
This problem has been sporadic for the last few weeks.  Sometimes I see it if I
recompile a library and forget a symbol (e.g. forget the =0 in an nsI*.h
interface file); then when the library load fails, the app either crashes or
exits, and sometimes will continue to do so for the first run or two after you
remove the registry file.

A clean build with all correct libraries (no unknown symbols) usually won't
demonstrate the problem; you have to have one or more libraries with missing
symbols or other problems.
this is a wily bug.  it seems that every other checkout and build can switch
behaviour, depending on how the libraries got linked.

we should keep this bug open until we are sure it is squashed.
More traffic on seamonkey-eng. Talked to dp to get a better understanding of
what is going on. ... Here it is as I understand it:

Client says PR_LoadLibrary(). The library being loaded itself needs some
library. The needed library has unresolved symbols. Subsequent calls to
PR_LoadLibrary() fail even if no error would otherwise occur.

dp suspected that dlerror() was not being called by NSPR after the first
error, and that the man page says the error must be cleared by a call to
dlerror() before other dlopen() calls can succeed. By inspection, I believe we
determined that PR_LoadLibrary() does call dlerror() via DLLErrorInternal() for
Linux after dlopen() fails. Somebody check my work: ...nsprpub/pr/src/linking/prlink.c.
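
The sequence under suspicion can be sketched outside NSPR with the raw dl calls: a load that fails, a dlerror() call to consume the error, then a second load that ought to succeed. This is only a sketch of the scenario, not the dlltest.c test case or the prlink.c code; `second_load_survives_failure` is a hypothetical name, and the missing path is whatever the caller supplies.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not NSPR code: attempt a load that is expected to
 * fail, consume the pending error with dlerror(), then attempt a second
 * load. Returns 1 if the first load failed and the second succeeded. */
static int second_load_survives_failure(const char *missing_path)
{
    void *first;
    void *second;
    const char *msg;

    assert(missing_path != NULL);

    first = dlopen(missing_path, RTLD_LAZY);
    if (first != NULL) {
        /* The path unexpectedly exists; nothing to demonstrate. */
        dlclose(first);
        return 0;
    }

    /* Reading dlerror() both reports the failure and clears it. */
    msg = dlerror();
    fprintf(stderr, "first dlopen failed: %s\n", msg ? msg : "(no message)");

    /* A NULL filename asks dlopen() for a handle to the main program,
     * so this load does not depend on any particular library existing. */
    second = dlopen(NULL, RTLD_LAZY);
    if (second == NULL) {
        return 0;
    }
    dlclose(second);
    return 1;
}
```

On a system where the dl routines behave as documented, the function should return 1 for any path that does not exist; the behavior reported in this bug would show up as a crash or a 0 return on the second load.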

I'm gonna try to construct a test case that operates as described above to see
if I can reproduce the problem. ... Target: RH Linux 5.2, kernel 2.0.36. Will
that do it?
larry: yes, that's a good setup to test.

If I understood dp correctly, the problem is that if dlerror() returns a real
error (non NULL) it needs to be cleared before calling other dl functions.

Is this right, dp ?  According to the man page below, if dlerror() is called
following a dl call that resulted in an error, it will return NULL.

So, is this a bug in dlerror()?  Does it not behave as the man page says?

man dlerror:

       If dlopen fails for any reason, it returns NULL.  A  human
       readable  string  describing  the  most  recent error that
       occurred from any of the dl  routines  (dlopen,  dlsym  or
       dlclose) can be extracted with dlerror().  dlerror returns
       NULL if no errors have occurred  since  initialization  or
       since  it  was last called.  (Calling dlerror() twice
       consecutively will always result in the second call
       returning NULL.)
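
The parenthetical in that man page excerpt is easy to check directly. The following is a small sketch, assuming a path that does not exist is supplied by the caller; `dlerror_clears_on_read` is a hypothetical name used only for this illustration.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical check: after a failed dlopen(), the first dlerror() call
 * should return a message and the second should return NULL.
 * Returns 1 if that is what happens, 0 otherwise. */
static int dlerror_clears_on_read(const char *missing_path)
{
    void *handle;
    int first_was_set;
    int second_was_null;

    assert(missing_path != NULL);

    handle = dlopen(missing_path, RTLD_LAZY);
    if (handle != NULL) {
        /* The path unexpectedly exists, so no error is pending. */
        dlclose(handle);
        return 0;
    }

    /* Record only whether a message was present; the string itself may be
     * invalidated by the next dl call. */
    first_was_set = (dlerror() != NULL);
    second_was_null = (dlerror() == NULL);

    return first_was_set && second_was_null;
}
```

If this returns 0 on an affected machine, that would point at dlerror() itself rather than at NSPR.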

One workaround to try might be to enable xpinstall (or other broken components)
on unix and try it on another platform, Solaris with gcc 2.7 for example, and
see if dlerror() is broken only on unix.
I was unable to reproduce the problem with my build from Friday April 9. Does
anyone have a reliable way to reproduce this?
Summary: Running the first time (without ~/.mozilla/registry) crashes both viewer and apprunner → [PP]Running the first time (without ~/.mozilla/registry) crashes both viewer and apprunner
Assignee: larryh → dveditz
Sigh. ... I have been unable to reproduce this.
I'm giving this back to dveditz.
NSPR now has its own Bugzilla product.  Moving this bug to the NSPR product.
Target Milestone: M6
This looks like it's working for me, too.
Putting this on M6 radar.
Assignee: dveditz → dp
This bug seems to have morphed into a dlopen() bug
Assignee: dp → larryh
Here is a way to reproduce this:

- cd intl/strres/src
- apply the following patch to nsStringBundle.cpp
- change this to #if 1
- gmake

Now if you run apprunner, you will see the problem.

Here is the patch. All it does is reference an undefined symbol in the component.

Index: nsStringBundle.cpp
RCS file: /cvsroot/mozilla/intl/strres/src/nsStringBundle.cpp,v
retrieving revision 1.12
diff -c -r1.12 nsStringBundle.cpp
*** nsStringBundle.cpp	1999/04/22 07:32:49	1.12
--- nsStringBundle.cpp	1999/05/01 14:16:52
*** 64,69 ****
--- 64,76 ----

+ #if 0
+   // XXX specially for larryh to
+   // XXX reproduce the linux dlopen() crash bug# 5795
+   extern int undefined_symbol;
+   undefined_symbol = 1;
+ #endif
    mProps = nsnull;

    nsINetService* pNetService = nsnull;
Target Milestone: M6 → M7
I don't see progress on this one.

Larry, can we plan to get this in for M7?
If you need help like a tree to debug and stuff, let me know.
not likely to show in m6 release builds.
need to get this fixed as soon as we can in m7
Target Milestone: M7 → M8
would like to get this in m8.
Target Milestone: M8 → M9
Assignee: larryh → alecf
if this instance of the problem is seen on 5.2 only,
alecf and leaf are posting the upgrade minimums that we need to
resolve this problem, and they are documented in bug 8849,
so we can close this bug out.

we need to drop support for standard RH 5.2 installations.
Closed: 25 years ago
Resolution: --- → DUPLICATE
oh, HERE is this bug...yes, this is the same as 8849.
I'm marking dupe.

*** This bug has been marked as a duplicate of 8849 ***
Target Milestone: M9 → ---