Bug 286598
Access violation on unloading of the 5.08 Netscape Directory SDK for C when using SSL
Status: Open (NEW)
Opened 20 years ago; updated 10 years ago
Categories: Directory :: LDAP C SDK, defect
Tracking: Not tracked
People: Reporter amanda.bortolin; Assigned mcs
Attachments: 3 files
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)
Build Identifier:
I'm having an issue on Solaris: when we unload our shared library that uses
the Netscape SDK, the Netscape SDK shared libraries are still accessing each
other while they are being unloaded. For example, we unload our library, then
the OS unloads libprldap50.so, and then we get an access violation. After
analyzing the core, we found that the nspr4 lib is trying to access the
libprldap50 lib after it has been unloaded. We only seem to see this behaviour
when SSL is being used.
Here is a bit more info for you.
I took a pmap before our shared component was unloaded, plus a pmap and a
pstack of the core when the access violation happened.
Here's what you can deduce from the logs.
pmap_before.txt:
libnspr4.so is loaded at 71630000
libprldap50.so is loaded at 70D60000
libsecLDAP.so is loaded at 6C280000 <- this is our shared lib that uses the Netscape LDAP SDK
pmap_access_violation.txt:
libsecLDAP.so and libprldap50.so have been unloaded.
libnspr4.so is still loaded at 71630000
pstack_access_violation.txt:
70d62930 ???????? (f71f80, 0, 1, 0, 14, 7166fa58)
716570e8 _pt_thread_death (f71f80, 71657064, 71670094, 7f4523ac, 0, 82) + 84
7fb653d0 tsd_exit (0, 0, 7fb15254, 70ee1860, 15a60, 0) + 70
7fb58bdc _thrp_exit (727f0800, 203dc, 0, 0, 14e54, 7fb152f4) + 68
7fb57c4c _t_cancel (0, 7fb7a074, 0, 7fb657b4, 1, 7fb937f8) + 2c
7fb657b4 _lwp_start (0, 0, 0, 0, 0, 0)
Looks like the nspr4 lib (in _pt_thread_death) is trying to call something at
70d62930, just above the address where the prldap50 lib (70D60000) USED to be
loaded.
I assume what is happening is that all the Netscape SDK shared libs are loaded
by the OS when our libsecLDAP.so lib is loaded (since we link against them).
When our shared lib is unloaded, the OS starts unloading the Netscape SDK libs,
but they don't realize they are being unloaded and continue doing things.
I have also seen this on Windows, but it is harder to reproduce there.
Remember, this only *seems* to happen with SSL turned on.
Reproducible: Sometimes
Steps to Reproduce:
1. Load a program that uses the Netscape SDK (5.08) and initialize SSL.
2. Unload the program.
3. See the access violation on unloading.
Actual Results:
Access violation.
Expected Results:
All SDK components unload successfully.
Comment 4 • 20 years ago • Assignee
I added Wan-Teh and Rich to the bug CC (Wan-Teh is one of the leads for NSPR and
NSS issues and Rich is one of the leads on Netscape->Red Hat Directory Server).
I agree with your analysis, but I am not sure why this is happening. Neither
libprldap nor libldap creates any threads. Perhaps you are unloading the
libraries before all LDAP-related threads have exited? I think that would cause
this kind of crash, because libprldap installs an NSPR thread-private data
destructor function that gets called when a thread exits. See:
http://lxr.mozilla.org/mozilla/source/directory/c-sdk/ldap/libraries/libprldap/ldappr-threads.c#390
and:
http://lxr.mozilla.org/mozilla/source/directory/c-sdk/ldap/libraries/libprldap/ldappr-threads.c#610
Also make sure you have exactly one ldap_unbind() call for each successful call
to ldap_init().
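To illustrate the mechanism described above, here is a minimal sketch, assuming POSIX threads as a stand-in for NSPR's thread-private data API (the `_pt_thread_death` and `tsd_exit` frames in the pstack suggest NSPR on Solaris sits on the same kind of pthread key destructor chain). A destructor registered with `pthread_key_create()` runs when a thread that stored a value exits; if the library that owns that destructor has already been unmapped, the call lands in unmapped memory. The names `tsd_destructor`, `worker`, and `run_tsd_demo` are hypothetical.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical illustration: a pthread key stands in for the NSPR
 * thread-private data index that libprldap registers. */
static pthread_key_t tsd_key;
static int destructor_calls = 0;
static pthread_mutex_t calls_lock = PTHREAD_MUTEX_INITIALIZER;

/* Called automatically at thread exit for any thread that stored a
 * non-NULL value under tsd_key. In the bug, the equivalent function
 * lives in libprldap50.so; if that library were already unmapped, this
 * call would jump into unmapped memory, like the pstack's top frame. */
static void tsd_destructor(void *value)
{
    free(value);
    pthread_mutex_lock(&calls_lock);
    destructor_calls++;
    pthread_mutex_unlock(&calls_lock);
}

/* A thread that attaches per-thread data, as an LDAP call might. */
static void *worker(void *arg)
{
    (void)arg;
    pthread_setspecific(tsd_key, malloc(16));
    return NULL; /* returning ends the thread and runs tsd_destructor */
}

/* Returns how many times the destructor ran, or -1 on setup failure. */
int run_tsd_demo(void)
{
    pthread_t t;
    if (pthread_key_create(&tsd_key, tsd_destructor) != 0)
        return -1;
    if (pthread_create(&t, NULL, worker, NULL) != 0) {
        pthread_key_delete(tsd_key);
        return -1;
    }
    pthread_join(t, NULL); /* the thread, and its destructors, are done */
    pthread_key_delete(tsd_key);
    return destructor_calls;
}
```

The important detail is that nothing in the application calls `tsd_destructor` explicitly: the threads library does, at thread exit, which is why the crash can surface long after the last LDAP call.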
(In reply to comment #4)
I have added tracing to my code and have confirmed that we do exactly one
successful unbind for every successful ldapssl_init before we unload.
I've also narrowed this down: it only happens when SSL is configured with
mutual authentication. It does not seem to happen when we use SSL configured
with server authentication.
Not sure where to go from here... I can't think of anything else to try. Help?
Amanda
Comment 6 • 20 years ago • Assignee
Have you tried calling PR_Cleanup() before unloading the LDAP/NSPR/NSS
libraries? See:
http://lxr.mozilla.org/seamonkey/source/nsprpub/pr/src/misc/prinit.c#363
(In reply to comment #6)
No, because I'm not using NSPR directly (I don't link against it or include any
of its headers). It looks like it is the LDAP directory libraries that are
using NSPR. Shouldn't they call PR_Cleanup() on shutdown?
Comment 8 • 20 years ago • Assignee
Yes, in theory libprldap or libssldap could call PR_Cleanup(). The problem is
that many applications (such as the Mozilla browsers and e-mail clients) use
NSPR for other things as well, so it would be bad if the LDAP code called
PR_Cleanup() unexpectedly. We could add a prldap_cleanup() call that calls
PR_Cleanup() but I do not at the moment see a safe way to automatically call
PR_Cleanup() for you. In any case it would be really good to know if calling
PR_Cleanup() fixes the problem you are seeing.
Comment 9 • 20 years ago
A known workaround for this problem is to leak the NSPR shared object by making
an extra explicit dlopen() on it without a corresponding dlclose(). This causes
NSPR never to be unloaded, and hides this problem.
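A minimal sketch of this workaround, assuming a POSIX `dlopen()` (some Linux systems need `-ldl` at link time). The helper name `pin_shared_object` is hypothetical. Each successful `dlopen()` of an already-loaded object returns the existing handle and increments its reference count, so a handle that is never passed to `dlclose()` keeps the object mapped for the life of the process.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: take one extra reference on a shared object and
 * never release it, so the runtime linker cannot unmap the object when
 * the host later dlclose()s the library that depends on it.
 * Returns 0 on success, -1 on failure. */
int pin_shared_object(const char *path)
{
    /* path == NULL yields a handle for the main program itself. */
    void *handle = dlopen(path, RTLD_NOW | RTLD_GLOBAL);
    if (handle == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "dlopen(%s) failed: %s\n",
                path ? path : "(main program)",
                err ? err : "unknown error");
        return -1;
    }
    /* Deliberately leaked: there is no matching dlclose(handle). */
    return 0;
}
```

In this bug's setup the host would call `pin_shared_object("libnspr4.so")` once, before initializing SSL, using the same name the runtime linker resolved (as seen in the pmap output) so that the call lands on the already-loaded copy.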
Comment 10 • 20 years ago
Mark,
PR_Cleanup currently has several problems. I consider it incomplete: it
doesn't do a full cleanup of threads and other resources allocated by NSPR.
Therefore, even if you call it in the LDAP shutdown function, I don't think it
will help much with this problem. See bugs 254987, 254983, and 255452 for more
info. Until those are fixed, I think it is best never to call PR_Cleanup and to
keep the NSPR shared object loaded for the life of the process.
Comment 11 • 20 years ago • Reporter
(In reply to comment #9)
Is this a known issue only on Solaris, or does it affect all platforms (Unix
and Windows)? I'm not very keen on leaking the shared object. Is there a fix
planned for this?
Comment 12 • 20 years ago
It's an issue for all platforms. Fixing it would require:
1) completing the implementation of PR_Cleanup
2) exposing APIs through the LDAP SDK to decide when to call PR_Cleanup
Currently, I'm not aware of plans to fix 1), and 2) depends on 1).
Comment 13 • 20 years ago • Assignee
I assume you do not have control over the unloading process (otherwise you
could simply not unload your library), or maybe you find that unacceptable for
VM footprint or other reasons.
Another option would be to avoid NSPR, which will be some work if you want to
support SSL (I think you would have to avoid NSS in that case as well and use
something like OpenSSL).
Comment 14 • 20 years ago
Mark,
I admit I don't know much about the LDAP SDK code, but doesn't it rely on
NSPR/NSS for things other than SSL sockets, e.g. OS abstraction of threading,
locking, etc.? I can't imagine how you could rid yourself of NSPR unless much
of the code were rewritten specifically for a particular OS, which IMO would be
a step backwards.
Comment 15 • 20 years ago • Reporter
(In reply to comment #9)
I'm a little bit confused. How will leaking the NSPR shared object fix this
problem? The core dump occurs when NSPR calls into prldap50 after prldap50 has
been unloaded. Does this mean I have to leak prldap50 as well?
Comment 16 • 20 years ago
You may have to leak both. Try just leaking NSPR, and if that doesn't work,
leaking the SDK DLL as well. The stack of your crash doesn't show an actual
callback into the SDK so it may not be necessary.
Another possible workaround is to terminate all the threads in your application
that used NSPR and the SDK before you unload your shared library. This would
prevent the NSPR thread termination callback code from being invoked after NSPR
or the SDK are unloaded.
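The second workaround amounts to making shutdown synchronous: every thread that touched NSPR or the SDK must have exited, and therefore run its thread-exit destructors, before the host unloads the library. A minimal sketch, assuming POSIX threads and a hypothetical worker pool (`start_workers`, `stop_workers`, and `MAX_WORKERS` are illustrative names, not SDK APIs):

```c
#include <pthread.h>

#define MAX_WORKERS 8

static pthread_t workers[MAX_WORKERS];
static int nworkers = 0;
static int stopping = 0;
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t pool_cond = PTHREAD_COND_INITIALIZER;

/* Stand-in for a thread that makes LDAP/SSL calls (hypothetical). It
 * waits until shutdown is requested, then exits normally. */
static void *worker_main(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&pool_lock);
    while (!stopping)
        pthread_cond_wait(&pool_cond, &pool_lock);
    pthread_mutex_unlock(&pool_lock);
    /* Thread-exit destructors run after this return, while the SDK and
     * NSPR libraries are still mapped. */
    return NULL;
}

/* Start n more workers; returns 0 on success, -1 on bad n or failure. */
int start_workers(int n)
{
    if (n < 0 || n > MAX_WORKERS - nworkers)
        return -1;
    pthread_mutex_lock(&pool_lock);
    stopping = 0;
    pthread_mutex_unlock(&pool_lock);
    for (int i = 0; i < n; i++) {
        if (pthread_create(&workers[nworkers], NULL, worker_main, NULL) != 0)
            return -1;
        nworkers++;
    }
    return 0;
}

/* Call before the host unloads the library: signals every worker and
 * waits for each one to terminate. Returns the number of threads joined. */
int stop_workers(void)
{
    int joined = 0;
    pthread_mutex_lock(&pool_lock);
    stopping = 1;
    pthread_cond_broadcast(&pool_cond);
    pthread_mutex_unlock(&pool_lock);
    for (int i = 0; i < nworkers; i++)
        if (pthread_join(workers[i], NULL) == 0)
            joined++;
    nworkers = 0;
    return joined;
}
```

The host would call `stop_workers()` (after each worker has done its `ldap_unbind()`) and only then unload libsecLDAP.so. Joining rather than detaching is the key choice: `pthread_join()` does not return until the thread has fully terminated, including its thread-specific data destructors.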
Comment 17 • 20 years ago • Assignee
Regarding Comment #14, the core libldap code does not directly link with NSPR or
NSS. Two separate shared libraries, libprldap (which links with NSPR) and
libssldap (links with NSS), are used if an application wants to use NSPR or SSL.
libssldap has a dependency on libprldap. But the core libldap simply allows
optional callback functions to be installed to handle things like I/O and thread
safety. We have maintained the core libldap library this way because some
applications do not want to or can't use NSPR or NSS.
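The split described above, a core library that only calls optional function pointers and a separate adapter library that supplies them, can be sketched as follows. This is a simplified, hypothetical structure (`struct lock_hooks`, `core_init`, `adapter_hooks` and the rest are invented for illustration); the real SDK uses its own structures and installation calls, which are not reproduced here. It also shows why unload order matters: once the adapter's functions are stored in the core's state, the adapter must stay mapped for as long as anything may call them.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical hook table; NOT the SDK's real structure. */
struct lock_hooks {
    void *(*mutex_alloc)(void);
    void  (*mutex_free)(void *m);
    int   (*mutex_lock)(void *m);
    int   (*mutex_unlock)(void *m);
};

/* "Core" state: knows nothing about any particular threading library. */
struct core_handle {
    const struct lock_hooks *hooks; /* NULL means no locking at all */
    void *mutex;
    int   ops;
};

void core_init(struct core_handle *h, const struct lock_hooks *hooks)
{
    h->hooks = hooks;
    h->mutex = hooks ? hooks->mutex_alloc() : NULL;
    h->ops = 0;
}

/* Lock only if the application installed hooks. */
void core_do_op(struct core_handle *h)
{
    if (h->mutex) h->hooks->mutex_lock(h->mutex);
    h->ops++;
    if (h->mutex) h->hooks->mutex_unlock(h->mutex);
}

void core_destroy(struct core_handle *h)
{
    if (h->mutex) h->hooks->mutex_free(h->mutex);
    h->mutex = NULL;
}

/* Optional "adapter": hooks built on pthreads, playing the role that an
 * NSPR-based adapter library would play. */
static void *pt_alloc(void)
{
    pthread_mutex_t *m = malloc(sizeof *m);
    if (m != NULL && pthread_mutex_init(m, NULL) != 0) {
        free(m);
        m = NULL;
    }
    return m;
}
static void pt_free(void *m)   { pthread_mutex_destroy(m); free(m); }
static int  pt_lock(void *m)   { return pthread_mutex_lock(m); }
static int  pt_unlock(void *m) { return pthread_mutex_unlock(m); }

const struct lock_hooks adapter_hooks = { pt_alloc, pt_free, pt_lock, pt_unlock };
```

An application that needs no threading calls `core_init(&h, NULL)`; one that does passes `&adapter_hooks`, and from then on the core calls into the adapter's code.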
Comment 18 • 19 years ago • Assignee
Amanda, any update on this bug?
Comment 19 • 19 years ago
Any update? We have a window of opportunity now to work on ldap csdk issues, so please let us know.