Closed Bug 274968 Opened 20 years ago Closed 19 years ago

ldapssl_client_init hangs on Linux

Categories

(Directory :: LDAP C SDK, defect)

Other
Linux
defect
Not set
critical

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: kiran.singh, Assigned: mcs)

Details

Hi,
We are trying to run an LDAP SSL client on a Linux machine (RedHat AS 2.1). The 
sequence is:
if (ssl)
{
ldapssl_client_init(certdbpath, NULL);
ld = ldapssl_init(host,port);
ldap_simple_bind(ld,who,passwd);
}

The program just waits indefinitely when we make the call ldapssl_client_init.

The certdbpath is set to the directory containing cert7.db and key3.db. We 
are not planning to use a client certificate.

We are using ldapsdk 5.0.8.
Is it a known problem?
What Linux Redhat versions are supported by LDAP SDK?
Has it been fixed in later versions?

Thanks
I cannot remember hearing about this particular problem before.  I know that the
Netscape Directory products use the LDAP C SDK and are tested and supported on
RH AS.  The 5.0.8 release is kind of old now, but I am surprised the application
does not return quickly from ldapssl_client_init().  Can you run under a
debugger and get a stack trace?  I also recommend using a newer LDAP C SDK
release if possible (building from source code might be the best approach
because that will allow you to debug the problem).
Thanks. Could you point me to where I can download the latest LDAP C SDK 
release? I will try with that first. 
If that doesn't work either, I will try to build from the sources and run 
it under a debugger. Where can I get the sources as well?

BTW I tried the same program on a Linux RedHat ES Release 3 and the 
ldapssl_client_init call works.
We are working on making binary releases of the LDAP C SDK available from
ftp.mozilla.org with regular frequency, but the only builds available now are
very very old.

The recent Netscape Directory Server releases (6.x) ship with recent LDAP C SDK
binaries, so one approach is to grab libraries from Netscape's server package. 
But that assumes you are a Netscape customer.

Here are the instructions for building from source:

  http://www.mozilla.org/directory/csdk.html
Hi, 
I downloaded the LDAP C SDK sources from the site you mentioned and built them 
on Linux. I ran into a build problem and had to change the makefile: it was 
looking for files in mozilla/dist/bin that were actually in 
mozilla/dist/Linux..../bin, so I pointed DIST there.

After a successful build, I linked the new libraries with my program, and it 
still hangs in the call to ldapssl_client_init. I have narrowed it down to the 
SECMOD_LoadModule call in the nss_Init() function. 
This is the relevant portion of nss_Init() in nssinit.c:
...

    moduleSpec = PR_smprintf("name=\"%s\" parameters=\"configdir='%s' certPrefix='%s' keyPrefix='%s' secmod='%s' flags=%s %s\" NSS=\"flags=internal,moduleDB,moduleDBOnly,critical\"",
                pk11_config_name ? pk11_config_name : NSS_DEFAULT_MOD_NAME,
                lconfigdir,lcertPrefix,lkeyPrefix,lsecmodName,flags,
                pk11_config_strings ? pk11_config_strings : "");

loser:
    PORT_Free(flags);
    if (lconfigdir) PORT_Free(lconfigdir);
    if (lcertPrefix) PORT_Free(lcertPrefix);
    if (lkeyPrefix) PORT_Free(lkeyPrefix);
    if (lsecmodName) PORT_Free(lsecmodName);

    printf("nss_init, moduleSpec=%s\n", moduleSpec);
    if (moduleSpec) {
        SECMODModule *module = SECMOD_LoadModule(moduleSpec,NULL,PR_TRUE);
        PR_smprintf_free(moduleSpec);
        if (module) {
            if (module->loaded) rv=SECSuccess;
            SECMOD_DestroyModule(module);
        }
    }
....

This is the moduleSpec value before going in there:
moduleSpec=name="NSS Internal Module" parameters="configdir='/usr/ar601/frame/bin' certPrefix='' keyPrefix='' secmod='secmod.db' flags=readOnly " NSS="flags=internal,moduleDB,moduleDBOnly,critical"

The directory /usr/ar601/frame/bin contains the cert7.db and key3.db files. Does 
it need a secmod.db file too? 

I have tried with a dummy secmod.db file but got the same results.

I got all the db files (cert7.db, key3.db and secmod.db) by installing the 
Mozilla browser.

How do I know these are the latest sources? How do I find out the SDK version 
from the sources?
 
What do you suggest we do to fix the problem?

I am not sure why the code is hanging inside SECMOD_LoadModule().  If you can
provide a test program, I could take a look and try to debug the problem.  I
think it is a good idea to have a secmod.db but I thought one would be created
if you did not have one (NSS should not hang in any case).  I wonder if there is
some kind of threading issue/conflict happening (between your application code
and the LDAP C SDK, NSS, or NSPR code)?

As for whether you have the latest code, if you checked out the trunk code using
CVS by following the instructions on the LDAP C SDK build page you have the
latest code.  There is no version number hard-coded in the source code; it is
set at build time by the make process.
I have further narrowed down the problem. In SECMOD_LoadModule, it recurses to 
load each module found in the database file and the hang happens when the 
modulespec is this: 

modulespec= name="NSS Internal PKCS #11 Module" parameters="configdir='/home/ardev/ksingh/ldap' certPrefix='' keyPrefix='' secmod='secmod.db' flags=readOnly " NSS="trustOrder=75 cipherOrder=100 slotParams={0x00000001=[slotFlags=RSA,RC4,RC2,DES,DH,SHA1,MD5,MD2,SSL,TLS,AES,RANDOM askpw=any timeout=30 ] }

It goes into SECMOD_LoadPKCS11Module (pk11load.c) and hangs while calling 
C_Initialize (line 232) in the piece of code below:
....
    if (PK11_GETTAB(mod)->C_Initialize(&secmodLockFunctions) != CKR_OK) {
        printf("loadPKCS11module, 12\n");
        mod->isThreadSafe = PR_FALSE;
        if (PK11_GETTAB(mod)->C_Initialize(NULL) != CKR_OK) goto fail;
    }

The good news is that the example program ssnoauth.c runs fine with the same 
database files. So your suspicion of a threading conflict may be right.

Our program is a DLL (a shared object on Unix platforms) that statically links 
to the LDAP SDK library. Our own DLL, however, is loaded dynamically using 
loadlibrary in a multithreaded process. Do you know of any problems in such an 
environment? I would appreciate any help in this matter.

Thanks

This all sounds VERY familiar to me but I can't pull the details out of my brain
at the moment.  I assume the NSS and NSPR libraries are loaded from DLLs (I
don't think you can statically link with them).

I added Wan-Teh to the Cc in case he knows about this problem or remembers
better than I do.  Seems like something that is either a known NSPR/NSS problem
or that has been fixed, but I can't remember for sure.
I can't think of any explanation for this bug.
Sorry.

One difference between Red Hat AS 2.1 and
Red Hat Enterprise Linux 3 is the pthread
library.  But the problems usually have to
do with process creation and signal handling.
I don't know of any problems with "loadlibrary"
in a multithreaded process on Red Hat AS 2.1.
Ok, I have finally managed to get the stack trace from the hanging thread. 
Please let me know what you can make out of this. 
Thanks

#0  0x400e4c85 in sigsuspend () from /lib/libc.so.6
#1  0x401fdc19 in pthread_kill_other_threads_np () from /lib/libpthread.so.0
#2  0x401fd1ec in pthread_create () from /lib/libpthread.so.0
#3  0x401fdb05 in pthread_kill_other_threads_np () from /lib/libpthread.so.0
#4  0x40171dda in execve () from /lib/libc.so.6
#5  0x40172255 in execvp () from /lib/libc.so.6
#6  0x410cdf18 in safe_popen () from /usr/lib/libsoftokn3.so
#7  0x410ce141 in RNG_SystemInfoForRNG () from /usr/lib/libsoftokn3.so
#8  0x410beb0b in nsc_CommonInitialize () from /usr/lib/libsoftokn3.so
#9  0x410bebd2 in NSC_Initialize () from /usr/lib/libsoftokn3.so
#10 0x4083c02f in SECMOD_LoadPKCS11Module (mod=0x8168748) at pk11load.c:232
#11 0x4083e9c0 in SECMOD_LoadModule (
    modulespec=0x81684e0 "library= name=\"NSS Internal PKCS #11 Module\" parameters=\"configdir='/home/ardev/ksingh/ldap/csdk' certPrefix='' keyPrefix='' secmod='secmod.db' flags=readOnly \" NSS=\"Flags=internal,critical trustOrder"...,
    parent=0x8168200, recurse=1) at pk11pars.c:306
#12 0x4083ea26 in SECMOD_LoadModule (
    modulespec=0x81679b0 "name=\"NSS Internal Module\" parameters=\"configdir='/home/ardev/ksingh/ldap/csdk' certPrefix='' keyPrefix='' secmod='secmod.db' flags=readOnly \" NSS=\"flags=internal,moduleDB,moduleDBOnly,critical\"", parent=0x0,
    recurse=1) at pk11pars.c:319
#13 0x40819a64 in nss_Init (configdir=0x81677e8 "/home/ardev/ksingh/ldap/csdk",
    certPrefix=0x8167810 "", keyPrefix=0x0, secmodName=0x407c8629 "secmod.db",
    readOnly=1, noCertDB=0, noModDB=0, forceOpen=0, noRootInit=0,
    optimizeSpace=0) at nssinit.c:462
#14 0x40819caa in NSS_Initialize (
    configdir=0x81677e8 "/home/ardev/ksingh/ldap/csdk",
    certPrefix=0x8167810 "", keyPrefix=0x0, secmodName=0x407c8629 "secmod.db",
    flags=1) at nssinit.c:543
#15 0x407c677e in ldapssl_basic_init (
    certdbpath=0x813d7a8 "/home/ardev/ksingh/ldap/csdk", keydbpath=0x0,
    secmoddbpath=0x407c8629 "secmod.db") at clientinit.c:236
#16 0x407c68bc in ldapssl_clientauth_init (
    certdbpath=0x813d7a8 "/home/ardev/ksingh/ldap/csdk", certdbhandle=0x0,
    needkeydb=0, keydbpath=0x0, keydbhandle=0x0) at clientinit.c:355
#17 0x407c6b3f in ldapssl_client_init (
    certdbpath=0x813d7a8 "/home/ardev/ksingh/ldap/csdk", certdbhandle=0x0)
    at clientinit.c:531
#18 0x405ad410 in initLDAPSSL (host=0x813d700 "bala.eng.remedy.com", port=636,
    certPath=0x813d7a8 "/home/ardev/ksingh/ldap/csdk", ioTimeout=40)
    at arealdap.c:299
#19 0x405ae5a6 in AREAVerifyLoginCallback (object=0x813a938,
    user=0xbf1ff12c "user3", password=0xbf1ff10c "user3",
    networkAddr=0xbf1fefec "10.40.22.153", authString=0xbf1ff00c "",
    response=0xbf1fefe0) at arealdap.c:1131
#20 0x08091f0b in AREAVerifyLogin (argument=0xbf1ff61c, result=0xbf1ff22c)
    at eaimpl.c:365
#21 0x08092b6e in arexternalserverrequest_3_svc (argument=0xbf1ff61c,
    result=0xbf1ff22c, rqstp=0x813d5d8) at eaimpl.c:692
#22 0x0804efa2 in HandleRPCs (queueIndex=3 '\003') at linux/arrpcsvc.c:777
#23 0x080a5e7a in WorkerThread (argument=0x8135c78) at wkrmain.c:302
#24 0x080b8e02 in RestartableThreadMain (argument=0x81363f0)
    at ../../common/share/threaded/threadux.c:1096
#25 0x080b8d9e in UnixThreadStartRoutine (args=0x8136438)
    at ../../common/share/threaded/threadux.c:1061
#26 0x401fafaf in pthread_exit () from /lib/libpthread.so.0

Kiran, thanks a lot for the stack trace.
The safe_popen function creates a new
process, so it is likely to interfere
with the "LinuxThreads" library on Red
Hat AS 2.1.  However, I don't remember
getting a report of this hang before.

Is your main executable properly linked
with the pthread library?  Could you
invoke the "ldd" command on your main
executable and report the order between
libc.so and libpthread.so?  libpthread.so
must be before libc.so in the output of
the "ldd" command.
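The ordering check requested above can be sketched in C. This is only an illustration, not part of the LDAP C SDK or NSS: the function name pthread_listed_before_libc is hypothetical, and it assumes the caller has already captured the text printed by "ldd" into a string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns 1 if libpthread.so is listed before libc.so in the given
 * ldd output, 0 if it is listed after libc.so, and -1 if either
 * library does not appear at all.
 * (Hypothetical helper; it only searches the text, it does not run ldd.)
 */
int pthread_listed_before_libc(const char *ldd_output)
{
    const char *pthread_pos;
    const char *libc_pos;

    assert(ldd_output != NULL);

    /* First occurrence of each library name in the captured text. */
    pthread_pos = strstr(ldd_output, "libpthread.so");
    libc_pos = strstr(ldd_output, "libc.so");
    if (pthread_pos == NULL || libc_pos == NULL)
        return -1;

    return pthread_pos < libc_pos ? 1 : 0;
}
```

A result of 0 corresponds to the ordering the comment above warns about; 1 is the ordering it asks for.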
In mozilla/security/nss/lib/freebl/unix_rand.c, we have
the following code:
-------------------
/*
 * Bug 100447: On BSD/OS 4.2 and 4.3, we have problem calling safe_popen
 * in a pthreads environment.  Therefore, we call safe_popen last and on
 * BSD/OS we do not call safe_popen when we succeeded in getting data
 * from /dev/urandom.
 */

#ifdef BSDI
    if (bytes)
        return;
#endif
-------------------

As a workaround, you can remove "#ifdef BSDI" and "endif"
for the code you will run on Red Hat AS 2.1.
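The effect of that guard can be sketched as follows. This is not the NSS code: collect_seed and fallback_collector are hypothetical names, and the fallback callback merely stands in for a collector that spawns child processes the way safe_popen does. The point is only that once /dev/urandom has produced data, the process-spawning path is never reached.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical callback type standing in for a collector that spawns
 * child processes (the role safe_popen plays in the stack trace). */
typedef size_t (*fallback_collector)(unsigned char *buf, size_t len);

/*
 * Fill buf from /dev/urandom first. The fallback runs only when
 * /dev/urandom yielded no bytes, so the common path never forks.
 * Returns the number of bytes collected.
 */
size_t collect_seed(unsigned char *buf, size_t len, fallback_collector fallback)
{
    size_t bytes = 0;
    FILE *fp;

    assert(buf != NULL);

    fp = fopen("/dev/urandom", "rb");
    if (fp != NULL) {
        bytes = fread(buf, 1, len, fp);
        fclose(fp);
    }

    /* Same idea as the "if (bytes) return;" guard quoted above. */
    if (bytes > 0)
        return bytes;

    return fallback != NULL ? fallback(buf, len) : 0;
}
```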
Hi,

1. Here is the ldd output. arplugin is the main executable, which links 
arealdap.so, the shared library for LDAP access. I have the ldd output for 
both here:

[root@frame bin]# ldd arplugin
        libdl.so.2 => /lib/libdl.so.2 (0x4002c000)
        libarrpc.so => /usr/ar601/frame/bin/libarrpc.so (0x40030000)
        libstdc++-libc6.2-2.so.3 => /usr/lib/libstdc++-libc6.2-2.so.3 (0x40050000)
        libm.so.6 => /lib/i686/libm.so.6 (0x40093000)
        libc.so.6 => /lib/i686/libc.so.6 (0x400b6000)
        libpthread.so.0 => /lib/i686/libpthread.so.0 (0x401f2000)
        libnsl.so.1 => /lib/libnsl.so.1 (0x40208000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)

[root@frame bin]# ldd arealdap.so
        libc.so.6 => /lib/i686/libc.so.6 (0x4006e000)
        libnsl.so.1 => /lib/libnsl.so.1 (0x401aa000)
        libicuuc.so.20 => /usr/ar601/frame/bin/libicuuc.so.20 (0x401c0000)
        libicui18n.so.20 => /usr/ar601/frame/bin/libicui18n.so.20 (0x40227000)
        libnspr4.so => /usr/ar601/frame/bin/libnspr4.so (0x4030a000)
        libprldap50.so => /usr/ar601/frame/bin/libprldap50.so (0x4034f000)
        libldap50.so => /usr/ar601/frame/bin/libldap50.so (0x40354000)
        libssldap50.so => /usr/ar601/frame/bin/libssldap50.so (0x4038e000)
        libssl3.so => /usr/ar601/frame/bin/libssl3.so (0x4039a000)
        libnss3.so => /usr/ar601/frame/bin/libnss3.so (0x403cf000)
        libsoftokn3.so => /usr/ar601/frame/bin/libsoftokn3.so (0x40476000)
        libpthread.so.0 => /lib/i686/libpthread.so.0 (0x404f9000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x80000000)
        libicudt20l.so => /usr/ar601/frame/bin/libicudt20l.so (0x4050e000)
        libm.so.6 => /lib/i686/libm.so.6 (0x40cef000)
        libdl.so.2 => /lib/libdl.so.2 (0x40d12000)
        libplc4.so => /usr/ar601/frame/bin/libplc4.so (0x40d16000)
        libplds4.so => /usr/ar601/frame/bin/libplds4.so (0x40d1b000)

In the ldd output, libpthread.so is listed after libc.so. What should I do 
to get it listed before libc.so?


2. Regarding your other response, in the stack trace that I provided, the 
program was running with the Linux system libsoftokn3.so 
(/usr/lib/libsoftokn3.so). Then I linked and ran it with the libsoftokn3.so 
built from the mozilla sources, with the same results. So I am going to make 
the change in unix_rand.c as you suggested and see how it goes.

Thanks
What do I need to rebuild after I change unix_rand.c?
The best way to ensure proper linking order
is to use the compiler (gcc/g++) to link
your main executable, and pass the -pthread
(or -pthreads, I don't remember which) compiler
flag.

If you use ld to link your main executable,
you need to make sure that -lpthread appears
before -lc.

Then, use "ldd" to verify the linking order.

After you modify unix_rand.c, cd back to
the mozilla/security/nss/lib directory,
do a "gmake clean", and then "gmake".
The reason "gmake clean" is needed after unix_rand.c
is modified is that unix_rand.c is included by another
C file (an unusual practice), and our makefile does
not declare this dependency.  So a "gmake clean" is
needed to force that C file to be recompiled.
After I changed the order of libpthread and libc in our executable makefile,   
the ldapssl_client_init() doesn't hang anymore. Phew!!!! Thank you so much!!!!

Should I still make the change in unix_rand.c ???

Do you know why libpthread.so must be before libc.so in the linking order ?
No need to make the change to unix_rand.c.

The reason it's important to link with -lpthread
before -lc is that libc contains stub implementations
of many pthread functions (such as pthread_mutex_lock)
so that the same libc can be used in single-threaded
and multithreaded environments.  If the linking order
is wrong, your app will get some stub implementations
and some real implementations of the pthread functions.
That causes strange problems like this.  If -lpthread
is before -lc, your app will get only the real
implementations of the pthread functions in libpthread.
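The "first definition wins" effect described above can be illustrated with a toy model in C. It is a deliberately simplified sketch, not how the real dynamic linker is implemented: the struct, the resolve function and the two tables are hypothetical, and the assumption that pthread_create exists only in libpthread is made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 3

/* A toy "library": a name plus the symbols it defines (NULL-terminated). */
struct toy_lib {
    const char *name;
    const char *symbols[MAX_SYMS];
};

/*
 * Return the name of the first library, in search order, that defines
 * symbol, or NULL if none does. The first definition found wins.
 */
const char *resolve(const struct toy_lib *libs, size_t nlibs, const char *symbol)
{
    size_t i, j;

    assert(libs != NULL && symbol != NULL);

    for (i = 0; i < nlibs; i++)
        for (j = 0; j < MAX_SYMS && libs[i].symbols[j] != NULL; j++)
            if (strcmp(libs[i].symbols[j], symbol) == 0)
                return libs[i].name;
    return NULL;
}

/* libc searched first: its stub pthread_mutex_lock shadows the real one. */
const struct toy_lib wrong_order[2] = {
    { "libc",       { "pthread_mutex_lock", NULL } },
    { "libpthread", { "pthread_mutex_lock", "pthread_create", NULL } },
};

/* libpthread searched first: every pthread symbol comes from libpthread. */
const struct toy_lib right_order[2] = {
    { "libpthread", { "pthread_mutex_lock", "pthread_create", NULL } },
    { "libc",       { "pthread_mutex_lock", NULL } },
};
```

In this model, resolving against wrong_order takes pthread_mutex_lock from libc but pthread_create from libpthread, which is the mix of stub and real implementations described above. Resolving against right_order takes both from libpthread.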
Thank you so much for the explanation.

On a different note, I am now getting LDAP error 81 from the subsequent 
ldap_simple_bind call. I know that I have not been able to configure the Active 
Directory server properly with the certificate. Is the error because of that? 
Do you know how to configure an AD server for the certificate? 

Thanks
Kiran
(In reply to comment #15)

I have filed bug 278132 on the NSS makefile dependency situation.
Please see it for a discussion of the history of this issue and a
proposed solution. 

In regard to the original bug, the link order issue needs to be documented
somewhere.   Maybe RedHat or someone else in the Linux community has already
documented it.

In regard to the LDAP error 81 (LDAP_SERVER_DOWN), you may very well get that
error if the client and server are unable to complete the SSL/TLS handshake.  I
do not know anything about configuring AD, but I suspect other people have done
what you are trying to do.  Have you searched for a solution?  If you are still
stuck, send me private email.
Can we close this bug?
Resolving this bug as WORKSFORME (because the original problem was solved by changing the link order).
Status: NEW → RESOLVED
Closed: 19 years ago
Resolution: --- → WORKSFORME