ldapssl_client_init hangs on Linux

RESOLVED WORKSFORME

Status

Product: Directory
Component: LDAP C SDK
Priority: --
Severity: critical
Status: RESOLVED WORKSFORME
Reported: 13 years ago
Modified: 12 years ago

People

(Reporter: Kiran Singh, Assigned: mcs)

(Reporter)

Description

13 years ago
Hi,
We are trying to run an LDAP SSL client on a Linux machine (RedHat AS 2.1). The 
sequence is:
if (ssl)
{
    ldapssl_client_init(certdbpath, NULL);
    ld = ldapssl_init(host,port);
    ldap_simple_bind(ld,who,passswd);
}

The program just waits indefinitely when we make the call ldapssl_client_init.

The certdbpath is set to the directory containing cert7.db and key3.db. We 
are not planning to use a client certificate.

We are using ldapsdk 5.0.8.
Is it a known problem?
What Linux Redhat versions are supported by LDAP SDK?
Has it been fixed in later versions?

Thanks
(Assignee)

Comment 1

13 years ago
I cannot remember hearing about this particular problem before.  I know that the
Netscape Directory products use the LDAP C SDK and are tested and supported on
RH AS.  The 5.0.8 release is kind of old now, but I am surprised the application
does not return quickly from ldapssl_client_init().  Can you run under a
debugger and get a stack trace?  I also recommend using a newer LDAP C SDK
release if possible (building from source code might be the best approach
because that will allow you to debug the problem).
(Reporter)

Comment 2

13 years ago
Thanks. Could you point me to where I can download the latest LDAP C SDK 
release? I will try with that first. 
If that doesn't work either, then I will try to build from the sources and run 
it under a debugger. Where can I get the sources as well?

BTW I tried the same program on a Linux RedHat ES Release 3 and the 
ldapssl_client_init call works.
(Assignee)

Comment 3

13 years ago
We are working on making binary releases of the LDAP C SDK available from
ftp.mozilla.org with regular frequency, but the only builds available now are
very very old.

The recent Netscape Directory Server releases (6.x) ship with recent LDAP C SDK
binaries, so one approach is to grab libraries from Netscape's server package. 
But that assumes you are a Netscape customer.

Here are the instructions for building from source:

  http://www.mozilla.org/directory/csdk.html
(Reporter)

Comment 4

13 years ago
Hi, 
I downloaded the LDAP C SDK sources from the site you mentioned and built them on 
Linux. I ran into a build problem and had to change the makefile: it was 
looking for files in mozilla/dist/bin, but they were in 
mozilla/dist/Linux..../bin, so I made the DIST path point there.

Anyways, after successful building, I linked the new libraries with my program 
and it still hangs in the call ldapssl_client_init. I have narrowed it down to 
SECMOD_LoadModule in nss_Init() function. It hangs in that call. 
This is the portion of the function nss_Init() in nssinit.c
...

    moduleSpec = PR_smprintf("name=\"%s\" parameters=\"configdir='%s' "
                "certPrefix='%s' keyPrefix='%s' secmod='%s' flags=%s %s\" "
                "NSS=\"flags=internal,moduleDB,moduleDBOnly,critical\"",
                pk11_config_name ? pk11_config_name : NSS_DEFAULT_MOD_NAME,
                lconfigdir,lcertPrefix,lkeyPrefix,lsecmodName,flags,
                pk11_config_strings ? pk11_config_strings : "");

loser:
    PORT_Free(flags);
    if (lconfigdir) PORT_Free(lconfigdir);
    if (lcertPrefix) PORT_Free(lcertPrefix);
    if (lkeyPrefix) PORT_Free(lkeyPrefix);
    if (lsecmodName) PORT_Free(lsecmodName);

    printf("nss_init, moduleSpec=%s\n", moduleSpec);
    if (moduleSpec) {
        SECMODModule *module = SECMOD_LoadModule(moduleSpec,NULL,PR_TRUE);
        PR_smprintf_free(moduleSpec);
        if (module) {
            if (module->loaded) rv=SECSuccess;
            SECMOD_DestroyModule(module);
        }
    }
....

This is the moduleSpec value before going in there:
moduleSpec=name="NSS Internal Module" parameters="configdir='/usr/ar601/frame/bin' certPrefix='' keyPrefix='' secmod='secmod.db' flags=readOnly " NSS="flags=internal,moduleDB,moduleDBOnly,critical"

The directory /usr/ar601/frame/bin contains the cert7.db and key3.db file. Does 
it need secmod.db file too? 

I have tried with the dummy secmod.db file but same results.

I have got all the db files(cert7.db, key3.db and secmod.db file) by installing 
mozilla browser.

How do I know these are the latest sources? How do I find out the SDK version 
from the sources?
 
What do you suggest we do to fix the problem?

(Assignee)

Comment 5

13 years ago
I am not sure why the code is hanging inside SECMOD_LoadModule().  If you can
provide a test program, I could take a look and try to debug the problem.  I
think it is a good idea to have a secmod.db but I thought one would be created
if you did not have one (NSS should not hang in any case).  I wonder if there is
some kind of threading issue/conflict happening (between your application code
and the LDAP C SDK, NSS, or NSPR code)?

As for whether you have the latest code, if you checked out the trunk code using
CVS by following the instructions on the LDAP C SDK build page you have the
latest code.  There is no version number hard-coded in the source code; it is
set at build time by the make process.
(Reporter)

Comment 6

13 years ago
I have further narrowed down the problem. In SECMOD_LoadModule, it recurses to 
load each module found in the database file and the hang happens when the 
modulespec is this: 

modulespec= name="NSS Internal PKCS #11 Module" parameters="configdir='/home/ardev/ksingh/ldap' certPrefix='' keyPrefix='' secmod='secmod.db' flags=readOnly " NSS="trustOrder=75 cipherOrder=100 slotParams={0x00000001=[slotFlags=RSA,RC4,RC2,DES,DH,SHA1,MD5,MD2,SSL,TLS,AES,RANDOM askpw=any timeout=30 ] }

It goes into SECMOD_LoadPKCS11Module (pk11load.c) and hangs while calling 
C_Initialize (line 232) in the piece of code below:
....
    if (PK11_GETTAB(mod)->C_Initialize(&secmodLockFunctions) != CKR_OK) {
        printf("loadPKCS11module, 12\n");
        mod->isThreadSafe = PR_FALSE;
        if (PK11_GETTAB(mod)->C_Initialize(NULL) != CKR_OK) goto fail;
    }

Now the good news is that the example program ssnoauth.c runs fine with the 
same database files. So your suspicion of a threading conflict may be right.

Our program is a DLL, or shared object on Unix platforms, which statically links 
to the LDAP SDK library. But our own DLL is loaded dynamically using loadlibrary 
in a multithreaded process. Do you know of any problems in such an environment? 
I would appreciate any help in this matter.

Thanks

(Assignee)

Comment 7

13 years ago
This all sounds VERY familiar to me but I can't pull the details out of my brain
at the moment.  I assume the NSS and NSPR libraries are loaded from DLLs (I
don't think you can statically link with them).

I added Wan-Teh to the Cc in case he knows about this problem or remembers
better than I do.  Seems like something that is either a known NSPR/NSS problem
or that has been fixed, but I can't remember for sure.

Comment 8

13 years ago
I can't think of any explanation for this bug.
Sorry.

One difference between Red Hat AS 2.1 and
Red Hat Enterprise Linux 3 is the pthread
library.  But the problems usually have to
do with process creation and signal handling.
I don't know of any problems with "loadlibrary"
in a multithreaded process on Red Hat AS 2.1
(Reporter)

Comment 9

13 years ago
Ok, I have finally managed to get the stack trace from the hanging thread. 
Please let me know what you can make out of this. 
Thanks

#0  0x400e4c85 in sigsuspend () from /lib/libc.so.6
#1  0x401fdc19 in pthread_kill_other_threads_np () from /lib/libpthread.so.0
#2  0x401fd1ec in pthread_create () from /lib/libpthread.so.0
#3  0x401fdb05 in pthread_kill_other_threads_np () from /lib/libpthread.so.0
#4  0x40171dda in execve () from /lib/libc.so.6
#5  0x40172255 in execvp () from /lib/libc.so.6
#6  0x410cdf18 in safe_popen () from /usr/lib/libsoftokn3.so
#7  0x410ce141 in RNG_SystemInfoForRNG () from /usr/lib/libsoftokn3.so
#8  0x410beb0b in nsc_CommonInitialize () from /usr/lib/libsoftokn3.so
#9  0x410bebd2 in NSC_Initialize () from /usr/lib/libsoftokn3.so
#10 0x4083c02f in SECMOD_LoadPKCS11Module (mod=0x8168748) at pk11load.c:232
#11 0x4083e9c0 in SECMOD_LoadModule (
    modulespec=0x81684e0 "library= name=\"NSS Internal PKCS #11 Module\" parameters=\"configdir='/home/ardev/ksingh/ldap/csdk' certPrefix='' keyPrefix='' secmod='secmod.db' flags=readOnly \" NSS=\"Flags=internal,critical trustOrder"...,
    parent=0x8168200, recurse=1) at pk11pars.c:306
#12 0x4083ea26 in SECMOD_LoadModule (
    modulespec=0x81679b0 "name=\"NSS Internal Module\" parameters=\"configdir='/home/ardev/ksingh/ldap/csdk' certPrefix='' keyPrefix='' secmod='secmod.db' flags=readOnly \" NSS=\"flags=internal,moduleDB,moduleDBOnly,critical\"", parent=0x0,
    recurse=1) at pk11pars.c:319
#13 0x40819a64 in nss_Init (configdir=0x81677e8 "/home/ardev/ksingh/ldap/csdk",
    certPrefix=0x8167810 "", keyPrefix=0x0, secmodName=0x407c8629 "secmod.db",
    readOnly=1, noCertDB=0, noModDB=0, forceOpen=0, noRootInit=0,
    optimizeSpace=0) at nssinit.c:462
#14 0x40819caa in NSS_Initialize (
    configdir=0x81677e8 "/home/ardev/ksingh/ldap/csdk",
    certPrefix=0x8167810 "", keyPrefix=0x0, secmodName=0x407c8629 "secmod.db",
    flags=1) at nssinit.c:543
#15 0x407c677e in ldapssl_basic_init (
    certdbpath=0x813d7a8 "/home/ardev/ksingh/ldap/csdk", keydbpath=0x0,
    secmoddbpath=0x407c8629 "secmod.db") at clientinit.c:236
#16 0x407c68bc in ldapssl_clientauth_init (
    certdbpath=0x813d7a8 "/home/ardev/ksingh/ldap/csdk", certdbhandle=0x0,
    needkeydb=0, keydbpath=0x0, keydbhandle=0x0) at clientinit.c:355
#17 0x407c6b3f in ldapssl_client_init (
    certdbpath=0x813d7a8 "/home/ardev/ksingh/ldap/csdk", certdbhandle=0x0)
    at clientinit.c:531
#18 0x405ad410 in initLDAPSSL (host=0x813d700 "bala.eng.remedy.com", port=636,
    certPath=0x813d7a8 "/home/ardev/ksingh/ldap/csdk", ioTimeout=40)
    at arealdap.c:299
#19 0x405ae5a6 in AREAVerifyLoginCallback (object=0x813a938,
    user=0xbf1ff12c "user3", password=0xbf1ff10c "user3",
    networkAddr=0xbf1fefec "10.40.22.153", authString=0xbf1ff00c "",
    response=0xbf1fefe0) at arealdap.c:1131
#20 0x08091f0b in AREAVerifyLogin (argument=0xbf1ff61c, result=0xbf1ff22c)
    at eaimpl.c:365
#21 0x08092b6e in arexternalserverrequest_3_svc (argument=0xbf1ff61c,
    result=0xbf1ff22c, rqstp=0x813d5d8) at eaimpl.c:692
#22 0x0804efa2 in HandleRPCs (queueIndex=3 '\003') at linux/arrpcsvc.c:777
#23 0x080a5e7a in WorkerThread (argument=0x8135c78) at wkrmain.c:302
#24 0x080b8e02 in RestartableThreadMain (argument=0x81363f0)
    at ../../common/share/threaded/threadux.c:1096
#25 0x080b8d9e in UnixThreadStartRoutine (args=0x8136438)
    at ../../common/share/threaded/threadux.c:1061
#26 0x401fafaf in pthread_exit () from /lib/libpthread.so.0

Comment 10

13 years ago
Kiran, thanks a lot for the stack trace.
The safe_popen function creates a new
process, so it is likely to interfere
with the "LinuxThreads" library on Red
Hat AS 2.1.  However, I don't remember
getting a report of this hang before.

Is your main executable properly linked
with the pthread library?  Could you
invoke the "ldd" command on your main
executable and report the order between
libc.so and libpthread.so?  libpthread.so
must be before libc.so in the output of
the "ldd" command.
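
The ordering check described above can also be sketched in C. This is only a minimal illustration, assuming the text printed by "ldd" has already been captured into a string; the function name pthread_listed_before_libc is hypothetical and not part of any SDK.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: inspect captured "ldd" output and report the
 * relative order of libpthread.so and libc.so.
 *
 * Returns  1 if libpthread.so is listed before libc.so,
 *          0 if it is listed after libc.so,
 *         -1 if either library does not appear at all.
 */
int pthread_listed_before_libc(const char *ldd_output)
{
    const char *pthread_pos;
    const char *libc_pos;

    assert(ldd_output != NULL);

    /* The first occurrence of each name marks where it is listed. */
    pthread_pos = strstr(ldd_output, "libpthread.so");
    libc_pos = strstr(ldd_output, "libc.so");

    if (pthread_pos == NULL || libc_pos == NULL)
        return -1;

    return pthread_pos < libc_pos ? 1 : 0;
}
```

A result of 0 would correspond to the problematic order, with libc.so listed ahead of libpthread.so.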

Comment 11

13 years ago
In mozilla/security/nss/lib/freebl/unix_rand.c, we have
the following code:
-------------------
/*
 * Bug 100447: On BSD/OS 4.2 and 4.3, we have problem calling safe_popen
 * in a pthreads environment.  Therefore, we call safe_popen last and on
 * BSD/OS we do not call safe_popen when we succeeded in getting data
 * from /dev/urandom.
 */

#ifdef BSDI
    if (bytes)
        return;
#endif
-------------------

As a workaround, you can remove "#ifdef BSDI" and "#endif"
for the code you will run on Red Hat AS 2.1.
(Reporter)

Comment 12

13 years ago
Hi,

1. Here is the ldd output. arplugin is the main executable, which links 
arealdap.so, the shared library for LDAP access. The ldd output for both 
is here:

[root@frame bin]# ldd arplugin
        libdl.so.2 => /lib/libdl.so.2 (0x4002c000)
        libarrpc.so => /usr/ar601/frame/bin/libarrpc.so (0x40030000)
        libstdc++-libc6.2-2.so.3 => /usr/lib/libstdc++-libc6.2-2.so.3 (0x40050000)
        libm.so.6 => /lib/i686/libm.so.6 (0x40093000)
        libc.so.6 => /lib/i686/libc.so.6 (0x400b6000)
        libpthread.so.0 => /lib/i686/libpthread.so.0 (0x401f2000)
        libnsl.so.1 => /lib/libnsl.so.1 (0x40208000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)

[root@frame bin]# ldd arealdap.so
        libc.so.6 => /lib/i686/libc.so.6 (0x4006e000)
        libnsl.so.1 => /lib/libnsl.so.1 (0x401aa000)
        libicuuc.so.20 => /usr/ar601/frame/bin/libicuuc.so.20 (0x401c0000)
        libicui18n.so.20 => /usr/ar601/frame/bin/libicui18n.so.20 (0x40227000)
        libnspr4.so => /usr/ar601/frame/bin/libnspr4.so (0x4030a000)
        libprldap50.so => /usr/ar601/frame/bin/libprldap50.so (0x4034f000)
        libldap50.so => /usr/ar601/frame/bin/libldap50.so (0x40354000)
        libssldap50.so => /usr/ar601/frame/bin/libssldap50.so (0x4038e000)
        libssl3.so => /usr/ar601/frame/bin/libssl3.so (0x4039a000)
        libnss3.so => /usr/ar601/frame/bin/libnss3.so (0x403cf000)
        libsoftokn3.so => /usr/ar601/frame/bin/libsoftokn3.so (0x40476000)
        libpthread.so.0 => /lib/i686/libpthread.so.0 (0x404f9000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x80000000)
        libicudt20l.so => /usr/ar601/frame/bin/libicudt20l.so (0x4050e000)
        libm.so.6 => /lib/i686/libm.so.6 (0x40cef000)
        libdl.so.2 => /lib/libdl.so.2 (0x40d12000)
        libplc4.so => /usr/ar601/frame/bin/libplc4.so (0x40d16000)
        libplds4.so => /usr/ar601/frame/bin/libplds4.so (0x40d1b000)

In the ldd output, libpthread.so is listed after libc.so. What should I do 
to make it come before libc.so?


2. Regarding your other response: in the stack trace that I provided, the 
program was running with the Linux system's libsoftokn3.so 
(/usr/lib/libsoftokn3.so). Then I linked and ran it with the libsoftokn3.so 
provided in the mozilla sources, with the same results. So I am going to make 
the change in unix_rand.c as you suggested and see how it goes.

Thanks
(Reporter)

Comment 13

13 years ago
What should I make after I change unix_rand.c?

Comment 14

13 years ago
The best way to ensure proper linking order
is to use the compiler (gcc/g++) to link
your main executable, and pass the -pthread
(or -pthreads, I don't remember which) compiler
flag.

If you use ld to link your main executable,
you need to make sure that -lpthread appears
before -lc.

Then, use "ldd" to verify the linking order.

After you modify unix_rand.c, cd back to
the mozilla/security/nss/lib directory,
do a "gmake clean", and then "gmake".

Comment 15

13 years ago
The reason "gmake clean" is needed after unix_rand.c
is modified is that unix_rand.c is included by another
C file (an unusual practice), and our makefile does
not declare this dependency.  So a "gmake clean" is
needed to force that C file to be recompiled.
(Reporter)

Comment 16

13 years ago
After I changed the order of libpthread and libc in our executable makefile,   
the ldapssl_client_init() doesn't hang anymore. Phew!!!! Thank you so much!!!!

Should I still make the change in unix_rand.c ???

(Reporter)

Comment 17

13 years ago
Do you know why libpthread.so must be before libc.so in the linking order ?

Comment 18

13 years ago
No need to make the change to unix_rand.c.

The reason it's important to link with -lpthread
before -lc is that libc contains stub implementations
of many pthread functions (such as pthread_mutex_lock)
so that the same libc can be used in single-threaded
and multithreaded environments.  If the linking order
is wrong, your app will get some stub implementations
and some real implementations of the pthread functions.
That causes strange problems like this.  If -lpthread
is before -lc, your app will get only the real
implementations of the pthread functions in libpthread.
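
To make the mixing effect concrete, here is a toy model in C. It is not the real dynamic linker; it only assumes that a symbol resolves to the first library in search order that defines it. The struct, the function toy_resolve, and the symbol lists are all made up for illustration (in particular, the assumption that pthread_create exists only in libpthread).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model of a shared library: a name plus the symbols it defines. */
struct toy_lib {
    const char *name;
    const char *symbols[4];   /* NULL-terminated list */
};

/*
 * Return the name of the first library, in search order, that defines
 * `symbol`, or NULL if no library defines it.
 */
const char *toy_resolve(const struct toy_lib *libs, size_t nlibs,
                        const char *symbol)
{
    size_t i, j;

    assert(libs != NULL && symbol != NULL);

    for (i = 0; i < nlibs; i++) {
        for (j = 0; libs[i].symbols[j] != NULL; j++) {
            if (strcmp(libs[i].symbols[j], symbol) == 0)
                return libs[i].name;
        }
    }
    return NULL;
}

/* Assumed symbol tables, for illustration only. */
const struct toy_lib TOY_LIBC = {
    "libc", { "pthread_mutex_lock", "execve", NULL }      /* stub mutex */
};
const struct toy_lib TOY_LIBPTHREAD = {
    "libpthread", { "pthread_mutex_lock", "pthread_create", NULL }
};
```

In this model, searching libc first resolves pthread_mutex_lock to the libc stub while pthread_create still comes from libpthread, which is the mix of stub and real functions described above. Searching libpthread first resolves both to libpthread.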
(Reporter)

Comment 19

13 years ago
Thank you so much for the explanation.

On a different note, I am now running into LDAP error 81, which I get from 
ldap_simple_bind afterwards. I know that I have not been able to configure the 
Active Directory server properly with the certificate. Is the error because of 
that? Do you know how to configure an AD server for the certificate? 

Thanks
Kiran

Comment 20

(In reply to comment #15)

I have filed bug 278132 on the NSS makefile dependency situation.
Please see it for a discussion of the history of this issue and a
proposed solution. 

(Assignee)

Comment 21

13 years ago
In regard to the original bug, the link order issue needs to be documented
somewhere.   Maybe RedHat or someone else in the Linux community has already
documented it.

In regard to the LDAP error 81 (LDAP_SERVER_DOWN), you may very well get that
error if the client and server are unable to complete the SSL/TLS handshake.  I
do not know anything about configuring AD, but I suspect other people have done
what you are trying to do.  Have you searched for a solution?  If you are still
stuck, send me private email.

Comment 22

12 years ago
Can we close this bug?
(Assignee)

Comment 23

12 years ago
Resolving this bug as WORKSFORME (because the original problem was solved by changing the link order).
Status: NEW → RESOLVED
Last Resolved: 12 years ago
Resolution: --- → WORKSFORME