Closed Bug 352673 Opened 18 years ago Closed 18 years ago

No way to safely cleanup in libssldap before NSS_Shutdown

Categories

(Directory :: LDAP C SDK, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: prattm, Assigned: neuroc0der)

Details

Attachments

(1 file)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8.0.6) Gecko/20060728 Firefox/1.5.0.6
Build Identifier: http://www.mozilla.org/directory

There currently is no public API in libssldap to cleanup references to NSS objects, making calls to NSS_Shutdown by client apps unsafe.

Reproducible: Always

Steps to Reproduce:
The following code initializes an SSL session with SASL/EXTERNAL authentication:

ldapssl_clientauth_init(...);
ldapssl_init(...);
ldapssl_enable_clientauth(...);
ldap_sasl_bind_s(...);

If you try calling NSS_Shutdown and re-peforming the above steps, NSS_Shutdown always returns error SEC_ERROR_BUSY, and the libldapssl calls (in my case it was ldapssl_enable_clientauth) causes a segmentation fault.

I am running LDAP C SDK 5.17 with NSS 3.7.3 and NSPR 4.2.2 on Solaris 8.
Actual Results:  
Segmentation fault inside ldapssl_enable_clientauth

Expected Results:  
ldapssl_enable_clientauth, nor any libldapssl functions, should seg fault after NSS_Shutdown.

There is a thread in mozilla.dev.tech.ldap titled "Re-reading certificates at runtime" that discusses this further.
I have a somewhat different perspective on this RFE.

The sequence of calls in comment 0 shows that a number of ldap calls
are made to initialize state, which calls also initialize NSS.
The bug reporter wishes to start over, re-initializing NSS, for the 
purpose of picking up a different (new or modified) set of cert/key DBs.

IMO, architecturally speaking, the right way to do this is for his code to 
shut down libldap, effectively undoing the initialization steps shown above,
preferrably in reverse order, and have libldap shut down NSS in the course
of those steps.  Only libldap knows what handles/references it is holding
to NSS objects, and of course all those handles/references must be released/
destroyed before shutting down NSS.  So, rather than the user trying to 
shutdown NSS directly, tearing out the foundation on which libldap sits,
without libldap's participation, it would be better if libldap facilitated
that shutdown.  

I'd also like to suggest that methods be added/enhanced to facilitate the
caller specification of such things as:
a) DB file name prefixes
b) read-only vs read-write attribute of NSS

Someone suggested to the bug reporter than he might initialize NSS himself
before calling ldapssl_clientauth_init.  I guess that works with the present
implementation (the one in /directory/suncsdk/c-sdk), but it seems fragile.
It will break if libldap's implementation changes in ways that it ought
reasonably to be able to change.

One related comment: it appears to me that if ldapssl_enable_clientauth is 
called a second time on the same LDAP struct, it will leak the previously
set nickname and password strings.  I don't know whether it should 
(a) detect a second call and refuse to change the established values, or 
(b) free the previously used values on a second call.
(In reply to comment #1)
> ...  So, rather than the user trying to 
> shutdown NSS directly, tearing out the foundation on which libldap sits,
> without libldap's participation, it would be better if libldap facilitated
> that shutdown.

I agree with this.  If you have destroyed all LDAP * session handles, most every NSS resource should be released.  But apparently not all.


> I'd also like to suggest that methods be added/enhanced to facilitate the
> caller specification of such things as:
> a) DB file name prefixes
> b) read-only vs read-write attribute of NSS

OK, but the original intent was that (for libldap) NSS initialization should be separable from the layering of LDAP over SSL/TLS.  I am not sure we have really kept it separate, but more advanced applications should in theory be able to write their own NSS initialization code (directly calling NSS APIs) and still use functions like ldapssl_init(), ldap_start_tls_s(), ldapssl_install_routines(), and ldapssl_enable_clientauth().
 

> Someone suggested to the bug reporter than he might initialize NSS himself
> before calling ldapssl_clientauth_init.  I guess that works with the present
> implementation (the one in /directory/suncsdk/c-sdk), but it seems fragile.
> It will break if libldap's implementation changes in ways that it ought
> reasonably to be able to change.

If this is true that is something we should try to fix in libldap.  I don't think the initialization vs. "use LDAP over SSL" dependencies are very tight, but I am not 100% sure.  


> One related comment: it appears to me that if ldapssl_enable_clientauth is 
> called a second time on the same LDAP struct, it will leak the previously
> set nickname and password strings.  I don't know whether it should 
> (a) detect a second call and refuse to change the established values, or 
> (b) free the previously used values on a second call.

I would say (a).  Note that most applications just get a new LDAP * handle when they need to make a new connection or reauthenticate.  But the reporter of this bug is trying to use the LDAP reconnect option, which is not as widely used.
(In reply to comment #2)

> > One related comment: it appears to me that if ldapssl_enable_clientauth is 
> > called a second time on the same LDAP struct, it will leak the previously
> > set nickname and password strings.  I don't know whether it should 
> > (a) detect a second call and refuse to change the established values, or 
> > (b) free the previously used values on a second call.
> 
> I would say (a).  Note that most applications just get a new LDAP * handle when
> they need to make a new connection or reauthenticate.  But the reporter of this
> bug is trying to use the LDAP reconnect option, which is not as widely used.

Errr.... I mean to say (b) (free the old values and start using the new ones).  Sorry.
i dont like this approach of adding yet another libssldap call, as a wrapper
to NSS_Shutdown() or what else in NSS. my reasons for this are fairly simple:

- i expect NSS to handle NSS state. the user can initiate shutdown according
  to NSS guidelines and libssldap can get notified via NSS_RegisterShutdown
  handler to do its own cleanup before NSS shuts itself down.

- since some libssldap users use NSS not only for LDAP but also for other
  things in their applications introducing yet another wrapper makes things
  more complex and less transparent for them. ideally i would say libssldap
  here should be as transparent as it can be which is not the case today.

- the main interest here is to make libssldap 'restartable' after NSS shut
  down ie clean up its own state with minimal user interaction. we already
  have quite a few wrappers that were the source of problems in the past,
  sometimes due to poor design and sometimes due to changes in NSS.

the bottom line for me is this: if we can use NSS_RegisterShutdown() to
register our handler to do the cleanup i think we should use it and keep
the things simple and transparent. if we cannot use NSS_RegisterShutdown
then we gonna have to introduce some public wrapper that gonna perform
our own cleanup plus NSS cleanup upon the users request.
Architecurally, either 
a) libldap should BOTH initialize NSS and shutdown NSS for the appliction, or 
b) libldap should NEITHER initialize NSS nor shut it down, and the appliation
should do those things itself.  

Since ldap now initializes NSS for the app, it should also do the shutdown, IMO.
(In reply to comment #5)

> Since ldap now initializes NSS for the app, it should also do the shutdown,
> IMO.

point taken. can you please comment on the following stuff i did ask earlier 
today :

- NSS_RegisterShutdown() can be used to hookup my own callback
  so that its called upon NSS_Shutdown() invocation, prior to
  actual shutdown taking place. anything i should be aware of? 
  like limitations etc ?

- SSL_ClearSessionCache() shall be called prior to NSS_Shutdown
  i been told. tho it is my understanding SSL_ClearSessionCache
  is pretty harmless i'd like to understand if it is really is
  a prerequisite to NSS_Shutdown and if so any specific details
  explaining why.

- SSL_ClearSessionCache() can be safely called from NSS_Register
  Shutdown handler. true or false ? 

and can you also expand on proper NSS shutdown sequence as it 
seems to me now that more than one NSS/SSL call should be 
invoked to do that nice and clean.

regardless of which approach we end up taking this is important
for us to understand the logic of NSS shutdown and what steps
should be taken in what order.
ok, heres somewhat dodgy woraround for now :

        ldap_unbind(ld);
        SSL_ClearSessionCache();
        NSS_Shutdown();
        NSS_Initialize(MY_CERTDB_PATH,NULL,NULL,NULL,NSS_INIT_READONLY);

Rich was absolutely right saying that SSL_ClearSessionCache() has to be
called prior to NSS_Shutdown(). i dunno why NSS shutdown does not clean
up its own stuff like cache in this case but whatever explanation is we
gonna have to call SSL_ClearSessionCache() or ask users to do so for us.

now i did try above workaround in the tight loop where ldapssl_* init
functions are called then we bind over SSL to make sure it works then
the workaround sequence executed then goto ldapssl_* init over again.

funny thing is we actually do cleanup after ourselves in libssldap when
ldap_unbind() gets called. it does clean up via ldapssl_disposehandle()
for the session info and ldapssl_close() for the socket so its all rosy.

the reason NSS_Initialize() has to be called explicitly by the user is 
because we have that "inited" global there which protects basic_init 
from happening more than once right now. 

now regarding leaks. with above workaround running in the loop and lib
umem audit setup i found to my surprise that we dont actually leak any
thing there but NSS does, via NSS_Initialize(), big time. right now i
have absolutely no idea if i have to call any additional NSS/SSL/etc
functions upon NSS shutdown to prevent those NSS_Initialize leaks. on 
the other hand my testcase doesnt represent any real life scenario coz
its extremely unlikely any app gonna execute that many shutdown/init
seqs in the loop. if one call shutdown/init only every once in a while
i doubt those leaks gonna make much difference, still they are leaks
and we need to figure out how to get rid of them if its in our scope.

thats it for now. next i'm gonna find out about restarting basic_init
either via NSS_RegisterShutdown hook or new pub api [whichever works]
.
this is preliminary version of the fix. posting it so you can have a look
at general approach and advise on what else should be taken into acount
here. i'm still looking for any leftovers caused by re-inits and via diff 
ldapssl init functions to cleap up and checking if we have to do anything
special re NSPR and if we need some giant lock there to protect cleanup.

as you can see from the diff we can now handle shutdown and re-init when
users calling either new public api ldapssl_shutdown() OR users calling 
NSS_Shutdown() directly. handling both is pretty flexible i reckon.
Anton,

NSS_Shutdown can't call SSL_ClearSessionCache because this would introduce a circular dependency between shared libraries - libssl3 depends on libnss3, so libnss3 cannot call a libssl3 function .
thanx again for your help Julien! can you please advise of what else has
to be called [NSS/SSL cleanup functions i mean] prior to NSS_Shutdown()
in our case as there seems to be quite a few leaks in NSS after re-init.
i can post/send libumem ::findleaks/bufctl output if thats of any help.

(In reply to comment #9)
> Anton,
> 
> NSS_Shutdown can't call SSL_ClearSessionCache because this would introduce a
> circular dependency between shared libraries - libssl3 depends on libnss3, so
> libnss3 cannot call a libssl3 function .
> 

ok, i did run a few tests and it seems we are clean here with this patch
so far. 


- NSPR cannot be inited twice so all subsequent reinits will be noops. 

- there was a problem with NSS_RegisterShutdown() which i believe Julien
  just fixed. it wont apply to us directly with current NSS anyway coz 
  we are probably the last to call NSS_RegisterShutdown() in the NSS 
  chain. see https://bugzilla.mozilla.org/show_bug.cgi?id=353608

- sent all the info about NSS leaks upon reinit to Julien. most of them
  are known to NSS team and in the process of being plugged.

- there is nothing else we can call to help NSS shutdown/cleanup here so
  at this point i believe its done properly in the patch.

- locking. i'm still not sure about this and need your input here. since
  there is a potential for a race for "inited" gobal and un/register of
  the handler i feel like this should be locked somehow so neither init
  nor shutdown can take place while the other one still running. my take
  on it is to introduce some libssldap specific giant lock to take care
  of that but on the other hand its clear that any app should wait for
  either ldapssl_shutdown() or NSS_Shutdown() to return success before
  calling init sequence [say it does so on diff thread] and odds of this
  race condition happening are kinda low. what are your expectations ??

apart from locking issue i think this patch is ready for review and ci
if its any good. so please do review it and tell me what you think re
locking. again, many thanx to Julien for all the help.
R.e. locking, I think it is OK to make that issue the application's problem in this case.  Initialization of libraries is often confined to one thread (if they are shutting down libldap/libssldap/NSS and are using more than one thread, the application will need to coordinate among threads anyway).

(In reply to comment #12)

thanx Mark! what do you think about the rest of the patch ? is there
anything else i need to address ? if not i wanna check it in on sun_
merge branch and move on.
 
Re: NSS_Shutdown() and cache clearing - if there were a function SSL_Shutdown() it could also clear the caches and perform the NSS_Shutdown().  But AFAIK there's not so we have to clear the caches explicitly.

I think the code looks good.
(In reply to comment #14)

thanx for review Rich! perhaps we should file an RFE against NSS to provide
SSL_Shutdown(). its unclear that one has to use SSL_ClearSessionCache() in
order to clean up SSL lib state, critical to let NSS_Shutdown() go: neither 
docs nor header comments in NSS indicate that. maybe NSS can benefit from
SSL_Shutdown as some sort of aggregator 'cleanup on user request' function
that can take care of session cache and whatever else ? in any case Julien
and Nelson are on cc here so they can comment if its a good idea or not.

there is one more thing regarding locking. i found that if we use NSS own
NSS_IsInitialized() we can probably get rid of that "inited" global and
avoid locking altogether. i might give it a try later when merging ssldap
changes.
Comment on attachment 239042 [details] [diff] [review]
first cut of the patch

The changes look OK to me.  Please commit.

R.e. the idea of using NSS_IsInitialized() that is worth investigating.  If an application has already initialized NSS but also needs to call one of our functions to do some LDAP-specific NSS initialization we should make sure that works too.
Attachment #239042 - Flags: review+
Yes.  Go ahead and commit this first, then we can investigate using NSSIsInitialized(), perhaps as another bug.
thanx for all reviews and comments on this!

Checking in mozilla/directory/c-sdk/ldap/include/ldap_ssl.h;
/cvsroot/mozilla/directory/c-sdk/ldap/include/ldap_ssl.h,v  <--  ldap_ssl.h
new revision: 5.5.8.1; previous revision: 5.5
done
Checking in mozilla/directory/c-sdk/ldap/libraries/libldap_ssl.ex;
/cvsroot/mozilla/directory/c-sdk/ldap/libraries/libldap_ssl.ex,v  <--  libldap_ssl.ex
new revision: 5.5.8.1; previous revision: 5.5
done
Checking in mozilla/directory/c-sdk/ldap/libraries/libssldap/clientinit.c;
/cvsroot/mozilla/directory/c-sdk/ldap/libraries/libssldap/clientinit.c,v  <--  clientinit.c
new revision: 5.6.8.1; previous revision: 5.6
done
These have been merged on to the trunk.  This bug can be resolved as FIXED.
Reassigned to Anton (for tracking purposes) and marked fixed.
Assignee: mcs → anton.bobrov
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: