Closed Bug 485690 Opened 15 years ago Closed 15 years ago

LDAP SSL connections with clientauth sometimes use wrong credentials

Categories

(Directory :: LDAP C SDK, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nelson, Assigned: nelson)

References

Details

Attachments

(2 files, 2 obsolete files)

Attached patch Patch v1 for LDAP C SDK (obsolete) — Splinter Review
This problem is seen in client programs that use SSL clientauth and that 
use different credentials (client certificates) on different connections 
to the same server during the same process lifetime.  It is also seen in 
programs that use clientauth on some connections to a server, but not on 
other connections to the same server during the same process lifetime.  

This is seen most commonly is test clients that are trying to simulate
different users, but it may also be seen in servers that are trying to 
replicate different data sets to a remote server with a different client
credential for each different data set.  

The problem is that after one connection has been established between 
the client and the server using some (or no) credential, all subsequent
connections to the same server may use the same credential (or same absent
credential) even if the client has called ldapssl_enable_clientauth 
specifying different credentials to be used for each subsequent connection.

Here is an explanation.  SSL remembers ("caches") information about 
previously established sessions with "remote" clients and servers, and 
reuses that information from old sessions whenever possible to minimize the 
expensive computational overhead of establishing new sessions with each new
connection.  When a new connection is made, if the server says "Let's reuse
the session we last used", then the most recent session from that socket's
cache will be used, even if that previous session was established using 
different credentials than the client now wishes to use.

NSS's SSL client library supports the use of multiple separate named caches, 
to allow the sessions associated with one set of credentials to be remembered 
separately from the sessions associated with other sets of credentials.  
So a client program that uses credentials for "Alice" with some sockets, and 
uses credentials for "Bob" with other sockets, can have separate caches for Alice and Bob.  If the connections for Alice are configured to use Alice's 
cache, and Bob's connections use Bob's cache, all confusion over different 
credentials can be eliminated.  

When a program (or thread) uses a socket for which it has not configured an explicitly named socket, then NSS uses a default unnamed cache for that 
socket.  There is only one default unnamed cache.  If multiple threads,
trying to act as separate users, do not specify explicitly named caches for
their respective users' identities, then the separate users will end up 
using a common set of credentials from the common cache, resulting in 
confusion.  

It may be reasonable for all LDAP connections that do NOT use client auth 
to share a common cache, because each connections that uses no client auth 
typically is (re)authenticated using some other method, such as simple name
and password authentication.  But it is not reasonable for client auth 
connections to share a cache with connections that use no client auth.

That is the problem experienced with the LDAP C SDK.  The LDAP C SDK never 
explicitly names a cache to be used with any SSL socket, so all SSL sockets
share the common unnamed cache, with the resulting identity confusion.  

A named cache is implicitly created the first time its name is configured 
to be used by a socket.  Cache entries have a lifetime after which they 
expire.  When all the cache entries for a named cache have expired, the 
cache disappears.  So, it's very easy to use named caches.  You just give
the name to the SSL socket before doing the first handshake on that socket,
and SSL does the rest.  

I have attempted to modify the LDAP C SDK to do that, to configure each 
SSL socket that will be used with a client certificate to use a cache with 
the same name as the "nickname" of the cert to be used.  I have attached 
to this bug an untested patch that I believe will solve the problem.  

I would like one of the LDAP C SDK cognoscenti (you know who you are :) to 
test this patch and see if it does indeed satisfactorily solve the problem 
for them.
Attachment #369808 - Flags: review?
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: ssl connections with clientauth sometimes use wrong credentials → LDAP SSL connections with clientauth sometimes use wrong credentials
Attached patch alternative patch v1 (obsolete) — Splinter Review
This may be slightly more pleasing.
Attachment #369845 - Flags: review?
I found a third function in this source file that almost identical to 
the other two, so I patched it, too.
Attachment #369808 - Attachment is obsolete: true
Attachment #369845 - Attachment is obsolete: true
Attachment #369957 - Flags: review?
Attachment #369808 - Flags: review?
Attachment #369845 - Flags: review?
Comment on attachment 369957 [details] [diff] [review]
Patch v2 for LDAP C SDK

I'm asking Mark for this review, but any peer of this module could review it.
Attachment #369957 - Flags: review? → review?(mcs)
hey Nelson, thanx for the patch and explanation! i dont have time to test it 
right now but should be able to do so by the end of the week or so so if
somebody can test it earlier than that please do.

on a side note i think related NSS caching design is broken because it allows
for such things to happen in the first place pretty much without user consent
[i'm sure you have it documented somewhere but still]. i understand it would
be difficult to fix because of the performance implications and existing API
users expectations in terms of baseline performance they come to expect but
you should at least think about fixing it somehow. it is probably a very good
and efficient caching mechanism for browsers and client software alike and i
can certainly understand how that design came around.
Anton, I don't think the existing API or cache design in broken.  

It is necessary to tell/configure the socket in advance, saying 
"if the server asks you to re-use an existing session, use one 
associated with this identity, and if the server asks you to create a
new session, associate that new session with this identity".  
There is no other basis on which NSS can figure out the right identity 
to use for those purposes than to be told, and the SSL_SetSockPeerID 
function is how you tell it.
This patch looks ok

Is it safe to call SSL_SetSockPeerID() even if the peer ID has already been set?  I'm not sure how this can happen, but do we need to get the current peer ID and free/release it before we set it again?
> Is it safe to call SSL_SetSockPeerID() even if the peer ID has already 
> been set?  

It will leak the previous value.  But that's easy for us to fix in NSS.

> do we need to get the current peer ID and free/release it before we set 
> it again?

I think there's no public NSS API by which to do that, and in any case,
that should not be necessary.  If you think this is really a possibility,
Please let me know.
Depends on: 486999
It is a little surprising to me that the SSL session cache ignores differences in client auth credentials, etc. but I know it would be difficult to code it so it did the right thing automatically (the right thing is likely to be application specific).  I think this is a good fix for the LDAP C SDK, but I am not able to test it right now.
The SSL code isn't ignoring anything.  It hasn't received any information to
ignore.  If you carefully trace through the calls made by LDAP SDK to libSSL,
you will see that none of them tells libSSL what identity to use before the
handshake begins.  The patch fixes that.  The patch gives the SSL library 
the information with which it can correctly choose an SSL session to restart.  

Without that information, the SSL library can only guess which session to 
restart.  It guesses the most recently established session with that server.
That's generally right for the client that represents just a single identity.
It's a problem for a client that represents multiple identities.  But if such
a client doesn't tell SSL "For this socket, I want to be identity X", then 
libSSL has no basis on which to make such a decision.  

I got a similar reaction from some DS people at Sun.  There's clearly some 
fundamental misunderstanding at work here.  I wish I understood the cause 
of that.  Perhaps there is some fundamental difference in models between SSL
and LDAP.  Or perhaps the way that SSL manages sessions is not generally 
understood.  Please help me to determine the cause of this misunderstanding.
Once that is understood, I think we can help many Directory folks with this 
issue.
In Fedora Directory Server we do this for outgoing LDAP connections (e.g. if you want to use SSL with client cert auth in replication):
http://cvs.fedoraproject.org/viewvc/ldapserver/ldap/servers/slapd/ssl.c?revision=1.26&root=dirsec&view=markup
function slapd_SSL_client_auth
...
	    rc = ldapssl_enable_clientauth (ld, SERVER_KEY_NAME, pw, cert_name);
	    if (rc != 0) {
		errorCode = PR_GetError();
		slapd_SSL_warn("ldapssl_enable_clientauth(%s, %s) %i ("
			       SLAPI_COMPONENT_NAME_NSPR " error %d - %s)",
			       SERVER_KEY_NAME, cert_name, rc, 
			       errorCode, slapd_pr_strerror(errorCode));
	    } else {
		/* We cannot allow NSS to cache outgoing client auth connections -
		   each client auth connection must have it's own non-shared SSL
		   connection to the peer so that it will go through the
		   entire handshake protocol every time including the use of its
		   own unique client cert - see bug 605457
		*/

		ldapssl_set_option(ld, SSL_NO_CACHE, PR_TRUE);
	    }

So instead of trying to get the correct credentials to be used by the cached connection, we just avoid caching altogether.
In reply to comment 10, disabling the cache is another way to prevent the
problem, but it is extermely inefficient, forcing a new full handshake for
every client auth connection, and never reusing SSL sessions.  It's a 
"solution" but not an optimal one, IMO.

The patch I proposed is much more efficient, reusing sessions properly.
BTW, Rich, where is this bug 605457 ?  Got URL?
(In reply to comment #11)
> In reply to comment 10, disabling the cache is another way to prevent the
> problem, but it is extermely inefficient, forcing a new full handshake for
> every client auth connection, and never reusing SSL sessions.  It's a 
> "solution" but not an optimal one, IMO.
> 
> The patch I proposed is much more efficient, reusing sessions properly.

Sure.  We will investigate changing this in an upcoming version of DS.
(In reply to comment #12)
> BTW, Rich, where is this bug 605457 ?  Got URL?

It is not a public bug.  We found it mid-2002, so the bug was in the internal private AOL bug tracking system.  In fact, nelsonb@netscape.com at 2002-07-26 14:55:17 said to use SSL_SetSockPeerID with a unique identity string.
(In reply to comment #9)
> The SSL code isn't ignoring anything.  It hasn't received any information to
> ignore.  If you carefully trace through the calls made by LDAP SDK to libSSL,
> you will see that none of them tells libSSL what identity to use before the
> handshake begins.  The patch fixes that.  The patch gives the SSL library 
> the information with which it can correctly choose an SSL session to restart.

Thanks for the explanation.  I think I understand now.  When I wrote comment # 8, I was thinking that the SSL library could, for example, look at what certificate is being used for client authentication and use that info. to decide whether to reuse a cached session.  But it does not have that information at the time it needs to make the decision about session reuse.
sorry, i didnt really get enough time to have a go with a test setup. this
stuff is a royal pain to setup from scratch. i have tried to do it quickly
but failed with SSLException : bad_certificate on the server side thrown
for certutil generated self-signed cert. its not the issue of trust or
validity, it is something with the cert itself. i'm gonna try to generate
self signed cert with Java tools then import it into NSS, maybe that will
work, maybe not, maybe it all depends on the current moon phase.
doh, but of course i cant just move the keystore from JKS to NSS.
well, I'd say that testing SSL client auth with self-signed certs is doomed.
When an SSL server requests client auth, it sends a list of the names of 
the CAs whose client certs it accepts.  SO, when you're doing SSL client 
testing, you should
a) have your own CA for issuing client certs, with its own CA cert.
   (Certutil can do this nicely).
b) have that CA cert marked as trusted to issue client certs in the server
   that is requesting client auth, so that the server will take the name 
   from that CA cert and send it to the clients as the name of an issuer 
   whose certs will be accepted.
c) have that CA cert be imported into the client's cert DB, along with the
   client's own cert (issued by that CA).

It's easy to setup a shell script that 
- creates a cert DB for a CA
- creates a self signed CA cert in that CA's cert DB.
- exports the CA cert for that CA
- creates a cert DB for the client
- imports the CA's cert into the client's DB
- creates a cert signing request for the client using the client's DB
- signs and issues a new client cert, using the CA's DB
- imports the newly issued client cert into the client's DB.

NSS's QA test scripts do all those steps (and more) for every NSS QA test run.
- creates a client auth cert signed by that CA
-
we have so called Blind Trust manager provider on the server side, just for
testing so it doesnt matter. the problem i have is that NSS self signed
certs are rejected while JKS self signed certs work fine. SSLException isnt
very informative, just saying bad_certificate and thats it. any idea how to
generate Java Security friendly self signed cert with certutil ?
nevermind, i think i got it. what was happening is NSS refusing a handshake
because it didnt know about the server cert [insert another doh! here] and
didnt trust it. the thing that got me baffled was that SSLException with
bad_certificate error. i thought it was about some certificate extension
missing or some other extension not recognized or something like that. will
try to test it further now, sorry for the noise.
We use such a script as mentioned by Nelson:
http://github.com/richm/scripts/blob/702a161b5ad83c47d03ff5ac37ab52dcf691e16f/setupssl2.sh
Creates a self signed CA cert, a server cert for directory server, and a server cert for admin server (that both can be used for client cert auth too).
ok, verified the patch now and it does fix the problem as described and i dont
see it rendering anything broken on my simple tests so we should get this fix
in asap. here is some proof output on the server side just in case :

- unpatched 

08/Apr/2009:02:49:01 +0200] CONNECT conn=7 from=x.x.x.x:37556 to=y.y.y.y:636 protocol=LDAP
[08/Apr/2009:02:49:02 +0200] BIND REQ conn=7 op=0 msgID=1 type=SASL mechanism=EXTERNAL dn=""
[08/Apr/2009:02:49:02 +0200] BIND RES conn=7 op=0 msgID=1 result=0 authDN="cn=Server-Cert,dc=example,dc=com" etime=4
[08/Apr/2009:02:49:02 +0200] CONNECT conn=8 from=x.x.x.x:37557 to=y.y.y.y:636 protocol=LDAP
[08/Apr/2009:02:49:02 +0200] BIND REQ conn=8 op=0 msgID=1 type=SASL mechanism=EXTERNAL dn=""
[08/Apr/2009:02:49:02 +0200] BIND RES conn=8 op=0 msgID=1 result=0 authDN="cn=Server-Cert,dc=example,dc=com" etime=4

- patched

[08/Apr/2009:02:54:30 +0200] CONNECT conn=9 from=x.x.x.x:37558 to=y.y.y.y:636 protocol=LDAP
[08/Apr/2009:02:54:30 +0200] BIND REQ conn=9 op=0 msgID=1 type=SASL mechanism=EXTERNAL dn=""
[08/Apr/2009:02:54:30 +0200] BIND RES conn=9 op=0 msgID=1 result=0 authDN="cn=Server-Cert,dc=example,dc=com" etime=4
[08/Apr/2009:02:54:30 +0200] CONNECT conn=10 from=x.x.x.x:37559 to=y.y.y.y:636 protocol=LDAP
[08/Apr/2009:02:54:30 +0200] BIND REQ conn=10 op=0 msgID=1 type=SASL mechanism=EXTERNAL dn=""
[08/Apr/2009:02:54:30 +0200] BIND RES conn=10 op=0 msgID=1 result=49 authFailureID=1245310 authFailureReason="The SASL EXTERNAL bind request could not be processed because the client did not present an certificate chain during SSL/TLS negotiation" etime=2

this is from a single client process just using two separate LDAP handles.
I agree - this looks good.
i already told you the reason. API users dont expect blanket NSS caching in
the first place. yes, it is described in the docs, yes, rtfm is in order,
still from the API user pov this is wrong. if i wanna enable blanket cache
i would go ahead and enable it and i dont expect the API to set me up. the
reason we have this bug in the first place is just that, well that and not
rtfm, that is if it was documented at the time at all. again, i understand
why NSS does it and how that design came around but it is not a good design
from where i look at it [as API user]. i also know that there isnt much you
can do about it now even if you recognize there is a design problem and are
willing to fix it. one way to go about it would be introducing NSS handles
where each handle is associated with a given db, cache and whathaveyou but
i digress. anyway, this is just to tell you how it is from API user pov. i
have no time or energy to argue about it so take it or leave it, up to you.

(In reply to comment #9)
> I got a similar reaction from some DS people at Sun.  There's clearly some 
> fundamental misunderstanding at work here.  I wish I understood the cause 
> of that.  Perhaps there is some fundamental difference in models between SSL
> and LDAP.  Or perhaps the way that SSL manages sessions is not generally 
> understood.  Please help me to determine the cause of this misunderstanding.
> Once that is understood, I think we can help many Directory folks with this 
> issue.
Anton, I think your answer can be summarized as "caching of sessions is 
entirely unexpected".  That's a helpful answer.  Thanks.  

If we ever do an NSS 4.0, I think that lots of issues like this would change.
But I doubt that we ever will do so.
btw this is how it works in the current Sun Directory Server as well.

(In reply to comment #10)
> In Fedora Directory Server we do this for outgoing LDAP connections (e.g. if
> you want to use SSL with client cert auth in replication):
> So instead of trying to get the correct credentials to be used by the cached
> connection, we just avoid caching altogether.
Well, if Sun's DS disabled caching for auth connections that intended or
desired to use client auth, this bug would not exist.  This bug was brought
to my attention by DS folks (QA, I think, not sure) because their 
connections that they intended to use client auth were reusing old non-client 
auth sessions.
last week i spoke to the person responsible for related code in Sun DS and who 
also filed related Sun CR against NSS and this is what he told me. i think it
is still a problem for them since there are scenarios where many short lived
connections can take a perf hit with disabled cache. hopefully once your patch
is committed they can take advantage of it as well.

(In reply to comment #27)
> Well, if Sun's DS disabled caching for auth connections that intended or
> desired to use client auth, this bug would not exist.  This bug was brought
> to my attention by DS folks (QA, I think, not sure) because their 
> connections that they intended to use client auth were reusing old non-client 
> auth sessions.
I think this is an issue for SASL, as well as for SSL client auth.
Basically, any use of LDAP C SDK over SSL that does some sort of binding
of identity/credentials to the SSL session needs to separate the SSL 
sessions so that no single SSL session is used with multiple credentials. 

The patch attached here attempts to do that for connections authenticated
with SSL client auth (certs), but I suspect this is also an issue for SASL.
See bug 488449, and see if you think that is the same problem.  
If so, a patch similar to the one that helps this bug may help that bug.  

I don't know how you would get the SASL-client to give you a string that 
you could use as a client session cache name.  But I suspect that would be
part of the solution.
What can I do to move this patch along?
Priority: -- → P1
Comment on attachment 369957 [details] [diff] [review]
Patch v2 for LDAP C SDK

Sorry for the delay.  The patch looks fine, although I would add a comment block like this before each of the new code blocks:

    /*
     * If using client auth., arrange to use a separate SSL session cache
     * for each distinct certificate nickname.
     */

r=mcs if comments are added.  Please commit or ask me to do so.  Thanks!!!!!
Attachment #369957 - Flags: review?(mcs) → review+
Checking in ldapsinit.c; new revision: 5.17; previous revision: 5.16

Is there a tinderbox I can watch to see if this checkin has any unexpected
consequences on other platforms?
What release number should go into the target milestone field for this bug?
What will be the next expected release from the trunk?
(In reply to comment #33)
> What release number should go into the target milestone field for this bug?

6.0.7

> What will be the next expected release from the trunk?

6.0.7
Target Milestone: --- → 6.0
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Group: core-security
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: