Closed Bug 283690 Opened 20 years ago Closed 20 years ago

NSS improperly handles PKCS11 sessions for SSL derived keys.

Categories

(NSS :: Libraries, defect, P2)

3.9.5
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rrelyea, Assigned: rrelyea)

References

Details

Attachments

(1 file, 2 obsolete files)

There may be a duplicate of this bug. It would show up as 'nss passes invalid
object handles to C_DestroyObject()'. But I couldn't find the orginal bug.

-----------------------------------------------
When verifying that we are really are creating keys on their own session, I went
looking at our SSL code.

In SSL, we call a function to 'derive' keys for us.

This function returns a symkey, which basically has the session that the derived
keys all live on, and a data structure that has the handles for all the keys
that were derived.

We then call PK11_SymKeyFromHandle() to create separate symkeys for each of
these keys.

PK11_SymKeyFromHandle() gets a new symKey structure (which automatically returns
a structure with a session), closes the session, and addes the session from the
parent key, marking the symKey as not being the owner of the session.

After it creates all these SymKeys, the original  SymKey is freed! (which in
theory will free the session).
--------------

Why I think it works "most" of the time.

To make getting keys more efficient, NSS keeps a freelist of SymKeys. This saves
memory allocations, lock allocations, and session creations for the new keys.
What I believe is happening is when we free the parent SymKey, the SymKey is
placed on the free list, which the session that was created for it. Later, when
the key is reused, the session is problably reused. The only exception is cases
like PK11_SymKeyFromHandle, where the session is closed because the new key is
going to share a session with some other key.
I believe this problem is not only the source of bugs, but also a major
performance issue.

The NSS key library is trying not only to keep the keys on the freelist, but
also to limit the number of new sessions that need to be created to handle SSL
operations. Grabing new keys from the free list, then closing their sessions is
inefficient.

To fix this part of the problem, I suggest we keep 2 types of key freelists:
those which we preserve the session, and those which we know won't own their
sessions. The former is already kept on a per slot basis. The latter can be
allocated out of a general pool. When we want a symkey which we are going to
supply our own session for, we should grab it from the general free list. This
reduce the number of new session calls we need to make, and allow us to recycle
existing sessions better.

This doesn't really "FIX" the real problem that this bug describes, however (it
will *mask* the problem for longer). I'm wondering if we don't need a special
call for SSL to create a stack of keys from the derive, and then a call to free
that entire stack when the socket is finished. 

There are two other possible fixes I see for the 'real' problem:
1) hang on to the symkey returned by us from PK11_DeriveSymKey() along with all
the child keys.
2) keep a reference to the parent key in the child keys and free that reference
when we free the child.

I think the latter is cleanest.

bob
BTW if we recycle contexts like we recycle keys, we can come pretty close to
elliminating memory allocations in pk11wrap (and probably softoken) when running
SSL.

bob
Blocks: 196353
OK, I'm about to attach a patch to fix the base problem. In doing so I
discovered that we are using some trickery to keep the session around, by having
one of the keys 'inherit' the session. The problem comes is, because of
references or other things, this key is freed before the other keys. Most likely
the SSL session is over at this point, and the other keys won't be used, but  it
will trigger the delete of the non-existant keys.

This patch will guarrentee the session isn't squashed before until the last key
is deleted. I haven't tested the patch, I've just verified that it compiles.
This patch doesn't fix the performance issue I've identified.

bob
oops, the other attachement wasn't a patch;(.
Attachment #175587 - Attachment is obsolete: true
Saul, please consider testing the above patch to ensure that it is correct 
and to see if it improves performance.
The patch shouldn't improve performance, but it should improve the hardware issue.

To improve performance we need to implement the separate freelists I suggested
(which are not part of this patch).

bob
Assignee: wtchang → rrelyea
nominating for NSS 3.10.  

Current measurements show 25% of TOTAL server CPU time being spent in 
softoken bookkeeping code, that is, in softoken, but not in freebl.  

#1 source of lock contention is function pk11_ObjectFromHandleOnSlot().
Looks like majority of session keys are all on the same session.  
Investigation continues.
Priority: -- → P2
Target Milestone: --- → 3.10
Version: 3.9.7 → 3.9.5
Actually object handles are not on a per session basis, but a per slot basis
(the same object handle referenced from different sessions point to the same
object).

Two things that could help out this contention:
1) move to reader/writer locks. I don't know how much this will help because we
still have a fair number of writes (key creations and deletions), but there are
a *LOT* more reads.... it's especially noteworthy that it's contending on the
lookup.

2) There are a set of functions call 'tokenKeyLock()' and 'tokenKeyUnlock()'
which are using the same locks. Since we know we are condending on these locks,
we should create another slot level lock for these functions.

3) Look at increasing the bucket size for the hash tables.

bob
Comment on attachment 175588 [details] [diff] [review]
Don't let the owner of a key's session go until the key is done.

Bob, this patch increases the size of struct PK11SymKeyStr which is public,
IINM.
Doesn't this potentially break backward binary compatibility?

If you think it's OK for us to lengthen this struct, what do you think about
lengthening struct CERTCertificateRequestStr by adding	this member?
  SECItem ** extensions;
That change would benefit the fix for bug 263779.
This patch is intended to help us find and eliminate causes of invalid
handle errors.
I threw in an assertion at every place where softoken returns
CKR_XxXXXXX_HANDLE_INVALID and ran ssl.sh.  Assertion failed in the 
first SSL3 stress test.  Stack is 

[5] PR_Assert(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff16cb7d
[6] NSC_DestroyObject(hSession = 791U, hObject = 388U), line 201 in "pkcs11c.c"
[7] PK11_FreeSymKey(symKey = 0x57a710), line 159 in "pk11skey.c"
[8] PK11_DestroyContext(context = 0x5990b0, freeit = 1), line 97 in "pk11cxt.c"
[9] ssl3_CleanupKeyMaterial(mat = 0x5b75f0), line 1261 in "ssl3con.c"
[10] ssl3_DestroyCipherSpec(spec = 0x5b7550), line 1286 in "ssl3con.c"
[11] ssl3_DestroySSL3Info(ssl3 = 0x5b7420), line 8995 in "ssl3con.c"
[12] ssl_DestroySocketContents(ss = 0xfffffd7ffec04c50), line 377 in "sslsock.c"
[13] ssl_FreeSocket(ss = 0x5581a0), line 439 in "sslsock.c"
[14] ssl_DefClose(ss = 0x5581a0), line 247 in "ssldef.c"
[15] ssl_SecureClose(ss = 0x5581a0), line 904 in "sslsecur.c"
[16] ssl_Close(fd = 0x432fb0), line 1204 in "sslsock.c"
[17] PR_Close(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff168d95
[18] handle_connection(tcp_sock = 0x432fb0, model_sock = 0x4329b0, requestCert =
0), line 1078 in "selfserv.c"
[19] jobLoop(a = (nil), b = (nil), c = 0), line 507 in "selfserv.c"
[20] thread_wrapper(arg = 0x540780), line 476 in "selfserv.c"

PK11_DestroyContext was being called to destroy a MAC context, and the 
session handle was invalid.  The MAC key corresponding to that context
had just been destroyed. 

I did the test a second time and got a nearly identical crash.  The 
only difference was in frames 8 and 9.  ssl3_CleanupKeyMaterial was
trying to destroy the server "write" (encryption) key.  Now I will 
test with Bob's patch above.
With Bob's patch (above) patch applied, the crashes no longer occur in ssl.sh.
I have run ssl.sh several time through without failure.  
Comment on attachment 175588 [details] [diff] [review]
Don't let the owner of a key's session go until the key is done.

Bob, this patch appears to be correct.	There is still additional optimization
to be achieved, but  in the interest of correctness, I'd like you to check this
in, unless you have objections to your own patch :)
Attachment #175588 - Flags: review+
Comment on attachment 176111 [details] [diff] [review]
assert on invalid session/object handle

Wan-Teh, Please review this patch.  I believe that passing any PKCS11 function
a handle to a session or object that has already been destroyed is a
programming error.  
As I understand your criteria for asserts, you believe we should assert on
programming errors.  So, do you agree that the assertions in this patch are
appropriate?
Attachment #176111 - Flags: review?(wtchang)
RE: binary compatibility. (comment 10)

PK11SymKey is opaque publically (the data structure is defined in secmodti.h,
which is private).

The public definition is in secmodt.h which is simply:
typedef struct PK11SymKeyStr PK11SymKey;

As far as lengthening data public data structures, I think we should apply the
following criteria:
1) avoid it if there is a reasonable alternative.
2) never do it for a data structure the application would reasonably imbed it
one of it's own data structures.

Under those criteria some data structures obviously full in case '2':
SECItem.

Some are "relatively safe":
PK11SlotInfo,
SECMODModule

Some are "iffy":
CERTCertificate

I think we have lengthened some of these data structures in the past.
The CERTCertRequest looks like it fits between the two. It's API implies
applications are going to try to build one up itself (it has a constructor and a
destructor. I'd say it's probably at least as safe to extend as a
CERTCertificate (which I think we have in the past).

bob


bob
Comment on attachment 176111 [details] [diff] [review]
assert on invalid session/object handle

The assertions you added apply not only to NSS internal programming errors, but
also errors in other PKCS#11 applications.

The PKCS#11 API is public, and the caller may very well not be a piece of NSS
code. For example, the softoken can be called directly from the JDK or other
applications. Thus, IMO, we should not unconditionally assert if they pass
bogus session or object handles to us.

Perhaps we should conditionally assert, maybe based on the state of an
environment variable.
Attachment #176111 - Flags: superreview-
I checked in Bob's Patch above for him, while he is on vacation.
I'm marking this bug fixed.
There may be pressure to back-port this to NSS 3.9.x for some products.
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Comment on attachment 176111 [details] [diff] [review]
assert on invalid session/object handle

Withdrawing patch
Attachment #176111 - Attachment is obsolete: true
Attachment #176111 - Flags: review?(wtchang)
Changed bug summary to show these are PKCS11 sessions, not SSL sessions,
that are being mishandled.
Summary: NSS improperly handles sessions for SSL derived keys. → NSS improperly handles PKCS11 sessions for SSL derived keys.
*** Bug 216552 has been marked as a duplicate of this bug. ***
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: