Closed Bug 292049 Opened 15 years ago Closed 15 years ago

NSS passes invalid handles to C_UnwrapKey

Categories

(NSS :: Libraries, defect, P1)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nelson, Assigned: nelson)

References

Details

Attachments

(1 file, 4 obsolete files)

A certain PKCS11 module crashes if/when its C_UnwrapKey method is called and
is passed CK_INVALID_HANDLE (that is, zero) as the session handle.  

NSS has been seen to do this when using a temporary ("session") RSA private key.
Here is a stack trace.  

C_UnwrapKey          (0, febdcbb4, 1dd848, 20e00c, 80, febdcbd0) + fc
pk11_AnyUnwrapKey    (b31c8, 1dd848, 183958, 0, febdcda0, 370) + 5a4
PK11_PubUnwrapSymKey (208058, febdcda0, 370, 10c, 0, 26930) + b0

I observe that NONE of the following functions checks for a valid session
handle before returning the symkey or using the session handle:

pk11_getKeyFromList
pk11_CreateSymKey
C_UnwrapKey

I'm going to attach a patch that will
a) help us figure out where the invalid session handle is coming from, and
b) avoid passing a zero session handle to C_UnwrapKey
P1 for 3.10.1
Status: NEW → ASSIGNED
Priority: -- → P1
Target Milestone: --- → 3.10.1
Attached patch patch for debugging (obsolete) — Splinter Review
Bob, even though I don't intend to commit this patch as is, please review it
and tell me if any of my new error path code is obviously wrong. Thanks.

Saul, please test this patch on the system experiencing this failure.
Let's see if it either cures the crash or creates a core showing clearly
the source of the invalid session handle.
Attachment #181938 - Flags: review?(rrelyea)
Comment on attachment 181938 [details] [diff] [review]
patch for debugging

r+ for debugging.
pk11_GetNewSession should never return CK_INVALID_HANDLE.
PK11_GetRWSession can!
(the checks there should probably permanent).

pk11_CloseSession checks the only value itself. It's coded to always pair with
a pk11_GetNewSession.

Looking at the code, it looks like there is a bug in PK11_GetRWSession. If it
can't get any new sessions, it closes the slot session and marks it as invalid.
PK11_RestoreROSession then tries to restore it. The problem is if some keys
were created with the slot session, they now suddenly become invalid.
PK11_GetRWSession should just fail in those cases.
Attachment #181938 - Flags: review?(rrelyea) → review+
A full patch should probably add some comments;).
Attached patch new debug patch (obsolete) — Splinter Review
This patch replaces both patches above.  
It differs from the first patch by using CK_INVALID_SESSION instead of
CK_INVALID_HANDLE for comparison with session handles.
It differs from Bob's patch in that it releases the slot monitor 
even when the rwsession passed in is CK_INVALID_SESSION.  

Bob, please confirm that my revision of your patch is the right thing for this
function to do.

Saul, please give this new patch a try.
Attachment #181938 - Attachment is obsolete: true
Attachment #181982 - Attachment is obsolete: true
Attachment #182002 - Flags: review?(rrelyea)
Saul, I have a question about the occurence of this bug.
Does it happen one the first handshake, or the second?
Or does it happen after many handshakes have happened, (many > 1000) ?
Comment on attachment 182002 [details] [diff] [review]
new debug patch

r- PK11_GetRWSession releases the slot monitor itself if the session is
invalid.

perhaps the correct fix is to maintain your patch for RestoreROSession and  for
GetRWSession to always maintain the SlotMonitor.
The reason is there are 2 conditions where the rw session could be
CK_INVALID_SESSION without releasing the monitor... slot->session is
CK_INVALID_SESSION (a case which should assert), and the PKCS #11 module
returns crv as OK, but gives us an invalid session. The latter case is a bug in
the PKCS #11 module, but we should fail in that case, not hold locks open
forever;).
Attachment #182002 - Flags: review?(rrelyea) → review-
BTW the rest of the patch is fine.
Bob, You assert that PK11_GetRWSession doesn't hold the monitor when it
returns CK_INVALID_SESSION, but there are at least two paths through
PK11_GetRWSession that can return CK_INVALID_SESSION but hold the slot monitor.
They are:

a) slot->defRWSession is true, but slot->session is CK_INVALID_SESSION, or
b) C_OpenSession returns CKR_OK but has set rwsession to CK_INVALID_SESSION

PK11_GetRWSession does not detect these, and returns while holding the monitor.
That is why my code released the monitor when rwsession is CK_INVALID_SESSION.
I agree that we can and should assert in either of these cases.  But in the 
non-debug builds, what should PK11_GetRWSession do in these cases?  

Should PK11_GetRWSession ALWAYS release the monitor every time it returns
CK_INVALID_SESSION?  even when slot->defRWSession is true?

If slot->session is CK_INVALID_SESSION, should PK11_GetRWSession 
a) clear the slot->defRWSession ?  and/or
b) try to get a new RW session, and if it succeeds, make that the new 
   default RW session for that slot?

Should the definition of PK11_RWSessionHasLock be changed from
  (PRBool)(!slot->isThreadSafe || slot->defRWSession)
to 
  (PRBool)(!slot->isThreadSafe || 
           (slot->defRWSession && slot->session != CK_INVALID_SESSION)
??

Should PK11_RestoreROSession be changed to only exit the slot monitor if
  (slot->defRWSession && slot->session != CK_INVALID_SESSION)
??

I will attach another patch with my proposed answer to these questions.
This patch differs from the previous one only in pk11skey.c
It ensures that the lock is not held whenever GetRWSession returns 0.
It also attempts to "fix" the slotInfo (or at least return it to a self-
consistent state) in the event that slot->defRWSession is true but 
slot->session is zero.	

Does this seem better, Bob?
Attachment #182002 - Attachment is obsolete: true
Attachment #182026 - Flags: review?(rrelyea)
It sort of went in the other direction I was thinking. I was thinking to just
remove the exitMonitor in PK11_GetRWSession(). That would make all the rest a
lot easier to follow (you always have the monitor out of RWSession....).

The direction you went has the advantage as it is more what the callers would
expect (that no locks are held if the function fails). It also has an attempt at
recovering from slot->session being CK_INVALID_HANDLE in the defRW case. (defRW
BTW happens when NSS finds a token which is not read only, but can only support
one session). Anyway there is definate advantages for the added complexity.

Here are the things I saw wrong in the patch:
-- in PK11_GetRWSession():
+    PORT_Assert(rwsession != CK_INVALID_SESSION || crv != CKR_OK);
+    if (crv != CKR_OK || rwsession == CK_INVALID_SESSION) {
+	if (crv == CKR_OK) 
+	    crv = CKR_DEVICE_ERROR;
+	if (haveMonitor)
+	    PK11_ExitSlotMonitor(slot);

The PORT_Assert should read 'crv == CKR_OK && rwSession != CKR_INVALID_SESSION'.
It's valid for a PKCS #11 module to faile a C_OpenSession because it has ran
out of sessions.

+PK11_RWSessionHasLock(PK11SlotInfo *slot,CK_SESSION_HANDLE session_handle) 
+{
+    PRBool hasLock;
+    hasLock = (PRBool)(!slot->isThreadSafe || 
+    	      (slot->defRWSession && slot->session != CK_INVALID_SESSION));
+    return hasLock;
+}
hasLock should also be false if session_handle is CK_INVALID_SESSION.

in pk11_RestoreRWSession:
+    PORT_Assert(rwsession != CK_INVALID_SESSION);
+    if (rwsession != CK_INVALID_SESSION) {
+    	PRBool doExit = PK11_RWSessionHasLock(slot, rwsession);
+	if (!pk11_RWSessionIsDefault(slot, rwsession))
+	    PK11_GETTAB(slot)->C_CloseSession(rwsession);
+	if (doExit)
+	    PK11_ExitSlotMonitor(slot);

Remove the Assert (it's valid for PK11_GetRWSession to fail).

your direction is fine, r+ for either going to your previous patch and removing
the PK11_ExitMonitor call, or taking your current patch with the above changes.

bob
(In reply to comment #11)

> Here are the things I saw wrong in the patch:
> -- in PK11_GetRWSession():
> +    PORT_Assert(rwsession != CK_INVALID_SESSION || crv != CKR_OK);

Are you sure that's wrong?

> The PORT_Assert should read 
>    'crv == CKR_OK && rwSession != CKR_INVALID_SESSION'.
> It's valid for a PKCS #11 module to faile a C_OpenSession because it has ran
> out of sessions.

Yes, and it must return a crv other than CKR_OK in that case.
My assertion allows that case to proceed.  It says that the code may proceed
if either 
(a) we got a valid non-zero session number, or 
(b) there was a non-zero error code set.
It disallows the case of getting a 0 session ID together with CKR_OK.

It appears to me that the assertion you suggested above would cause an 
assertion failure if either:
- a 0 session id was returned, or 
- any non-zero error code was returned.



> +PK11_RWSessionHasLock(PK11SlotInfo *slot,CK_SESSION_HANDLE session_handle) 
> +{
> +    PRBool hasLock;
> +    hasLock = (PRBool)(!slot->isThreadSafe || 
> +    	      (slot->defRWSession && slot->session != CK_INVALID_SESSION));
> +    return hasLock;
> +}
> hasLock should also be false if session_handle is CK_INVALID_SESSION.

Agreed.

> in pk11_RestoreRWSession:
> +    PORT_Assert(rwsession != CK_INVALID_SESSION);

> Remove the Assert (it's valid for PK11_GetRWSession to fail).

Yes, but I claim that when PK11_GetRWSession fails, returning zero, it is 
not valid for the caller to thereafter call pk11_RestoreROSession.  
By analogy, when open() returns -1, one does not call close(-1).  
Do you agree with that?

I think this also means that, if we want to ensure that NSS never calls a 
PKCS11 C_* function with a zere sessionID, then we must inspect EVERY 
caller of PK11_GetRWSession to ensure that it tests the returned session ID
and does NOT proceed to call the PKCS11 C_* function if the returned 
sessionID was zero.  
Your right, your assert is correct. It might be easer to read if it read
(! (ckr == CKR_OK &&  rwsesson == CKR_INVALID_SESSION)) ) 

As for the other assert, that would be fine if we go through and change all the
callers of GetRWSession to abort on a bad session. Since I didn't see any
patches to that effect, I assumed the code was not changing.

This kind of change will have to be implemented carefully with the KeyPairGen
function, which currently knows way to much about the sideeffects of the
subfunctions it calls.
Blocks: 293319
Comment on attachment 182026 [details] [diff] [review]
ensure that lock is not held if returned session is 0

I have a newer patch, much more extensive, that detects possible zero session
handles in many more places and avoids sending them to the PKCS11 module.  I
will attach that patch here soon.
Attachment #182026 - Attachment is obsolete: true
Attachment #182026 - Flags: review?(rrelyea)
Unlike the previous patches, which were designated as only for debugging 
purposes, this patch is intended for real checkin.  Bob, please review.

This patch detects numerous previous-undetected situations that would
have caused zero session handles to be sent to a PKCS11 module, and 
prevents those invalid PKCS11 function calls from being made.  
With this patch in place, we see no more crashes with zero session 
handles being passed in to the PKCS11 module.  

This patch isn't as clean as it might be.  There are some stylistic
inconsistencies, and some error number inconsistencies, swo I won't be 
crushed if this fails review for those issues :).  We could use
some new error codes for these situations.  SEC_ERROR_BAD_DATA and
SEC_ERROR_LIBRARY_FAILURE aren't adequately descriptive, IMO.
Attachment #183351 - Flags: review?(rrelyea)
Comment on attachment 183351 [details] [diff] [review]
patch to eliminate zero session handles

r+ = relyea, don't let any religious arguments over the comments below keep you
from checking in the pack as is if you want.

the following comments are about code readability/style, not function.


/* XXX Perhaps this should be:
**   if (symKey->sessionOwner)	*/

change to

/* pk11_CloseSession handles the case where sessionOwner == PR_FALSE correctly
*/

+    if (session == CK_INVALID_SESSION) {
+	if (!isPerm)
+	    pk11_ExitKeyMonitor(symKey);
+	crv = CKR_SESSION_HANDLE_INVALID;
     } else {
-	pk11_ExitKeyMonitor(symKey);
+	crv = PK11_GETTAB(slot)->C_DeriveKey(session, &mechanism,
+	     baseKey->objectID, keyTemplate, templateCount, &symKey->objectID);
+	if (isPerm) {
+	    PK11_RestoreROSession(slot, session);
+	} else {
+	    pk11_ExitKeyMonitor(symKey);
+	}
     }

+    PORT_Assert(rwsession != CK_INVALID_SESSION || crv != CKR_OK);

The assert is more readable if we don't explicitly distribute the implied '!'
using boolean algrebra.... that is We're asserting that the function didn't
return CKR_OK and and invalid session at the same time, so it's more readable
to say:

PORT_Assert(!(crv == CKR_OK && rwsession == CKR_INVALID_SESSION));

The assert as written is functionally the same as the above assert, but
requires a lot more head scratching to believe it (or requires writting the
above assert and doing the Boolean algebra;).

bob
Attachment #183351 - Flags: review?(rrelyea) → review+
ARGU..... something got lost in the cut and paste, the second issue was...

    if (session == CK_INVALID_SESSION) {
+	if (!isPerm)
+	    pk11_ExitKeyMonitor(symKey);
+	crv = CKR_SESSION_HANDLE_INVALID;
     } else {
-	pk11_ExitKeyMonitor(symKey);
+	crv = PK11_GETTAB(slot)->C_DeriveKey(session, &mechanism,
+	     baseKey->objectID, keyTemplate, templateCount, &symKey->objectID);
+	if (isPerm) {
+	    PK11_RestoreROSession(slot, session);
+	} else {
+	    pk11_ExitKeyMonitor(symKey);
+	}
     }

Nesting is getting pretty deep here, good place for a 'goto done'.
QA Contact: bishakhabanerjee → jason.m.reid
Checking in pk11akey.c;  new revision: 1.4; previous revision: 1.3
Checking in pk11auth.c;  new revision: 1.3; previous revision: 1.2
Checking in pk11obj.c;   new revision: 1.7; previous revision: 1.6
Checking in pk11skey.c;  new revision: 1.102; previous revision: 1.101
Checking in pk11slot.c;  new revision: 1.86; previous revision: 1.85
Checking in pk11util.c;  new revision: 1.51; previous revision: 1.50

This was (is) a P1 bug, but I'm not going to try to fix it for 3.10.x now.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Target Milestone: 3.10.1 → 3.11
You need to log in before you can comment on or make changes to this bug.