Closed Bug 196353 Opened 22 years ago Closed 20 years ago

NSS deletes PKCS#11 session objects after session was closed

Categories

(NSS :: Libraries, defect, P1)

3.3.2
defect

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: bugz, Assigned: julien.pierre)

References

Details

(Reporting for David Jacobson) The webserver, which accesses cryptographic services via NSS, sometimes makes a call to C_DestroyObject after the designated object has already been implicitly destroyed by C_CloseSession. (The object's CKA_TOKEN attribute is false, i.e. it is a session object. PKCS#11 requires that session objects be automatically deleted when the session that created them is closed. Thus the C_CloseSession is the first deletion, and the explicit C_DestroyObject is a second.)
With luck, the C_DestroyObject fails with CKR_OBJECT_HANDLE_INVALID. But sometimes in the intervening time some other thread has done one of the operations that generates a key, and that key gets assigned the same handle as the deleted key had. Then the erroneous C_DestroyObject destroys the other thread's object.
We have truss output that shows the bug. But first, we use an empty function, debug_trace, to show things we need that are not normally visible via truss. The first parameter is a tag, and the second is a value. The tags we use are:
    1  session handle created by C_OpenSession
    2  object handle found by C_FindObjects
    3  object handle created by C_CreateObject
    4  object handle created by C_CopyObject
    5  object handle created by C_GenerateKey
    6  object handle for public key created by C_GenerateKeyPair
    7  object handle for private key created by C_GenerateKeyPair
    8  object handle created by C_UnwrapKey
    9  object handle returned directly by C_DeriveKey (often 0)
    a  object handle for client mac secret generated by C_DeriveKey
    b  object handle for server mac secret generated by C_DeriveKey
    c  object handle for client key generated by C_DeriveKey
    d  object handle for server key generated by C_DeriveKey
    e  object handle deleted by C_CloseSession
Here are selected lines from the truss output, with line numbers of the original truss output down the left side.
    3029 6755/19@19: -> libvpkcs11:C_OpenSession(0x1, 0x4, 0x56b7e0, 0xfe99aaa0)
    3034 6755/19@19: -> libvpkcs11:debug_trace(0x1, 0x8b, 0x4, 0x56b7e0)
    3036 6755/19@19: <- libvpkcs11:C_OpenSession() = 0
    3037 6755/19@19: -> libvpkcs11:C_DeriveKey(0x8b, 0xfb04f734, 0x54, 0xfb04f740)
    5700 6755/19@19: -> libvpkcs11:debug_trace(0xa, 0x55, 0x53, 0x0)
    5710 6755/19@19: -> libvpkcs11:debug_trace(0xb, 0x56, 0x53, 0x0)
    5714 6755/19@19: -> libvpkcs11:debug_trace(0xc, 0x5a, 0x53, 0x0)
    5717 6755/19@19: -> libvpkcs11:debug_trace(0xd, 0x5b, 0x53, 0x0)
    6037 6755/19@19: -> libvpkcs11:debug_trace(0x9, 0x0, 0x53, 0x1260388)
    6043 6755/19@19: <- libvpkcs11:C_DeriveKey() = 0
    62452 6755/36@36: -> libvpkcs11:C_CloseSession(0x8b, 0x56b7e0, 0x2e, 0xf8eff740)
    62463 6755/36@36: -> libvpkcs11:debug_trace(0xe, 0x56, 0xff38f6ac, 0x73f00)
    62470 6755/36@36: -> libvpkcs11:debug_trace(0xe, 0x5b, 0xff38f6ac, 0x73e58)
    62916 6755/20@20: -> libvpkcs11:C_DestroyObject(0x8b, 0x56, 0x0, 0x0)
    62921 6755/20@20: <- libvpkcs11:C_DestroyObject() = 130
Note that the C_DeriveKey launched in line 3037 creates a server mac secret with handle 0x56, traced in line 5710. In line 62452 C_CloseSession is called, which destroys this object, as traced in line 62463. Then in line 62916 there is a call to C_DestroyObject listing this object. But this is an invalid object handle! The return code of 130 is CKR_OBJECT_HANDLE_INVALID.
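The failure mode described in this report can be sketched with a toy token that hands out the lowest free table index as the object handle. This is only an illustration under stated assumptions: the names create_object, close_session, destroy_object and object_exists are hypothetical, the table size is arbitrary, and unlike the real C_DestroyObject the toy version takes no session argument.

```c
#include <assert.h>

#define MAX_OBJECTS 4
#define NO_SESSION  0

/* owner[i] is the session that owns handle i+1, or NO_SESSION if free. */
static int owner[MAX_OBJECTS];

/* Hypothetical allocator: returns the lowest free handle (1-based), 0 if full. */
static int create_object(int session)
{
    for (int i = 0; i < MAX_OBJECTS; i++) {
        if (owner[i] == NO_SESSION) {
            owner[i] = session;
            return i + 1;
        }
    }
    return 0;
}

/* Closing a session implicitly destroys all of its session objects. */
static void close_session(int session)
{
    for (int i = 0; i < MAX_OBJECTS; i++) {
        if (owner[i] == session)
            owner[i] = NO_SESSION;
    }
}

/* Returns 1 if the handle currently names a live object, else 0. */
static int object_exists(int handle)
{
    return handle >= 1 && handle <= MAX_OBJECTS &&
           owner[handle - 1] != NO_SESSION;
}

/* Returns 0 on success, -1 for an invalid handle. */
static int destroy_object(int handle)
{
    if (!object_exists(handle))
        return -1;
    owner[handle - 1] = NO_SESSION;
    return 0;
}
```

If one caller creates an object, its session is closed, and a second caller then creates an object, the second object receives the same handle. A late destroy_object on the stale handle then succeeds and removes the second caller's object instead of failing.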
Here was my reply:
Unfortunately, your scheme for generating object handles is going to cause all kinds of problems for NSS. I agree that NSS should be a better PKCS#11 citizen, and recognize when a session is being closed while NSS still holds object handles from it. But the complexity of including that kind of logic in NSS today would be daunting. The SSL code does not synchronize objects with sessions (well, it does to some degree, but as you can see, there are cases where it is possible for one thread to close the session before another thread destroys the object).
The story I have heard in the past is that tokens should make an effort to ensure that object handles are not reused in a short timeframe. That is, once a specific object handle is released, it should not appear again for a long time (depending on how you define "long" time). NSS uses a monotonically increasing counter, so object handles are not reused until 2**28 objects have been created (it's 28 because IIRC the high-order 4 bits are used for some kind of "header" information). This avoids the kind of problem you are seeing.
In your case, what is the size of your table? If it is indexed by a short, you would still have 16 bits left over to play with in the object handle. You could use a counter in those 16 bits, and simply mask the handle before indexing the table. Is that a possibility for you?
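A minimal sketch of the suggestion above, assuming a 32-bit handle with the table index in the low 16 bits and a per-slot counter in the high 16 bits. The type and function names (handle_t, handle_alloc, handle_lookup, handle_free) and the table size are hypothetical and are not taken from the token in question or from NSS.

```c
#include <assert.h>
#include <stdint.h>

#define TABLE_SIZE 4u       /* hypothetical; tiny so that reuse is easy to see */
#define INDEX_MASK 0xFFFFu  /* low 16 bits: table index */
#define GEN_SHIFT  16       /* high 16 bits: per-slot counter */

typedef uint32_t handle_t;  /* stands in for CK_OBJECT_HANDLE */

struct slot {
    uint16_t generation;    /* bumped every time the slot is freed */
    int      in_use;
};

static struct slot table[TABLE_SIZE];

/* Allocate a slot; returns 0 (an invalid handle) when the table is full. */
static handle_t handle_alloc(void)
{
    for (uint32_t i = 0; i < TABLE_SIZE; i++) {
        if (!table[i].in_use) {
            if (table[i].generation == 0)
                table[i].generation = 1;  /* keeps handle value 0 unused */
            table[i].in_use = 1;
            return ((handle_t)table[i].generation << GEN_SHIFT) | i;
        }
    }
    return 0;
}

/* Mask the handle to get the index, then check the counter.
 * Returns the slot index, or -1 if the handle is stale or invalid. */
static int handle_lookup(handle_t h)
{
    uint32_t idx = h & INDEX_MASK;
    uint16_t gen = (uint16_t)(h >> GEN_SHIFT);

    if (idx >= TABLE_SIZE || !table[idx].in_use ||
        table[idx].generation != gen)
        return -1;
    return (int)idx;
}

/* Returns 0 on success, -1 if the handle is stale or invalid. */
static int handle_free(handle_t h)
{
    int idx = handle_lookup(h);

    if (idx < 0)
        return -1;
    table[idx].in_use = 0;
    table[idx].generation++;  /* may wrap to 0; handle_alloc skips 0 */
    return 0;
}
```

With this layout a slot can be reused immediately, yet the old handle and the new handle differ in their high bits, so a late destroy on the old handle is rejected. A given handle value only recurs after the slot's 16-bit counter wraps.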
Bob, is my interpretation correct? What could we do about changing NSS? It's my suspicion it would be a fairly involved change...
In case it's not clear, the token in question is not the softoken. This token generates object handles from an index, and the bug results from the same index being used for a recently destroyed object and a new object.
I think your evaluation is correct. NSS should attempt to keep track of the session the object was created with. We should avoid closing the session before we delete the object (I thought that was already the case for most uses of SymKeys, but perhaps with session sharing this doesn't always happen as expected). As far as cycling through handles, it's pretty important that old handles not be reused quickly. Often it's not possible to know that the handle has been changed on the fly. This is particularly true of session handles (more so than object handles). Anyway we should see what is causing the sessions to be destroyed before the C_DestroyObject call in NSS. We should at least try to keep that one safe. bob
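One way to picture "avoid closing the session before we delete the object" is a per-session count of live objects, with the real close deferred until that count reaches zero. The sketch below is an assumption-laden illustration, not NSS code: all struct and function names are hypothetical, and it is single-threaded, so a real implementation would need a lock around these fields.

```c
#include <assert.h>

struct session {
    int open;           /* 1 while the underlying session is open */
    int close_pending;  /* close was requested but objects are still live */
    int live_objects;   /* session objects not yet destroyed */
};

struct object {
    struct session *owner;  /* the session this object was created with */
    int alive;
};

static void session_open(struct session *s)
{
    s->open = 1;
    s->close_pending = 0;
    s->live_objects = 0;
}

/* Stands in for the point where C_CloseSession would actually be called. */
static void session_really_close(struct session *s)
{
    s->open = 0;
    s->close_pending = 0;
}

static void object_create(struct session *s, struct object *o)
{
    assert(s->open && !s->close_pending);
    o->owner = s;
    o->alive = 1;
    s->live_objects++;
}

/* Request a close; it is deferred while any object still uses the session. */
static void session_close(struct session *s)
{
    if (s->live_objects > 0)
        s->close_pending = 1;
    else
        session_really_close(s);
}

/* Destroy the object first, then complete any deferred close. */
static void object_destroy(struct object *o)
{
    struct session *s;

    if (!o->alive)
        return;
    o->alive = 0;
    s = o->owner;
    s->live_objects--;
    if (s->live_objects == 0 && s->close_pending)
        session_really_close(s);
}
```

With this ordering, the explicit destroy always happens while the session is still open, so the token never sees a destroy for a handle it has already released.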
I've tried to reproduce this with the tip, 3.3.5 beta, and 3.3.2, but have been unable to do so. David, can you give me more information on your test setup? Also, is it possible for you to send me a complete log (like the excerpt on this bug) showing the failure?
P2, at least.
Priority: -- → P2
Bumping up priority and taking bug.
Assignee: bugz → julien.pierre.bugs
Priority: P2 → P1
Hardware: PC → All
Target Milestone: --- → Future
Target Milestone: Future → 3.9.5
Mass reassign of 3.9.5 target bugs to 3.9.6.
Target Milestone: 3.9.5 → 3.9.6
Target Milestone: 3.9.6 → 3.10
Ah, this is it. Making this bug depend on bug 283690, which describes a possible mechanism that triggers this problem.
Depends on: 283690
Since bug 283690 was fixed, I haven't seen any recurrences of this problem with SSL testing or with all.sh. I have been running with a softoken that had assertions in it in every place that returned CKR_OBJECT_HANDLE_INVALID or CKR_SESSION_HANDLE_INVALID and haven't hit any of them. So, I propose that we mark this bug fixed. Any objections?
I think we should rerun our test case with the version of Solaris softoken that could reproduce this problem (pre-FCS solaris 10 build) before we close this bug.
*** Bug 257604 has been marked as a duplicate of this bug. ***
(In reply to comment #11)
> I think we should rerun our test case with the version of Solaris softoken that
> could reproduce this problem (pre-FCS solaris 10 build) before we close this
> bug.
OK, who can do that? I don't see enough info in this bug to know how to reproduce it.
QA Contact: bishakhabanerjee → jason.m.reid
I spent a little bit of time last week trying with old bits of Solaris softoken and NSS 3.9.5 (before this fix), but didn't get it to crash. It used to crash within minutes, so I must have done something wrong. I will try again later today. I can only verify the 3.10 fix if the problem is reproduced with 3.9 first. Perhaps we should mark this bug fixed, and mark it VERIFIED later when the test is completed.
I was able to reproduce the bad handle problem with 3.9.5 about 5 times in a half-hour period. But with 3.10, it did not occur in a run that lasted for a whole 2-hour meeting. Marking FIXED.
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Status: RESOLVED → VERIFIED
Making the description of this bug match the observed behavior
Summary: NSS double-deletes PKCS#11 objects → NSS deletes PKCS#11 session objects after session was closed