Closed Bug 274538 Opened 16 years ago Closed 15 years ago
token private key is unwrapped with DES every time it's used in SSL server full handshakes, at a high cost
The softoken key database is encrypted, which means RSA private keys have to be decrypted before use . The key can only be decrypted after the token is logged in . In an SSL server, we have observed that the RSA key keeps getting decrypted each time the server performs a full handshake and thus an RSA private key operation. On the AMD64 Solaris platform, in an SSL full handshake test, the cost of unwrapping the RSA private key with DES is 21% of the CPU . In comparison, in the same test, the CPU time spent in the RSA routine is 40% ! We shouldn't be spending half the CPU time unwrapping the key as the time it takes to use the key . There needs to be a way to disable this behavior, and allow the RSA private key to be reused without further unwrapping as long as the token stays in the logged in state . Perhaps this is something we could toggle when softoken is in FIPS vs non-FIPS mode ? Does FIPS140 disallow us from keeping private keys in the clear in RAM after they are used, or is it only something we chose to do as an extra precaution ?
If you do this, we need to make this an option. There are well known algorithms for finding private keys in an address space (even if you don't know the layout of the address space). A well known hardware security vendor make a pretty big deal about how your private key could easily become vunerable if you had stack overflow attacks in your server, or in a server pluggin. That vendor was able to attack Apache and IIS, but our servers weren't vunerable to the attack because we kept the private key encrypted except when we were actually using it. Anyway if it's taking half as long to unwrap the key as it does to do the RSA operation, I'd say we need to look at what is going on in the unwrap. I suspect it's really taking that long because we are also reading the key back in from the database. bob
We aren't confident it's taking half the time to unwrap the key with DES as to do RSA anymore. There are some questions about dtrace reesults. It's still significant, though. In our benchmarks, we aren't hitting the disk to read the key material before unwrap. We are hitting the dbm cache, which I think is OK . Still, it would be much faster to keep the key unwrapped. Regarding programs that search the address space for private keys after exploiting stack overflows, I fully understand the vulnerability. However, it seems to me that, given that NSS is an open-source SSL implementation, you might just as well write a program that would look for both the DES key and the private key to unwrap it with ... And given knowledge of all the NSS internal structures, that exploit program might not be so complicated to write . The exploit program might just call NSS' own functions to do it ... I'll grant you that it will be harder to write than just looking for a private key alone, but I don't think there is anything we can truly do to hide it - if we want to be able to use the key, it has to be unwrapped at some point. So I think we need to have the ability to configure this .
Recent measurements done with the new PKCS#11-ized rsaperf have shown that RSA ops are taking up to 10% longer with token objects than session objects with NSS softoken. We have seen even higher differences with certain other hardware tokens. I think there is actually a fairly easy and clean fix here. In SSL_ConfigSecureServer, we can try to make a copy of the SECKEYPrivateKey token object to the same slot, but as a session object, using C_CopyObject . If this fails, we just use the object that's passed in, but if it works, we get a potentially large performance boost on full SSL handshakes .
I like the plan. I order to make it work, you'll need a new PKCS #11 wrapper function. Currently there are 2 wrapper functions in pk11wrap that use C_CopyObject: In pk11skey.c: PK11SymKey * PK11_ConvertSessionSymKeyToTokenSymKey(PK11SymKey *symKey, void *pw_arg); In pk11akey.c: PK11PrivKey * PK11_ConvertSessionPrivKeyToTokenPrivKey(PK11SymKey *symKey, void *pw_arg); in pk11obj.c: CK_OBJECT_HANDLE PK11_CopyKey(PK11Slot *slot, CK_OBJECT_HANDLE srcObject); There's currently no way to get a PrivKey from a handle. I believe PK11_CopyKey is used by the SECKEY_Copy* functions and not exported. The obvious thing would be to add PK11_ConvertTokenPrivKeyToSessionPrivKey() and PK11_ConvertTokenSymKeyToSessionPrivKey(), the question is do we need to change other attributes on copy as well? CKA_MODIFIABLE, CKA_PRIVATE, CKA_SENSITIVE/CKA_EXTRACTABLE (modification only one way is allowed on the last 2) are the only ones that I can think of. I think Technically only CKA_MODIFIABLE is the only attribute that needs a copy to change according to the spec, though tokens are free to restrict other attributes from changing. To go from session to token or token to session it's safest to copy. We currently have no other public pk11wrap calls which take a template, so I would like to avoid that solution if possible. bob
Bob, I don't see any reason why we would need to change any attribute. The less change, the better. BTW, I'm wondering if this approach will work for FIPS tokens, even though we aren't really concerned about performance in FIPS mode. Any reason why it wouldn't ? We aren't extracting the key here, just making a copy in the same slot. Adding PK11_ConvertTokenPrivKeyToSessionPrivKey is probably the way to go. But we would also need a new SECKEY function call to generate the new SECKEYPrivateKey object that SSL needs.
I'm just thinking of other reasons to copy a key. If we are going to extend the API, it's always good to look at want other things we are restricting, determine if the restrictions are arbitrary, and relax them if they are. For the function you want, the only attribute we need to change on the copy is CKA_TOKEN from true to false (the existing functions do the reverse). As for FIPS, it should work with our token. There is little reason to expect it to fail on other FIPS tokens since we aren't changing the EXTRACTABLE, PRIVATE or SENSITIVE attributes. Even if it doesn't work, you plan on falling back to using the Token private key. bob
Note: If Julian implements the token->session private key stuff this would no longer be applicable since only the token private key is unwrapped with DES each time. bob
oops, nevermind, that's what this bug is about.... bob
Bob, This is why I suggested in comment #3 that using a session private key was the fix for this bug. In retrospect, my suggestion would only fix the problem of using the private key in SSL server-side full handshake operations. It wouldn't help S/MIME signing/decryption, SSL client auth, OCSP response signing, or any other cases of using a token private key through pk11wrap . So, there may still be value in fixing softoken as I originally proposed, if we care about performance for these other cases. At this time, Sun only cares about the performance of the SSL server case. Therefore, I have changed the description to better reflect the narrow problem we want to solve. This bug isn't about fixing the other cases in pk11wrap or its users, or improving softoken's token key performance in general. Other bugs could be created later for that.
Summary: RSA private key is unwrapped with DES every time it's used, at a very high cost → token private key is unwrapped with DES every time it's used in SSL server full handshakes, at a high cost
Other than OCSP signing, those other cases are all low volume client side operations, which we would probably turn off the optimization anyway (on the client improving S/MIME signing and decryption performance by 10% isn't worth the loss of key protection. It's harder to nail down the client like we can the servers, and it's not likely we will be trying to do more than a handfull of RSA private key operations a second on the client. An OCSP signer could use your same trick to get it's performance win. bob
This patch is not for the trunk. It needs a little clean-up for the trunk, particularly in nss.def :) It's short. We've been using it for a while with a very noticable speed up. There's a related patch to libSSL that I will attach to this bug shortly. Bob, please have a look and tell me if you see anything that looks wrong or should be done differently. I've already noticed that this patch doesn't check for a zero slot->session handle.
Assignee: saul.edwards.bugs → nelson
Status: NEW → ASSIGNED
This patch is not ready for the trunk. In particular, it needs work near the comment: + /* should call pk11_mapSignKeyType(key->keyType) here XXX */ + CK_MECHANISM_TYPE keyMech = CKM_RSA_PKCS; + /* Maybe should be bestSlotMultiple? */ + PK11SlotInfo * bestSlot = PK11_GetBestSlot(keyMech, NULL /* wincx */); But in our performance hack tests, which assume only RSA priv keys, it works, with significant benefit.
retarget to 3.11
Target Milestone: 3.10 → 3.11
1st patch... looks good to me. Be aware that the patch as is may not actuall copy the private key to the target slot, but that's keeping with a lot of what's happenning under the covers in the rest of pk11 wrap. (The failure conditions would be if the key in the source token is not extractable). 2nd patch.. also looks good. You want the target slot to be the one where you'll get the best performance win. It seems that CKM_RSA_PKCS (what you have coded) is the right place, since the biggest performance cost is doing the unwrap. the PK11_Unwrap call will arrange for the unwrapped key to arrive in an appropriate slot.
Bob, Actually, in this case, you don't want to move the private key to a different slot at all . Doing so will result in unwanted behavior for server configurations. A single web server process with 3 virtual can be configured with 3 SSL listen sockets each using keys on different tokens. Eg. port 2000 using Accelerator1:Server-Cert and port 4000 using Accelerator2:Server-Cert . This is an actual use case with the already-released web server 6.x . Each accelerator may have a limited capacity (eg. 1000 RSA ops/s). But there can only one "best" slot per mechanism . NSS doesn't know about each slot's performance. If you always move the token key to a session key in the "best" slot, then suddenly one of the accelerators will go unused, and performance will be cut in half .
I meant to write "2 virtual servers with 2 SSL listen sockets".
Here is a simpler, even worse use case that would be broken by trying to move the server's private key : Someone has a hardware accelerator, but did not select the hardware module as default for RSA in secmod.db . The existing server works because the server uses a token key in the accelerator slot - the key for "Accelerator:Server-Cert" . With the change to create a session key in the "best slot", the RSA private key would be moved to the softoken slot, and the accelerator will no longer be used at all.
In reply to comment 17: That case is just a configuration error, and the fix would be to fix the configuration. But I agree that we do not always want to move keys to the one "best" slot. secmod.db allows multiple slots to be "default" for a mechanism. Perhaps the code should attempt to ask if the key is already in a "default" slot for the relevant mechanism, and leave it there if so. I don't know what API can be used to ask that question though. Bob?
NSS really only supports one 'default' slot. The code was set up so there could be a priority, but in was never really used. Most configurations have one 'slot' it favors for doing any particular operation. NSS will use either the softoken or that slot. We can revisit this, but I haven't heard large clammerings that the current stuff is to restrictive (I've been expecting the clammerings for years and they haven't materialized). It seems no one really tries to connect more than one accellerator to a particular system at once. There is a way to verify that the slot can 'do' the requested mechansim (PK11_DoesMechanism). Whenever the wrappers are trying to figure out if the current slot is 'sufficient' it uses PK11_DoesMechanism. Anyway, I thought that your code was meant to handle the case where the key is stored in softoken, but dropped into a hardware accellerator. The interactions can be tricky, here are the scenarios: --------- Cases where the 'obvious' is happenning -------------------- 1) The key is stored in the Softoken and there are no PKCS #11 modules set as a default RSA device. The key will obviously stay in the Softoken (PK11_GetBestSlot will return the softoken). 2) The key is in a PKCS #11 accellerator. The accellerator is listed as the default RSA device. The key will stay in the accellerator (PK11_GetBestSlot will return the accellerator). 3) The key is in the Softoken, and there is a PKCS #11 accellerator which is listed as the default RSA device. The accellerator can accept RSA keys. The key in the Softoken is 'not sensitive'. The key will be moved to the accellerator. (PK11_GetBestSlot will be the accellerator). ------------ Case that might not be what you want -------------- 4) The key is in the Softoken, and there is a PKCS #11 accellerator with is listed as the default RSA device. The accellerator can accept RSA keys. The key in the Softoken is 'sensitive'. The key will not be moved, but stay in the softoken. ----------- Causes that are probably right, but might be surprising ------- 5) The key is in a PKCS #11 module and it is not listed as the default RSA device. The key is not 'sensitive'. The key will be moved to the softoken (or the default RSA token if it exists). (may not be obvious to the user that this will happen. 6) The key is in a PKCS #11 module and it is not listed as the default RSA device. The key is 'senstive'. The key will stay in the token. ---------------------------------- In all these cases the key only moves if it is in 3 and 5. In both these cases the keys must not be sensitive (a pretty hard trick, actually). If you never want the key to move, then the best bet is to remove the destSlot code altogether. If you do want to handle storing the key in softoken and loading it into an accellerator, then you will probably want to make case 4 work as well. (Since it is difficult, if not impossible, to get a token private key in the softoken that is not sensitive). In that case you would want to use the PubWrap and Unwrap code (don't worry about the token supporting unwrap, NSS handles that case already). In anycase, the pk11_LoadPrivateKey has pretty limitted use and should either go away or change (it really only works with non-sensitive keys, which in the wild are typically only intermediate session keys NSS uses to get one token to unwrap or another). If you go the PubWrap/Unwrap route, then case 3 becauses 'if the token can accept keys' and case 5 because 'if the key is extractable'. bob
Bob, I was testing multiple accelerators in a single server back 4 years ago, the way I described it in comment #15 . It works fine but depends on manual configuration of the server - if NSS tries to move the private key, then it breaks. I think the current behavior of using the same slot must be preserved for servers. I don't think we can tell our customers the configuration they have been using that worked for years suddenly becomes invalid, both for the case in comment #15 and in comment #17 . The fix should be merely to copy the token key to a session key in the same slot, because doing that is known to benefit softoken and some other accelerators possibly as well. This fix isn't meant to handle the case where the key is stored in softoken and moved to an accelerator. We currently don't envision a need for this. We may have a need to extract private key from a PKCS#11 token, but only for the case where we want to bypass PKCS#11 altogether for the RSA operation and do it all direct with freebl from libssl - not to move it to an accelerator slot. This isn't expected to be a big performance win, and we already know how we will do it.
QA Contact: bishakhabanerjee → jason.m.reid
Comment on attachment 183353 [details] [diff] [review] patch for HACKS branch, add function to copy token priv key to session priv key This feature has been implemented on the performance hacks branch. The implementation is rather different from this patch. Rather than marking this patch r+ or r-, I will simply remove the review request.
Comment on attachment 183353 [details] [diff] [review] patch for HACKS branch, add function to copy token priv key to session priv key This patch modifies 3 files. Please review the patch to the two files in pk11wrap, and ignore the part of the patch that applies to nss/nss.def. >Index: pk11wrap/pk11akey.c (review this) >Index: pk11wrap/pk11pub.h (review this) >Index: nss/nss.def (ignore this)
Attachment #183353 - Flags: review?(rrelyea)
Comment on attachment 183353 [details] [diff] [review] patch for HACKS branch, add function to copy token priv key to session priv key r+. Note that pk11_loadPrivKey is pretty brittle, but moving private keys between tokens is brittle anyway. I'm not sure how much benefit there is to trying to improve the situation (wrap the private key out and wrap it back into the new token, for instance), so the patch get's an r+. bob
Attachment #183353 - Flags: review?(rrelyea) → review+
Checked in implementation of PK11_CopyTokenPrivKeyToSessionPrivKey Checking in nss/nss.def; new revision: 1.154; previous revision: 1.153 Checking in pk11wrap/pk11akey.c; new revision: 1.5; previous revision: 1.4 Checking in pk11wrap/pk11pub.h; new revision: 1.9; previous revision: 1.8
Marking resolved/fixed. The patches for this work were checked in as a part of the SSL performance work.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.