Closed Bug 274538 Opened 16 years ago Closed 15 years ago

token private key is unwrapped with DES every time it's used in SSL server full handshakes, at a high cost


(NSS :: Libraries, defect, P2)



(Not tracked)



(Reporter: julien.pierre, Assigned: nelson)




(2 files)

The softoken key database is encrypted, which means RSA private keys have to be
decrypted before use . The key can only be decrypted after the token is logged in .

In an SSL server, we have observed that the RSA key keeps getting decrypted each
time the server performs a full handshake and thus an RSA private key operation.

On the AMD64 Solaris platform, in an SSL full handshake test, the cost of
unwrapping the RSA private key with DES is 21% of the CPU . In comparison, in
the same test, the CPU time spent in the RSA routine is 40% !

We shouldn't be spending half the CPU time unwrapping the key as the time it
takes to use the key .

There needs to be a way to disable this behavior, and allow the RSA private key
to be reused without further unwrapping as long as the token stays in the logged
in state .

Perhaps this is something we could toggle when softoken is in FIPS vs non-FIPS
mode ? Does FIPS140 disallow us from keeping private keys in the clear in RAM
after they are used, or is it only something we chose to do as an extra precaution ?
Priority: -- → P2
Target Milestone: --- → 3.10
If you do this, we need to make this an option. There are well known algorithms
for finding private keys in an address space (even if you don't know the layout
of the address space).

A well known hardware security vendor make a pretty big deal about how your
private key could easily become vunerable if you had stack overflow attacks in
your server, or in a server pluggin. That vendor was able to attack Apache and
IIS, but our servers weren't vunerable to the attack because we kept the private
key encrypted except when we were actually using it.

Anyway if it's taking half as long to unwrap the key as it does to do the RSA
operation, I'd say we need to look at what is going on in the unwrap. I suspect
it's really taking that long because we are also reading the key back in from
the database.

We aren't confident it's taking half the time to unwrap the key with DES as to
do RSA anymore. There are some questions about dtrace reesults. It's still
significant, though.

In our benchmarks, we aren't hitting the disk to read the key material before
unwrap. We are hitting the dbm cache, which I think is OK . Still, it would be
much faster to keep the key unwrapped.

Regarding programs that search the address space for private keys after
exploiting stack overflows, I fully understand the vulnerability. However, it
seems to me that, given that NSS is an open-source SSL implementation, you might
 just as well write a program that would look for both the DES key and the
private key to unwrap it with ... And given knowledge of all the NSS internal
structures, that exploit program might not be so complicated to write . The
exploit program might just call NSS' own functions to do it ...

I'll grant you that it will be harder to write than just looking for a private
key alone, but I don't think there is anything we can truly do to hide it - if
we want to be able to use the key, it has to be unwrapped at some point. So I
think we need to have the ability to configure this .
Recent measurements done with the new PKCS#11-ized rsaperf have shown that RSA
ops are taking up to 10% longer with token objects than session objects with NSS
softoken. We have seen even higher differences with certain other hardware tokens.

I think there is actually a fairly easy and clean fix here. In
SSL_ConfigSecureServer, we can try to make a copy of the SECKEYPrivateKey token
object to the same slot, but as a session object, using C_CopyObject . If this
fails, we just use the object that's passed in, but if it works, we get a
potentially large performance boost on full SSL handshakes .
I like the plan. I order to make it work, you'll need a new PKCS #11 wrapper

Currently there are 2 wrapper functions in pk11wrap that use C_CopyObject:

In pk11skey.c:
PK11SymKey *
PK11_ConvertSessionSymKeyToTokenSymKey(PK11SymKey *symKey, void *pw_arg);
In pk11akey.c:
PK11PrivKey *
PK11_ConvertSessionPrivKeyToTokenPrivKey(PK11SymKey *symKey, void *pw_arg);
in pk11obj.c:
PK11_CopyKey(PK11Slot *slot, CK_OBJECT_HANDLE srcObject);

There's currently no way to get a PrivKey from a handle. I believe PK11_CopyKey
is used by the SECKEY_Copy* functions and not exported.

The obvious thing would be to add PK11_ConvertTokenPrivKeyToSessionPrivKey() and
PK11_ConvertTokenSymKeyToSessionPrivKey(), the question is do we need to change
other attributes on copy as well? CKA_MODIFIABLE, CKA_PRIVATE,
CKA_SENSITIVE/CKA_EXTRACTABLE (modification only one way is allowed on the last
2) are the only ones that I can think of. I think Technically only
CKA_MODIFIABLE is the only attribute that needs a copy to change according to
the spec, though tokens are free to restrict other attributes from changing. To
go from session to token or token to session it's safest to copy.

We currently have no other public pk11wrap calls which take a template, so I
would like to avoid that solution if possible.


I don't see any reason why we would need to change any attribute. The less
change, the better.

BTW, I'm wondering if this approach will work for FIPS tokens, even though we
aren't really concerned about performance in FIPS mode. Any reason why it
wouldn't ? We aren't extracting the key here, just making a copy in the same slot.

Adding PK11_ConvertTokenPrivKeyToSessionPrivKey is probably the way to go. But
we would also need a new SECKEY function call to generate the new
SECKEYPrivateKey object that SSL needs.
I'm just thinking of other reasons to copy a key. If we are going to extend the
API, it's always good to look at want other things we are restricting, determine
if the restrictions are arbitrary, and relax them if they are.

For the function you want, the only attribute we need to change on the copy is
CKA_TOKEN from true to false (the existing functions do the reverse).

As for FIPS, it should work with our token. There is little reason to expect it
to fail on other FIPS tokens since we aren't changing the EXTRACTABLE, PRIVATE
or SENSITIVE attributes. Even if it doesn't work, you plan on falling back to
using the Token private key.

Assignee: wtchang → saul.edwards.bugs
Note: If Julian implements the token->session private key stuff this would no
longer be applicable since only the token private key is unwrapped with DES each

oops, nevermind, that's what this bug is about....


This is why I suggested in comment #3 that using a session private key was the
fix for this bug. In retrospect, my suggestion would only fix the problem of
using the private key in SSL server-side full handshake operations. It wouldn't
help S/MIME signing/decryption, SSL client auth, OCSP response signing, or any
other cases of using a token private key through pk11wrap . So, there may still
be value in fixing softoken as I originally proposed, if we care about
performance for these other cases. At this time, Sun only cares about the
performance of the SSL server case. Therefore, I have changed the description to
better reflect the narrow problem we want to solve.

This bug isn't about fixing the other cases in pk11wrap or its users, or
improving softoken's token key performance in general. Other bugs could be
created later for that.
Summary: RSA private key is unwrapped with DES every time it's used, at a very high cost → token private key is unwrapped with DES every time it's used in SSL server full handshakes, at a high cost
Other than OCSP signing, those other cases are all low volume client side
operations, which we would probably turn off the optimization anyway (on the
client improving S/MIME signing and decryption performance by 10% isn't worth
the loss of key protection. It's harder to nail down the client like we can the
servers, and it's not likely we will be trying to do more than a handfull of RSA
private key operations a second on the client.

An OCSP signer could use your same trick to get it's performance win.

Depends on: 289530
This patch is not for the trunk.  It needs a little clean-up for the trunk,
particularly in nss.def :)   It's short.  

We've been using it for a while with a very noticable speed up.  
There's a related patch to libSSL that I will attach to this bug shortly.  

Bob, please have a look and tell me if you see anything that looks wrong or
should be done differently.  I've already noticed that this patch doesn't
check for a zero slot->session handle.
Assignee: saul.edwards.bugs → nelson
Attachment #183353 - Flags: review?
This patch is not ready for the trunk.	In particular, it needs work 
near the comment:
+	/* should call pk11_mapSignKeyType(key->keyType) here XXX */
+	/* Maybe should be bestSlotMultiple? */
+	PK11SlotInfo * bestSlot = PK11_GetBestSlot(keyMech, NULL /* wincx */);

But in our performance hack tests, which assume only RSA priv keys, it works,
with significant benefit.
retarget to 3.11
Target Milestone: 3.10 → 3.11
1st patch... looks good to me. Be aware that the patch as is may not actuall
copy the private key to the target slot, but that's keeping with a lot of what's
happenning under the covers in the rest of pk11 wrap. (The failure conditions
would be if the key in the source token is not extractable).

2nd patch.. also looks good. You want the target slot to be the one where you'll
get the best performance win. It seems that CKM_RSA_PKCS (what you have coded)
is the right place, since the biggest performance cost is doing the unwrap. the
PK11_Unwrap call will arrange for the unwrapped key to arrive in an appropriate

Actually, in this case, you don't want to move the private key to a different
slot at all . Doing so will result in unwanted behavior for server configurations.

A single web server process with 3 virtual can be configured with 3 SSL listen
sockets each using keys on different tokens. Eg. port 2000 using
Accelerator1:Server-Cert and port 4000 using Accelerator2:Server-Cert . This is
an actual use case with the already-released web server 6.x .

Each accelerator may have a limited capacity (eg. 1000 RSA ops/s).
But there can only one "best" slot per mechanism . NSS doesn't know about each
slot's performance. If you always move the token key to a session key in the
"best" slot, then suddenly one of the accelerators will go unused, and
performance will be cut in half .
I meant to write "2 virtual servers with 2 SSL listen sockets".
Here is a simpler, even worse use case that would be broken by trying to move
the server's private key :

Someone has a hardware accelerator, but did not select the hardware module as
default for RSA in secmod.db .

The existing server works because the server uses a token key in the accelerator
slot - the key for "Accelerator:Server-Cert" .

With the change to create a session key in the "best slot", the RSA private key
would be moved to the softoken slot, and the accelerator will no longer be used
at all.
In reply to comment 17:
That case is just a configuration error, and the fix would be to fix the 

But I agree that we do not always want to move keys to the one "best" slot.
secmod.db allows multiple slots to be "default" for a mechanism.  Perhaps
the code should attempt to ask if the key is already in a "default" slot 
for the relevant mechanism, and leave it there if so.  I don't know what 
API can be used to ask that question though.  Bob?
NSS really only supports one 'default' slot. The code was set up so there could
be a priority, but in was never really used. Most configurations have one 'slot'
it favors for doing any particular operation. NSS will use either the softoken
or that slot. We can revisit this, but I haven't heard large clammerings that
the current stuff is to restrictive (I've been expecting the clammerings for
years and they haven't materialized). It seems no one really tries to connect
more than one accellerator to a particular system at once.

There is a way to verify that the slot can 'do' the requested mechansim
(PK11_DoesMechanism). Whenever the wrappers are trying to figure out if the
current slot is 'sufficient' it uses PK11_DoesMechanism.

Anyway, I thought that your code was meant to handle the case where the key is
stored in softoken, but dropped into a hardware accellerator.

The interactions can be tricky, here are the scenarios:

--------- Cases where the 'obvious' is happenning --------------------

1) The key is stored in the Softoken and there are no PKCS #11 modules set as a
default RSA device. The key will obviously stay in the Softoken
(PK11_GetBestSlot will return the softoken).

2) The key is in a PKCS #11 accellerator. The accellerator is listed as the
default RSA device. The key will stay in the accellerator (PK11_GetBestSlot will
return the accellerator).

3) The key is in the Softoken, and there is a PKCS #11 accellerator which is
listed as the default RSA device. The accellerator can accept RSA keys. The key
in the Softoken is 'not sensitive'. The key will be moved to the accellerator.
(PK11_GetBestSlot will be the accellerator).

------------ Case that might not be what you want --------------

4) The key is in the Softoken, and there is a PKCS #11 accellerator with is
listed as the default RSA device. The accellerator can accept RSA keys. The key
in the Softoken is 'sensitive'. The key will not be moved, but stay in the softoken.

----------- Causes that are probably right, but might be surprising -------

5) The key is in a PKCS #11 module and it is not listed as the default RSA
device. The key is not 'sensitive'. The key will be moved to the softoken (or
the default RSA token if it exists). (may not be obvious to the user that this
will happen.

6) The key is in a PKCS #11 module and it is not listed as the default RSA
device. The key is 'senstive'. The key will stay in the token.


In all these cases the key only moves if it is in 3 and 5. In both these cases
the keys must not be sensitive (a pretty hard trick, actually). If you never
want the key to move, then the best bet is to remove the destSlot code altogether.

If you do want to handle storing the key in softoken and loading it into an
accellerator, then you will probably want to make case 4 work as well. (Since it
is difficult, if not impossible, to get a token private key in the softoken that
is not sensitive). In that case you would want to use the PubWrap and Unwrap
code (don't worry about the token supporting unwrap, NSS handles that case
already). In anycase, the pk11_LoadPrivateKey has pretty limitted use and should
either go away or change (it really only works with non-sensitive keys, which in
the wild are typically only intermediate session keys NSS uses to get one token
to unwrap or another).

If you go the PubWrap/Unwrap route, then case 3 becauses 'if the token can
accept keys' and case 5 because 'if the key is extractable'.


I was testing multiple accelerators in a single server back 4 years ago, the way
I described it in comment #15 . It works fine but depends on manual
configuration of the server - if NSS tries to move the private key, then it breaks.

I think the current behavior of using the same slot must be preserved for
servers. I don't think we can tell our customers the configuration they have
been using that worked for years suddenly becomes invalid, both for the case in
comment #15 and in comment #17 .

The fix should be merely to copy the token key to a session key in the same
slot, because doing that is known to benefit softoken and some other
accelerators possibly as well.

This fix isn't meant to handle the case where the key is stored in softoken and
moved to an accelerator. We currently don't envision a need for this.

We may have a need to extract private key from a PKCS#11 token, but only for the
case where we want to bypass PKCS#11 altogether for the RSA operation and do it
all direct with freebl from libssl - not to move it to an accelerator slot. This
isn't expected to be a big performance win, and we already know how we will do it.
QA Contact: bishakhabanerjee → jason.m.reid
Comment on attachment 183353 [details] [diff] [review]
patch for HACKS branch, add function to copy token priv key to session priv key

This feature has been implemented on the performance hacks branch.  The
implementation is rather different from this patch.  Rather than marking this
patch r+ or r-, I will simply remove the review request.
Attachment #183353 - Flags: review?
Comment on attachment 183353 [details] [diff] [review]
patch for HACKS branch, add function to copy token priv key to session priv key

This patch modifies 3 files.  Please review the patch to the two files in
and ignore the part of the patch that applies to nss/nss.def.

>Index: pk11wrap/pk11akey.c      (review this)
>Index: pk11wrap/pk11pub.h       (review this)
>Index: nss/nss.def              (ignore this)
Attachment #183353 - Flags: review?(rrelyea)
Comment on attachment 183353 [details] [diff] [review]
patch for HACKS branch, add function to copy token priv key to session priv key

r+. Note that pk11_loadPrivKey is pretty brittle, but moving private keys
between tokens is brittle anyway. I'm not sure how much benefit there is to
trying to improve the situation (wrap the private key out and wrap it back into
the new token, for instance), so the patch get's an r+.

Attachment #183353 - Flags: review?(rrelyea) → review+
Checked in implementation of PK11_CopyTokenPrivKeyToSessionPrivKey
Checking in nss/nss.def;         new revision: 1.154; previous revision: 1.153
Checking in pk11wrap/pk11akey.c; new revision: 1.5; previous revision: 1.4
Checking in pk11wrap/pk11pub.h;  new revision: 1.9; previous revision: 1.8
Marking resolved/fixed.  The patches for this work were checked in as a
part of the SSL performance work.
Closed: 15 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.