Open Bug 285538 Opened 20 years ago Updated 2 years ago

PK11_GetBestSlot called repeatedly in SSL server code path

Categories

(NSS :: Libraries, defect, P3)

3.9.5

Tracking

(Not tracked)

People

(Reporter: julien.pierre, Unassigned)

Details

During a full SSL handshake, PK11_GetBestSlot gets called 7 times, in
PK11_GenerateRandom and PK11_CreateDigestContext, and pk11_HandUnwrap .
These functions have no way of specifying a slot. We need new APIs that take one.

Also, I think it should be possible for the SSL library to figure out the slot
to use once, and then cache it . We should not try to figure out the slot again
for every connection . PK11_GetBestSlot ends up acquiring locks on the global
pk11wrap slot list .

I believe the proper place to cache this information would be in the server's
model socket or listen socket . SSL_ConfigSecureServer could do it if a new SSL
option is set. This would still allow servers to change their configuration on
the fly if they want, but the server application would be in control of when
that happens (ie. when they create a new model or listen socket).
Julien,  

Each of those calls asks a different question, asking about different 
mechanisms.  In an environment with hardware accelerators or HSMs,
some of those calls will return different slots than others.  

In fact, even in the softken-only case, I believe that some of those
calls now return the "generic crypto" slot while others return the 
"DB slot".  

The answers to those questions could also be different from ciphersuite
to ciphersuite.  Imagine having an accelerator that accelerated (say) 
AES but not RC4, so the call that asks "in what slot should I do the
symmetric encryption" may not be the same info for all sockets in the 
server, unless the server only enables one ciphersuite.  

So, except in the special circumstance where there is only one slot
available (presently true only in FIPS mode), one cannot cache a 
single slot as the answer to all calls to GetBestSlot and 
GetBestSlotMultiple.  

Likewise, except in the special case of a server (or client) that has 
only one ciphersuite enabled, one cannot cache the answer to each of 
the 7 calls in some SSL structure, and then reuse those 7 answers for
all subsequent sockets.

How much time is being spent in those GetBestSlot(Multiple) calls?  
My guess is that the time is spent contending on the locks that 
protect the lists of slots for each mechanism, yes?

Here's are some alternative suggestions.  

In a server (where the slots and tokens are generally fixed for the life 
of the process), the answers from GetBestSlot and GetBestSlotMultiple will 
always be the same for each combination of arguments.  That is, the answer 
to the question "what slot best does DES?" will always be the same, and 
the answer to the question "what slot best does MasterSecret derivation and 
also derivation of the encryption and MAC keys?" will always be the same.
 
So, in a server environment, getBestSlotMultiple could remember the query 
(e.g. the combination of those two mechanisms) and the answer, and return
that same answer more quickly in subsequent calls.  

Also, in the server environment, we could eliminate the locks that protect 
the lists of slots for each mechanism, since those lists will not change
while the server is running.
A related point about "best slots":

I think for servers there might be efficiency to be gained by changing 
softoken to appear to be only one slot, the DB slot, as it does in FIPS mode.  
I do not suggest using FIPS mode, but rather creating a new "single slot" 
mode.  This would eliminate moving any keys from the DB slot to the 
"generic crypto" slot, as it appears is now being done in normal server operations. 

On the other hand, the "single slot mode" idea might hurt performance by
making the slot lock in that single slot hotter than the two slot locks 
are now.  IINM, presently, some threads are contending for the slot lock
of one slot while others are contend for the other slot's slot lock, and 
this divides the contention on slot locks rougly in half (I think).  
Reducing to a single slot (and a single slot lock) might make matters worse.

Maybe this should be a separate bug?  
But it's only an idea for investigation.  

There are perhaps several ways to investigate this.  

Using modutil to mark the DB slot as the "default" slot for all the 
mechanisms might be enough to make the server run in "single slot mode".
That hypothesis could be tested.

It may also be necessary to change  NSC_GetMechanismList to return the 
same list for the DB slot that it returns for the generic crypto slot.  
That's a trivial change.
Nelson,

Since we cannot cache a single slot, perhaps the answer would be to cache a set
of slots for all the possible ciphers. This would be checked once, when the SSL
server is configured (eg. SSL_ConfigureSecureServer) and stored along with the
rest of the configuration in the model / listen socket .
That cache would be something like a two dimensional array of 7 x N slots,
where 7 is the number of answers needed per connection and N is the number
of possible ciphersuites.  
As long as it's one time overhead to compute it, and stored only once, I think 7
X N is OK . Of course, that requires the other changes we have been discussing
about not duplicating all the data from the model to the connection socket ...
The big thing to be careful of:
The slot used for one operation may not be appropriate for another operation.

SSL should probably keep a cache of appropriate slots. It certainly should keep
a preferred a RandomSlot and DigestSlot, and new versions of PK11_GenerateRandom
(probably PK11_GenerateRandomOnSlot) and PK11_CreateDigestContext
(PK11_CreateDigestContextOnSlot) should be created.

The next question would be how to handle pk11_HandUnwrap(). The issue here is
HandWrap is called as a side effect of PK11_PubUnwrap (and PK11_Unwrap*). If we
are only interested in the case where we are always in the softoken, the call to
GetBestSlot() in pk11_HandUnwrap would go away if we set the softoken up to
handle the same algorithms as the crypto token. (In fact that should ditch the
call to pk11_HandUnwrap altogether). That is an easy experiment, just change all
the PR_FALSEs in mechanisms[] to PR_TRUEs (softoken/pkcs11.c). As an
alternative, we can test the softoken in FIPS mode (now that would be ironic if
we ran faster in FIPS mode than in non-FIPS mode;).

Nelson's comments about making the DB slot a single slot would also work
(probably a better solution).

Some notes on GetBestSlot().

For clients it's called once per SSL connection to find a slot to generate the
keys (not counting the PK11_CreateDigest & GenerateRandom calls).

In servers, though, this call isn't made. Instead the slot where the server's
private key lives is used instead. The call is only made if the slot where the
server's private key can't do the 'next' operation that's needed. In that case
the key is 'moved' to a slot that can. For the most efficient use, you really
want the slot used in the PK11_CreateDigestContext, and the private key, and the
slot you do your TLS or SSL key derivation, and the slot for the cipher suite
all to be the same slot, otherwise you will be moving keys back and forth
between the various slots (which is quite expensive on FIPS tokens).

bob
Assignee: wtchang → saul.edwards.bugs
Target Milestone: --- → 3.11
QA Contact: bishakhabanerjee → jason.m.reid
Saul is gone. Reassigning this performance bug to Nelson.
Assignee: saul.edwards.bugs → nelson
QA Contact: jason.m.reid → libraries
Priority: -- → P3
remove target milestone, since the target was missed.
Target Milestone: 3.11 → ---
Blocks: FIPS2008
No longer blocks: FIPS2008
Assignee: nelson → nobody
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.