NSS Crash when doing DHE ciphersuite with client auth

RESOLVED FIXED in 3.4

Status

NSS
Libraries
P1
normal
RESOLVED FIXED
17 years ago
17 years ago

People

(Reporter: Nelson Bolyard (seldom reads bugmail), Assigned: Robert Relyea)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(URL)

Attachments

(2 attachments)

The stack trace for this crash is reported in bug 
http://bugzilla.mozilla.org/show_bug.cgi?id=95987 

The problem is reproducible with tstclnt (NSS test client) by 
visiting www.kuix.de with the request
GET /mist/test6/ HTTP/1.0

The transaction proceeds as follows:
1. The first SSL handshake completes without requesting client auth.
It uses a DHE cipher suite.  All is well.
2. The client sends the HTTP request to the server.
3. The server initiates a second full SSL handshake, again using
a DHE cipher suite, and this time requesting client auth.  
4. If the client does not attempt to do client auth, and responds with 
no certificate, the handshake succeeds and does not crash.
5. When the client attempts to perform client authentication with a cert,
NSS crashes while preparing to compute the value for the client
key exchange (CKE) message.  The CKE message is sent after the client's 
certificate message which contains the client's cert chain, and before the  
client's "cert verify" message which includes a signature. 

The crash occurs when NSS tries to move the pre-master secret (PMS) from 
the private-key token to the generic-crypto token so that it can derive 
the master secret.  But the error that needs to be fixed has occured long
before then, at the point that the PMS was derived (DHE derivation) in
the private-key token.  The PMS should not have been derived in that token
in the first place. 

To derive the PMS, NSS generates a DHE public/private key (value) pair.
The function that does this, SECKEY_CreateDHPrivateKey, calls 
PK11_GetBestSlot(CKM_DH_PKCS_KEY_PAIR_GEN,cx) [where cx == NULL] to get 
the slot in which to generate the key pair.  It then uses this key pair 
to do the DHE derivation, which results in the creation of the PMS.  The 
PMS usually is created in the same slot where the generated pub/priv key 
pair exists.

The function PK11_GetBestSlot(CKM_DH_PKCS_KEY_PAIR_GEN,cx) sometimes 
returns a pointer to the generic-crypto slot, and other times returns
a pointer to the private-key slot.  When the client creates a cert 
message containing the client's cert chain, then the next call to 
PK11_GetBestSlot(CKM_DH_PKCS_KEY_PAIR_GEN,cx) will return a pointer to 
the private key slot, leading the the crash.

I have not yet determined why PK11_GetBestSlot(CKM_DH_PKCS_KEY_PAIR_GEN
doesn't always return the same answer.  However, I have determined the
following things:

a) In PK11_GetBestSlotMultiple, the call to PK11_GetSlotList(type[0]);
returns a pointer to pk11_dhSlotList, but list->head is null indicating
that the list is empty.  The list remains empty, and is not populated
as NSS runs.  This seems like a bug.

b) Then PK11_GetBestSlotMultiple calls 
PK11_GetAllTokens(type[0],PR_FALSE,PR_TRUE,wincx) and the list returned 
by that function usually contains both slots.  Sometimes the generic-crypto 
slot is first in the list, sometimes it is last.  The first value in this 
list is always the value returned by the function.  
I have not yet determined why the list returned by 
PK11_GetAllTokens(type[0],PR_FALSE,PR_TRUE,wincx) is inconsistent.

Bob, do you wanna tackle this PK11 bug?
(Reporter)

Updated

17 years ago
Status: NEW → ASSIGNED
Priority: -- → P1
Target Milestone: --- → 3.4
(Reporter)

Updated

17 years ago
Blocks: 95987
(Assignee)

Comment 1

17 years ago
The reason the slot list is NULL is because the internal token does not have the
Diffie Hellman bits turned on by default. (pk11db.h - internal flags)
We should fix this.

The most likely reason GetBestSlot is inconsistant it depends on the "is logged
in" state of the private token. If the Database token is not logged in it will
prefer the Crypto token, otherwise it will prefer the database token (assuming
it can do the requested operation).

In any case it's not really a bug that the PMS gets generated in the db token
(it's actually quite common for the PMS to start in db token. This happens
anytime we use one of our database keys in the generation). The question is 1)
why are we trying to move it, and 2) why can't we move it? 

 
(Reporter)

Comment 2

17 years ago
> The question is 1) why are we trying to move it, 

Because after deriving the PMS in the Private Key token, we attempt to derive
the Master Secret (MS) from it using CKM_SSL3_MASTER_KEY_DERIVE_DH
and in pk11_DeriveWithTemplate(), the call to PK11_DoesMechanism(slot,derive)
[where slot is the private key token and derive is 
CKM_SSL3_MASTER_KEY_DERIVE_DH ] returns false.  
PK11GetAllTokens finds only one token that can do 
CKM_SSL3_MASTER_KEY_DERIVE_DH, and its the generic internal token, not the 
private key token.

BTW, there is no slot list for CKM_SSL3_MASTER_KEY_DERIVE_DH

> and 2) why can't we move it? 

pk11_KeyExchange geneates RSA key pairs in only two sizes, namely
512 bits and 256 bits.  But the DH PMS is 1024 bits (in this case,
but could be larger or smaller).  pk11_KeyExchange needs to be able
to handle larger symkeys.
(Reporter)

Comment 3

17 years ago
In that last comment, I answered the question "why does the key exchange
crash", but I think the question Bob was trying to ask was "why are we
doing a key exchange at all and not simply moving the key directly?"

The answer is that the DHE key pair is generated with the CKA_SENSITIVE
bit, so the PMS derived from it is also sensitive.  

I've made several changes and am testing them.  These changes include
a) make key exchanges work for keys larger than 480 bits.
b) Attempt to create DHE keys without the sensitive bit, to avoid doing
   the key exchange.
but I need a client auth cert that will be validated by the test server
to complete my testing.
(Reporter)

Comment 4

17 years ago
The crashes occurred when the secret being moved was too large to move
with a 512-bit RSA key. The code that generated key pairs for key 
exchanges would generate 512-bit keys, even when the secret was larger
than that. 

I changed the code that generates RSA key pairs for key exchange so that 
it generates keys up to 8K bits, the smallest power of two large enough
to move the secret.  This solves the crash, but the time required to 
generate the key pair to move a 1k bit symkey can be more than 5 seconds 
on an 800 Mhz Pentium-3.  

Maybe the code should simply fail, not generating the key pair (and hence
not moving the key) if the secret size is greater than some arbitrary 
limit, say, 480 bits (512 - 32, which is the PKCS#1 minimum overhead).
This would also force us to address the problems with deriving sensitive 
keys in the "wrong" tokens.  

Bob, Do you agree?
(Reporter)

Comment 5

17 years ago
Created attachment 48369 [details] [diff] [review]
Proposed fixes.
(Assignee)

Comment 6

17 years ago
yes, I agree with both the batch and the proposal.

bob
(Reporter)

Comment 7

17 years ago
I consider the changes in the patch above to be "workarounds".  
The real problem is that the DHE key pair is being generated in the 
"database" token, not in the "generic" token.  This causes the PMS
to be derived from the DHE key pair in the database token.  Since
the database token cannot derive the Master Secret from the PMS, the
PMS needs to be moved from the database token to the generic token.
The real solution is to generate the PMS key pair in the generic token.

I suspected that ephemeral RSA keys (SSL Step Down keys), might also be
generated in the database token, but I found that those keys are always 
generated in the "generic" token, because there is a slot list for RSA key 
generation, and it contains only one token, the generic token.  There was 
no similar slot list for DHE key generation.  

I learned that the slot lists are constructed from info inside the 
secmod.db files.  The default configuration does not include DH in
the list of "default mechanisms" for the generic slot.  

I added DH to the list of default mechanisms for the generic slot, 
and that cured everything.  Thereafter, the DHE key pairs were 
generated in the generic slot, so they did not have to be moved.  

So, I think the real fix for this problem involves changing NSS so
that new secmod.dbs created in the future will have the DH mechanism
in the default list for the generic crypto slot.  Existing secmod.dbs
should ideally also be changed to have the DH mechanism added, perhaps
by modutil,  but with the workarounds above in place, browsers should work 
correctly (not crash) even if the secmod dbs are not changed.

I have checked in the patches above. I think the long term secmod DB
fix is a change for Bob to make.  Bob, please assign this to yourself
if you agree.
(Assignee)

Comment 8

17 years ago
We need to make these changes to the NSS 3.3 branch as well...

I'll attach patches to both.

NOTE: Ian has landed the FIPS changes to the NSS 3.3 branch recently. Is PSM
pulling from the NSS 3.3 branch tag, or a static NSS 3.3 RTM tag? If the latter,
do we want to include Ian's changes (which would be a requirement for FIPS
compliance, and fix such bugs as you can't import a pk12 file when in FIPS mode).

If not we need to create a new branch (yuk), and bring these changes into that
branch.

bob
(Assignee)

Comment 9

17 years ago
Created attachment 48430 [details] [diff] [review]
NSS 3.3 DH patch -- adds internal token as a default DH device

Updated

17 years ago
Group: netscapeconfidential?

Comment 10

17 years ago
PSM is pulling from the static tag NSS_CLIENT_TAG.
Right now NSS_CLIENT_TAG is a snapshot of the NSS
3.3 branch.

Moving the NSS_CLIENT_TAG is considered the same
as checking something into the Mozilla client code
base, so you need to ask the PSM team for approval.

Ian, does the NSS_CLIENT_TAG already include the
NSS 3.2.2 fixes?

Comment 11

17 years ago
All of the FIPS changes I made for 3.2.2 are in NSS_3_3_BRANCH, NSS_CLIENT_TAG,
and the tip.  It is safe to update NSS_3_3_BRANCH and then move NSS_CLIENT_TAG
(as long as the tree is open).

Comment 12

17 years ago
adding review keyword.  Request sr= for moving the NSS_CLIENT_TAG to the
versions of the nss files containing the fix for this bug.
Keywords: review

Comment 13

17 years ago
Actually, according to http://www.mozilla.org/hacking/reviewers.html after wtc
approval (which we have) we can move the tag. David Drinan will do so.

Updated

17 years ago
Group: netscapeconfidential?
(Reporter)

Comment 14

17 years ago
I think relyea has already fixed this.  
Reassigning to him so the bug fix will be credited to him.
Assignee: nelsonb → relyea
Status: ASSIGNED → NEW
(Reporter)

Comment 15

17 years ago
This bug was fixed a while ago.
Status: NEW → RESOLVED
Last Resolved: 17 years ago
Resolution: --- → FIXED
(Reporter)

Comment 16

17 years ago
Bob R,  What NSS releases include this fix?

David,  Did this fix make it into N6.2?

Comment 17

17 years ago
Bob's NSS 3.3 DH patch (attachment 48430 [details] [diff] [review]) was checked in
on the NSS_3_3_BRANCH on 2001/09/06 21:23:21.  (So it is
in NSS 3.3.1 RTM.)  It did not make it into mozilla0.9.4
but made it into mozilla0.9.5.  Since Netscape 6.2 is
based on mozilla0.9.4, Bob's fix is not unlikely to be
in Netscape 6.2.

You need to log in before you can comment on or make changes to this bug.