The stack trace for this crash is reported in bug http://bugzilla.mozilla.org/show_bug.cgi?id=95987 The problem is reproducible with tstclnt (NSS test client) by visiting www.kuix.de with the request GET /mist/test6/ HTTP/1.0 The transaction proceeds as follows: 1. The first SSL handshake completes without requesting client auth. It uses a DHE cipher suite. All is well. 2. The client sends the HTTP request to the server. 3. The server initiates a second full SSL handshake, again using a DHE cipher suite, and this time requesting client auth. 4. If the client does not attempt to do client auth, and responds with no certificate, the handshake succeeds and does not crash. 5. When the client attempts to perform client authentication with a cert, NSS crashes while preparing to compute the value for the client key exchange (CKE) message. The CKE message is sent after the client's certificate message which contains the client's cert chain, and before the client's "cert verify" message which includes a signature. The crash occurs when NSS tries to move the pre-master secret (PMS) from the private-key token to the generic-crypto token so that it can derive the master secret. But the error that needs to be fixed has occured long before then, at the point that the PMS was derived (DHE derivation) in the private-key token. The PMS should not have been derived in that token in the first place. To derive the PMS, NSS generates a DHE public/private key (value) pair. The function that does this, SECKEY_CreateDHPrivateKey, calls PK11_GetBestSlot(CKM_DH_PKCS_KEY_PAIR_GEN,cx) [where cx == NULL] to get the slot in which to generate the key pair. It then uses this key pair to do the DHE derivation, which results in the creation of the PMS. The PMS usually is created in the same slot where the generated pub/priv key pair exists. The function PK11_GetBestSlot(CKM_DH_PKCS_KEY_PAIR_GEN,cx) sometimes returns a pointer to the generic-crypto slot, and other times returns a pointer to the private-key slot. When the client creates a cert message containing the client's cert chain, then the next call to PK11_GetBestSlot(CKM_DH_PKCS_KEY_PAIR_GEN,cx) will return a pointer to the private key slot, leading the the crash. I have not yet determined why PK11_GetBestSlot(CKM_DH_PKCS_KEY_PAIR_GEN doesn't always return the same answer. However, I have determined the following things: a) In PK11_GetBestSlotMultiple, the call to PK11_GetSlotList(type); returns a pointer to pk11_dhSlotList, but list->head is null indicating that the list is empty. The list remains empty, and is not populated as NSS runs. This seems like a bug. b) Then PK11_GetBestSlotMultiple calls PK11_GetAllTokens(type,PR_FALSE,PR_TRUE,wincx) and the list returned by that function usually contains both slots. Sometimes the generic-crypto slot is first in the list, sometimes it is last. The first value in this list is always the value returned by the function. I have not yet determined why the list returned by PK11_GetAllTokens(type,PR_FALSE,PR_TRUE,wincx) is inconsistent. Bob, do you wanna tackle this PK11 bug?
Status: NEW → ASSIGNED
Priority: -- → P1
Target Milestone: --- → 3.4
The reason the slot list is NULL is because the internal token does not have the Diffie Hellman bits turned on by default. (pk11db.h - internal flags) We should fix this. The most likely reason GetBestSlot is inconsistant it depends on the "is logged in" state of the private token. If the Database token is not logged in it will prefer the Crypto token, otherwise it will prefer the database token (assuming it can do the requested operation). In any case it's not really a bug that the PMS gets generated in the db token (it's actually quite common for the PMS to start in db token. This happens anytime we use one of our database keys in the generation). The question is 1) why are we trying to move it, and 2) why can't we move it?
> The question is 1) why are we trying to move it, Because after deriving the PMS in the Private Key token, we attempt to derive the Master Secret (MS) from it using CKM_SSL3_MASTER_KEY_DERIVE_DH and in pk11_DeriveWithTemplate(), the call to PK11_DoesMechanism(slot,derive) [where slot is the private key token and derive is CKM_SSL3_MASTER_KEY_DERIVE_DH ] returns false. PK11GetAllTokens finds only one token that can do CKM_SSL3_MASTER_KEY_DERIVE_DH, and its the generic internal token, not the private key token. BTW, there is no slot list for CKM_SSL3_MASTER_KEY_DERIVE_DH > and 2) why can't we move it? pk11_KeyExchange geneates RSA key pairs in only two sizes, namely 512 bits and 256 bits. But the DH PMS is 1024 bits (in this case, but could be larger or smaller). pk11_KeyExchange needs to be able to handle larger symkeys.
In that last comment, I answered the question "why does the key exchange crash", but I think the question Bob was trying to ask was "why are we doing a key exchange at all and not simply moving the key directly?" The answer is that the DHE key pair is generated with the CKA_SENSITIVE bit, so the PMS derived from it is also sensitive. I've made several changes and am testing them. These changes include a) make key exchanges work for keys larger than 480 bits. b) Attempt to create DHE keys without the sensitive bit, to avoid doing the key exchange. but I need a client auth cert that will be validated by the test server to complete my testing.
The crashes occurred when the secret being moved was too large to move with a 512-bit RSA key. The code that generated key pairs for key exchanges would generate 512-bit keys, even when the secret was larger than that. I changed the code that generates RSA key pairs for key exchange so that it generates keys up to 8K bits, the smallest power of two large enough to move the secret. This solves the crash, but the time required to generate the key pair to move a 1k bit symkey can be more than 5 seconds on an 800 Mhz Pentium-3. Maybe the code should simply fail, not generating the key pair (and hence not moving the key) if the secret size is greater than some arbitrary limit, say, 480 bits (512 - 32, which is the PKCS#1 minimum overhead). This would also force us to address the problems with deriving sensitive keys in the "wrong" tokens. Bob, Do you agree?
yes, I agree with both the batch and the proposal. bob
I consider the changes in the patch above to be "workarounds". The real problem is that the DHE key pair is being generated in the "database" token, not in the "generic" token. This causes the PMS to be derived from the DHE key pair in the database token. Since the database token cannot derive the Master Secret from the PMS, the PMS needs to be moved from the database token to the generic token. The real solution is to generate the PMS key pair in the generic token. I suspected that ephemeral RSA keys (SSL Step Down keys), might also be generated in the database token, but I found that those keys are always generated in the "generic" token, because there is a slot list for RSA key generation, and it contains only one token, the generic token. There was no similar slot list for DHE key generation. I learned that the slot lists are constructed from info inside the secmod.db files. The default configuration does not include DH in the list of "default mechanisms" for the generic slot. I added DH to the list of default mechanisms for the generic slot, and that cured everything. Thereafter, the DHE key pairs were generated in the generic slot, so they did not have to be moved. So, I think the real fix for this problem involves changing NSS so that new secmod.dbs created in the future will have the DH mechanism in the default list for the generic crypto slot. Existing secmod.dbs should ideally also be changed to have the DH mechanism added, perhaps by modutil, but with the workarounds above in place, browsers should work correctly (not crash) even if the secmod dbs are not changed. I have checked in the patches above. I think the long term secmod DB fix is a change for Bob to make. Bob, please assign this to yourself if you agree.
We need to make these changes to the NSS 3.3 branch as well... I'll attach patches to both. NOTE: Ian has landed the FIPS changes to the NSS 3.3 branch recently. Is PSM pulling from the NSS 3.3 branch tag, or a static NSS 3.3 RTM tag? If the latter, do we want to include Ian's changes (which would be a requirement for FIPS compliance, and fix such bugs as you can't import a pk12 file when in FIPS mode). If not we need to create a new branch (yuk), and bring these changes into that branch. bob
Created attachment 48430 [details] [diff] [review] NSS 3.3 DH patch -- adds internal token as a default DH device
PSM is pulling from the static tag NSS_CLIENT_TAG. Right now NSS_CLIENT_TAG is a snapshot of the NSS 3.3 branch. Moving the NSS_CLIENT_TAG is considered the same as checking something into the Mozilla client code base, so you need to ask the PSM team for approval. Ian, does the NSS_CLIENT_TAG already include the NSS 3.2.2 fixes?
All of the FIPS changes I made for 3.2.2 are in NSS_3_3_BRANCH, NSS_CLIENT_TAG, and the tip. It is safe to update NSS_3_3_BRANCH and then move NSS_CLIENT_TAG (as long as the tree is open).
adding review keyword. Request sr= for moving the NSS_CLIENT_TAG to the versions of the nss files containing the fix for this bug.
Actually, according to http://www.mozilla.org/hacking/reviewers.html after wtc approval (which we have) we can move the tag. David Drinan will do so.
I think relyea has already fixed this. Reassigning to him so the bug fix will be credited to him.
Assignee: nelsonb → relyea
Status: ASSIGNED → NEW
This bug was fixed a while ago.
Status: NEW → RESOLVED
Last Resolved: 17 years ago
Resolution: --- → FIXED
Bob R, What NSS releases include this fix? David, Did this fix make it into N6.2?
Bob's NSS 3.3 DH patch (attachment 48430 [details] [diff] [review]) was checked in on the NSS_3_3_BRANCH on 2001/09/06 21:23:21. (So it is in NSS 3.3.1 RTM.) It did not make it into mozilla0.9.4 but made it into mozilla0.9.5. Since Netscape 6.2 is based on mozilla0.9.4, Bob's fix is not unlikely to be in Netscape 6.2.
You need to log in before you can comment on or make changes to this bug.