Closed Bug 328514 Opened 18 years ago Closed 18 years ago

SSL handshake error when using large ECC keys

Categories

(NSS :: Libraries, defect, P1)

3.11
defect

Tracking

(Not tracked)

RESOLVED FIXED
3.11.1

People

(Reporter: vipul.gupta, Assigned: rrelyea)

References

Details

(Whiteboard: ECC)

Attachments

(4 files)

SSL connection attempts between tstclnt and selfserv fail when
the pre-master secret needs to be computed over nistp521 or 
nistb571 (curves nistp384/nistk409 and smaller work ok).
Vipul, How is this manifest?  What are the symptoms? 
Jyri, are you using either of the affected curves on the server that has
      all those BAD MAC errors?
Severity: normal → critical
Priority: -- → P1
Target Milestone: --- → 3.11.1
Version: unspecified → 3.11
(In reply to comment #1)

Here are the steps needed to reproduce this problem (assumes an ECC-enabled build of NSS):

1. Create a new cert/key database
   % certutil -N -d .

2. Create selfsigned RSA cert and an ECC cert (the example uses nistp521 but nistk571 shows the same problem)
   % certutil -S -n myECC-nistp521 -s "CN=localhost (nistp521)" -c "CN=localhost (nistp521)" -t "Cu,Cu,Cu" -k ec -q nistp521 -x -d .
   % certutil -S -n MyRSA-1024 -s "CN=localhost (RSA)" -c "CN=localhost (RSA)" -t "Cu,Cu,Cu" -k rsa -g 1024 -x -d . 

3. Start up selfserv:
   % selfserv -n MyRSA-1024 -p 8443 -e MyECC-nistp521 -d . -c :c004:c009

4. Try connecting via tstclnt using ECDH-ECDSA-AES128-SHA.
   % tstclnt -h localhost -p 8443 -d . -c :c004 -o
    This produces the error:
    tstclnt: read from socket failed: SSL peer reports incorrect Message Authentication Code.

5. Now try connecting via tstclnt using ECDHE-ECDSA-AES128-SHA
   % tstclnt -h localhost -p 8443 -d . -c :c009 -o
    (Connection succeeds because, by default, NSS uses nistp256 to generate the ephemeral
    ECDH key and curves nistk409 and smaller work fine).
The server which was seeing BAD MAC errors was using nistp521.  
I changed it to use nistp256 and the problem does not occur anymore, as confirmed by the above.
Since Jyri has seen this on Solaris, I'm changing the CPU and OS to "ALL".
Vipul, if you're working on this, please "take" this bug (assign it to yourself).
OS: MacOS X → All
Hardware: Macintosh → All
Status: NEW → ASSIGNED
Assignee: wtchang → vipul.gupta
Status: ASSIGNED → NEW
Attached file Output from test 1
Client and server output (ECDH derived secret, pre-master secret used in master 
secret computation and master secret) from a successful handshake using 
nistk409.
Attached file Output from test 2
Client and server output (ECDH derived secret, pre-master secret used in master 
secret computation and master secret) from an  unsuccessful handshake using 
nistb571.

Note that the ECDH derived secret is the same on both client and server but
the pre-master secret used in the master secret computation on the server
no longer matches the ECDH derived secret (it is shorter and looks like
a pre-master secret from an RSA handshake -- 48 bytes long, protocol
version in the first two bytes).
Attached file Output from test 3
Client and server output (ECDH derived secret, pre-master secret used in master 
secret computation and master secret) from a successful handshake using 
nistb571 (for this, I had to modify code in pk11kea.c:pk11_KeyExchange()
to use 1024-bit RSA for wrapping (this allows symKeyLengths greater
than 53 to be wrapped).
Here is the patch used to generate the output in test 3.
It fixes the problem but I'm not sure this is the best fix.

It appears that the code which deals with wrapping/unwrapping
the pre-master secret on the server side has a problem in
dealing with pre-master values longer than 53 bytes.

This code is outside my area of expertise. I'd really appreciate
it if someone from the core NSS team (Bob?) can "take" this
bug now.

thanks,

vipul
I'm adding Bob Relyea to the cc list because he might be
the best person to resolve this bug in the code that deals 
with wrapping/unwrapping the pre-master secret on the
server.

Bob, could you "take" this bug? I'm not familar with this
part of the NSS code.

vipul
Vipul, Great detective work!

The type of handshake being done here, ECDH-ECDSA-AES128-SHA, doesn't involve 
RSA at all.  So, from the fact that RSA is being used in this case by pk11_KeyExchange() we can infer that a key exchange is being done to exchange
a key between two PKCS11 tokens in the server.  That implies these things:

1) the DH derive operation, whose output is the PreMaster Secret (PMS), has
been done in a token/slot that is NOT the one where NSS wants to do the 
subsequent derivation of the MasterSecret (MS) from the PMS, necessitating
that the PMS be moved to another slot/token, where that subsequent derivation
will be done.

2) It is likely that the DH derive operation was done in the softken "DB" 
slot, and the likely reason it was done there is that that is where the 
DH private value (private key) lives.  

3) the PMS (output of that DH derive) has been marked as being "sensitive"
(if I'm remembering the names of the PKCS11 key attributes correctly), 
meaning that it must be wrapped (encrypted) to be extracted from the slot/token
where it is.  It cannot be extracted in unwrapped (plaintext) fashion.

4) So, NSS is using RSA as a key transport method, to move the key from one
slot to another, that is, from the DB slot to the slot where it will do the 
PMS->MS derivation.  RSA works for this, but is surely not the most efficient
way to do it.  In this case, the RSA performance cost for this movement of 
the PMS completely outweighs any cost benefit of using ECC over RSA in the 
handshake.  IOW, we must eliminate this RSA key transport step to get 
reasonable performance.  Eliminating it will also solve the key length 
limit that Vipul uncovered.  

Several ways to eliminate that RSA key transport come to mind, including:

a) Do all the steps in the DB token rather than in the "generic" token,
thereby obviating any movement of key material to another slot/token.
Bob has chosen not to do that for some reason I do not now recall. 

b) use a much cheaper key transport method, such as AES key wrap, for 
the key movement.  (Why aren't we doing that already?)

c) don't mark the PMS as sensitive, so that it can simply be extracted
from one slot without wrapping, and be inserted into another slot without
unwrapping.  

d) Transport a copy of the DH private key from the DB slot to the generic
slot once, early in the server lifetime, and then use that copy in the 
generic slot to do all the DH derivations of the PMSs.  This obviates 
movement of the PMS, because the PSM can be derived in the generic slot,
using the copy of the DH private key.   We already do this for RSA 
(a new feature of NSS 3.11, IIRC).  IMO, it is the most desirable option.
IMO, we should do this for DH and ECDH also, as well as for RSA.  

I seem to remember Bob Relyea discussing somethign like this in another 
bug.  I will look for it now.
It appears to me that Bob proposed doing my option c, "don't mark the PMS as sensitive" as a performance enhancement in bug 326482, and attached a patch
to implement that, which is now awaiting my review! :-o
I'm not 100% sure that his patch would also address this bug's problem, but
I'll try to determine that when I review his patch.  

I think we should do c and d, doing d if we can, and falling back on c if
we can't do d.   That's what we're doing now for RSA.  But in the very short
term, it would suffice to do either one, I think.
The code that implements option d "Transport a copy of the private key from 
the DB slot to the generic slot once,  early in the server lifetime" for 
RSA keys is in function SSL_ConfigSecureServer() in 
http://lxr.mozilla.org/security/source/security/nss/lib/ssl/sslsecur.c#696

Presently, that code attempts to copy the token priv key to a session priv 
key in the same slot.  If that attempt fails, then it tries to copy the 
priv key to the preferred slot for the key's mechanism.  But the function
it calls to get the key's mechanism, PK11_MapSignKeyType(), works for EC 
keys but not for DH keys.  

Because that code tries to copy the token key to a session key IN THE SAME SLOT
first, I'm puzzled why we don't have the same "PMS in the wrong slot" problem
with RSA that this bug shows with ECDH.  And I wrote that code :-(  
Reassigning this to Bob, per Vipul's suggestion, since this appears to 
be a PK11wrap issue.  If we decide it is a libSSL issue, then Bob or I 
can work on it.
Assignee: vipul.gupta → rrelyea
Whiteboard: ECC
RE: It appears to me that Bob proposed doing my option c, "don't mark the PMS as
sensitive" as a performance enhancement in bug 326482, and attached a patch
to implement that, which is now awaiting my review!

Yes. I noticed this as a performance issue in ECDH.

As far as the copy goes, there was another bug that prevented the private key from being copied from a token to session key. The copy still happens in the database token, so the sensitive patch is still required. It turns out to have been a bug in our implementation of Derive. PKCS #11 requires that Derive restrict the sensitive bit on the target key to a function of the sensitive bit on the source key for SOME algorithms. We were implementing the restriction for all algorithms. The basic criteria is the target key should be sensitive if knowledge of the target key reveals information about the source key. That is not true of ECDH (otherwise it would be horribly broken since the client knows the target key;).

bob
Depends on: 326482
Comment on attachment 213216 [details] [diff] [review]
Patch to pk11kea.c used in generating test 3 output

Marking r- just to show that this patch is not a candidate fix for this problem.
Attachment #213216 - Flags: review-
fix from bug 326482 is checked in.
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
(In reply to comment #16)
> fix from bug 326482 is checked in.

But from what I can tell, it doesn't resolve the problem.
I'll repeat the tests shortly and post an update.

vipul
(In reply to comment #17)
> (In reply to comment #16)
> > fix from bug 326482 is checked in.
> 
> But from what I can tell, it doesn't resolve the problem.
> I'll repeat the tests shortly and post an update.
> 
> vipul

I've double checked that my workspace is up to date
and includes the patch above. Still, I'm seeing bad mac
failures when using secp521r1.

Could you please follow the steps in comment# 2 and
see if you also encounter the same failure.

vipul


Reopening bug in light of Vipul's preceeding comment.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Bob, It may be not enough to copy from token to session key in the same slot.
It may be necessary to copy to another slot.  But see my note in comment 12.  
Arg, It was the wrong patch.

Nelson. The relevant patch isn't the one that allows the copy to happen. You are quite right, the copy will happen in the same slot, and thus not affect this bug.

The patch that should fix this bug is the patch to not force the sensitive bit on the ECDH derived key. That patch also has 2 reviews so I'll check it in sortly.

bob
ok, the correct patch should be checked in now. I'll close the bug after I hear Vipul can do a connection;).

bob
(In reply to comment #22)
> ok, the correct patch should be checked in now. I'll close the bug after I hear
> Vipul can do a connection;).
> 
> bob

I've rerun my tests on nistp521 and nistb571 and they now work!
So this bug can be marked resolved.

vipul
 
Status: REOPENED → RESOLVED
Closed: 18 years ago18 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: