Open Bug 653032 Opened 14 years ago Updated 11 months ago

field_mod() gives incorrect results for NIST-P192 and NIST-P224.

Categories

(NSS :: Libraries, defect, P5)

x86_64
Linux

Tracking

(Not tracked)

REOPENED

People

(Reporter: osk, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [sg:investigate] )

Attachments

(2 files, 4 obsolete files)

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.16) Gecko/20110319 Firefox/3.6.16 (.NET CLR 3.5.30729) Build Identifier: nss-3.12.8-with-nspr-4.8.6.tar.gz I think I have found a particular set of inputs where the field_mod() functions for the NIST-P192 and NIST-P224 curves generates incorrect results. I have attached a sample C program that reproduces the error, and a patch that fixes the problem. This patch has only been tested on 64-bit platforms. The problem seems to be caused by not implementing carries on the final reduction of the mod function. For P192, this bug should occur for both 32 and 64-bit builds, and for P224 this should occur for only 64-bit builds. Also, the field_mod() functions for P192 and P224 were missing a call to s_mp_clamp() which causes problems when the result is close to zero. Reproducible: Always Steps to Reproduce: 1. Build the ECL and MPI libraries with NSS_ECC_MORE_THAN_SUITE_B defined. 2. Build the sample program and run it from a Linux command line. Actual Results: [osk@rayon ecl] gcc modtest.c ../mpi/libmpi.a libecl.a ../mpi/mp_gf2m.o -I../mpi -O2 -Wall -o modtest [osk@rayon ecl] ./modtest field_mod disagrees with mp_mod for p=NIST-P192! field_mod disagrees with mp_mod for p=NIST-P224! field_mod agrees with mp_mod for p=NIST-P256!
I compiled this program as follows: [osk@rayon mpi]$ make libs [osk@rayon mpi]$ cd ../ecl [osk@rayon ecl]$ make libs CFLAGS="-DNSS_ECC_MORE_THAN_SUITE_B -DMP_API_COMPATIBLE -DMP_IOFUNC -I../mpi" [osk@rayon ecl]$ gcc modtest.c ../mpi/libmpi.a libecl.a ../mpi/mp_gf2m.o -I../mpi -O2 -Wall -o modtest
What is the security impact of this bug for Mozilla?
Whiteboard: [sg:investigate]
The impact of this bug is that very rarely when using NIST-P192 or P224, ECDH and ECDSA operations will fail to agree. For NIST-P192, this bug will only occur when the output of the mod function is equal to 1<<64, in which case the output will be zero instead. If an attacker were able to choose inputs such that any of the mod operations result in (1<<64), then the loss of entropy could allow an attacker to guess the result of a point multiplication. As for NIST-P224, when this error occurs, there is no entropy lost, but some of the bits in the result have been inverted. If an attacker were able to choose inputs to trigger this bug, the result will be corrupted, but should still be unpredictable. However, hackers tend to be crafty people, and I could be proven wrong.
To what extent are we confident that there aren't similar problems in the P-256, P-384, and P-521 implementations?
I have run my test program against all of the NIST prime fields, and P192 and P256 are the only two that give problems. A visual inspection of the final reduction code for P256 and P384 looks to be free of this bug (they use mp_sub() rather than guess at what the high-order bits should be). The mod function for P521 is completely different, and I can draw no parallels between the it and code for smaller primes. And completely off topic; why does the code for P384 call mp_submod(), doesn't that completely defeat the point of having an optimized mod function?
Thanks Owen. I filed Bug 653236 about your off-topic question. Please file bugs for inefficiencies you see in the code, especially for P256 and P384.
(In reply to comment #6) > P192 and P256 are the only two that give problems. That was a typo, I should have said that P192 and P224 are the only two that give problems.
My original patch introduced a bug that would break the mod function when computing a result that was congruent with zero (eg: p mod p). Testing this also revealed that the mod function for P521 was broken for this case too, so it has also been fixed in this patch.
Attachment #528509 - Attachment is obsolete: true
Attached file C program to test field_mod. (obsolete) —
This is an updated version of my test program which now checks for results that should be congruent with zero (p mod p, and p^2 mod p), and adds tests for all of the NIST prime fields.
Attachment #528508 - Attachment is obsolete: true
Attachment #529160 - Flags: review?
Comment on attachment 529160 [details] [diff] [review] Patch to P192, P256, and P521 mod functions. Delegating unassigned reviews. Please reassign as appropriate. Thanks!
Attachment #529160 - Flags: review? → review?(bsmith)
Comment on attachment 529160 [details] [diff] [review] Patch to P192, P256, and P521 mod functions. Bob, could you review this? I do not have a strong understanding of this code.
Attachment #529160 - Flags: review?(bsmith) → review?(rrelyea)
Yes, I need to get my ECC book out to verify some things. Some general comments. I looks like the 192 problem is the failure to incorporate r1. I would prefer if the code used the MP_ADD_CARRY macros. For 224, the 2 32 bit cases are identical. The problem is we need to zero out the top bits of in the 64 bit case. The compliment code accomplishes this. The 521 bit case looks correct. (if r == p, then r needs to be zero). I'll do an official review once I bring in my ECC book. bob
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
First, thanks for finding this, owen, and doing the work to come up with a patch. > A visual inspection of the final reduction code for P256 and P384 looks to be free of this bug > (they use mp_sub() rather than guess at what the high-order bits should be). The mod function > for P521 is completely different, and I can draw no parallels between the it and code for > smaller primes. The 521, 192, and 244 bit curves have smaller number of component than the 256 and 384 bit curves, so it's worth doing the final subtraction "by hand". The bug in the 192 curve is the curve is: 2^192-2^64-1, which means to do the final subtraction we need to subract 2^192-2^64-1. We do this by adding 2^64+1, except the code was not adding 2^64. For the 244 bit curve we need to subtract 2^224-2^96+1. We correctly subtract the 1, but we only implicitly subtract the -2^96 in the 32 bit case (by zeroing the high bits of r1. In the 64 bit case we need to explicitly subtract the -2^96. For the 521 bit case there is only one case where we can be >=2^521-1, and that is if we are exactly equal to it, which is the patch owen added. In the 256 bit case, the has many more components so that it's easier (clearer, safer, and probably just as fast), to do an mp_sub. > And completely off topic; why does the code for P384 call mp_submod(), doesn't that > completely defeat the point of having an optimized mod function? In the 384 bit case, this code has not been completely optimized notice that it's still doing full adds. In theory there should be some win. doing an mp_mod where the dividend is large compared to the divisor is more expensive than if it's small. Still the gain isn't likely great. I would be happy to review a patch that fixes this issue. (It might be interesting to compare doing a 384 'optimized' against a straight 384 prime field implementation. Anyway ownen's patch is correct. I would like to make a few changes that makes it clearer what was going on to the 192 and 244 changes. I'll attach something tomorrow. bob
Correct error in the test for field_mod. The sqr operation isn't necessarily a sqr mod operation. The test will fail when the montgomery prime method is used.
Attachment #529162 - Attachment is obsolete: true
Comment on attachment 529160 [details] [diff] [review] Patch to P192, P256, and P521 mod functions. clearing review field.
Attachment #529160 - Flags: review?(rrelyea)
Here's my perferred patch for the issue.
Attachment #529160 - Attachment is obsolete: true
Attachment #620797 - Attachment is patch: true
Comment on attachment 620797 [details] [diff] [review] Patch to P192, P224, and P521 mod functions Oscar, can you verify that my changes are equivalent to yours. I found they passed both 64 and 32 bit versions of your test program.
Attachment #620797 - Flags: review?(osk)
Also, I have some more info on both p384 and the effects of this patch on mozilla. 1) Evidently the p384 code has is not used at all. p384 always uses the general montgomery prime field functions (probably because we determined it was faster than the unoptimized p384 fields). If someone wanted to supply a patch to optimize the p384 code, they would also have to update ecgroup_fromNameAndHex() in ecl.c to use the optimized functions. 2) even for p256 and p521, mozilla does not use the optimized curve functions. Mozilla always uses the general montgomery prime field functions for p384, p256, and p521 so there is not mozilla related security issue here. bob
> Oscar, can you verify that my changes are equivalent to yours. ^Oscar^Owen bob
Ping Owen, are you still interested in this bug? bob
Attachment #620797 - Flags: superreview?(wtc)
I have confirmed that Bob's patch is functionally equivalent to mine.
Comment on attachment 620797 [details] [diff] [review] Patch to P192, P224, and P521 mod functions Review of attachment 620797 [details] [diff] [review]: ----------------------------------------------------------------- This patch is equivalent to mine. Looks good.
Checking in ecp_192.c; /cvsroot/mozilla/security/nss/lib/freebl/ecl/ecp_192.c,v <-- ecp_192.c new revision: 1.7; previous revision: 1.6 done Checking in ecp_224.c; /cvsroot/mozilla/security/nss/lib/freebl/ecl/ecp_224.c,v <-- ecp_224.c new revision: 1.8; previous revision: 1.7 done Checking in ecp_521.c; /cvsroot/mozilla/security/nss/lib/freebl/ecl/ecp_521.c,v <-- ecp_521.c new revision: 1.3; previous revision: 1.2 done
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Owen, thanks for the bug, patch and review. bob
Attachment #620796 - Attachment mime type: text/x-csrc → text/plain
Comment on attachment 620797 [details] [diff] [review] Patch to P192, P224, and P521 mod functions marking r+ based on owen's comments. Clearing wtc. Wan-Teh, you can still review this if you want to, but it's not necessary so I'm removing it as noice in your review list. bob
Attachment #620797 - Flags: superreview?(wtc)
Attachment #620797 - Flags: review?(osk)
Attachment #620797 - Flags: review+
Comment on attachment 620797 [details] [diff] [review] Patch to P192, P224, and P521 mod functions Review of attachment 620797 [details] [diff] [review]: ----------------------------------------------------------------- ::: ecp_192.c @@ +123,4 @@ > (r0b == 0xffffffff)) ) { > /* do a quick subtract */ > MP_ADD_CARRY(r0a, 1, r0a, 0, carry); > + MP_ADD_CARRY(r0b, carry, r0a, 0, carry); 1. Should the third argument of this MP_ADD_CARRY call be r0b instead of r0a? I don't understand this patch, but I observed that we pass the same variable to all but three of the MP_ADD_CARRY calls in this file. So this new MP_ADD_CARRY does not match that pattern. 2. The roles of the second and fourth arguments to MP_ADD_CARRY seem interchangeable. If so, it would look nicer to pass 0 as the second argument and 'carry' as the fourth argument, because the fourth argument is 'cin' (for "input carry"). In summary, I guess this new MP_ADD_CARRY call should be MP_ADD_CARRY(r0b, 0, r0b, carry, carry); I found many existing MP_ADD_CARRY calls that match this pattern: MP_ADD_CARRY(x, 0, x, carry, carry);
> Should the third argument of this MP_ADD_CARRY call be r0b instead of r0a? yes > The roles of the second and fourth arguments to MP_ADD_CARRY seem interchangeable. yes, though your suggested change fits better. We are actually carrying rather than happening to add 1 if there is a carry. bob
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment on attachment 620797 [details] [diff] [review] Patch to P192, P224, and P521 mod functions r- based on wtc's comments. I'm glad you reviewed this Wan-Teh.
Attachment #620797 - Flags: superreview-
> I don't understand this patch, but I observed that we pass the same variable > to all but three of the MP_ADD_CARRY calls in this file. yes, it's a bit tricky. What is going on here is a fast modulo reduction. The prime is 2^192 - 2^64 -1. The code above where we are is using a set of adds and shifts to reduce this number to a number that is 192 bits long. At this point we should already be less than 2^192 - 2^64 -1 except for some corner cases. If we hit this corner case, we need to subtract 2^192 - 2^64-1, we can do this quickly by adding the compliment 1 + 2^64. We know r2b, r2a, and r1b are all 0xfffff...ffff, and that there will be a carry out from r1a. The bug here was 2 fold: 1 we weren't carrying from r0a to r0b. Since r0a could be 0xffff.ffffff, this would give the wrong results. Also we weren't adding 1 to r1a (that is adding 2^64), and just assuming r1a was always zero, but there is one case (2^192-1) where r1a winds up as '1'.
Bob: thank you for the explanation. I believe I understand the problems with ecp_192.c now, except the need the for s_mp_clamp() call. But the s_mp_clamp() call seems harmless. Just to confirm my understanding: you said: ..., but there is one case (2^192-1) where r1a winds up as '1'. Is that case r2 == r1 == r0 == 0xfffffffffffffffff?
> I believe I understand the problems with ecp_192.c now, >except the need the for s_mp_clamp() call. But the s_mp_clamp() call seems harmless. clamp will update the mpi variable's length, dropping leading zeros. If your calculation leads to leading zeros you need to call clap (This is definately true in the case we are describing. It's true in other cases as well. > Is that case r2 == r1 == r0 == 0xfffffffffffffffff? yes, r0 is the least significant work, r1 is the next r2 is the most significant word. In the case where we have r0a r0b, the word size is half of the 64 bit word size, so r0b||r0a = r0. bob
Blocks: ec-cleanup
Severity: normal → S3
Severity: S3 → S4
Priority: -- → P5
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: