536389 - ECC wrong point multiplication (=> wrong signatures) case.

Reporter

Description

•

16 years ago

ECC gives invalid multiplication of EC points for certain curve and multipliers. Here is a test case. Let curve has parameters: /*! id-GostR3410-2001-CryptoPro-{Xch,}A-ParamSet [RFC4357] */ static const ECCurveParams ecCurve_GOST_CP_A = { "CryptoPro-A/XchA", ECField_GFp, 256, "FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFD97", // prime "FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFD94", // a "00000000000000000000000000000000000000000000000000000000000000A6", // b "0000000000000000000000000000000000000000000000000000000000000001", // G.x "8D91E471E0989CDA27DF505A453F2B7635294F2DDF23E3B122ACC99C9E9F1E14", // G.y "FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF6C611070995AD10045841B09B761B893", // G order 1 // cofactor }; When I ask ec_points_mul()@freebl/ec.c to multiply base point of this curve (G) by random number (as part of a signature procedure), it gives wrong product for some multipliers and correct product for other multipliers. For instance, if k=216ee1be33d81a16f2f1060f32c304bf7876e8c93f7a2e469dac79d6a3dac0db ec_points_mul returns wrong product (wrong) kG=( x=d0a89e74850885fe304a5881f8ce15eb9f52a0d2e6373d673fc94dff416dae05 y=5ba273859c93f6759b28a4576a69c23e38ba25376d097eb596675e794197067c ) but valid product is (valid) kG=( x=9bd2578fcb86d9fc2fe54cf192236102b20657c9d5ec5291d456bbdcdb894f0d y=9807c0f0ea6148c7dfd96d7c96bb073f0f61c92506aca6c468737bd95be946a1 ) Here are some other multipliers which trigger wrong product in ec_points_mul(): b53ff01b8c03d79154f1ca7dc14e4df90cccee10a49fc220829f5068b291f7d5 1cfbaac38b9dea01d9fb2e971165bd8308cdd6a66eb63894075f595f1de2283c, etc. However, ec_points_mul() returns correct product for multipliers: 11ea01601a2d443e7907af2385357cd59027d3470e93c63b744918e90d032a9a e11a08966345612f721c2141962b75342fd0b1991a4f068530c41414634028f2 2c672f6ff10be6f5ad8d9e45e674b61788ce42b4dddb872f1995b359213e5dd7, etc. I observe that approximately 1/3 - 1/2 of random multipliers gives correct product in ec_points_mul(). The rest of multipliers gives wrong product, and, finally, wrong signature. Since this behavior is observed only on particular curve I conclude there is a bug somewhere deep in the implimentation of EC point multiplication. To the moment, I do not know, whether this issue is hardware/OS/compiler - dependent.

Nelson Bolyard (seldom reads bugmail)

Comment 1

•

16 years ago

Presently, there is a list of 25 curves that are supported in NSS. NSS's ECC code is believed to be correct for those 25 curves. It is possible that the code is incorrect when it is used with curves other than those 25 supported curves, however, I would not necessarily consider that a bug in the existing code. Rather, I would consider the addition of support for new curves to be an enhancement request, and making the code produce correct results for any such new curves to be part of the work for that enhancement.

Douglas Stebila

Comment 2

•

16 years ago

Nelson, I would still consider an error in a non-supported curve a bug in the NSS ECC code. For different curves, the code implementing the operations is the same code path, just with different constants as the inputs. The code should be consistent and correct on all inputs, whether they were from the approved list of curves or not. This bug may be exposing a bug in the main ECC code that exists but has not shown up in any of our test cases because we're not exercising that code path. I will look into this further in the new year.

Konstantin Andreev

Reporter

Comment 3

•

16 years ago

Thank you for your attention. I agree, the NSS ECC code could be written having only supported curves in mind. If so, every new curve is an enhancement. At first glance, ECC code looks generic enough to support every curve. I believe it's worth reviewing whether this issue does expose unnoticed bug or doesn't.

Douglas Stebila

Updated

•

16 years ago

Assignee: nobody → douglas

Douglas Stebila

Comment 4

•

16 years ago

Attached file Possible testcase that doesn't demonstrate the bug in comment #0 — Details

Konstantin: I've tried to reproduce the bug you've described but am unable to do so. I've attached the test case I put together. I've used the first scalar k= 216ee1be33d81a16f2f1060f32c304bf7876e8c93f7a2e469dac79d6a3dac0db from your comment; when I run the code, it outputs the point that you've listed as valid. What type of system are you running on? Can you try running the test file I've attached to see what output you get?

Konstantin Andreev

Reporter

Comment 5

•

16 years ago

Hello, Douglas. Thank you for investigation. On my system this testcase outputs invalid point, exactly as I shown in my 1st post: $ ./test ec_points_mul: pointQ [len=65]:04:d0:a8:9e:74:85:08:85:fe:30:4a:58:81:f8:ce:15:eb:9f:52:a0:d2:e6:37:3d:67:3f:c9:4d:ff:41:6d:ae:05:5b:a2:73:85:9c:93:f6:75:9b:28:a4:57:6a:69:c2:3e:38:ba:25:37:6d:09:7e:b5:96:67:5e:79:41:97:06:7c: As long as it works for you I guess there is no errors in ECC code, but in the underlying optimized arithmetic library. I am running Solaris Express Community Edition snv_109 on X86. I have noted that during build two assembler files were compiled: mpi/mpcpucache_x86.s and mpi/mpi_i86pc.s If it matters, files in the "freebl" directory are compiled as follows: cc -o SOME.o -c -g -KPIC -DSVR4 -DSYSV -D__svr4 -D__svr4__ -DSOLARIS -D_REENTRANT -Di386 -DSOLARIS2_11 -D_SVID_GETTOD -xs -DXP_UNIX -DSHLIB_SUFFIX=\"so\" -DSHLIB_PREFIX=\"lib\" -DSHLIB_VERSION=\"3\" -DSOFTOKEN_SHLIB_VERSION=\"3\" -DRIJNDAEL_INCLUDE_TABLES -DDEBUG -UNDEBUG -DDEBUG_andreev -DNSS_ENABLE_ECC -DUSE_UTIL_DIRECTLY -DNSS_X86_OR_X64 -DNSS_X86 -DMP_USE_UINT_DIGIT -DMP_ASSEMBLY_MULTIPLY -DMP_ASSEMBLY_SQUARE -DMP_ASSEMBLY_DIV_2DX1D -DMP_API_COMPATIBLE -I/usr/dt/include -I/usr/openwin/include -I/usr/include/nspr -I../../../../dist/SunOS5.11_i86pc_DBG.OBJ/include -I../../../../dist/public/nss -I../../../../dist/private/nss -Impi -Iecl SOME.c Could you advise, what should I undertake to isolate the problem ?

Douglas Stebila

Comment 6

•

16 years ago

Konstantin, It still is possible that the problem is with the elliptic curve library (ecl), not with the integer arithmetic library (mpi). There are a few changes I can suggest to see if we can nail down if the bug is occurring in particular parts of the ecl code. Can you try out each of the following changes (recompiling and testing your example for each one) to see if any of them result in different output? This may help us isolate the problem. They are in increasing complexity, however, so it may be more effort than you care to undertake. (1) In freebl/ecl/ecl.c, line 311: change ECGroup_consGFp_mont to ECGroup_cons_GFp (2) Do (1) and in ecl.c, line 112: change ec_GFp_pt_mul_jm_wNAF to ec_GFp_pt_mul_aff (3) Do (2) and in ecl.c, line 112: change ec_GFp_pts_mul_jac to ec_pts_mul_basic (4) Can you try compiling on a 64-bit system to see if it's a 32-bit versus 64-bit problem? (5) Try compiling mpi and ecl independently of NSS (they can be built as standalone libraries). See the instructions at the bottom of ecl/README. Then alter ecl/tests/ecp_test.c to include a test case for your example and see what it outputs.

Konstantin Andreev

Reporter

Comment 7

•

16 years ago

So, given that: (-) WRONG point for 32-bit build Testing changes: (1) freebl/ecl/ecl.c:311: ECGroup_consGFp_mont -> ECGroup_consGFp VALID point for 32-bit build (4) -- no change in source code -- VALID point for 64-bit build If it matters, the difference in compiler flags between 32/64-bit builds: -= -Di386 -DNSS_X86 -DMP_USE_UINT_DIGIT -DMP_ASSEMBLY_SQUARE -DMP_ASSEMBLY_DIV_2DX1D += -xprefetch=no -m64 -DNSS_USE_64 -DNSS_X64 -DNSS_BEVAND_ARCFOUR -DMPI_AMD64 -DNSS_USE_COMBA -DMP_CHAR_STORE_SLOW -DMP_IS_LITTLE_ENDIAN -DUSE_HW_AES The difference in the sets of mpi files compiled during 32/64-bit builds: -= mpcpucache_x86.s mpi_i86pc.s += mpcpucache_amd64.s mpi_amd64.c mpi_amd64_sun.s mp_comba_amd64_sun.s Douglas, I am at your disposal for further testing.

Douglas Stebila

Comment 8

•

16 years ago

If I understand what you've written correctly, the code runs correctly when using ECGroup_consGFp instead of ECGroup_consGFp_mont, even on 32-bit builds. If that's the case, then the problem is either related to the 32-bit Montgomery arithmetic in mpi, or in the Montgomery wrapper in ecl/ecp_mont.c. Could you try running the standalone mpi tests to see if they pass? cd mozilla/security/nss/lib/freebl/mpi make tests ./all-tests

Konstantin Andreev

Reporter

Comment 9

•

16 years ago

(In reply to comment #8) > If I understand what you've written correctly, the code runs correctly when > using ECGroup_consGFp instead of ECGroup_consGFp_mont, even on 32-bit builds. Correct. > If that's the case, then the problem is either related to the 32-bit > Montgomery arithmetic in mpi, or in the Montgomery wrapper in ecl/ecp_mont.c. > > Could you try running the standalone mpi tests to see if they pass? > cd mozilla/security/nss/lib/freebl/mpi > make tests "make tests" builds not automated tests, so this is not necessary. > ./all-tests This was a tricky task, because NSS builds mpi not in such a way as legacy mpi/Makefile. But I managed that. Yes, mpi passed self-test, both for 32- and 64-bit builds: ---( begin cite )--- ** Running unit tests for MPI library Bringing mpi-test up to date ... There are currently 42 test suites available copy ... passed ... ... ... fermat ... passed All unit tests passed ** Running other tests Computation took 0.35 sec. Okay! The pi test passes. ---( end cite )--- I guess I have no other way except the trace correct and wrong builds side-by-side in the debugger to find where computations starts to differ :(

Douglas Stebila

Comment 10

•

16 years ago

(In reply to comment #9) > > If I understand what you've written correctly, the code runs correctly when > > using ECGroup_consGFp instead of ECGroup_consGFp_mont, even on 32-bit builds. > > Correct. > > Yes, mpi passed self-test, both for 32- and 64-bit builds: > > I guess I have no other way except the trace correct and wrong builds > side-by-side in the debugger to find where computations starts to differ :( Yes, I'm afraid that's the case. I can't think of any other tests that I can ask you to run to help isolate the problem. A good starting point would be to print the point at each iteration in the point multiplication and see when they start to differ. Add the following code after line 303 in lib/freebl/ecl/ecp_jm.c: printf("iteration %i:\n", i); printf("point R:\n"); MP_CHECKOK(mp_toradix(rx, s, 16)); printf(" x = %s\n", s); MP_CHECKOK(mp_toradix(ry, s, 16)); printf(" y = %s\n", s); MP_CHECKOK(mp_toradix(&rz, s, 16)); printf(" z = %s\n", s);

Nelson Bolyard (seldom reads bugmail)

Comment 11

•

16 years ago

(In reply to comment #8) > Could you try running the standalone mpi tests to see if they pass? > cd mozilla/security/nss/lib/freebl/mpi > make tests > ./all-tests One caveat: the makefile in freebl/mpi may not build MPI with the exact same set of -D command line options as the Makefile in freebl does. If they're different, then of course, the test results from the built using the mpi makefile are not representative of the behavior of the code as built by freebl's makefile.

Konstantin Andreev

Reporter

Comment 12

•

16 years ago

(In reply to comment #11) > (In reply to comment #8) > > Could you try running the standalone mpi tests to see if they pass? > > cd mozilla/security/nss/lib/freebl/mpi > > make tests > > ./all-tests > > One caveat: the makefile in freebl/mpi may not build MPI with the exact same > set of -D command line options as the Makefile in freebl does. If they're > different, then of course, the test results from the built using the mpi > makefile are not representative of the behavior of the code as built by > freebl's makefile. Thank you for the hint, Nelson. Yes, I saw. I rewrote the mpi/Makefile to build mpi exactly as NSS build does.

Konstantin Andreev

Reporter

Comment 13

•

16 years ago

This is definitely a bug in Montgomery modular multiplication: mpi/mpmontg.c:s_mp_mul_mont(). Let's just try to square the 0x56B543289A8B7658A2FECA9A61F2000000000000 by modulus 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFD97. s_mp_mul_mont( 0x56B5..., 0x56B5..., &rv, mp_mont_modulus(0xFF...D97)); The rv (product) is: 32-bit_code:0x000000007E80FA0645090F98C904FE30E9F72224000000001D5E497750DE81BD 64-bit_code:0xF646668A7E80FA0645090F98C904FE30E9F72224000000001D5E497750DE81BD Look, in the 32-bit code the most significant digit of product has lost. On my system this bug is eliminated by the following patch: --- mpmontg.c.org 2006-08-29 06:41:38.000000000 +0400 +++ mpmontg.c 2010-01-25 20:27:06.410299148 +0300 @@ -128,7 +128,7 @@ mp_err s_mp_mul_mont(const mp_int *a, co } MP_USED(c) = 1; MP_DIGIT(c, 0) = 0; - ib = MP_USED(a) + MP_MAX(MP_USED(b), MP_USED(&mmm->N)) + 2; + ib = MP_USED(a) + MP_MAX(MP_USED(b), MP_USED(&mmm->N)) + 3; if((res = s_mp_pad(c, ib)) != MP_OKAY) goto CLEANUP; but I am not sure, whether this a valid approach. Douglas, could you take a look at ? It is also an instresting question, why this problem was not noted earlier...

Wan-Teh Chang

Comment 14

•

16 years ago

Konstantin: Good finding! I stared at the s_mp_mul_mont code for an hour last night, but I couldn't find where the bug is or why your change would eliminate the bug. Here are my findings: 1. I searched for "+ 2" and "+ 3" in that file. I didn't find any "+ 3", but there are many instances of "+ 2". So it's not clear by this dumb method why using "+ 2" in s_mp_mul_mont is wrong. 2. It seems OK for 'ib' to be larger than necessary because the effect is that 'c' will have more leading zero digits, which will be removed by the s_mp_clamp call at the end of the s_mp_mul_mont function. I hope Douglas or Nelson, who is more familiar with the MPI code, will have better luck tracking this down.

Wan-Teh Chang

Comment 15

•

16 years ago

More on my "dumb" code inspection method in comment 14: The structure of s_mp_redc in mpmontg.c is very similar to s_mp_mul_mont. When we understand this bug in s_mp_mul_mont, we should check if s_mp_redc has the same bug.

Wan-Teh Chang

Comment 16

•

16 years ago

It seems that because of this second for loop in s_mp_mul_mont: 153 if (usedb < MP_USED(&mmm->N)) { 154 for (usedb = MP_USED(&mmm->N); ib < usedb; ++ib ) { 155 m_i = MP_DIGIT(c, ib) * mmm->n0prime; 156 s_mp_mul_d_add_offset(&mmm->N, m_i, c, ib); 157 } 158 } 'c' needs to have at least twice the number of digits of mmm->N. So the formula for the initial value of 'ib' that Konstantin changed should contain, ignoring the "+ 2", MP_USED(&mmm->N) + MP_MAX(MP_USED(b), MP_USED(&mmm->N)) rather than MP_USED(a) + MP_MAX(MP_USED(b), MP_USED(&mmm->N)) Perhaps the "+ 2" is actually more than necessary, so it can compensate for the difference between MP_USED(a) and MP_USED(&mmm->N) sometimes. Perhaps 'a' and 'b' usually have the same number of digits as mmm->N.

Douglas Stebila

Comment 17

•

16 years ago

(In reply to comment #14) > 2. It seems OK for 'ib' to be larger than necessary because > the effect is that 'c' will have more leading zero digits, which > will be removed by the s_mp_clamp call at the end of the > s_mp_mul_mont function. I agree with this. > I hope Douglas or Nelson, who is more familiar with the > MPI code, will have better luck tracking this down. I'm not too familiar with the Montgomery multiplication code here, so my inspection will be no better than yours. (In reply to comment #16) > 'c' needs to have at least twice the number of digits of mmm->N. > So the formula for the initial value of 'ib' that Konstantin changed > should contain, ignoring the "+ 2", > MP_USED(&mmm->N) + MP_MAX(MP_USED(b), MP_USED(&mmm->N)) > rather than > MP_USED(a) + MP_MAX(MP_USED(b), MP_USED(&mmm->N)) > > Perhaps the "+ 2" is actually more than necessary, so it > can compensate for the difference between MP_USED(a) > and MP_USED(&mmm->N) sometimes. Perhaps 'a' and 'b' > usually have the same number of digits as mmm->N. I would be hesitant to assume that a and b are already reduced to the same number of digits as mmm->N. So to be on the safe side I would go with MP_MAX(MP_USED(a), MP_USED(&mmm->N)) + MP_MAX(MP_USED(b), MP_USED(&mmm->N)) But as for Konstantin's original observation, regarding what number should be added to this (+ 2, + 3, ...?) I have no intuition whatsoever. My intuition tells me that the line above should in fact be sufficient. Does anyone know what builds would actually use this section of code? It's wrapped in an #if !defined(MP_ASSEMBLY_MUL_MONT) and I would have expected that NSS was built with assembly Montgomery mont mul for most systems of interest, so I'm surprised Konstantin even discovered this bug (not that I'm saying it shouldn't be fixed).

Wan-Teh Chang

Comment 18

•

16 years ago

Douglas, I actually came to the same conclusion that we should go with MP_MAX(MP_USED(a), MP_USED(&mmm->N)) + MP_MAX(MP_USED(b), MP_USED(&mmm->N)) to be on the safe side. Konstantin, you're in the best position to get to the bottom of this. Shall I create a patch for you to review and test?

Konstantin Andreev

Reporter

Comment 19

•

16 years ago

(In reply to comment #17) > (In reply to comment #14) > > I hope Douglas or Nelson, who is more familiar with the > > MPI code, will have better luck tracking this down. > > I'm not too familiar with the Montgomery multiplication code here, so my > inspection will be no better than yours. Sad news :( > Does anyone know what builds would actually use this section of code? It's > wrapped in an > #if !defined(MP_ASSEMBLY_MUL_MONT) > and I would have expected that NSS was built with assembly Montgomery mont mul > for most systems of interest, so I'm surprised Konstantin even discovered this > bug (not that I'm saying it shouldn't be fixed). When I build NSS, I do not set any mathematics-related options. But this peace of code works both for 32-bit and 64-bit builds on Solaris. And yes, Montgomery modular mult use assembler:, the "s_mpv_mul_d", "s_mp_mul_d_add_offset" and "s_mpv_mul_d_add_prop" used by "s_mp_mul_mont" are implemented in platform-specific assembler. (In reply to comment #18) > Douglas, I actually came to the same conclusion that > we should go with > MP_MAX(MP_USED(a), MP_USED(&mmm->N)) + MP_MAX(MP_USED(b), MP_USED(&mmm->N)) > > to be on the safe side. > > Konstantin, you're in the best position to get to the bottom of this. Shall I create a patch for you to review and test? This change is certainly safe, but do not patch right now, please. I should investigate the "s_mp_mul_mont" more carefully. Douglas, when you were not able to reproduce the bug in test case, what kind of system you were running on ? 32-bit or 64-bit, in particular ?

Douglas Stebila

Comment 20

•

16 years ago

(In reply to comment #19) > Douglas, when you were not able to reproduce the bug in test case, what kind of > system you were running on ? 32-bit or 64-bit, in particular ? 64-bit build on Mac OS X 10.6.2 on an Intel Core 2 Duo.

Wan-Teh Chang

Comment 21

•

16 years ago

Attached patch Patch with tight upper bounds (obsolete) — Details — Splinter Review

Please review and test this patch. You may find it hard to verify the upper bounds are correct. Also, this patch changes s_mp_redc, which isn't known to be broken, so it violates the "don't fix it if it ain't broke" principle.

Wan-Teh Chang

Comment 22

•

16 years ago

Attached patch Small patch with a looser upper bound — Details — Splinter Review

This patch fixes only the function that's known to be broken, and uses a looser upper bound that's clearly big enough. If none of us is willing to invest the time to understand the relevant MPI code, this patch is a better fix.

Wan-Teh Chang

Updated

•

16 years ago

Attachment #424378 - Flags: superreview?(douglas)

Attachment #424378 - Flags: review?(andreev)

Konstantin Andreev

Reporter

Comment 23

•

16 years ago

(In reply to comment #21) > Created an attachment (id=424376) [details] > Patch with tight upper bounds > Please review and test this patch. Wan, I can not review until better understanding of the Montgomery algorithm. (In reply to comment #22) > If none of us is willing to invest the time to > understand the relevant MPI code, this patch is > a better fix. I have queried a mathematician for algorithm details. I believe I could review after getting the response.

Wan-Teh Chang

Comment 24

•

16 years ago

Konstantin: one lame way to review these patches without understanding the Montgomery algorithm is to convince yourself that it's OK to use a looser (larger than necessary) upper bound on the number of digits required, and then study the for loops to determine how many digits they need for the output. For s_mp_redc, it's possible that the MPI code has some implicit assumption on the relative sizes of 'T' and the modulus 'mmm->N'. For example, I read in Tom St. Denis' book "BigNum Math" that 'T' should be less than the square of the modulus. With that assumption, the expression i = MP_MAX(MP_USED(T), 2 * MP_USED(&mmm->N)) + 2; in my patch attachment 424376 [details] [diff] [review] will become i = 2 * MP_USED(&mmm->N) + 2;

Nelson Bolyard (seldom reads bugmail)

Comment 25

•

16 years ago

I originally coded mpmontg.c according to the description of Peter Montgomery's multiplication method, as improved by Dusse' & Kaliski. You can read the paper ( http://dsns.csie.nctu.edu.tw/research/crypto/HTML/PDF/E90/230.PDF pages 2-8) It's an easy read, written for SW developers. After reading it, you may be able to improve the code in several ways. I originally tried implementing all 3 of the suggested improvements, but found that I got the best results with improvement 1 alone.

URL: http://dsns.csie.nctu.edu.tw/research...

Konstantin Andreev

Reporter

Comment 26

•

16 years ago

Comment on attachment 424378 [details] [diff] [review] Small patch with a looser upper bound The patch should respect to s_mp_redc as well.

Attachment #424378 - Flags: review?(andreev) → review-

Konstantin Andreev

Reporter

Comment 27

•

16 years ago

(In reply to comment #25) > I originally coded mpmontg.c according to the description of Peter > Montgomery's multiplication method, as improved by Dusse' & Kaliski. > You can read the paper (http://dsns.cs.../230.PDF) Nelson, thank you for this direction. The paper has helped me to understand what's going on. (In reply to comment #17 of Douglas Stebila) > (In reply to comment #16 of Wan-Teh Chang) > > Perhaps 'a' and 'b'usually have the same number of digits as mmm->N. > > I would be hesitant to assume that a and b are already reduced to the same > number of digits as mmm->N. Douglas, we may be sure that input multipliers for "s_mp_mul_mont" are reduced, for the following reasons: 1) <Montgomery> REDC algorithm was designed to work with "n-residues", which < N by construction. 2) REDC(T) shall return (T*R-1) mod N. One can verify that, if T = A * B and A,B > N, then REDC(T) won't return what expected. Thereby I do not see any rationality to call "s_mp_mul_mont/s_mp_redc" with non-reduced arguments. If this happens, it's a library internal error. And last, but not least: s_mp_mul_mont is used 3 times in the NSS code: mp_exptmod_i(), mp_exptmod_safe_i() in mpmontg.c, and ec_GFp_mul_mont() in ecp_mont.c with reduced arguments, as expected. Thereby the room reservation for reduct will be as simple as: (MP_USED(&mmm->N) << 1) + 1; The last "+ 1" is for possible carry. Wan, thank you for observation regarding "s_mp_redc". It suffers from the same disease, that "s_mp_mul_mont" does. Let me show one more example. Let N = 2^256 - 0x269 (CryptoPro-A elliptic curve prime, as in previous examples). Let's try to reduce just "1": s_mp_mul_mont( 1, 1, &rv, N); s_mp_redc( 1, N ); in their current form both lose 5 of 8 digits of result: rv == 0x4A44E00D46F3234475D5ADA6 REDC(1) == 0x15291380351bcc8d11d756b763fe57219b9771454a44e00d46f3234475d5ada6 Douglas, once again about #if/#endif brackets around: #if !defined(MP_ASSEMBLY_MUL_MONT) && !defined(MP_MONT_USE_MP_MUL) mp_err s_mp_mul_mont(...) { ... } #endif The MP_ASSEMBLY_MUL_MONT is nowhere else in the NSS code. I would get rid of it, to not create wrong expectations. The MP_MONT_USE_MP_MUL acts as a switch: if it is defined, the clients use (mp_mul/s_mp_redc) couple instead of s_mp_mul_mont. I assume that's intended for platforms where multiprecision multiplication is faster than digit-by-digit "s_mp_mul_mont" approach. As far as I see from freebl/{mpi/target.mk,Makefile}, NSS defines MP_MONT_USE_MP_MUL if: -- cpu: X86 (32bit), os: linux, win32 -- cpu: sparc (any) I am curious of this particular OS choice for X86. Could anybody shed some light on this ?

Konstantin Andreev

Reporter

Comment 28

•

16 years ago

Attached patch Room reservation for reduction in s_mp_redc/s_mp_mul_mont,v4 (obsolete) — Details — Splinter Review

This patch makes the enhancements: 1) reserves room for reduct in s_mp_redc/s_mp_mul_mont, documents function signatures. 2) corrects documentation reference 3) gets rid of MP_ASSEMBLY_MUL_MONT 4) gets rid of excessive "b" field of mp_mont_modulus structure. I applied this patch to NSS 3.12.5, compiled NSS with Extended ECC on X86, and ran the test suite. All tests were passed.

Attachment #424376 - Attachment is obsolete: true

Attachment #424378 - Attachment is obsolete: true

Attachment #424378 - Flags: superreview?(douglas)

Nelson Bolyard (seldom reads bugmail)

Comment 29

•

16 years ago

Comment on attachment 428736 [details] [diff] [review] Room reservation for reduction in s_mp_redc/s_mp_mul_mont,v4 asking self to review. Would like Douglas and/or Wan-Teh to also review. In answer to question in preceeding comment, All existing ifdef choices were based on empirical performance test results done in 2001. It's probably time to redo them on modern CPUs.

Attachment #428736 - Attachment description: Room reservation for reduct in s_mp_redc/s_mp_mul_mont,v4 → Room reservation for reduction in s_mp_redc/s_mp_mul_mont,v4

Attachment #428736 - Flags: review?(nelson)

Nelson Bolyard (seldom reads bugmail)

Comment 30

•

16 years ago

Let us not forget here that the whole point of using this algorithm is to maximize performance. It is certainly true that one can safely choose to use variables that are larger than necessary in the the Montgomery steps. The price for that choice will be performance. It should be possible to compute the exact size necessary and use just that much. Let us strive to do that.

Nelson Bolyard (seldom reads bugmail)

Comment 31

•

16 years ago

Please ignore my comment 30. It was a reply to some comments I received in private email, which I mistakenly thought were comments in this bug.

Wan-Teh Chang

Comment 32

•

16 years ago

The advantage of using a size larger than the exact size is that it may simplify code reviews. For example, I believe most instances of + 2 in the padding size in that file can be + 1. Konstantin made two such changes in his patch (attachment 428736 [review]). But it may take a lot of time to convince ourselves that + 1 is the exact size, and we would save only one digit.

Konstantin Andreev

Reporter

Comment 33

•

16 years ago

I am in doubt where original "+ 2" came from. But the "+ 1" in "len(T) := 2*len(N) + 1" is simple to explain: the extra digit is to keep one bit of carry, that may occure during REDC execution. Wan, if you are hesitant, let replace "+ 1" with "+ 2", it doesn't make a big difference.

Alexei Volkov

Updated

•

16 years ago

Blocks: FIPS2010

Nelson Bolyard (seldom reads bugmail)

Comment 34

•

16 years ago

Attached patch Konstantin's patch v4, converted to CVS diff against 3.13 branch — Details — Splinter Review

This will aid my review.

Nelson Bolyard (seldom reads bugmail)

Comment 35

•

16 years ago

Wan-Teh, the savings of "only 1 digit" can reduce the number of long slow multiply instructions required to do some operations by thousands, perhaps tens of thousands. It represents "low hanging fruit" for someone interested in performance. We ought not to casually discard performance. I hoped to get to the work of getting the code to use minimum sizes this weekend, but didn't. Maybe next weekend.

Wan-Teh Chang

Comment 36

•

16 years ago

I was just pointing out that if we're willing to use a sub-optimal upper bound that both Douglas and I could verify, we would have fixed the bug two months ago (see the discussion around comment 17). Today, this bug is still open. That's the trade-off.

Konstantin Andreev

Reporter

Comment 37

•

16 years ago

It looks like there was some discussion beyond this thread :) I guess, you want to pre-allocate (for Montgomery product) no more digits than product will contain. Do you ? Unfortunately, this is not possible. As demonstrated in comment 27, the number of digits in product does not depend from the numbers of digits in multipliers. Come again (see comment 27): the result of s_mp_mul_mont( 1, 1, &rv, N); occupies the number of digits in N, not in "1". (In reply to comment #36) > I was just pointing out that if we're willing to use a sub-optimal > upper bound that both Douglas and I could verify, we would have > fixed the bug two months ago (see the discussion around comment 17). Yes, but that fix was a bit overkill, because MP_MAX(MP_USED(a), MP_USED(&mmm->N)) + MP_MAX(MP_USED(b), MP_USED(&mmm->N)) is just 2 * MP_USED(&mmm->N)

Robert Relyea

Updated

•

16 years ago

No longer blocks: FIPS2010

Konstantin Andreev

Reporter

Updated

•

16 years ago

OS: Solaris → All

Hardware: x86 → All

Nelson Bolyard (seldom reads bugmail)

Updated

•

15 years ago

Whiteboard: FIPS

Robert Relyea

Comment 38

•

15 years ago

Stick back on the FIPS rereview list

Robert Relyea

Updated

•

15 years ago

Blocks: FIPS2010

Nelson Bolyard (seldom reads bugmail)

Comment 39

•

15 years ago

Comment on attachment 424378 [details] [diff] [review] Small patch with a looser upper bound I'm going to give this patch r+ as a short term measure, to get a fix in for this, while we continue to find the best solution.

Attachment #424378 - Attachment is obsolete: false

Attachment #424378 - Flags: superreview+

Konstantin Andreev

Reporter

Comment 40

•

15 years ago

(In reply to comment #39) > (From update of attachment 424378 [details] [diff] [review]) > I'm going to give this patch r+ as a short term measure, to get a fix in for > this, while we continue to find the best solution. Please, do not. Attachment 424378 [details] [diff] only fixes bug in "s_mp_mul_mont", but "s_mp_redc" is NOT fixed. Could you, please, advice, what is wrong with patch 433870 ?

Nelson Bolyard (seldom reads bugmail)

Comment 41

•

15 years ago

Comment on attachment 433870 [details] [diff] [review] Konstantin's patch v4, converted to CVS diff against 3.13 branch This weekend, I instrumented libfreebl and compared the effects of the two patches against the present committed code on RSA operations and ECC operations. RSA operations tend to be VERY constant in the number of multiplication operations done for a given modulus size. By contrast ECC operations tend to vary a great deal for operations done in a given curve. My chief concern was the impact on RSA operations. I observed that the only performance differences occurred when modulus length was not a multiple of 32 bits. IMO, this is a sufficiently rare case that I don't care about it. Indeed, softoken will only generate RSA keys whose modulus size is a multiple of 16 bits. So I'm willing to accept Konstantin's patch, which is "correct" (produces correct values) and appears to be no worse than the present code when modulus length is a multiple of 32 bits.

Attachment #433870 - Flags: review+

Nelson Bolyard (seldom reads bugmail)

Updated

•

15 years ago

Attachment #428736 - Attachment is obsolete: true

Attachment #428736 - Flags: review?(nelson)

Nelson Bolyard (seldom reads bugmail)

Comment 42

•

15 years ago

Note that the paper formerly available at http://dsns.csie.nctu.edu.tw/research/crypto/HTML/PDF/E90/230.PDF is no longer available there.

URL: http://dsns.csie.nctu.edu.tw/research...

Nelson Bolyard (seldom reads bugmail)

Comment 43

•

15 years ago

Bug 536389: ECC wrong point multiplication (=> wrong signatures) case Patch contributed by Konstantin Andreev <andreev@swemel.ru>, r=nelson On 3.13 branch ecl/ecp_mont.c; new revision: 1.2.182.1; previous revision: 1.2 mpi/mpi-priv.h; new revision: 1.21.8.1; previous revision: 1.21 mpi/mpi.c; new revision: 1.45.56.4; previous revision: 1.45.56.3 mpi/mpmontg.c; new revision: 1.20.56.1; previous revision: 1.20

Status: NEW → RESOLVED

Closed: 15 years ago

Resolution: --- → FIXED

Wan-Teh Chang

Comment 46

•

14 years ago

Nelson's checkin in comment 43 fixed this bug in NSS 3.13.

Target Milestone: --- → 3.13

Brian Smith (:briansmith, :bsmith, use NEEDINFO?)

Updated

•

14 years ago

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Whiteboard: FIPS → FIPS [sg:investigate]

Brian Smith (:briansmith, :bsmith, use NEEDINFO?)

Updated

•

11 years ago

Comment 48

•

3 years ago

The bug assignee is inactive on Bugzilla, so the assignee is being reset.

Assignee: douglas → nobody

BMO Automation

Updated

•

3 years ago

Severity: normal → S3

Benjamin Beurdouche [:beurdouche]

Updated

•

2 years ago

Priority: -- → P4

Anna Weine

Comment 49

•

2 years ago

•

Edited

It looks like the code from the patch is in NSS.

Status: REOPENED → RESOLVED

Closed: 15 years ago → 2 years ago

Resolution: --- → WORKSFORME

Robert Relyea

Comment 50

•

2 years ago

So the original patch was included 13 years ago. Brian reopenned the bug because of bug 1128140 and the fact he wanted it in 3.12. The former bug is closed and any relevancy of NSS 3.12 in 2024 is moot, so Anne is correct in closing this bug.

Possible testcase that doesn't demonstrate the bug in comment #0 16 years ago Douglas Stebila 4.04 KB, application/octet-stream		Details
Patch with tight upper bounds 16 years ago Wan-Teh Chang 1.55 KB, patch		Details \| Diff \| Splinter Review
Small patch with a looser upper bound 16 years ago Wan-Teh Chang 1.05 KB, patch	grapvar : review- nelson : superreview+	Details \| Diff \| Splinter Review
Room reservation for reduction in s_mp_redc/s_mp_mul_mont,v4 16 years ago Konstantin Andreev 4.66 KB, patch		Details \| Diff \| Splinter Review
Konstantin's patch v4, converted to CVS diff against 3.13 branch 16 years ago Nelson Bolyard (seldom reads bugmail) 5.18 KB, patch	nelson : review+	Details \| Diff \| Splinter Review