Closed Bug 536389 Opened 15 years ago Closed 2 months ago

ECC wrong point multiplication (=> wrong signatures) case.

Categories

(NSS :: Libraries, defect, P4)

3.12.4

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: grapvar, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: FIPS [sg:investigate])

Attachments

(3 files, 2 obsolete files)

ECC gives invalid multiplication of EC points for certain curve and multipliers. Here is a test case.

Let curve has parameters:

/*! id-GostR3410-2001-CryptoPro-{Xch,}A-ParamSet [RFC4357] */
static const ECCurveParams ecCurve_GOST_CP_A = { 
 "CryptoPro-A/XchA", ECField_GFp, 256,
 "FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFD97", // prime
 "FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFD94", // a
 "00000000000000000000000000000000000000000000000000000000000000A6", // b
 "0000000000000000000000000000000000000000000000000000000000000001", // G.x
 "8D91E471E0989CDA27DF505A453F2B7635294F2DDF23E3B122ACC99C9E9F1E14", // G.y
 "FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF6C611070995AD10045841B09B761B893", // G order
  1 // cofactor
};

When I ask ec_points_mul()@freebl/ec.c to multiply base point of this curve (G) by random number (as part of a signature procedure), it gives wrong product for some multipliers and correct product for other multipliers. For instance, if

k=216ee1be33d81a16f2f1060f32c304bf7876e8c93f7a2e469dac79d6a3dac0db
ec_points_mul returns wrong product
 (wrong) kG=(
      x=d0a89e74850885fe304a5881f8ce15eb9f52a0d2e6373d673fc94dff416dae05
      y=5ba273859c93f6759b28a4576a69c23e38ba25376d097eb596675e794197067c
    )
but valid product is
 (valid) kG=(
      x=9bd2578fcb86d9fc2fe54cf192236102b20657c9d5ec5291d456bbdcdb894f0d
      y=9807c0f0ea6148c7dfd96d7c96bb073f0f61c92506aca6c468737bd95be946a1
    )

Here are some other multipliers which trigger wrong product in ec_points_mul(): b53ff01b8c03d79154f1ca7dc14e4df90cccee10a49fc220829f5068b291f7d5 1cfbaac38b9dea01d9fb2e971165bd8308cdd6a66eb63894075f595f1de2283c, etc.

However, ec_points_mul() returns correct product for multipliers: 11ea01601a2d443e7907af2385357cd59027d3470e93c63b744918e90d032a9a e11a08966345612f721c2141962b75342fd0b1991a4f068530c41414634028f2 2c672f6ff10be6f5ad8d9e45e674b61788ce42b4dddb872f1995b359213e5dd7, etc.

I observe that approximately 1/3 - 1/2 of random multipliers gives correct product in ec_points_mul(). The rest of multipliers gives wrong product, and, finally, wrong signature.

Since this behavior is observed only on particular curve I conclude there is a bug somewhere deep in the implimentation of EC point multiplication.

To the moment, I do not know, whether this issue is hardware/OS/compiler - dependent.
Presently, there is a list of 25 curves that are supported in NSS.  NSS's 
ECC code is believed to be correct for those 25 curves.  It is possible 
that the code is incorrect when it is used with curves other than those 
25 supported curves, however, I would not necessarily consider that a bug 
in the existing code.  Rather, I would consider the addition of support for
new curves to be an enhancement request, and making the code produce correct
results for any such new curves to be part of the work for that enhancement.
Nelson, I would still consider an error in a non-supported curve a bug in the NSS ECC code.  For different curves, the code implementing the operations is the same code path, just with different constants as the inputs.  The code should be consistent and correct on all inputs, whether they were from the approved list of curves or not.  This bug may be exposing a bug in the main ECC code that exists but has not shown up in any of our test cases because we're not exercising that code path.  I will look into this further in the new year.
Thank you for your attention.

I agree, the NSS ECC code could be written having only supported curves in mind. If so, every new curve is an enhancement.

At first glance, ECC code looks generic enough to support every curve. I believe it's worth reviewing whether this issue does expose unnoticed bug or doesn't.
Assignee: nobody → douglas
Konstantin:  I've tried to reproduce the bug you've described but am unable to do so.  I've attached the test case I put together.  I've used the first scalar k= 216ee1be33d81a16f2f1060f32c304bf7876e8c93f7a2e469dac79d6a3dac0db from your comment; when I run the code, it outputs the point that you've listed as valid.  

What type of system are you running on?  Can you try running the test file I've attached to see what output you get?
Hello, Douglas.

Thank you for investigation.

On my system this testcase outputs invalid point, exactly as I shown in my 1st post:

$ ./test
ec_points_mul: pointQ [len=65]:04:d0:a8:9e:74:85:08:85:fe:30:4a:58:81:f8:ce:15:eb:9f:52:a0:d2:e6:37:3d:67:3f:c9:4d:ff:41:6d:ae:05:5b:a2:73:85:9c:93:f6:75:9b:28:a4:57:6a:69:c2:3e:38:ba:25:37:6d:09:7e:b5:96:67:5e:79:41:97:06:7c:

As long as it works for you I guess there is no errors in ECC code, but in the underlying optimized arithmetic library.

I am running Solaris Express Community Edition snv_109 on X86. I have noted that during build two assembler files were compiled: mpi/mpcpucache_x86.s and mpi/mpi_i86pc.s

If it matters, files in the "freebl" directory are compiled as follows:
cc -o SOME.o -c -g -KPIC -DSVR4 -DSYSV -D__svr4 -D__svr4__ -DSOLARIS -D_REENTRANT -Di386 -DSOLARIS2_11 -D_SVID_GETTOD  -xs -DXP_UNIX -DSHLIB_SUFFIX=\"so\" -DSHLIB_PREFIX=\"lib\" -DSHLIB_VERSION=\"3\" -DSOFTOKEN_SHLIB_VERSION=\"3\" -DRIJNDAEL_INCLUDE_TABLES -DDEBUG -UNDEBUG -DDEBUG_andreev -DNSS_ENABLE_ECC -DUSE_UTIL_DIRECTLY -DNSS_X86_OR_X64 -DNSS_X86 -DMP_USE_UINT_DIGIT -DMP_ASSEMBLY_MULTIPLY -DMP_ASSEMBLY_SQUARE  -DMP_ASSEMBLY_DIV_2DX1D -DMP_API_COMPATIBLE -I/usr/dt/include -I/usr/openwin/include -I/usr/include/nspr -I../../../../dist/SunOS5.11_i86pc_DBG.OBJ/include -I../../../../dist/public/nss -I../../../../dist/private/nss -Impi -Iecl SOME.c

Could you advise, what should I undertake to isolate the problem ?
Konstantin,

It still is possible that the problem is with the elliptic curve library (ecl), not with the integer arithmetic library (mpi).

There are a few changes I can suggest to see if we can nail down if the bug is occurring in particular parts of the ecl code.  Can you try out each of the following changes (recompiling and testing your example for each one) to see if any of them result in different output?  This may help us isolate the problem.  They are in increasing complexity, however, so it may be more effort than you care to undertake.

(1) In freebl/ecl/ecl.c, line 311: change ECGroup_consGFp_mont to ECGroup_cons_GFp
(2) Do (1) and in ecl.c, line 112: change ec_GFp_pt_mul_jm_wNAF to ec_GFp_pt_mul_aff
(3) Do (2) and in ecl.c, line 112: change ec_GFp_pts_mul_jac to ec_pts_mul_basic
(4) Can you try compiling on a 64-bit system to see if it's a 32-bit versus 64-bit problem?
(5) Try compiling mpi and ecl independently of NSS (they can be built as standalone libraries).  See the instructions at the bottom of ecl/README.  Then alter ecl/tests/ecp_test.c to include a test case for your example and see what it outputs.
So, given that:

(-) WRONG point for 32-bit build

Testing changes:

(1) freebl/ecl/ecl.c:311: ECGroup_consGFp_mont -> ECGroup_consGFp
    VALID point for 32-bit build
(4) -- no change in source code --
    VALID point for 64-bit build

If it matters, the difference in compiler flags between 32/64-bit builds:

-= -Di386 -DNSS_X86 -DMP_USE_UINT_DIGIT -DMP_ASSEMBLY_SQUARE -DMP_ASSEMBLY_DIV_2DX1D
+= -xprefetch=no -m64 -DNSS_USE_64 -DNSS_X64 -DNSS_BEVAND_ARCFOUR -DMPI_AMD64 -DNSS_USE_COMBA -DMP_CHAR_STORE_SLOW -DMP_IS_LITTLE_ENDIAN -DUSE_HW_AES

The difference in the sets of mpi files compiled during 32/64-bit builds:

-= mpcpucache_x86.s mpi_i86pc.s
+= mpcpucache_amd64.s mpi_amd64.c mpi_amd64_sun.s mp_comba_amd64_sun.s

Douglas, I am at your disposal for further testing.
If I understand what you've written correctly, the code runs correctly when using ECGroup_consGFp instead of ECGroup_consGFp_mont, even on 32-bit builds.  If that's the case, then the problem is either related to the 32-bit Montgomery arithmetic in mpi, or in the Montgomery wrapper in ecl/ecp_mont.c.

Could you try running the standalone mpi tests to see if they pass?  
cd mozilla/security/nss/lib/freebl/mpi
make tests
./all-tests
(In reply to comment #8)
> If I understand what you've written correctly, the code runs correctly when
> using ECGroup_consGFp instead of ECGroup_consGFp_mont, even on 32-bit builds. 

Correct.

> If that's the case, then the problem is either related to the 32-bit
> Montgomery arithmetic in mpi, or in the Montgomery wrapper in ecl/ecp_mont.c.
> 
> Could you try running the standalone mpi tests to see if they pass?  
> cd mozilla/security/nss/lib/freebl/mpi
> make tests

"make tests" builds not automated tests, so this is not necessary.

> ./all-tests

This was a tricky task, because NSS builds mpi not in such a way as legacy mpi/Makefile. But I managed that.

Yes, mpi passed self-test, both for 32- and 64-bit builds:

---( begin cite )---
** Running unit tests for MPI library
Bringing mpi-test up to date ... 
There are currently 42 test suites available
copy ... passed
... ... ...
fermat ... passed
All unit tests passed
** Running other tests
Computation took 0.35 sec.
Okay!  The pi test passes.
---( end cite )---

I guess I have no other way except the trace correct and wrong builds side-by-side in the debugger to find where computations starts to differ :(
(In reply to comment #9)

> > If I understand what you've written correctly, the code runs correctly when
> > using ECGroup_consGFp instead of ECGroup_consGFp_mont, even on 32-bit builds. 
> 
> Correct.
> 
> Yes, mpi passed self-test, both for 32- and 64-bit builds:
> 
> I guess I have no other way except the trace correct and wrong builds
> side-by-side in the debugger to find where computations starts to differ :(

Yes, I'm afraid that's the case.  I can't think of any other tests that I can ask you to run to help isolate the problem.

A good starting point would be to print the point at each iteration in the point multiplication and see when they start to differ.  Add the following code after line 303 in lib/freebl/ecl/ecp_jm.c:

printf("iteration %i:\n", i);
printf("point R:\n");
MP_CHECKOK(mp_toradix(rx, s, 16));
printf("   x = %s\n", s);
MP_CHECKOK(mp_toradix(ry, s, 16));
printf("   y = %s\n", s);
MP_CHECKOK(mp_toradix(&rz, s, 16));
printf("   z = %s\n", s);
(In reply to comment #8)
> Could you try running the standalone mpi tests to see if they pass?  
> cd mozilla/security/nss/lib/freebl/mpi
> make tests
> ./all-tests

One caveat: the makefile in freebl/mpi may not build MPI with the exact same 
set of -D command line options as the Makefile in freebl does.  If they're 
different, then of course, the test results from the built using the mpi 
makefile are not representative of the behavior of the code as built by 
freebl's makefile.
(In reply to comment #11)
> (In reply to comment #8)
> > Could you try running the standalone mpi tests to see if they pass?  
> > cd mozilla/security/nss/lib/freebl/mpi
> > make tests
> > ./all-tests
> 
> One caveat: the makefile in freebl/mpi may not build MPI with the exact same 
> set of -D command line options as the Makefile in freebl does.  If they're 
> different, then of course, the test results from the built using the mpi 
> makefile are not representative of the behavior of the code as built by 
> freebl's makefile.

Thank you for the hint, Nelson. Yes, I saw. I rewrote the mpi/Makefile to build mpi exactly as NSS build does.
This is definitely a bug in Montgomery modular multiplication: mpi/mpmontg.c:s_mp_mul_mont().

Let's just try to square the 0x56B543289A8B7658A2FECA9A61F2000000000000 by modulus 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFD97.

   s_mp_mul_mont( 0x56B5..., 0x56B5..., &rv, mp_mont_modulus(0xFF...D97));

The rv (product) is:

32-bit_code:0x000000007E80FA0645090F98C904FE30E9F72224000000001D5E497750DE81BD
64-bit_code:0xF646668A7E80FA0645090F98C904FE30E9F72224000000001D5E497750DE81BD

Look, in the 32-bit code the most significant digit of product has lost.

On my system this bug is eliminated by the following patch:

--- mpmontg.c.org	2006-08-29 06:41:38.000000000 +0400
+++ mpmontg.c	2010-01-25 20:27:06.410299148 +0300
@@ -128,7 +128,7 @@ mp_err s_mp_mul_mont(const mp_int *a, co
   }
 
   MP_USED(c) = 1; MP_DIGIT(c, 0) = 0;
-  ib = MP_USED(a) + MP_MAX(MP_USED(b), MP_USED(&mmm->N)) + 2;
+  ib = MP_USED(a) + MP_MAX(MP_USED(b), MP_USED(&mmm->N)) + 3;
   if((res = s_mp_pad(c, ib)) != MP_OKAY)
     goto CLEANUP;

 
but I am not sure, whether this a valid approach. Douglas, could you take a look at ?

It is also an instresting question, why this problem was not noted earlier...
Konstantin: Good finding!  I stared at the s_mp_mul_mont code
for an hour last night, but I couldn't find where the bug is or
why your change would eliminate the bug.  Here are my findings:

1. I searched for "+ 2" and "+ 3" in that file.  I didn't find any
"+ 3", but there are many instances of "+ 2".  So it's not clear
by this dumb method why using "+ 2" in s_mp_mul_mont
is wrong.

2. It seems OK for 'ib' to be larger than necessary because
the effect is that 'c' will have more leading zero digits, which
will be removed by the s_mp_clamp call at the end of the
s_mp_mul_mont function.

I hope Douglas or Nelson, who is more familiar with the
MPI code, will have better luck tracking this down.
More on my "dumb" code inspection method in comment 14:
The structure of s_mp_redc in mpmontg.c is very similar to
s_mp_mul_mont.  When we understand this bug in
s_mp_mul_mont, we should check if s_mp_redc has the
same bug.
It seems that because of this second for loop in s_mp_mul_mont:

153   if (usedb < MP_USED(&mmm->N)) {
154     for (usedb = MP_USED(&mmm->N); ib < usedb; ++ib ) {
155       m_i = MP_DIGIT(c, ib) * mmm->n0prime;
156       s_mp_mul_d_add_offset(&mmm->N, m_i, c, ib);
157     }
158   }

'c' needs to have at least twice the number of digits of mmm->N.
So the formula for the initial value of 'ib' that Konstantin changed
should contain, ignoring the "+ 2",
    MP_USED(&mmm->N) + MP_MAX(MP_USED(b), MP_USED(&mmm->N))
rather than
    MP_USED(a) + MP_MAX(MP_USED(b), MP_USED(&mmm->N))

Perhaps the "+ 2" is actually more than necessary, so it
can compensate for the difference between MP_USED(a)
and MP_USED(&mmm->N) sometimes.  Perhaps 'a' and 'b'
usually have the same number of digits as mmm->N.
(In reply to comment #14)

> 2. It seems OK for 'ib' to be larger than necessary because
> the effect is that 'c' will have more leading zero digits, which
> will be removed by the s_mp_clamp call at the end of the
> s_mp_mul_mont function.

I agree with this.

> I hope Douglas or Nelson, who is more familiar with the
> MPI code, will have better luck tracking this down.

I'm not too familiar with the Montgomery multiplication code here, so my inspection will be no better than yours.

(In reply to comment #16)

> 'c' needs to have at least twice the number of digits of mmm->N.
> So the formula for the initial value of 'ib' that Konstantin changed
> should contain, ignoring the "+ 2",
>     MP_USED(&mmm->N) + MP_MAX(MP_USED(b), MP_USED(&mmm->N))
> rather than
>     MP_USED(a) + MP_MAX(MP_USED(b), MP_USED(&mmm->N))
> 
> Perhaps the "+ 2" is actually more than necessary, so it
> can compensate for the difference between MP_USED(a)
> and MP_USED(&mmm->N) sometimes.  Perhaps 'a' and 'b'
> usually have the same number of digits as mmm->N.

I would be hesitant to assume that a and b are already reduced to the same number of digits as mmm->N.  So to be on the safe side I would go with
     MP_MAX(MP_USED(a), MP_USED(&mmm->N)) + MP_MAX(MP_USED(b), MP_USED(&mmm->N))

But as for Konstantin's original observation, regarding what number should be added to this (+ 2, + 3, ...?) I have no intuition whatsoever.  My intuition tells me that the line above should in fact be sufficient.

Does anyone know what builds would actually use this section of code?  It's wrapped in an 
    #if !defined(MP_ASSEMBLY_MUL_MONT)
and I would have expected that NSS was built with assembly Montgomery mont mul for most systems of interest, so I'm surprised Konstantin even discovered this bug (not that I'm saying it shouldn't be fixed).
Douglas, I actually came to the same conclusion that
we should go with
    MP_MAX(MP_USED(a), MP_USED(&mmm->N)) + MP_MAX(MP_USED(b),
MP_USED(&mmm->N))

to be on the safe side.

Konstantin, you're in the best position to get to the
bottom of this.  Shall I create a patch for you to
review and test?
(In reply to comment #17)
> (In reply to comment #14)
> > I hope Douglas or Nelson, who is more familiar with the
> > MPI code, will have better luck tracking this down.
> 
> I'm not too familiar with the Montgomery multiplication code here, so my
> inspection will be no better than yours.

Sad news :(

> Does anyone know what builds would actually use this section of code?  It's
> wrapped in an 
>     #if !defined(MP_ASSEMBLY_MUL_MONT)
> and I would have expected that NSS was built with assembly Montgomery mont mul
> for most systems of interest, so I'm surprised Konstantin even discovered this
> bug (not that I'm saying it shouldn't be fixed).

When I build NSS, I do not set any mathematics-related options. But this peace of code works both for 32-bit and 64-bit builds on Solaris.

And yes, Montgomery modular mult use assembler:, the "s_mpv_mul_d", "s_mp_mul_d_add_offset" and "s_mpv_mul_d_add_prop" used by "s_mp_mul_mont" are implemented in platform-specific assembler.

(In reply to comment #18)
> Douglas, I actually came to the same conclusion that
> we should go with
>     MP_MAX(MP_USED(a), MP_USED(&mmm->N)) + MP_MAX(MP_USED(b), MP_USED(&mmm->N))
> 
> to be on the safe side.
> 
> Konstantin, you're in the best position to get to the bottom of this.  Shall I create a patch for you to review and test?

This change is certainly safe, but do not patch right now, please. I should investigate the "s_mp_mul_mont" more carefully.

Douglas, when you were not able to reproduce the bug in test case, what kind of system you were running on ? 32-bit or 64-bit, in particular ?
(In reply to comment #19)

> Douglas, when you were not able to reproduce the bug in test case, what kind of
> system you were running on ? 32-bit or 64-bit, in particular ?

64-bit build on Mac OS X 10.6.2 on an Intel Core 2 Duo.
Attached patch Patch with tight upper bounds (obsolete) — Splinter Review
Please review and test this patch.

You may find it hard to verify the upper bounds
are correct.  Also, this patch changes s_mp_redc,
which isn't known to be broken, so it violates
the "don't fix it if it ain't broke" principle.
This patch fixes only the function that's known to
be broken, and uses a looser upper bound that's
clearly big enough.

If none of us is willing to invest the time to
understand the relevant MPI code, this patch is
a better fix.
Attachment #424378 - Flags: superreview?(douglas)
Attachment #424378 - Flags: review?(andreev)
(In reply to comment #21)
> Created an attachment (id=424376) [details]
> Patch with tight upper bounds
> Please review and test this patch.

Wan, I can not review until better understanding of the Montgomery algorithm.

(In reply to comment #22)
> If none of us is willing to invest the time to
> understand the relevant MPI code, this patch is
> a better fix.

I have queried a mathematician for algorithm details. I believe I could review after getting the response.
Konstantin: one lame way to review these patches without
understanding the Montgomery algorithm is to convince
yourself that it's OK to use a looser (larger than necessary)
upper bound on the number of digits required, and then
study the for loops to determine how many digits they need
for the output.

For s_mp_redc, it's possible that the MPI code has some
implicit assumption on the relative sizes of 'T' and the
modulus 'mmm->N'.  For example, I read in Tom St. Denis'
book "BigNum Math" that 'T' should be less than the
square of the modulus.  With that assumption, the expression
  i = MP_MAX(MP_USED(T), 2 * MP_USED(&mmm->N)) + 2;
in my patch attachment 424376 [details] [diff] [review] will become
  i = 2 * MP_USED(&mmm->N) + 2;
I originally coded mpmontg.c according to the description of Peter Montgomery's multiplication method, as improved by Dusse' & Kaliski.  You can read the paper 
( http://dsns.csie.nctu.edu.tw/research/crypto/HTML/PDF/E90/230.PDF pages 2-8)
It's an easy read, written for SW developers.  After reading it, you may be 
able to improve the code in several ways.  

I originally tried implementing all 3 of the suggested improvements, 
but found that I got the best results with improvement 1 alone.
Comment on attachment 424378 [details] [diff] [review]
Small patch with a looser upper bound

The patch should respect to s_mp_redc as well.
Attachment #424378 - Flags: review?(andreev) → review-
(In reply to comment #25)
> I originally coded mpmontg.c according to the description of Peter
> Montgomery's multiplication method, as improved by Dusse' & Kaliski.
> You can read the paper (http://dsns.cs.../230.PDF)

Nelson, thank you for this direction. The paper has helped me to understand what's going on.

(In reply to comment #17 of Douglas Stebila)
> (In reply to comment #16 of Wan-Teh Chang)
> > Perhaps 'a' and 'b'usually have the same number of digits as mmm->N.
>
> I would be hesitant to assume that a and b are already reduced to the same
> number of digits as mmm->N.

Douglas, we may be sure that input multipliers for "s_mp_mul_mont" are reduced, for the following reasons:

1) <Montgomery> REDC algorithm was designed to work with "n-residues", which < N by construction.
2) REDC(T) shall return (T*R-1) mod N. One can verify that, if T = A * B and A,B > N, then REDC(T) won't return what expected.

Thereby I do not see any rationality to call "s_mp_mul_mont/s_mp_redc" with non-reduced arguments. If this happens, it's a library internal error.

And last, but not least: s_mp_mul_mont is used 3 times in the NSS code:

   mp_exptmod_i(), mp_exptmod_safe_i() in mpmontg.c, and
   ec_GFp_mul_mont() in ecp_mont.c

with reduced arguments, as expected.

Thereby the room reservation for reduct will be as simple as:

    (MP_USED(&mmm->N) << 1) + 1;

The last "+ 1" is for possible carry.

Wan, thank you for observation regarding "s_mp_redc". It suffers from the same disease, that "s_mp_mul_mont" does. Let me show one more example. Let N = 2^256 - 0x269 (CryptoPro-A elliptic curve prime, as in previous examples). Let's try to reduce just "1":

    s_mp_mul_mont( 1, 1, &rv, N);
    s_mp_redc( 1, N );

in their current form both lose 5 of 8 digits of result:

    rv      ==                                         0x4A44E00D46F3234475D5ADA6
    REDC(1) == 0x15291380351bcc8d11d756b763fe57219b9771454a44e00d46f3234475d5ada6

Douglas, once again about #if/#endif brackets around:

#if !defined(MP_ASSEMBLY_MUL_MONT) && !defined(MP_MONT_USE_MP_MUL)
mp_err s_mp_mul_mont(...) { ... }
#endif

The MP_ASSEMBLY_MUL_MONT is nowhere else in the NSS code. I would get rid of it, to not create wrong expectations.

The MP_MONT_USE_MP_MUL acts as a switch: if it is defined, the clients use (mp_mul/s_mp_redc) couple instead of s_mp_mul_mont. I assume that's intended for platforms where multiprecision multiplication is faster than digit-by-digit "s_mp_mul_mont" approach.

As far as I see from freebl/{mpi/target.mk,Makefile}, NSS defines MP_MONT_USE_MP_MUL if:

   -- cpu: X86 (32bit), os: linux, win32
   -- cpu: sparc (any)

I am curious of this particular OS choice for X86. Could anybody shed some light on this ?
This patch makes the enhancements:
1) reserves room for reduct in s_mp_redc/s_mp_mul_mont, documents function signatures.
2) corrects documentation reference
3) gets rid of MP_ASSEMBLY_MUL_MONT
4) gets rid of excessive "b" field of mp_mont_modulus structure.

I applied this patch to NSS 3.12.5, compiled NSS with Extended ECC on X86, and ran the test suite. All tests were passed.
Attachment #424376 - Attachment is obsolete: true
Attachment #424378 - Attachment is obsolete: true
Attachment #424378 - Flags: superreview?(douglas)
Comment on attachment 428736 [details] [diff] [review]
Room reservation for reduction in s_mp_redc/s_mp_mul_mont,v4

asking self to review.
Would like Douglas and/or Wan-Teh to also review.

In answer to question in preceeding comment, 
All existing ifdef choices were based on empirical performance test results done in 2001.  It's probably time to redo them on modern CPUs.
Attachment #428736 - Attachment description: Room reservation for reduct in s_mp_redc/s_mp_mul_mont,v4 → Room reservation for reduction in s_mp_redc/s_mp_mul_mont,v4
Attachment #428736 - Flags: review?(nelson)
Let us not forget here that the whole point of using this algorithm is to 
maximize performance.  It is certainly true that one can safely choose to 
use variables that are larger than necessary in the the Montgomery steps.
The price for that choice will be performance.  It should be possible to 
compute the exact size necessary and use just that much.  Let us strive to 
do that.
Please ignore my comment 30.  It was a reply to some comments I received in 
private email, which I mistakenly thought were comments in this bug.
The advantage of using a size larger than the exact size is that
it may simplify code reviews.  For example, I believe most
instances of + 2 in the padding size in that file can be + 1.
Konstantin made two such changes in his patch (attachment
428736 [review]).  But it may take a lot of time to convince ourselves
that + 1 is the exact size, and we would save only one digit.
I am in doubt where original "+ 2" came from. But the "+ 1" in "len(T) := 2*len(N) + 1" is simple to explain: the extra digit is to keep one bit of carry, that may occure during REDC execution.

Wan, if you are hesitant, let replace "+ 1" with "+ 2", it doesn't make a big difference.
Blocks: FIPS2010
Wan-Teh, the savings of "only 1 digit" can reduce the number of long slow
multiply instructions required to do some operations by thousands, perhaps
tens of thousands.  It represents "low hanging fruit" for someone interested
in performance.  We ought not to casually discard performance.  

I hoped to get to the work of getting the code to use minimum sizes this 
weekend, but didn't.  Maybe next weekend.
I was just pointing out that if we're willing to use a sub-optimal
upper bound that both Douglas and I could verify, we would have
fixed the bug two months ago (see the discussion around comment 17).
Today, this bug is still open.  That's the trade-off.
It looks like there was some discussion beyond this thread :)

I guess, you want to pre-allocate (for Montgomery product) no more digits than product will contain. Do you ?

Unfortunately, this is not possible. As demonstrated in comment 27, the number of digits in product does not depend from the numbers of digits in multipliers. Come again (see comment 27): the result of 
    s_mp_mul_mont( 1, 1, &rv, N);
occupies the number of digits in N, not in "1".

(In reply to comment #36)
> I was just pointing out that if we're willing to use a sub-optimal
> upper bound that both Douglas and I could verify, we would have
> fixed the bug two months ago (see the discussion around comment 17).

Yes, but that fix was a bit overkill, because 
  MP_MAX(MP_USED(a), MP_USED(&mmm->N)) + MP_MAX(MP_USED(b), MP_USED(&mmm->N))
is just 
  2 * MP_USED(&mmm->N)
No longer blocks: FIPS2010
OS: Solaris → All
Hardware: x86 → All
Whiteboard: FIPS
Stick back on the FIPS rereview list
Blocks: FIPS2010
Comment on attachment 424378 [details] [diff] [review]
Small patch with a looser upper bound

I'm going to give this patch r+ as a short term measure, to get a fix in for this, while we continue to find the best solution.
Attachment #424378 - Attachment is obsolete: false
Attachment #424378 - Flags: superreview+
(In reply to comment #39)
> (From update of attachment 424378 [details] [diff] [review])
> I'm going to give this patch r+ as a short term measure, to get a fix in for
> this, while we continue to find the best solution.

Please, do not.

Attachment 424378 [details] [diff] only fixes bug in "s_mp_mul_mont", but "s_mp_redc" is NOT fixed.

Could you, please, advice, what is wrong with patch 433870 ?
Comment on attachment 433870 [details] [diff] [review]
Konstantin's patch v4, converted to CVS diff against 3.13 branch

This weekend, I instrumented libfreebl and compared the effects of the
two patches against the present committed code on RSA operations and 
ECC operations.  RSA operations tend to be VERY constant in the number
of multiplication operations done for a given modulus size.  By contrast
ECC operations tend to vary a great deal for operations done in a given
curve.  

My chief concern was the impact on RSA operations.  I observed that the 
only performance differences occurred when modulus length was not a 
multiple of 32 bits.  IMO, this is a sufficiently rare case that I don't 
care about it.  Indeed, softoken will only generate RSA keys whose modulus
size is a multiple of 16 bits.  

So I'm willing to accept Konstantin's patch, which is "correct" (produces
correct values) and appears to be no worse than the present code when 
modulus length is a multiple of 32 bits.
Attachment #433870 - Flags: review+
Attachment #428736 - Attachment is obsolete: true
Attachment #428736 - Flags: review?(nelson)
Bug 536389: ECC wrong point multiplication (=> wrong signatures) case
Patch contributed by Konstantin Andreev <andreev@swemel.ru>, r=nelson

On 3.13 branch

ecl/ecp_mont.c; new revision: 1.2.182.1; previous revision: 1.2
mpi/mpi-priv.h; new revision: 1.21.8.1; previous revision: 1.21
mpi/mpi.c;      new revision: 1.45.56.4; previous revision: 1.45.56.3
mpi/mpmontg.c;  new revision: 1.20.56.1; previous revision: 1.20
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Nelson's checkin in comment 43 fixed this bug in NSS 3.13.
Target Milestone: --- → 3.13
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Whiteboard: FIPS → FIPS [sg:investigate]

The bug assignee is inactive on Bugzilla, so the assignee is being reset.

Assignee: douglas → nobody
Severity: normal → S3
Priority: -- → P4

It looks like the code from the patch is in NSS.

Status: REOPENED → RESOLVED
Closed: 14 years ago2 months ago
Resolution: --- → WORKSFORME

So the original patch was included 13 years ago. Brian reopenned the bug because of bug 1128140 and the fact he wanted it in 3.12. The former bug is closed and any relevancy of NSS 3.12 in 2024 is moot, so Anne is correct in closing this bug.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: