Last Comment Bug 321350 - Implement optimized code for NIST Suite B elliptic curves
: Implement optimized code for NIST Suite B elliptic curves
Status: RESOLVED FIXED
ECC FIPS
:
Product: NSS
Classification: Components
Component: Libraries (show other bugs)
: 3.11
: All All
: P2 enhancement (vote)
: 3.11.1
Assigned To: Douglas Stebila
:
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2005-12-23 08:41 PST by Douglas Stebila
Modified: 2006-04-27 08:06 PDT (History)
6 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
Optimized reduction for NIST P-256, P-384, and P-521 (2.99 KB, patch)
2006-03-02 18:48 PST, Douglas Stebila
no flags Details | Diff | Review
Optimized Reduction timings (673 bytes, text/plain)
2006-03-02 18:49 PST, Douglas Stebila
no flags Details
lib/freebl/ecl/ecp_256.c (7.81 KB, text/plain)
2006-03-02 18:49 PST, Douglas Stebila
vipul.gupta: review+
wtc: superreview+
Details
lib/freebl/ecl/ecp_384.c (7.91 KB, text/plain)
2006-03-02 18:50 PST, Douglas Stebila
vipul.gupta: review+
wtc: superreview+
Details
lib/freebl/ecl/ecp_521.c (4.82 KB, text/plain)
2006-03-02 18:50 PST, Douglas Stebila
vipul.gupta: review+
wtc: superreview+
Details
Optimized reduction for NIST P-256, P-384, and P-521 (unified) (6.13 KB, patch)
2006-03-03 05:50 PST, Douglas Stebila
vipul.gupta: review+
wtc: superreview+
Details | Diff | Review
Tool to measure ECC performance at the ECDH and PKCS #11 levels (30.22 KB, patch)
2006-03-15 11:48 PST, Robert Relyea
no flags Details | Diff | Review
Optimized curves, faster adds. (81.53 KB, patch)
2006-03-15 12:02 PST, Robert Relyea
douglas: review-
Details | Diff | Review
Optimized curves, faster adds. Version 2 (82.01 KB, patch)
2006-03-20 14:58 PST, Robert Relyea
douglas: review+
vipul.gupta: superreview+
Details | Diff | Review

Description Douglas Stebila 2005-12-23 08:41:40 PST
The elliptic curve library (ECL) in NSS currently has optimized code only for curves secp160r1, secp192r1, and secp224r1, in the files lib/freebl/ecl/ecp_192.c, ecp_224.c, ecp_fp160.c, ecp_fp192.c, and ecp_fp224.c.  Optimized code should be developed for the curves in the NIST Suite B cryptography recommendations (http://www.nsa.gov/ia/industry/crypto_suite_b.cfm?MenuID=10.2.7), namely for the nistp224 (==secp256r1), nistp384 (==secp384r1), and nistp521 (==secp521r1) curves.  This and bug #320583 would allow NSS to support all recommendations (except ECMQV) of NIST Suite B.
Comment 1 Robert Relyea 2006-01-20 14:11:18 PST
Hmm, our test are showing p192 to be slower then p224 and p256. p224 and p256 are about on par, p384 is much slower.

bob
Comment 2 Nelson Bolyard (seldom reads bugmail) 2006-02-27 10:46:51 PST
Above, this bug says "nistp224 (==secp256r1)".  That's a typo, right?

If this work is going to be done, it needs to be done in time for FIPS
evaluation, right?

Assigning this to Vipul, so it doesn't appear to be Wan-Teh's responsibility.
Comment 3 Douglas Stebila 2006-03-02 18:48:28 PST
Created attachment 213840 [details] [diff] [review]
Optimized reduction for NIST P-256, P-384, and P-521

Implementation of fast reduction for the remaining NIST prime curves: P-256, P-384, and P-521.

On my test systems (PowerPC G4 and PowerPC G5), the fast reduction code is actually slower than the Montgomery modular reduction code for all curves except P-521, so the algorithm wiring in ecl.c only uses the fast reduction for P-521.  I've attached timings from my system showing this behaviour.  It's probably worth checking in all the code, even though only the P-521 code will actually be wired in.

I wasn't able to force cvs diff to create a patch that also contained new files, so also part of the CVS submission are the attachments ecp_256.c, ecp_384.c, and ecp_521.c, all of which should be placed in the lib/freebl/ecl directory.
Comment 4 Douglas Stebila 2006-03-02 18:49:05 PST
Created attachment 213841 [details]
Optimized Reduction timings
Comment 5 Douglas Stebila 2006-03-02 18:49:46 PST
Created attachment 213843 [details]
lib/freebl/ecl/ecp_256.c
Comment 6 Douglas Stebila 2006-03-02 18:50:17 PST
Created attachment 213844 [details]
lib/freebl/ecl/ecp_384.c
Comment 7 Douglas Stebila 2006-03-02 18:50:47 PST
Created attachment 213845 [details]
lib/freebl/ecl/ecp_521.c
Comment 8 Vipul Gupta 2006-03-02 22:46:25 PST
(In reply to comment #3)

> On my test systems (PowerPC G4 and PowerPC G5), the fast reduction code is
> actually slower than the Montgomery modular reduction code for all curves
> except P-521, so the algorithm wiring in ecl.c only uses the fast reduction for
> P-521.  I've attached timings from my system showing this behaviour.  It's
> probably worth checking in all the code, even though only the P-521 code will
> actually be wired in.

According to comment 1, shouldn't the patch also rewire the code for 
nistp192 to use montgomery reduction? and possibly even nistp224?

nit: could you please post a unified diff (use diff -u)?

vipul
Comment 9 Douglas Stebila 2006-03-03 05:50:42 PST
Created attachment 213889 [details] [diff] [review]
Optimized reduction for NIST P-256, P-384, and P-521 (unified)
Comment 10 Douglas Stebila 2006-03-03 05:52:01 PST
(In reply to comment #8)
> According to comment 1, shouldn't the patch also rewire the code for 
> nistp192 to use montgomery reduction? and possibly even nistp224?

It does that.

Douglas
Comment 11 Vipul Gupta 2006-03-03 06:59:51 PST
(In reply to comment #10)
> (In reply to comment #8)
> > According to comment 1, shouldn't the patch also rewire the code for 
> > nistp192 to use montgomery reduction? and possibly even nistp224?
> 
> It does that.

  Ok. I'll take a closer look later today. Since I'm not familiar
  with the algorithmic details of the, and I don't see any stylistic issues,
  I'll simply be treating this code as a blackbox for tests.

  I'm curious to know if you've run all.sh using the merged
  ECC and non-ECC test scripts (you'll need to apply the
  latest patches for the cert, smime, ssl and tools directory
  available as part of bug 324887). These tests have been
  modified to exercise the three NIST B curves at the highest
  level (i.e. at the certutil/tstclnt/selfserv/cmsutil level).
  [Ideally, we should do these tests in both 32 and 64 bit mode
  with the new code wired in for *all* the curves so we know
  the code behaves correctly should someone choose to change
  the wiring later.]
 
  Also, does the optimized reduction implemented here
  implement what I remember as the shift and subtract
  technique seen in Dan Bernstein's P224 code or does that
  only make sense for floating point code?

  Thanks for putting in all this effort and doing it so 
  promptly. BTW, any insight into why the public-key
  verification tests are failing for the NIST Koblitz curves
  as per Wan-Teh's email?

  vipul
Comment 12 Douglas Stebila 2006-03-03 07:29:41 PST
(In reply to comment #11)
>   I'm curious to know if you've run all.sh using the merged
>   ECC and non-ECC test scripts (you'll need to apply the
>   latest patches for the cert, smime, ssl and tools directory
>   available as part of bug 324887). These tests have been
>   modified to exercise the three NIST B curves at the highest
>   level (i.e. at the certutil/tstclnt/selfserv/cmsutil level).
>   [Ideally, we should do these tests in both 32 and 64 bit mode
>   with the new code wired in for *all* the curves so we know
>   the code behaves correctly should someone choose to change
>   the wiring later.]

No, I haven't tried the merged all.sh scripts.  Could you give it a try?

>   Also, does the optimized reduction implemented here
>   implement what I remember as the shift and subtract
>   technique seen in Dan Bernstein's P224 code or does that
>   only make sense for floating point code?

I haven't read Dan Bernstein's P224 code, but the P224 algorithm as implemented is basically some shifts and some additions and subtractions.

>   Thanks for putting in all this effort and doing it so 
>   promptly. BTW, any insight into why the public-key
>   verification tests are failing for the NIST Koblitz curves
>   as per Wan-Teh's email?

No, I don't know why the KIST K Koblitz curves are failing.  My initial guess was that maybe they use different irreducibles than the NIST B and so they were failing for the same reason that the c2pnb163v? curves failed, but that's not the case as far as I can tell.  I've asked Wan-Teh to send me more info about the tests so I can look into it.
Comment 13 Vipul Gupta 2006-03-03 12:26:27 PST
Comment on attachment 213889 [details] [diff] [review]
Optimized reduction for NIST P-256, P-384, and P-521 (unified)

Looks ok. Will run merged tests after wiring in the new code (hopefully, later today). Didn't want to hold up other reviews in the meantime.
Comment 14 Vipul Gupta 2006-03-03 12:27:02 PST
Comment on attachment 213843 [details]
lib/freebl/ecl/ecp_256.c

Looks ok. Will run merged tests after wiring in the new code (hopefully, later today). Didn't want to hold up other reviews in the meantime.
Comment 15 Vipul Gupta 2006-03-03 12:27:32 PST
Comment on attachment 213844 [details]
lib/freebl/ecl/ecp_384.c

Looks ok. Will run merged tests after wiring in the new code (hopefully, later today). Didn't want to hold up other reviews in the meantime.
Comment 16 Vipul Gupta 2006-03-03 12:27:58 PST
Comment on attachment 213845 [details]
lib/freebl/ecl/ecp_521.c

Looks ok. Will run merged tests after wiring in the new code (hopefully, later today). Didn't want to hold up other reviews in the meantime.
Comment 17 Wan-Teh Chang 2006-03-03 22:06:17 PST
Comment on attachment 213889 [details] [diff] [review]
Optimized reduction for NIST P-256, P-384, and P-521 (unified)

r=wtc.

In ecl/ecl.c, we have

> 		} else if ((name == ECCurve_SECG_PRIME_192K1)) {
> 			group =
> 				ECGroup_consGFp_mont(&irr, &curvea, &curveb, &genx, &geny,
> 									 &order, params->cofactor);
> 			if (group == NULL) { res = MP_UNDEF; goto CLEANUP; }
>-			MP_CHECKOK(ec_group_set_gfp192(group, name));

Is this rewiring or bug fixing?  I don't understand when ec_group_set_gfp<ddd> should be used.

>-		/* XXX secp521r1 fails ecp_test with &ec_GFp_pts_mul_jac */
>-		if (name == ECCurve_SECG_PRIME_521R1) {
>-			group->points_mul = &ec_pts_mul_simul_w2;
>-		}

Why is this removed?
Comment 18 Douglas Stebila 2006-03-04 09:43:34 PST
(In reply to comment #17)

> In ecl/ecl.c, we have
> 
> > 		} else if ((name == ECCurve_SECG_PRIME_192K1)) {
> > 			group =
> > 				ECGroup_consGFp_mont(&irr, &curvea, &curveb, &genx, &geny,
> > 									 &order, params->cofactor);
> > 			if (group == NULL) { res = MP_UNDEF; goto CLEANUP; }
> >-			MP_CHECKOK(ec_group_set_gfp192(group, name));
> 
> Is this rewiring or bug fixing?  I don't understand when ec_group_set_gfp<ddd>
> should be used.

This is the removal of a line that was unnecessary.  ec_group_set_grp<ddd> functions have their own switch statement in them to determine what to do based on the curve.  The secp192k1 curve doesn't have any optimized code in the ecp_192.c file, so the ec_group_set_grp192 call on this curve doesn't do anything and hence can be removed.  Not a bug, but an unnecessary line of code.

> >-		/* XXX secp521r1 fails ecp_test with &ec_GFp_pts_mul_jac */
> >-		if (name == ECCurve_SECG_PRIME_521R1) {
> >-			group->points_mul = &ec_pts_mul_simul_w2;
> >-		}
> 
> Why is this removed?

It no longer fails.  Three years ago I never figured out why it failed, and now I don't know why it's not failing.  It might be related to one of the array size bugs that was fixed a few months ago, but I'm not certain.  
Comment 19 Vipul Gupta 2006-03-06 14:43:19 PST
(In reply to comment #12)
> No, I haven't tried the merged all.sh scripts.  Could you give it a try?

  I've tried this (but with only the optimized reduction of secp521r1 wired in)
and all tests pass. This bug looks ready to be marked fixed/resolved.

vipul
Comment 20 Wan-Teh Chang 2006-03-06 15:29:44 PST
Douglas, how do I build and run the performance test
you used to get the numbers in comment 4?  I'd like
to get the numbers for Linux and Windows.
Comment 21 Douglas Stebila 2006-03-06 16:48:20 PST
(In reply to comment #20)
> Douglas, how do I build and run the performance test
> you used to get the numbers in comment 4?  I'd like
> to get the numbers for Linux and Windows.

You need to create standalone builds of the mpi and ecl libraries and ecl test programs:
> cd lib/freebl/mpi
> make libs
> cd../ecl
> make tests
Then run the ecp_test program with the timing option:
> ./ecp_test --time
This will run with the default algorithm wirings according to the patch.  To change the wirings, you need to edit lib/freebl/ecl/ecl.c and change the wirings in ecgroup_fromNameAndHex.  For example, if you want to use the Montgomery arithmetic, each curve just needs to have the following lines:
> group = ECGroup_consGFp_mont(&irr, &curvea, &curveb, &genx, &geny, &order, params->cofactor);
> if (group == NULL) { res = MP_UNDEF; goto CLEANUP; }
Whereas if you want fast reduction, each curve needs to have the following lines, with the appropriate function call on the third line:
> group = ECGroup_consGFp(&irr, &curvea, &curveb, &genx, &geny, &order, params->cofactor);
> if (group == NULL) { res = MP_UNDEF; goto CLEANUP; }
> MP_CHECKOK(ec_group_set_gfp224(group, name));
Comment 22 Douglas Stebila 2006-03-07 17:16:56 PST
I plan to do optimized base point multiplication as well for the NIST suite B curves, but as a separate patch.  Wan-Teh, should that be filed under a new bug or as future attachments to this bug?
Comment 23 Wan-Teh Chang 2006-03-07 17:29:04 PST
Douglas, please open a new bug for optimized base point multiplication.

Bob, could you or Rob Crittenden do the performance measurement described
in comment 20 and comment 21?
Comment 24 Robert Relyea 2006-03-08 08:01:46 PST
ecperf is a NSS test tool I created from douglas's raw performance tool. I created it for a couple of reasons: 1) I was most interested in the raw performance of the tool algorithms at both the freebl and the pkcs #11 levels (so I could compare the relative performance, and 2) I wanted to build and link with mpi as configured in our running NSS (the stand alone mpi does not necessarily have all of our platform performance enhancements built in). 3) I needed a threaded version of the tool to me able to measure the true performance  on hyperthreaded/dual core and smp systems (non-threaded versions produced numbers just more than half the true throughput).

Below is the result of 3 different configurations:
1) All the optimized curve code turned off.
2) The old 'optimized' curve code.
3) The new optimized curve code.

These were run on a 2 way Intel EMT 64 running RHEL 4 (Linux).

Summary.
1. All the old 'optimized' curves are very slow (much slower than unoptimized) - we already knew this.
2. The new optimized code is percentage slower than optimized on Intel 64 bit, except for 521 (I think douglas was predicting that).

I suspect this would not be the case on 32bit platforms, where the relative difference between bignum multiplies and other operations is greater.

-------------No optimizations----------------------------------------

Testing SECP-160K1 using freebl implementation...
    ECDH_Derive     count: 600 sec: 0.66 op/sec: 909.20
    ECDSA_Sign      count: 600 sec: 0.67 op/sec: 892.46
    ECDSA_Verify    count: 600 sec: 1.42 op/sec: 422.96
... okay.
Testing NIST-P192 using freebl implementation...
    ECDH_Derive     count: 600 sec: 0.79 op/sec: 759.06
    ECDSA_Sign      count: 600 sec: 0.83 op/sec: 724.61
    ECDSA_Verify    count: 600 sec: 1.78 op/sec: 337.66
... okay.
Testing NIST-P224 using freebl implementation...
    ECDH_Derive     count: 600 sec: 1.02 op/sec: 586.95
    ECDSA_Sign      count: 600 sec: 1.05 op/sec: 573.38
    ECDSA_Verify    count: 600 sec: 2.15 op/sec: 279.68
... okay.
Testing NIST-P256 using freebl implementation...
    ECDH_Derive     count: 600 sec: 1.20 op/sec: 498.32
    ECDSA_Sign      count: 600 sec: 1.24 op/sec: 484.67
    ECDSA_Verify    count: 600 sec: 2.54 op/sec: 235.80
... okay.
Testing NIST-P384 using freebl implementation...
    ECDH_Derive     count: 600 sec: 2.34 op/sec: 256.44
    ECDSA_Sign      count: 600 sec: 2.39 op/sec: 250.84
    ECDSA_Verify    count: 600 sec: 4.59 op/sec: 130.85
... okay.
Testing NIST-P521 using freebl implementation...
    ECDH_Derive     count: 600 sec: 4.21 op/sec: 142.50
    ECDSA_Sign      count: 600 sec: 4.28 op/sec: 140.12
    ECDSA_Verify    count: 600 sec: 7.48 op/sec:  80.19
... okay.

----------------Old optimized code--------------------------

Testing SECP-160K1 using freebl implementation...
    ECDH_Derive     count: 600 sec: 0.67 op/sec: 900.41
    ECDSA_Sign      count: 600 sec: 0.68 op/sec: 885.87
    ECDSA_Verify    count: 600 sec: 1.47 op/sec: 407.85
... okay.
Testing NIST-P192 using freebl implementation...
    ECDH_Derive     count: 600 sec: 2.51 op/sec: 239.18
    ECDSA_Sign      count: 600 sec: 2.59 op/sec: 231.55
    ECDSA_Verify    count: 600 sec: 4.43 op/sec: 135.38
... okay.
Testing NIST-P224 using freebl implementation...
    ECDH_Derive     count: 600 sec: 2.12 op/sec: 282.45
    ECDSA_Sign      count: 600 sec: 2.15 op/sec: 279.37
    ECDSA_Verify    count: 600 sec: 3.53 op/sec: 170.05
... okay.
Testing NIST-P256 using freebl implementation...
    ECDH_Derive     count: 600 sec: 1.22 op/sec: 490.35
    ECDSA_Sign      count: 600 sec: 1.24 op/sec: 483.94
    ECDSA_Verify    count: 600 sec: 2.55 op/sec: 234.85
... okay.
Testing NIST-P384 using freebl implementation...
    ECDH_Derive     count: 600 sec: 2.38 op/sec: 251.72
    ECDSA_Sign      count: 600 sec: 2.40 op/sec: 249.65
    ECDSA_Verify    count: 600 sec: 4.61 op/sec: 130.24
... okay.
Testing NIST-P521 using freebl implementation...
    ECDH_Derive     count: 600 sec: 4.16 op/sec: 144.08
    ECDSA_Sign      count: 600 sec: 4.27 op/sec: 140.48
    ECDSA_Verify    count: 600 sec: 57.27 op/sec:  10.48
... okay.

----------------New optimized code--------------------------

ecperf -t 2 -i 300
Testing SECP-160K1 using freebl implementation...
    ECDH_Derive     count: 600 sec: 0.66 op/sec: 907.78
    ECDSA_Sign      count: 600 sec: 0.69 op/sec: 868.81
    ECDSA_Verify    count: 600 sec: 1.49 op/sec: 402.64
... okay.
Testing NIST-P192 using freebl implementation...
    ECDH_Derive     count: 600 sec: 0.83 op/sec: 723.07
    ECDSA_Sign      count: 600 sec: 0.86 op/sec: 699.67
    ECDSA_Verify    count: 600 sec: 1.84 op/sec: 326.88
... okay.
Testing NIST-P224 using freebl implementation...
    ECDH_Derive     count: 600 sec: 1.04 op/sec: 575.97
    ECDSA_Sign      count: 600 sec: 1.08 op/sec: 558.00
    ECDSA_Verify    count: 600 sec: 2.19 op/sec: 273.99
... okay.
Testing NIST-P256 using freebl implementation...
    ECDH_Derive     count: 600 sec: 1.24 op/sec: 484.37
    ECDSA_Sign      count: 600 sec: 1.27 op/sec: 471.96
    ECDSA_Verify    count: 600 sec: 2.60 op/sec: 230.80
... okay.
Testing NIST-P384 using freebl implementation...
    ECDH_Derive     count: 600 sec: 2.41 op/sec: 248.64
    ECDSA_Sign      count: 600 sec: 2.47 op/sec: 242.82
    ECDSA_Verify    count: 600 sec: 4.72 op/sec: 127.22
... okay.
Testing NIST-P521 using freebl implementation...
    ECDH_Derive     count: 600 sec: 3.76 op/sec: 159.37
    ECDSA_Sign      count: 600 sec: 3.89 op/sec: 154.18
    ECDSA_Verify    count: 600 sec: 6.59 op/sec:  91.10
... okay.


Comment 25 Robert Relyea 2006-03-08 09:19:43 PST
Just to be clear for those that haven't been following:

The old curve optimizations actually slowed down performance (this lead to bizzarre situations where 256 bit EC was faster than 192 bit on all platforms because 256 bit wasn't optimized). Both douglas and I noticed this last week.

Douglas found a major source of the problems (appearantly not calling the montgomery multipy routine) and fixed it for this patch. Since we found the differential between the optimized curves and the unoptimized curves to be greater on 64 bit, we wanted to measure the actual gain (or loss) on that platform. The numbers I posted in the bug are all running on exactly the same platform (a 64 bit Intel EMT).

There are no 32-bit numbers posted. My guess is the new optimizations will be a win on 32-bit.

bob
Comment 26 Nelson Bolyard (seldom reads bugmail) 2006-03-08 10:05:28 PST
Bob, in comment 24, you wrote:
>  The new optimized code is percentage slower than optimized on Intel 64 bit,
That says "X is slower than X".  
Now, perhaps you meant "new optimized code is slower than old optimized code",
or   perhaps you meant "new optimized code is slower than unoptimized code",
or   perhaps you meant "  unoptimized code is slower than optimized code",
but I seriously doubt you meant that X is slower than X.  Please clarify.
Comment 27 Douglas Stebila 2006-03-11 12:20:43 PST
(In reply to comment #24)
> ecperf is a NSS test tool I created from douglas's raw performance tool.

Where can ecperf be found?
Comment 28 Robert Relyea 2006-03-15 11:48:36 PST
Created attachment 215159 [details] [diff] [review]
Tool to measure ECC performance at the ECDH and PKCS #11 levels

Here's ECperf that I used.
Comment 29 Robert Relyea 2006-03-15 12:02:49 PST
Created attachment 215161 [details] [diff] [review]
Optimized curves, faster adds.

this patch takes douglas's patch and adds the following:
1) inline adds/subracts in the optimized curve mod functions for 192, 224, and 256 curves.
2) replaces the mp_mod() function call in 192, 224, and 256 with optimized versions.
2) makes a single 512 bit curve independent of the MP_WORD_SIZE.
3) Creates in-line versions for add and subtract for general gf curves of 3=6 words in length.

This code increases 192 and 224 performance by 70% over the montgomery code, and 256 bit performance by about 10%.

Obvious future todos: More tweaking with the 256 curves, doing the same optimizations for 384.
Comment 30 Douglas Stebila 2006-03-17 17:49:04 PST
Comment on attachment 215161 [details] [diff] [review]
Optimized curves, faster adds.

The lib/freebl/ecl/tests/ecp_test.c programs fails with this patch on my PowerPC G5 system.  The error is as follows:

Testing NIST-P192 using specific implementation...
  base point P:
    188DA80EB03090F67CBF20EB43A18800F4FF0AFD82FF1012
    7192B95FFC8DA78631011ED6B24CDD573F977A11E794811
  (order-1)*P (ECPoint_mul):
    188DA80EB03090F67CBF20EB43A18800F4FF0AFD82FF1012
    F8E6D46A003725879CEFEE1294DB32298C06885EE186B7EE
  (order-1)*P (ECPoint_mul):
    188DA80EB03090F67CBF20EB43A18800F4FF0AFD82FF1012
    F8E6D46A003725879CEFEE1294DB32298C06885EE186B7EE
  (order)*P (ECPoint_mul):
    0
    0
  (order)*P (ECPoint_mul):
    0
    0
  (order-1)*P + (order-1)*P + P == (order-1)*P (ECPoints_mul):
    188DA80EB03090F67CBF20EB43A18800F4FF0AFD82FF1012
    F8E6D46A003725879CEFEE1294DB32298C06885EE186B7EE
  Error: validate point on base point failed.
  Error: exiting with error value -1
Error: exiting with error value -1

I have not had a chance to determine the cause of this failure as the patch is quite a large patch.  Have you run it on 64-bit platforms other than AMD?  (The PowerPC G5 is a 64-bit platform.)
Comment 31 Robert Relyea 2006-03-17 18:14:10 PST
Actually I haven't been able to run it on AMD 64. The only 64 bit platform I have access to is Linux 64 bit.
I've run both 32bit and 64 bit. I've also run the code without turning on the assembler.

If you are failing on the 192 bit curve, the problem should be isolated to to ecp_192.c and ecl-curve.h (well you also have to turn on the 192 optimized curve in ecl.c).

Do you have test vectors I can run on my machine?

If it's failing on a single test (are a set of tests), it's probably something wrong with the final reduction test (though 192's test is pretty simple). If it is failing generally, it could be a problem with the MP_ADD_CARRY or MP_SUB_BORROW macros. They were copied from mpi add and subtract routines, but I may have missed something.

bob
Comment 32 Douglas Stebila 2006-03-17 18:22:33 PST
(In reply to comment #31)
> Do you have test vectors I can run on my machine?

You should be able to run the same tests I did by building the standline MPI and ECL libraries and then running ecp_test --print in the ecl directory.
Comment 33 Nelson Bolyard (seldom reads bugmail) 2006-03-17 18:28:27 PST
(In reply to comment #31)
> Actually I haven't been able to run it on AMD 64. The only 64 bit platform I
> have access to is Linux 64 bit.

Did you mean Intel 64 bit?
Comment 34 Robert Relyea 2006-03-20 11:56:29 PST
yes, I mean Linux 64 bit running on an Intel EMT 64 processor.
Comment 35 Robert Relyea 2006-03-20 14:58:46 PST
Created attachment 215707 [details] [diff] [review]
Optimized curves, faster adds. Version 2

Found 192 problem (missing clamp).
Fixed 256 final reduce check.
Fixed MP_NO_MP_WORD ADD and Subract macros
Comment 36 Robert Relyea 2006-03-20 15:01:07 PST
I was able to reproduce the ecp_test failure on my 64 bit platform (missing clamp). updated patch attached.
Comment 37 Douglas Stebila 2006-03-21 17:13:03 PST
Comment on attachment 215707 [details] [diff] [review]
Optimized curves, faster adds. Version 2

Patch applies cleanly and all tests (both freebl/ecl/ec?_test and all.sh) pass.
Comment 38 Robert Relyea 2006-03-23 11:57:57 PST
Comment on attachment 215707 [details] [diff] [review]
Optimized curves, faster adds. Version 2

need 2nd review for 3.11
Comment 39 Vipul Gupta 2006-03-23 14:40:52 PST
Comment on attachment 215707 [details] [diff] [review]
Optimized curves, faster adds. Version 2

I didn't have time to actually apply the patch and run tests so I'm relying on Douglas's assertion that everything checks out. I've read through the patch and it seems fine.
Comment 40 Julien Pierre 2006-03-23 17:34:54 PST
This patch broke the Sparc 32-bit solaris build on the tip. You can see that in the tinderbox.

gmake[1]: Entering directory `/h/monstre/export/home/nss/tip/mozilla/security/nss/lib/freebl'
cc -o SunOS5.9_DBG.OBJ/SunOS_ABI32_INT32/ecl_gf.o -c -g -KPIC -DSVR4 -DSYSV -D__svr4 -D__svr4__ -DSOLARIS -D_REENTRANT -DSOLARIS2_9 -D_SVID_GETTOD -xarch=v8 -xs -DXP_UNIX -DSHLIB_SUFFIX=\"so\" -DSHLIB_PREFIX=\"lib\" -DSHLIB_VERSION=\"_3\" -DSOFTOKEN_SHLIB_VERSION=\"3\" -DNSS_ENABLE_ECC -DRIJNDAEL_INCLUDE_TABLES -DDEBUG -UNDEBUG -DDEBUG_jp96085 -DMP_USE_UINT_DIGIT -DMP_ASSEMBLY_MULTIPLY -DSHA_NO_LONG_LONG  -DMP_API_COMPATIBLE -I/usr/dt/include -I/usr/openwin/include -I../../../../dist/SunOS5.9_DBG.OBJ/include  -I../../../../dist/public/nss -I../../../../dist/private/nss -Impi -Iecl  ecl/ecl_gf.c
"ecl/ecl_gf.c", line 278: syntax error before or at: r0
"ecl/ecl_gf.c", line 279: syntax error before or at: r1
"ecl/ecl_gf.c", line 280: syntax error before or at: r2
"ecl/ecl_gf.c", line 365: syntax error before or at: r0
"ecl/ecl_gf.c", line 366: syntax error before or at: r1
"ecl/ecl_gf.c", line 367: syntax error before or at: r2
"ecl/ecl_gf.c", line 368: syntax error before or at: r3
"ecl/ecl_gf.c", line 425: cannot recover from previous errors
cc: acomp failed for ecl/ecl_gf.c
gmake[1]: *** [SunOS5.9_DBG.OBJ/SunOS_ABI32_INT32/ecl_gf.o] Error 2
gmake[1]: Leaving directory `/h/monstre/export/home/nss/tip/mozilla/security/nss/lib/freebl'
gmake: *** [libs] Error 2
Comment 41 Nelson Bolyard (seldom reads bugmail) 2006-03-24 01:11:09 PST
Sheriff Nelson backed out a certain set of changes that broke the build.
Please remember to watch Tinderbox after doing a checkin.

Checking in Makefile;       new revision: 1.77; previous revision: 1.76
Checking in manifest.mn;    new revision: 1.46; previous revision: 1.45
Checking in ecl/Makefile;   new revision: 1.4; previous revision: 1.3
Checking in ecl/ecl-priv.h; new revision: 1.6; previous revision: 1.5
Checking in ecl/ecl.c;      new revision: 1.8; previous revision: 1.7
Checking in ecl/ecl_gf.c;   new revision: 1.4; previous revision: 1.3
Checking in ecl/ecp_192.c;  new revision: 1.4; previous revision: 1.3
Checking in ecl/ecp_224.c;  new revision: 1.4; previous revision: 1.3
Comment 42 Robert Relyea 2006-03-24 14:56:37 PST
Checked in on tip :

Checking in manifest.mn;
/cvsroot/mozilla/security/nss/lib/freebl/manifest.mn,v  <--  manifest.mn
new revision: 1.47; previous revision: 1.46
done
Checking in Makefile;
/cvsroot/mozilla/security/nss/lib/freebl/Makefile,v  <--  Makefile
new revision: 1.78; previous revision: 1.77
done
Checking in ecl/Makefile;
/cvsroot/mozilla/security/nss/lib/freebl/ecl/Makefile,v  <--  Makefile
new revision: 1.5; previous revision: 1.4
done
Checking in ecl/ecl-priv.h;
/cvsroot/mozilla/security/nss/lib/freebl/ecl/ecl-priv.h,v  <--  ecl-priv.h
new revision: 1.7; previous revision: 1.6
done
Checking in ecl/ecl.c;
/cvsroot/mozilla/security/nss/lib/freebl/ecl/ecl.c,v  <--  ecl.c
new revision: 1.9; previous revision: 1.8
done
Checking in ecl/ecl_gf.c;
/cvsroot/mozilla/security/nss/lib/freebl/ecl/ecl_gf.c,v  <--  ecl_gf.c
new revision: 1.5; previous revision: 1.4
done
Checking in ecl/ecp_192.c;
/cvsroot/mozilla/security/nss/lib/freebl/ecl/ecp_192.c,v  <--  ecp_192.c
new revision: 1.5; previous revision: 1.4
done
Checking in ecl/ecp_224.c;
/cvsroot/mozilla/security/nss/lib/freebl/ecl/ecp_224.c,v  <--  ecp_224.c
new revision: 1.5; previous revision: 1.4
done

This has now been tested good on solaris 32bit Sparc.

bob
Comment 43 Robert Relyea 2006-03-30 16:29:10 PST
Checked into NSS 3.11 branch:
Checking in Makefile;
/cvsroot/mozilla/security/nss/lib/freebl/Makefile,v  <--  Makefile
new revision: 1.70.2.4; previous revision: 1.70.2.3
done
Checking in manifest.mn;
/cvsroot/mozilla/security/nss/lib/freebl/manifest.mn,v  <--  manifest.mn
new revision: 1.44.2.1; previous revision: 1.44
done
Checking in ecl/Makefile;
/cvsroot/mozilla/security/nss/lib/freebl/ecl/Makefile,v  <--  Makefile
new revision: 1.2.28.1; previous revision: 1.2
done
Checking in ecl/ecl-priv.h;
/cvsroot/mozilla/security/nss/lib/freebl/ecl/ecl-priv.h,v  <--  ecl-priv.h
new revision: 1.4.2.1; previous revision: 1.4
done
Checking in ecl/ecl.c;
/cvsroot/mozilla/security/nss/lib/freebl/ecl/ecl.c,v  <--  ecl.c
new revision: 1.5.2.2; previous revision: 1.5.2.1
done
Checking in ecl/ecl_gf.c;
/cvsroot/mozilla/security/nss/lib/freebl/ecl/ecl_gf.c,v  <--  ecl_gf.c
new revision: 1.2.28.1; previous revision: 1.2
done
Checking in ecl/ecp_192.c;
/cvsroot/mozilla/security/nss/lib/freebl/ecl/ecp_192.c,v  <--  ecp_192.c
new revision: 1.2.28.1; previous revision: 1.2
done
Checking in ecl/ecp_224.c;
/cvsroot/mozilla/security/nss/lib/freebl/ecl/ecp_224.c,v  <--  ecp_224.c
new revision: 1.2.28.1; previous revision: 1.2
done
Checking in ecl/ecp_256.c;
/cvsroot/mozilla/security/nss/lib/freebl/ecl/ecp_256.c,v  <--  ecp_256.c
new revision: 1.1.2.2; previous revision: 1.1.2.1
done
Checking in ecl/ecp_384.c;
/cvsroot/mozilla/security/nss/lib/freebl/ecl/ecp_384.c,v  <--  ecp_384.c
new revision: 1.1.2.2; previous revision: 1.1.2.1
done
Checking in ecl/ecp_521.c;
/cvsroot/mozilla/security/nss/lib/freebl/ecl/ecp_521.c,v  <--  ecp_521.c
new revision: 1.1.2.2; previous revision: 1.1.2.1
done
Comment 44 Wan-Teh Chang 2006-04-27 08:06:36 PDT
Comment on attachment 215707 [details] [diff] [review]
Optimized curves, faster adds. Version 2

>+#ifdef MPI_AMD64_ADD
>+	/* compiler fakeout? */
>+	if ((r3 == b0) && (r1 == b0) && (r0 == b0)) { 
>+		MP_CHECKOK(s_mp_pad(r, 4));
>+	} 
>+#endif
>+	MP_CHECKOK(s_mp_pad(r, 4));
>+	MP_DIGIT(r, 3) = r3;
>+	MP_DIGIT(r, 2) = r2;
>+	MP_DIGIT(r, 1) = r1;
>+	MP_DIGIT(r, 0) = r0;
>+	MP_SIGN(r) = MP_ZPOS;
>+	MP_USED(r) = 4;
>+	s_mp_clamp(r);

Bob, what's the purpose of the "compiler fakeout"?
Why doesn't it also test r2 == b0?  Why does it call
s_mp_pad(r, 4), which we call again right below?

Note You need to log in before you can comment on or make changes to this bug.