Closed Bug 653236 Opened 14 years ago Closed 11 months ago

Inefficient use of mp_submod in P384 ECC code

Categories

(NSS :: Libraries, enhancement, P5)

enhancement

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: briansmith, Unassigned)

Details

Attachments

(1 file)

+++ This bug was initially created as a clone of Bug #653032 +++ In Bug 653032 Comment 6, Owen wrote: > why does the code for P384 call mp_submod(), doesn't > that completely defeat the point of having an > optimized mod function?
This is a faster implementation of the P-384 modulus function that performs the final reduction using additions and subtractions.
Attachment #529253 - Flags: review?
Comment on attachment 529253 [details] [diff] [review] A faster implementation of P-384 mod. Bob, is there someone else who should (also) review this?
Attachment #529253 - Flags: review? → review?(rrelyea)
Comment on attachment 529253 [details] [diff] [review] A faster implementation of P-384 mod. r+ I've convinced myself the code is correct, though I have few comments, and would like to have a few changes included (mostly the addition of comments): 1) The comment above the 'a_used > ECP384_DIGITS' test is in correct. you need to drop 'or polynomials not using all words'. The main goal of this patch appears to handle the or case, so we should drop it from the comment. 2) The test a_used > ECP384_DIGITS*2 will allow some cases where a >= p^2 through. I've looked at the algorithm and decided that a < p^2 really is an unnecessary restriction. Checking for a < 2^768 is sufficient, which is what this test accomplishes. The previous code also allowed values of a between p^2 and 2^768-1 inclusive through. It may be worth a comment here to note this for someone who actually compares this code with Handerson, Menezes, and Vanstone. 3) The test a_used < ECP384_DIGITS is correct, but really too small. We should check for a_used <= ECP384_DIGITS and if a_used == ECP384_DIGITS, check for a > p. If a> p do one subtraction. There are lots of numbers between p and 2^320-1 (or 2^352-1 in the 32-bit case) that we are running the whole algorithm on unnecessarily. There are relatively few numbers between 2^384-1 and p where we'll have to do the single subtraction. Again, the current code is correct, it is just doing more work than it needs to in this particular case. 4) NIT: The loop filling in s[2] loops i from ECP386_DIGITS to a_used. I think it would be clearer if it looped from 0 to a_used-ECP386_DIGITS: for (i=0; i < a_used-ECP386_DIGITS; i++) { s[2][i] = MP_DIGIT(a,i+ECP386_DIGITS); } This also makes the following while loop much cleaner: while (i < ECP384_DIGITS) s[2][i++] = 0; 5) Comment /* s_mp_add_is faster than mp_sub */ raises the question: why not add m[7] as well at get down to one sub? I believe the answer is worry about overflow, since we are using hand contructed mp_ints, the mp_add could very well try realloc our statically created mp_digits and blow up. I believe the code is right, but we should include a comment. I think the overflow may be rare enough not to be caught in casual testing if someone tried to make this 'optimization' In general, this looks like a good patch. Thanks for providing it.
Attachment #529253 - Flags: review?(rrelyea) → review+
(In reply to comment #3) > Comment on attachment 529253 [details] [diff] [review] [review] > A faster implementation of P-384 mod. > > r+ I've convinced myself the code is correct, though I have few comments, > and would like to have a few changes included (mostly the addition of > comments): > > 1) The comment above the 'a_used > ECP384_DIGITS' test is in correct. you > need > to drop 'or polynomials not using all words'. The main goal of this patch > appears to handle the or case, so we should drop it from the comment. Thank you, I must have copied the comment from the old code, which no longer reflects the behavior of this function. > 2) The test a_used > ECP384_DIGITS*2 will allow some cases where a >= p^2 > through. I've looked at the algorithm and decided that a < p^2 really is an > unnecessary restriction. Checking for a < 2^768 is sufficient, which is what > this test accomplishes. The previous code also allowed values of a between > p^2 and 2^768-1 inclusive through. It may be worth a comment here to note > this for someone who actually compares this code with Handerson, Menezes, > and Vanstone. Yes, a comment may be appropriate here. I've noticed that all of the other elliptic curve mod functions only check for the number of digits instead of checking for p^2. Perhaps a similar comment would be appropriate for them as well? > 3) The test a_used < ECP384_DIGITS is correct, but really too small. We > should check for a_used <= ECP384_DIGITS and if a_used == ECP384_DIGITS, > check for a > p. If a> p do one subtraction. There are lots of numbers > between p and 2^320-1 (or 2^352-1 in the 32-bit case) that we are running > the whole algorithm on unnecessarily. There are relatively few numbers > between 2^384-1 and p where we'll have to do the single subtraction. Again, > the current code is correct, it is just doing more work than it needs to in > this particular case. Yes, I think this could improve speed for p <= a < 2^384. It seemed like an uncommon enough case that I didn't write a check for it. > 4) NIT: The loop filling in s[2] loops i from ECP386_DIGITS to a_used. I > think it would be clearer if it looped from 0 to a_used-ECP386_DIGITS: > > for (i=0; i < a_used-ECP386_DIGITS; i++) { > s[2][i] = MP_DIGIT(a,i+ECP386_DIGITS); > } > > This also makes the following while loop much cleaner: > > while (i < ECP384_DIGITS) s[2][i++] = 0; I agree, your suggestion makes the code much cleaner. > 5) Comment /* s_mp_add_is faster than mp_sub */ raises the question: why not > add m[7] as well at get down to one sub? I believe the answer is worry about > overflow, since we are using hand contructed mp_ints, the mp_add could very > well try realloc our statically created mp_digits and blow up. I believe the > code is right, but we should include a comment. I think the overflow may be > rare enough not to be caught in casual testing if someone tried to make this > 'optimization' To be honest, I hadn't considered the possibility of an overflow, and I didn't encounter any crashes when testing various combinations of s_mp_add and mp_sub. I settled on a single call to s_mp_add because it was the fastest. Combining m[7], m[8] and m[9] actually made the function slightly slower than two calls to mp_sub. It is not 100% clear to me why this was the case, but now that you mention it, an overflow may have been the cause. Since submitting this patch, I have tweaked this function a little more to eliminate terms m[8] and m[9] and skip a subtraction. The result runs a little quicker, but would require more scrutiny. I can submit my latest version as well if you think it would be worthwhile. > In general, this looks like a good patch. Thanks for providing it. You're very welcome, thanks for taking the time to review it.
Severity: normal → S3
Severity: S3 → S4
Status: NEW → RESOLVED
Closed: 11 months ago
Priority: -- → P5
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: