Closed
Bug 303507
Opened 19 years ago
Closed 19 years ago
Enhance RSA performance using "comba" multiplication and squaring
Categories
(NSS :: Libraries, enhancement, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
3.11
People
(Reporter: nelson, Assigned: saul.edwards.bugs)
Details
Attachments
(4 files, 2 obsolete files)
15.30 KB,
application/x-gzip
|
Details | |
80.80 KB,
patch
|
nelson
:
review+
wtc
:
superreview+
|
Details | Diff | Splinter Review |
5.76 KB,
patch
|
wtc
:
review+
nelson
:
superreview+
|
Details | Diff | Splinter Review |
940 bytes,
patch
|
saul.edwards.bugs
:
review+
|
Details | Diff | Splinter Review |
This is a replacement RFE for bug 303340 After comparing NSS's MPI bignum library to another one at length, we concluded that the major advantage of the other one was due to the use of "Comba" algorithms for multiply and square functions. The Comba algorithms are a win on CPUs with plenty of registers, such as the AMD64, and not on CPUs that don't such as the 32-bit x86. So, this enhancement is potentially for more than one platform, but not for all platforms. Initially we will do this for AMD64. The comba algorithms win by doing far fewer loads and (especially) stores than the ordinary long multiplication algorithms, even when the latter are hand optimized assembly. Rather than switch the entire bignum library, we decided to implement Comba style functions for MPI. We plan to use the public domain library TFM v0.03 source as a basis, and adapt it to fit our MPI library. We will probably have an assembly language version for AMD64.
Reporter | ||
Comment 1•19 years ago
|
||
Setting hardware to "other" since this initially is for AMD64. Leaving OS at "all" because it should work on at least two AMD64 OSes.
Priority: -- → P1
Hardware: All → Other
Target Milestone: --- → 3.11
Assignee | ||
Comment 2•19 years ago
|
||
Attachment #192435 -
Flags: review?(nelsonbhchan)
Assignee | ||
Comment 3•19 years ago
|
||
Assignee | ||
Updated•19 years ago
|
Attachment #192435 -
Flags: review?(nelsonbhchan) → review?(nelson)
Reporter | ||
Comment 4•19 years ago
|
||
Comment on attachment 192435 [details] [diff] [review] Add mp_comba.c for AMD64 Solaris/Linux, plus calls from existing mpi functions and Makefile changes I find all the parts of this patch to be ok except one. The new code inserted into mp_mul and mp_sqr, which calls the new comba routines, has been inserted into those functions *before* the point where those functions deal with input aliasing and ensure that the space for the result is adequately allocated. The new code needs to be inserted just after the points in mp_mul and mp_sqr where the aliasing and allocation checks have been done. I'm sorry I didn't catch this earlier. :(
Attachment #192435 -
Flags: review?(nelson) → review-
Assignee | ||
Comment 5•19 years ago
|
||
Attachment #192435 -
Attachment is obsolete: true
Attachment #192463 -
Flags: review?(nelson)
Reporter | ||
Comment 6•19 years ago
|
||
Comment on attachment 192463 [details] [diff] [review] Check sizes and for aliasing in mul and sqr operands r=nelson@bolyard The functions mp_mul and mp_sqr now ensure that the input lengths are correct for the respective s_mp_comba functions, and that the output allocation suffices. I appears that the comba functions cannot fail, so I'm ok with them being void functions. I'm satisfied that the .s file is properly generated from the .c file by the "unnamed" compiler. :) Wan-Teh, please super review. I will be happy to explain how this code works, if you desire.
Attachment #192463 -
Flags: superreview?(wtchang)
Attachment #192463 -
Flags: review?(nelson)
Attachment #192463 -
Flags: review+
Comment 7•19 years ago
|
||
Comment on attachment 192438 [details]
Associated assembly from mp_comba.c of above patch. Zipped because it is slightly above the file upload limit.
Some suggestions:
For people who will maintain this file, please document precisely how this file
was generated in the comment.
Otherwise, they won't be able to reproduce this work.
Change "Studio 10" to "Sun Studio 10".
Change "AMD 64" to "AMD64".
It may be more correct to use "notice" instead of "license" because it is
public domain code.
Comment 8•19 years ago
|
||
Comment on attachment 192463 [details] [diff] [review] Check sizes and for aliasing in mul and sqr operands As far as I can tell, this patch is fine. But to be a responsible reviewer, I will try to do the best I can. So I have some quesitons. 1. Makefile Do we need to use a specific optimization level for mp_comba.c when we use GCC? 2. mpi/mpi-priv.h: Some stylistic comments, which you may ignore. The isPowerof2 macro name does not conform to the naming convention in this file, which uses either all capital letters or all lowercase letters for macro names. (All caps is more common.) I verified that the definition of isPowerof2 is correct. You may consider omitting the (a) test if you are sure none of the mp_int's you invoke this macro on has a zero 'used' field. This file uses lowercase a, b, c as function parameters. Your comba functions use capital A, B, C as function parameters. 3. mpi/mpi.c In the mp_sqr function, your outter if statement ensures a->used <= 16, but you have an inner if statement that tests a->used == 32, which can't be true. Please resolve this conflict. Other general comments on changes to this file: You use a->used and b->used. The existing code in this file uses MP_USED(a) and MP_USED(b). It would be good to follow that style. It seems that the isPowerof2 test isn't that useful if the 'used' field is likely to be a power of 2. Could you explain why you are calling the comba routines at the right places in mp_mul and mp_sqr? 4. mpi/mp_comba.c Are your contributions also public domain? If yes, say so. If not, add the MPL/GPL/LGPL header. If you can provide a diff that shows your changes to the original TFM v0.03 file, I can review it. Otherwise, I'll just trust that this file is correct.
Comment 9•19 years ago
|
||
Comment on attachment 192463 [details] [diff] [review] Check sizes and for aliasing in mul and sqr operands OK, I've completed my code review. I found a potential memory leak. In mp_mul and mp_sqr, after calling one of the comba functions, your code should do "goto CLEANUP;" instead of "return MP_OKAY;". Otherwise, the mp_int variable tmp may not be cleared. It seems that the following code at the end of your mul_comba routines: C->sign = A->sign ^ B->sign; mp_clamp(C); duplicates the folliwing code before the CLEANUP label in mp_mul: s_mp_clamp(c); if(SIGN(a) == SIGN(b) || s_mp_cmp_d(c, 0) == MP_EQ) SIGN(c) = ZPOS; else SIGN(c) = NEG; (Similarly for your sqr_comba routines and mp_sqr.) Why can't you use the existing code? If your code is faster, you should replace the existing code with yours. mpi/mp_comba.c has some very long lines. I assume those came from the original TFM v0.03 code. mpi/mp_comba.c is also full of things like a->dp, a->dp[1], a->used, a->sign, that the original MPI code seems to use macros for (MP_DIGITS(a), MP_DIGIT(a,1), MP_USED(a), MP_SIGN(a), or without the MP_ prefix). I don't want to be a nit picker, so you decide if you want to change these.
Comment 10•19 years ago
|
||
Comment on attachment 192463 [details] [diff] [review] Check sizes and for aliasing in mul and sqr operands sr-. Saul, Nelson, I don't mind you checking this in first and addressing the bugs I found in a supplemental patch.
Attachment #192463 -
Flags: superreview?(wtchang) → superreview-
Assignee | ||
Comment 11•19 years ago
|
||
Addresses the following issues: 1.) Remove test for <=16 in mp_sqr 2.) Change isPowof2 to macro IS_POWER_OF_2 3.) changed a->used to macro MP_USED(a) 4.) Removed the pot. momory leak by replacing "return MP_OKAY" with "goto CLEANUP" Did not address these issues: 1.) Variables a, b, c should be A, B, C 2.) Two definitions of clamp: I think this is a lengthy discussion to resolve and I would like to get this patch in quickly. Postpone. 3.) Instructions on reproducing the .s file: I would like to discuss this through e-mail.
Attachment #192463 -
Attachment is obsolete: true
Attachment #192734 -
Flags: superreview?(wtchang)
Attachment #192734 -
Flags: review?(nelson)
Reporter | ||
Comment 12•19 years ago
|
||
Comment on attachment 192734 [details] [diff] [review] Address Wan-Teh's issues r=nelson, provided that you make one change >+#ifdef NSS_USE_COMBA >+ if ((MP_USED(a) == MP_USED(b)) && IS_POWER_OF_2(b->used)) { change that from IS_POWER_OF_2(b->used) to IS_POWER_OF_2(MP_USED(b)) I suppose that really should be MPI_IS_POWER_OF_2, but it is in a private header. Change it if you wish.
Attachment #192734 -
Flags: review?(nelson) → review+
Comment 13•19 years ago
|
||
Comment on attachment 192734 [details] [diff] [review] Address Wan-Teh's issues r=wtc. I just wanted to point out that you didn't address the last issue I raised in comment 9, either: mpi/mp_comba.c is also full of things like a->dp, a->dp[1], a->used, a->sign, that the original MPI code seems to use macros for (MP_DIGITS(a), MP_DIGIT(a,1), MP_USED(a), MP_SIGN(a), or without the MP_ prefix). I don't want to be a nit picker, so you decide if you want to change these. The reason I raised this issue again is that you made this change in another file (mpi/mpi.c).
Attachment #192734 -
Flags: superreview?(wtchang) → superreview+
Comment 14•19 years ago
|
||
Comment on attachment 192734 [details] [diff] [review] Address Wan-Teh's issues Saul: you should also address the copyright issue of mpi/mp_comba.c. The original code is public domain, but you've modified it. So is the resulting derivative work still in the public domain?
Reporter | ||
Comment 15•19 years ago
|
||
Wan-Teh, the TFM code never uses the MP_USED() macro at all. I think we want the code there to continue to look as much like the original TFM code as possible, making as few changes to it as possible, so that it can easily be verified to be the same code. I also think that consistency of that style is more important within a source file than across source files. That's also why we allowed extra long source lines. I think we're looking for the right format for a message that says "our derivative work is public domain too." But, IMO,t each contributor ought not need to duplicate the public domain statement for his contribution.
Comment 16•19 years ago
|
||
Nelson: agreed. Thanks for the clarification.
Assignee | ||
Comment 17•19 years ago
|
||
Checked in on trunk: Checking in Makefile; /cvsroot/mozilla/security/nss/lib/freebl/Makefile,v <-- Makefile new revision: 1.63; previous revision: 1.62 done Checking in mpi/mp_comba.c; /cvsroot/mozilla/security/nss/lib/freebl/mpi/mp_comba.c,v <-- mp_comba.c new revision: 1.2; previous revision: 1.1 done Checking in mpi/mp_comba_amd64_sun.s; /cvsroot/mozilla/security/nss/lib/freebl/mpi/mp_comba_amd64_sun.s,v <-- mp_comba_amd64_sun.s new revision: 1.2; previous revision: 1.1 done Checking in mpi/mpi-priv.h; /cvsroot/mozilla/security/nss/lib/freebl/mpi/mpi-priv.h,v <-- mpi-priv.h new revision: 1.19; previous revision: 1.18 done Checking in mpi/mpi.c; /cvsroot/mozilla/security/nss/lib/freebl/mpi/mpi.c,v <-- mpi.c new revision: 1.43; previous revision: 1.42 done
Status: NEW → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
Assignee | ||
Updated•19 years ago
|
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Comment 18•19 years ago
|
||
Attachment #193743 -
Flags: superreview?(nelson)
Attachment #193743 -
Flags: review?(wtchang)
Comment 19•19 years ago
|
||
Comment on attachment 193743 [details] [diff] [review] Consolidate OPTIMIZER to just Linux.mk, change default to -O3 for Linux and Solaris gcc builds r=wtc.
Attachment #193743 -
Flags: review?(wtchang) → review+
Assignee | ||
Comment 20•19 years ago
|
||
Checked into trunk: Checking in Linux.mk; /cvsroot/mozilla/security/coreconf/Linux.mk,v <-- Linux.mk new revision: 1.24; previous revision: 1.23 done Checking in Linux2.1.mk; /cvsroot/mozilla/security/coreconf/Linux2.1.mk,v <-- Linux2.1.mk new revision: 1.5; previous revision: 1.4 done Checking in Linux2.2.mk; /cvsroot/mozilla/security/coreconf/Linux2.2.mk,v <-- Linux2.2.mk new revision: 1.5; previous revision: 1.4 done Checking in Linux2.4.mk; /cvsroot/mozilla/security/coreconf/Linux2.4.mk,v <-- Linux2.4.mk new revision: 1.5; previous revision: 1.4 done Checking in Linux2.5.mk; /cvsroot/mozilla/security/coreconf/Linux2.5.mk,v <-- Linux2.5.mk new revision: 1.4; previous revision: 1.3 done Checking in Linux2.6.mk; /cvsroot/mozilla/security/coreconf/Linux2.6.mk,v <-- Linux2.6.mk new revision: 1.4; previous revision: 1.3 done Checking in SunOS5.mk; /cvsroot/mozilla/security/coreconf/SunOS5.mk,v <-- SunOS5.mk new revision: 1.17; previous revision: 1.16 done
Status: REOPENED → RESOLVED
Closed: 19 years ago → 19 years ago
Resolution: --- → FIXED
Reporter | ||
Comment 21•19 years ago
|
||
Comment on attachment 193743 [details] [diff] [review] Consolidate OPTIMIZER to just Linux.mk, change default to -O3 for Linux and Solaris gcc builds I thought Wan-Teh wanted to stay with -O2 (?), but in any case, it's better to have this in just one place, not 5.
Attachment #193743 -
Flags: superreview?(nelson) → superreview+
Comment 22•19 years ago
|
||
Sorry, Saul. I found that I misunderstood Mozilla's configure script. I thought that Mozilla is already using -O3 for GCC. That's one reason I thought it's okay to use -O3 to compile all of NSS. I just found that Mozilla is using -O (same as -O2), and the official releases are compiled with -Os (optimize for space), so I want to continue to use -O2 as the default for GCC. This means you need to come up with a new patch to just compile libfreebl3.so with -O3.
Attachment #193879 -
Flags: review?(saul.edwards.bugs)
Updated•19 years ago
|
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 23•19 years ago
|
||
Couldn't we just test for BUILD_SUN_PKG, which is set in all of Christophe's builds for Solaris and Linux, and use -O3 globally ? I think this is simpler and requires only a change to coreconf, and no more changes to the already complex freebl Makefile .
Comment 24•19 years ago
|
||
BUILD_SUN_PKG should be only be used for build options that are not suitable for regular NSS builds, such as hardcoding the /usr/lib/mps pathname. If -O3 is good, we should use it in regular NSS builds, too. GCC only has three optimization levels; -O3 is the highest level, and -O1 is the default. So -O2 is already pretty high for GCC. Without the endorsement of the Mozilla developers, I need more time to look into the stability of code generated by -O3. Using -O3 only on freebl gets most of the performance benefit and minimizes the risk. I can take care of the mozilla/security/nss/lib/freebl/Makefile change. I think it'll only take two lines.
Comment 25•19 years ago
|
||
Wan-Teh, We have been testing code built with -O3 for almost a year at Sun. I don't recall that we ran into any optimizer bug. Re: optimization levels, gcc actually has other many flags to control specific optimizations, in addition to -O<level>, eg. -mno-omit-leaf-frame-pointer -fno-omit-frame-pointer .
Assignee | ||
Comment 26•19 years ago
|
||
Comment on attachment 193879 [details] [diff] [review] Change GCC's default OPTIMIZER back to -O2 (checked in) I will submit the freebl-specific patch when the patches for 303508 are checked in.
Attachment #193879 -
Flags: review?(saul.edwards.bugs) → review+
Updated•19 years ago
|
Attachment #193879 -
Attachment description: Change GCC's default OPTIMIZER back to -O2 → Change GCC's default OPTIMIZER back to -O2 (checked in)
Comment 27•19 years ago
|
||
I believe the only remaining thing left to do in this bug is the -O3 optimizer setting. Wan-Teh, have you found the information you needed about gcc -O3 code stability since you wrote comment #24 ? My opinion is that it should be the default, preferably for all of NSS, but if not, certainly for freebl. Currently, everything is at -O2 . I know of no NSS test that has ever failed because of -O3 vs -O2, and the code is almost always faster with -O3 . This isn't a big deal for Sun since we hand-generated assembly files made with gcc -O3 for the files with the biggest wins, and we are not using gcc for our C sources in official builds. If you don't care about NSS performance on Linux and other platforms that use gcc, you can leave it at -O2, and close this bug as fixed.
Comment 28•19 years ago
|
||
The main issue of this bug has been resolved. Wan-Teh, if you care about the gcc -O3 flag on Linux, this can be dealt with in a separate bug. Per our meeting today, marking fixed.
Status: REOPENED → RESOLVED
Closed: 19 years ago → 19 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•