Closed
Bug 298630
Opened 19 years ago
Closed 18 years ago
freebl needs a memory cache invariant RSA implementation
Categories
(NSS :: Libraries, defect, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
3.11
People
(Reporter: rrelyea, Assigned: rrelyea)
Details
Attachments
(9 files, 14 obsolete files)
22.49 KB,
patch
|
Details | Diff | Splinter Review | |
2.46 KB,
text/html
|
Details | |
2.74 KB,
text/html
|
Details | |
46.07 KB,
patch
|
Details | Diff | Splinter Review | |
32.61 KB,
patch
|
nelson
:
review+
|
Details | Diff | Splinter Review |
20.69 KB,
patch
|
Details | Diff | Splinter Review | |
41.86 KB,
patch
|
nelson
:
review+
|
Details | Diff | Splinter Review |
30.48 KB,
patch
|
rrelyea
:
review+
|
Details | Diff | Splinter Review |
821 bytes,
patch
|
rrelyea
:
review+
|
Details | Diff | Splinter Review |
There has been demonstrated attacks against RSA by snooping the modulus exp code and it's cache usage profile. NSS needs to have an implementation that has the same cache usage profile independent of the value of the private key.
Assignee | ||
Updated•19 years ago
|
Target Milestone: --- → 3.11
Updated•19 years ago
|
Summary: freebl needs a cache invariant implementation. → freebl needs a memory cache invariant RSA implementation
Assignee | ||
Comment 1•19 years ago
|
||
This patch includes all the weaving strategies. Which strategy used is based on the define 'MP_USING_WEAVE_COPY'. Currently it's defined as '1'. The file mpcpucache.c is effectively a new file. The whole file shows up because the checked in version has DOS line endings. Since it's a new file, it should be reviewed in it's entirety anyway.
Attachment #190632 -
Flags: superreview?(nelson)
Attachment #190632 -
Flags: review?(wtchang)
Assignee | ||
Comment 2•19 years ago
|
||
While this is a complete patch (so it can be applied as one), the only difference between this patch and patch1 is in mpmontg.c where the #ifdefs around MP_USING_WEAVE_COPY were removed as if MP_USING_WEAVE_COPY were defined. The code in this mpmontg.c is much easier to read than it's brother in patch1.
Attachment #190633 -
Flags: superreview?
Attachment #190633 -
Flags: review?
Assignee | ||
Comment 3•19 years ago
|
||
Same as the previous patch except MP_USING_WEAVE_COPY is removed as if it were undefined rather than defined. This code is slightly more complicated than patch2 but still less complicated than patch 1.
Attachment #190634 -
Flags: superreview?
Attachment #190634 -
Flags: review?
Assignee | ||
Updated•19 years ago
|
Attachment #190633 -
Attachment description: hand removed ifdef 'MP_USING_WEAVE_COPY' → hand defined ifdef 'MP_USING_WEAVE_COPY' (-D)
Assignee | ||
Comment 4•19 years ago
|
||
In reviewing the diffs, it might be easiest to review patch2 first, then review the diffs between patch2 and patch1, but I leave it to you to decide. If you think we should just check in patch1, then just review that patch. bob
Assignee | ||
Comment 5•19 years ago
|
||
Comment on attachment 190633 [details] [diff] [review] hand defined ifdef 'MP_USING_WEAVE_COPY' (-D) I'm pretty sure nelson is reviewing this patch, just put it on his radar.
Attachment #190633 -
Flags: superreview? → superreview?(nelson)
Comment 6•19 years ago
|
||
Comment on attachment 190632 [details] [diff] [review] 'general' patch, including #defs for different weave strategies I have reviewed mpcpucache.c and emailed review feedback to Bob. I will continue to review the other parts of this patch.
Attachment #190632 -
Flags: superreview?(nelson) → superreview-
Comment 7•19 years ago
|
||
Comment on attachment 190632 [details] [diff] [review] 'general' patch, including #defs for different weave strategies Bob, I have a request concerning mpcpucache.c. Your present patch for that file merely corrects all the line endings. Please check that in as is, then make a separate patch, to be separately checked in, to address the review feedback I sent you. That way, subsequent reviewers of the cvs diffs will actually be able to see what was really changed. So, SR+=nelson for the mpcpucache.c patch exactly as attached above.
Assignee | ||
Comment 8•19 years ago
|
||
Comment on attachment 190632 [details] [diff] [review] 'general' patch, including #defs for different weave strategies remove old review request
Attachment #190632 -
Flags: review?(wtchang)
Comment 9•19 years ago
|
||
Comment on attachment 190633 [details] [diff] [review] hand defined ifdef 'MP_USING_WEAVE_COPY' (-D) Previously I reviewed mpcpucache.c and sent review comments to Bob. Now I'm reviewing the various patches to mpmontg.c. Here are some preliminary comments (not a full review): > #include "mpi-priv.h" >+#include "mp_gf2m-priv.h" > #include "mplogic.h" > #include "mpprime.h" > #ifdef MP_USING_MONT_MULF > #include "montmulf.h" > #endif >+#include "prtypes.h" Two #includes were added, but neither seems necessary. mpi sources should not use NSPR headers, types, or methods. I don't think this file uses anything from mpi's private ecc header. >+#define MAX_POWERS MAX_ODD_INTS*2 >+#define MAX_MODULUS_BITS 8192 >+#define MAX_MODULUS_LENGTH (MAX_MODULUS_BITS/8) I'd much prefer MAX_MODULUS_BYTES or MAX_MODULUS_CHARS to MAX_MODULUS_LENGTH. That way the units are unambiguous. >+#define SQR(a,b) \ >+ MP_CHECKOK( mp_sqr(a, b) );\ >+ MP_CHECKOK( s_mp_redc(b, mmm) ); >+ >+#if defined(MP_MONT_USE_MP_MUL) >+#define MUL(x,a,b) \ >+ MP_CHECKOK( weave_to_mpi(&tmp, powers + (x), nLen, num_powers) ); \ >+ MP_CHECKOK( mp_mul_weave(a, &tmp, b) ); \ In this patch, there is no function by the name mp_mul_weave. I believe you want mp_mul in this case. >+ MP_CHECKOK( s_mp_redc(b, mmm) ) ; >+#else >+#define MUL(x,a,b) \ >+ MP_CHECKOK( weave_to_mpi(&tmp, powers + (x), nLen, num_powers) ); \ >+ MP_CHECKOK( s_mp_mul_mont(a, &tmp, b, mmm) ); >+#endif A comment about the "weaving" strategy. The weave functions essentially do byte-at-a-time copying. That really hurts on AMD64, where byte writes are read-modify-write memory cycles, but mp_digit (64b) writes are just writes. So, I'd suggest using a strategy (especially for weave_to_mpi) of loading all the bytes (including shifting and or'ing) and then doing a single mp_digit store for each mp_digit, rather than doing 8 byte writes for each mp_digit. I suspect that the opposite direction (mpi_to_weave) can also be done that way, by gathering the bytes from the 8 different multipliers to be woven together and then doing one store of the woven digit. IINM, the other patch (that complements this one) sort-of works that way. I believe that doing one memory write for each 4 or 8 memory reads, rather than one-for-one, will be a performance win on all platforms. Finally, I think the ultimate decision of which way to go (that is, with MP_USING_WEAVE_COPY defined or undefined) should be determined by which is fastest, *after* the work has been done to try doing the 4-or-8 reads for every write.
Comment 10•19 years ago
|
||
Comment on attachment 190633 [details] [diff] [review] hand defined ifdef 'MP_USING_WEAVE_COPY' (-D) My previous review comments left the impression that I had more review comments to follow. But I believe those comments were complete. I hope this has not been holding up progress on this bug.
Attachment #190633 -
Flags: superreview?(nelson) → superreview-
Assignee | ||
Updated•19 years ago
|
Attachment #190633 -
Flags: review?
Assignee | ||
Updated•19 years ago
|
Attachment #190634 -
Flags: superreview?
Attachment #190634 -
Flags: review?
Assignee | ||
Comment 11•19 years ago
|
||
This patch is the equivalent of the 'general' patch. I think we can check in only one variant of the currently 4 we should check. The variants are: Variant #1: (the one I think we'll use) #define MP_USING_WEAVE_COPY 1 #define MP_CHAR_STORE_SLOW 1 Variant #2: #define MP_USING_WEAVE_COPY 1 /* #define MP_CHAR_STORE_SLOW 1 */ Variant #3: /* #define MP_USING_WEAVE_COPY 1 */ #define MP_CHAR_STORE_SLOW 1 Variant #4 /* #define MP_USING_WEAVE_COPY 1 */ /* #define MP_CHAR_STORE_SLOW 1 */ I've tested this against WINDOWS (intel32) RHEL (intel32) and MAC OS X (ppc). In all those tests Variant #1 was either significantly faster, or on par with all the other variants. If I could get someone to test this on AMD64 to confirm Variant #1 has the same or better performance than the other variants, and it's less to 2 % hit against the non-weave case, I'll attach a new patch with just that variant. Note: this patch also includes a method to turn off any cache invariant operations at runtime.
Attachment #190632 -
Attachment is obsolete: true
Attachment #190633 -
Attachment is obsolete: true
Attachment #190634 -
Attachment is obsolete: true
Assignee | ||
Comment 12•19 years ago
|
||
This patch addes 2 things to rsa_perf: 1) a fixed 2048 bit RSA key for performance testing of larger keys, and 2) a flag to turn off cache invariant mod exp. I've been running: # baseline 'fast' 1024 bit operation rsaperf -i 500 -s -n none -z # cache invariant version rsaperf -i 500 -s -n none # baseline 'fast' 2048 bit operation rsaperf -i 500 -s -k 2048 -n none -z # cache invariant 2048 bit key rsaperf -i 500 -s -k 2048 -n none NOTE: none of these tests will detect incorrect modexp operations. You can verify the modexp is working correctly by running: rsaperf -i 1 -s -n none -g I've run each script a couple of times for each variant to make sure my numbers are halfway consistant.
Comment 13•19 years ago
|
||
These results were obtained with the 32-bit library running on a dual AMD Opteron box.
Assignee | ||
Comment 14•19 years ago
|
||
Comparing the results with the baseline, it appears that variant 3 is fastest on the Opteron (where variant 1 is faster on 32 bit intel and ppc). It appears we want to include a patch where we have the option of either using weave copy or not. It also appears there is little need for us to keep the byte copy around. Under those conditions, let's get a review of the patch as is. I'll unifdef out the Byte copy stuff later (it's not as extensive as the WEAVE_COPY patch. bob
Comment 15•19 years ago
|
||
Comment on attachment 196846 [details]
performance test results
Bob,
Please hold off interpreting these results, as there were a couple of issues in
the run that I just discussed with Neil.
Attachment #196846 -
Attachment is obsolete: true
Comment 16•19 years ago
|
||
Bob, I built your patch on AMD64 Solaris, 64-bit, with both the studio and gcc compilers . The shlibsign signing of the build process fails 8 out of 10 times in either the first or second DSA keygen . I didn't review the patch, but this problem will obviously need to be fixed. We don't get any shlibsign failures in our nightly builds of AMD64 Solaris.
Assignee | ||
Comment 17•19 years ago
|
||
Under which options (variants) are failing? Running rsaperf with the -g function should put a fair amount of pressure on the modexp. I found that even minor problems in modexp would make it practically impossible to generate an rsa key. bob
Assignee | ||
Comment 18•19 years ago
|
||
Warning: the COMBA changes to mpi-priv.h are clobbered by this patch I'll attach a new patch with the updated mpi-priv.h. bob
Comment 19•19 years ago
|
||
Bob, the failure is in DSA, not RSA, and it is seen with shlibsign, which doesn't have variants. One of the failure log is below, where the first keygen succeeded and the second one failed : sh ./sign.sh ../../../../dist/SunOS5.10_i86pc_gcc_64_OPT.OBJ SunOS5.10_i86pc_gcc_64_OPT.OBJ SunOS ../../../../dist/SunOS5.10_i86pc_gcc_64_OPT.OBJ/lib ../../../../dist/SunOS5.10_i86pc_gcc_64_OPT.OBJ/lib/libsoftokn3.so SunOS5.10_i86pc_gcc_64_OPT.OBJ/shlibsign -v -i ../../../../dist/SunOS5.10_i86pc_gcc_64_OPT.OBJ/lib/libsoftokn3.so Generating DSA Key Pair....done Library File: ../../../security/nss/lib/softoken/SunOS5.10_i86pc_gcc_64_OPT.OBJ/libsoftokn3.so 443032 bytes Check File: ../../../security/nss/lib/softoken/SunOS5.10_i86pc_gcc_64_OPT.OBJ/libsoftokn3.chk Link: libsoftokn3.chk hash: 20 bytes 86 0d fc dc 27 b0 94 a7 88 97 9e a3 2d 69 d3 5d 64 94 3f 8e signature: 40 bytes 4f 85 4c 37 30 b3 b9 e3 5c f5 18 71 e6 de b5 07 44 56 fe a4 75 a2 71 1d da c7 53 3b 74 56 97 74 d2 69 60 e8 13 55 1f dc sh ./sign.sh ../../../../dist/SunOS5.10_i86pc_gcc_64_OPT.OBJ SunOS5.10_i86pc_gcc_64_OPT.OBJ SunOS ../../../../dist/SunOS5.10_i86pc_gcc_64_OPT.OBJ/lib ../../../../dist/SunOS5.10_i86pc_gcc_64_OPT.OBJ/lib/libfreebl3.so SunOS5.10_i86pc_gcc_64_OPT.OBJ/shlibsign -v -i ../../../../dist/SunOS5.10_i86pc_gcc_64_OPT.OBJ/lib/libfreebl3.so Generating DSA Key Pair....Generating DSA Key: An I/O error occurred during security authorization. gmake: *** [../../../../dist/SunOS5.10_i86pc_gcc_64_OPT.OBJ/lib/libfreebl3.chk] Error 1 I was already aware of your patch clobbering the few lines of COMBA in mpi-priv.h, and I had just restored them for my build.
Assignee | ||
Comment 20•19 years ago
|
||
this replaces the mpi/mpi-priv.h in patch 196202
Assignee | ||
Comment 21•19 years ago
|
||
The modexp code is common between the 2. I had found the rsa code to be the more sensitive of the 2. Anyway, the question still stands, which variant are you using? (what are the settings of the #define's). That will help me isolate which section seems to be causing problems. I'm currently pounding on things to try to reproduce the problem you are seeing. bob
Comment 22•19 years ago
|
||
I didn't change any of the defines. I just applied patches 196202 and 196856 to a fresh tree from the tip.
Comment 23•19 years ago
|
||
These results were obtained by running the optimized build of NSS. The previous attachment was run with the debug build. For this test I used "rsaperf -p 10 ..." instead of the -i option because it runs longer and seems to give more consistent results.
Assignee | ||
Comment 24•19 years ago
|
||
OK thanks Julian, that should help narrow things a bit. Could you try the following for me: add -DMPI_CACHE_LINE=64 to the MPI_CONFIG variable for AMD64SOLARIS in mpi/target.mk. I'll keep looking into the issue, but it looks like a problem with window sizes of 5. Long term I think we want that value in the ADM64SOLARIS config until we can get a 64bit version of mpcpucache.c working. bob
Comment 25•19 years ago
|
||
Bob, The libfreebl build doesn't use mpi/target.mk - everything is in freebl/Makefile . Presumably, you meant to define MPI_CACHE_LINE_SIZE , not MPI_CACHE_LINE . I did that in the Solaris amd64 section and tested it, and shlibsign now succeeds 10 out of 10 times. You would probably have the same issue on Linux amd64, but I didn't test that.
Assignee | ||
Comment 26•19 years ago
|
||
Thanks Neil, those numbers look closer to what I was expecting... well actually better than I was expecting. I was expecting to see a 1.5-2.0% degregation for cache invarience for 1024 keys, and we see closer to .5-1.5% with variant 4 being the fastest. Let me see what is going on with the window 5 size and we'll be ready. bob
Assignee | ||
Comment 27•19 years ago
|
||
This replaces mpmont.c the big patch. I'll attach another version of the big patch as well. The differences between this file and the one in the big patch should be completely in mp_exptmod.
Assignee | ||
Comment 28•19 years ago
|
||
OK, here's the new patch. I found a couple of errors in mpmontg.c that was in the previous patch (a debugging #error that tripped on ppc, but I forgot to remove from my to be checked in version, and an incorrect macro for a case that currently is never compiled (mp_digit !=32 & != 64). It also contains the clamping fix.
Attachment #196202 -
Attachment is obsolete: true
Attachment #196872 -
Attachment is obsolete: true
Assignee | ||
Comment 29•19 years ago
|
||
Thanks Julien, The better way to fix it would be to make mpcpucache.c figure out the cache size on the fly, but I don't know if the cpuid command works when running in 64-bit mode. If it does the 32bit code should work (with the 'is386() and is486()' test ifdef'd out. I think all 64bit machines have > 64-byte cache line sizes, but I'm not sure it would be safer to find this out on the fly. bob
Comment 30•19 years ago
|
||
(In reply to comment #29) > The better way to fix it would be to make mpcpucache.c figure out the cache > size on the fly, but I don't know if the cpuid command works when running > in 64-bit mode. If it does the 32bit code should work (with the 'is386() > and is486()' test ifdef'd out. Please supply a small patch to mpcpucache.c to test that hypothesis. BTW, the issue of using an instruction that requires assembly code presents a challenge for AMD64 on Solaris, where we are required to use Sun Studio instead of gcc. So, if we get the CPUID code working on that platform, we'll need to make another .s file from that code, one that can be assembled with studio's assembler, for use in the solaris builds. We're willing to make the .s file(s) if you'll supply the mpcpucache.c file that runs the CPUID instruction on AMD64 with gcc.
Assignee | ||
Comment 31•19 years ago
|
||
This patch should be applied to 196873's version of mpcpucache.c It was created with a diff -c. The defines are probably not quite right. The key here is that we need to add defines to the long #if which will cause all x86 platforms (32 and 64 bit) to enter this code path reguardless of OS. The USE_64 should only be defined for the actual 64 bit abi platforms that will call this. If I got those defines mixed up, then they will need to be adjusted the '#define wcpuid' for 64 bit processors. bob If the cpuid command has a different form, we can make the
Assignee | ||
Comment 32•19 years ago
|
||
I've found the AMD manual on CPUID at http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24594.pdf It says: "Provides information about the processor and its capabilities through a number of different functions. Software should load the number of the CPUID function to execute into the EAX register before executing the CPUID instruction. The processor returns information in the EAX, EBX, ECX, and EDX registers; the contents and format of these registers depend on the function. The architecture supports CPUID information about standard functions and extended functions. The standard functions have numbers in the 0000_xxxxh series (for example, standard function 1). To determine the largest standard function number that a processor supports, execute CPUID function 0. The extended functions have numbers in the 8000_xxxxh series (for example, extended function 8000_0001h). To determine the largest extended function number that a processor supports, execute CPUID extended function 8000_0000h. If the value returned in EAX is greater than 8000_0000h, the processor supports extended functions. Software operating at any privilege level can execute the CPUID instruction to collect this information. In 64-bit mode, this instruction works the same as in legacy mode except that it zero-extends 32-bit register results to 64 bits. CPUID is a serializing instruction." So, in theory, my previous patch should work once it actually compiles correctly.
Comment 33•19 years ago
|
||
These results are for 64 bit, gcc with https://bugzilla.mozilla.org/attachment.cgi?id=196929 applied to https://bugzilla.mozilla.org/attachment.cgi?id=196873
Comment 34•19 years ago
|
||
Comment on attachment 196873 [details] [diff] [review] replacement full patch. includes fixed mpi-priv.h and mpmontg.c This is a partial review. I have not completed the review of all the changes proposed for mpmontg.c, but I wanted to give you the review feedback I have so far while I work on the rest. >+SECStatus BLAPI_SetFlags(int flags, int value); Please change the name of the function and the name of the first argument. Let's not use the word "flag" or "flags". I suggest option. The name of the function and the first argument should be singular, not plural, to emphasize that this sets a single option, and is not a bit vector of option flag bits. For that same reason, I suggest making the options an enumerated type. This comment affects multiple files, and I will not repeat it at every affected place. >+/* BLAPI Flags, used in BLAPI_SetFlags */ >+#define BLAPI_SAFE_MODEXP 1 /* if flag is set to 1, use cache invariant >+ * rsa code */ Please make that an enum type, ad define the first value to be 1. That should help make it clear that the options are mutually exclusive, and that multiple options cannot be passed to BLAPI_SetOption in one call. This comment affects multiple files. > /* End of Version 3.008. */ >+ >+ BLAPI_SetFlags, >+ /* End of Version 3.009. */ >+ > }; We already bumped the version number once for 3.11, bumped it to 3.008. I don't think we need to bump it again until we've released a version of NSS that uses that new number. So, let's not bump it again unless we need to. Let's stick with 3.008 until we've done an NSS release with that number. This comment affects several files. >+SECStatus >+BLAPI_SetFlags(int flag, int value) >+{ >+ if (flag == BLAPI_SAFE_MODEXP) { I suggest using a switch statement here, switching on flag, to make it even more clear to future developers that flag is not a bit field, and that multiple simultaneous values are not allowed. + if (mp_set_modexp_mode(value) == MP_OKAY) { Please change that name. Not "set mode" which is vague and sounds like it's intended to be extended. I suggest a name that clearly explains what it enables or disables. mp_enable_cache_invariance ? mp_set_constant_cache_time? I'm happy for the BLAPI_SetOption function to be extensible, to avoid having to add more such functions to the blapi API later, but I think the MPI functions should be to the point, with a different one for each settable option. >Index: mpi/Makefile >-SRCS= mpi.c mpprime.c mplogic.c mp_gf2m.c mpmontg.c mpi-test.c primes.c tests/ \ >+SRCS= mpi.c mpprime.c mplogic.c mp_gf2m.c mpmontg.c mpi-test.c primes.c \ >+ mpcpucache.c tests/ \ In addition to adding more source files, don't we want/need to add some new -D options to the CC commands? Also, have you tested that mpi is still buildable from mpi's makefile since mpmontg was modified to use NSPR symbols? (I suspect it isn't) >Index: mpi/mpcpucache.c >-char *manMap[] = { >+static const char *manMap[] = { I believe you want the word const in two places in that declarataion, the second place is after the *. As written, it declares a non-constant array of pointers to constant strings. The second const will make the array of pointers be constant too, which will help keep it in the text segment. >Index: mpi/mpi.h >=================================================================== >RCS file: /cvsroot/mozilla/security/nss/lib/freebl/mpi/mpi.h,v >retrieving revision 1.22 >diff -u -r1.22 mpi.h >--- mpi/mpi.h 27 Apr 2004 23:04:36 -0000 1.22 >+++ mpi/mpi.h 21 Sep 2005 01:57:48 -0000 >@@ -80,7 +80,8 @@ > #define MP_RANGE -3 /* argument out of range */ > #define MP_BADARG -4 /* invalid parameter */ > #define MP_UNDEF -5 /* answer is undefined */ >-#define MP_LAST_CODE MP_UNDEF >+#define MP_PROGERR -6 /* internal programming error in mpi */ Do we really need this new value? Will this value get reflected out to the user? Will it be preserved out through the blapi layer, then through the PKCS11 layer, etc? I'd prefer that we use assertions, and be confident that our code is correctly coded, and show that in our error space. I particularly don't want to increase the number of error paths that must be considered by the users of MPI, especially if they have no effect on what the user sees. >Index: mpi/mpmontg.c > /* #define MP_USING_MONT_MULF 1 */ >+#define MP_USING_CACHE_SAFE_MOD_EXP 1 >+/*#define MP_USING_WEAVE_COPY 1 */ >+/*#define MP_CHAR_STORE_SLOW 1 */ > #include <string.h> > #include "mpi-priv.h" >+#include "mp_gf2m-priv.h" That last header is an ECC header. Is mpmontg now calling any ECC functions? If not, why is it including that header file? The dependencies in MPI should continue to be very well ordered, and not circular. ECC code may depend on integer routines, but the integer code should not depend on the ECC code. If there's some really good reason to include that header file then maybe the content of that file needs to be refactored. Maybe part of it really is integer code after all. > #endif >+#include <stddef.h> /* ptrdiff_t */ Is stddef.h available on all our platforms? Including windows and OS/2? Perhaps that #include may need to be ifdeffed by platform, with a different file name on some. But I'm not sure. >+/* need to know endianness of this platform. If we aren't told, get it from >+ * nspr... */ >+#ifdef MP_CHAR_STORE_SLOW >+#if !defined(IS_BIG_ENDIAN) && !defined(IS_LITTLE_ENDIAN) >+#include "prcpucfg.h" >+#endif >+#endif OK, this is my single biggest problem with this patch. It makes mpi depend on NSPR headers. So far, mpi has not depended on any NSPR headers, and can be built stand-alone with mpi's own makefile, without needing any other files from NSS or NSPR. Today one can cvs checkout nss/lib/freebl/mpi and build it without any other NSS or NSPR sources. I'd like to preserve that. Is there another way to find out the endianness? At the very least, it should remain possible for the mpi makefile to build MPI that runs correctly and uses the available performance techniques without using NSPR. If achieving that will add more than a few days to the schedule, then some compromise can be arranged. This partial review ends here. I'm marking this patch as still awaiting review to help me remember it.
Attachment #196873 -
Flags: review?(nelson)
Assignee | ||
Comment 35•19 years ago
|
||
>This is a partial review. This is a partial response to the partial review;).... (I've only addressed the issues where questions have been raised) >>Index: mpi/Makefile >> -SRCS= mpi.c mpprime.c mplogic.c mp_gf2m.c mpmontg.c mpi-test.c primes.c tests/ \ >> +SRCS= mpi.c mpprime.c mplogic.c mp_gf2m.c mpmontg.c mpi-test.c primes.c \ >> + mpcpucache.c tests/ \ > > In addition to adding more source files, don't we want/need to add some new > -D options to the CC commands? > > Also, have you tested that mpi is still buildable from mpi's makefile > since mpmontg was modified to use NSPR symbols? (I suspect it isn't) > I have not built with the MPI's makefiles, what are the instructions for doing so? I have built mpcpucache.c with just a CC or gcc command line on the platforms I tested on (cc -o mpcpucache[.exe] mpccpucache.c -DTEST_IT). I've made sure the native compilier set flags are uses whenever possible. In mpmontg.c I have had to rely on NSPR to get the endianess of the platform. If there is a better way to do this in the mpi build environment, I'd be happy to use it. The code is set up so if mpi supplies these values, NSPR isn't used. > >>+#define MP_PROGERR -6 /* internal programming error in mpi */ > > Do we really need this new value? > Will this value get reflected out to the user? > Will it be preserved out through the blapi layer, then through the > PKCS11 layer, etc? I'd prefer that we use assertions, and be > confident that our code is correctly coded, and show that in our > error space. I particularly don't want to increase the number of > error paths that must be considered by the users of MPI, especially > if they have no effect on what the user sees. Here's its only use. I was trying to match what appears to be the normal MPI Checking usage. I have no problem changing it to simply an assert. /* if the accum1 isn't set, then either j was out of range, or our logic * above does not populate all the powers values. either case it shouldn't * happen and is an internal mpi programming error */ ARGCHK(MP_USED(&accum1) != 0, MP_PROGERR); > >+#include "mp_gf2m-priv.h" > That last header is an ECC header. > Is mpmontg now calling any ECC functions? > If not, why is it including that header file? To get the definition of MP_DIGIT_BITS It wasn't clear that there was anything special about mp_gf2m-priv.h from the comments. Given your description of how it works, it looks like it's best just to move MP_DIGIT_BITS definition to mpi-priv.h >> #endif >>+#include <stddef.h> /* ptrdiff_t */ > Is stddef.h available on all our platforms? Including windows and OS/2? > Perhaps that #include may need to be ifdeffed by platform, with a > different file name on some. But I'm not sure. It's certainly on windows (windows was the first platform I built and tested this code on). I would be pretty surprised if it wasn't also in OS/2. It replaces the NSPR from a previous patch. stddef.h is part of C89 (like stdio.h and ctype.h), and ptrdiff_t is has been defined since C89, so it may not be there for really old platforms (win16, maybe old Mac 68k, but I would be surprised if it's not defined for something modern). >>+/* need to know endianness of this platform. If we aren't told, get it from >>+ * nspr... */ >>+#ifdef MP_CHAR_STORE_SLOW >>+#if !defined(IS_BIG_ENDIAN) && !defined(IS_LITTLE_ENDIAN) >>+#include "prcpucfg.h" >>+#endif >>+#endif >OK, this is my single biggest problem with this patch. >It makes mpi depend on NSPR headers. >So far, mpi has not depended on any NSPR headers, and can be built >stand-alone with mpi's own makefile, without needing any other files >from NSS or NSPR. Today one can cvs checkout nss/lib/freebl/mpi and >build it without any other NSS or NSPR sources. I'd like to preserve >that. Is there another way to find out the endianness? > >At the very least, it should remain possible for the mpi makefile to build >MPI that runs correctly and uses the available performance techniques >without using NSPR. If achieving that will add more than a few days to >the schedule, then some compromise can be arranged. I'd be happy to use it if you can supply one. the only places in freebl that need Endianness or out of mpi. I tried to construct the ifdef so that if you supply the endianness at build time, somehow, I don't have resort to grabbing the NSPR definition. Does MPI use autoconfig (I think autoconfig has dynamic detection of endianness that could be used to set the right variable in the build environment?). .... this ends my partial response. I'll adjust most of these in my tree shortly. I'm willing to take suggestions on how to deal with the endianness issue.
Comment 36•19 years ago
|
||
This patch was tested on Redhat 32/64 x86; Solaris 32/64 x86 with both gcc and studio; Solaris sparc. Detailed test results will follow shortly. The patch rolls everything together except the testing patch to rsaperf.
Updated•19 years ago
|
Attachment #196856 -
Attachment is obsolete: true
Attachment #196873 -
Attachment is obsolete: true
Attachment #196929 -
Attachment is obsolete: true
Comment 37•19 years ago
|
||
I'd like to avoid pasting large graphs and duplicating Neil's work in the bug, so I'll just summarize my results. In 32-bit builds, regardless of arch (x86 vs sparc) or OS (RH x86 vs Solaris x86) or compiler (gcc vs studio), adding Bob's code to the trunk had no measurable effect on performance. Enabling or disabling his code with the new freebl API had almost no effect either. In 64-bit builds, however, the story is quite different. As Neil reported, RSA performance does improve several (1-5%) percent when adding Bob's patch and disabling his code with the freebl API. When enabling his code in gcc builds, the results range from a 1.5% degradation from the baseline (trunk with no patches) on RH Linux 3 AMD64 to a tiny improvement on Solaris AMD64. I didn't get quite the improvements that Neil did. I think that's because of the cache differences between our respective AMD64 hardware. On Solaris Studio AMD64 builds using the assembly code attached to the bug, I found a 4-6% improvement, which I also attribute to icache peculiarities. Bottom line: the most damage this code can do to RSA performance is about two percent, while it actually helps in most 64-bit environments.
Comment 38•19 years ago
|
||
I removed the assembly from the large patch because it was enormous. Also, the DOS-version replacement for mpcpucache.c that I received was causing the entire file to be diffed, which is obviously useless. In the process of combining several of Bob's patches, I only fixed syntax errors and build problems on several platforms. I didn't change the core C code, except to add cpuid for AMD64. That is why I obsoleted the other patches. Sorry for the inconvenience(s).
Attachment #197477 -
Attachment is obsolete: true
Comment 39•19 years ago
|
||
Comment on attachment 197495 [details] [diff] [review] same as above, minus the assembly; mpcpucache converted from dos format Some comments on the patch for this one file: >Index: mpi/mpcpucache.c >-#if defined(i386) || defined(__i386) || defined(__X86__) || defined (_M_IX86) >+#if defined(i386) || defined(__i386) || defined(__X86__) || defined (_M_IX86) || defined(__x86_64__) > /* X86 processors have special instructions that tell us about the cache */ > #include "string.h" > >+#if defined(__x86_64__) >+#define AMD_64 1 >+#endif Why define a synonym for __x86_64__ ? Why not just use __x86_64__ instead of AMD_64 below? I could understand if you were creating a single symbol that meant that any of those 5 other *x86* symbols was defined, but it appears that AMD_64 is just a synonym for one of them. > { > /* sigh GCC isn't smart enough to save the ebx PIC register on it's own > * in this case, so do it by hand. */ Is that because the second output argument is specified below as =r instead of =b ? > __asm__("pushl %%ebx\n\t" >- "cpuid\n\t" >- "mov %%ebx,%1\n\t" >- "popl %%ebx\n\t" >+ "cpuid\n\t" >+ "mov %%ebx,%1\n\t" >+ "popl %%ebx\n\t" > : "=a" (*eax), > "=r" (*ebx), > "=c" (*ecx), > "=d" (*edx) > : "0" (op)); > } >@@ -198,17 +218,18 @@ > CacheType type; > unsigned long data; > #define pageSize size > #define trcuops size > unsigned long size; > unsigned long association; > #define tlbEntries lineSize > unsigned long lineSize; >-} CacheMap[] = { >+}; >+static const struct _cache CacheMap[256] = { > /* 00 */ {Cache_NONE, DATA_NONE, 0, 0, 0 }, > /* 01 */ {Cache_TLB, DATA_INSTR, TLB_4k, 4, 32 }, > /* 02 */ {Cache_TLB, DATA_INSTR, TLB_4M, 0, 2 }, This table is huge (10KB on AMD64). Can it be made smaller? Could any of those be reduced from longs (which are 64b on AMD64) to ints or shorts or even chars? >-/* target.mk can define LINE_SIZE if it's common for the family or OS */ >-#if defined(LINE_SIZE) && !defined(MPI_GET_PROCESSOR_LINE_SIZE_DEFINED) >+/* target.mk can define MPI_CACHE_LINE_SIZE if it's common for the family or >+ * OS */ >+#if defined(MPI_CACHE_LINE_SIZE) && !defined(MPI_GET_PROCESSOR_LINE_SIZE_DEFINED) target.mk is an mpi makefiles that's not used in freebl builds. Maybe you want to say "The Makefile can define ..."
Assignee | ||
Comment 40•19 years ago
|
||
>>+#if defined(__x86_64__) >>+#define AMD_64 1 >>+#endif > > Why define a synonym for __x86_64__ ? > Why not just use __x86_64__ instead of AMD_64 below? Basically because I wasn't sure that __x86_64__ was going to be sufficient. If we run into another compiler with has a different define for 64 bit x86, this makes adding that define easier. >> /* sigh GCC isn't smart enough to save the ebx PIC register on it's own >> * in this case, so do it by hand. */ > > Is that because the second output argument is specified below > as =r instead of =b ? Other way around. If the parameter is =b and PIC is turned on GCC complains because it doesn't know how, or doesn't want to, save the ebx, so we have to save it by hand, and use another register (which at this point we really don't care which, thus the =r. >>+static const struct _cache CacheMap[256] = { >> /* 00 */ {Cache_NONE, DATA_NONE, 0, 0, 0 }, >> /* 01 */ {Cache_TLB, DATA_INSTR, TLB_4k, 4, 32 }, >> /* 02 */ {Cache_TLB, DATA_INSTR, TLB_4M, 0, 2 }, > > This table is huge (10KB on AMD64). Can it be made smaller? > Could any of those be reduced from longs (which are 64b on AMD64) > to ints or shorts or even chars? Yes, this is a generic table. Not only can these vaules be changed to ints/shorts and chars, several of the fields could be removed. We really only care about DATA L1, L2, and L3 caches, and only the LineSize, not the actual cache size. There are other more intrusive ways to reduce the table size (since the table is sparse) as well, but just doing the above (and storing them in chars) reduces the table down to 512 bytes. bob
Comment 41•19 years ago
|
||
(In reply to comment #40) > Basically because I wasn't sure that __x86_64__ was going to be sufficient. If > we run into another compiler with has a different define for 64 bit x86, this > makes adding that define easier. OK, there already is one other known symbol for AMD64. __x86_64 > >> /* sigh GCC isn't smart enough to save the ebx PIC register on it's own > >> * in this case, so do it by hand. */ > > > > Is that because the second output argument is specified below > > as =r instead of =b ? > > Other way around. If the parameter is =b and PIC is turned on GCC complains > because it doesn't know how, or doesn't want to, save the ebx, so we have to > save it by hand, and use another register (which at this point we really don't > care which, thus the =r. Is the ebx register not used for this purpose in the 64-bit ABI? In the AND_64 part of this patch, changing that =r to =b solved numerous problems, including the saving/restoring of ebx. I'm wondering why it would do that for 64 bit code and not 32. > > This table is huge (10KB on AMD64). Can it be made smaller? > > Could any of those be reduced from longs (which are 64b on AMD64) > > to ints or shorts or even chars? > > Yes, this is a generic table. Not only can these vaules be changed to > ints/shorts and chars, several of the fields could be removed. We really only > care about DATA L1, L2, and L3 caches, and only the LineSize, not the actual > cache size. > > There are other more intrusive ways to reduce the table size (since the > table is sparse) as well, but just doing the above (and storing them in > chars) reduces the table down to 512 bytes. Please do that. We'll have to respin the .s files, but the savings is worth the effort.
Assignee | ||
Comment 42•19 years ago
|
||
>> Other way around. If the parameter is =b and PIC is turned on GCC complains >> because it doesn't know how, or doesn't want to, save the ebx, so we have to >> save it by hand, and use another register (which at this point we really don't >> care which, thus the =r. > > Is the ebx register not used for this purpose in the 64-bit ABI? > In the AND_64 part of this patch, changing that =r to =b solved numerous > problems, including the saving/restoring of ebx. I'm wondering why it > would do that for 64 bit code and not 32. At a guess, it's because PIC is not turned on in 64 bit (I have no idea why it's on for freebl for 32 bit). The =b code works just find in 32 bit as well if PIC is turned off. (the original test programs I wrote using cpuid do not use the nasty 'hand save ebx'. That is why I suggested that as a possible fix for 64 bit (as a result we wind up with no explicit register references, so we don't have to figure you if we need to push ebx or push rbx...). > > > This table is huge (10KB on AMD64). Can it be made smaller? > > Yes, this is a generic table. > Please do that. We'll have to respin the .s files, but the savings is > worth the effort. Will do.
Comment 43•19 years ago
|
||
Some info on the __xxxcpu and __xxxcpu__ macros: I've found that __xxxcpu is defined by both the native (i.e., OS vendor's) compiler and GCC. In addition, GCC also defines __xxxcpu__. xxxcpu, without any leading underscores, is considered to violate some ANSI or ISO C or POSIX requirement, so compilers generally don't use macros with such names. But NSPR and NSS's build systems sometimes define xxxcpu on the compiler command line (with a -Dxxxcpu flag). So between __x86_64 and __x86_64__, I believe that __x86_64 is better.
Comment 44•19 years ago
|
||
Actually, PIC is turned on for 64-bit builds too. It seems that the difference between 32-bit and 64-bit builds on x86 is just the number of registers the compiler is willing to use for temps. In fact, if you compile the same 64-bit code that calls cpuid with gcc -O3 instead of gcc -O2, you get compile errors. The code works on all platforms as it is now, so I vote that we leave the inline assembly for cpuid alone. We should solve the x86_64 define problem as it is already solved in loader.c and sha_fast.h - test for __x86_64__ OR __x86_64. Checking both of these defines justifies the AMD_64 variable currently in mpcpucache.c. Finally, I'd like to test the performance of the new mpcpucache.c file once we reduce the size of the cache map array. Unfortunately, we've seen a lot of very unpredictable effects due to caching, so I wouldn't be surprised if performance decreased even though we are removing code. Personally, I'm not too worried about the readability of an assembly file, especially the .data section.
Assignee | ||
Comment 45•19 years ago
|
||
Here's the replacement with the smaller array size. Old array size 4*5*256 or 5k on 32bit platforms, 8*5*256 or 10k on 64bit platforms to 1*2*256 or 512 bytes on all platforms.
Comment 46•19 years ago
|
||
(In reply to comment #35) > I have not built with the MPI's makefiles, > what are the instructions for doing so? Set an environment variable TARGET to one of these values: mipsIRIX alphaOSF1 v9SOLARIS v8plusSOLARIS v8SOLARIS ia64HPUX ia64HPUX64 PA2.0WHPUX PA2.0NHPUX PA1.1HPUX 32AIX 64AIX x86LINUX AMD64SOLARIS (hmm, windows seems to be missing from that list :/ ) Then cd nss/lib/freebl/mpi ./all-tests That shell script builds mpi and runs MPI's test suite against it.
Comment 47•19 years ago
|
||
Comment on attachment 197733 [details] [diff] [review] Add __x86_64 to our defines, reduce the array size r=nelson. Saul, please verify that the big table is now only 512 bytes total size, and regenerate the .s files for it to be sure.
Attachment #197733 -
Flags: review+
Comment 48•19 years ago
|
||
(In reply to comment #47) > (From update of attachment 197733 [details] [diff] [review] [edit]) > r=nelson. Saul, please verify that the big table is now only 512 bytes total > size, and regenerate the .s files for it to be sure. > I tested all builds for AMD64 and Sparc 64-bit. Everything looks good. Performance even went up slightly with gcc. The new .s file has 512 entries in the cache map, all of which are .byte.
Comment 49•19 years ago
|
||
Comment on attachment 197495 [details] [diff] [review] same as above, minus the assembly; mpcpucache converted from dos format Bob, the attachment that you've asked me to review is now marked obsolete. Is *this* attachment the one I should be reviewing now?
Assignee | ||
Comment 50•19 years ago
|
||
Comment on attachment 197733 [details] [diff] [review] Add __x86_64 to our defines, reduce the array size I've checked in this version of the patch, since it's got a r+, is seperable (it doesn't depend on other changes, but other changes depend on it). It will reduce the noise if we have to add to the patch. Saul you can pick it up directly from CVS. /cvsroot/mozilla/security/nss/lib/freebl/mpi/mpcpucache.c,v <-- mpcpucache.c new revision: 1.3; previous revision: 1.2 done
Assignee | ||
Comment 51•19 years ago
|
||
Comment on attachment 196873 [details] [diff] [review] replacement full patch. includes fixed mpi-priv.h and mpmontg.c Go ahead and finish your review on this patch. The essitial stuff that needs to be reviewed shouldn't have changes (except mpcpucache.c which you've already r+). You've already given several comments to me which I've been incorporating, so it's clear there will have to be a new version of the mpmontg.c code and some header files. The other patch is one that saul rolled up his changes in. We should probably split those out as a separate patch for review. We can check in all the Makefile and .s files before mpmontg.c bob
Attachment #196873 -
Attachment is obsolete: false
Comment 52•19 years ago
|
||
Bob, I am now reviewing your weaving code, and I just noticed this comment: * on some platforms character stores into memory is very expensive since they * generate a read/modify/write operation on the bus. On those platforms * we need to do integer writes to the bus. To eliminate RMW cycles, on those platforms we need to do writes of the largest integer size, which is likely to be "long" or ptrdiff_t, not int. That is, 64-bits, not 32. I'm not certain, but I think your code is always using 32-bits for that case. On AMD64, that won't eliminate any RMWs. It might reduce their number to 1/4 of what they would be with byte writes, but it still won't eliminate them.
Assignee | ||
Comment 53•19 years ago
|
||
The following patch implements my proposals to address nelson's issues. This patch does not include the mpcpucache code because it is checked in. It also does not have the AMDX86 and Solaris .s files and the Makefile changes for them since I don't have systems to generate the .s files or to test the result. The option to turn off safe modexp has been removed based on feedback I have received. As a result mpi-priv.h is the only header that has been modified under all of freebl. The bulk of the changes are still in mpmontg.c. This patch shows the changes against the tip. I will attach an additional patch for reference which will show the differences between this patch that the previously reviewed patch. bob
Attachment #196873 -
Attachment is obsolete: true
Attachment #202738 -
Flags: review?(nelson)
Attachment #196873 -
Flags: review?(nelson)
Assignee | ||
Comment 54•19 years ago
|
||
Assignee | ||
Comment 55•19 years ago
|
||
Attachment #202739 -
Attachment is obsolete: true
Comment 56•19 years ago
|
||
This bug needs to be P1, since we won't ship 3.11 without the fix.
Priority: -- → P1
Comment 57•19 years ago
|
||
Comment on attachment 202738 [details] [diff] [review] Updated Weave patch. Bob, this patch contains changes to two files in cmd/rsaperf. The connection between those 2 changes and the weaving code is not apparent. So, I plan to exclude those two files from my review, unless you tell me how they're related to weaving. >Index: cmd/rsaperf/defkey.c >Index: cmd/rsaperf/rsaperf.c
Assignee | ||
Comment 58•19 years ago
|
||
Ah, yes. They are tangentially related to weaving. The patch creates a static 2k RSA key, which was useful in testing performance. It is not necessary to get these into 3.11. bob
Assignee | ||
Comment 59•19 years ago
|
||
Attachment #202738 -
Attachment is obsolete: true
Attachment #203766 -
Flags: review?(nelson)
Attachment #202738 -
Flags: review?(nelson)
Assignee | ||
Comment 60•19 years ago
|
||
inter patch diff should work on the latest patch. bob
Comment 61•19 years ago
|
||
Comment on attachment 203766 [details] [diff] [review] Incorperate Nelson's comments. 2 unresolved comment issues (for which I wouldn't hold up this patch), and one logic change that I think was unintentional. >+ * Our same logical array p array, m is sizeof(mp_digit), >+ * N is still count and n is now b_size. If we define p[i].digit[j]0 as the >+ * most significant byte of the word p[i].digit[j], p[i].digit[j]1 as >+ * the next most significant byte of p[i].digit[j], ... and p[i].digit[j]m >+ * is the least significant byte.our array would look like: >+ * p[0].digit[0]0 p[1].digit[0]0 ...... p[N-1].digit[0]0 p[N].digit[0]0 >+ * p[0].digit[0]1 p[1].digit[0]1 ...... p[N-1].digit[0]1 p[N].digit[0]1 >+ * . . >+ * p[0].digit[0]m p[1].digit[0]m ...... p[N-1].digit[0]m p[N].digit[0]m >+ * p[0].digit[1]0 p[1].digit[1]0 ...... p[N-1].digit[1]0 p[N].digit[1]0 >+ * . . >+ * . . >+ * p[0].digit[n]m-1 p[1].digit[n]m-1 ...... p[N-1].digit[n]m-1 p[N].digit[n]m-1 >+ * p[0].digit[n]m p[1].digit[n]m ...... p[N-1].digit[n]m p[N].digit[n]m This table has the same problem as the previous one had. Arrays are shown going from 0..n and 0..m but should be 0..n-1 (definitely) and 0..m-1 (I think). >+ } else { >+ /* up to 8 we can find 2^i-1 in the accum array, but at 8 we our source >+ * and target are the same so we need to copy.. After that, the >+ * value is overwritten, so we need to fetch it from the stored >+ * weave array */ >+ if (i > WEAVE_WORD_SIZE) { That last line was "if (i > 8) {" in the previous patch, and I believe it should now be "if (i > 2*WEAVE_WORD_SIZE) {" in this patch. That seems like an unintended logic change. I won't hold this patch up for any more comment changes. When you've resolved the line above, I'll give it r+. I think we have to respin the release candidate because of problems on HPUX and Linux. So, there's time to get this fix into 3.11, I think.
Attachment #203766 -
Flags: review?(nelson) → review-
Assignee | ||
Comment 62•19 years ago
|
||
That's for the quick review. on the previous patch nelson..
Attachment #203766 -
Attachment is obsolete: true
Attachment #203831 -
Flags: review?
Assignee | ||
Comment 63•19 years ago
|
||
Comment on attachment 203831 [details] [diff] [review] fix missing 2x in front of MP_WEAVE_WORD_SIZE Sigh. actual mark this as request for review.....
Attachment #203831 -
Flags: review? → review?(nelson)
Assignee | ||
Comment 64•19 years ago
|
||
Comment on attachment 203831 [details] [diff] [review] fix missing 2x in front of MP_WEAVE_WORD_SIZE This is not the patch!
Attachment #203831 -
Attachment is obsolete: true
Attachment #203831 -
Flags: review?(nelson)
Assignee | ||
Comment 65•19 years ago
|
||
OK this should be the correct patch
Attachment #203849 -
Flags: review?
Assignee | ||
Comment 66•19 years ago
|
||
Comment on attachment 203849 [details] [diff] [review] fix missing 2x in front of MP_WEAVE_WORD_SIZE Sigh this one should be correct now..
Attachment #203849 -
Flags: review? → review?(nelson)
Comment 67•19 years ago
|
||
I carried forward parts of Saul's patch. Bob, please review.
Attachment #203896 -
Flags: review?(rrelyea)
Assignee | ||
Comment 68•19 years ago
|
||
Comment on attachment 203896 [details] [diff] [review] supplemental patch for .s files for Sun Studio compilers Initial developer should read: Red Hat, Inc. other than that r+
Attachment #203896 -
Flags: review?(rrelyea) → review+
Comment 69•19 years ago
|
||
Comment on attachment 203849 [details] [diff] [review] fix missing 2x in front of MP_WEAVE_WORD_SIZE I am giving r+ to a subset of this patch, namely the 4 files changed in nss/lib/freebl/mpi. This patch will be suplemented by the following patch.
Attachment #203849 -
Flags: review?(nelson) → review+
Comment 70•19 years ago
|
||
Checked in "Supplemental Patch" Checking in Makefile; new revision: 1.70; previous revision: 1.69 Checking in manifest.mn; new revision: 1.44; previous revision: 1.43 Checking in mpi/mpcpucache_amd64.s; initial revision: 1.1 Checking in mpi/mpcpucache_x86.s; initial revision: 1.1
Comment 71•19 years ago
|
||
In this morning's nightly builds, shlibsign crashed on some 64-bit platforms, notably AIX and HPUX. The problem is in the definition of the macro MP_ALIGN. It produces a 32-bit result. This patch fixes it. -#define MP_ALIGN(x,y) ((((ptrdiff_t)(x))+((y)-1))&(~((y)-1))) +#define MP_ALIGN(x,y) ((((ptrdiff_t)(x))+((y)-1))&(((ptrdiff_t)0)-(y))) The issue is that the type of the expression (~((y)-1)) is the type of y. If y is an unsigned int, then the expression is an unsigned 32-bit value. That value then gets converted to a ptrdiff_t, to be anded with the other expression, but since it is unsigned, the sign bit is not extended, so the and always produces a result in which only the low order 32 bits can be non-zero. The fix is to change the expression to one that computes a mask of type ptrdiff_t, rather than a mask of the type of "y". That's what this patch does.
Assignee | ||
Comment 72•19 years ago
|
||
r=relyea for nelson's patch.
Assignee | ||
Comment 73•19 years ago
|
||
NOTE: nelson's patch only works for 2's complement platforms. That's true of all the NSS supported platforms (and most systems), so we should go with it. We might want put a comment for the poor fool that is tasked to port this to an old Cray...;), though platforms that use 1's complement are often the some platform where byte != 8 bits... I'm pretty sure we would break pretty horribly on the latter anyway.
Assignee | ||
Comment 74•19 years ago
|
||
^the some^the same^
Comment 75•19 years ago
|
||
Why don't we use ~((ptrdiff_t)(y)-1)) instead of ((ptrdiff_t)0)-(y))?
Comment 76•19 years ago
|
||
In answer to comment 75, because it's one ALU operation instead of two.
Comment 77•19 years ago
|
||
We're continuing to test this. Christophe, this is the patch you wanted.
Attachment #203993 -
Flags: review?(rrelyea)
Assignee | ||
Comment 78•19 years ago
|
||
Comment on attachment 203993 [details] [diff] [review] Fix bug in MP_ALIGN macro r=relyea
Attachment #203993 -
Flags: review?(rrelyea) → review+
Comment 79•19 years ago
|
||
Checking in mpmontg.c; new revision: 1.17; previous revision: 1.16 Checked in MP_ALIGN fix. In addition to the shlibsign failures on HPUX and AIX, we saw a crash in selfserv on AMD64 with RHEL OS.
Comment 80•18 years ago
|
||
What, if anything, remains to be done for this bug? Was this bug resolved for NSS 3.11 ?
Assignee | ||
Comment 81•18 years ago
|
||
yes, closing it now..
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
Comment 82•18 years ago
|
||
While reviewing one of Douglas Stebila's patches to mpi, I made some startling discoveries. I now think that we must not be defining MP_CHAR_STORE_SLOW on ANY platforms, because the code simply won't compile if we do. Rather than reopening this bug, I will file a new one.
Comment 83•18 years ago
|
||
It appears that the weaving does not take effect for all versions of freebl. For example, the two freebl versions that use floating point on Sparc don't go through the "safe" mp_exptmod_safe_i function . Do we need to improve the floating point code as well WRT cache attacks ? And are there other freebl versions that don't have this fix ?
You need to log in
before you can comment on or make changes to this bug.
Description
•