Closed Bug 298630 Opened 19 years ago Closed 18 years ago

freebl needs a memory cache invariant RSA implementation

Categories

(NSS :: Libraries, defect, P1)

3.9.7
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rrelyea, Assigned: rrelyea)

Details

Attachments

(9 files, 14 obsolete files)

22.49 KB, patch
Details | Diff | Splinter Review
2.46 KB, text/html
Details
2.74 KB, text/html
Details
46.07 KB, patch
Details | Diff | Splinter Review
32.61 KB, patch
nelson
: review+
Details | Diff | Splinter Review
20.69 KB, patch
Details | Diff | Splinter Review
41.86 KB, patch
nelson
: review+
Details | Diff | Splinter Review
30.48 KB, patch
rrelyea
: review+
Details | Diff | Splinter Review
821 bytes, patch
rrelyea
: review+
Details | Diff | Splinter Review
There has been demonstrated attacks against RSA by snooping the modulus exp code
and it's cache usage profile. NSS needs to have an implementation that has the
same cache usage profile independent of the value of the private key.
Target Milestone: --- → 3.11
Summary: freebl needs a cache invariant implementation. → freebl needs a memory cache invariant RSA implementation
This patch includes all the weaving strategies. Which strategy used is based on
the define 'MP_USING_WEAVE_COPY'. Currently it's defined as '1'.

The file mpcpucache.c is effectively a new file. The whole file shows up
because the checked in version has DOS line endings. Since it's a new file, it
should be reviewed in it's entirety anyway.
Attachment #190632 - Flags: superreview?(nelson)
Attachment #190632 - Flags: review?(wtchang)
While this is a complete patch (so it can be applied as one), the only
difference between this patch and patch1 is in mpmontg.c where the #ifdefs
around MP_USING_WEAVE_COPY were removed as if MP_USING_WEAVE_COPY were defined.
The code in this mpmontg.c is much easier to read than it's brother in patch1.
Attachment #190633 - Flags: superreview?
Attachment #190633 - Flags: review?
Same as the previous patch except MP_USING_WEAVE_COPY is removed as if it were
undefined rather than defined. This code is slightly more complicated than
patch2 but still less complicated than patch 1.
Attachment #190634 - Flags: superreview?
Attachment #190634 - Flags: review?
Attachment #190633 - Attachment description: hand removed ifdef 'MP_USING_WEAVE_COPY' → hand defined ifdef 'MP_USING_WEAVE_COPY' (-D)
In reviewing the diffs, it might be easiest to review patch2 first, then review
the diffs between patch2 and patch1, but I leave it to you to decide. If you
think we should just check in patch1, then just review that patch.

bob
Comment on attachment 190633 [details] [diff] [review]
hand defined ifdef 'MP_USING_WEAVE_COPY' (-D)

I'm pretty sure nelson is reviewing this patch, just put it on his radar.
Attachment #190633 - Flags: superreview? → superreview?(nelson)
Comment on attachment 190632 [details] [diff] [review]
'general' patch, including #defs for different weave strategies

I have reviewed mpcpucache.c and emailed review feedback to Bob.
I will continue to review the other parts of this patch.
Attachment #190632 - Flags: superreview?(nelson) → superreview-
Comment on attachment 190632 [details] [diff] [review]
'general' patch, including #defs for different weave strategies

Bob, I have a request concerning mpcpucache.c.
Your present patch for that file merely corrects all the line endings.	Please
check that in as is, then make a separate patch, to be separately checked in,
to address the review feedback I sent you.  That way, subsequent reviewers of
the cvs diffs will actually be able to see what was really changed.
So, SR+=nelson for the mpcpucache.c patch exactly as attached above.
Comment on attachment 190632 [details] [diff] [review]
'general' patch, including #defs for different weave strategies

remove old review request
Attachment #190632 - Flags: review?(wtchang)
Comment on attachment 190633 [details] [diff] [review]
hand defined ifdef 'MP_USING_WEAVE_COPY' (-D)

Previously I reviewed mpcpucache.c and sent review comments to Bob.
Now I'm reviewing the various patches to mpmontg.c.
Here are some preliminary comments (not a full review):

> #include "mpi-priv.h"
>+#include "mp_gf2m-priv.h"
> #include "mplogic.h"
> #include "mpprime.h"
> #ifdef MP_USING_MONT_MULF
> #include "montmulf.h"
> #endif
>+#include "prtypes.h"

Two #includes were added, but neither seems necessary.
mpi sources should not use NSPR headers, types, or methods.
I don't think this file uses anything from mpi's private ecc header.

>+#define MAX_POWERS MAX_ODD_INTS*2
>+#define MAX_MODULUS_BITS 8192
>+#define MAX_MODULUS_LENGTH (MAX_MODULUS_BITS/8)

I'd much prefer MAX_MODULUS_BYTES or MAX_MODULUS_CHARS to
MAX_MODULUS_LENGTH.  That way the units are unambiguous.



>+#define SQR(a,b) \
>+  MP_CHECKOK( mp_sqr(a, b) );\
>+  MP_CHECKOK( s_mp_redc(b, mmm) );
>+
>+#if defined(MP_MONT_USE_MP_MUL)
>+#define MUL(x,a,b) \
>+  MP_CHECKOK( weave_to_mpi(&tmp, powers + (x), nLen, num_powers) ); \
>+  MP_CHECKOK( mp_mul_weave(a, &tmp, b) ); \

In this patch, there is no function by the name mp_mul_weave.
I believe you want mp_mul in this case.

>+  MP_CHECKOK( s_mp_redc(b, mmm) ) ; 
>+#else
>+#define MUL(x,a,b) \
>+  MP_CHECKOK( weave_to_mpi(&tmp, powers + (x), nLen, num_powers) ); \
>+  MP_CHECKOK( s_mp_mul_mont(a, &tmp, b, mmm) );
>+#endif

A comment about the "weaving" strategy.  The weave functions 
essentially do byte-at-a-time copying.	That really hurts on AMD64,
where byte writes are read-modify-write memory cycles, but mp_digit
(64b) writes are just writes.  So, I'd suggest using a strategy (especially
for weave_to_mpi) of loading all the bytes (including shifting and 
or'ing) and then doing a single mp_digit store for each mp_digit, rather 
than doing 8 byte writes for each mp_digit.  

I suspect that the opposite direction (mpi_to_weave) can also be done 
that way, by gathering the bytes from the 8 different multipliers to be 
woven together and then doing one store of the woven digit.  
IINM, the other patch (that complements this one) sort-of works 
that way.

I believe that doing one memory write for each 4 or 8 memory reads, 
rather than one-for-one, will be a performance win on all platforms.

Finally, I think the ultimate decision of which way to go (that is,
with MP_USING_WEAVE_COPY defined or undefined) should be determined
by which is fastest, *after* the work has been done to try doing the
4-or-8 reads for every write.
Comment on attachment 190633 [details] [diff] [review]
hand defined ifdef 'MP_USING_WEAVE_COPY' (-D)

My previous review comments left the impression that I had more review comments
to follow.  But I believe those comments were complete.  I hope this has not
been holding up progress on this bug.
Attachment #190633 - Flags: superreview?(nelson) → superreview-
Attachment #190633 - Flags: review?
Attachment #190634 - Flags: superreview?
Attachment #190634 - Flags: review?
This patch is the equivalent of the 'general' patch. I think we can check in
only one variant of the currently 4 we should check.

The variants are:
Variant #1: (the one I think we'll use)
#define MP_USING_WEAVE_COPY 1
#define MP_CHAR_STORE_SLOW 1

Variant #2:
#define MP_USING_WEAVE_COPY 1
/* #define MP_CHAR_STORE_SLOW 1 */

Variant #3:
/* #define MP_USING_WEAVE_COPY 1 */
#define MP_CHAR_STORE_SLOW 1

Variant #4
/* #define MP_USING_WEAVE_COPY	1 */
/* #define MP_CHAR_STORE_SLOW	1 */

I've tested this against WINDOWS (intel32) RHEL (intel32) and MAC OS X (ppc).
In all those tests Variant #1 was either significantly faster, or on par with
all the other variants. 

If I could get someone to test this on AMD64 to confirm Variant #1 has the same
or better performance than the other variants, and it's less to 2 % hit against
the non-weave case, I'll attach a new patch with just that variant.

Note: this patch also includes a method to turn off any cache invariant
operations at runtime.
Attachment #190632 - Attachment is obsolete: true
Attachment #190633 - Attachment is obsolete: true
Attachment #190634 - Attachment is obsolete: true
This patch addes 2 things to rsa_perf: 1) a fixed 2048 bit RSA key for
performance testing of larger keys, and 2) a flag to turn off cache invariant
mod exp. I've been running:

# baseline 'fast' 1024 bit operation
rsaperf -i 500 -s -n none -z 
# cache invariant version
rsaperf -i 500 -s -n none
# baseline 'fast' 2048 bit operation
rsaperf -i 500 -s -k 2048 -n none -z
# cache invariant 2048 bit key
rsaperf -i 500 -s -k 2048 -n none

NOTE:  none of these tests will detect incorrect modexp operations. You can
verify the modexp is working correctly by running:
rsaperf -i 1 -s -n none -g

I've run each script a couple of times for each variant to make sure my numbers
are halfway consistant.
Attached file performance test results (obsolete) —
These results were obtained with the 32-bit library running on a dual AMD
Opteron box.
Comparing the results with the baseline, it appears that variant 3 is fastest on
the Opteron (where variant 1 is faster on 32 bit intel and ppc).

It appears we want to include a patch where we have the option of either using
weave copy or not.

It also appears there is little need for us to keep the byte copy around.

Under those conditions, let's get a review of the patch as is. I'll unifdef out
the Byte copy stuff later (it's not as extensive as the WEAVE_COPY patch.

bob
Comment on attachment 196846 [details]
performance test results 

Bob,

Please hold off interpreting these results, as there were a couple of issues in
the run that I just discussed with Neil.
Attachment #196846 - Attachment is obsolete: true
Bob,

I built your patch on AMD64 Solaris, 64-bit, with both the studio and gcc
compilers . The shlibsign signing of the build process fails 8 out of 10 times
in either the first or second DSA keygen . I didn't review the patch, but this
problem will obviously need to be fixed. We don't get any shlibsign failures in
our nightly builds of AMD64 Solaris.
Under which options (variants) are failing?

Running rsaperf with the -g function should put a fair amount of pressure on the
modexp. I found that even minor problems in modexp would make it practically
impossible to generate an rsa key.

bob
Warning: the COMBA changes to mpi-priv.h are clobbered by this patch I'll attach
a new patch with the updated mpi-priv.h.

bob
Bob, the failure is in DSA, not RSA, and it is seen with shlibsign, which
doesn't have variants. One of the failure log is below, where the first keygen
succeeded and the second one failed :

sh ./sign.sh ../../../../dist/SunOS5.10_i86pc_gcc_64_OPT.OBJ
SunOS5.10_i86pc_gcc_64_OPT.OBJ SunOS
../../../../dist/SunOS5.10_i86pc_gcc_64_OPT.OBJ/lib
../../../../dist/SunOS5.10_i86pc_gcc_64_OPT.OBJ/lib/libsoftokn3.so
SunOS5.10_i86pc_gcc_64_OPT.OBJ/shlibsign -v -i
../../../../dist/SunOS5.10_i86pc_gcc_64_OPT.OBJ/lib/libsoftokn3.so
Generating DSA Key Pair....done
Library File:
../../../security/nss/lib/softoken/SunOS5.10_i86pc_gcc_64_OPT.OBJ/libsoftokn3.so
443032 bytes
Check File:
../../../security/nss/lib/softoken/SunOS5.10_i86pc_gcc_64_OPT.OBJ/libsoftokn3.chk
Link: libsoftokn3.chk
  hash: 20 bytes
    86 0d fc dc 27 b0 94 a7 88 97
    9e a3 2d 69 d3 5d 64 94 3f 8e
  signature: 40 bytes
    4f 85 4c 37 30 b3 b9 e3 5c f5
    18 71 e6 de b5 07 44 56 fe a4
    75 a2 71 1d da c7 53 3b 74 56
    97 74 d2 69 60 e8 13 55 1f dc
sh ./sign.sh ../../../../dist/SunOS5.10_i86pc_gcc_64_OPT.OBJ
SunOS5.10_i86pc_gcc_64_OPT.OBJ SunOS
../../../../dist/SunOS5.10_i86pc_gcc_64_OPT.OBJ/lib
../../../../dist/SunOS5.10_i86pc_gcc_64_OPT.OBJ/lib/libfreebl3.so
SunOS5.10_i86pc_gcc_64_OPT.OBJ/shlibsign -v -i
../../../../dist/SunOS5.10_i86pc_gcc_64_OPT.OBJ/lib/libfreebl3.so
Generating DSA Key Pair....Generating DSA Key: An I/O error occurred during
security authorization.
gmake: *** [../../../../dist/SunOS5.10_i86pc_gcc_64_OPT.OBJ/lib/libfreebl3.chk]
Error 1

I was already aware of your patch clobbering the few lines of COMBA in
mpi-priv.h, and I had just restored them for my build.
this replaces the mpi/mpi-priv.h in patch 196202
The modexp code is common between the 2. I had found the rsa code to be the more
sensitive of the 2.

Anyway, the question still stands, which variant are you using? (what are the
settings of the #define's). That will help me isolate which section seems to be
causing problems.

I'm currently pounding on things to try to reproduce the problem you are seeing.

bob
I didn't change any of the defines. I just applied patches 196202 and 196856 to
a fresh tree from the tip.
These results were obtained by running the optimized build of NSS. The previous
attachment was run with the debug build.

For this test I used "rsaperf -p 10 ..." instead of the -i option because it
runs longer and seems to give more consistent results.
OK thanks Julian, that should help narrow things a bit.

Could you try the following for me:
add -DMPI_CACHE_LINE=64 to the MPI_CONFIG variable for AMD64SOLARIS in
mpi/target.mk.

I'll keep looking into the issue, but it looks like a problem with window sizes
of 5. Long term I think we want that value in the ADM64SOLARIS config until we
can get a 64bit version of mpcpucache.c working.

bob
Bob,

The libfreebl build doesn't use mpi/target.mk - everything is in freebl/Makefile
. Presumably, you meant to define MPI_CACHE_LINE_SIZE , not MPI_CACHE_LINE . I
did that in the Solaris amd64 section and tested it, and shlibsign now succeeds
10 out of 10 times. You would probably have the same issue on Linux amd64, but I
didn't test that.
Thanks Neil, those numbers look closer to what I was expecting... well actually
better than I was expecting. I was expecting to see a 1.5-2.0% degregation for
cache invarience for 1024 keys, and we see closer to .5-1.5% with variant 4
being the fastest.

Let me see what is going on with the window 5 size and we'll be ready.

bob
Attached patch mpmont.c with proper clamping. (obsolete) — Splinter Review
This replaces mpmont.c the big patch. I'll attach another version of the big
patch as well.

The differences between this file and the one in the big patch should be
completely in mp_exptmod.
OK, here's the new patch. I found a couple of errors in mpmontg.c that was in
the previous patch (a debugging #error that tripped on ppc, but I forgot to
remove from my to be checked in version, and an incorrect macro for a case that
currently is never compiled (mp_digit !=32 & != 64). It also contains the
clamping fix.
Attachment #196202 - Attachment is obsolete: true
Attachment #196872 - Attachment is obsolete: true
Thanks Julien, 

The better way to fix it would be to make mpcpucache.c figure out the cache size
on the fly, but I don't know if the cpuid command works when running in 64-bit
mode. If it does the 32bit code should work (with the 'is386() and is486()' test
ifdef'd out. I think all 64bit machines have > 64-byte cache line sizes, but I'm
not sure it would be safer to find this out on the fly.

bob
(In reply to comment #29)

> The better way to fix it would be to make mpcpucache.c figure out the cache 
> size on the fly, but I don't know if the cpuid command works when running 
> in 64-bit mode. If it does the 32bit code should work (with the 'is386() 
> and is486()' test ifdef'd out. 

Please supply a small patch to mpcpucache.c to test that hypothesis.  
BTW, the issue of using an instruction that requires assembly code presents
a challenge for AMD64 on Solaris, where we are required to use Sun Studio
instead of gcc.  So, if we get the CPUID code working on that platform, 
we'll need to make another .s file from that code, one that can be assembled
with studio's assembler, for use in the solaris builds.  We're willing to 
make the .s file(s) if you'll supply the mpcpucache.c file that runs the 
CPUID instruction on AMD64 with gcc.
Attached patch 64 bit test patch. (obsolete) — Splinter Review
This patch should be applied to 196873's version of mpcpucache.c It was created
with a diff -c.

The defines are probably not quite right. The key here is that we need to add
defines to the long #if  which will cause all x86 platforms (32 and 64 bit) to
enter this code path reguardless of OS. The USE_64 should only be defined for
the actual 64 bit abi platforms that will call this. If I got those defines
mixed up, then they will need to be adjusted the '#define wcpuid' for 64 bit
processors.

bob

If the cpuid command has a different form, we can make the
I've found the AMD manual on CPUID at

http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24594.pdf

It says:
"Provides information about the processor and its capabilities through a number
of different functions. Software should load the number of the CPUID function to
execute into the EAX register before executing the CPUID instruction. The
processor returns information in the EAX, EBX, ECX, and EDX registers; the
contents and format of these registers depend on the function.

The architecture supports CPUID information about standard functions and
extended functions. The standard functions have numbers in the 0000_xxxxh
series (for example, standard function 1). To determine the largest standard
function number that a processor supports, execute CPUID function 0.

The extended functions have numbers in the 8000_xxxxh series (for example,
extended function 8000_0001h). To determine the largest extended function number
that a processor supports, execute CPUID extended function 8000_0000h. If the
value returned in EAX is greater than 8000_0000h, the processor supports
extended functions.

Software operating at any privilege level can execute the CPUID instruction to
collect this information. In 64-bit mode, this instruction works the same as in
legacy mode except that it zero-extends 32-bit register results to 64 bits.

CPUID is a serializing instruction."

So, in theory, my previous patch should work once it actually compiles correctly.
Comment on attachment 196873 [details] [diff] [review]
replacement full patch. includes fixed mpi-priv.h and mpmontg.c

This is a partial review.  I have not completed the review of all the
changes proposed for mpmontg.c, but I wanted to give you the review
feedback I have so far while I work on the rest.

>+SECStatus BLAPI_SetFlags(int flags, int value);

Please change the name of the function and the name of the first
argument.  Let's not use the word "flag" or "flags".  I suggest
option.  The name of the function and the first argument should be 
singular, not plural, to emphasize that this sets a single option, 
and is not a bit vector of option flag bits.  For that same reason, 
I suggest making the options an enumerated type.  This comment
affects multiple files, and I will not repeat it at every affected
place.

>+/* BLAPI Flags, used in BLAPI_SetFlags */
>+#define BLAPI_SAFE_MODEXP	1 /* if flag is set to 1, use cache invariant
>+                                   * rsa code */

Please make that an enum type, ad define the first value to be 1.
That should help make it clear that the options are mutually 
exclusive, and that multiple options cannot be passed to 
BLAPI_SetOption in one call.  This comment affects multiple files.

>     /* End of Version 3.008. */
>+
>+    BLAPI_SetFlags,
>+    /* End of Version 3.009. */
>+
> };

We already bumped the version number once for 3.11, bumped it to 3.008.  
I don't think we need to bump it again until we've released a version 
of NSS that uses that new number.  So, let's not bump it again unless
we need to.  Let's stick with 3.008 until we've done an NSS release
with that number.  This comment affects several files.


>+SECStatus
>+BLAPI_SetFlags(int flag, int value) 
>+{
>+   if (flag == BLAPI_SAFE_MODEXP) {

I suggest using a switch statement here, switching on flag,
to make it even more clear to future developers that flag is not
a bit field, and that multiple simultaneous values are not allowed.

+	if (mp_set_modexp_mode(value) == MP_OKAY) {

Please change that name.  Not "set mode" which is vague and sounds
like it's intended to be extended.  I suggest a name that clearly
explains what it enables or disables.  
mp_enable_cache_invariance ?  mp_set_constant_cache_time?  

I'm happy for the BLAPI_SetOption function to be extensible, to avoid 
having to add more such functions to the blapi API later, but I think 
the MPI functions should be to the point, with a different one for 
each settable option.  


>Index: mpi/Makefile

>-SRCS=   mpi.c mpprime.c mplogic.c mp_gf2m.c mpmontg.c mpi-test.c primes.c tests/ \
>+SRCS=   mpi.c mpprime.c mplogic.c mp_gf2m.c mpmontg.c mpi-test.c primes.c \
>+	mpcpucache.c tests/ \

In addition to adding more source files, don't we want/need to add some new
-D options to the CC commands?

Also, have you tested that mpi is still buildable from mpi's makefile 
since mpmontg was modified to use NSPR symbols?  (I suspect it isn't)

>Index: mpi/mpcpucache.c

>-char *manMap[] = {
>+static const char *manMap[] = {

I believe you want the word const in two places in that declarataion, 
the second place is after the *.  As written, it declares a non-constant 
array of pointers to constant strings.	The second const will make the 
array of pointers be constant too, which will help keep it in
the text segment.


>Index: mpi/mpi.h
>===================================================================
>RCS file: /cvsroot/mozilla/security/nss/lib/freebl/mpi/mpi.h,v
>retrieving revision 1.22
>diff -u -r1.22 mpi.h
>--- mpi/mpi.h	27 Apr 2004 23:04:36 -0000	1.22
>+++ mpi/mpi.h	21 Sep 2005 01:57:48 -0000
>@@ -80,7 +80,8 @@
> #define  MP_RANGE        -3 /* argument out of range */
> #define  MP_BADARG       -4 /* invalid parameter     */
> #define  MP_UNDEF        -5 /* answer is undefined   */
>-#define  MP_LAST_CODE    MP_UNDEF
>+#define  MP_PROGERR      -6 /* internal programming error in mpi  */

Do we really need this new value?  
Will this value get reflected out to the user?	
Will it be preserved out through the blapi layer, then through the
PKCS11 layer, etc?  I'd prefer that we use assertions, and be 
confident that our code is correctly coded, and show that in our 
error space.  I particularly don't want to increase the number of 
error paths that must be considered by the users of MPI, especially
if they have no effect on what the user sees. 

>Index: mpi/mpmontg.c

> /* #define MP_USING_MONT_MULF 1 */
>+#define MP_USING_CACHE_SAFE_MOD_EXP 1 
>+/*#define MP_USING_WEAVE_COPY 1  */
>+/*#define MP_CHAR_STORE_SLOW 1 */
> #include <string.h>
> #include "mpi-priv.h"
>+#include "mp_gf2m-priv.h"

That last header is an ECC header.  
Is mpmontg now calling any ECC functions?  
If not, why is it including that header file?

The dependencies in MPI should continue to be very well ordered,
and not circular.  ECC code may depend on integer routines, but 
the integer code should not depend on the ECC code.
If there's some really good reason to include that header file 
then maybe the content of that file needs to be refactored.
Maybe part of it really is integer code after all.

> #endif
>+#include <stddef.h> /* ptrdiff_t */

Is stddef.h available on all our platforms?  Including windows and OS/2?
Perhaps that #include may need to be ifdeffed by platform, with a 
different file name on some.  But I'm not sure.

>+/* need to know endianness of this platform. If we aren't told, get it from
>+ * nspr... */
>+#ifdef MP_CHAR_STORE_SLOW
>+#if !defined(IS_BIG_ENDIAN) && !defined(IS_LITTLE_ENDIAN)
>+#include "prcpucfg.h"
>+#endif
>+#endif

OK, this is my single biggest problem with this patch.	
It makes mpi depend on NSPR headers.  
So far, mpi has not depended on any NSPR headers, and can be built 
stand-alone with mpi's own makefile, without needing any other files 
from NSS or NSPR.  Today one can cvs checkout nss/lib/freebl/mpi and
build it without any other NSS or NSPR sources.  I'd like to preserve 
that.  Is there another way to find out the endianness?

At the very least, it should remain possible for the mpi makefile to build
MPI that runs correctly and uses the available performance techniques 
without using NSPR.  If achieving that will add more than a few days to
the schedule, then some compromise can be arranged.

This partial review ends here.	
I'm marking this patch as still awaiting review to help me remember it.
Attachment #196873 - Flags: review?(nelson)
>This is a partial review.

This is a partial response to the partial review;).... (I've only addressed the
issues where questions have been raised)


>>Index: mpi/Makefile

>> -SRCS=   mpi.c mpprime.c mplogic.c mp_gf2m.c mpmontg.c mpi-test.c primes.c
tests/ \
>> +SRCS=   mpi.c mpprime.c mplogic.c mp_gf2m.c mpmontg.c mpi-test.c primes.c \
>> +	mpcpucache.c tests/ \
>
> In addition to adding more source files, don't we want/need to add some new
> -D options to the CC commands?
>
> Also, have you tested that mpi is still buildable from mpi's makefile 
> since mpmontg was modified to use NSPR symbols?  (I suspect it isn't)
> 

I have not built with the MPI's makefiles, what are the instructions for doing so?

I have built mpcpucache.c with just a CC or gcc command line on the platforms I
tested on (cc -o mpcpucache[.exe] mpccpucache.c -DTEST_IT). I've made sure the
native compilier set flags are uses whenever possible.

In mpmontg.c I have had to rely on NSPR to get the endianess of the platform. If
there is a better way to do this in the mpi build environment, I'd be happy to
use it. The code is set up so if mpi supplies these values, NSPR isn't used.

>
>>+#define  MP_PROGERR      -6 /* internal programming error in mpi  */
> 
> Do we really need this new value?  
> Will this value get reflected out to the user?	
> Will it be preserved out through the blapi layer, then through the
> PKCS11 layer, etc?  I'd prefer that we use assertions, and be 
> confident that our code is correctly coded, and show that in our 
> error space.  I particularly don't want to increase the number of 
> error paths that must be considered by the users of MPI, especially
> if they have no effect on what the user sees. 

Here's its only use. I was trying to match what appears to be the normal MPI
Checking usage. I have no problem changing it to simply an assert.

  /* if the accum1 isn't set, then either j was out of range, or our logic
   * above does not populate all the powers values. either case it shouldn't
   * happen and is an internal mpi programming error */
  ARGCHK(MP_USED(&accum1) != 0, MP_PROGERR);

> >+#include "mp_gf2m-priv.h"

> That last header is an ECC header.  
> Is mpmontg now calling any ECC functions?  
> If not, why is it including that header file?

To get the definition of MP_DIGIT_BITS
It wasn't clear that there was anything special about mp_gf2m-priv.h from the
comments. Given your description of how it works, it looks like it's best just
to move MP_DIGIT_BITS definition to mpi-priv.h


>> #endif
>>+#include <stddef.h> /* ptrdiff_t */
> Is stddef.h available on all our platforms?  Including windows and OS/2?
> Perhaps that #include may need to be ifdeffed by platform, with a 
> different file name on some.  But I'm not sure.

It's certainly on windows (windows was the first platform I built and tested
this code on). I would be pretty surprised if it wasn't also in OS/2. It
replaces the NSPR from a previous patch.
stddef.h is part of C89 (like stdio.h and ctype.h), and ptrdiff_t is has been
defined since C89, so it may not be there for really old platforms (win16, maybe
old Mac 68k, but I would be surprised if it's not defined for something modern).

>>+/* need to know endianness of this platform. If we aren't told, get it from
>>+ * nspr... */
>>+#ifdef MP_CHAR_STORE_SLOW
>>+#if !defined(IS_BIG_ENDIAN) && !defined(IS_LITTLE_ENDIAN)
>>+#include "prcpucfg.h"
>>+#endif
>>+#endif

>OK, this is my single biggest problem with this patch.	
>It makes mpi depend on NSPR headers.  
>So far, mpi has not depended on any NSPR headers, and can be built 
>stand-alone with mpi's own makefile, without needing any other files 
>from NSS or NSPR.  Today one can cvs checkout nss/lib/freebl/mpi and
>build it without any other NSS or NSPR sources.  I'd like to preserve 
>that.  Is there another way to find out the endianness?
>
>At the very least, it should remain possible for the mpi makefile to build
>MPI that runs correctly and uses the available performance techniques 
>without using NSPR.  If achieving that will add more than a few days to
>the schedule, then some compromise can be arranged.

I'd be happy to use it if you can supply one. the only places in freebl that
need Endianness or out of mpi. I tried to construct the ifdef so that if you
supply the endianness at build time, somehow, I don't have resort to grabbing
the NSPR definition. Does MPI use autoconfig (I think autoconfig has dynamic
detection of endianness that could be used to set the right variable in the
build environment?).


.... this ends my partial response. I'll adjust most of these in my tree
shortly. I'm willing to take suggestions on how to deal with the endianness issue.
This patch was tested on Redhat 32/64 x86; Solaris 32/64 x86 with both gcc and
studio; Solaris sparc.

Detailed test results will follow shortly.

The patch rolls everything together except the testing patch to rsaperf.
Attachment #196856 - Attachment is obsolete: true
Attachment #196873 - Attachment is obsolete: true
Attachment #196929 - Attachment is obsolete: true
I'd like to avoid pasting large graphs and duplicating Neil's work in the bug,
so I'll just summarize my results.

In 32-bit builds, regardless of arch (x86 vs sparc) or OS (RH x86 vs Solaris
x86) or compiler (gcc vs studio), adding Bob's code to the trunk had no
measurable effect on performance.  Enabling or disabling his code with the new
freebl API had almost no effect either.

In 64-bit builds, however, the story is quite different.  As Neil reported, RSA
performance does improve several (1-5%) percent when adding Bob's patch and
disabling his code with the freebl API.  When enabling his code in gcc builds,
the results range from a 1.5% degradation from the baseline (trunk with no
patches) on RH Linux 3 AMD64 to a tiny improvement on Solaris AMD64.  I didn't
get quite the improvements that Neil did.  I think that's because of the cache
differences between our respective AMD64 hardware.

On Solaris Studio AMD64 builds using the assembly code attached to the bug, I
found a 4-6% improvement, which I also attribute to icache peculiarities.

Bottom line: the most damage this code can do to RSA performance is about two
percent, while it actually helps in most 64-bit environments.
I removed the assembly from the large patch because it was enormous.  Also, the
DOS-version replacement for mpcpucache.c that I received was causing the entire
file to be diffed, which is obviously useless.

In the process of combining several of Bob's patches, I only fixed syntax
errors and build problems on several platforms.  I didn't change the core C
code, except to add cpuid for AMD64.  That is why I obsoleted the other
patches.  Sorry for the inconvenience(s).
Attachment #197477 - Attachment is obsolete: true
Comment on attachment 197495 [details] [diff] [review]
same as above, minus the assembly; mpcpucache converted from dos format

Some comments on the patch for this one file:
>Index: mpi/mpcpucache.c

>-#if defined(i386) || defined(__i386) || defined(__X86__) || defined (_M_IX86)
>+#if defined(i386) || defined(__i386) || defined(__X86__) || defined (_M_IX86) || defined(__x86_64__)
> /* X86 processors have special instructions that tell us about the cache */
> #include "string.h"
> 
>+#if defined(__x86_64__)
>+#define AMD_64 1
>+#endif

Why define a synonym for __x86_64__ ?  
Why not just use __x86_64__ instead of AMD_64 below?
I could understand if you were creating a single symbol that meant that 
any of those 5 other *x86* symbols was defined, but it appears that 
AMD_64 is just a synonym for one of them. 

> {
> /* sigh GCC isn't smart enough to save the ebx PIC register on it's own
>  * in this case, so do it by hand. */

Is that because the second output argument is specified below 
as =r instead of =b ?

> 	__asm__("pushl %%ebx\n\t"
>-	        "cpuid\n\t"
>-                "mov %%ebx,%1\n\t"
>-                "popl %%ebx\n\t"
>+		  "cpuid\n\t"
>+		  "mov %%ebx,%1\n\t"
>+		  "popl %%ebx\n\t"
> 		: "=a" (*eax),
> 		  "=r" (*ebx),
> 		  "=c" (*ecx),
> 		  "=d" (*edx)
> 		: "0" (op));
> }


>@@ -198,17 +218,18 @@
>     CacheType type;
>     unsigned long data;
> #define pageSize size
> #define trcuops size
>     unsigned long size;
>     unsigned long association;
> #define tlbEntries lineSize
>     unsigned long lineSize;
>-} CacheMap[] = {
>+};
>+static const struct _cache CacheMap[256] = {
> /* 00 */ {Cache_NONE,    DATA_NONE,    0,       0,    0   },
> /* 01 */ {Cache_TLB,     DATA_INSTR,   TLB_4k,  4,    32  },
> /* 02 */ {Cache_TLB,     DATA_INSTR,   TLB_4M,  0,    2   },

This table is huge (10KB on AMD64).  Can it be made smaller?
Could any of those be reduced from longs (which are 64b on AMD64) 
to ints or shorts or even chars?


>-/* target.mk can define LINE_SIZE if it's common for the family or OS */
>-#if defined(LINE_SIZE) && !defined(MPI_GET_PROCESSOR_LINE_SIZE_DEFINED)
>+/* target.mk can define MPI_CACHE_LINE_SIZE if it's common for the family or 
>+ * OS */
>+#if defined(MPI_CACHE_LINE_SIZE) && !defined(MPI_GET_PROCESSOR_LINE_SIZE_DEFINED)

target.mk is an mpi makefiles that's not used in freebl builds.
Maybe you want to say "The Makefile can define ..."
>>+#if defined(__x86_64__)
>>+#define AMD_64 1
>>+#endif
>
> Why define a synonym for __x86_64__ ?  
> Why not just use __x86_64__ instead of AMD_64 below?

Basically because I wasn't sure that __x86_64__ was going to be sufficient. If
we run into another compiler with has a different define for 64 bit x86, this
makes adding that define easier.

>> /* sigh GCC isn't smart enough to save the ebx PIC register on it's own
>>  * in this case, so do it by hand. */
>
> Is that because the second output argument is specified below 
> as =r instead of =b ?

Other way around. If the parameter is =b and PIC is turned on GCC complains
because it doesn't know how, or doesn't want to, save the ebx, so we have to
save it by hand, and use another register (which at this point we really don't
care which, thus the =r.

>>+static const struct _cache CacheMap[256] = {
>> /* 00 */ {Cache_NONE,    DATA_NONE,    0,       0,    0   },
>> /* 01 */ {Cache_TLB,     DATA_INSTR,   TLB_4k,  4,    32  },
>> /* 02 */ {Cache_TLB,     DATA_INSTR,   TLB_4M,  0,    2   },
>
> This table is huge (10KB on AMD64).  Can it be made smaller?
> Could any of those be reduced from longs (which are 64b on AMD64) 
> to ints or shorts or even chars?

Yes, this is a generic table. Not only can these vaules be changed to
ints/shorts and chars, several of the fields could be removed. We really only
care about DATA L1, L2, and L3 caches, and only the LineSize, not the actual
cache size.

There are other more intrusive ways to reduce the table size (since the table is
sparse) as well, but just doing the above (and storing them in chars) reduces
the table down to 512 bytes.

bob


(In reply to comment #40)

> Basically because I wasn't sure that __x86_64__ was going to be sufficient. If
> we run into another compiler with has a different define for 64 bit x86, this
> makes adding that define easier.

OK, there already is one other known symbol for AMD64.  __x86_64

> >> /* sigh GCC isn't smart enough to save the ebx PIC register on it's own
> >>  * in this case, so do it by hand. */
> >
> > Is that because the second output argument is specified below 
> > as =r instead of =b ?
> 
> Other way around. If the parameter is =b and PIC is turned on GCC complains
> because it doesn't know how, or doesn't want to, save the ebx, so we have to
> save it by hand, and use another register (which at this point we really don't
> care which, thus the =r.

Is the ebx register not used for this purpose in the 64-bit ABI?
In the AND_64 part of this patch, changing that =r to =b solved numerous 
problems, including the saving/restoring of ebx.  I'm wondering why it 
would do that for 64 bit code and not 32.

> > This table is huge (10KB on AMD64).  Can it be made smaller?
> > Could any of those be reduced from longs (which are 64b on AMD64) 
> > to ints or shorts or even chars?
> 
> Yes, this is a generic table. Not only can these vaules be changed to
> ints/shorts and chars, several of the fields could be removed. We really only
> care about DATA L1, L2, and L3 caches, and only the LineSize, not the actual
> cache size.
> 
> There are other more intrusive ways to reduce the table size (since the 
> table is sparse) as well, but just doing the above (and storing them in 
> chars) reduces the table down to 512 bytes.

Please do that.  We'll have to respin the .s files, but the savings is 
worth the effort.  
>> Other way around. If the parameter is =b and PIC is turned on GCC complains
>> because it doesn't know how, or doesn't want to, save the ebx, so we have to
>> save it by hand, and use another register (which at this point we really don't
>> care which, thus the =r.
>
> Is the ebx register not used for this purpose in the 64-bit ABI?
> In the AND_64 part of this patch, changing that =r to =b solved numerous 
> problems, including the saving/restoring of ebx.  I'm wondering why it 
> would do that for 64 bit code and not 32.

At a guess, it's because PIC is not turned on in 64 bit (I have no idea why it's
on for freebl for 32 bit). The =b code works just find in 32 bit as well if PIC
is turned off. (the original test programs I wrote using cpuid do not use the
nasty 'hand save ebx'. That is why I suggested that as a possible fix for 64 bit
(as a result we wind up with no explicit register references, so we don't have
to figure you if we need to push ebx or push rbx...).

> > > This table is huge (10KB on AMD64).  Can it be made smaller?
> > Yes, this is a generic table. 
> Please do that.  We'll have to respin the .s files, but the savings is 
> worth the effort.

Will do.

Some info on the __xxxcpu and __xxxcpu__ macros:

I've found that __xxxcpu is defined by both the
native (i.e., OS vendor's) compiler and GCC.
In addition, GCC also defines __xxxcpu__.

xxxcpu, without any leading underscores, is considered
to violate some ANSI or ISO C or POSIX requirement,
so compilers generally don't use macros with such
names.  But NSPR and NSS's build systems sometimes
define xxxcpu on the compiler command line (with a
-Dxxxcpu flag).

So between __x86_64 and __x86_64__, I believe that
__x86_64 is better.
Actually, PIC is turned on for 64-bit builds too.  It seems that the difference
between 32-bit and 64-bit builds on x86 is just the number of registers the
compiler is willing to use for temps.  In fact, if you compile the same 64-bit
code that calls cpuid with gcc -O3 instead of gcc -O2, you get compile errors. 
The code works on all platforms as it is now, so I vote that we leave the inline
assembly for cpuid alone.

We should solve the x86_64 define problem as it is already solved in loader.c
and sha_fast.h - test for __x86_64__ OR __x86_64.  Checking both of these
defines justifies the AMD_64 variable currently in mpcpucache.c.

Finally, I'd like to test the performance of the new mpcpucache.c file once we
reduce the size of the cache map array.  Unfortunately, we've seen a lot of very
unpredictable effects due to caching, so I wouldn't be surprised if performance
decreased even though we are removing code.  Personally, I'm not too worried
about the readability of an assembly file, especially the .data section.
Here's the replacement with the smaller array size.
Old array size 4*5*256 or 5k on 32bit platforms, 8*5*256 or 10k on 64bit
platforms to 1*2*256 or 512 bytes on all platforms.
(In reply to comment #35)

> I have not built with the MPI's makefiles, 
> what are the instructions for doing so?

Set an environment variable TARGET to one of these values:
mipsIRIX
alphaOSF1
v9SOLARIS
v8plusSOLARIS
v8SOLARIS
ia64HPUX
ia64HPUX64
PA2.0WHPUX
PA2.0NHPUX
PA1.1HPUX
32AIX
64AIX
x86LINUX
AMD64SOLARIS
(hmm, windows seems to be missing from that list :/  )

Then

cd nss/lib/freebl/mpi
./all-tests 

That shell script builds mpi and runs MPI's test suite against it.
Comment on attachment 197733 [details] [diff] [review]
Add __x86_64 to our defines, reduce the array size 

r=nelson.  Saul, please verify that the big table is now only 512 bytes total
size, and regenerate the .s files for it to be sure.
Attachment #197733 - Flags: review+
(In reply to comment #47)
> (From update of attachment 197733 [details] [diff] [review] [edit])
> r=nelson.  Saul, please verify that the big table is now only 512 bytes total
> size, and regenerate the .s files for it to be sure.
> 

I tested all builds for AMD64 and Sparc 64-bit.  Everything looks good. 
Performance even went up slightly with gcc.  The new .s file has 512 entries in
the cache map, all of which are .byte.
Comment on attachment 197495 [details] [diff] [review]
same as above, minus the assembly; mpcpucache converted from dos format

Bob, the attachment that you've asked me to review is now marked obsolete.  Is
*this* attachment the one I should be reviewing now?
Comment on attachment 197733 [details] [diff] [review]
Add __x86_64 to our defines, reduce the array size 

I've checked in this version of the patch, since it's got a r+, is seperable
(it doesn't depend on other changes, but other changes depend on it). It will
reduce the noise if we have to add to the patch. Saul you can pick it up
directly from CVS.
/cvsroot/mozilla/security/nss/lib/freebl/mpi/mpcpucache.c,v  <--  mpcpucache.c
new revision: 1.3; previous revision: 1.2
done
Comment on attachment 196873 [details] [diff] [review]
replacement full patch. includes fixed mpi-priv.h and mpmontg.c

Go ahead and finish your review on this patch. The essitial stuff that needs to
be reviewed shouldn't have changes (except mpcpucache.c which you've already
r+).

You've already given several comments to me which I've been incorporating, so
it's clear there will have to be a new version of the mpmontg.c code and some
header files.

The other patch is one that saul rolled up his changes in. We should probably
split those out as a separate patch for review. We can check in all the
Makefile and .s files before mpmontg.c

bob
Attachment #196873 - Attachment is obsolete: false
Bob, I am now reviewing your weaving code, and I just noticed this comment:

 * on some platforms character stores into memory is very expensive since they
 * generate a read/modify/write operation on the bus. On those platforms
 * we need to do integer writes to the bus.

To eliminate RMW cycles, on those platforms we need to do writes of the largest
integer size, which is likely to be "long" or ptrdiff_t, not int.  That is,
64-bits, not 32.  I'm not certain, but I think your code is always using 
32-bits for that case.  On AMD64, that won't eliminate any RMWs.  It might 
reduce their number to 1/4 of what they would be with byte writes, but it 
still won't eliminate them.
Attached patch Updated Weave patch. (obsolete) — Splinter Review
The following patch implements my proposals to address nelson's issues.
This patch does not include the mpcpucache code because it is checked in.
It also does not have the AMDX86 and Solaris .s files and the Makefile changes for them since I don't have systems to generate the .s files or to test the result.

The option to turn off safe modexp has been removed based on feedback I have received. As a result mpi-priv.h is the only header that has been modified under all of freebl.

The bulk of the changes are still in mpmontg.c. This patch shows the changes against the tip. I will attach an additional patch for reference which will show the differences between this patch that the previously reviewed patch.

bob
Attachment #196873 - Attachment is obsolete: true
Attachment #202738 - Flags: review?(nelson)
Attachment #196873 - Flags: review?(nelson)
This bug needs to be P1, since we won't ship 3.11 without the fix.
Priority: -- → P1
Comment on attachment 202738 [details] [diff] [review]
Updated Weave patch.

Bob, this patch contains changes to two files in cmd/rsaperf.
The connection between those 2 changes and the weaving code is not apparent.
So, I plan to exclude those two files from my review, unless you tell
me how they're related to weaving.

>Index: cmd/rsaperf/defkey.c
>Index: cmd/rsaperf/rsaperf.c
Ah, yes. They are tangentially related to weaving. The patch creates a static 2k RSA key, which was useful in testing performance. It is not necessary to get these into 3.11.

bob
Attached patch Incorperate Nelson's comments. (obsolete) — Splinter Review
Attachment #202738 - Attachment is obsolete: true
Attachment #203766 - Flags: review?(nelson)
Attachment #202738 - Flags: review?(nelson)
inter patch diff should work on the latest patch.

bob
Comment on attachment 203766 [details] [diff] [review]
Incorperate Nelson's comments.

2 unresolved comment issues (for which I wouldn't hold up this patch),
and one logic change that I think was unintentional.

>+ * Our same logical array p array, m is sizeof(mp_digit),
>+ * N is still count and n is now b_size. If we define p[i].digit[j]0 as the 
>+ * most significant byte of the word p[i].digit[j], p[i].digit[j]1 as 
>+ * the next most significant byte of p[i].digit[j], ...  and p[i].digit[j]m 
>+ * is the least significant byte.our array would look like:
>+ * p[0].digit[0]0   p[1].digit[0]0   ...... p[N-1].digit[0]0   p[N].digit[0]0
>+ * p[0].digit[0]1   p[1].digit[0]1   ...... p[N-1].digit[0]1   p[N].digit[0]1
>+ *                .                                         .
>+ * p[0].digit[0]m   p[1].digit[0]m   ...... p[N-1].digit[0]m   p[N].digit[0]m
>+ * p[0].digit[1]0   p[1].digit[1]0   ...... p[N-1].digit[1]0   p[N].digit[1]0
>+ *                .                                         .
>+ *                .                                         .
>+ * p[0].digit[n]m-1 p[1].digit[n]m-1 ...... p[N-1].digit[n]m-1 p[N].digit[n]m-1
>+ * p[0].digit[n]m   p[1].digit[n]m   ...... p[N-1].digit[n]m   p[N].digit[n]m 

This table has the same problem as the previous one had.  Arrays are shown
going from 0..n and 0..m but should be 0..n-1 (definitely) and 0..m-1 (I think).


>+    } else {
>+      /* up to 8 we can find 2^i-1 in the accum array, but at 8 we our source
>+       * and target are the same so we need to copy.. After that, the
>+       * value is overwritten, so we need to fetch it from the stored
>+       * weave array */
>+      if (i > WEAVE_WORD_SIZE) {

That last line was "if (i > 8) {" in the previous patch, and I believe it should now 
be  "if (i > 2*WEAVE_WORD_SIZE) {" in this patch.  That seems like an unintended 
logic change.  

I won't hold this patch up for any more comment changes.  When you've resolved the 
line above, I'll give it r+.  I think we have to respin the release candidate 
because of problems on HPUX and Linux.  So, there's time to get this fix into 
3.11, I think.
Attachment #203766 - Flags: review?(nelson) → review-
That's for the quick review. on the previous patch nelson..
Attachment #203766 - Attachment is obsolete: true
Attachment #203831 - Flags: review?
Comment on attachment 203831 [details] [diff] [review]
fix missing 2x in front of MP_WEAVE_WORD_SIZE

Sigh. actual mark this as request for review.....
Attachment #203831 - Flags: review? → review?(nelson)
Comment on attachment 203831 [details] [diff] [review]
fix missing 2x in front of MP_WEAVE_WORD_SIZE

This is not the patch!
Attachment #203831 - Attachment is obsolete: true
Attachment #203831 - Flags: review?(nelson)
OK this should be the correct patch
Attachment #203849 - Flags: review?
Comment on attachment 203849 [details] [diff] [review]
fix missing 2x in front of MP_WEAVE_WORD_SIZE

Sigh this one should be correct now..
Attachment #203849 - Flags: review? → review?(nelson)
I carried forward parts of Saul's patch.  Bob, please review.
Attachment #203896 - Flags: review?(rrelyea)
Comment on attachment 203896 [details] [diff] [review]
supplemental patch for .s files for Sun Studio compilers

Initial developer should read: Red Hat, Inc.

other than that r+
Attachment #203896 - Flags: review?(rrelyea) → review+
Comment on attachment 203849 [details] [diff] [review]
fix missing 2x in front of MP_WEAVE_WORD_SIZE

I am giving r+ to a subset of this patch, namely the 4 files changed in nss/lib/freebl/mpi.  
This patch will be suplemented by the following patch.
Attachment #203849 - Flags: review?(nelson) → review+
Checked in "Supplemental Patch"
Checking in Makefile;    new revision: 1.70; previous revision: 1.69
Checking in manifest.mn; new revision: 1.44; previous revision: 1.43
Checking in mpi/mpcpucache_amd64.s; initial revision: 1.1
Checking in mpi/mpcpucache_x86.s;   initial revision: 1.1
In this morning's nightly builds, shlibsign crashed on some 64-bit platforms,
notably AIX and HPUX.  The problem is in the definition of the macro 
MP_ALIGN.  It produces a 32-bit result.  This patch fixes it.

-#define MP_ALIGN(x,y) ((((ptrdiff_t)(x))+((y)-1))&(~((y)-1)))
+#define MP_ALIGN(x,y) ((((ptrdiff_t)(x))+((y)-1))&(((ptrdiff_t)0)-(y)))

The issue is that the type of the expression (~((y)-1)) is the type of y.
If y is an unsigned int, then the expression is an unsigned 32-bit value.
That value then gets converted to a ptrdiff_t, to be anded with the other
expression, but since it is unsigned, the sign bit is not extended, so
the and always produces a result in which only the low order 32 bits 
can be non-zero.

The fix is to change the expression to one that computes a mask of type
ptrdiff_t, rather than a mask of the type of "y".  That's what this patch
does.  
r=relyea for nelson's patch.
NOTE: nelson's patch only works for 2's complement platforms. That's true of all the NSS supported platforms (and most systems), so we should go with it.

We might want put a comment for the poor fool that is tasked to port this to an old Cray...;), though platforms that use 1's complement are often the some platform where byte != 8 bits... I'm pretty sure we would break pretty horribly on the latter anyway.
^the some^the same^
Why don't we use ~((ptrdiff_t)(y)-1)) instead of
((ptrdiff_t)0)-(y))?
In answer to comment 75, because it's one ALU operation instead of two.
We're continuing to test this.  
Christophe, this is the patch you wanted.
Attachment #203993 - Flags: review?(rrelyea)
Comment on attachment 203993 [details] [diff] [review]
Fix bug in MP_ALIGN macro

r=relyea
Attachment #203993 - Flags: review?(rrelyea) → review+
Checking in mpmontg.c;  new revision: 1.17; previous revision: 1.16

Checked in MP_ALIGN fix.  In addition to the shlibsign failures on HPUX and AIX,
we saw a crash in selfserv on AMD64 with RHEL OS.  
What, if anything, remains to be done for this bug?
Was this bug resolved for NSS 3.11 ?
yes, closing it now..
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
While reviewing one of Douglas Stebila's patches to mpi, I made some startling 
discoveries.  I now think that we must not be defining MP_CHAR_STORE_SLOW on 
ANY platforms, because the code simply won't compile if we do.  

Rather than reopening this bug, I will file a new one. 
It appears that the weaving does not take effect for all versions of freebl. For example, the two freebl versions that use floating point on Sparc don't go through the "safe" mp_exptmod_safe_i function . Do we need to improve the floating point code as well WRT cache attacks ?
And are there other freebl versions that don't have this fix ?
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: