Closed Bug 270742 Opened 20 years ago Closed 20 years ago

Incorporate AMD64 assembler implementation of arcfour

Categories

(NSS :: Libraries, enhancement, P2)

3.9.3
Other
Solaris
enhancement

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nelson, Assigned: julien.pierre)

Details

Attachments

(2 files, 2 obsolete files)

Mr. Marc Bevand <bevand_m (at) epita.fr> has written a hand-optimized assembly code implementation of ARCFOUR for the 64-bit X86 cpus, such as the AMD64. It's FAST. He has graciously agreed to contribute his code to mozilla. With his permission, I have taken his original source, and modified it to use mozilla's "tri-license" boilerplate. I have produced two versions of the source, one that assembles with Gnu's "gas" assembler and one that assembles with Solaris's as assembler, both for AMD64. I have modified the freebl Makefile and arcfour.c to work with the assembler version. I will attach a patch with the new sources and the modifications to the existing sources to this bug.
Attached patch patch v1 (obsolete) — Splinter Review
The original code is described in http://etud.epita.fr/~bevand_m/papers/rc4-amd64.html and is available as a link from that page.
Comment on attachment 166411 [details] [diff] [review] patch v1 I ask the reviewers to particularly focus on the makefile and arcfour.c. I chose to checkin two copies of the assembler sources conforming to the two assemblers' different syntaxes. It would also be possible to use a sed script to derive one source from the other at build time, but that would require larger changes to the makefiles and/or coreconf. If the reviewers think that that alternative approach is preferable, then I invite comment on the best/preferred way to modify the makefiles to handle sources derived at build time. Presumably, the modified sources (derived from the ones checked in) would want to be created in OBJDIR. It appears to me that, presently, coreconf has no rules to build from sources in OBJDIR. Do we want to add them?
Attachment #166411 - Flags: superreview?(wchang0222)
Attachment #166411 - Flags: review?(julien.pierre.bugs)
target = 3.10
Status: NEW → ASSIGNED
Priority: -- → P2
Target Milestone: --- → 3.10
Comment on attachment 166411 [details] [diff] [review] patch v1 The cipher regression tests (bltest) all pass with this patch in place, but each of the SSL3/RC4 ciphersuite tests fails in all.sh. So, I am cancelling the review requests and obsoleting the patch until that is fixed. Sorry. :-(
Attachment #166411 - Attachment is obsolete: true
Attachment #166411 - Flags: superreview?(wchang0222)
Attachment #166411 - Flags: review?(julien.pierre.bugs)
Attached patch patch v2 (obsolete) — Splinter Review
This patch is the same as the previous one for the Makefile and the two new .s files. The modification to arcfour.c has changed slightly to ensure that the correct output length value is returned.
Comment on attachment 166524 [details] [diff] [review] patch v2 Reviewers, please review the changes to Makefile and arcfour.c. As with the previous patch, I am particularly interested in your opinion regarding the use of two separate .s files for the two relevant assemblers, as opposed to one source file and a more complicated Makefile.
Attachment #166524 - Flags: superreview?(wchang0222)
Attachment #166524 - Flags: review?(julien.pierre.bugs)
Comment on attachment 166524 [details] [diff] [review] patch v2 Two separate .s files are fine. 1. Makefile Solaris 10 for AMD64: it's better to just say "Solaris". Instead of ifneq ($(USE_64),1) # Solaris x86 ... else # Solaris 10 for AMD64 ... endif it is clearer to say ifeq ($(USE_64),1) # Solaris for AMD64 ... else # Solaris x86 ... endif >+# AS = gcc -march=opteron -m64 -c >+# AS = as -xarch=amd64 -K PIC We should just remove the two commented-out lines. Why do you use -xarch=generic64 instead of -xarch=amd64? 2. arcfour.c >+#ifdef __amd64 >+#define NSS_ASSEM_ARCFOUR >+#endif We should do this in the Makefile. >+#if defined(AIX) || defined(OSF1) || defined(NSS_ASSEM_ARCFOUR) > /* Treat array variables as longs, not bytes */ > #define USE_LONG > #endif Are you sure this will be true for all assembler implementations of ARCFOUR? >+#if defined(NSS_ASSEM_ARCFOUR) >+ Stype i; >+ Stype j; >+ Stype S[ARCFOUR_STATE_SIZE]; >+#else > Stype S[ARCFOUR_STATE_SIZE]; > PRUint8 i; > PRUint8 j; >+#endif We should just say: > Stype S[ARCFOUR_STATE_SIZE]; >+#if defined(NSS_ASSEM_ARCFOUR) >+ Stype i; >+ Stype j; >+#else > PRUint8 i; > PRUint8 j; >+#endif >+#if !defined(NSS_ASSEM_ARCFOUR) > /* > * Generate the next byte in the stream. > */ ... >+#else >+extern void ARCFOUR(RC4Context *cx, unsigned long inputLen, >+ const unsigned char *input, unsigned char *output); >+#endif /* NSS_ASSEM_ARCFOUR */ I think it is clearer to reorder this and say #if defined(NSS_ASSEM_ARCFOUR) instead. >+#if defined(NSS_ASSEM_ARCFOUR) >+ ARCFOUR(cx, inputLen, input, output); >+ *outputLen = inputLen; >+ return SECSuccess; The indentation of the two "*outputLen = inputLen;" lines seems to be off by 2.
Nelson, The Makefile also needs to be fixed so that this code can build on Linux AMD64 .
Attachment #166524 - Flags: review?(julien.pierre.bugs) → review-
Comment on attachment 166524 [details] [diff] [review] patch v2 removing SR request, since julien gave it r-
Attachment #166524 - Flags: superreview?(wtchang)
Attached patch patch v3Splinter Review
Differences between this patch and the previous one: 1) The previous patch added only one new "feature test macro" to the sources. This patch adds a second one also, to separate changes that are specific to the assembler language portion of this patch from changes that relevant to CPUs with slow byte-write capabilities. 2) This patch supports linux on AMD64. 3) This patch addresses the ifdef ordering issue nd removes unneeded lines. 4) This patch changes the type of the "i" and "j" members of the arc4 context on all platforms to "Stype". This change only affects systems on which Stype is not PRUint8, which presently are only CPUs with slow byte writes. It's a change that should have been made before. I *AM NOT* yet asking for this to be reviewed, because I want to test this patch on a few more CPUs first, to make certian it doesn't introduce any new build issues. (But I won't mind if someone reviews it now anyway :-)
Attachment #166524 - Attachment is obsolete: true
Comment on attachment 169498 [details] [diff] [review] patch v3 All.sh passes on all platforms with this patch, so I am now asking for a review.
Attachment #169498 - Flags: review?(wtchang)
Comment on attachment 169498 [details] [diff] [review] patch v3 Nelson, Please use x86-64 or x86_64 instead of amd64 in as many places as possible. The reason is that Intel has a processor architecture called EM64T that's compatible with AMD64. The x86-64 architecture is more general and covers both AMD and Intel CPUs. In Makefile: >+# Solaris x86 or Solaris 10 for AMD64 Change "Solaris 10 for AMD64" to "Solaris for AMD64". >+else > # Solaris x86 >-ifneq ($(USE_64),1) > DEFINES += -D_X86_ > DEFINES += -DMP_USE_UINT_DIGIT > DEFINES += -DMP_ASSEMBLY_MULTIPLY -DMP_ASSEMBLY_SQUARE -DMP_ASSEMBLY_DIV_2DX1D > ASFILES = mpi_i86pc.s > endif The four surviving lines should probably be moved left now that the ifneq is removed? In arcfour.c >-#if defined(AIX) || defined(OSF1) >-/* Treat array variables as longs, not bytes */ >+#if defined(AIX) || defined(OSF1) || defined(__amd64) __amd64 should probably be __x86_64. Since many operating systems other than Solaris and Linux also support x86-64, are you sure this will be correct for any OS running on x86-64? > struct RC4ContextStr > { >+#if defined(NSS_ARCFOUR_IJ_B4_S) >+ Stype i; >+ Stype j; >+ Stype S[ARCFOUR_STATE_SIZE]; >+#else > Stype S[ARCFOUR_STATE_SIZE]; >- PRUint8 i; >- PRUint8 j; >+ Stype i; >+ Stype j; >+#endif > }; Is the ordering of i, j, and S important? >- * Straight RC4 op. No optimization. >+ * Straight ARCFOUR op. No optimization. I know why you changed RC4 to ARCFOUR here. But this change seems gratuitous when our code is still full of identifiers containing rc4 or RC4. >+#if defined(NSS_ASSEM_ARCFOUR) >+ ARCFOUR(cx, inputLen, input, output); >+ *outputLen = inputLen; >+ return SECSuccess; >+#elif defined( CONVERT_TO_WORDS ) The indentation of the third line seems to be off by two spaces. >+#if defined(NSS_ASSEM_ARCFOUR) >+ ARCFOUR(cx, inputLen, input, output); >+ *outputLen = inputLen; >+ return SECSuccess; >+#elif defined( CONVERT_TO_WORDS ) Same as above.
Wan-Teh, thanks for the quick review. Let me answer some of your questions and explain a few inobvious things. 1. In the Makefile: The "four surviving lines" are properly indented. The lines before the "else" line that preceeds these lines probably should be indented another notch to match the lines after the else. 2. About the applicability of these patches to other 64-bit x86 family CPUs besides AMD64. Based on test results reported to me, it appears that the AMD64 CPU does read-modify-write cycles to write to memory any amount other than a properly aligned 64-bit word. This behavior makes the use of intermediate and temporary variables (such as those in the ARCFOUR context) whose size is not 64-bits MUCH less less efficient than if they are extended to 64-bit size. Expanding the size of those variables to 64-bits helps AMD64 CPUs a lot. AFAIK, this behavior (use of RMW memory cycles) is not required for all 64-bit X86 family CPUs. Other such CPUs may be able to store smaller amounts of memory more efficiently than with full word RMW cycles, and so the effects of this code may not benefit other 64-bit x86 CPUs' performance in the same way as it benefits amd64. Consequently, I am reluctant to apply these changes to all 64-bit x86 family CPUs. Also, I have no other 64-bit x86 family CPUs on which to test them. However if someone is willing to test the relative performance of this patch on other x86 CPUs, and demonstrate that it also benefits them, I am willing to change this code to also work for those CPUs. Finally, before changing to __x86_64 I need to be sure that that symbol is defined by the compilers used on Solaris. I know that __amd64 is defined. 3. Regarding the order of the variables I, J, and the array S in the context. These variables are initialized by one function, that is always c code, and are handled in another function that is either c code or assembly code. The AMD64 assembly code was written to assume that I and J come before S, and so the c language definition of this struct must match the assembly code's layout. In my previous patch, the order of these variables was tied to the presense/absence of assembly code. In your review comments for that patch, you asked if all assembly language implementations would likewise assume that i and j come before S. Of course, They would not. So, based on your review comments, I decided to make the c code be able to work with assembly code that uses either layout, and not to tie the order of these variables to the presence/absense of assembly code. 4. regarding indentation issues in c code: Let me suggest that you look at bugzilla's side-by-side diff display of the patches for these. e.g. https://bugzilla.mozilla.org/attachment.cgi?id=169498&action=diff In this patch, I believe the indentation is correct, but this is not obvious in an examination of the diff itself because of the addition of an extra column or two at the beginning of each line of text by cvs diff. This extra column often causes lines that use tabs to appear to be indented differently than lines that use leading spaces, but when the patch is applied and the extra columns (of + and -) are removed, the text appears correctly indented. The bugzilla diff display does a great job of clearing that up.
Here's how I would like to move forward with this bug. If this patch is correct for all existing supported platforms and does not cause build failures, incorrect results, or performance regressions (compared to the unpatched code) for any platform, then I would like to check it in as is (perhaps indenting a few lines in the Makefile by one more notch). If it is determined that other CPUs can and should also take advantage of this new code, then the makefile and/or code ifdefs can *subsequently* be changed to include those other CPUs/compilers also, by people who have those other CPUs and/or compilers with which to test. I do not have those other platforms' CPUs and/or compilers with which to test, and I do not wish to take responsbility for optimal builds and performance on those platforms.
Comment on attachment 169498 [details] [diff] [review] patch v3 In arcfour.c, we have >+#if defined(AIX) || defined(OSF1) || defined(__amd64) >+/* Treat array variables as longs, not bytes, on CPUs that take >+ * much longer to write bytes than to write longs. >+ */ > #define USE_LONG > #endif Is this change correct for the operating systems (such as FreeBSD and NetBSD) that have been ported to AMD64 but aren't using the assembly code in arcfour-amd64-xxx.s? Regarding AMD64 vs. Intel EM64T: the makefile changes aren't distinguishing between these two flavors of x86-64. For example, the Linux makefile code says: ifeq ($(OS_TARGET),Linux) +ifeq ($(CPU_ARCH),x86_64) +# AMD64 This test will evaluate to true for Intel EM64T as well. So, Linux on Intel EM64T will be using arcfour-amd64-gas.s The Solaris makefile code says: ifeq ($(OS_TARGET),SunOS) ... ifeq ($(CPU_ARCH),sparc) ... else + +# Solaris x86 or Solaris 10 for AMD64 ... +ifeq ($(USE_64),1) +# Solaris for AMD64 I am afraid that this test will also evaluate to true for Intel EM64T, but you may not need to worry about that now if Solaris hasn't been ported to Intel EM64T.
Comment on attachment 169498 [details] [diff] [review] patch v3 r=wtc. I am not that sure about the use of __amd64 in arcfour.c. I'll feel more comfortable if you use __x86_64 instead. I believe we will only produce a build for both AMD64 and Intel EM64T (see note below). So, I want to make sure that we won't produce different builds when we compile our code on AMD64 and Intel EM64T. (See the following examples of software vendors using the same binaries for both AMD64 and Intel EM64T. http://www.redhat.com/apps/commerce/rhel/as/#amd http://www.novell.com/products/linuxenterpriseserver/packages.html http://www.sun.com/software/solaris/specs.html#sol10 http://www.microsoft.com/windowsxp/64bit/evaluation/upgrade.mspx ) I don't know if __amd64 will be defined when the compiler is running on Intel EM64T. If __amd64 is not defined when the compiler is running on Intel EM64T, our build output will have the undesirable dependency on the CPU of the build machine. I hope I've made my point clear.
Attachment #169498 - Flags: review?(wtchang) → review+
This patch shows only the changes to Makefile and the arcfour.c files. The assem files are unchanged from previous versions. It attempts to address the requests to minimize differences between x86-64 family CPUs. Note that according to GCC documentation, the only -march option supported for that family is -march=opteron. Wan-Teh, any issues with this patch?
Attachment #173345 - Flags: review?(wtchang)
I should add that the latest patch also addresses indentation issues in the makefile by adding much more indentation for nested ifs. However, when reviewing the patch for indentation and alignment, please use bugzilla's side by side diff feature, because it displays tabs correctly aligned. https://bugzilla.mozilla.org/attachment.cgi?id=173345&action=diff
Comment on attachment 173345 [details] [diff] [review] patch v4 - makefile and .c file only This patch looks good, Nelson. By the way, I didn't mean to make you fix the indentations in the Makefile. Why do we only use floating point ECC code on Linux for x86 or x86-64 and Solaris for SPARC? Why don't we use floating point ECC code on Solaris for x86 or AMD64, or other OS/CPU combinations? Another question: in arcfour.c, you have > struct RC4ContextStr > { >+#if defined(NSS_ARCFOUR_IJ_B4_S) || defined(NSS_BEVAND_ARCFOUR) >+ Stype i; >+ Stype j; >+ Stype S[ARCFOUR_STATE_SIZE]; >+#else > Stype S[ARCFOUR_STATE_SIZE]; >- PRUint8 i; >- PRUint8 j; >+ Stype i; >+ Stype j; >+#endif > }; Do we still need the NSS_ARCFOUR_IJ_B4_S test? That macro is never defined.
Wan-Teh, was comment 19 intended to convey r+? Regarding your two questions in comment 19: a) This patch doesn't change (or isn't intended to change) the use of FP for ECC on any platform. Please ask Vipul about that in any of the ECC bugs that are still open. b) The test for the symbol NSS_ARCFOUR_IJ_B4_S is left as a convenience for future alternative implementations, should they need it. It could be removed, but hurts nothing.
Nelson, This patch does change the use of FP for ECC on one platform: Linux x86-64. It didn't use FP for ECC before. Is this change intentional? It is better to use NSS_ARCFOUR_IJ_B4_S this way: /* * Alternatively, we can define this macro * in the Makefile. */ #ifdef NSS_BEVAND_ARCFOUR #define NSS_ARCFOUR_IJ_B4_S 1 #endif struct RC4ContextStr { #if defined(NSS_ARCFOUR_IJ_B4_S) Stype i; Stype j; Stype S[ARCFOUR_STATE_SIZE]; #else Stype S[ARCFOUR_STATE_SIZE]; Stype i; Stype j; #endif };
Comment on attachment 173345 [details] [diff] [review] patch v4 - makefile and .c file only r=wtc. (Those two are minor issues.)
Attachment #173345 - Flags: review?(wtchang) → review+
Taking bug .
Assignee: nelson → julien.pierre.bugs
Status: ASSIGNED → NEW
I have checked in Nelson's patch with the only exception that I removed the USE_FP=1 line from the Linux x86_64 section, in order to preserve the existing behavior of not building these files in that case. Whether that was correct or not can be addressed in a separate ECC bug . Checking in Makefile; /cvsroot/mozilla/security/nss/lib/freebl/Makefile,v <-- Makefile new revision: 1.59; previous revision: 1.58 done Checking in arcfour.c; /cvsroot/mozilla/security/nss/lib/freebl/arcfour.c,v <-- arcfour.c new revision: 1.15; previous revision: 1.14 done RCS file: /cvsroot/mozilla/security/nss/lib/freebl/arcfour-amd64-gas.s,v done Checking in arcfour-amd64-gas.s; /cvsroot/mozilla/security/nss/lib/freebl/arcfour-amd64-gas.s,v <-- arcfour-amd64-gas.s initial revision: 1.1 done RCS file: /cvsroot/mozilla/security/nss/lib/freebl/arcfour-amd64-sun.s,v done Checking in arcfour-amd64-sun.s; /cvsroot/mozilla/security/nss/lib/freebl/arcfour-amd64-sun.s,v <-- arcfour-amd64-sun.s initial revision: 1.1 done
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: