Incorporate AMD64 assembler implementation of arcfour

RESOLVED FIXED in 3.10

Status

NSS
Libraries
P2
enhancement
RESOLVED FIXED
13 years ago
12 years ago

People

(Reporter: Nelson Bolyard (seldom reads bugmail), Assigned: Julien Pierre)

Tracking

3.9.3
3.10
Other
Solaris

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments, 2 obsolete attachments)

Mr. Marc Bevand <bevand_m (at) epita.fr> has written a hand-optimized 
assembly code implementation of ARCFOUR for the 64-bit X86 cpus, such 
as the AMD64.  It's FAST.  He has graciously agreed to contribute his
code to mozilla.  

With his permission, I have taken his original source, and modified it
to use mozilla's "tri-license" boilerplate.  I have produced two versions
of the source, one that assembles with Gnu's "gas" assembler and one that
assembles with Solaris's as assembler, both for AMD64.  I have modified
the freebl Makefile and arcfour.c to work with the assembler version.  

I will attach a patch with the new sources and the modifications to the 
existing sources to this bug.
(Reporter)

Comment 1

13 years ago
Created attachment 166411 [details] [diff] [review]
patch v1

The original code is described in 
http://etud.epita.fr/~bevand_m/papers/rc4-amd64.html 
and is available as a link from that page.
(Reporter)

Comment 2

13 years ago
Comment on attachment 166411 [details] [diff] [review]
patch v1

I ask the reviewers to particularly focus on the makefile and arcfour.c.  

I chose to checkin two copies of the assembler sources conforming to the
two assemblers' different syntaxes.  It would also be possible to use a 
sed script to derive one source from the other at build time, but that 
would require larger changes to the makefiles and/or coreconf.	

If the reviewers think that that alternative approach is preferable, 
then I invite comment on the best/preferred way to modify the makefiles
to handle sources derived at build time.  Presumably, the modified 
sources (derived from the ones checked in) would want to be created in
OBJDIR.  It appears to me that, presently, coreconf has no rules to 
build from sources in OBJDIR.  Do we want to add them?
Attachment #166411 - Flags: superreview?(wchang0222)
Attachment #166411 - Flags: review?(julien.pierre.bugs)
(Reporter)

Comment 3

13 years ago
target = 3.10
Status: NEW → ASSIGNED
Priority: -- → P2
Target Milestone: --- → 3.10
(Reporter)

Comment 4

13 years ago
Comment on attachment 166411 [details] [diff] [review]
patch v1

The cipher regression tests (bltest) all pass with this patch in place, 
but each of the SSL3/RC4 ciphersuite tests fails in all.sh.
So, I am cancelling the review requests and obsoleting the patch 
until that is fixed.  Sorry. :-(
Attachment #166411 - Attachment is obsolete: true
Attachment #166411 - Flags: superreview?(wchang0222)
Attachment #166411 - Flags: review?(julien.pierre.bugs)
(Reporter)

Comment 5

13 years ago
Created attachment 166524 [details] [diff] [review]
patch v2

This patch is the same as the previous one for the Makefile and the 
two new .s files.  The modification to arcfour.c has changed slightly
to ensure that the correct output length value is returned.
(Reporter)

Comment 6

13 years ago
Comment on attachment 166524 [details] [diff] [review]
patch v2

Reviewers, please review the changes to Makefile and arcfour.c.  As with the
previous patch, I am particularly interested in your opinion regarding the use
of two separate .s files for the two relevant assemblers, as opposed to one
source file and a more complicated Makefile.
Attachment #166524 - Flags: superreview?(wchang0222)
Attachment #166524 - Flags: review?(julien.pierre.bugs)

Comment 7

13 years ago
Comment on attachment 166524 [details] [diff] [review]
patch v2

Two separate .s files are fine.

1. Makefile

Solaris 10 for AMD64: it's better to just
say "Solaris".

Instead of

 ifneq ($(USE_64),1)
 # Solaris x86
 ...
 else
 # Solaris 10 for AMD64
 ...
 endif

it is clearer to say

 ifeq ($(USE_64),1)
 # Solaris for AMD64
 ...
 else
  # Solaris x86
 ...
 endif

>+#   AS       = gcc -march=opteron -m64 -c
>+#   AS       = as -xarch=amd64 -K PIC 

We should just remove the two commented-out lines.

Why do you use -xarch=generic64 instead of -xarch=amd64?

2. arcfour.c

>+#ifdef __amd64
>+#define NSS_ASSEM_ARCFOUR
>+#endif

We should do this in the Makefile.

>+#if defined(AIX) || defined(OSF1) || defined(NSS_ASSEM_ARCFOUR)
> /* Treat array variables as longs, not bytes */
> #define USE_LONG
> #endif

Are you sure this will be true for all assembler
implementations of ARCFOUR?

>+#if defined(NSS_ASSEM_ARCFOUR)
>+	Stype i;
>+	Stype j;
>+	Stype S[ARCFOUR_STATE_SIZE];
>+#else
> 	Stype S[ARCFOUR_STATE_SIZE];
> 	PRUint8 i;
> 	PRUint8 j;
>+#endif

We should just say:

> 	Stype S[ARCFOUR_STATE_SIZE];
>+#if defined(NSS_ASSEM_ARCFOUR)
>+	Stype i;
>+	Stype j;
>+#else
> 	PRUint8 i;
> 	PRUint8 j;
>+#endif

>+#if !defined(NSS_ASSEM_ARCFOUR)
> /*
>  * Generate the next byte in the stream.
>  */
...
>+#else
>+extern void ARCFOUR(RC4Context *cx, unsigned long inputLen, 
>+	const unsigned char *input, unsigned char *output);
>+#endif /* NSS_ASSEM_ARCFOUR */

I think it is clearer to reorder this and say
#if defined(NSS_ASSEM_ARCFOUR) instead.

>+#if defined(NSS_ASSEM_ARCFOUR)
>+	ARCFOUR(cx, inputLen, input, output);
>+        *outputLen = inputLen;
>+	return SECSuccess;

The indentation of the two "*outputLen = inputLen;"
lines seems to be off by 2.
(Assignee)

Comment 8

13 years ago
Nelson,

The Makefile also needs to be fixed so that this code can build on Linux AMD64 .
(Assignee)

Updated

13 years ago
Attachment #166524 - Flags: review?(julien.pierre.bugs) → review-
(Reporter)

Comment 9

13 years ago
Comment on attachment 166524 [details] [diff] [review]
patch v2

removing SR request, since julien gave it r-
Attachment #166524 - Flags: superreview?(wtchang)
(Reporter)

Comment 10

13 years ago
Created attachment 169498 [details] [diff] [review]
patch v3

Differences between this patch and the previous one:
1) The previous patch added only one new "feature test macro" to the sources.
   This patch adds a second one also, to separate changes that are specific to
   the assembler language portion of this patch from changes that relevant to
   CPUs with slow byte-write capabilities.  
2) This patch supports linux on AMD64.
3) This patch addresses the ifdef ordering issue nd removes unneeded lines.
4) This patch changes the type of the "i" and "j" members of the arc4 context
   on all platforms to "Stype".  This change only affects systems on which 
   Stype is not PRUint8, which presently are only CPUs with slow byte writes.
   It's a change that should have been made before.

I *AM NOT* yet asking for this to be reviewed, because I want to test this 
patch on a few more CPUs first, to make certian it doesn't introduce any
new build issues.  (But I won't mind if someone reviews it now anyway :-)
Attachment #166524 - Attachment is obsolete: true
(Reporter)

Comment 11

13 years ago
Comment on attachment 169498 [details] [diff] [review]
patch v3

All.sh passes on all platforms with this patch, so I am now asking for a
review.
Attachment #169498 - Flags: review?(wtchang)

Comment 12

13 years ago
Comment on attachment 169498 [details] [diff] [review]
patch v3

Nelson,

Please use x86-64 or x86_64 instead of amd64
in as many places as possible.

The reason is that Intel has a processor
architecture called EM64T that's compatible
with AMD64.  The x86-64 architecture is
more general and covers both AMD and Intel
CPUs.

In Makefile:

>+# Solaris x86 or Solaris 10 for AMD64

Change "Solaris 10 for AMD64" to "Solaris for AMD64".

>+else
> # Solaris x86
>-ifneq ($(USE_64),1)
>     DEFINES += -D_X86_
>     DEFINES += -DMP_USE_UINT_DIGIT
>     DEFINES += -DMP_ASSEMBLY_MULTIPLY -DMP_ASSEMBLY_SQUARE -DMP_ASSEMBLY_DIV_2DX1D
>     ASFILES  = mpi_i86pc.s
> endif

The four surviving lines should probably be
moved left now that the ifneq is removed?

In arcfour.c

>-#if defined(AIX) || defined(OSF1)
>-/* Treat array variables as longs, not bytes */
>+#if defined(AIX) || defined(OSF1) || defined(__amd64)

__amd64 should probably be __x86_64.

Since many operating systems other than Solaris
and Linux also support x86-64, are you sure this
will be correct for any OS running on x86-64?

> struct RC4ContextStr
> {
>+#if defined(NSS_ARCFOUR_IJ_B4_S)
>+	Stype i;
>+	Stype j;
>+	Stype S[ARCFOUR_STATE_SIZE];
>+#else
> 	Stype S[ARCFOUR_STATE_SIZE];
>-	PRUint8 i;
>-	PRUint8 j;
>+	Stype i;
>+	Stype j;
>+#endif
> };

Is the ordering of i, j, and S important?

>- * Straight RC4 op.  No optimization.
>+ * Straight ARCFOUR op.  No optimization.

I know why you changed RC4 to ARCFOUR here.
But this change seems gratuitous when our
code is still full of identifiers containing
rc4 or RC4.

>+#if defined(NSS_ASSEM_ARCFOUR)
>+	ARCFOUR(cx, inputLen, input, output);
>+        *outputLen = inputLen;
>+	return SECSuccess;
>+#elif defined( CONVERT_TO_WORDS )

The indentation of the third line seems to be off
by two spaces.

>+#if defined(NSS_ASSEM_ARCFOUR)
>+	ARCFOUR(cx, inputLen, input, output);
>+        *outputLen = inputLen;
>+	return SECSuccess;
>+#elif defined( CONVERT_TO_WORDS )

Same as above.
(Reporter)

Comment 13

13 years ago
Wan-Teh, thanks for the quick review.  Let me answer some of your questions
and explain a few inobvious things.  

1. In the Makefile:  The "four surviving lines" are properly indented.  
The lines before the "else" line that preceeds these lines probably should
be indented another notch to match the lines after the else.

2. About the applicability of these patches to other 64-bit x86 family CPUs 
besides AMD64.  Based on test results reported to me, it appears that the 
AMD64 CPU does read-modify-write cycles to write to memory any amount other 
than a properly aligned 64-bit word.  This behavior makes the use of 
intermediate and temporary variables (such as those in the ARCFOUR context) 
whose size is not 64-bits MUCH less less efficient than if they are extended
to 64-bit size.  Expanding the size of those variables to 64-bits helps
AMD64 CPUs a lot.  AFAIK, this behavior (use of RMW memory cycles) is not
required for all 64-bit X86 family CPUs.  Other such CPUs may be able to 
store smaller amounts of memory more efficiently than with full word RMW
cycles, and so the effects of this code may not benefit other 64-bit x86 
CPUs' performance in the same way as it benefits amd64.  

Consequently, I am reluctant to apply these changes to all 64-bit x86 family
CPUs.  Also, I have no other 64-bit x86 family CPUs on which to test them.
However if someone is willing to test the relative performance of this 
patch on other x86 CPUs, and demonstrate that it also benefits them,
I am willing to change this code to also work for those CPUs.

Finally, before changing to __x86_64 I need to be sure that that symbol is 
defined by the compilers used on Solaris.  I know that __amd64 is defined.


3. Regarding the order of the variables I, J, and the array S in the 
context.  These variables are initialized by one function, that is always
c code, and are handled in another function that is either c code or 
assembly code.  The AMD64 assembly code was written to assume that I and J 
come before S, and so the c language definition of this struct must match
the assembly code's layout.  In my previous patch, the order of these 
variables was tied to the presense/absence of assembly code.  In your review
comments for that patch, you asked if all assembly language implementations
would likewise assume that i and j come before S.  Of course, They would not.  
So, based on your review comments, I decided to make the c code be able to 
work with assembly code that uses either layout, and not to tie the order of
these variables to the presence/absense of assembly code.

4. regarding indentation issues in c code:  Let me suggest that you look at
bugzilla's side-by-side diff display of the patches for these.  e.g. 
https://bugzilla.mozilla.org/attachment.cgi?id=169498&action=diff

In this patch, I believe the indentation is correct, but this is not obvious 
in an examination of the diff itself because of the addition of an extra 
column or two at the beginning of each line of text by cvs diff.  This extra 
column often causes lines that use tabs to appear to be indented differently
than lines that use leading spaces, but when the patch is applied and the 
extra columns (of + and -) are removed, the text appears correctly indented.
The bugzilla diff display does a great job of clearing that up.
(Reporter)

Comment 14

13 years ago
Here's how I would like to move forward with this bug.  

If this patch is correct for all existing supported platforms and does 
not cause build failures, incorrect results, or performance regressions 
(compared to the unpatched code) for any platform, then I would like to 
check it in as is (perhaps indenting a few lines in the Makefile by one 
more notch).  

If it is determined that other CPUs can and should also take advantage of 
this new code, then the makefile and/or code ifdefs can *subsequently* be
changed to include those other CPUs/compilers also, by people who have 
those other CPUs and/or compilers with which to test.

I do not have those other platforms' CPUs and/or compilers with which to 
test, and I do not wish to take responsbility for optimal builds and 
performance on those platforms. 

Comment 15

13 years ago
Comment on attachment 169498 [details] [diff] [review]
patch v3

In arcfour.c, we have

>+#if defined(AIX) || defined(OSF1) || defined(__amd64)
>+/* Treat array variables as longs, not bytes, on CPUs that take 
>+ * much longer to write bytes than to write longs. 
>+ */
> #define USE_LONG
> #endif

Is this change correct for the operating
systems (such as FreeBSD and NetBSD) that
have been ported to AMD64 but aren't using
the assembly code in arcfour-amd64-xxx.s?

Regarding AMD64 vs. Intel EM64T: the makefile changes
aren't distinguishing between these two flavors of
x86-64.  For example, the Linux makefile code says:

 ifeq ($(OS_TARGET),Linux)
+ifeq ($(CPU_ARCH),x86_64)
+# AMD64

This test will evaluate to true for Intel EM64T as
well.  So, Linux on Intel EM64T will be using
arcfour-amd64-gas.s

The Solaris makefile code says:

 ifeq ($(OS_TARGET),SunOS)
...
 ifeq ($(CPU_ARCH),sparc)
...
 else
+
+# Solaris x86 or Solaris 10 for AMD64
...
+ifeq ($(USE_64),1)
+# Solaris for AMD64

I am afraid that this test will also evaluate to
true for Intel EM64T, but you may not need to
worry about that now if Solaris hasn't been ported
to Intel EM64T.

Comment 16

13 years ago
Comment on attachment 169498 [details] [diff] [review]
patch v3

r=wtc.

I am not that sure about the use of __amd64
in arcfour.c.  I'll feel more comfortable if
you use __x86_64 instead.

I believe we will only produce a build for both
AMD64 and Intel EM64T (see note below).  So, I
want to make sure that we won't produce different
builds when we compile our code on AMD64 and
Intel EM64T.

(See the following examples of software vendors
using the same binaries for both AMD64 and Intel EM64T.
http://www.redhat.com/apps/commerce/rhel/as/#amd
http://www.novell.com/products/linuxenterpriseserver/packages.html
http://www.sun.com/software/solaris/specs.html#sol10
http://www.microsoft.com/windowsxp/64bit/evaluation/upgrade.mspx
)

I don't know if __amd64 will be defined when
the compiler is running on Intel EM64T.  If
__amd64 is not defined when the compiler is
running on Intel EM64T, our build output will
have the undesirable dependency on the CPU of
the build machine.

I hope I've made my point clear.
Attachment #169498 - Flags: review?(wtchang) → review+
(Reporter)

Comment 17

13 years ago
Created attachment 173345 [details] [diff] [review]
patch v4 - makefile and .c file only

This patch shows only the changes to Makefile and the arcfour.c files.
The assem files are unchanged from previous versions.

It attempts to address the requests to minimize differences between 
x86-64 family CPUs.  Note that according to GCC documentation, the 
only -march option supported for that family is -march=opteron.  
Wan-Teh, any issues with this patch?
Attachment #173345 - Flags: review?(wtchang)
(Reporter)

Comment 18

13 years ago
I should add that the latest patch also addresses indentation issues in 
the makefile by adding much more indentation for nested ifs.  
However, when reviewing the patch for indentation and alignment, please
use bugzilla's side by side diff feature, because it displays tabs correctly
aligned.  https://bugzilla.mozilla.org/attachment.cgi?id=173345&action=diff

Comment 19

13 years ago
Comment on attachment 173345 [details] [diff] [review]
patch v4 - makefile and .c file only

This patch looks good, Nelson.	By the way, I
didn't mean to make you fix the indentations
in the Makefile.

Why do we only use floating point ECC code on
Linux for x86 or x86-64 and Solaris for SPARC?
Why don't we use floating point ECC code on
Solaris for x86 or AMD64, or other OS/CPU
combinations?

Another question: in arcfour.c, you have

> struct RC4ContextStr
> {
>+#if defined(NSS_ARCFOUR_IJ_B4_S) || defined(NSS_BEVAND_ARCFOUR)
>+	Stype i;
>+	Stype j;
>+	Stype S[ARCFOUR_STATE_SIZE];
>+#else
> 	Stype S[ARCFOUR_STATE_SIZE];
>-	PRUint8 i;
>-	PRUint8 j;
>+	Stype i;
>+	Stype j;
>+#endif
> };

Do we still need the NSS_ARCFOUR_IJ_B4_S test?
That macro is never defined.
(Reporter)

Comment 20

12 years ago
Wan-Teh, was comment 19 intended to convey r+? 

Regarding your two questions in comment 19:
a) This patch doesn't change (or isn't intended to change)  
the use of FP for ECC on any platform.  Please ask Vipul
about that in any of the ECC bugs that are still open.

b) The test for the symbol NSS_ARCFOUR_IJ_B4_S is left as a 
convenience for future alternative implementations, should they
need it.  It could be removed, but hurts nothing.

Comment 21

12 years ago
Nelson,

This patch does change the use of FP for
ECC on one platform: Linux x86-64.  It
didn't use FP for ECC before.  Is this
change intentional?

It is better to use NSS_ARCFOUR_IJ_B4_S
this way:

  /*
   * Alternatively, we can define this macro
   * in the Makefile.
   */
  #ifdef NSS_BEVAND_ARCFOUR
  #define NSS_ARCFOUR_IJ_B4_S 1
  #endif

  struct RC4ContextStr
  {
  #if defined(NSS_ARCFOUR_IJ_B4_S)
          Stype i;
          Stype j;
          Stype S[ARCFOUR_STATE_SIZE];
  #else
          Stype S[ARCFOUR_STATE_SIZE];
          Stype i;
          Stype j;
  #endif
  };

Comment 22

12 years ago
Comment on attachment 173345 [details] [diff] [review]
patch v4 - makefile and .c file only

r=wtc.	(Those two are minor issues.)
Attachment #173345 - Flags: review?(wtchang) → review+
(Assignee)

Comment 23

12 years ago
Taking bug .
Assignee: nelson → julien.pierre.bugs
Status: ASSIGNED → NEW
(Assignee)

Comment 24

12 years ago
I have checked in Nelson's patch with the only exception that I removed the
USE_FP=1 line from the Linux x86_64 section, in order to preserve the existing
behavior of not building these files in that case. Whether that was correct or
not can be addressed in a separate ECC bug .

Checking in Makefile;
/cvsroot/mozilla/security/nss/lib/freebl/Makefile,v  <--  Makefile
new revision: 1.59; previous revision: 1.58
done
Checking in arcfour.c;
/cvsroot/mozilla/security/nss/lib/freebl/arcfour.c,v  <--  arcfour.c
new revision: 1.15; previous revision: 1.14
done
RCS file: /cvsroot/mozilla/security/nss/lib/freebl/arcfour-amd64-gas.s,v
done
Checking in arcfour-amd64-gas.s;
/cvsroot/mozilla/security/nss/lib/freebl/arcfour-amd64-gas.s,v  <-- 
arcfour-amd64-gas.s
initial revision: 1.1
done
RCS file: /cvsroot/mozilla/security/nss/lib/freebl/arcfour-amd64-sun.s,v
done
Checking in arcfour-amd64-sun.s;
/cvsroot/mozilla/security/nss/lib/freebl/arcfour-amd64-sun.s,v  <-- 
arcfour-amd64-sun.s
initial revision: 1.1
done
Status: NEW → RESOLVED
Last Resolved: 12 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.