Closed
Bug 270742
Opened 20 years ago
Closed 20 years ago
Incorporate AMD64 assembler implementation of arcfour
Categories
(NSS :: Libraries, enhancement, P2)
Tracking
(Not tracked)
RESOLVED
FIXED
3.10
People
(Reporter: nelson, Assigned: julien.pierre)
Details
Attachments
(2 files, 2 obsolete files)
14.46 KB,
patch
|
wtc
:
review+
|
Details | Diff | Splinter Review |
10.50 KB,
patch
|
wtc
:
review+
|
Details | Diff | Splinter Review |
Mr. Marc Bevand <bevand_m (at) epita.fr> has written a hand-optimized
assembly code implementation of ARCFOUR for the 64-bit X86 cpus, such
as the AMD64. It's FAST. He has graciously agreed to contribute his
code to mozilla.
With his permission, I have taken his original source, and modified it
to use mozilla's "tri-license" boilerplate. I have produced two versions
of the source, one that assembles with Gnu's "gas" assembler and one that
assembles with Solaris's as assembler, both for AMD64. I have modified
the freebl Makefile and arcfour.c to work with the assembler version.
I will attach a patch with the new sources and the modifications to the
existing sources to this bug.
Reporter | ||
Comment 1•20 years ago
|
||
The original code is described in
http://etud.epita.fr/~bevand_m/papers/rc4-amd64.html
and is available as a link from that page.
Reporter | ||
Comment 2•20 years ago
|
||
Comment on attachment 166411 [details] [diff] [review]
patch v1
I ask the reviewers to particularly focus on the makefile and arcfour.c.
I chose to checkin two copies of the assembler sources conforming to the
two assemblers' different syntaxes. It would also be possible to use a
sed script to derive one source from the other at build time, but that
would require larger changes to the makefiles and/or coreconf.
If the reviewers think that that alternative approach is preferable,
then I invite comment on the best/preferred way to modify the makefiles
to handle sources derived at build time. Presumably, the modified
sources (derived from the ones checked in) would want to be created in
OBJDIR. It appears to me that, presently, coreconf has no rules to
build from sources in OBJDIR. Do we want to add them?
Attachment #166411 -
Flags: superreview?(wchang0222)
Attachment #166411 -
Flags: review?(julien.pierre.bugs)
Reporter | ||
Comment 3•20 years ago
|
||
target = 3.10
Status: NEW → ASSIGNED
Priority: -- → P2
Target Milestone: --- → 3.10
Reporter | ||
Comment 4•20 years ago
|
||
Comment on attachment 166411 [details] [diff] [review]
patch v1
The cipher regression tests (bltest) all pass with this patch in place,
but each of the SSL3/RC4 ciphersuite tests fails in all.sh.
So, I am cancelling the review requests and obsoleting the patch
until that is fixed. Sorry. :-(
Attachment #166411 -
Attachment is obsolete: true
Attachment #166411 -
Flags: superreview?(wchang0222)
Attachment #166411 -
Flags: review?(julien.pierre.bugs)
Reporter | ||
Comment 5•20 years ago
|
||
This patch is the same as the previous one for the Makefile and the
two new .s files. The modification to arcfour.c has changed slightly
to ensure that the correct output length value is returned.
Reporter | ||
Comment 6•20 years ago
|
||
Comment on attachment 166524 [details] [diff] [review]
patch v2
Reviewers, please review the changes to Makefile and arcfour.c. As with the
previous patch, I am particularly interested in your opinion regarding the use
of two separate .s files for the two relevant assemblers, as opposed to one
source file and a more complicated Makefile.
Attachment #166524 -
Flags: superreview?(wchang0222)
Attachment #166524 -
Flags: review?(julien.pierre.bugs)
Comment 7•20 years ago
|
||
Comment on attachment 166524 [details] [diff] [review]
patch v2
Two separate .s files are fine.
1. Makefile
Solaris 10 for AMD64: it's better to just
say "Solaris".
Instead of
ifneq ($(USE_64),1)
# Solaris x86
...
else
# Solaris 10 for AMD64
...
endif
it is clearer to say
ifeq ($(USE_64),1)
# Solaris for AMD64
...
else
# Solaris x86
...
endif
>+# AS = gcc -march=opteron -m64 -c
>+# AS = as -xarch=amd64 -K PIC
We should just remove the two commented-out lines.
Why do you use -xarch=generic64 instead of -xarch=amd64?
2. arcfour.c
>+#ifdef __amd64
>+#define NSS_ASSEM_ARCFOUR
>+#endif
We should do this in the Makefile.
>+#if defined(AIX) || defined(OSF1) || defined(NSS_ASSEM_ARCFOUR)
> /* Treat array variables as longs, not bytes */
> #define USE_LONG
> #endif
Are you sure this will be true for all assembler
implementations of ARCFOUR?
>+#if defined(NSS_ASSEM_ARCFOUR)
>+ Stype i;
>+ Stype j;
>+ Stype S[ARCFOUR_STATE_SIZE];
>+#else
> Stype S[ARCFOUR_STATE_SIZE];
> PRUint8 i;
> PRUint8 j;
>+#endif
We should just say:
> Stype S[ARCFOUR_STATE_SIZE];
>+#if defined(NSS_ASSEM_ARCFOUR)
>+ Stype i;
>+ Stype j;
>+#else
> PRUint8 i;
> PRUint8 j;
>+#endif
>+#if !defined(NSS_ASSEM_ARCFOUR)
> /*
> * Generate the next byte in the stream.
> */
...
>+#else
>+extern void ARCFOUR(RC4Context *cx, unsigned long inputLen,
>+ const unsigned char *input, unsigned char *output);
>+#endif /* NSS_ASSEM_ARCFOUR */
I think it is clearer to reorder this and say
#if defined(NSS_ASSEM_ARCFOUR) instead.
>+#if defined(NSS_ASSEM_ARCFOUR)
>+ ARCFOUR(cx, inputLen, input, output);
>+ *outputLen = inputLen;
>+ return SECSuccess;
The indentation of the two "*outputLen = inputLen;"
lines seems to be off by 2.
Assignee | ||
Comment 8•20 years ago
|
||
Nelson,
The Makefile also needs to be fixed so that this code can build on Linux AMD64 .
Assignee | ||
Updated•20 years ago
|
Attachment #166524 -
Flags: review?(julien.pierre.bugs) → review-
Reporter | ||
Comment 9•20 years ago
|
||
Comment on attachment 166524 [details] [diff] [review]
patch v2
removing SR request, since julien gave it r-
Attachment #166524 -
Flags: superreview?(wtchang)
Reporter | ||
Comment 10•20 years ago
|
||
Differences between this patch and the previous one:
1) The previous patch added only one new "feature test macro" to the sources.
This patch adds a second one also, to separate changes that are specific to
the assembler language portion of this patch from changes that relevant to
CPUs with slow byte-write capabilities.
2) This patch supports linux on AMD64.
3) This patch addresses the ifdef ordering issue nd removes unneeded lines.
4) This patch changes the type of the "i" and "j" members of the arc4 context
on all platforms to "Stype". This change only affects systems on which
Stype is not PRUint8, which presently are only CPUs with slow byte writes.
It's a change that should have been made before.
I *AM NOT* yet asking for this to be reviewed, because I want to test this
patch on a few more CPUs first, to make certian it doesn't introduce any
new build issues. (But I won't mind if someone reviews it now anyway :-)
Attachment #166524 -
Attachment is obsolete: true
Reporter | ||
Comment 11•20 years ago
|
||
Comment on attachment 169498 [details] [diff] [review]
patch v3
All.sh passes on all platforms with this patch, so I am now asking for a
review.
Attachment #169498 -
Flags: review?(wtchang)
Comment 12•20 years ago
|
||
Comment on attachment 169498 [details] [diff] [review]
patch v3
Nelson,
Please use x86-64 or x86_64 instead of amd64
in as many places as possible.
The reason is that Intel has a processor
architecture called EM64T that's compatible
with AMD64. The x86-64 architecture is
more general and covers both AMD and Intel
CPUs.
In Makefile:
>+# Solaris x86 or Solaris 10 for AMD64
Change "Solaris 10 for AMD64" to "Solaris for AMD64".
>+else
> # Solaris x86
>-ifneq ($(USE_64),1)
> DEFINES += -D_X86_
> DEFINES += -DMP_USE_UINT_DIGIT
> DEFINES += -DMP_ASSEMBLY_MULTIPLY -DMP_ASSEMBLY_SQUARE -DMP_ASSEMBLY_DIV_2DX1D
> ASFILES = mpi_i86pc.s
> endif
The four surviving lines should probably be
moved left now that the ifneq is removed?
In arcfour.c
>-#if defined(AIX) || defined(OSF1)
>-/* Treat array variables as longs, not bytes */
>+#if defined(AIX) || defined(OSF1) || defined(__amd64)
__amd64 should probably be __x86_64.
Since many operating systems other than Solaris
and Linux also support x86-64, are you sure this
will be correct for any OS running on x86-64?
> struct RC4ContextStr
> {
>+#if defined(NSS_ARCFOUR_IJ_B4_S)
>+ Stype i;
>+ Stype j;
>+ Stype S[ARCFOUR_STATE_SIZE];
>+#else
> Stype S[ARCFOUR_STATE_SIZE];
>- PRUint8 i;
>- PRUint8 j;
>+ Stype i;
>+ Stype j;
>+#endif
> };
Is the ordering of i, j, and S important?
>- * Straight RC4 op. No optimization.
>+ * Straight ARCFOUR op. No optimization.
I know why you changed RC4 to ARCFOUR here.
But this change seems gratuitous when our
code is still full of identifiers containing
rc4 or RC4.
>+#if defined(NSS_ASSEM_ARCFOUR)
>+ ARCFOUR(cx, inputLen, input, output);
>+ *outputLen = inputLen;
>+ return SECSuccess;
>+#elif defined( CONVERT_TO_WORDS )
The indentation of the third line seems to be off
by two spaces.
>+#if defined(NSS_ASSEM_ARCFOUR)
>+ ARCFOUR(cx, inputLen, input, output);
>+ *outputLen = inputLen;
>+ return SECSuccess;
>+#elif defined( CONVERT_TO_WORDS )
Same as above.
Reporter | ||
Comment 13•20 years ago
|
||
Wan-Teh, thanks for the quick review. Let me answer some of your questions
and explain a few inobvious things.
1. In the Makefile: The "four surviving lines" are properly indented.
The lines before the "else" line that preceeds these lines probably should
be indented another notch to match the lines after the else.
2. About the applicability of these patches to other 64-bit x86 family CPUs
besides AMD64. Based on test results reported to me, it appears that the
AMD64 CPU does read-modify-write cycles to write to memory any amount other
than a properly aligned 64-bit word. This behavior makes the use of
intermediate and temporary variables (such as those in the ARCFOUR context)
whose size is not 64-bits MUCH less less efficient than if they are extended
to 64-bit size. Expanding the size of those variables to 64-bits helps
AMD64 CPUs a lot. AFAIK, this behavior (use of RMW memory cycles) is not
required for all 64-bit X86 family CPUs. Other such CPUs may be able to
store smaller amounts of memory more efficiently than with full word RMW
cycles, and so the effects of this code may not benefit other 64-bit x86
CPUs' performance in the same way as it benefits amd64.
Consequently, I am reluctant to apply these changes to all 64-bit x86 family
CPUs. Also, I have no other 64-bit x86 family CPUs on which to test them.
However if someone is willing to test the relative performance of this
patch on other x86 CPUs, and demonstrate that it also benefits them,
I am willing to change this code to also work for those CPUs.
Finally, before changing to __x86_64 I need to be sure that that symbol is
defined by the compilers used on Solaris. I know that __amd64 is defined.
3. Regarding the order of the variables I, J, and the array S in the
context. These variables are initialized by one function, that is always
c code, and are handled in another function that is either c code or
assembly code. The AMD64 assembly code was written to assume that I and J
come before S, and so the c language definition of this struct must match
the assembly code's layout. In my previous patch, the order of these
variables was tied to the presense/absence of assembly code. In your review
comments for that patch, you asked if all assembly language implementations
would likewise assume that i and j come before S. Of course, They would not.
So, based on your review comments, I decided to make the c code be able to
work with assembly code that uses either layout, and not to tie the order of
these variables to the presence/absense of assembly code.
4. regarding indentation issues in c code: Let me suggest that you look at
bugzilla's side-by-side diff display of the patches for these. e.g.
https://bugzilla.mozilla.org/attachment.cgi?id=169498&action=diff
In this patch, I believe the indentation is correct, but this is not obvious
in an examination of the diff itself because of the addition of an extra
column or two at the beginning of each line of text by cvs diff. This extra
column often causes lines that use tabs to appear to be indented differently
than lines that use leading spaces, but when the patch is applied and the
extra columns (of + and -) are removed, the text appears correctly indented.
The bugzilla diff display does a great job of clearing that up.
Reporter | ||
Comment 14•20 years ago
|
||
Here's how I would like to move forward with this bug.
If this patch is correct for all existing supported platforms and does
not cause build failures, incorrect results, or performance regressions
(compared to the unpatched code) for any platform, then I would like to
check it in as is (perhaps indenting a few lines in the Makefile by one
more notch).
If it is determined that other CPUs can and should also take advantage of
this new code, then the makefile and/or code ifdefs can *subsequently* be
changed to include those other CPUs/compilers also, by people who have
those other CPUs and/or compilers with which to test.
I do not have those other platforms' CPUs and/or compilers with which to
test, and I do not wish to take responsbility for optimal builds and
performance on those platforms.
Comment 15•20 years ago
|
||
Comment on attachment 169498 [details] [diff] [review]
patch v3
In arcfour.c, we have
>+#if defined(AIX) || defined(OSF1) || defined(__amd64)
>+/* Treat array variables as longs, not bytes, on CPUs that take
>+ * much longer to write bytes than to write longs.
>+ */
> #define USE_LONG
> #endif
Is this change correct for the operating
systems (such as FreeBSD and NetBSD) that
have been ported to AMD64 but aren't using
the assembly code in arcfour-amd64-xxx.s?
Regarding AMD64 vs. Intel EM64T: the makefile changes
aren't distinguishing between these two flavors of
x86-64. For example, the Linux makefile code says:
ifeq ($(OS_TARGET),Linux)
+ifeq ($(CPU_ARCH),x86_64)
+# AMD64
This test will evaluate to true for Intel EM64T as
well. So, Linux on Intel EM64T will be using
arcfour-amd64-gas.s
The Solaris makefile code says:
ifeq ($(OS_TARGET),SunOS)
...
ifeq ($(CPU_ARCH),sparc)
...
else
+
+# Solaris x86 or Solaris 10 for AMD64
...
+ifeq ($(USE_64),1)
+# Solaris for AMD64
I am afraid that this test will also evaluate to
true for Intel EM64T, but you may not need to
worry about that now if Solaris hasn't been ported
to Intel EM64T.
Comment 16•20 years ago
|
||
Comment on attachment 169498 [details] [diff] [review]
patch v3
r=wtc.
I am not that sure about the use of __amd64
in arcfour.c. I'll feel more comfortable if
you use __x86_64 instead.
I believe we will only produce a build for both
AMD64 and Intel EM64T (see note below). So, I
want to make sure that we won't produce different
builds when we compile our code on AMD64 and
Intel EM64T.
(See the following examples of software vendors
using the same binaries for both AMD64 and Intel EM64T.
http://www.redhat.com/apps/commerce/rhel/as/#amd
http://www.novell.com/products/linuxenterpriseserver/packages.html
http://www.sun.com/software/solaris/specs.html#sol10
http://www.microsoft.com/windowsxp/64bit/evaluation/upgrade.mspx
)
I don't know if __amd64 will be defined when
the compiler is running on Intel EM64T. If
__amd64 is not defined when the compiler is
running on Intel EM64T, our build output will
have the undesirable dependency on the CPU of
the build machine.
I hope I've made my point clear.
Attachment #169498 -
Flags: review?(wtchang) → review+
Reporter | ||
Comment 17•20 years ago
|
||
This patch shows only the changes to Makefile and the arcfour.c files.
The assem files are unchanged from previous versions.
It attempts to address the requests to minimize differences between
x86-64 family CPUs. Note that according to GCC documentation, the
only -march option supported for that family is -march=opteron.
Wan-Teh, any issues with this patch?
Attachment #173345 -
Flags: review?(wtchang)
Reporter | ||
Comment 18•20 years ago
|
||
I should add that the latest patch also addresses indentation issues in
the makefile by adding much more indentation for nested ifs.
However, when reviewing the patch for indentation and alignment, please
use bugzilla's side by side diff feature, because it displays tabs correctly
aligned. https://bugzilla.mozilla.org/attachment.cgi?id=173345&action=diff
Comment 19•20 years ago
|
||
Comment on attachment 173345 [details] [diff] [review]
patch v4 - makefile and .c file only
This patch looks good, Nelson. By the way, I
didn't mean to make you fix the indentations
in the Makefile.
Why do we only use floating point ECC code on
Linux for x86 or x86-64 and Solaris for SPARC?
Why don't we use floating point ECC code on
Solaris for x86 or AMD64, or other OS/CPU
combinations?
Another question: in arcfour.c, you have
> struct RC4ContextStr
> {
>+#if defined(NSS_ARCFOUR_IJ_B4_S) || defined(NSS_BEVAND_ARCFOUR)
>+ Stype i;
>+ Stype j;
>+ Stype S[ARCFOUR_STATE_SIZE];
>+#else
> Stype S[ARCFOUR_STATE_SIZE];
>- PRUint8 i;
>- PRUint8 j;
>+ Stype i;
>+ Stype j;
>+#endif
> };
Do we still need the NSS_ARCFOUR_IJ_B4_S test?
That macro is never defined.
Reporter | ||
Comment 20•20 years ago
|
||
Wan-Teh, was comment 19 intended to convey r+?
Regarding your two questions in comment 19:
a) This patch doesn't change (or isn't intended to change)
the use of FP for ECC on any platform. Please ask Vipul
about that in any of the ECC bugs that are still open.
b) The test for the symbol NSS_ARCFOUR_IJ_B4_S is left as a
convenience for future alternative implementations, should they
need it. It could be removed, but hurts nothing.
Comment 21•20 years ago
|
||
Nelson,
This patch does change the use of FP for
ECC on one platform: Linux x86-64. It
didn't use FP for ECC before. Is this
change intentional?
It is better to use NSS_ARCFOUR_IJ_B4_S
this way:
/*
* Alternatively, we can define this macro
* in the Makefile.
*/
#ifdef NSS_BEVAND_ARCFOUR
#define NSS_ARCFOUR_IJ_B4_S 1
#endif
struct RC4ContextStr
{
#if defined(NSS_ARCFOUR_IJ_B4_S)
Stype i;
Stype j;
Stype S[ARCFOUR_STATE_SIZE];
#else
Stype S[ARCFOUR_STATE_SIZE];
Stype i;
Stype j;
#endif
};
Comment 22•20 years ago
|
||
Comment on attachment 173345 [details] [diff] [review]
patch v4 - makefile and .c file only
r=wtc. (Those two are minor issues.)
Attachment #173345 -
Flags: review?(wtchang) → review+
Assignee | ||
Comment 23•20 years ago
|
||
Taking bug .
Assignee: nelson → julien.pierre.bugs
Status: ASSIGNED → NEW
Assignee | ||
Comment 24•20 years ago
|
||
I have checked in Nelson's patch with the only exception that I removed the
USE_FP=1 line from the Linux x86_64 section, in order to preserve the existing
behavior of not building these files in that case. Whether that was correct or
not can be addressed in a separate ECC bug .
Checking in Makefile;
/cvsroot/mozilla/security/nss/lib/freebl/Makefile,v <-- Makefile
new revision: 1.59; previous revision: 1.58
done
Checking in arcfour.c;
/cvsroot/mozilla/security/nss/lib/freebl/arcfour.c,v <-- arcfour.c
new revision: 1.15; previous revision: 1.14
done
RCS file: /cvsroot/mozilla/security/nss/lib/freebl/arcfour-amd64-gas.s,v
done
Checking in arcfour-amd64-gas.s;
/cvsroot/mozilla/security/nss/lib/freebl/arcfour-amd64-gas.s,v <--
arcfour-amd64-gas.s
initial revision: 1.1
done
RCS file: /cvsroot/mozilla/security/nss/lib/freebl/arcfour-amd64-sun.s,v
done
Checking in arcfour-amd64-sun.s;
/cvsroot/mozilla/security/nss/lib/freebl/arcfour-amd64-sun.s,v <--
arcfour-amd64-sun.s
initial revision: 1.1
done
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•