Windows 64 bits (AMD64) support for NSS libraries

RESOLVED FIXED in 3.11.9

Status

enhancement
P1
normal
RESOLVED FIXED
16 years ago
10 years ago

People

(Reporter: m_kato, Assigned: julien.pierre)

Tracking

trunk
3.11.9
x86
Windows XP
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: FIPS _X86_)

Attachments

(4 attachments, 21 obsolete attachments)

6.23 KB, patch
nelson
: review+
glenbeasley
: review+
rrelyea
: superreview+
Details | Diff | Splinter Review
2.07 KB, patch
nelson
: review+
Details | Diff | Splinter Review
371.43 KB, patch
Details | Diff | Splinter Review
374.57 KB, patch
nelson
: review+
Details | Diff | Splinter Review
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6a) Gecko/20031030
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6a) Gecko/20031030

Curretly code doesn't support Windows XP 64bits for AMD64.  (we cannot complile
and don't work)  So I need to support this platform

Reproducible: Always

Steps to Reproduce:
This is build issue, so this doesn't need
Actual Results:  
This is build issue, so this doesn't need

Expected Results:  
We can build and work

Updated

15 years ago
Blocks: 237202
Comment on attachment 146607 [details] [diff] [review]
diff of mozilla/security

This diff is for WinXP AMD64 port.
Attachment #146607 - Flags: review?(wchang0222)

Updated

15 years ago
Status: UNCONFIRMED → NEW
Ever confirmed: true

Comment 6

15 years ago
Comment on attachment 146607 [details] [diff] [review]
diff of mozilla/security

In nss/lib/freebl/Makefile, we have:

>+ifeq ($(CPU_ARCH),x86_64)
>+    DEFINES += -DMP_NO_MP_WORD -DMP_USE_UINT_DIGIT
>+else
>     ASFILES  = mpi_x86.asm
>     DEFINES += -DMP_ASSEMBLY_MULTIPLY -DMP_ASSEMBLY_SQUARE -DMP_ASSEMBLY_DIV_2DX1D \
>     -DMP_USE_UINT_DIGIT -DMP_NO_MP_WORD
>+endif

x86_64 can execute x86 instructions natively.
Why can't we use mpi_x86.asm (which contains
x86 assembly code) for x86_64?
> x86_64 can execute x86 instructions natively.
> Why can't we use mpi_x86.asm (which contains
> x86 assembly code) for x86_64?

x86_64 mode isn't 32 bit mode (like 16bit protect mode vs 32bit protect mode). 
And a way to pass parameters between 64 bit mode and 32 bit mode is different. 
So if I port assember to x86_64, I have to re-write all code.
Makoto, have you tested this without any DEFINES?  That is, without 

>+    DEFINES += -DMP_NO_MP_WORD -DMP_USE_UINT_DIGIT 

The MPI code is written so that it should compile on every platform
without any DEFINES.  The DEFINES exist to enable manual tuning of 
performance.  I'd be interested in a performance comparison between
builds with and without the above DEFINES.
QA Contact: bishakhabanerjee → jason.m.reid
(Assignee)

Comment 9

14 years ago
The previous patches are incomplete. They were probably generated while compiling NSS as part of the browser, rather than a full build of NSS (gmake nss_build_all in mozilla/security/nss). As it stands, the command-line tools would not link because many of them had additional LDFLAGS including /MACHINE:386, which broke on 64-bit. This is a patch that removes this option, which I believe is unnecessary.
(Assignee)

Comment 10

14 years ago
Re: the mpi patch, we can't use the x86 code indeed. We have several options.

1) Use the C code
We'll need to find the proper defines for it.
I don't think the best defines are going to be no defines in this case, because the mpi code uses long by default as the largest type, but long on Windows 64-bit is only a 32-bit type. I think -DMP_ULONG_LONG_MAX will provide the best performance for this code.
I haven't been able to benchmark yet, because I have run into manifest issues with VC2005 .

2) Use assembly code.
We have several very good AMD64 assembly implementations which will likely vastly outperform the C code. mpi_amd64_gas.s / mpi_amd64_sun.s is the first generation of our AMD code that went into 3.10 .
mp_comba_amd64_sun.s is the implementation that is in 3.11 .
However, all those .s files are written in gas syntax. They need to be converted to MASM syntax to build with ML64 .
(Assignee)

Comment 11

14 years ago
I got past the VC2005 assembly/manifest problem so I'm benchmarking now.
I'm running on a dual-CPU, dual-core opteron 275 (total of 4 cores) at 2.2 GHz.
The current 32-bit x86 code with mpi_x86.asm does 297 ops/s for RSA 1024 private keys. This was a single-threaded test. I got the same result with the bits built with VC6 and VC2005.

For 64-bits, I didn't set any MPI defines, which caused the mp_digit to be unsigned long long, ie 64-bit. Unfortunately, the RSA 1024 private key test gets just under 350 ops/s.

I tried using assembly, but unfortunately the .o files from Solaris can't be linked as .obj files on Windows. The MS linker just complains that the .obj file is corrupt. I was hoping this would work as the object files are intercheable between Solaris and Linux, but of course not with Windows.

For the record, the RSA performance with the assembly code is in the 1100 - 1200 range on another opteron 2200 (model 248). So there is quite a lot to be gained by using the assembly code. Unfortunately, VC 2005 performs at the low end of all the C compilers I have tested for opteron for this mpi C code.

I'm not sure how to approach the assembly code here. Before I try to convert all the .s files from gas to MASM syntax, I'd like to make sure that the C calling convention is the same between Solaris/Linux and VC2005 . If not, then the job won't just be a matter of assembler syntax, the code may also have to be changed.

Comment 12

14 years ago
Could you post the source and generated assembler code along with the
compiler optimizations that you selected for 64-bit VS2005? Or just email
them to me?

(In reply to comment #11)
> I got past the VC2005 assembly/manifest problem so I'm benchmarking now.
> I'm running on a dual-CPU, dual-core opteron 275 (total of 4 cores) at 2.2 GHz.
> The current 32-bit x86 code with mpi_x86.asm does 297 ops/s for RSA 1024
> private keys. This was a single-threaded test. I got the same result with the
> bits built with VC6 and VC2005.
> 
> For 64-bits, I didn't set any MPI defines, which caused the mp_digit to be
> unsigned long long, ie 64-bit. Unfortunately, the RSA 1024 private key test
> gets just under 350 ops/s.
> 
> I tried using assembly, but unfortunately the .o files from Solaris can't be
> linked as .obj files on Windows. The MS linker just complains that the .obj
> file is corrupt. I was hoping this would work as the object files are
> intercheable between Solaris and Linux, but of course not with Windows.
> 
> For the record, the RSA performance with the assembly code is in the 1100 -
> 1200 range on another opteron 2200 (model 248). So there is quite a lot to be
> gained by using the assembly code. Unfortunately, VC 2005 performs at the low
> end of all the C compilers I have tested for opteron for this mpi C code.
> 
> I'm not sure how to approach the assembly code here. Before I try to convert
> all the .s files from gas to MASM syntax, I'd like to make sure that the C
> calling convention is the same between Solaris/Linux and VC2005 . If not, then
> the job won't just be a matter of assembler syntax, the code may also have to
> be changed.
> 
(Assignee)

Comment 13

14 years ago
Michael,

As I mentioned, I wasn't able to use any assembly for the 64-bit Windows case yet, because the sources are written for gas and need to be converted to MASM syntax. If you check out the tip of NSS, the file is called  mozilla/security/nss/lib/freebl/mpi/mp_comba_amd64_sun.s . If you manage to convert this to an mp_comba_amd64_win.asm that assembles successfully with ML64, then the defines will be :

ASFILES += mp_comba_amd64_win.asm
DEFINES += -DMPI_AMD64 -DMP_ASSEMBLY_MULTIPLY
MPI_SRCS += mpi_amd64.c

A new rule to assemble with ML64.EXE is also needed. The current one assembles with ML.EXE for 32-bits only.

Comment 14

14 years ago
I was interested in the C code and the optimization switches used to build it
with VS2003/2005 and what the generated Assembler from that looked like. Not
the Assembler source.

(In reply to comment #13)
> Michael,
> 
> As I mentioned, I wasn't able to use any assembly for the 64-bit Windows case
> yet, because the sources are written for gas and need to be converted to MASM
> syntax. If you check out the tip of NSS, the file is called 
> mozilla/security/nss/lib/freebl/mpi/mp_comba_amd64_sun.s . If you manage to
> convert this to an mp_comba_amd64_win.asm that assembles successfully with
> ML64, then the defines will be :
> 
> ASFILES += mp_comba_amd64_win.asm
> DEFINES += -DMPI_AMD64 -DMP_ASSEMBLY_MULTIPLY
> MPI_SRCS += mpi_amd64.c
> 
> A new rule to assemble with ML64.EXE is also needed. The current one assembles
> with ML.EXE for 32-bits only.
> 

(Assignee)

Comment 15

14 years ago
Michael,

The C code is all checked in to mozilla/security/nss/lib/freebl .
The patch to the Makefile to build freebl with the C code for Win32 is as follows. This is a replacement for part of patch 146607 attached to this bug.

Index: Makefile
===================================================================
RCS file: /cvsroot/mozilla/security/nss/lib/freebl/Makefile,v
retrieving revision 1.70
diff -u -4 -r1.70 Makefile
--- Makefile    22 Nov 2005 07:13:32 -0000      1.70
+++ Makefile    8 Dec 2005 00:04:05 -0000
@@ -97,16 +97,20 @@
 # using assembler right now.
     ASFILES  =
     DEFINES += -DMP_NO_MP_WORD -DMP_USE_UINT_DIGIT
 else
+ifeq (1,$(USE_64))
+       OPTIMIZER += -Ox  # maximum optimization for freebl
+else
     ASFILES  = mpi_x86.asm
     DEFINES += -DMP_ASSEMBLY_MULTIPLY -DMP_ASSEMBLY_SQUARE
     DEFINES += -DMP_ASSEMBLY_DIV_2DX1D -DMP_USE_UINT_DIGIT -DMP_NO_MP_WORD
     ifdef BUILD_OPT
        OPTIMIZER += -Ox  # maximum optimization for freebl
     endif
 endif
 endif
+endif

 ifeq ($(OS_TARGET),WINCE)
     DEFINES += -DMP_ARGCHK=0   # no assert in WinCE
     DEFINES += -DSHA_NO_LONG_LONG # avoid 64-bit arithmetic in SHA512

You can look at the full command-line of the compiler in your build log. For the assembler output, just use a disassembler on the resulting .obj files.
(Assignee)

Comment 16

14 years ago
Comment on attachment 205107 [details] [diff] [review]
Additional patch for command-line tools. Remove /MACHINE:386

This patch works, but I attached a better one in bug 319495 . Marking obsolete.
Attachment #205107 - Attachment is obsolete: true
Assignee: wtchang → nobody
QA Contact: jason.m.reid → libraries
Posted patch update win*.mk files (obsolete) — Splinter Review
Mr. Kato, Please try this patch and tell us if it works
with your other patches.
Attachment #146606 - Attachment is obsolete: true
Attachment #217198 - Flags: review?(wtchang)

Updated

13 years ago
Attachment #217198 - Flags: review?(wtchang) → review?(julien.pierre.bugs)

Updated

13 years ago
Attachment #146607 - Flags: review?(wtchang) → review?(julien.pierre.bugs)
(Assignee)

Comment 18

13 years ago
Comment on attachment 146607 [details] [diff] [review]
diff of mozilla/security

This patch no longer applies. The freebl/Makefile needs to be updated slightly due to recent changes.

Also, this patch contains an update for the coreconf/WIN954.0.mk file, which conflicts with your later one.

Even after I combined your two patches, I got a build error in DBM :

"no target architecure" in winnt.h from the VC8 platform SDK .
_AMD64 did not get passed in.

I believe that is because you depend on a specific value of uname which is simply not correct with the MKS uname.
But I tried cygwin and it failed too.
Attachment #146607 - Flags: review?(julien.pierre.bugs) → review-
(Assignee)

Comment 19

13 years ago
Comment on attachment 217198 [details] [diff] [review]
update win*.mk files

The combination of this patch with the other one failed in both my MKS and cygwin environment due to not getting the expected CPU_ARCH from uname -m .

I would recommend that if an x86 CPU is detected, and USE_64 is set, to assume x86_64 .
Attachment #217198 - Flags: review?(julien.pierre.bugs) → review-
Priority: -- → P4

Comment 21

12 years ago
The patches from comments #19 and #20 applied cleanly and the build went through after I tweaked one more conf file to deal with 32 bit cygwin:

diff -ur security/coreconf/arch.mk mozilla/security/coreconf/arch.mk
--- security/coreconf/arch.mk	2007-03-29 16:50:54.000000000 -0700
+++ security/coreconf/arch.mk	2007-03-29 16:59:57.781250000 -0700
@@ -258,6 +258,10 @@
 ifeq (CYGWIN_NT,$(findstring CYGWIN_NT,$(OS_ARCH)))
     OS_RELEASE := $(patsubst CYGWIN_NT-%,%,$(OS_ARCH))
     OS_ARCH = WINNT
+    ifeq (WOW64,$(findstring WOW64,$(OS_RELEASE)))
+        OS_RELEASE := $(patsubst %-WOW64,%,$(OS_RELEASE))
+        CPU_ARCH := x86_64
+    endif
     ifndef CPU_ARCH
 	CPU_ARCH := $(shell uname -m)
 	#
                                                              
All the NSS unit tests pass. We plan to run some more extensive tests in the next couple weeks. 

However, during the build there were a lot of warnings about "int16/int32" to "size_t" assignments. These warnings look scary...

BTW, this bug seems to have a very low priority. Any chance of bumping it up?
(In reply to comment #21)
> The patches from comments #19 and #20 applied cleanly and the build went
> through after I tweaked one more conf file to deal with 32 bit cygwin:

OK.  I will create new patch with your code, then send code review.
Posted patch a patch with x64 asm v4 (obsolete) — Splinter Review
Attachment #268653 - Attachment is obsolete: true
Attachment #268683 - Flags: review?(julien.pierre.boogz)
(Assignee)

Comment 26

12 years ago
Comment on attachment 268683 [details] [diff] [review]
a patch with x64 asm v4

The 64-bit Windows build should support the USE_64 build macro, to force a 64-bit target, just like we do on all other platforms that support both 32-bit and 64-bit targets today. The default target should be 32-bits, unless not supported by the local host - but I am not aware of any version of windows that is 64-bit only.
Attachment #268683 - Flags: review?(julien.pierre.boogz) → review-
(In reply to comment #26)

Windows x64 is different of Unix platform such as Linux.  To builld x64 binaries, we need x64 compliler and assember by Microsoft.  In other word, example, it is like that CPU is different like ARM and x86.

But, If we must use USE_64 macro, should I add this condition into security/coreconf?
(Assignee)

Comment 28

12 years ago
Makota, on Unix platforms you also need 64-bit compilers and assemblers in order to build 64-bit targets. It is not different. I'm taking this bug now as the priority has been raised for Sun. I'm changing the description since this is not just for Win XP 64 bits, but also for Windows Server 2003 and Vista 64 bits.
Assignee: nobody → julien.pierre.boogz
Priority: P4 → P2
Summary: WinXP 64 bits (AMD64) support for NSS libraries → Windows 64 bits (AMD64) support for NSS libraries
Target Milestone: --- → 3.12
(Assignee)

Updated

12 years ago
Depends on: 401194
Version: unspecified → trunk
(Assignee)

Comment 29

12 years ago
Makoto (sorry for the mis-spelling in comment 28), in my installation of VC8, the 64-bit version of MASM (version 8.00.50727.762) is called ML.EXE, not ML64.EXE . Did you rename it ? The MS procedure seems to be to set different PATHs to the 64-bit tools to build, not use different switches to the same tools.
(Assignee)

Comment 30

12 years ago
Makoto,

FYI, I prefer to do the Windows x64 port in 2 parts - first, get everything to build correctly and tests to pass, then, do optimizations in separate bugs, such as the assembly for freebl that you have contributed (thank you for that).

I have everything building with both MSYS and cygwin now, either for the 64-bit and 32-bit targets (controlled by the USE_64 define), on my Windows server 2003 system. I am in the process of running the NSS QA test suite. After the fix for bug 401194 lands, every test will pass, except the new upgrade DB tests.
(Assignee)

Comment 31

12 years ago
Posted patch My work in progress (obsolete) — Splinter Review
This patch contains all my current changes for win64, including fixes for 401194 and bug 399669 .
The QA runs with msys/mingw, but currently not with cygwin. It appears to have been broken before even for 32-bit. I thought we had gotten it to work previously with cygwin, but that's no longer the case.
This patch is not for review, but please feel free to play with it and see if you find anything wrong. I won't be able to work on this again until next tuesday at the earliest.
Attachment #217198 - Attachment is obsolete: true
(Assignee)

Comment 32

12 years ago
Changing target milestone to 3.11.9 since we have customers needing this port in 2007 .
Priority: P2 → P1
Target Milestone: 3.12 → 3.11.9
(In reply to comment #29)
> Makoto (sorry for the mis-spelling in comment 28), in my installation of VC8,
> the 64-bit version of MASM (version 8.00.50727.762) is called ML.EXE, not
> ML64.EXE . Did you rename it ? The MS procedure seems to be to set different

Julien, x64 version of MASM is ML64.EXE, not ML.EXE.  Please see http://msdn2.microsoft.com/en-us/library/hb5z4sxd(VS.80).aspx

Also, I will try testing on MSYS shell.

(Assignee)

Comment 34

12 years ago
Makoto, you are right. On my installation, I found ML64.EXE in the platform SDK under C:\Program Files\Microsoft Platform SDK\Bin\win64\x86\AMD64 . But VC8 only had the 32-bit only ML.EXE in C:\Program Files (x86)\Microsoft Visual Studio 8\VC\bin . There wasn't any ML*.EXE in C:\Program Files\Microsoft Visual Studio 8 . This may be an issue with my install. I will set my PATH to point to the Platform SDK first for 64-bit.

Comment 35

12 years ago
(In reply to comment #34)
> Makoto, you are right. On my installation, I found ML64.EXE in the platform SDK
> under C:\Program Files\Microsoft Platform SDK\Bin\win64\x86\AMD64 . But VC8
> only had the 32-bit only ML.EXE in C:\Program Files (x86)\Microsoft Visual
> Studio 8\VC\bin . There wasn't any ML*.EXE in C:\Program Files\Microsoft Visual
> Studio 8 . This may be an issue with my install. I will set my PATH to point to
> the Platform SDK first for 64-bit.

I have this problem as well and the build dies at this phase. I just run the assembler by hand to do the assembly and restart the build and it continues on.

(Assignee)

Comment 36

12 years ago
This patch should apply to both the NSS 3.11 branch and the trunk. Currently, the trunk of NSPR is required.

With this patch, the build will work, and QA will pass 100% on the branch. There were still a few problems in DB upgrade tests on the trunk last I checked, which I believe aren't porting related, so they shouldn't hold this patch.

This patch does not contain any assembly optimizations or enable prevention of cache attacks. I will tackle those issues in separate patches.
Attachment #286372 - Attachment is obsolete: true
Attachment #287638 - Flags: superreview?(rrelyea)
Attachment #287638 - Flags: review?(nelson)
(Assignee)

Comment 37

12 years ago
Note that the change to arch.mk is only necessary when running with cygwin due to the fact that cygwin's uname returns a weird platform string when running on a win64 box. The change doesn't hurt with other shells.

Comment 38

12 years ago
Comment on attachment 287638 [details] [diff] [review]
Patch to build and pass QA (checked in to NSS_3_11_BRANCH and trunk). This went into 3.11.9 release.


r+ with some comments:


In win_rand.c, how is wincrypt.h getting included (so that we get the correct definition of HCRYPTPROV). We probably want to ifdef out the entire wincrypt.h block more generally (BTW do we still have a requirement to build on Win95, maybe the whole block can be removed.

RE _X86_ -> _AMD64_ I presume AMD64 includes support for X86_64 on intel 64 bit (I think it's highly likely that it does, just wanted to verify).

RE WIN32 -> WIN64, I'm surprised that we don't have and WIN32 defines that really mean 'Windows" rather than Windows 32 specifically. A quick grep turns up some code of the form #if defined(WIN_XP) && !defined(WIN32) In command (which I bet really means WIN16.
Attachment #287638 - Flags: superreview?(rrelyea) → superreview+
Comment on attachment 287638 [details] [diff] [review]
Patch to build and pass QA (checked in to NSS_3_11_BRANCH and trunk). This went into 3.11.9 release.


r=nelson.  This patch won't hurt any existing platform.
I have one question about the patch to nss/lib/freebl/win_rand.c

>+#ifndef WIN64
> typedef unsigned long HCRYPTPROV;
>+#endif

This implies that the type HCRYPTPROV is defined in one of the files
we include for WIN64, but is NOT defined in WIN32 builds.  
Are we including different sets of files?  
Or is that type defined in a different file for WIN64 than for WIN32?
Attachment #287638 - Flags: review?(nelson) → review+
(Assignee)

Comment 40

12 years ago
Bob,

Re: comment 38,

wincrypt.h is automatically included from windows.h in the Win64 platform SDK. It is in an #ifndef NOCRYPT block.

Yes, AMD64 is good for X86_64 . There are no defines for X86_64 in windows.h, only AMD64.

As for WIN32, it gets automatically defined from the windows header files even for Win64 . That's why places that test it in our source still work.

Nelson,

Re: comment 39

That type is defined in wincrypt.h, which we explicitly don't include on WIN32. But the WIN64 platform SDK version of windows.h automatically includes it.
(Assignee)

Comment 41

12 years ago
I checked in attachment 287638 [details] [diff] [review] to NSS_3_11_BRANCH :

Checking in coreconf/WIN32.mk;
/cvsroot/mozilla/security/coreconf/WIN32.mk,v  <--  WIN32.mk
new revision: 1.20.2.6; previous revision: 1.20.2.5
done
Checking in coreconf/WIN954.0.mk;
/cvsroot/mozilla/security/coreconf/WIN954.0.mk,v  <--  WIN954.0.mk
new revision: 1.7.28.1; previous revision: 1.7
done
Checking in coreconf/WINNT5.2.mk;
/cvsroot/mozilla/security/coreconf/WINNT5.2.mk,v  <--  WINNT5.2.mk
new revision: 1.1.2.1; previous revision: 1.1
done
Checking in WINNT6.0.mk;
/cvsroot/mozilla/security/coreconf/WINNT6.0.mk,v  <--  WINNT6.0.mk
new revision: 1.1.6.1; previous revision: 1.1
done
Checking in coreconf/arch.mk;
/cvsroot/mozilla/security/coreconf/arch.mk,v  <--  arch.mk
new revision: 1.19.2.1; previous revision: 1.19
done
Checking in nss/lib/freebl/Makefile;
/cvsroot/mozilla/security/nss/lib/freebl/Makefile,v  <--  Makefile
new revision: 1.70.2.14; previous revision: 1.70.2.13
done
Checking in nss/lib/freebl/win_rand.c;
/cvsroot/mozilla/security/nss/lib/freebl/win_rand.c,v  <--  win_rand.c
new revision: 1.9.28.3; previous revision: 1.9.28.2
done

And to the trunk :
Checking in coreconf/WIN32.mk;
/cvsroot/mozilla/security/coreconf/WIN32.mk,v  <--  WIN32.mk
new revision: 1.26; previous revision: 1.25
done
Checking in coreconf/WIN954.0.mk;
/cvsroot/mozilla/security/coreconf/WIN954.0.mk,v  <--  WIN954.0.mk
new revision: 1.9; previous revision: 1.8
done
Checking in coreconf/WINNT5.2.mk;
/cvsroot/mozilla/security/coreconf/WINNT5.2.mk,v  <--  WINNT5.2.mk
new revision: 1.2; previous revision: 1.1
done
Checking in coreconf/WINNT6.0.mk;
/cvsroot/mozilla/security/coreconf/WINNT6.0.mk,v  <--  WINNT6.0.mk
new revision: 1.2; previous revision: 1.1
done
Checking in coreconf/arch.mk;
/cvsroot/mozilla/security/coreconf/arch.mk,v  <--  arch.mk
new revision: 1.20; previous revision: 1.19
done
Checking in nss/lib/freebl/Makefile;
/cvsroot/mozilla/security/nss/lib/freebl/Makefile,v  <--  Makefile
new revision: 1.91; previous revision: 1.90
done
Checking in nss/lib/freebl/win_rand.c;
/cvsroot/mozilla/security/nss/lib/freebl/win_rand.c,v  <--  win_rand.c
new revision: 1.15; previous revision: 1.14
done
(Assignee)

Comment 42

12 years ago
Comment on attachment 287638 [details] [diff] [review]
Patch to build and pass QA (checked in to NSS_3_11_BRANCH and trunk). This went into 3.11.9 release.


Makoto, please check if these changes worked for you. You can just pull the NSS trunk or from NSS_3_11_BRANCH . You need to set USE_64 to 1. NSS should be able to build with either Cygwin or MSYS .  I only tested the standalone NSS build which is done by doing gmake nss_build_all in mozilla/security/nss .
Attachment #287638 - Attachment description: Patch to build and pass QA → Patch to build and pass QA (checked in to NSS_3_11_BRANCH and trunk)
Attachment #287638 - Flags: review?(m_kato)
(Assignee)

Updated

12 years ago
Attachment #287638 - Flags: review?(glen.beasley)
(Assignee)

Comment 43

12 years ago
Posted patch asm patch with x64 asm v5 (obsolete) — Splinter Review
This is an update to Makoto's work to include the assembly changes for freebl and is not for review. I just merged the changes so that they apply to the NSS_3_11_BRANCH (they probably apply to the trunk too). I am running the NSS QA with the changes right now before I start the manual code review.
Attachment #268683 - Attachment is obsolete: true
(Assignee)

Comment 44

12 years ago
Makoto,

Good news, the QA is all green with your changes. With the newly released VC9 from VS2008, the RSA performance looks good, a little better than it is on Solaris x64 with an identical Opteron machine. I haven't benchmarked anything else.

Updated

12 years ago
Attachment #287638 - Flags: review?(glen.beasley) → review+
(Assignee)

Comment 45

12 years ago
Comment on attachment 291575 [details] [diff] [review]
asm patch with x64 asm v5

Nelson, could you please review this patch ? Things look fine to me so far, and the QA passes, but I'm not certain about the content of the new *masm.asm files in particular.
Attachment #291575 - Flags: review?(nelson)
Comment on attachment 291575 [details] [diff] [review]
asm patch with x64 asm v5

I think this patch sets some kind of new record for size. :-)
Overall, I must give this patch an r-.  

I have numerous issues with both the c code and the assembler code.  
I have previously sent review comments on the c code to Julien, and later 
I will attach a revision of those comments to this bug.  
This bug comment will focus on the issues with the four new .asm files.

I have an issue with the new .asm files' names.  Once the names are 
made satsifactory, I would suggest that we commit these .asm files as
they appear in this patch, DESPITE the other issues.  The reason is 
that then we can address those issues in MUCH smaller and easier to 
review patches to those files once they are committed, rather than in new revisions of a monster patch like this one.  

Here are the .asm files' issues:

1. file names.  All the new files' names end in "amd64_masm.asm".
Some use dashes, some use underscores.  Let's make them be consistent
in the use of dashes or underscores.  I have a very slight preference
for dashes.  I feel strongly about consistency, but not about which one
we choose.

The terms "masm" and "asm" are redundant.  .asm files are for MASM, exclusively, so there's no need to put the _masm into the file name.  
The .asm suffix suffices to make that distinction apparent.

The term "AMD64" is the name of the ABI (the register and stack usage 
conventions) used in Linux and Solaris.  Microsoft's competing ABI is 
named "x64".  The term x86_64 describes the instruction set, separately 
from either ABI.  Since the entire /raison d'etre/ for these files is to 
accommodate the x64 ABI, naming it AMD64 is clearly wrong.  

So, I propose that all these files be renamed from "*_amd64_masm.asm"
to simply "*-x64.asm".  With that change made, I think these can be 
checked in, and then the following issues can be addressed.

2. The x64 ABI imposes MANY additional requirements on stack usage and 
on code file contents than are imposed by the AMD64 ABI.  

This patch addresses the differences between the two ABIs in the area of choice of registers for arguments.  

Both ABIs apparently require that non-leaf functions (which Microsoft calls "frame functions") use stacks whose size is exactly a multiple of 16 bytes, and the original .s files from which these new .asm files were derived seem 
to already meet that requirement, I think.  This patch adds some pushes and 
pops, but I think it still meets the 16 byte stack alignment requirement.  
(Makoto, can you confirm that?)

But there are other x64 requirements that are not met by the original gcc .s files and do not appear to be met in these new .asm files.  
Here are some.

3. (most important) As documented in 
http://msdn2.microsoft.com/en-us/library/ew5tede7.aspx 
the x64 ABI requires all *frame* functions to allocate at least 32 bytes 
(four "quad-words") of stack for the exclusive use of the called function 
(the callee).  These bytes are located immediately before the address where 
the caller's return address is pushed by the call instruction.  They were 
originally intended as "home" storage for the values of the (up to) 4 arguments passed to the callee in registers, but the callee may use them however it wishes.  Regardless of the number of arguments passed to the 
callee in registers, the caller always allocates a minimum of 4 quadwords 
for "home" storage.  

The AMD64 ABI does not require that, and the code generated by gcc in 
the .s files does not allocate any home storage.  Consequently, a called function might store values into the "home" area in the caller's stack, and might clobber the caller function's local variables.

So, IMO, the "frame" functions in the .asm files need to have the 
amount of allocated stack space increased by 32 bytes for this purpose.
It appears that this is as trivial as adding 32 to the value subtracted
from the RSP register in the function prolog and added back in the 
function epilog.  

4. Use of negative offsets with RSP.  
Terminlogy: Microsoft documentation refers to the end of the stack 
where new values get pushed as the "bottom" of the stack.  I will use
that terminology here.  The code generated by gcc for use in AMD64 ABI 
*leaf* functions uses memory locations beyond the bottom of the stack.  
It uses memory locations that would be overwritten if any more pushes 
were done.  The reason to do this seems to be to generate instructions that 
use small signed offsets rather than larger positive offsets. This makes the 
code smaller, and maybe faster.  

Microsoft documentation seems to suggest that the OS may use the space 
below the RSP (perhaps interrupts occur in user stack space?), and if 
so, then the use of negative RSP offsets is unsafe, even in leaf functions.  

The gcc-generated leaf functions typically do not use frame pointer register (RBP) as a frame pointer, and instead uses it as general purpose register.  
If the leaf functions had any spare registers, then we could use RBP as a 
frame pointer and still use small signed offsets.
Otherwise, we will need to use larger positive values for all RSP offsets.  Fortunately, this is trivial to do with MASM.

If someone can point to Microsoft documentation that explicitly says it's OK to use the area below the bottom of the stack in leaf functions, that would be good.  Otherwise, I think we must change this
code. 

4. MS's x64 documentation explains a stack unwinding procedure used with exceptions.  It imposes many additional requirements on both leaf and frame functions, most notably on the structure of function prologs and epilogs.  
64-bit MASM has numerous new psuedo-ops and macros to aid in the development 
of code that meets these requirements.  This is documented in these pages, 
among others:

http://msdn2.microsoft.com/en-us/library/8ydc79k6.aspx Unwind procedure
http://msdn2.microsoft.com/en-us/library/ms235241.aspx Unwind Helpers
http://msdn2.microsoft.com/en-us/library/ms235217.aspx ASM unwind macros
http://msdn2.microsoft.com/en-us/library/ms235231.aspx Raw pseudo-ops

There is evidence that suggests that gcc did that same thing.  Notice 
all the seemingly unreferenced labels in the function prologs in the .s files produced by gcc.

I'm thinking this probably is unnecessary for c code.  But, nowhere in the 
x64 documentation does it say that these requirements do NOT apply to c-only 
code.  It should be easy to meet these requirements by using the macros defined for that purpose in the function prologs.  

5. Minor Performance issue.  It may help performance to insert ALIGN 
pseudo-ops just before targets of backward branches (just before the top 
of a loop), to align them on 16-byte boundaries.  The gcc-generated
code clearly did that.  I think we should, too.
Attachment #291575 - Flags: review?(nelson) → review-
Comment on attachment 291575 [details] [diff] [review]
asm patch with x64 asm v5

Review comments on the c language portion of this patch:

1. knowledge of underlying types in too many ifdefs ******************

This patch increases the number of ifdefs that choose long vs long long.

The fact that we use long on some platforms and long long on others
should be known in very few places.  That knowledge should be used
in the definition of a typedef which is then used everywhere else,
to avoid having ifdef'ed code in many places that uses long or
long long in those places.  Such typedefs already exist, but aren't
being used to full advantage.  Let's fix that.

In MPI, that typedef is mp_digit.  So, instead of this patch to mpi-priv.h

> +#if defined(_WIN64)
> +/* c = a * b */
> +#define s_mpv_mul_d(a, a_len, b, c) \
> +	((unsigned long long *)c)[a_len] = s_mpv_mul_set_vec64(c, a, a_len, b)
> +
> +/* c += a * b */
> +#define s_mpv_mul_d_add(a, a_len, b, c) \
> +	((unsigned long long *)c)[a_len] = s_mpv_mul_add_vec64(c, a, a_len, b)
> +#else
>  /* c = a * b */
>  #define s_mpv_mul_d(a, a_len, b, c) \
>  	((unsigned long*)c)[a_len] = s_mpv_mul_set_vec64(c, a, a_len, b)

I propose instead be the much simpler:

> /* c = a * b */
> #define s_mpv_mul_d(a, a_len, b, c) \
>-  	((unsigned long*)c)[a_len] = s_mpv_mul_set_vec64(c, a, a_len, b)
>+  	((mp_digit *)c)[a_len] = s_mpv_mul_set_vec64(c, a, a_len, b)

> /* c += a * b */
> #define s_mpv_mul_d_add(a, a_len, b, c) \
>-  	((unsigned long*)c)[a_len] = s_mpv_mul_add_vec64(c, a, a_len, b)
>+ 	((mp_digit *)c)[a_len] = s_mpv_mul_add_vec64(c, a, a_len, b)

(In fact, I don't know why that cast exists at all.  I think it could be:)

> /* c = a * b */
> #define s_mpv_mul_d(a, a_len, b, c) \
>-  	((unsigned long*)c)[a_len] = s_mpv_mul_set_vec64(c, a, a_len, b)
>+  	(c)[a_len] = s_mpv_mul_set_vec64(c, a, a_len, b)

> /* c += a * b */
> #define s_mpv_mul_d_add(a, a_len, b, c) \
>-  	((unsigned long*)c)[a_len] = s_mpv_mul_add_vec64(c, a, a_len, b)
>+ 	(c)[a_len] = s_mpv_mul_add_vec64(c, a, a_len, b)


In freebl/arcfour.c, that typedef is WORD.  The patch for that file:

> +#if defined(_WIN64)
> +/*
> + * Windows 64-bit uses LLP64 model
> + */
> +extern void ARCFOUR(RC4Context *cx, unsigned long long inputLen, 
> +	const unsigned char *input, unsigned char *output);
> +#else
>  extern void ARCFOUR(RC4Context *cx, unsigned long inputLen, 
>  	const unsigned char *input, unsigned char *output);
> +#endif

could instead be:

>- extern void ARCFOUR(RC4Context *cx, unsigned long inputLen,
>+ extern void ARCFOUR(RC4Context *cx, WORD inputLen,
>  	const unsigned char *input, unsigned char *output);

2. More use of typedefs to reduce complexity in arcfour.c *************

Originally, arcfour.c did not #include prtypes.h.  Now it does,
so we can use PRUint64 and PRUint32 types instead of all those
other ifdefs.  So I propose to replace some aspects of your
patch to freebl/arcfour.c with simpler patches.

First, I would eliminate this change from the patch:

>-#if defined(AIX) || defined(OSF1) || defined(NSS_BEVAND_ARCFOUR)
>+#if defined(AIX) || defined(OSF1) || (defined(NSS_BEVAND_ARCFOUR) && !defined(_WIN64))
> /* Treat array variables as longs, not bytes, on CPUs that take 
>  * much longer to write bytes than to write longs, or when using 
>  * assembler code that required it.

Then, Instead of this patch:

>-#if defined(IS_64) && !defined(__sparc) && !defined(NSS_USE_64) 
>+#if (defined(IS_64) && !defined(__sparc) && !defined(NSS_USE_64)) || defined(_WIN64)
> typedef unsigned long long WORD;
> #else
> typedef unsigned long WORD;
> #endif

I propose this patch which produces simpler to understand code:

>-#if defined(IS_64) && !defined(__sparc) && !defined(NSS_USE_64) 
>+#if (defined(IS_64) && !defined(__sparc))
>-typedef unsigned long long WORD;
>+typedef PRUint64 WORD;
> #else
>-typedef unsigned long WORD;
>+typedef PRUint32 WORD;
> #endif

And instead of this patch:

>-#ifdef USE_LONG
>+#if defined(_WIN64)
>+/*
>+ * Windows 64-bit uses LLP64 model, we must use ULONGLONG
>+ */
>+typedef unsigned long long Stype;
>+#elif defined(USE_LONG)
> typedef unsigned long Stype;
> #else
> typedef PRUint8 Stype;

I propose this simpler patch:

> #ifdef USE_LONG
>-typedef unsigned long Stype;
>+typedef WORD Stype;
> #else
> typedef PRUint8 Stype;
> #endif

(You might want to rename USE_LONG to USE_WORD throughout that file.)

I don't know if all those proposed patches will work perfectly.
There might be some long vs. int issues, that might require some
additional casts in various places.  But overall I think it would
be good to not increase the number of ifdefs for long vs long long.

3. This patch modifies mpi/mpi.h as follows:

> #if defined(ULONG_LONG_MAX)			/* GCC, HPUX */
> #define MP_ULONG_LONG_MAX ULONG_LONG_MAX
> #elif defined(ULLONG_MAX)			/* Solaris */
> #define MP_ULONG_LONG_MAX ULLONG_MAX
> /* MP_ULONG_LONG_MAX was defined to be ULLONG_MAX */
> #elif defined(ULONGLONG_MAX)			/* IRIX, AIX */
> #define MP_ULONG_LONG_MAX ULONGLONG_MAX
>+#elif defined(_WIN64)               /* WIN64 */
>+#define MP_ULONG_LONG_MAX 0xffffffffffffffffui64
> #endif

According to http://msdn2.microsoft.com/en-us/library/aa934141.aspx
the value 0xffffffffffffffffui64 is defined in Windows' <limits.h>
header file with the symbol names ULLONG_MAX and _UI64_MAX.  Since mpi.h
already looks for ULLONG_MAX, I'd prefer if we could use the constant
from the windows header file rather than hard coding this value.

This last item (#3) is not a MUST.
(Assignee)

Comment 48

11 years ago
Nelson,

Thanks for your feedback.

Re: comment #46

1. Actually, .asm files are not exclusively for MASM . The extension has also been used by other assemblers such as Borland's TASM, Watcom assembler, or IBM's ALP assembler for OS/2 . We used to have some non-MASM .asm files for OS/2 ALP in lib/freebl, even. So, I don't think having _masm.asm in the filename is really redundant, just like we have -gas.s and -sun.s files. I prefer dashes to underscores as well.

The word AMD64 has been used for many things. The Solaris and Linux ABI  is only one of them. Sun also markets "Solaris for x64". And Linux people prefer to talk about x86_64. From http://en.wikipedia.org/wiki/Amd64 :

"x86-64 is a 64-bit superset of the x86 instruction set architecture. The x86-64 instruction set natively supports Intel's x86 and was designed by Advanced Micro Devices (AMD), who have since renamed it AMD64. This architecture has been cloned by Intel under the name Intel 64 (formerly known as Yamhill, Clackamas Technology, CT, IA-32e, and EM64T).[1] This leads to the common use of the names x86-64 or x64 as more vendor-neutral terms to collectively refer to the two nearly identical implementations."

I think it's fair to say that AMD64 is one of the many valid terms used to refer to the original 64-bit extensions for x86 created by AMD. We already use that term extensively in many source files. It doesn't mean they are in any way AMD-specific, or Linux- or Solaris-ABI specific. I don't think there is a good reason to stray from this name now unless we change them all.

Anyway, I don't have strong opinions about the names.

2. Yes, unfortunately Microsoft chose to use a different calling convention than the other OSes of course.

This brings another issue. At least some of existing amd64 assembler sources that were used by Makoto to create the masm files were originally the output of gcc . I think some processing was done for Solaris, which was done manually, but I believe it could be automated as the changes were small.

However, in this case, due to the calling convention differences, the changes are not trivial. If we ever change the C source, it will be a headache to propagate the changes to Win64. So, I wonder if we should go in the direction of using the same source file, and generating a .s file for win64 with gcc, that could be linked to the rest of the freebl library built with VC9.

What do you think of this proposal ? I think this would obviate your issues 2, 3, 4, 4 again ;),
(Assignee)

Comment 49

11 years ago
Nelson,

Note that re: comment 48, I am assuming that with gcc it is be possible to output functions that use the win64 x64 ABI. So we would let gcc do the hard work for us. I am wondering if this is what Makoto did.

re: comment 47 :

1. Agreed, this should be fixed.

2. Yes, your suggestion seems OK. I am not sure what the original code was doing with NSS_USE_64, though. It seems backwards.

3. Yes, this is reasonable.
Comment on attachment 291575 [details] [diff] [review]
asm patch with x64 asm v5

There's one more problem with this patch that I noticed while reviewing
it, but neglected to mention in previous review comments.  It's a c++
style comment ("//") in c code in nss/lib/freebl/mpi/mpcpucache.c

>+#else
>+// implement in mpcpucache_amd64_vc.asm due to no inline assembler
(Assignee)

Comment 51

11 years ago
Unfortunately, it appears that gcc for win64 is not ready for prime time yet. There is no support for the win64 calling convention yet.
(Assignee)

Comment 52

11 years ago
Posted patch Only prevent cache attacks. (obsolete) — Splinter Review
This patch uses the VC2008 compiler intrinsic __cpuid instead of using an external assembly file to call the CPUID instruction. I checked that this gave the same result as the other external function, by calling both and comparing the result.
I built optimized, and looked at the disassembly of the generated optimized object in the debugger for this patch, and noted that the _cpuid function was entirely eliminated, and actually inlined by the compiler. Pretty nice job.
Attachment #299098 - Flags: superreview?(wtc)
Attachment #299098 - Flags: review?(nelson)

Comment 53

11 years ago
Comment on attachment 299098 [details] [diff] [review]
Only prevent cache attacks.

r=wtc.

I suggest you rewrite the nested ifdefs like this:

#if defined(AMD_64)

#if defined(__GNUC__)
GCC inline assembly for x64
#elif defined(_MSC_VER)
Your new code (MSVC intrinsic __cpuid)
#endif

#else

#if defined(__GNUC__)
GCC inline assembly for x86
#elif defined(_MSC_VER)
MSVC inline assembly for x86
#endif

#endif
Attachment #299098 - Flags: superreview?(wtc) → superreview+
Comment on attachment 299098 [details] [diff] [review]
Only prevent cache attacks.

Julien, two questions.

1. This code defines a new function _cpuid, and #defines cpuid to be
_cpuid.  Why?  Why not just name the new function cpuid?

2. The new function _cpuid calls another function named __cpuid. 
I see no declaration for __cpuid.  IINM, there must be a declaration
that declares it to be an intrinsic, or else it will be treated like
any other ordinary function call.  Where is that declaration?
If the declaration is absent, then this patch gets r-.

Comment 55

11 years ago
MSDN says we need to include <intrin.h>.

http://msdn2.microsoft.com/en-us/library/hskdteyh(vs.80).aspx
Comment on attachment 299098 [details] [diff] [review]
Only prevent cache attacks.

lxr does not find the string intrin.h anywhere in NSS or NSPR source code
so I gather it is not being included, although perhaps it is being included
by some other windows header that is being included.

I don't have the build environment to try to build this Win64 code myself,
or I would have applied the patch and then built the file mpcpucache.i
(which is the output of the c preprocessor) and then grepped it for a 
declaration of __cpuid.  

But since I cannot test it, I will mark r- for this issue.  If you find 
that intrin.h is being included somewhere, please let us know.
Attachment #299098 - Flags: review?(nelson) → review-
(Assignee)

Comment 57

11 years ago
This update fixes the macro tests the way Wan-Teh likes, and removes the unnecessary #define cpuid macro . It also includes intrin.h . The symbol was being implicitly declared without any warning .
Attachment #299098 - Attachment is obsolete: true
Attachment #299938 - Flags: superreview?(wtc)
Attachment #299938 - Flags: review?(nelson)

Comment 58

11 years ago
Comment on attachment 299938 [details] [diff] [review]
Only prevent cache attacks v2, with feedback from Wan-Teh and Nelson (checked in to NSS_3_11_BRANCH and trunk)


r=wtc.  Remember to add static below before you check in.

>+void cpuid(unsigned long op, unsigned long *eax,
Attachment #299938 - Flags: superreview?(wtc) → superreview+

Comment 59

11 years ago
Comment on attachment 299938 [details] [diff] [review]
Only prevent cache attacks v2, with feedback from Wan-Teh and Nelson (checked in to NSS_3_11_BRANCH and trunk)


If the Sun Studio compiler also accepts GCC's inline assembly
syntax, you need to add the defined(__SUNPRO_C) test, or you can
go back to the original !define(_MSC_VER) test.  In any case,
it's good to have an AMD64 section and an x86 section.

Comment 60

11 years ago
Comment on attachment 299938 [details] [diff] [review]
Only prevent cache attacks v2, with feedback from Wan-Teh and Nelson (checked in to NSS_3_11_BRANCH and trunk)


OK, you're using separate assembly files for the Sun Studio
compilers.  So it is correct to test just defined(__GNUC__) in
this patch.
Comment on attachment 299938 [details] [diff] [review]
Only prevent cache attacks v2, with feedback from Wan-Teh and Nelson (checked in to NSS_3_11_BRANCH and trunk)


After several readings of this patch, I concluded that it is correct,
so sr+. But there are some aspects of it I found confusing, and I'd
like to suggest that you change them before you commit.  They are:

>-#if defined(__x86_64__) || defined(__x86_64)
>+#if defined(__x86_64__) || defined(__x86_64) || defined(_M_AMD64)
> #define AMD_64 1
> #endif
> 
> /* Generic CPUID function */
> #if defined(AMD_64)

Both the the immediately preceeding #if's have exactly the same truth.  
One merely tests the result of the other.  I'd like to suggest that you 
get rid of the immediately preceding #endif and #if defined(AMD_64).  

>+
>+#else
>+
>+/* x86 */
>+

For some reason, I found it really difficult to find the matching #if
for that else.  So Please change those lines above to something like:

>+#else /* !defined(AMD_64) */ 

or

>+#else /* not __x86_64__ nor __x86_64 nor _M_AMD64 */
Attachment #299938 - Flags: review?(nelson) → review+
(Assignee)

Comment 62

11 years ago
Wan-Teh,

The sun studio compiler doesn't support inline assembly at this time.
(Assignee)

Comment 63

11 years ago
Wan-Teh,

I added the static.

Nelson,

I added some comments about the #else case. I didn't want to get rid of AMD_64 as it is used in other places.

I checked the patch in to NSS_3_11_BRANCH :

Checking in mpi/mpcpucache.c;
/cvsroot/mozilla/security/nss/lib/freebl/mpi/mpcpucache.c,v  <--  mpcpucache.c
new revision: 1.3.2.1; previous revision: 1.3
done

And to the trunk :

Checking in mpi/mpcpucache.c;
/cvsroot/mozilla/security/nss/lib/freebl/mpi/mpcpucache.c,v  <--  mpcpucache.c
new revision: 1.5; previous revision: 1.4
done

We can either keep this bug open until all the assembly optimizations have gone in, or close this bug and move the optimizations into a separate bug.
Julien, I didn't suggest that you get rid of the definition of AMD_64, but 
only that you combine the block that defines it with the immediately following block that is the first test of that definition.
(Assignee)

Updated

11 years ago
Attachment #299938 - Attachment description: Only prevent cache attacks v2, with feedback from Wan-Teh and Nelson → Only prevent cache attacks v2, with feedback from Wan-Teh and Nelson (checked in to NSS_3_11_BRANCH and trunk)
(Assignee)

Comment 65

11 years ago
This is just to bring the previous patch up to date, given that the cache attacks were taken care of differently in attachment 299938 [details] [diff] [review] . Nothing else was changed.
Attachment #291575 - Attachment is obsolete: true
Posted patch asm patch with x64 asm v6 (obsolete) — Splinter Review
I believe this patch contains most of the changes that I described in my 
comment 46 and comment 47.  It doesn't use any of the psuedo ops that 
facilitate stack unwinding.  Note that this patch was based on Makoto's
"a patch with x64 asm v4" and not on Julien's more recent patch.

Unfortunately, I don't have a build or test environment for this code,
so this code is uncompiled and untested.  

Julien, if you're interested in it, please apply this patch and try to 
build it.  If you get build errors, please use gmake -i or gmake -k 
so that the make will keep going and we'll see all the errors.

Also, please let me know if your <limits.h> file does not contain a 
definition of ULLONG_MAX
Attachment #306836 - Flags: review?(julien.pierre.boogz)
Makoto, I also welcome you to try my latest patch and provide feedback.
(Assignee)

Updated

11 years ago
Attachment #287638 - Flags: review?(m_kato)
I have no permision as reviewer.  (Do you know how to get it?)

You should add unwind code like .pushreg for debugging and etc.

See http://msdn2.microsoft.com/en-us/library/5kbwa7zs(VS.80).aspx for more detail.

(Assignee)

Comment 69

11 years ago
Makoto,

I think you need to file a bug against product: mozilla.org / component : other b.m.o. issues. And ask for the editbugs permission . Cc me and I will vouch for you.
Makoto,  Since I invited you to review this patch, you can just add a comment
with your review summary.  You can also attach a better patch. :-)

Comment 71

11 years ago
Makoto, I just gave you the canconfirm and editbugs permissions.
Attachment #310967 - Flags: review?(julien.pierre.boogz)
Interdiff cannot compare Makoto's latest patch to my previous patch because 
they patch a different set of files.  This is simply my latest patch with 
one file removed.  Hopefully, interdiff will be able to compare this to 
Makoto's latest patch.
Attachment #306836 - Attachment is obsolete: true
Attachment #306836 - Flags: review?(julien.pierre.boogz)
This is yet another attempt to get Interdiff to compare Makoto's latest 
patch to mine.
Attachment #310982 - Attachment is obsolete: true
One more attempt to make interdiff work.
Attachment #310984 - Attachment is obsolete: true
Comment on attachment 310967 [details] [diff] [review]
merge latest patch w/ unwind code

Well, interdiff is just useless.  When I got the same set of files in the same order and the same versions, it gives no output at all.

Makoto, your latest patch uses negative offsets from rsp again, something my previous patch eliminated.
Please explain that.

I believe the pages I cited in comment 46 make it clear that negative rsp offsets cannot be used.  So, please explain your choice to return to negative rsp offsets.
Hmmm.  
Reversing the order in which the two attachments are given to interdiff 
makes it work.  :-/
This link works.
https://bugzilla.mozilla.org/attachment.cgi?oldid=310988&action=interdiff&newid=310967&headers=1
Comment on attachment 310967 [details] [diff] [review]
merge latest patch w/ unwind code

Using Interdiff (at last!) I see that most of the changes in my patch
"asm patch with x64 asm v6" are missing from Makoto's patch 
"merge latest patch w/ unwind code". 

The negative offset elimination work is gone.  
The ALIGN directives are missing.
The duplicated epilogues are back.

Makoto, when you wrote "merge latest patch", did you mean the "latest patch"
before mine?
Comment on attachment 310988 [details] [diff] [review]
asm patch with x64 asm v6 minus mpcpucache file - same versions

I'm moving the review request from the old copy of this patch to this copy, which is comparable to Makoto's via interdiff.
Attachment #310988 - Flags: review?(julien.pierre.boogz)
(Assignee)

Comment 80

11 years ago
I did some performance tests with the various patches. Unfortunately, they do not bring Win64 performance on par with Solaris or Linux x64 for SHA1 or RC4. RC4 on Win64 tops at only about 60% of the Unix platforms, on identical hardware. This will need to be investigated/fixed .
(Assignee)

Comment 81

11 years ago
FYI, the patches improve the RSA 1024 private key performance from 598 to 2320 ops/s. Solaris does 2225 ops. So the RSA is better on Win64.

SHA1 stays unchanged at about 370 MB/s. Solaris does 391 MB/s. So it's close.

RC4 goes from 50 MB/s to 394 MB/s. Solaris is at 621 MB/s, far ahead.
(In reply to comment #81)
> FYI, the patches improve the RSA 1024 private key performance from 598 to 2320
> ops/s. Solaris does 2225 ops. So the RSA is better on Win64.

Please tell which of the above patches you tested, exactly.  Thanks.
(Assignee)

Updated

11 years ago
Duplicate of this bug: 370780

Updated

11 years ago
Blocks: FIPS2008

Updated

11 years ago
Whiteboard: FIPS

Comment 84

11 years ago
If this bug is completed by Nov17 2008 it will be included in the FIPS2008 validation otherwise it will be dropped for a later release.
The patches in this bug have been awaiting review for 8 months.  
It is relevant to softoken.

Evidently I am the only member of the NSS team who knows x86_64 assembly 
code, and I cannot review my own patch under the present rules.
Either we need to find someone who can review this, or we need to change
the rules for x86 and x86_64 patches.

Updated

11 years ago
Attachment #310988 - Flags: superreview?
(Assignee)

Comment 86

11 years ago
Nelson, you are not the only one, but we had previously decided that the optimizations for AMD64 were not an urgent priority before. This is why these optimizations did not get reviewed.

Comment 87

11 years ago
Many of the new Windows PCs and laptops being sold today have
>= 4GB of RAM and are running 64-bit Windows Vista.  So these
patches may become more important soon.

Comment 88

11 years ago
(In reply to comment #87)
> Many of the new Windows PCs and laptops being sold today have
> >= 4GB of RAM and are running 64-bit Windows Vista.  So these
> patches may become more important soon.

Both my work and home PC's have been running Vista x64 for awhile now (although my work pc has 6GB more RAM than my home machine). I think this really hasn't been a priority because the only solid 64-bit browser is IE. And it already performs so poorly in most benchmarks - if it can finish them without crashing - that it doesn't really matter:

IE64: http://tinyurl.com/6p8c7h 22418.0ms +/- 3.3%
IE32: http://tinyurl.com/5te7kx 29706.6ms +/- 3.4%
Fx3:  http://tinyurl.com/5lx458  3204.4ms +/- 2.7%

It still would be nice to have a version of Mozilla's applications that don't have to run under WoW64 emulation though since IE gets about a 25% gain in speed.
Comment on attachment 310988 [details] [diff] [review]
asm patch with x64 asm v6 minus mpcpucache file - same versions

Wan-Teh perhaps you would like to review this patch?
Attachment #310988 - Flags: superreview? → superreview?(wtc)
(Assignee)

Comment 90

11 years ago
Wan-Teh,

I have been running vista x64 for a while at home too (with 8 GB RAM). And I run Windows server 2003 64-bit on one of the boxes in my office as well.
So far there is no 64-bit Firefox, Thunderbird, or Seamonkey. Also, with current CPUs, I don't think these optimizations would be noticeable by a user in the context of a browser application. I think the unoptimized C code compiled to 64-bit actually performs similarly, or faster, than the optimized 32-bit C code. So, a browser user would simply see a lack of improvement going to 64-bit, not a slowdown. In a server, these optimizations are more important, since the crypto performance is expected to be higher when going to 64-bit.

So far, Sun shipped one server product on 64-bit Windows, without those optimizations, but their team has not been complaining about the performance yet. We have informed them that we could improve the performance on 64-bit Windows if needed.

Comment 91

11 years ago
Comment on attachment 310988 [details] [diff] [review]
asm patch with x64 asm v6 minus mpcpucache file - same versions

r=wtc.  Please address the following issues before checking this in,
especially the change to the 'unsigned long inputLen' parameter of
ARCFOUR.

1. coreconf/WIN32.mk

>+ifdef USE_64
>+	AS	= ml64.exe
>+	ASFLAGS = -Cp -Sn -Zi $(INCLUDES)
>+else
> 	AS	= ml.exe
> 	ASFLAGS = -Cp -Sn -Zi -coff $(INCLUDES)
> endif
>+endif

The only difference in ASFLAGS is -coff.  Are you sure you can't use
-coff with ml64.exe?  In nss/lib/freebl/mpi/target.mk you are passing
-coff to ml64.

2. nss/lib/freebl/Makefile

>+# 64-bit Windows
>+#    MPI_SRCS += mpi_x86_asm.c
>+# DEFINES += -DMP_ASSEMBLY_MULTIPLY -DMP_ASSEMBLY_SQUARE 
>+#    DEFINES += -DMP_ASSEMBLY_DIV_2DX1D -DMP_USE_UINT_DIGIT -DMP_NO_MP_WORD
>+#    DEFINES += -DMP_USE_UINT_DIGIT
>+    # -DMP_NO_MP_WORD

Remove these commented-out lines.

3. nss/lib/freebl/arcfour.c

>@@ -56,7 +57,7 @@
>  * much longer to write bytes than to write longs, or when using 
>  * assembler code that required it.
>  */
>-#define USE_LONG
>+#define USE_WORD
> #endif

You may need to update the block comment above #define USE_WORD.
For example, should we change "write longs" to "write words"?

>-#if defined(IS_64) && !defined(__sparc) && !defined(NSS_USE_64) 

It seems that the NSS_USE_64 macro defined in nss/lib/freebl/Makefile
is essentially the same as the IS_64 macro defined in prcpucfg.h.
I don't know why we need NSS_USE_64.

Do we need to update the ifdef at
http://mxr.mozilla.org/security/source/security/nss/lib/freebl/arcfour.c#358
to match the above change?

> #if defined(NSS_BEVAND_ARCFOUR)
>-extern void ARCFOUR(RC4Context *cx, unsigned long inputLen, 
>+extern void ARCFOUR(RC4Context *cx, WORD inputLen, 
> 	const unsigned char *input, unsigned char *output);

Is this change correct?  Don't we always use 'unsigned long' for
buffer lengths?  Why don't need need to update arcfour-amd64-gas.s
and arcfour-amd64-sun.s to match the new function prototype?

4. nss/lib/freebl/mpi/Makefile.win

>+ifeq ($(CPU_ARCH),x86_64)
>+AS_SRCS = mpi_x86_64.asm
>+CFLAGS = -O2 -Z7 -MD -W3 -nologo -DXP_PC -UDEBUG -U_DEBUG -DNDEBUG \
>+ -DWIN32 -D_WIN64 -D_AMD64_ -D_M_AMD64 -D_WINDOWS -DWIN95 $(MPICMN)
>+ASFLAGS = -Cp -Sn -Zi -I.
>+else
> #NT
> AS_SRCS = mpi_x86.asm
> MPICMN += -DMP_ASSEMBLY_MULTIPLY -DMP_ASSEMBLY_SQUARE -DMP_ASSEMBLY_DIV_2DX1D

Nit: The "#NT" comment should be before ifeq ($(CPU_ARCH),x86_64)/

5. nss/lib/freebl/mpi/mp_comba_amd64_masm.asm

Would be nice to document how you generated this file.

6. nss/lib/freebl/mpi/mpi-priv.h

>@@ -261,11 +261,12 @@
> 
> /* c = a * b */
> #define s_mpv_mul_d(a, a_len, b, c) \
>-	((unsigned long*)c)[a_len] = s_mpv_mul_set_vec64(c, a, a_len, b)
>+	((mp_digit *)c)[a_len] = s_mpv_mul_set_vec64(c, a, a_len, b)
> 
> /* c += a * b */
> #define s_mpv_mul_d_add(a, a_len, b, c) \
>-	((unsigned long*)c)[a_len] = s_mpv_mul_add_vec64(c, a, a_len, b)
>+	((mp_digit *)c)[a_len] = s_mpv_mul_add_vec64(c, a, a_len, b)
>+

Since 'c' is already 'mp_digit *', the typecasts aren't necessary.
Attachment #310988 - Flags: superreview?(wtc) → superreview+

Updated

10 years ago
Attachment #310988 - Flags: review?(julien.pierre.boogz) → review?(rrelyea)

Comment 92

10 years ago
Comment on attachment 310988 [details] [diff] [review]
asm patch with x64 asm v6 minus mpcpucache file - same versions

r+ go ahead and check this in if you haven't already.

bob
Attachment #310988 - Flags: review?(rrelyea) → review+
The changes Wan-Teh described in Comment 41 have not yet been made. 
I do not have a Win64 platform on which to work any more.
Julien is out for a while.  Someone with a Win64 platform needs to 
carry this to completion.
er, I meant comment 91, not 41.
(Assignee)

Comment 95

10 years ago
Wan-Teh,

Re: comment 91,

1. No, -coff is an invalid argument for ml64.exe .
So, the change in target.mk is incorrect.

Nelson,

I tried to build your latest patch, but it requires a file called mpcpucache_amd64_masm.asm, which I didn't find attached anywhere in this bug.
I just commented the corresponding entry in the Makefile and I was able to link freebl without it, using the implementation in mpcpucache.c .
(Assignee)

Comment 96

10 years ago
I just did two 64-bit optimized NSS builds on Windows with VC9, one of the trunk, and another of the trunk + the latest patch. Here are some data points on my two 64-bit quad-core machines, running rsaperf -n none -t4 -p 30.

trunk, Intel Q6600 : 1398 ops/s
trunk + patch, Intel Q6600 : 5048 ops/s

trunk, AMD Phenom 9750 : 1508 ops/s
trunk + patch, AMD Phenom 9750 : 6131 ops/s

The results are the reverse of what I got in 32-bits - in this case, AMD beats the Intel for both implementations. The "trunk" implementation is just C implementation - no assembly. The "trunk patch" uses the 64x64 multiply instruction.

Perhaps the Intel would do better if we had an SSE2 implementation in 64 bit mode, but we don't have one to try.
Julien, I'm in the unlpeasent position of having produced a patch for a 
platform that I do no possess and for which I cannot build.  It sounds 
like you have perfected my patch.  Please generate a new patch from your
workarea and attach it here.  

Thanks to your recent research, I _think_ that all that remains to be done 
is to remove the test for Intel CPUs in the function that detects SSE2 
and address Wan-Teh's issues in comment 91.  Would you do those things?
(Assignee)

Comment 98

10 years ago
Nelson,

Re: comment 97,

I will attach a patch for the change that I made, which was very simple. It will take a bit more time for me to make the other changes requested by Wan-Teh.

I just tested the change on the older AMD Opteron 246 2 GHz, and the patch makes the RSA performance go up from 620 ops/s to 2408 ops/s.
(Assignee)

Comment 99

10 years ago
Posted patch Simple change to build (obsolete) — Splinter Review
I could not get the standalone mpi Makefiles to work, but that work shouldn't hold this bug from being fixed.
Attachment #310988 - Attachment is obsolete: true
(Assignee)

Comment 100

10 years ago
Posted patch added missing coreconf files (obsolete) — Splinter Review
Attachment #366727 - Attachment is obsolete: true
Attachment #366728 - Attachment is patch: true
Attachment #366728 - Attachment mime type: application/octet-stream → text/plain
(Assignee)

Updated

10 years ago
Attachment #310967 - Attachment is obsolete: true
Attachment #310967 - Flags: review?(julien.pierre.boogz)
(Assignee)

Comment 101

10 years ago
I compiled Nelsons's arcfour patch on all our supported platforms, and determined that it was also modifying the WORD on the Sparc 64 bits ABI, changing it from "unsigned long" (64 bit) to PRUint32 (32 bit). The result of that change is that the RC4 throughput decreased from 149 MB/s to 92 MB/s, on a dual-CPU Ultrasparc IIIi. I will modify the patch so that it doesn't change the type of WORD on Sparc 64.
(Assignee)

Comment 102

10 years ago
I also checked the RC4 performance on my dual opteron 246 (2 GHz), and the performance on Win64 went up from 54 MB/s to 544 MB/s, ie. about 10x.
Attachment #366728 - Attachment is obsolete: true
Attachment #368163 - Flags: review?(nelson)
(Assignee)

Updated

10 years ago
Attachment #368163 - Attachment is patch: true
Attachment #368163 - Attachment mime type: application/octet-stream → text/plain
Comment on attachment 368174 [details] [diff] [review]
Add bignum and RC4 assembly optimizations for x64 on Win64.

Don't include cipher.txt which wasn't supposed to be there.

Checked in to trunk for NSS 3.12.3 .


r=nelson
Attachment #368174 - Flags: review?(nelson) → review+
(Assignee)

Comment 105

10 years ago
I checked in the patch. Thanks to everybody. The bignum and RC4 optimizations for Win64 are now in, to give parity with the other x64 platforms, Solaris and Linux. This is for 3.12.3 .

I'm not changing the original milestone for this bug, which is 3.11.9, since that's the NSS release when we added Win64 support, although without optimizations.

Checking in coreconf/WIN32.mk;
/cvsroot/mozilla/security/coreconf/WIN32.mk,v  <--  WIN32.mk
new revision: 1.30; previous revision: 1.29
done
Checking in nss/lib/freebl/Makefile;
/cvsroot/mozilla/security/nss/lib/freebl/Makefile,v  <--  Makefile
new revision: 1.99; previous revision: 1.98
done
Checking in nss/lib/freebl/arcfour-amd64-masm.asm;
/cvsroot/mozilla/security/nss/lib/freebl/arcfour-amd64-masm.asm,v  <--  arcfour-
amd64-masm.asm
new revision: 1.2; previous revision: 1.1
done
Checking in nss/lib/freebl/arcfour.c;
/cvsroot/mozilla/security/nss/lib/freebl/arcfour.c,v  <--  arcfour.c
new revision: 1.19; previous revision: 1.18
done
Checking in nss/lib/freebl/mpi/Makefile.win;
/cvsroot/mozilla/security/nss/lib/freebl/mpi/Makefile.win,v  <--  Makefile.win
new revision: 1.7; previous revision: 1.6
done
Checking in nss/lib/freebl/mpi/mp_comba_amd64_masm.asm;
/cvsroot/mozilla/security/nss/lib/freebl/mpi/mp_comba_amd64_masm.asm,v  <--  mp_
comba_amd64_masm.asm
new revision: 1.2; previous revision: 1.1
done
Checking in nss/lib/freebl/mpi/mpi-priv.h;
/cvsroot/mozilla/security/nss/lib/freebl/mpi/mpi-priv.h,v  <--  mpi-priv.h
new revision: 1.21; previous revision: 1.20
done
Checking in nss/lib/freebl/mpi/mpi_amd64_masm.asm;
/cvsroot/mozilla/security/nss/lib/freebl/mpi/mpi_amd64_masm.asm,v  <--  mpi_amd6
4_masm.asm
new revision: 1.2; previous revision: 1.1
done
Checking in nss/lib/freebl/mpi/target.mk;
/cvsroot/mozilla/security/nss/lib/freebl/mpi/target.mk,v  <--  target.mk
new revision: 1.6; previous revision: 1.5
done
Status: NEW → RESOLVED
Last Resolved: 10 years ago
Resolution: --- → FIXED
(Assignee)

Updated

10 years ago
Attachment #287638 - Attachment description: Patch to build and pass QA (checked in to NSS_3_11_BRANCH and trunk) → Patch to build and pass QA (checked in to NSS_3_11_BRANCH and trunk). This went into 3.11.9 release.
(Assignee)

Updated

10 years ago
Attachment #368174 - Attachment description: Don't include cipher.txt which wasn't supposed to be there. → Add bignum and RC4 assembly optimizations for x64 on Win64. Don't include cipher.txt which wasn't supposed to be there. Checked in to trunk for NSS 3.12.3 .
Whiteboard: FIPS → FIPS _X86_
You need to log in before you can comment on or make changes to this bug.