Closed
Bug 979132
Opened 10 years ago
Closed 10 years ago
AES-NI is broken on Solaris x64
Categories
(NSS :: Libraries, defect, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
3.16.1
People
(Reporter: julien.pierre, Assigned: julien.pierre)
Details
Attachments
(2 files, 4 obsolete files)
8.40 KB,
patch
|
julien.pierre
:
review-
wtc
:
checked-in+
|
Details | Diff | Splinter Review |
593 bytes,
patch
|
julien.pierre
:
review+
wtc
:
checked-in+
|
Details | Diff | Splinter Review |
This build error has been there for a while. I think 3.14 but possibly before. I believe AES-NI used to work, but it no longer does. The builds fails as follows in 3.15.5 : "rijndael.c", line 981: warning: implicit function declaration: __asm__ "rijndael.c", line 981: syntax error before or at: : "rijndael.c", line 985: cannot recover from previous errors cc: acomp failed for rijndael.c The only way to build freebl currently on this platform is to comment out the following lines in lib/freebl/Makefile in the Solaris x86 block : DEFINES += -DUSE_HW_AES ASFILES += intel-aes.s intel-gcm.s But this means there will be no AES-NI acceleration.
Assignee | ||
Comment 1•10 years ago
|
||
There is another build problem in lib/freebl . sha-fast-amd64-sun.s does not include the symbol SHA1_EndRaw . The workaround for that one is to use sha_fast.c and leave SHA_SRCS unchanged. Or someone needs to regenerate sha-fast-amd64-sun.s . I believe it was previously generated with the output of gcc.
Comment 2•10 years ago
|
||
Chris and I previously discussed the SHA1_EndRaw build problem in bug 822365 comment 93 and bug 822365 comment 94. If the Solaris compiler supports inline assembly code, the __asm__ code in rijndael.c should be easy to fix.
Assignee | ||
Comment 3•10 years ago
|
||
Wan-Teh, I don't believe the compiler currently being used for the official NSS builds at Oracle supports any inline assembly. I think the rijndael code needs to be moved to a .s . Re: SHA, I haven't benchmarked the latest studio compiler against gcc. The official builds may still be using an old compiler, though.
Assignee | ||
Comment 4•10 years ago
|
||
I did some benchmarking for the SHA1. gcc is still faster than studio for this algorithm, unfortunately. Other algs are faster in studio. But that issue should probably be discussed in a separate bug. For AES-NI, it looks like the code may have been broken when AES-GCM was added. I don't see how to make intel-gcm-wrap.c buildable with studio. It's using some attributes that are proprietary to gcc. There are 2 defines in freebl/Makefile, one for USE_HW_AES, and the other for INTEL_GCM . But the code in rijndael.c doesn't have any #ifdef INTEL_GCM . All the calls are #ifdef USE_HW_AES, even for GCM . There appears to have been some intention to allow the Intel GCM code to be compiled conditionally, separately from the existing AES-NI code, but it was never tested, and in fact can't be done with the current state of the code.
Comment 5•10 years ago
|
||
I'm using Solaris Studio 12.3, and this change seemed sufficient to build intel-gcm-wrap.c on Solaris x64: #if defined(__INTEL_COMPILER) #include <ia32intrin.h> -#elif defined(__GNUC__) +#elif defined(__GNUC__) || defined(__SUNPRO_C) #include <emmintrin.h> #include <tmmintrin.h> #endif But I couldn't get rijndael.c or intel-gcm.c to build properly.
Comment 6•10 years ago
|
||
The check_xcr0_ymm() function in rijndael.c is only used by the Intel AES-GCM assembly code, so it seems reasonable to move check_xcr0_ymm() to intel-gcm.s, even though this function could be useful for other assembly code that uses the AVX instructions. I compiled the check_xcr0_ymm() function using gcc -S -O3 -c and then copied the generated assembly code to intel-gcm.s.
Assignee | ||
Comment 7•10 years ago
|
||
Chris, (In reply to Chris Newman from comment #5) > I'm using Solaris Studio 12.3, and this change seemed sufficient to build > intel-gcm-wrap.c on Solaris x64: > > #if defined(__INTEL_COMPILER) > #include <ia32intrin.h> > -#elif defined(__GNUC__) > +#elif defined(__GNUC__) || defined(__SUNPRO_C) > #include <emmintrin.h> > #include <tmmintrin.h> > #endif > > But I couldn't get rijndael.c or intel-gcm.c to build properly. I assume you meant you couldn't get intel-gcm.s to build properly. I think the only solution for now is to have the GCM assembly conditionally built only for Linux, or Solaris with gcc, and the code that uses it conditionally compiled. I am working on a patch to do this.
Assignee | ||
Comment 8•10 years ago
|
||
Chris, re: intel-gcm-wrap.c , even with your fix, I got this : cc: Warning: Option -mssse3 passed to ld, if ld is invoked, ignored otherwise "intel-gcm-wrap.c", line 116: ube: error: _mm_shuffle_epi8 intrinsic requires at least -xarch=ssse3. cc: ube failed for intel-gcm-wrap.c Line 116 is as follows : _mm_storeu_si128((__m128i*)gcm->CTR, _mm_shuffle_epi8(_mm_add_epi32(ONE, _mm_shuffle_epi8(_mm_loadu_si128((__m128i*)gcm->CTR), BSWAP_MASK)), BSWAP_MASK)); We can't switch to -xarch=ssse3 because not all CPUs have these instructions. The bits would not run on every system. If I comment this line out, it compiles. The "proper" Solaris solution is to build separate freebl shared libraries for these cases with different instructions, some with -msse3 and some without, like we do for Sparc and VIS/etc. Even so, we would still have to figure out how to build intel-gcm.s with studio, which seems like a much bigger problem. I don't think switching to gcc for production is an option, because it introduced a dependency on libgcc_s.so which is not on every solaris system (in particular, it was not not on a machine with AES-NI that I was testing - but not building on, since my main host doesn't support AES-NI instructions). So it seems for now the acceleration for GCM has to be left out for the Solaris/studio build.
Comment 9•10 years ago
|
||
(In reply to Julien Pierre from comment #8) > > We can't switch to -xarch=ssse3 because not all CPUs have these > instructions. This is fine on Linux x64, so I think it should also be fine on Solaris x64. The code checks instruction support at run time. I know the problem you talked about. It is a problem for Solaris SPARC binaries. A binary containing unsupported instructions can't even be loaded. If you have enough time, you can write a script to convert intel-gcm.s to the Sun assembler format. You can also look into assemblying it with GNU assembler, and then disassemblying the object file with a Sun tool. Maybe the author of this blog post can help you: https://blogs.oracle.com/DanX/entry/solaris_x86_64_bit_assembly
Assignee | ||
Comment 10•10 years ago
|
||
Wan-Teh, The problem with loading binaries with unsupported instructions exists not just on Sparc, but also on Solaris x64 . While experimenting with my patch, I was able to build a libfreebl3.so binary that wouldn't load on the host machine I was compiling on. This happened when trying to compile rijndael.c with the xgetbv inline assembly with Studio 12 u3 . The fact that libfreebl3.so couldn't be loaded locally caused the NSS build to fail later in shlibsign . I don't believe that Linux has a mechanism for flagging binaries with requiring certain instruction sets, and that is the reason why the approach taken here - to check for instruction support, and then only invoking those instructions conditionally, works on Linux. It's not so on Solaris, however. The Studio compiler marks binaries with the set of instructions required by the object files it contains. gcc 4.5.2 on Solaris doesn't appear to do that - it marks everything as just "AMD64 version 1" regardless of what the instructions used. One could consider this either a feature or bug of gcc. In theory, it should be possible to "hack" the header in the libfreebl3.so binary produced by studio to require only the AMD64 basic set of instructions. [jrpierre@slc01heg SunOS5.11_i86pc_gcc_64_OPT.OBJ]>cd lib [jrpierre@slc01heg lib]>file *.so libfreebl3.so: ELF 64-bit LSB dynamic lib AMD64 Version 1, dynamically linked, not stripped, no debugging information available libgcc_s.so: ELF 64-bit LSB dynamic lib AMD64 Version 1, dynamically linked, not stripped libnspr4.so: ELF 64-bit LSB dynamic lib AMD64 Version 1, dynamically linked, not stripped, no debugging information available libnss3.so: ELF 64-bit LSB dynamic lib AMD64 Version 1, dynamically linked, not stripped, no debugging information available libnssckbi.so: ELF 64-bit LSB dynamic lib AMD64 Version 1, dynamically linked, not stripped, no debugging information available libnssdbm3.so: ELF 64-bit LSB dynamic lib AMD64 Version 1, dynamically linked, not stripped, no debugging information available libnssutil3.so: ELF 64-bit LSB dynamic lib AMD64 Version 1, dynamically linked, not stripped, no debugging information available libplc4.so: ELF 64-bit LSB dynamic lib AMD64 Version 1, dynamically linked, not stripped, no debugging information available libplds4.so: ELF 64-bit LSB dynamic lib AMD64 Version 1, dynamically linked, not stripped, no debugging information available libsmime3.so: ELF 64-bit LSB dynamic lib AMD64 Version 1, dynamically linked, not stripped, no debugging information available libsoftokn3.so: ELF 64-bit LSB dynamic lib AMD64 Version 1, dynamically linked, not stripped, no debugging information available libsqlite3.so: ELF 64-bit LSB dynamic lib AMD64 Version 1, dynamically linked, not stripped, no debugging information available libssl3.so: ELF 64-bit LSB dynamic lib AMD64 Version 1, dynamically linked, not stripped, no debugging information available [jrpierre@slc01heg lib]>cd .. [jrpierre@slc01heg SunOS5.11_i86pc_gcc_64_OPT.OBJ]>cd .. [jrpierre@slc01heg dist]>cd SunOS5.11_i86pc_64_OPT.OBJ/lib [jrpierre@slc01heg lib]>!file file *.so libfreebl3.so: ELF 64-bit LSB dynamic lib AMD64 Version 1 [SSE2 SSE CMOV], dynamically linked, not stripped libnspr4.so: ELF 64-bit LSB dynamic lib AMD64 Version 1 [SSE2 SSE CMOV FPU], dynamically linked, not stripped libnss3.so: ELF 64-bit LSB dynamic lib AMD64 Version 1 [SSE2 SSE CMOV], dynamically linked, not stripped libnssckbi.so: ELF 64-bit LSB dynamic lib AMD64 Version 1 [SSE CMOV], dynamically linked, not stripped libnssdbm3.so: ELF 64-bit LSB dynamic lib AMD64 Version 1 [SSE2 SSE CMOV], dynamically linked, not stripped libnssutil3.so: ELF 64-bit LSB dynamic lib AMD64 Version 1 [SSE2 SSE CMOV], dynamically linked, not stripped libplc4.so: ELF 64-bit LSB dynamic lib AMD64 Version 1 [SSE2 CMOV], dynamically linked, not stripped libplds4.so: ELF 64-bit LSB dynamic lib AMD64 Version 1 [CMOV], dynamically linked, not stripped libsmime3.so: ELF 64-bit LSB dynamic lib AMD64 Version 1 [SSE2 SSE CMOV], dynamically linked, not stripped libsoftokn3.so: ELF 64-bit LSB dynamic lib AMD64 Version 1 [SSE2 SSE CMOV], dynamically linked, not stripped libsqlite3.so: ELF 64-bit LSB dynamic lib AMD64 Version 1 [SSE2 SSE CMOV FPU], dynamically linked, not stripped libssl3.so: ELF 64-bit LSB dynamic lib AMD64 Version 1 [SSE2 SSE CMOV], dynamically linked, not stripped [jrpierre@slc01heg lib]> In the above case, the studio libfreebl3.so binary was built with AES-NI support from intel-aes.s . Yet the binary was not flagged with "aes". The reason is that the we use the opcode with "byte" rather than the mnemonics. If we used the mnemonics, then the binary would end up getting flagged "aes" and wouldn't run on my host which doesn't support it . I may consider converting intel-gcm.s, but I'm not sure I can get management approval to do this much work on NSS as it is not my main duty. It looks like not many CPUs support avx. The host I had located for testing that supported aes-ni didn't support avx. I just found another machine with avx support. Converting intel-gcm.s seems like a lot of work, especially if I have to replace the mnemonics with byte opcodes. It almost makes more sense to fix coreconf to allow to build assembly files with gcc, and the C files with studio. Hopefully that would do away with the libgcc_s.so dependency that is required when building everything with gcc. And there is still the issue of the CPU instruction flags in intel-gcm-wrap.c .
Assignee | ||
Comment 11•10 years ago
|
||
This patch includes the changes by Wan-Teh. It restores AES-NI hardware acceleration on Solaris with both gcc and studio compilers. AES-GCM hardware acceleration builds with gcc, but is disabled with Studio. I have also fixed sha-fast-amd64.s to include the missing SHA1_EndRaw function.
Assignee: nobody → julien.pierre
Attachment #8386154 -
Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #8387987 -
Flags: review?(wtc)
Assignee | ||
Comment 12•10 years ago
|
||
Comment on attachment 8387987 [details] [diff] [review] Patch for Solaris x64 I am seeing crashes in SHA1_End with this patch on 2 highly parallel machines. I will generate a new patch that fixes the AES-NI only.
Attachment #8387987 -
Flags: review?(wtc) → review-
Assignee | ||
Comment 13•10 years ago
|
||
This fix passes QA and includes SHA-1 thanks to Chris Newman. Still no AES-GCM acceleration. I think that will have to be in a separate bug.
Attachment #8387987 -
Attachment is obsolete: true
Attachment #8389612 -
Flags: review?(wtc)
Comment 14•10 years ago
|
||
Comment on attachment 8389612 [details] [diff] [review] Updated fix Review of attachment 8389612 [details] [diff] [review]: ----------------------------------------------------------------- Thanks for the patch. I should be able to check this in next week. I'd like to suggest some changes. I can take care of these, but I need some info from you. ::: lib/freebl/intel-gcm.h @@ +46,5 @@ > [1] Shay Gueron, Michael E. Kounavis: Intel® Carry-Less Multiplication > Instruction and its Usage for Computing the GCM Mode */ > + > +/* Checks if XMM and YMM state are enabled in XCR0. */ > +PRBool check_xcr0_ymm(void); We don't need to move the check_xcr0_ymm function into intel-gcm.s now, right? ::: lib/freebl/sha-fast-amd64-sun.s @@ +2114,5 @@ > + > + > +.globl SHA1_EndRaw > + .type SHA1_EndRaw, @function > +SHA1_EndRaw: 1. Please document how you generated the patch for this file. You can just add a comment to the bug report. 2. The original code has a .align 16 directive before each .globl directive. We should also add one here. @@ +2120,5 @@ > + movq 72(%rdi), %rax > +/APP > +/ 81 "sha_fast.h" 1 > + bswap %eax > +/ 0 "" 2 The original code doesn't have the / 81 "sha_fast.h" 1 and / 0 "" 2 lines. Perhaps they should be manually removed?
Assignee | ||
Comment 15•10 years ago
|
||
Wan-Teh 1) re: moving check_xcr0_ymm to intel-gcm.s file, you are right that it's not strictly required with this patch. However, it's still a helpful change towards adding the intel-gcm support to Solaris later on. Right now, the official Solaris build is still using studio 10, and that inline assembler code doesn't compile with studio 10. We may switch to studio12u3, but it's not done yet. I think with the assembly code in a separate .s file, it would be possible to easily change the xgetbv call to a "DB" with the hex opcode and fool the linker into not flagging the libfreebl3.so binary with something non-standard. I am not sure if the same trick would work inside the inline assembly with studio12 because I have not tested it yet. The rest of intel-gcm.s would have to be fixed to use DB also to get GCM to work with the Solaris assembler. If we decide against using the "DB" tricks, then we would need to build multiple freebl shared libs for Solaris x64. At that point all non-standard instructions would need to be in separate .s files and not in the common .c file . It's very unlikely we will add the multiple shared libs, though, because of the sustaining status of NSS at Oracle. 2) for sha-fast-amd64-sun.s, I used gcc --save-temps -O2 -Wall -Wno-format -Werror-implicit-function-declaration -Wno-switch -fPIC -DSVR4 -DSYSV -D__svr4 -D__svr4__ -DSOLARIS -D_REENTRANT -m64 -DXP_UNIX -DSHLIB_SUFFIX=\"so\" -DSHLIB_PREFIX=\"lib\" -DSHLIB_VERSION=\"3\" -DSOFTOKEN_SHLIB_VERSION=\"3\" -DRIJNDAEL_INCLUDE_TABLES -UDEBUG -DNDEBUG -DUSE_UTIL_DIRECTLY -DNO_NSPR_10_SUPPORT -DNSS_USE_64 -DNSS_X86_OR_X64 -DNSS_X64 -DINTEL_GCM -DNSS_BEVAND_ARCFOUR -DMPI_AMD64 -DMP_ASSEMBLY_MULTIPLY -DNSS_USE_COMBA -DMP_CHAR_STORE_SLOW -DMP_IS_LITTLE_ENDIAN -DUSE_HW_AES -DMP_API_COMPATIBLE -I../../../dist/SunOS5.11_i86pc_gcc_64_OPT.OBJ/include -I../../../dist/public/nss -I../../../dist/private/nss -Impi -Iecl sha_fast.c Then I just copied the code for SHA1_EndRaw . I only had to rename one of the labels that was duplicate. The change to SHA1_End was spotted by Chris Newman by observing the difference in gcc output with the content of the .s file. We might just as well have regenerated the whole file, but I did not think that was necessary. We know the existing implementation in the .s file performs OK, and this saves time in benchmarking. 2) I will add the .align 16 and remove these unnecessary comments
Assignee | ||
Comment 16•10 years ago
|
||
Attachment #8389612 -
Attachment is obsolete: true
Attachment #8389612 -
Flags: review?(wtc)
Attachment #8392605 -
Flags: review?(wtc)
Comment 17•10 years ago
|
||
Julien: I edited your patch and checked it in: https://hg.mozilla.org/projects/nss/rev/a906f71ff383 I removed the check_xcr0_ymm change. I need someone familiar with x86_64 assembly code to review that change. I am especially not sure whether I need a .align directive. Re: encoding the xgetbv opcode in bytes: it seems that we would also need to encode the opcodes of all the AVX instructions used in intel-gcm.s, right?
Attachment #8392605 -
Attachment is obsolete: true
Attachment #8392605 -
Flags: review?(wtc)
Attachment #8392631 -
Flags: checked-in+
Updated•10 years ago
|
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Priority: -- → P1
Resolution: --- → FIXED
Target Milestone: --- → 3.16.1
Assignee | ||
Comment 18•10 years ago
|
||
Wan-Teh, Sorry you had to remove that change. I don't think it's a very good idea for this case to keep the inline assembly in the rijndael.c file just for this test function when the implementation of the cipher already requires an intel-gcm.s file anyway. It just doesn't make much sense. I can test just that check_xcr0_ymm function as I have Solaris hosts both with and without AVX to test it on, if it would help convince you that it works properly. You could also compare the code to the output of the inline assembly with gcc --save-temps . Re: the opcodes, yes, we would need to do the same trick for all the AVX instructions in intel-gcm.s . This would probably require a separate intel-gcm-solaris.s file since we can't do #ifdef in the .asm files, and you probably don't want to use the hack for the Linux build. I don't have time to work on this new assembler source file for now, but I may get back to it later.
Assignee | ||
Comment 19•10 years ago
|
||
Comment on attachment 8392631 [details] [diff] [review] Patch as checked in Review of attachment 8392631 [details] [diff] [review]: ----------------------------------------------------------------- Wan-Teh, Your edit to the patch is incorrect. The check_xcr0_ymm function needs to be in an #ifdef INTEL_GCM, not #ifdef USE_HW_AES .
Attachment #8392631 -
Flags: review-
Assignee | ||
Updated•10 years ago
|
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 20•10 years ago
|
||
Thank you for catching my mistake. Please review this patch.
Attachment #8393122 -
Flags: review?(julien.pierre)
Assignee | ||
Updated•10 years ago
|
Attachment #8393122 -
Flags: review?(julien.pierre) → review+
Comment 21•10 years ago
|
||
Comment on attachment 8393122 [details] [diff] [review] Conditionally compile check_xcr0_ymm if INTEL_GCM is defined Patch checked in: https://hg.mozilla.org/projects/nss/rev/7a11b6c73fd0
Attachment #8393122 -
Flags: checked-in+
Updated•10 years ago
|
Status: REOPENED → RESOLVED
Closed: 10 years ago → 10 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•