Closed Bug 979132 Opened 9 years ago Closed 9 years ago

AES-NI is broken on Solaris x64

Categories

(NSS :: Libraries, defect, P1)

3.15.5
x86_64
Solaris
defect

Tracking

(Not tracked)

RESOLVED FIXED
3.16.1

People

(Reporter: julien.pierre, Assigned: julien.pierre)

Details

Attachments

(2 files, 4 obsolete files)

This build error has been there for a while. I think 3.14 but possibly before.

I believe AES-NI used to work, but it no longer does.
The builds fails as follows in 3.15.5 :

"rijndael.c", line 981: warning: implicit function declaration: __asm__
"rijndael.c", line 981: syntax error before or at: :
"rijndael.c", line 985: cannot recover from previous errors
cc: acomp failed for rijndael.c

The only way to build freebl currently on this platform is to comment out the following lines in lib/freebl/Makefile in the Solaris x86 block :
DEFINES += -DUSE_HW_AES
ASFILES += intel-aes.s intel-gcm.s 

But this means there will be no AES-NI acceleration.
There is another build problem in lib/freebl .
sha-fast-amd64-sun.s
does not include the symbol SHA1_EndRaw .
The workaround for that one is to use sha_fast.c and leave SHA_SRCS unchanged.
Or someone needs to regenerate sha-fast-amd64-sun.s . I believe it was previously generated with the output of gcc.
Chris and I previously discussed the SHA1_EndRaw build problem
in bug 822365 comment 93 and bug 822365 comment 94.

If the Solaris compiler supports inline assembly code, the
__asm__ code in rijndael.c should be easy to fix.
Wan-Teh,

I don't believe the compiler currently being used for the official NSS builds at Oracle supports any inline assembly. I think the rijndael code needs to be moved to a .s .

Re: SHA, I haven't benchmarked the latest studio compiler against gcc. The official builds may still be using an old compiler, though.
I did some benchmarking for the SHA1. gcc is still faster than studio for this algorithm, unfortunately. Other algs are faster in studio. But that issue should probably be discussed in a separate bug.

For AES-NI, it looks like the code may have been broken when AES-GCM was added.
I don't see how to make intel-gcm-wrap.c buildable with studio. It's using some attributes that are proprietary to gcc.

There are 2 defines in freebl/Makefile, one for USE_HW_AES, and the other for INTEL_GCM .
But the code in rijndael.c doesn't have any #ifdef INTEL_GCM .
All the calls are #ifdef USE_HW_AES, even for GCM .

There appears to have been some intention to allow the Intel GCM code to be compiled conditionally, separately from the existing AES-NI code, but it was never tested, and in fact can't be done with the current state of the code.
I'm using Solaris Studio 12.3, and this change seemed sufficient to build intel-gcm-wrap.c on Solaris x64:

 #if defined(__INTEL_COMPILER)
 #include <ia32intrin.h> 
-#elif defined(__GNUC__)
+#elif defined(__GNUC__) || defined(__SUNPRO_C)
 #include <emmintrin.h>
 #include <tmmintrin.h>
 #endif

But I couldn't get rijndael.c or intel-gcm.c to build properly.
The check_xcr0_ymm() function in rijndael.c is only used
by the Intel AES-GCM assembly code, so it seems reasonable
to move check_xcr0_ymm() to intel-gcm.s, even though this
function could be useful for other assembly code that uses
the AVX instructions.

I compiled the check_xcr0_ymm() function using gcc -S -O3 -c
and then copied the generated assembly code to intel-gcm.s.
Chris,

(In reply to Chris Newman from comment #5)
> I'm using Solaris Studio 12.3, and this change seemed sufficient to build
> intel-gcm-wrap.c on Solaris x64:
> 
>  #if defined(__INTEL_COMPILER)
>  #include <ia32intrin.h> 
> -#elif defined(__GNUC__)
> +#elif defined(__GNUC__) || defined(__SUNPRO_C)
>  #include <emmintrin.h>
>  #include <tmmintrin.h>
>  #endif
> 
> But I couldn't get rijndael.c or intel-gcm.c to build properly.

I assume you meant you couldn't get intel-gcm.s to build properly.
I think the only solution for now is to have the GCM assembly conditionally built only for Linux, or Solaris with gcc, and the code that uses it conditionally compiled. I am working on a patch to do this.
Chris,

re: intel-gcm-wrap.c , even with your fix, I got this :

cc: Warning: Option -mssse3 passed to ld, if ld is invoked, ignored otherwise
"intel-gcm-wrap.c", line 116: ube: error: _mm_shuffle_epi8 intrinsic requires at least -xarch=ssse3.
cc: ube failed for intel-gcm-wrap.c

Line 116 is as follows :
_mm_storeu_si128((__m128i*)gcm->CTR, _mm_shuffle_epi8(_mm_add_epi32(ONE, _mm_shuffle_epi8(_mm_loadu_si128((__m128i*)gcm->CTR), BSWAP_MASK)), BSWAP_MASK));

We can't switch to -xarch=ssse3 because not all CPUs have these instructions. The bits would not run on every system. If I comment this line out, it compiles. 
The "proper" Solaris solution is to build separate freebl shared libraries for these cases with different instructions, some with -msse3 and some without, like we do for Sparc and VIS/etc.

Even so, we would still have to figure out how to build intel-gcm.s with studio, which seems like a much bigger problem.

I don't think switching to gcc for production is an option, because it introduced a dependency on libgcc_s.so which is not on every solaris system (in particular, it was not not on a machine with AES-NI that I was testing - but not building on, since my main host doesn't support AES-NI instructions).

So it seems for now the acceleration for GCM has to be left out for the Solaris/studio build.
(In reply to Julien Pierre from comment #8)
>
> We can't switch to -xarch=ssse3 because not all CPUs have these
> instructions.

This is fine on Linux x64, so I think it should also be fine on
Solaris x64. The code checks instruction support at run time.
I know the problem you talked about. It is a problem for Solaris
SPARC binaries. A binary containing unsupported instructions can't
even be loaded.

If you have enough time, you can write a script to convert intel-gcm.s
to the Sun assembler format. You can also look into assemblying
it with GNU assembler, and then disassemblying the object file with
a Sun tool. Maybe the author of this blog post can help you:
https://blogs.oracle.com/DanX/entry/solaris_x86_64_bit_assembly
Wan-Teh,

The problem with loading binaries with unsupported instructions exists not just on Sparc, but also on Solaris x64 . While experimenting with my patch, I was able to build a libfreebl3.so binary that wouldn't load on the host machine I was compiling on. This happened when trying to compile rijndael.c with the xgetbv inline assembly with Studio 12 u3 . The fact that libfreebl3.so couldn't be loaded locally caused the NSS build to fail later in shlibsign .

I don't believe that Linux has a mechanism for flagging binaries with requiring certain instruction sets, and that is the reason why the approach taken here - to check for instruction support, and then only invoking those instructions conditionally, works on Linux.

It's not so on Solaris, however. The Studio compiler marks binaries with the set of instructions required by the object files it contains. gcc 4.5.2 on Solaris doesn't appear to do that - it marks everything as just "AMD64 version 1" regardless of what the instructions used. One could consider this either a feature or bug of gcc. In theory, it should be possible to "hack" the header in the libfreebl3.so binary produced by studio to require only the AMD64 basic set of instructions.

[jrpierre@slc01heg SunOS5.11_i86pc_gcc_64_OPT.OBJ]>cd lib
[jrpierre@slc01heg lib]>file *.so
libfreebl3.so:  ELF 64-bit LSB dynamic lib AMD64 Version 1, dynamically linked, not stripped, no debugging information available
libgcc_s.so:    ELF 64-bit LSB dynamic lib AMD64 Version 1, dynamically linked, not stripped
libnspr4.so:    ELF 64-bit LSB dynamic lib AMD64 Version 1, dynamically linked, not stripped, no debugging information available
libnss3.so:     ELF 64-bit LSB dynamic lib AMD64 Version 1, dynamically linked, not stripped, no debugging information available
libnssckbi.so:  ELF 64-bit LSB dynamic lib AMD64 Version 1, dynamically linked, not stripped, no debugging information available
libnssdbm3.so:  ELF 64-bit LSB dynamic lib AMD64 Version 1, dynamically linked, not stripped, no debugging information available
libnssutil3.so: ELF 64-bit LSB dynamic lib AMD64 Version 1, dynamically linked, not stripped, no debugging information available
libplc4.so:     ELF 64-bit LSB dynamic lib AMD64 Version 1, dynamically linked, not stripped, no debugging information available
libplds4.so:    ELF 64-bit LSB dynamic lib AMD64 Version 1, dynamically linked, not stripped, no debugging information available
libsmime3.so:   ELF 64-bit LSB dynamic lib AMD64 Version 1, dynamically linked, not stripped, no debugging information available
libsoftokn3.so: ELF 64-bit LSB dynamic lib AMD64 Version 1, dynamically linked, not stripped, no debugging information available
libsqlite3.so:  ELF 64-bit LSB dynamic lib AMD64 Version 1, dynamically linked, not stripped, no debugging information available
libssl3.so:     ELF 64-bit LSB dynamic lib AMD64 Version 1, dynamically linked, not stripped, no debugging information available
[jrpierre@slc01heg lib]>cd ..
[jrpierre@slc01heg SunOS5.11_i86pc_gcc_64_OPT.OBJ]>cd ..
[jrpierre@slc01heg dist]>cd SunOS5.11_i86pc_64_OPT.OBJ/lib
[jrpierre@slc01heg lib]>!file
file *.so
libfreebl3.so:  ELF 64-bit LSB dynamic lib AMD64 Version 1 [SSE2 SSE CMOV], dynamically linked, not stripped
libnspr4.so:    ELF 64-bit LSB dynamic lib AMD64 Version 1 [SSE2 SSE CMOV FPU], dynamically linked, not stripped
libnss3.so:     ELF 64-bit LSB dynamic lib AMD64 Version 1 [SSE2 SSE CMOV], dynamically linked, not stripped
libnssckbi.so:  ELF 64-bit LSB dynamic lib AMD64 Version 1 [SSE CMOV], dynamically linked, not stripped
libnssdbm3.so:  ELF 64-bit LSB dynamic lib AMD64 Version 1 [SSE2 SSE CMOV], dynamically linked, not stripped
libnssutil3.so: ELF 64-bit LSB dynamic lib AMD64 Version 1 [SSE2 SSE CMOV], dynamically linked, not stripped
libplc4.so:     ELF 64-bit LSB dynamic lib AMD64 Version 1 [SSE2 CMOV], dynamically linked, not stripped
libplds4.so:    ELF 64-bit LSB dynamic lib AMD64 Version 1 [CMOV], dynamically linked, not stripped
libsmime3.so:   ELF 64-bit LSB dynamic lib AMD64 Version 1 [SSE2 SSE CMOV], dynamically linked, not stripped
libsoftokn3.so: ELF 64-bit LSB dynamic lib AMD64 Version 1 [SSE2 SSE CMOV], dynamically linked, not stripped
libsqlite3.so:  ELF 64-bit LSB dynamic lib AMD64 Version 1 [SSE2 SSE CMOV FPU], dynamically linked, not stripped
libssl3.so:     ELF 64-bit LSB dynamic lib AMD64 Version 1 [SSE2 SSE CMOV], dynamically linked, not stripped
[jrpierre@slc01heg lib]>

In the above case, the studio libfreebl3.so binary was built with AES-NI support from intel-aes.s .
Yet the binary was not flagged with "aes". The reason is that the we use the opcode with "byte" rather than the mnemonics. If we used the mnemonics, then the binary would end up getting flagged "aes" and wouldn't run on my host which doesn't support it .

I may consider converting intel-gcm.s, but I'm not sure I can get management approval to do this much work on NSS as it is not my main duty. 

It looks like not many CPUs support avx. The host I had located for testing that supported aes-ni didn't support avx. I just found another machine with avx support.
Converting intel-gcm.s seems like a lot of work, especially if I have to replace the mnemonics with byte opcodes. It almost makes more sense to fix coreconf to allow to build assembly files with gcc, and the C files with studio. Hopefully that would do away with the libgcc_s.so dependency that is required when building everything with gcc. And there is still the issue of the CPU instruction flags in intel-gcm-wrap.c .
Attached patch Patch for Solaris x64 (obsolete) — Splinter Review
This patch includes the changes by Wan-Teh.

It restores AES-NI hardware acceleration on Solaris with both gcc and studio compilers.

AES-GCM hardware acceleration builds with gcc, but is disabled with Studio.

I have also fixed sha-fast-amd64.s to include the missing SHA1_EndRaw function.
Assignee: nobody → julien.pierre
Attachment #8386154 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #8387987 - Flags: review?(wtc)
Comment on attachment 8387987 [details] [diff] [review]
Patch for Solaris x64

I am seeing crashes in SHA1_End with this patch on 2 highly parallel machines.
I will generate a new patch that fixes the AES-NI only.
Attachment #8387987 - Flags: review?(wtc) → review-
Attached patch Updated fix (obsolete) — Splinter Review
This fix passes QA and includes SHA-1 thanks to Chris Newman.
Still no AES-GCM acceleration. I think that will have to be in a separate bug.
Attachment #8387987 - Attachment is obsolete: true
Attachment #8389612 - Flags: review?(wtc)
Comment on attachment 8389612 [details] [diff] [review]
Updated fix

Review of attachment 8389612 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for the patch. I should be able to check
this in next week. I'd like to suggest some changes.
I can take care of these, but I need some info from
you.

::: lib/freebl/intel-gcm.h
@@ +46,5 @@
>     [1] Shay Gueron, Michael E. Kounavis: Intel® Carry-Less Multiplication 
>         Instruction and its Usage for Computing the GCM Mode                */
> +
> +/* Checks if XMM and YMM state are enabled in XCR0. */
> +PRBool check_xcr0_ymm(void);

We don't need to move the check_xcr0_ymm function
into intel-gcm.s now, right?

::: lib/freebl/sha-fast-amd64-sun.s
@@ +2114,5 @@
> +
> +
> +.globl SHA1_EndRaw
> +        .type   SHA1_EndRaw, @function
> +SHA1_EndRaw:

1. Please document how you generated the patch
for this file. You can just add a comment to
the bug report.

2. The original code has a
  .align 16
directive before each .globl directive. We should
also add one here.

@@ +2120,5 @@
> +        movq    72(%rdi), %rax
> +/APP
> +/ 81 "sha_fast.h" 1
> +        bswap %eax
> +/ 0 "" 2

The original code doesn't have the
  / 81 "sha_fast.h" 1
and
  / 0 "" 2
lines. Perhaps they should be manually removed?
Wan-Teh

1) re: moving check_xcr0_ymm to intel-gcm.s file, you are right that it's not strictly required with this patch.

However, it's still a helpful change towards adding the intel-gcm support to Solaris later on.

Right now, the official Solaris build is still using studio 10, and that inline assembler code doesn't compile with studio 10. We may switch to studio12u3, but it's not done yet.

I think with the assembly code in a separate .s file, it would be possible to easily change the xgetbv call to a "DB" with the hex opcode and fool the linker into not flagging the libfreebl3.so binary with something non-standard.  I am not sure if the same trick would work inside the inline assembly with studio12 because I have not tested it yet.

The rest of intel-gcm.s would have to be fixed to use DB also to get GCM to work with the Solaris assembler.

If we decide against using the "DB" tricks, then we would need to build multiple freebl shared libs for Solaris x64. At that point all non-standard instructions would need to be in separate .s files and not in the common .c file . It's very unlikely we will add the multiple shared libs, though, because of the sustaining status of NSS at Oracle.

2) for sha-fast-amd64-sun.s, I used 
gcc --save-temps -O2 -Wall -Wno-format -Werror-implicit-function-declaration -Wno-switch -fPIC -DSVR4 -DSYSV -D__svr4 -D__svr4__ -DSOLARIS -D_REENTRANT  -m64 -DXP_UNIX -DSHLIB_SUFFIX=\"so\" -DSHLIB_PREFIX=\"lib\" -DSHLIB_VERSION=\"3\" -DSOFTOKEN_SHLIB_VERSION=\"3\" -DRIJNDAEL_INCLUDE_TABLES -UDEBUG -DNDEBUG -DUSE_UTIL_DIRECTLY -DNO_NSPR_10_SUPPORT -DNSS_USE_64 -DNSS_X86_OR_X64 -DNSS_X64 -DINTEL_GCM -DNSS_BEVAND_ARCFOUR -DMPI_AMD64 -DMP_ASSEMBLY_MULTIPLY -DNSS_USE_COMBA -DMP_CHAR_STORE_SLOW -DMP_IS_LITTLE_ENDIAN -DUSE_HW_AES -DMP_API_COMPATIBLE -I../../../dist/SunOS5.11_i86pc_gcc_64_OPT.OBJ/include -I../../../dist/public/nss -I../../../dist/private/nss -Impi -Iecl  sha_fast.c

Then I just copied the code for SHA1_EndRaw . I only had to rename one of the labels that was duplicate.

The change to SHA1_End was spotted by Chris Newman by observing the difference in gcc output with the content of the .s file.

We might just as well have regenerated the whole file, but I did not think that was necessary.
We know the existing implementation in the .s file performs OK, and this saves time in benchmarking.

2) I will add the .align 16 and remove these unnecessary comments
Attached patch Fix with Wan-Teh's comments (obsolete) — Splinter Review
Attachment #8389612 - Attachment is obsolete: true
Attachment #8389612 - Flags: review?(wtc)
Attachment #8392605 - Flags: review?(wtc)
Julien:

I edited your patch and checked it in:
https://hg.mozilla.org/projects/nss/rev/a906f71ff383

I removed the check_xcr0_ymm change. I need someone familiar
with x86_64 assembly code to review that change. I am especially
not sure whether I need a .align directive.

Re: encoding the xgetbv opcode in bytes: it seems that we would
also need to encode the opcodes of all the AVX instructions used
in intel-gcm.s, right?
Attachment #8392605 - Attachment is obsolete: true
Attachment #8392605 - Flags: review?(wtc)
Attachment #8392631 - Flags: checked-in+
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Priority: -- → P1
Resolution: --- → FIXED
Target Milestone: --- → 3.16.1
Wan-Teh,

Sorry you had to remove that change. I don't think it's a very good idea for this case to keep the inline assembly in the rijndael.c file just for this test function when the implementation of the cipher already requires an intel-gcm.s file anyway. It just doesn't make much sense.

I can test just that check_xcr0_ymm function as I have Solaris hosts both with and without AVX to test it on, if it would help convince you that it works properly. You could also compare the code to the output of the inline assembly with gcc --save-temps .

Re: the opcodes, yes, we would need to do the same trick for all the AVX instructions in intel-gcm.s .
This would probably require a separate intel-gcm-solaris.s file since we can't do #ifdef in the .asm files, and you probably don't want to use the hack for the Linux build. I don't have time to work on this new assembler source file for now, but I may get back to it later.
Comment on attachment 8392631 [details] [diff] [review]
Patch as checked in

Review of attachment 8392631 [details] [diff] [review]:
-----------------------------------------------------------------

Wan-Teh,

Your edit to the patch is incorrect.
The check_xcr0_ymm function needs to be in an #ifdef INTEL_GCM, not #ifdef USE_HW_AES .
Attachment #8392631 - Flags: review-
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Thank you for catching my mistake. Please review this patch.
Attachment #8393122 - Flags: review?(julien.pierre)
Attachment #8393122 - Flags: review?(julien.pierre) → review+
Comment on attachment 8393122 [details] [diff] [review]
Conditionally compile check_xcr0_ymm if INTEL_GCM is defined

Patch checked in: https://hg.mozilla.org/projects/nss/rev/7a11b6c73fd0
Attachment #8393122 - Flags: checked-in+
Status: REOPENED → RESOLVED
Closed: 9 years ago9 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.