Closed Bug 536485 Opened 15 years ago Closed 15 years ago

crash during ssl handshake in [@ intel_aes_decrypt_cbc_256]

Categories

(NSS :: Libraries, defect, P1)

3.12
x86
OpenSolaris
defect

Tracking

(Not tracked)

RESOLVED FIXED
3.12.9

People

(Reporter: nelson, Assigned: alvolkov.bgs)

References

(Blocks 1 open bug, )

Details

(Keywords: crash)

Crash Data

Attachments

(5 files, 2 obsolete files)

We've now seen our first crash in the new code that uses Intel's new AES instructions. Server crashes almost immediately after startup of benchmark test. Stack trace : t@353 (l@353) terminated by signal SEGV (Segmentation Fault) 0xfffffd7ffa49b006: intel_aes_decrypt_cbc_256+0x0226: (dbx) where current thread: t@353 =>[1] intel_aes_decrypt_cbc_256(0x18ed9da8, 0x942e7010, 0x18ed9d88, 0x4800, 0x942f1010, 0x30), at 0xfffffd7ffa49b006 [2] AES_Decrypt(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffa4474fa [3] ssl3_HandleRecord(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffb848699 [4] ssl3_GatherCompleteHandshake(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffb84a4b3 [5] ssl_GatherRecord1stHandshake(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffb84b8f4 [6] ssl_SecureRecv(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffb858006 [7] ssl_Recv(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffb86187f [8] DaemonSession::run(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffd1c9761 [9] ThreadMain(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffcf79347 [10] _pt_root(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffd434e15 [11] _thrp_setup(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff143244 [12] _lwp_start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff143500 (dbx) x $rdi /4 x 0x0000000018ed9da8: 0x2e75 0x6ca1 0x24a6 0x2246 (dbx) x $rip /16 x 0xfffffd7ffa49b006: intel_aes_decrypt_cbc_256+0x0226: 0x0f66 0xdf38 0x660f 0xef0f 0xf3c8 0x7f0f 0x060c 0x41f3 Crash occurs at this instruction: 1624 .byte 0x66,0x0f,0x38,0xdf,0x0f /* aesdeclast (%rdi), %xmm1 */ Crash is a SIGSEGV. Memory at address in RDI register is readily accessible. Unclear why SEGV occurs. Do we know that this code actually works on any real Intel CPUs? Maybe we need a way to disable/override the test for this instruction set.
Severity: major → critical
Keywords: crash
Summary: crash during ssl handshake in intel_aes_decrypt_cbc_256 → crash during ssl handshake in [@ intel_aes_decrypt_cbc_256]
This bug looks similar to Bug 539744.
duping 539744 to this bug: info from 539744 Chip info: oaf578# isainfo -v 64-bit amd64 applications pclmulqdq aes sse4.2 sse4.1 ssse3 popcnt tscp cx16 mon sse3 pause sse2 sse fxsr mmx cmov amd_sysc cx8 tsc fpu 32-bit i386 applications pclmulqdq aes sse4.2 sse4.1 ssse3 popcnt tscp ahf cx16 mon sse3 pause sse2 sse fxsr mmx cmov sep cx8 tsc fpu oaf578# /export/bench/cpu_tools/cpuid CPU: vendor_id = "GenuineIntel" version information (1/eax): processor type = primary processor (0) family = Intel Pentium Pro/II/III/Celeron, AMD Athlon/Duron, Cyrix M2, VIA C3 (6) model = 0xc (12) stepping id = 0x0 (0) extended family = 0x0 (0) extended model = 0x2 (2) (simple synth) = Intel Pentium II / Pentium III / Pentium M / Celeron / Mobile Celeron / Celeron M / Core Solo / Core Duo / Core 2 / Core 2 Extreme Processor / Xeon Processor LV / Xeon Processor 5100 (unknown model) miscellaneous (1/ebx): process local APIC physical ID = 0x2 (2) cpu count = 0x20 (32) CLFLUSH line size = 0x8 (8) brand index = 0x0 (0) brand id = 0x00 (0): unknown feature information (1/edx): x87 FPU on chip = true virtual-8086 mode enhancement = true debugging extensions = true page size extensions = true time stamp counter = true RDMSR and WRMSR support = true physical address extensions = true machine check exception = true CMPXCHG8B inst. = true APIC on chip = true SYSENTER and SYSEXIT = true memory type range registers = true PTE global bit = true machine check architecture = true conditional move/compare instruction = true page attribute table = true page size extension = true processor serial number = false CLFLUSH instruction = true debug store = true thermal monitor and clock ctrl = true MMX Technology = true FXSAVE/FXRSTOR = true SSE extensions = true SSE2 extensions = true self snoop = true hyper-threading / multi-core supported = true therm. monitor = true IA64 = false pending break event = true feature information (1/ecx): PNI/SSE3: Prescott New Instructions = true MONITOR/MWAIT = true CPL-qualified debug store = true VMX: virtual machine extensions = true Secure mode extensions = true Enhanced Intel SpeedStep Technology = true thermal monitor 2 = true Supplemental SSE3 instructions = true context ID: adaptive or shared L1 data = false cmpxchg16b available = true xTPR disable = true processor serial number = true direct cache access = true SSE4.1 instructions = true SSE4.2 instructions = true POPCNT instructions = true AES instructions = true cache and TLB information (2): 0x5a: unknown 0x03: data TLB: 4K pages, 4-way, 64 entries 0x55: unknown 0xff: unknown 0xb2: unknown 0xf0: 64 byte prefetching 0xca: unknown processor serial number: 0002-06C0-0000-0000-0000-0000 deterministic cache parameters (4): cache type = no more caches (0) cache level = 0x0 (0) self-initializing cache level = false fully associative cache = false extra threads sharing this cache = 0x0 (0) extra processor cores on this die = 0x0 (0) system coherency line size = 0x0 (0) physical line partitions = 0x0 (0) ways of associativity = 0x0 (0) number of sets - 1 (s) = 0 MONITOR/MWAIT (5): smallest monitor-line size (bytes) = 0x40 (64) largest monitor-line size (bytes) = 0x40 (64) enum of Monitor-MWAIT exts supported = true supports intrs as break-event for MWAIT = true number of C0 sub C-states using MWAIT = 0x0 (0) number of C1 sub C-states using MWAIT = 0x2 (2) number of C2 sub C-states using MWAIT = 0x1 (1) number of C3 sub C-states using MWAIT = 0x1 (1) number of C4 sub C-states using MWAIT = 0x0 (0) Thermal and Power Management Features (6): digital thermometer = true operating point protection = true digital thermometer thresholds = 0x7 (7) ACNT/MCNT supported performance measure = true Architecture Performance Monitoring Features (0xa/eax): version ID = 0x3 (3) number of counters per logical processor = 0x4 (4) bit width of counter = 0x30 (48) length of EBX bit vector = 0x7 (7) Architecture Performance Monitoring Features (0xa/ebx): core cycle event not available = false instruction retired event not available = false reference cycles event not available = true last-level cache ref event not available = false last-level cache miss event not avail = false branch inst retired event not available = false branch mispred retired event not avail = false 0x0000000b: eax=0x00000000 ebx=0x00000000 ecx=0x0000002c edx=0x00000002 extended feature flags (0x80000001/edx): SYSCALL and SYSRET instructions = false execution disable = true 64-bit extensions technology available = true Intel feature flags (0x80000001/ecx): LAHF/SAHF supported in 64-bit mode = true brand = "Genuine Intel(R) CPU 000 @ 3.07GHz" L1 TLB/cache information: 2M/4M pages & L1 TLB (0x80000005/eax): instruction # entries = 0x0 (0) instruction associativity = 0x0 (0) data # entries = 0x0 (0) data associativity = 0x0 (0) L1 TLB/cache information: 4K pages & L1 TLB (0x80000005/ebx): instruction # entries = 0x0 (0) instruction associativity = 0x0 (0) data # entries = 0x0 (0) data associativity = 0x0 (0) L1 data cache information (0x80000005/ecx): line size (bytes) = 0x0 (0) lines per tag = 0x0 (0) associativity = 0x0 (0) size (Kb) = 0x0 (0) L1 instruction cache information (0x80000005/ecx): line size (bytes) = 0x0 (0) lines per tag = 0x0 (0) associativity = 0x0 (0) size (Kb) = 0x0 (0) L2 TLB/cache information: 2M/4M pages & L2 TLB (0x80000006/eax): instruction # entries = 0x0 (0) instruction associativity = L2 off (0) data # entries = 0x0 (0) data associativity = L2 off (0) L2 TLB/cache information: 4K pages & L2 TLB (0x80000006/ebx): instruction # entries = 0x0 (0) instruction associativity = L2 off (0) data # entries = 0x0 (0) data associativity = L2 off (0) L2 unified cache information (0x80000006/ecx): line size (bytes) = 0x40 (64) lines per tag = 0x0 (0) associativity = 8-way (6) size (Kb) = 0x100 (256) Advanced Power Management Features (0x80000007/edx): temperature sensing diode = 0x0 (0) frequency ID (FID) control = 0x0 (0) voltage ID (VID) control = 0x0 (0) thermal trip (TTP) = 0x0 (0) thermal monitor (TM) = 0x0 (0) software thermal control (STC) = 0x0 (0) TscInvariant = 0x1 (1) Physical Address and Linear Address Size (0x80000008/eax): maximum physical address = 0x28 (40) maximum linear address = 0x30 (48) Logical CPU cores (0x80000008/ecx): number of logical CPU cores - 1 = 0x0 (0) ApicIdCoreIdSize = 0x0 (0) (multi-processing synth): hyper-threaded (t=32) (synth) = Intel Pentium II / Pentium III / Pentium M / Celeron / Mobile Celeron / Celeron M / Core Solo / Core Duo / Core 2 / Core 2 Extreme Processor / Xeon Processor LV / Xeon Processor 5100 (unknown model) -------------------------------------------------------------- t@353 (l@353) terminated by signal SEGV (Segmentation Fault) 0xfffffd7ffa49b006: intel_aes_decrypt_cbc_256+0x0226: ***ERROR--unknown op code*** Crash seems to happen at 0xfffffd7ffa49b006 0xfffffd7ffa49b001: intel_aes_decrypt_cbc_256+0x0221: ***ERROR--unknown op code*** 0xfffffd7ffa49b004: intel_aes_decrypt_cbc_256+0x0224: fmulp %st,%st(2) 0xfffffd7ffa49b006: intel_aes_decrypt_cbc_256+0x0226: ***ERROR--unknown op code*** 0xfffffd7ffa49b009: intel_aes_decrypt_cbc_256+0x0229: fisttp (%rdi) 0xfffffd7ffa49b00b: intel_aes_decrypt_cbc_256+0x022b: pxor %xmm0,%xmm1 0xfffffd7ffa49b00f: intel_aes_decrypt_cbc_256+0x022f: movdqu %xmm1,(%rsi,%rax) 0xfffffd7ffa49b014: intel_aes_decrypt_cbc_256+0x0234: movdqu (%r8,%rax),%xmm0 0xfffffd7ffa49b01a: intel_aes_decrypt_cbc_256+0x023a: addq $0x0000000000000010,%rax 0xfffffd7ffa49b01e: intel_aes_decrypt_cbc_256+0x023e: cmpq %rax,%r9 0xfffffd7ffa49b021: intel_aes_decrypt_cbc_256+0x0241: jne intel_aes_decrypt_cbc_256+0x1d3 [ 0xfffffd7ffa49afb3, .-0x6e ] 0xfffffd7ffa49b023: intel_aes_decrypt_cbc_256+0x0243: movdqu %xmm0,(%rdx) 0xfffffd7ffa49b027: intel_aes_decrypt_cbc_256+0x0247: xorl %eax,%eax 0xfffffd7ffa49b029: intel_aes_decrypt_cbc_256+0x0249: ret ------------------------------------------------ From source code : intel_aes_decrypt_ecb_256: ... .byte 0x66,0x0f,0x38,0xde,0xcb /* aesdec %xmm3, %xmm1 */ .byte 0x66,0x0f,0x38,0xde,0xca /* aesdec %xmm2, %xmm1 */ .byte 0x66,0x0f,0x38,0xdf,0x0f /* aesdeclast (%rdi), %xmm1 */ pxor %xmm0, %xmm1 movdqu %xmm1, (%rsi, %rax) movdqu (%r8, %rax), %xmm0 addq $16, %rax cmpq %rax, %r9 jne 4b 5: movdqu %xmm0, (%rdx) xor %eax, %eax ret .size intel_aes_decrypt_cbc_256, .-intel_aes_decrypt_cbc_256 ----------------------------------- The problem is the first operation to the AESDECLAST instruction is NOT aligned. That will generate an exception. It must be aligned on a 0 mod 16 address: intel_aes_decrypt_ecb_256 + 0x226: .byte 0x66,0x0f,0x38,0xdf,0x0f /* aesdeclast (%rdi), %xmm1 */ The value of the %rdi register is 0x0000000018ed9da8. This is not aligned and will generate a SEGV. To fix, please ensure the imputs are aligned 0 mod 16. Reproducible: Always Steps to Reproduce: Apply an SSL load to a webserver using NSS libraries on a Westmere class system, eg: sun webserver 7.0 update 6 Actual Results: Libs seg fault
Status: NEW → ASSIGNED
Assignee: nobody → glen.beasley
Priority: -- → P1
Depends on: 459248
That's a first: an Intel CPU segfaulting for an unaligned address.
Attached patch Fix alignment issue in AES 256 (obsolete) — Splinter Review
Here's a patch to handle the alignment issue. It's not so nice but unfortunately all 16 registers are used. We have to reload something. Not really tested since I don't have the hardware but from inspection it should work. And re comment #3: it not surprising. Intel MMX/SSE instructions generally have alignment requirements unless they explicitly handle unaligned accesses.
Blocks: 527659
Comment on attachment 422022 [details] [diff] [review] Fix alignment issue in AES 256 I applied the patch, but now the code is crashing t@2 (l@2) signal SEGV (no mapping at the fault address) in intel_aes_encrypt_cbc_256 at 0xfffffd7ff60d8ddc 0xfffffd7ff60d8ddc: intel_aes_encrypt_cbc_256+0x0084: pxor (%rdi),%xmm1 Current function is AES_Encrypt 1161 input, inputLen, blocksize); current thread: t@2 [1] intel_aes_encrypt_cbc_256(0x592f68, 0x5aa2b5, 0x592f48, 0x40, 0x5aa2b5, 0x40), at 0xfffffd7ff60d8ddc =>[2] AES_Encrypt(cx = 0x592f38, output = 0x5aa2b5 "^T", outputLen = 0xfffffd7ffee5bae0, maxOutputLen = 64U, input = 0x5aa2b5 "^T", inputLen = 64U), line 1161 in "rijndael.c" [3] AES_Encrypt(cx = 0x592f38, output = 0x5aa2b5 "^T", outputLen = 0xfffffd7ffee5bae0, maxOutputLen = 64U, input = 0x5aa2b5 "^T", inputLen = 64U), line 562 in "loader.c" [4] ssl3_CompressMACEncryptRecord(ss = 0x5908b0, type = content_handshake, pIn = 0x5b32d0 "^T", contentLen = 40U), line 2131 in "ssl3con.c" [5] ssl3_SendRecord(ss = 0x5908b0, type = content_handshake, pIn = 0x5b32d0 "^T", nIn = 40, flags = 0), line 2230 in "ssl3con.c" [6] ssl3_FlushHandshake(ss = 0x5908b0, flags = 0), line 2408 in "ssl3con.c" [7] ssl3_SendFinished(ss = 0x5908b0, flags = 0), line 7848 in "ssl3con.c"
Attachment #422022 - Flags: review-
Updated patch which fixes the encryption code.
Attachment #422022 - Attachment is obsolete: true
This patch simply turns off intel AES HW accelleration. The current patch still crashes due to alignment issues. selfserv crashes: t@4 (l@4) signal SEGV (no mapping at the fault address) in intel_aes_decrypt_cbc_256 at 0xfffffd7ff5d38f70 0xfffffd7ff5d38f70: intel_aes_decrypt_cbc_256+0x00f8: pxor (%r8,%rax),%xmm4 Current function is AES_Decrypt 1192 input, inputLen, blocksize); (dbx) where current thread: t@4 [1] intel_aes_decrypt_cbc_256(0xfffffd7ffee4b0d0, 0x57c020, 0xfffffd7ffee4b0b0, 0x2c0, 0x5aeb54, 0x2c0), at 0xfffffd7ff5d38f70 =>[2] AES_Decrypt(cx = 0xfffffd7ffee4b0a0, output = 0x57c020 "", outputLen = 0x698480, maxOutputLen = 704U, input = 0x5aeb54 "\xf6\xb4U\xd3\xa6\xa1\xf6\x86Y+\xcfdN\xbf^O^L\xaeB\x90^]^E3M'|8G1\xda{\x9b\xcaV\xc6\xe2\xf6^Fc\xd54^]\xc0\x958?c^]\xdd\xfb^P\xfe\xe0|^]\xbe\x9e\xc3\xf7^BZ\xe9\xd1^DLi\xf9\x87\x89aJ\xf4%\xd1\xe7^[\x97\xc9~\xad^[Y\xa4\xfd\xac_\xb6T*^T^[\xe6Y%o\xd1\x93\xf0+Vd\xf6{`^^\x8ew\xcb\xca\xbf1\xed\xadNg>+b\xc7;\xd4<\x93\x8f\x84", inputLen = 704U), line 1192 in "rijndael.c" Ulrich is on vacation, but before he left we discussed not using this latest patch, and instead having AES_AllocateContext call a new PORT_ZNew function that returns a value aligned to 16 bytes.
Attachment #424056 - Flags: review?(rrelyea)
Comment on attachment 424056 [details] [diff] [review] turn off intel HW accelleration for now [Checkin: Comment 10] r+ rrelyea
Attachment #424056 - Flags: review?(rrelyea) → review+
checked into trunk: Checking in Makefile; /cvsroot/mozilla/security/nss/lib/freebl/Makefile,v <-- Makefile new revision: 1.109; previous revision: 1.108 done check into SOFTOKEN_3_13_BRANCH Checking in Makefile; /cvsroot/mozilla/security/nss/lib/freebl/Makefile,v <-- Makefile new revision: 1.108.4.1; previous revision: 1.108 done
Blocks: FIPS2010
Blocks: 540986
Found one more issuer in the assemller code: function key_expansion256 does two expansions in one call leading to buffer expanded key buffer overflow. Didn't see it in testing since AES_Context key expansion buffer is twice bigger then needed. /* * This magic number is (Nb_max * (Nr_max + 1)) * where Nb_max is the maximum block size in 32-bit words, * Nr_max is the maximum number of rounds, which is Nb_max + 6 */ #define RIJNDAEL_MAX_EXP_KEY_SIZE (8 * 15) ^^^^^^^^ I think is should be 4 /* AESContextStr * * Values which maintain the state for Rijndael encryption/decryption. * * iv - initialization vector for CBC mode * Nb - the number of bytes in a block, specified by user * Nr - the number of rounds, specified by a table * expandedKey - the round keys in 4-byte words, the length is Nr * Nb * worker - the encryption/decryption function to use with this context */ struct AESContextStr { unsigned int Nb; unsigned int Nr; AESFunc *worker; unsigned char iv[RIJNDAEL_MAX_BLOCKSIZE]; PRUint32 expandedKey[RIJNDAEL_MAX_EXP_KEY_SIZE]; void *ztContext; };
Assignee: gbmozilla → alexei.volkov.bugs
Attachment #433680 - Flags: review?
Attachment #433680 - Flags: review? → review?(rrelyea)
Comment on attachment 433680 [details] [diff] [review] Do not do unnecessary 16th round of key expantion r+ with some comments... 1) It might be better to replicate the last step in another function rather than take the jz overhead on each call. (the function is relatively small). bob
Attachment #433680 - Flags: review?(rrelyea) → review+
Attachment #424056 - Attachment description: turn off intel HW accelleration for now → turn off intel HW accelleration for now [Checkin: Comment 10]
This patch also includes fixes from attachment 423044 [details] [diff] [review] "Fix alignment issue in AES 256"
Attachment #447632 - Flags: review?
Attachment #447632 - Flags: review? → review?(rrelyea)
Comment on attachment 447632 [details] [diff] [review] Fix alignment problems Do we need this patch for re-validation? Or the decision is to feed only aligned buffers?
Attachment #447632 - Flags: superreview?(nelson)
Attachment #447632 - Flags: superreview?(nelson)
Comment on attachment 447632 [details] [diff] [review] Fix alignment problems So the plan is for Bob to still look at the patch and try it out. For me to create an alternative code path that will work with only aligned buffers. Removing super review request from Nelson since the patch need to have more attention.
Comment on attachment 433680 [details] [diff] [review] Do not do unnecessary 16th round of key expantion Alexei, your checked in this patch on 2010-05-07 11:46 on the NSS trunk (CVS revision 1.3). Please also check this in on the SOFTOKEN_3_13_BRANCH, which is the effective tip of the Softoken. Note: what you checked in in CVS revision 1.3 looks very different from this patch.
I've merged the SOFTOKEN_3_13_BRANCH onto the NSS trunk. SOFTOKEN_3_13_BRANCH is now dead. Alexei, please ignore my previous comment.
See Also: → 536485
I took NSS 3.12.8 Source code from mozilla.org: wget --no-check-certificate https://ftp.mozilla.org/pub/mozilla.org/security/nss/releases/NSS_3_12_8_RTM/src/nss-3.12.8-with-nspr-4.8.6.tar.gz Setting up environmnet : JAVA_HOME="/share/builds/components/jdk/1.5.0_09/SunOS_x86" JAVA_HOME64="/share/builds/components/jdk/1.5.0_09/SunOS_amd64" CVSROOT=:pserver:anonymous@cvs-mirror.mozilla.org:/cvsroot PATH="${JAVA_HOME}/bin:/usr/bin:/usr/ccs/bin:/tools/ns/bin" GMAKE="/tools/ns/bin/gmake-3.80" export JAVA_HOME JAVA_HOME64 PATH GMAKE CVSROOT export CC=/h/iws-files/export/i386/opt/SUNWspro/SOS12U1/bin/cc export PATH=/h/iws-files/export/i386/opt/SUNWspro/SOS12U1/bin/:$PATH Code Changes : Modified mozilla/security/nss/lib/freebl/Makefile and removed # before intel line $diff nss-3.12.8/mozilla/security/nss/lib/freebl/Makefile nss-3.12.8/mozilla/security/nss/lib/freebl/Makefile.ORIGINAL 171,172c171,172 < DEFINES += -DUSE_HW_AES < ASFILES += intel-aes.s --- > # DEFINES += -DUSE_HW_AES > # ASFILES += intel-aes.s 418,419c418,419 < DEFINES += -DUSE_HW_AES < ASFILES += intel-aes.s --- > # DEFINES += -DUSE_HW_AES > # ASFILES += intel-aes.s cd /export/home/meena/west/nss-3.12.8/mozilla/security/nss/cmd/shlibsign Modified Makefile put # before 3 lines thats causing complation problems Building : cd mozilla/security/nss gmake BUILD_OPT=1 USE_64=1 nss_build_all When I used this NSS on oaf578, it doesn't crash any more. Also we saw huge performance improvements(20-40%) in Web Server. It would be nice if this bug gets fixed.
History of these files ~~~~~~~~~~~~~~~~~ NSS version crashes Changes in security/nss/lib/freebl/Makefile and intel.s NSS 3.12.3 (Web Server 7.0u6) - crashes NSS 3.12.6.2 (WS7.0u9) - no core dumps - intel-aes.s is commented out in Makefile NSS 3.12.7 & 3.12.8 - no core dumps - intel-aes.s changed as given below. But intel-aes.s is still commented out in Makefile. Makefile was changed in NSS 3.12.6 as shown below : + diff nss-3.12.5/mozilla/security/nss/lib/freebl/Makefile nss-3.12.6/mozilla/security/nss/lib/freebl/Makefile 166,167c166,167 < DEFINES += -DUSE_HW_AES < ASFILES += intel-aes.s --- > # DEFINES += -DUSE_HW_AES > # ASFILES += intel-aes.s 413,414c413,414 < DEFINES += -DUSE_HW_AES < ASFILES += intel-aes.s --- > # DEFINES += -DUSE_HW_AES > # ASFILES += intel-aes.s intel-aes.s file was changed in NSS 3.12.7 as shown below : + diff nss-3.12.6/mozilla/security/nss/lib/freebl/intel-aes.s nss-3.12.7/mozilla/security/nss/lib/freebl/ 1119c1119,1126 < call key_expansion256 --- > pxor %xmm6, %xmm6 > pshufd $0xff, %xmm2, %xmm2 > shufps $0x10, %xmm1, %xmm6 > pxor %xmm6, %xmm1 > shufps $0x8c, %xmm1, %xmm6 > pxor %xmm2, %xmm1 > pxor %xmm6, %xmm1 > movdqu %xmm1, (%rsi) 1177c1184,1191 < call key_expansion256 --- > pxor %xmm6, %xmm6 > pshufd $0xff, %xmm2, %xmm2 > shufps $0x10, %xmm1, %xmm6 > pxor %xmm6, %xmm1 > shufps $0x8c, %xmm1, %xmm6 > pxor %xmm2, %xmm1 > pxor %xmm6, %xmm1 > movdqu %xmm1, (%rsi) 1193a1208 > 1196d1210 <
Change log shows these two check-ins are made for this bug : 1.109 glen.beasley%sun.com 2010-01-28 15:10 536485 turn off intel aes HW accelleration r=bob relyea For more details refer http://bonsai.mozilla.org/cvsview2.cgi?diff_mode=context&whitespace_mode=show&file=Makefile&branch=&root=/cvsroot&subdir=mozilla/security/nss/lib/freebl&command=DIFF_FRAMESET&rev1=1.108&rev2=1.109 1.3 alexei.volkov.bugs%sun.com 2010-05-07 11:46 536485: Do not do unnecessary 16th round of key expantion. r=rrelyea For more details refer http://bonsai.mozilla.org/cvsview2.cgi?diff_mode=context&whitespace_mode=show&file=intel-aes.s&branch=&root=/cvsroot&subdir=mozilla/security/nss/lib/freebl&command=DIFF_FRAMESET&rev1=1.2&rev2=1.3
Target Milestone: --- → 3.12.9
Comment on attachment 447632 [details] [diff] [review] Fix alignment problems r+ with the following changes... The c++ like comments after the the command need to be C style comments (they fail to compile on Linux). + leaq 16(%rdi), %rdx // iv + leaq 48(%rdi), %rdi // expanded key + movdqu (%rdx), %xmm0 // iv + movdqu (%rdi), %xmm2 // first key block + movdqu 160(%rdi), %xmm12 // last key block xorl %eax, %eax+2: movdqu (%r8, %rax), %xmm3 // 1st data block + movdqu 16(%r8, %rax), %xmm4 // 2d data block I'd also like to see a patch to turn off AES HW with an environment variable.
Attachment #447632 - Flags: review?(rrelyea) → review+
Here's the performance results: Without Alexi's alignment safe patch: > bltest -E -m aes_cbc -i x -k key128 -v iv0 -p 100000 -o /dev/null # mode in symmkey opreps cxreps context op time(sec) thrgput aes_cbc_e 1Gb 128 100T 0 0.000 2359.000 2.359 662Mb > bltest -E -m aes_cbc -i x -k key192 -v iv0 -p 100000 -o /dev/null # mode in symmkey opreps cxreps context op time(sec) thrgput aes_cbc_e 1Gb 192 100T 0 0.000 2778.000 2.778 562Mb > bltest -E -m aes_cbc -i x -k key256 -v iv0 -p 100000 -o /dev/null # mode in symmkey opreps cxreps context op time(sec) thrgput aes_cbc_e 1Gb 256 100T 0 0.000 3238.000 3.238 482Mb With Alexi's alignment safe patch: > bltest -E -m aes_cbc -i x -k key0 -v iv0 -p 100000 -o /dev/null # mode in symmkey opreps cxreps context op time(sec) thrgput aes_cbc_e 1Gb 128 100T 0 0.000 2359.000 2.359 662Mb > bltest -E -m aes_cbc -i x -k key192 -v iv0 -p 100000 -o /dev/null # mode in symmkey opreps cxreps context op time(sec) thrgput aes_cbc_e 1Gb 192 100T 0 0.000 2779.000 2.779 562Mb > bltest -E -m aes_cbc -i x -k key256 -v iv0 -p 100000 -o /dev/null # mode in symmkey opreps cxreps context op time(sec) thrgput aes_cbc_e 1Gb 256 100T 0 0.000 3198.000 3.199 488Mb Without using intel aes hw: > bltest -E -m aes_cbc -i x -k key128 -v iv0 -p 100000 -o /dev/null # mode in symmkey opreps cxreps context op time(sec) thrgput aes_cbc_e 1Gb 128 100T 0 0.000 30984.000 30.985 50Mb > bltest -E -m aes_cbc -i x -k key192 -v iv0 -p 100000 -o /dev/null # mode in symmkey opreps cxreps context op time(sec) thrgput aes_cbc_e 1Gb 192 100T 0 0.000 35395.000 35.396 44Mb > bltest -E -m aes_cbc -i x -k key256 -v iv0 -p 100000 -o /dev/null # mode in symmkey opreps cxreps context op time(sec) thrgput aes_cbc_e 1Gb 256 100T 0 0.000 39762.000 39.763 39Mb
Clearly there is no reason not to take Alexi's patch as is. (HW gives us a wapping 10x performance win on raw AES, that has got to give us several percent on AES SSL performance). bob
I also don't see a penalty when using unaligned buffers and Alexi's code (same command with -1 7 and -2 5, for instance). bob
Alexi's patch checked into the tip/trunk... Checking in intel-aes.s; /cvsroot/mozilla/security/nss/lib/freebl/intel-aes.s,v <-- intel-aes.s new revision: 1.4; previous revision: 1.3 done
This patch undoes Glen's patch to disable HW_AES. It also allows us to turn it off in the field if we run into other problems. bob
Attachment #491369 - Flags: review?(wtc)
Attachment #491369 - Flags: superreview?(alexei.volkov.bugs)
Comment on attachment 491369 [details] [diff] [review] Turn HW AES back on. Include an Environment variable to disable it. I suggest two changes. >+ char *use_hw = getenv("NSS_DONT_USE_HW_AES"); The local variable name should have a negative sense, such as "disable_hw" or "dont_use_hw". I suggest naming the environment variable NSS_DISABLE_HW_AES for consistency with other NSS environment variables, unless you really want it to match the -DUSE_HW_AES macro.
I'm ameniable to both suggested changes. bob
Attachment #491369 - Attachment is obsolete: true
Attachment #491382 - Flags: review?(wtc)
Attachment #491369 - Flags: superreview?(alexei.volkov.bugs)
Attachment #491369 - Flags: review?(wtc)
Comment on attachment 491382 [details] [diff] [review] V2: Turn HW AES back on. Include an Environment variable to disable it. r=wtc. Thanks.
Attachment #491382 - Flags: review?(wtc) → review+
Alexi's patch checked into NSS 3.12 BRANCH. Checking in intel-aes.s; /cvsroot/mozilla/security/nss/lib/freebl/intel-aes.s,v <-- intel-aes.s new revision: 1.3.2.1; previous revision: 1.3 done
NSS 3.12 BRANCH Checking in Makefile; /cvsroot/mozilla/security/nss/lib/freebl/Makefile,v <-- Makefile new revision: 1.113.2.2; previous revision: 1.113.2.1 done Checking in rijndael.c; /cvsroot/mozilla/security/nss/lib/freebl/rijndael.c,v <-- rijndael.c new revision: 1.25.6.1; previous revision: 1.25 done NSS TIP/TRUNK Checking in Makefile; /cvsroot/mozilla/security/nss/lib/freebl/Makefile,v <-- Makefile new revision: 1.116; previous revision: 1.115 done Checking in rijndael.c; /cvsroot/mozilla/security/nss/lib/freebl/rijndael.c,v <-- rijndael.c new revision: 1.26; previous revision: 1.25 done
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Whiteboard: 4_3.12.10
Target Milestone: 3.12.9 → 3.12.10
Restoring target milestone. The patch was checked into 3.12.9 already.. bob
Target Milestone: 3.12.10 → 3.12.9
Whiteboard: 4_3.12.10
Crash Signature: [@ intel_aes_decrypt_cbc_256]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: