Closed Bug 536485 Opened 10 years ago Closed 9 years ago

crash during ssl handshake in [@ intel_aes_decrypt_cbc_256]

Categories

(NSS :: Libraries, defect, P1, critical)

3.12
x86
OpenSolaris
defect

Tracking

(Not tracked)

RESOLVED FIXED
3.12.9

People

(Reporter: nelson, Assigned: alvolkov.bgs)

References

(Blocks 1 open bug, )

Details

(Keywords: crash)

Crash Data

Attachments

(5 files, 2 obsolete files)

We've now seen our first crash in the new code that uses Intel's new AES instructions.  Server crashes almost immediately after startup of benchmark test.

Stack trace :
t@353 (l@353) terminated by signal SEGV (Segmentation Fault)
0xfffffd7ffa49b006: intel_aes_decrypt_cbc_256+0x0226:  
(dbx) where
current thread: t@353
=>[1] intel_aes_decrypt_cbc_256(0x18ed9da8, 0x942e7010, 0x18ed9d88, 0x4800, 0x942f1010, 0x30), at 0xfffffd7ffa49b006 
  [2] AES_Decrypt(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffa4474fa 
  [3] ssl3_HandleRecord(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffb848699 
  [4] ssl3_GatherCompleteHandshake(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffb84a4b3 
  [5] ssl_GatherRecord1stHandshake(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffb84b8f4 
  [6] ssl_SecureRecv(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffb858006 
  [7] ssl_Recv(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffb86187f 
  [8] DaemonSession::run(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffd1c9761 
  [9] ThreadMain(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffcf79347 
  [10] _pt_root(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffd434e15 
  [11] _thrp_setup(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff143244 
  [12] _lwp_start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff143500

(dbx) x $rdi /4 x
0x0000000018ed9da8: 0x2e75 0x6ca1 0x24a6 0x2246
(dbx) x $rip /16 x
0xfffffd7ffa49b006: intel_aes_decrypt_cbc_256+0x0226:    
                    0x0f66 0xdf38 0x660f 0xef0f 0xf3c8 0x7f0f 0x060c 0x41f3

Crash occurs at this instruction:

1624 .byte 0x66,0x0f,0x38,0xdf,0x0f /* aesdeclast (%rdi), %xmm1 */

Crash is a SIGSEGV.  Memory at address in RDI register is readily accessible.
Unclear why SEGV occurs.  

Do we know that this code actually works on any real Intel CPUs?
Maybe we need a way to disable/override the test for this instruction set.
Severity: major → critical
Keywords: crash
Summary: crash during ssl handshake in intel_aes_decrypt_cbc_256 → crash during ssl handshake in [@ intel_aes_decrypt_cbc_256]
This bug looks similar to Bug 539744.
duping 539744 to this bug:

info from 539744

Chip info:

oaf578# isainfo -v
64-bit amd64 applications
    pclmulqdq aes sse4.2 sse4.1 ssse3 popcnt tscp cx16 mon sse3 pause sse2 
    sse fxsr mmx cmov amd_sysc cx8 tsc fpu 
32-bit i386 applications
    pclmulqdq aes sse4.2 sse4.1 ssse3 popcnt tscp ahf cx16 mon sse3 pause 
    sse2 sse fxsr mmx cmov sep cx8 tsc fpu 


oaf578# /export/bench/cpu_tools/cpuid 
CPU:
   vendor_id = "GenuineIntel"
   version information (1/eax):
      processor type  = primary processor (0)
      family          = Intel Pentium Pro/II/III/Celeron, AMD Athlon/Duron,
Cyrix M2, VIA C3 (6)
      model           = 0xc (12)
      stepping id     = 0x0 (0)
      extended family = 0x0 (0)
      extended model  = 0x2 (2)
      (simple synth)  = Intel Pentium II / Pentium III / Pentium M / Celeron /
Mobile Celeron / Celeron M / Core Solo / Core Duo / Core 2 / Core 2 Extreme
Processor / Xeon Processor LV / Xeon Processor 5100 (unknown model)
   miscellaneous (1/ebx):
      process local APIC physical ID = 0x2 (2)
      cpu count                      = 0x20 (32)
      CLFLUSH line size              = 0x8 (8)
      brand index                    = 0x0 (0)
   brand id = 0x00 (0): unknown
   feature information (1/edx):
      x87 FPU on chip                        = true
      virtual-8086 mode enhancement          = true
      debugging extensions                   = true
      page size extensions                   = true
      time stamp counter                     = true
      RDMSR and WRMSR support                = true
      physical address extensions            = true
      machine check exception                = true
      CMPXCHG8B inst.                        = true
      APIC on chip                           = true
      SYSENTER and SYSEXIT                   = true
      memory type range registers            = true
      PTE global bit                         = true
      machine check architecture             = true
      conditional move/compare instruction   = true
      page attribute table                   = true
      page size extension                    = true
      processor serial number                = false
      CLFLUSH instruction                    = true
      debug store                            = true
      thermal monitor and clock ctrl         = true
      MMX Technology                         = true
      FXSAVE/FXRSTOR                         = true
      SSE extensions                         = true
      SSE2 extensions                        = true
      self snoop                             = true
      hyper-threading / multi-core supported = true
      therm. monitor                         = true
      IA64                                   = false
      pending break event                    = true
   feature information (1/ecx):
      PNI/SSE3: Prescott New Instructions    = true
      MONITOR/MWAIT                          = true
      CPL-qualified debug store              = true
      VMX: virtual machine extensions        = true
      Secure mode extensions                 = true
      Enhanced Intel SpeedStep Technology    = true
      thermal monitor 2                      = true
      Supplemental SSE3 instructions         = true
      context ID: adaptive or shared L1 data = false
      cmpxchg16b available                   = true
      xTPR disable                           = true
      processor serial number                = true
      direct cache access                    = true
      SSE4.1 instructions                    = true
      SSE4.2 instructions                    = true
      POPCNT instructions                    = true
      AES instructions                       = true
   cache and TLB information (2):
      0x5a: unknown
      0x03: data TLB: 4K pages, 4-way, 64 entries
      0x55: unknown
      0xff: unknown
      0xb2: unknown
      0xf0: 64 byte prefetching
      0xca: unknown
   processor serial number: 0002-06C0-0000-0000-0000-0000
   deterministic cache parameters (4):
      cache type                        = no more caches (0)
      cache level                       = 0x0 (0)
      self-initializing cache level     = false
      fully associative cache           = false
      extra threads sharing this cache  = 0x0 (0)
      extra processor cores on this die = 0x0 (0)
      system coherency line size        = 0x0 (0)
      physical line partitions          = 0x0 (0)
      ways of associativity             = 0x0 (0)
      number of sets - 1 (s)            = 0
   MONITOR/MWAIT (5):
      smallest monitor-line size (bytes)      = 0x40 (64)
      largest monitor-line size (bytes)       = 0x40 (64)
      enum of Monitor-MWAIT exts supported    = true
      supports intrs as break-event for MWAIT = true
      number of C0 sub C-states using MWAIT   = 0x0 (0)
      number of C1 sub C-states using MWAIT   = 0x2 (2)
      number of C2 sub C-states using MWAIT   = 0x1 (1)
      number of C3 sub C-states using MWAIT   = 0x1 (1)
      number of C4 sub C-states using MWAIT   = 0x0 (0)
   Thermal and Power Management Features (6):
      digital thermometer                     = true
      operating point protection              = true
      digital thermometer thresholds          = 0x7 (7)
      ACNT/MCNT supported performance measure = true
   Architecture Performance Monitoring Features (0xa/eax):
      version ID                               = 0x3 (3)
      number of counters per logical processor = 0x4 (4)
      bit width of counter                     = 0x30 (48)
      length of EBX bit vector                 = 0x7 (7)
   Architecture Performance Monitoring Features (0xa/ebx):
      core cycle event not available           = false
      instruction retired event not available  = false
      reference cycles event not available     = true
      last-level cache ref event not available = false
      last-level cache miss event not avail    = false
      branch inst retired event not available  = false
      branch mispred retired event not avail   = false
   0x0000000b: eax=0x00000000 ebx=0x00000000 ecx=0x0000002c edx=0x00000002
   extended feature flags (0x80000001/edx):
      SYSCALL and SYSRET instructions        = false
      execution disable                      = true
      64-bit extensions technology available = true
   Intel feature flags (0x80000001/ecx):
      LAHF/SAHF supported in 64-bit mode = true
   brand = "Genuine Intel(R) CPU             000  @ 3.07GHz"
   L1 TLB/cache information: 2M/4M pages & L1 TLB (0x80000005/eax):
      instruction # entries     = 0x0 (0)
      instruction associativity = 0x0 (0)
      data # entries            = 0x0 (0)
      data associativity        = 0x0 (0)
   L1 TLB/cache information: 4K pages & L1 TLB (0x80000005/ebx):
      instruction # entries     = 0x0 (0)
      instruction associativity = 0x0 (0)
      data # entries            = 0x0 (0)
      data associativity        = 0x0 (0)
   L1 data cache information (0x80000005/ecx):
      line size (bytes) = 0x0 (0)
      lines per tag     = 0x0 (0)
      associativity     = 0x0 (0)
      size (Kb)         = 0x0 (0)
   L1 instruction cache information (0x80000005/ecx):
      line size (bytes) = 0x0 (0)
      lines per tag     = 0x0 (0)
      associativity     = 0x0 (0)
      size (Kb)         = 0x0 (0)
   L2 TLB/cache information: 2M/4M pages & L2 TLB (0x80000006/eax):
      instruction # entries     = 0x0 (0)
      instruction associativity = L2 off (0)
      data # entries            = 0x0 (0)
      data associativity        = L2 off (0)
   L2 TLB/cache information: 4K pages & L2 TLB (0x80000006/ebx):
      instruction # entries     = 0x0 (0)
      instruction associativity = L2 off (0)
      data # entries            = 0x0 (0)
      data associativity        = L2 off (0)
   L2 unified cache information (0x80000006/ecx):
      line size (bytes) = 0x40 (64)
      lines per tag     = 0x0 (0)
      associativity     = 8-way (6)
      size (Kb)         = 0x100 (256)
   Advanced Power Management Features (0x80000007/edx):
      temperature sensing diode      = 0x0 (0)
      frequency ID (FID) control     = 0x0 (0)
      voltage ID (VID) control       = 0x0 (0)
      thermal trip (TTP)             = 0x0 (0)
      thermal monitor (TM)           = 0x0 (0)
      software thermal control (STC) = 0x0 (0)
      TscInvariant                   = 0x1 (1)
   Physical Address and Linear Address Size (0x80000008/eax):
      maximum physical address = 0x28 (40)
      maximum linear address   = 0x30 (48)
   Logical CPU cores (0x80000008/ecx):
      number of logical CPU cores - 1 = 0x0 (0)
      ApicIdCoreIdSize                = 0x0 (0)
   (multi-processing synth): hyper-threaded (t=32)
   (synth) = Intel Pentium II / Pentium III / Pentium M / Celeron / Mobile
Celeron / Celeron M / Core Solo / Core Duo / Core 2 / Core 2 Extreme Processor
/ Xeon Processor LV / Xeon Processor 5100 (unknown model)




--------------------------------------------------------------
t@353 (l@353) terminated by signal SEGV (Segmentation Fault)
0xfffffd7ffa49b006: intel_aes_decrypt_cbc_256+0x0226:   ***ERROR--unknown op
code***

Crash seems to happen at 0xfffffd7ffa49b006

0xfffffd7ffa49b001: intel_aes_decrypt_cbc_256+0x0221:   ***ERROR--unknown op
code***
0xfffffd7ffa49b004: intel_aes_decrypt_cbc_256+0x0224:   fmulp    %st,%st(2)
0xfffffd7ffa49b006: intel_aes_decrypt_cbc_256+0x0226:   ***ERROR--unknown op
code***
0xfffffd7ffa49b009: intel_aes_decrypt_cbc_256+0x0229:   fisttp   (%rdi)
0xfffffd7ffa49b00b: intel_aes_decrypt_cbc_256+0x022b:   pxor     %xmm0,%xmm1
0xfffffd7ffa49b00f: intel_aes_decrypt_cbc_256+0x022f:   movdqu  
%xmm1,(%rsi,%rax)
0xfffffd7ffa49b014: intel_aes_decrypt_cbc_256+0x0234:   movdqu  
(%r8,%rax),%xmm0
0xfffffd7ffa49b01a: intel_aes_decrypt_cbc_256+0x023a:   addq    
$0x0000000000000010,%rax
0xfffffd7ffa49b01e: intel_aes_decrypt_cbc_256+0x023e:   cmpq     %rax,%r9
0xfffffd7ffa49b021: intel_aes_decrypt_cbc_256+0x0241:   jne     
intel_aes_decrypt_cbc_256+0x1d3        [ 0xfffffd7ffa49afb3, .-0x6e ]
0xfffffd7ffa49b023: intel_aes_decrypt_cbc_256+0x0243:   movdqu   %xmm0,(%rdx)
0xfffffd7ffa49b027: intel_aes_decrypt_cbc_256+0x0247:   xorl     %eax,%eax
0xfffffd7ffa49b029: intel_aes_decrypt_cbc_256+0x0249:   ret
------------------------------------------------

From source code :


intel_aes_decrypt_ecb_256:
...
    .byte 0x66,0x0f,0x38,0xde,0xcb    /* aesdec    %xmm3, %xmm1 */
    .byte 0x66,0x0f,0x38,0xde,0xca    /* aesdec    %xmm2, %xmm1 */
    .byte 0x66,0x0f,0x38,0xdf,0x0f    /* aesdeclast (%rdi), %xmm1 */
    pxor    %xmm0, %xmm1
    movdqu    %xmm1, (%rsi, %rax)
    movdqu    (%r8, %rax), %xmm0
    addq    $16, %rax
    cmpq    %rax, %r9
    jne    4b

5:    movdqu    %xmm0, (%rdx)

    xor    %eax, %eax
    ret
    .size intel_aes_decrypt_cbc_256, .-intel_aes_decrypt_cbc_256

-----------------------------------


The problem is the first operation to the AESDECLAST instruction is NOT
aligned.  That will generate an exception.  It must be aligned on a 0 mod 16
address:



intel_aes_decrypt_ecb_256 + 0x226:
.byte 0x66,0x0f,0x38,0xdf,0x0f    /* aesdeclast (%rdi), %xmm1 */

The value of the %rdi register is 0x0000000018ed9da8. This is not aligned and
will generate a SEGV.

To fix, please ensure the imputs are aligned 0 mod 16. 


Reproducible: Always

Steps to Reproduce:
Apply an SSL load to a webserver using NSS libraries on a Westmere class
system, eg: sun webserver 7.0 update 6
Actual Results:  
Libs seg fault
Status: NEW → ASSIGNED
Duplicate of this bug: 539744
Assignee: nobody → glen.beasley
Priority: -- → P1
Depends on: 459248
That's a first: an Intel CPU segfaulting for an unaligned address.
Attached patch Fix alignment issue in AES 256 (obsolete) — Splinter Review
Here's a patch to handle the alignment issue.  It's not so nice but unfortunately all 16 registers are used.  We have to reload something.  Not really tested since I don't have the hardware but from inspection it should work.

And re comment #3: it not surprising.  Intel MMX/SSE instructions generally have alignment requirements unless they explicitly handle unaligned accesses.
Blocks: 527659
Comment on attachment 422022 [details] [diff] [review]
Fix alignment issue in AES 256

I applied the patch, but now the code is crashing


t@2 (l@2) signal SEGV (no mapping at the fault address) in intel_aes_encrypt_cbc_256 at 0xfffffd7ff60d8ddc
0xfffffd7ff60d8ddc: intel_aes_encrypt_cbc_256+0x0084:   pxor     (%rdi),%xmm1
Current function is AES_Encrypt
 1161                                input, inputLen, blocksize);


current thread: t@2
  [1] intel_aes_encrypt_cbc_256(0x592f68, 0x5aa2b5, 0x592f48, 0x40, 0x5aa2b5, 0x40), at 0xfffffd7ff60d8ddc
=>[2] AES_Encrypt(cx = 0x592f38, output = 0x5aa2b5 "^T", outputLen = 0xfffffd7ffee5bae0, maxOutputLen = 64U, input = 0x5aa2b5 "^T", inputLen = 64U), line 1161 in "rijndael.c"
  [3] AES_Encrypt(cx = 0x592f38, output = 0x5aa2b5 "^T", outputLen = 0xfffffd7ffee5bae0, maxOutputLen = 64U, input = 0x5aa2b5 "^T", inputLen = 64U), line 562 in "loader.c"
  [4] ssl3_CompressMACEncryptRecord(ss = 0x5908b0, type = content_handshake, pIn = 0x5b32d0 "^T", contentLen = 40U), line 2131 in "ssl3con.c"
  [5] ssl3_SendRecord(ss = 0x5908b0, type = content_handshake, pIn = 0x5b32d0 "^T", nIn = 40, flags = 0), line 2230 in "ssl3con.c"
  [6] ssl3_FlushHandshake(ss = 0x5908b0, flags = 0), line 2408 in "ssl3con.c"
  [7] ssl3_SendFinished(ss = 0x5908b0, flags = 0), line 7848 in "ssl3con.c"
Attachment #422022 - Flags: review-
Updated patch which fixes the encryption code.
Attachment #422022 - Attachment is obsolete: true
This patch simply turns off intel AES HW accelleration. 

The current patch still crashes due to alignment issues. 

selfserv crashes:
t@4 (l@4) signal SEGV (no mapping at the fault address) in intel_aes_decrypt_cbc_256 at 0xfffffd7ff5d38f70
0xfffffd7ff5d38f70: intel_aes_decrypt_cbc_256+0x00f8:   pxor     (%r8,%rax),%xmm4
Current function is AES_Decrypt
 1192                                input, inputLen, blocksize);
(dbx) where
current thread: t@4
   [1] intel_aes_decrypt_cbc_256(0xfffffd7ffee4b0d0, 0x57c020, 0xfffffd7ffee4b0b0, 0x2c0, 0x5aeb54, 0x2c0), at 0xfffffd7ff5d38f70
=>[2] AES_Decrypt(cx = 0xfffffd7ffee4b0a0, output = 0x57c020 "", outputLen = 0x698480, maxOutputLen = 704U, input = 0x5aeb54 "\xf6\xb4U\xd3\xa6\xa1\xf6\x86Y+\xcfdN\xbf^O^L\xaeB\x90^]^E3M'|8G1\xda{\x9b\xcaV\xc6\xe2\xf6^Fc\xd54^]\xc0\x958?c^]\xdd\xfb^P\xfe\xe0|^]\xbe\x9e\xc3\xf7^BZ\xe9\xd1^DLi\xf9\x87\x89aJ\xf4%\xd1\xe7^[\x97\xc9~\xad^[Y\xa4\xfd\xac_\xb6T*^T^[\xe6Y%o\xd1\x93\xf0+Vd\xf6{`^^\x8ew\xcb\xca\xbf1\xed\xadNg>+b\xc7;\xd4<\x93\x8f\x84", inputLen = 704U), line 1192 in "rijndael.c" 

Ulrich is on vacation, but before he left we discussed not using this latest patch, and instead having AES_AllocateContext call a new PORT_ZNew function that returns a value aligned to 16 bytes.
Attachment #424056 - Flags: review?(rrelyea)
Comment on attachment 424056 [details] [diff] [review]
turn off intel HW accelleration for now
[Checkin: Comment 10]

r+ rrelyea
Attachment #424056 - Flags: review?(rrelyea) → review+
checked into trunk:
Checking in Makefile;
/cvsroot/mozilla/security/nss/lib/freebl/Makefile,v  <--  Makefile
new revision: 1.109; previous revision: 1.108
done
check into SOFTOKEN_3_13_BRANCH
Checking in Makefile;
/cvsroot/mozilla/security/nss/lib/freebl/Makefile,v  <--  Makefile
new revision: 1.108.4.1; previous revision: 1.108
done
Blocks: FIPS2010
Blocks: 540986
Found one more issuer in the assemller code: function key_expansion256 does two expansions in one call leading to buffer expanded key buffer overflow. Didn't see it in testing since AES_Context key expansion buffer is twice bigger then needed.

/*
 * This magic number is (Nb_max * (Nr_max + 1))
 * where Nb_max is the maximum block size in 32-bit words,
 *       Nr_max is the maximum number of rounds, which is Nb_max + 6
 */
#define RIJNDAEL_MAX_EXP_KEY_SIZE (8 * 15)
                                  ^^^^^^^^ I think is should be 4

/* AESContextStr
 *
 * Values which maintain the state for Rijndael encryption/decryption.
 *
 * iv          - initialization vector for CBC mode
 * Nb          - the number of bytes in a block, specified by user
 * Nr          - the number of rounds, specified by a table
 * expandedKey - the round keys in 4-byte words, the length is Nr * Nb
 * worker      - the encryption/decryption function to use with this context
 */
struct AESContextStr
{
    unsigned int   Nb;
    unsigned int   Nr;
    AESFunc       *worker;
    unsigned char iv[RIJNDAEL_MAX_BLOCKSIZE];
    PRUint32      expandedKey[RIJNDAEL_MAX_EXP_KEY_SIZE];
    void          *ztContext;
};
Assignee: gbmozilla → alexei.volkov.bugs
Attachment #433680 - Flags: review?
Attachment #433680 - Flags: review? → review?(rrelyea)
Comment on attachment 433680 [details] [diff] [review]
Do not do unnecessary 16th round of key expantion

r+ with some comments...

1) It might be better to replicate the last step in another function rather than take the jz overhead on each call. (the function is relatively small).

bob
Attachment #433680 - Flags: review?(rrelyea) → review+
Attachment #424056 - Attachment description: turn off intel HW accelleration for now → turn off intel HW accelleration for now [Checkin: Comment 10]
This patch also includes fixes from attachment 423044 [details] [diff] [review] "Fix alignment issue in AES 256"
Attachment #447632 - Flags: review?
Attachment #447632 - Flags: review? → review?(rrelyea)
Comment on attachment 447632 [details] [diff] [review]
Fix alignment problems

Do we need this patch for re-validation? Or the decision is to feed only aligned buffers?
Attachment #447632 - Flags: superreview?(nelson)
Attachment #447632 - Flags: superreview?(nelson)
Comment on attachment 447632 [details] [diff] [review]
Fix alignment problems

So the plan is for Bob to still look at the patch and try it out. For me to create an alternative code path that will work with only aligned buffers.
Removing super review request from Nelson since the patch need to have more attention.
Comment on attachment 433680 [details] [diff] [review]
Do not do unnecessary 16th round of key expantion

Alexei, your checked in this patch on 2010-05-07 11:46
on the NSS trunk (CVS revision 1.3).

Please also check this in on the SOFTOKEN_3_13_BRANCH,
which is the effective tip of the Softoken.

Note: what you checked in in CVS revision 1.3 looks very
different from this patch.
I've merged the SOFTOKEN_3_13_BRANCH onto the NSS trunk.
SOFTOKEN_3_13_BRANCH is now dead.

Alexei, please ignore my previous comment.
See Also: → 536485
I took NSS 3.12.8 Source code from mozilla.org:
wget --no-check-certificate https://ftp.mozilla.org/pub/mozilla.org/security/nss/releases/NSS_3_12_8_RTM/src/nss-3.12.8-with-nspr-4.8.6.tar.gz

Setting up environmnet :
JAVA_HOME="/share/builds/components/jdk/1.5.0_09/SunOS_x86"
JAVA_HOME64="/share/builds/components/jdk/1.5.0_09/SunOS_amd64"
CVSROOT=:pserver:anonymous@cvs-mirror.mozilla.org:/cvsroot
PATH="${JAVA_HOME}/bin:/usr/bin:/usr/ccs/bin:/tools/ns/bin"
GMAKE="/tools/ns/bin/gmake-3.80"
export JAVA_HOME JAVA_HOME64 PATH GMAKE CVSROOT
export CC=/h/iws-files/export/i386/opt/SUNWspro/SOS12U1/bin/cc
export PATH=/h/iws-files/export/i386/opt/SUNWspro/SOS12U1/bin/:$PATH

Code Changes :
Modified mozilla/security/nss/lib/freebl/Makefile and removed # before intel line

$diff nss-3.12.8/mozilla/security/nss/lib/freebl/Makefile nss-3.12.8/mozilla/security/nss/lib/freebl/Makefile.ORIGINAL 

171,172c171,172
<     DEFINES += -DUSE_HW_AES
<     ASFILES += intel-aes.s
---
> #    DEFINES += -DUSE_HW_AES
> #    ASFILES += intel-aes.s
418,419c418,419
<       DEFINES += -DUSE_HW_AES
<       ASFILES += intel-aes.s
---
> #     DEFINES += -DUSE_HW_AES
> #     ASFILES += intel-aes.s

cd /export/home/meena/west/nss-3.12.8/mozilla/security/nss/cmd/shlibsign
Modified Makefile put # before 3 lines thats causing complation problems

Building :
cd mozilla/security/nss
gmake BUILD_OPT=1 USE_64=1 nss_build_all

When I used this NSS on oaf578, it doesn't crash any more. Also we saw huge performance improvements(20-40%) in Web Server.

It would be nice if this bug gets fixed.
History of these files 
~~~~~~~~~~~~~~~~~
NSS version                                    crashes        Changes in security/nss/lib/freebl/Makefile  and intel.s 
NSS 3.12.3 (Web Server 7.0u6) - crashes                                     
NSS 3.12.6.2 (WS7.0u9) - no core dumps - intel-aes.s is commented out in Makefile
NSS 3.12.7 & 3.12.8 - no core dumps - intel-aes.s changed as given below. But intel-aes.s is still commented out in Makefile.

Makefile was changed in NSS 3.12.6 as shown below :
+ diff nss-3.12.5/mozilla/security/nss/lib/freebl/Makefile nss-3.12.6/mozilla/security/nss/lib/freebl/Makefile 
166,167c166,167
<     DEFINES += -DUSE_HW_AES
<     ASFILES += intel-aes.s
---
> #    DEFINES += -DUSE_HW_AES
> #    ASFILES += intel-aes.s
413,414c413,414
<       DEFINES += -DUSE_HW_AES
<       ASFILES += intel-aes.s
---
> #     DEFINES += -DUSE_HW_AES
> #     ASFILES += intel-aes.s


intel-aes.s file was changed in NSS 3.12.7 as shown below :
+ diff nss-3.12.6/mozilla/security/nss/lib/freebl/intel-aes.s nss-3.12.7/mozilla/security/nss/lib/freebl/ 
1119c1119,1126
<       call key_expansion256
---
>       pxor    %xmm6, %xmm6
>       pshufd  $0xff, %xmm2, %xmm2
>       shufps  $0x10, %xmm1, %xmm6
>       pxor    %xmm6, %xmm1
>       shufps  $0x8c, %xmm1, %xmm6
>       pxor    %xmm2, %xmm1
>       pxor    %xmm6, %xmm1
>       movdqu  %xmm1, (%rsi)
1177c1184,1191
<       call key_expansion256
---
>       pxor    %xmm6, %xmm6
>       pshufd  $0xff, %xmm2, %xmm2
>       shufps  $0x10, %xmm1, %xmm6
>       pxor    %xmm6, %xmm1
>       shufps  $0x8c, %xmm1, %xmm6
>       pxor    %xmm2, %xmm1
>       pxor    %xmm6, %xmm1
>       movdqu  %xmm1, (%rsi)
1193a1208
> 
1196d1210
<
Change log shows these two check-ins are made for this bug :

1.109 glen.beasley%sun.com 2010-01-28 15:10 536485 turn off intel aes HW accelleration r=bob relyea
For more details refer  http://bonsai.mozilla.org/cvsview2.cgi?diff_mode=context&whitespace_mode=show&file=Makefile&branch=&root=/cvsroot&subdir=mozilla/security/nss/lib/freebl&command=DIFF_FRAMESET&rev1=1.108&rev2=1.109

1.3 alexei.volkov.bugs%sun.com	2010-05-07 11:46 536485: Do not do unnecessary 16th round of key expantion. r=rrelyea
For more details refer  http://bonsai.mozilla.org/cvsview2.cgi?diff_mode=context&whitespace_mode=show&file=intel-aes.s&branch=&root=/cvsroot&subdir=mozilla/security/nss/lib/freebl&command=DIFF_FRAMESET&rev1=1.2&rev2=1.3
Target Milestone: --- → 3.12.9
Comment on attachment 447632 [details] [diff] [review]
Fix alignment problems

r+ with the following changes...

The c++ like comments after the the command need to be C style comments (they fail to compile on Linux).
+	leaq	16(%rdi), %rdx  // iv
+	leaq	48(%rdi), %rdi  // expanded key
 +	movdqu	(%rdx), %xmm0   // iv
+	movdqu	(%rdi), %xmm2   // first key block
+	movdqu	160(%rdi), %xmm12 // last key block
 	xorl	%eax, %eax+2:	movdqu	(%r8, %rax), %xmm3 // 1st data block
+	movdqu	16(%r8, %rax), %xmm4 // 2d data block


I'd also like to see a patch to turn off AES HW with an environment variable.
Attachment #447632 - Flags: review?(rrelyea) → review+
Here's the performance results:
Without Alexi's alignment safe patch:

> bltest -E -m aes_cbc -i x -k key128 -v iv0 -p 100000 -o /dev/null
#     mode          in symmkey  opreps  cxreps     context          op   time(sec)     thrgput
 aes_cbc_e         1Gb     128    100T       0       0.000    2359.000       2.359       662Mb
> bltest -E -m aes_cbc -i x -k key192 -v iv0 -p 100000 -o /dev/null
#     mode          in symmkey  opreps  cxreps     context          op   time(sec)     thrgput
 aes_cbc_e         1Gb     192    100T       0       0.000    2778.000       2.778       562Mb
> bltest -E -m aes_cbc -i x -k key256 -v iv0 -p 100000 -o /dev/null
#     mode          in symmkey  opreps  cxreps     context          op   time(sec)     thrgput
 aes_cbc_e         1Gb     256    100T       0       0.000    3238.000       3.238       482Mb

With Alexi's alignment safe patch:
> bltest -E -m aes_cbc -i x -k key0 -v iv0 -p 100000 -o /dev/null
#     mode          in symmkey  opreps  cxreps     context          op   time(sec)     thrgput
 aes_cbc_e         1Gb     128    100T       0       0.000    2359.000       2.359       662Mb
> bltest -E -m aes_cbc -i x -k key192 -v iv0 -p 100000 -o /dev/null
#     mode          in symmkey  opreps  cxreps     context          op   time(sec)     thrgput
 aes_cbc_e         1Gb     192    100T       0       0.000    2779.000       2.779       562Mb
> bltest -E -m aes_cbc -i x -k key256 -v iv0 -p 100000 -o /dev/null
#     mode          in symmkey  opreps  cxreps     context          op   time(sec)     thrgput
 aes_cbc_e         1Gb     256    100T       0       0.000    3198.000       3.199       488Mb

Without using intel aes hw:

> bltest -E -m aes_cbc -i x -k key128  -v iv0 -p 100000 -o /dev/null
#     mode          in symmkey  opreps  cxreps     context          op   time(sec)     thrgput
 aes_cbc_e         1Gb     128    100T       0       0.000   30984.000      30.985        50Mb
> bltest -E -m aes_cbc -i x -k key192 -v iv0 -p 100000 -o /dev/null
#     mode          in symmkey  opreps  cxreps     context          op   time(sec)     thrgput
 aes_cbc_e         1Gb     192    100T       0       0.000   35395.000      35.396        44Mb
> bltest -E -m aes_cbc -i x -k key256 -v iv0 -p 100000 -o /dev/null
#     mode          in symmkey  opreps  cxreps     context          op   time(sec)     thrgput
 aes_cbc_e         1Gb     256    100T       0       0.000   39762.000      39.763        39Mb
Clearly there is no reason not to take Alexi's patch as is.

(HW gives us a wapping 10x performance win on raw AES, that has got to give us several percent on AES SSL performance).

bob
I also don't see a penalty when using unaligned buffers and Alexi's code (same command with -1 7 and -2 5, for instance).

bob
Alexi's patch checked into the tip/trunk...

Checking in intel-aes.s;
/cvsroot/mozilla/security/nss/lib/freebl/intel-aes.s,v  <--  intel-aes.s
new revision: 1.4; previous revision: 1.3
done
This patch undoes Glen's patch to disable HW_AES. It also allows us to turn it off in the field if we run into other problems.

bob
Attachment #491369 - Flags: review?(wtc)
Attachment #491369 - Flags: superreview?(alexei.volkov.bugs)
Comment on attachment 491369 [details] [diff] [review]
Turn HW AES back on. Include an Environment variable to disable it.

I suggest two changes.

>+	char *use_hw = getenv("NSS_DONT_USE_HW_AES");

The local variable name should have a negative sense,
such as "disable_hw" or "dont_use_hw".

I suggest naming the environment variable NSS_DISABLE_HW_AES
for consistency with other NSS environment variables, unless
you really want it to match the -DUSE_HW_AES macro.
I'm ameniable to both suggested changes.

bob
Attachment #491369 - Attachment is obsolete: true
Attachment #491382 - Flags: review?(wtc)
Attachment #491369 - Flags: superreview?(alexei.volkov.bugs)
Attachment #491369 - Flags: review?(wtc)
Comment on attachment 491382 [details] [diff] [review]
V2: Turn HW AES back on. Include an Environment variable to disable it.

r=wtc.  Thanks.
Attachment #491382 - Flags: review?(wtc) → review+
Alexi's patch checked into NSS 3.12 BRANCH.

Checking in intel-aes.s;
/cvsroot/mozilla/security/nss/lib/freebl/intel-aes.s,v  <--  intel-aes.s
new revision: 1.3.2.1; previous revision: 1.3
done
NSS 3.12 BRANCH

Checking in Makefile;
/cvsroot/mozilla/security/nss/lib/freebl/Makefile,v  <--  Makefile
new revision: 1.113.2.2; previous revision: 1.113.2.1
done
Checking in rijndael.c;
/cvsroot/mozilla/security/nss/lib/freebl/rijndael.c,v  <--  rijndael.c
new revision: 1.25.6.1; previous revision: 1.25
done


NSS TIP/TRUNK
Checking in Makefile;
/cvsroot/mozilla/security/nss/lib/freebl/Makefile,v  <--  Makefile
new revision: 1.116; previous revision: 1.115
done
Checking in rijndael.c;
/cvsroot/mozilla/security/nss/lib/freebl/rijndael.c,v  <--  rijndael.c
new revision: 1.26; previous revision: 1.25
done
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Whiteboard: 4_3.12.10
Target Milestone: 3.12.9 → 3.12.10
Restoring target milestone. The patch was checked into 3.12.9 already..

bob
Target Milestone: 3.12.10 → 3.12.9
Whiteboard: 4_3.12.10
Crash Signature: [@ intel_aes_decrypt_cbc_256]
You need to log in before you can comment on or make changes to this bug.