Bug 706024 - AES-NI enhancements to NSS on Sandy Bridge systems
Keywords: perf
Product: NSS
Classification: Components
Component: Libraries
Version: 3.13
Platform: x86_64 All
Importance: -- normal with 1 vote
Target Milestone: ---
Assigned To: nobody
Reported: 2011-11-29 01:43 PST by Aleksey
Modified: 2016-09-30 01:09 PDT

Enhancement patch (66.42 KB, patch)
2011-11-29 01:43 PST, Aleksey
rrelyea: review+
Micro benchmark (6.15 KB, text/plain)
2011-11-29 01:44 PST, Aleksey
no flags
howto (1.88 KB, text/plain)
2011-11-29 01:45 PST, Aleksey
no flags

Description Aleksey 2011-11-29 01:43:14 PST
Created attachment 577519
Enhancement patch

On Sandy Bridge, the AES round instructions have a throughput of one instruction per cycle and a latency of eight cycles. On Westmere, the same instructions issue once every two cycles with a latency of six cycles, so Sandy Bridge offers a twofold increase in throughput at the cost of some additional latency. As a result, AES encryption/decryption throughput can be significantly increased for parallel modes of operation (Intel® 64 and IA-32 Architectures Optimization Reference Manual, page 310, paragraph 2).
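
The arithmetic behind this can be sketched with an idealized pipeline model. This is an illustration under stated assumptions, not a measurement: it assumes a fully pipelined AES unit, counts only the 10 round instructions of AES-128, and ignores key loads, XORs, and memory traffic. The helper names (`blocks_to_saturate`, `cycles_per_block`) are hypothetical; the latency and issue-interval figures are the ones quoted above.

```python
import math

# (latency, issue interval) in cycles for one AES round instruction,
# taken from the figures quoted in the description.
SANDY_BRIDGE = (8, 1)
WESTMERE = (6, 2)

AES128_ROUNDS = 10  # round instructions per block for a 128-bit key


def blocks_to_saturate(latency, interval):
    """Independent blocks that must be in flight so the AES unit never idles."""
    return math.ceil(latency / interval)


def cycles_per_block(latency, interval, rounds, blocks_in_flight):
    """Idealized cycles per block when blocks are interleaved round by round.

    One round over all interleaved blocks costs whichever is larger:
    the dependency latency or the time needed to issue all the blocks.
    """
    per_round = max(latency, blocks_in_flight * interval)
    return rounds * per_round / blocks_in_flight


if __name__ == "__main__":
    for name, (lat, ivl) in (("Sandy Bridge", SANDY_BRIDGE), ("Westmere", WESTMERE)):
        serial = cycles_per_block(lat, ivl, AES128_ROUNDS, 1)
        parallel = cycles_per_block(lat, ivl, AES128_ROUNDS, 8)
        print(name, blocks_to_saturate(lat, ivl), serial, parallel)
```

Under this model Sandy Bridge needs eight independent blocks in flight to hide the eight-cycle latency, and with eight blocks it takes half as many cycles per block as Westmere, while a strictly serial chain of rounds is slower on Sandy Bridge because of the higher latency.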

NSS already implements a parallel mode that processes eight blocks at a time, which is the optimum. However, that code does not unroll its loops; unrolling can further improve CBC decrypt and ECB encrypt/decrypt for all key sizes. The attached patch implements unrolled versions of CBC decrypt and ECB encrypt/decrypt.
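
For readers wondering why CBC decrypt appears in this list alongside ECB: in CBC decryption each plaintext block is D(C_i) XOR C_(i-1), so the block decryptions do not depend on one another, whereas CBC encryption needs the previous ciphertext block before it can start the next. A minimal Python sketch of that data flow follows. It is not the patch: the block function is a toy stand-in (byte-wise addition), not AES and not secure, all function names are hypothetical, and inputs are assumed to be a multiple of 16 bytes.

```python
BLOCK = 16  # AES block size in bytes


def toy_encrypt_block(block, key):
    # Toy stand-in for one AES block encryption. NOT a real cipher.
    return bytes((b + k) % 256 for b, k in zip(block, key))


def toy_decrypt_block(block, key):
    # Inverse of toy_encrypt_block.
    return bytes((b - k) % 256 for b, k in zip(block, key))


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def cbc_encrypt(plaintext, key, iv):
    # Serial by construction: each block needs the previous ciphertext block.
    out, prev = [], iv
    for i in range(0, len(plaintext), BLOCK):
        prev = toy_encrypt_block(xor(plaintext[i:i + BLOCK], prev), key)
        out.append(prev)
    return b"".join(out)


def cbc_decrypt_batched(ciphertext, key, iv, batch=8):
    blocks = [ciphertext[i:i + BLOCK] for i in range(0, len(ciphertext), BLOCK)]
    prevs = [iv] + blocks[:-1]  # the chaining value for each block is already known
    out = []
    for start in range(0, len(blocks), batch):
        group = blocks[start:start + batch]
        # These decryptions are mutually independent; this is the work that
        # an eight-block AES-NI loop can interleave in the pipeline.
        decrypted = [toy_decrypt_block(c, key) for c in group]
        out.extend(xor(d, p) for d, p in zip(decrypted, prevs[start:start + batch]))
    return b"".join(out)
```

The batch size of eight mirrors the eight-block parallel mode described above; the final group may be shorter when the input is not a multiple of eight blocks.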

Platforms: Sandy Bridge (RHEL 6, gcc 4.4.4), Westmere (RHEL 5.5, gcc 4.1.2).
NSS: Mozilla nss 3.12.11 with nspr 4.8.9.
JDK: b147-x64.
Microbenchmark: written in Java. Warmup: 2 minutes; iteration: 3 minutes; score = number of operations (CBC decrypt, ECB decrypt, or ECB encrypt) / iteration time [ops/min].
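
The scoring scheme described above can be sketched as follows. This is not the attached Java benchmark; it is a small Python illustration of the same warmup-then-measure structure, with hypothetical function names and the durations exposed as parameters so they can be shortened for a quick run.

```python
import time


def run_phase(op, seconds):
    """Call op() repeatedly for about `seconds`; return (count, elapsed seconds)."""
    start = time.perf_counter()
    deadline = start + seconds
    count = 0
    while time.perf_counter() < deadline:
        op()
        count += 1
    return count, time.perf_counter() - start


def score_ops_per_minute(op, warmup_s=120.0, iteration_s=180.0):
    """Score = operations completed in the measured iteration, per minute."""
    run_phase(op, warmup_s)  # warmup: result discarded
    count, elapsed = run_phase(op, iteration_s)
    return count / (elapsed / 60.0)


def ratio(unrolled_score, original_score):
    """Unrolled/Original ratio; values above 1.0 mean the unrolled build is faster."""
    return unrolled_score / original_score
```

Here `op` would be a single CBC decrypt, ECB decrypt, or ECB encrypt call on a fixed buffer; the defaults correspond to the 2-minute warmup and 3-minute iteration stated above.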

Testing results: the unrolled version improves performance on Sandy Bridge:

Ratio Unrolled/Original on 1 thread
Cipher / Key size   128-bit   192-bit   256-bit
CBC decrypt         1.238     1.247     1.297
ECB decrypt         1.280     1.320     1.300
ECB encrypt         1.185     1.212     1.165

Ratio Unrolled/Original on 4 threads
Cipher / Key size   128-bit   192-bit   256-bit
CBC decrypt         1.201     1.242     1.277
ECB decrypt         1.228     1.293     1.269
ECB encrypt         1.160     1.170     1.151
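
To read these ratios as percentage gains, the small sketch below converts each entry and reports the range. The numbers are copied from the two tables above; the helper names are hypothetical.

```python
# Ratios copied from the tables above: {threads: {mode: (128-bit, 192-bit, 256-bit)}}
RATIOS = {
    1: {
        "CBC decrypt": (1.238, 1.247, 1.297),
        "ECB decrypt": (1.280, 1.320, 1.300),
        "ECB encrypt": (1.185, 1.212, 1.165),
    },
    4: {
        "CBC decrypt": (1.201, 1.242, 1.277),
        "ECB decrypt": (1.228, 1.293, 1.269),
        "ECB encrypt": (1.160, 1.170, 1.151),
    },
}


def percent_gain(ratio):
    """Convert an Unrolled/Original ratio into a percentage improvement."""
    return round((ratio - 1.0) * 100.0, 1)


def gain_range(ratios):
    """Smallest and largest percentage gain across all thread counts and modes."""
    gains = [
        percent_gain(r)
        for by_mode in ratios.values()
        for row in by_mode.values()
        for r in row
    ]
    return min(gains), max(gains)
```

Across both tables the gains run from 15.1% (ECB encrypt, 256-bit, 4 threads) to 32.0% (ECB decrypt, 192-bit, 1 thread).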

There is no performance impact on the Westmere platform.
The micro benchmark, the intel-aes.s patch with loop unrolling, and a howto are attached.
Comment 1 Aleksey 2011-11-29 01:44:23 PST
Created attachment 577520
Micro benchmark
Comment 2 Aleksey 2011-11-29 01:45:02 PST
Created attachment 577521
howto
Comment 3 Aleksey 2011-11-29 02:22:01 PST
Link to Optimization Notice
Comment 4 Justin Dolske [:Dolske] 2012-02-21 11:41:04 PST
Who needs to review this to move this along?

Any interdependency with bug 540986?
Comment 5 Aleksey 2012-02-22 01:28:42 PST
The enhancement proposed in this bug is for Linux only. Bug 540986 is a port to Windows and does not seem to be related to the Linux intel-aes implementation.
Comment 6 Robert Relyea 2012-05-02 14:19:41 PDT
Comment on attachment 577519
Enhancement patch

r+ rrelyea
Comment 7 Elio Maldonado 2012-05-02 18:02:19 PDT
Patch applied to the trunk:
revision 1.7
date: 2012/05/03 00:51:37;  author:;  state: Exp;  lines: +1011 -159
Bug 706024 - AES-NI enhancements to NSS on Sandy Bridge system,, r=rrelyea
