AES-NI not used for non-GCM AES ciphers on macOS
Categories: NSS :: Libraries, enhancement, P1
Tracking: Not tracked
People: Reporter: arthur.ramsey, Assigned: kjacobs
Attachments: 2 files
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36
Steps to reproduce:
Run nss/tests/cipher/performance.sh with and without the attached patch on macOS 10.14.6 and note the difference in throughput. The hw-support tool provided by NSS indicates that all CPU features are available on the test device, which has an i7-8559U. The patch would need some tweaking to be usable on all OSes.
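On x86, AES-NI and PCLMULQDQ support is advertised through feature bits in ECX for CPUID leaf 1, which is the kind of information a tool like hw-support reports. A minimal sketch of such a check, assuming GCC or Clang on x86 (for the `<cpuid.h>` helper `__get_cpuid`); the function names are hypothetical and this is not the hw-support tool's code:

```c
#include <assert.h>
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Feature bits reported in ECX by CPUID leaf 1 (Intel SDM). */
#define ECX_PCLMULQDQ (1u << 1)
#define ECX_AESNI     (1u << 25)

/* Returns true only if CPUID leaf 1 reports every bit in ecx_mask. */
static bool cpu_has_ecx_features(unsigned int ecx_mask) {
    assert(ecx_mask != 0);
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        return false; /* leaf 1 not supported by this CPU */
    }
    return (ecx & ecx_mask) == ecx_mask;
#else
    (void)ecx_mask;
    return false; /* not an x86 CPU, so no AES-NI */
#endif
}

static bool cpu_has_aesni(void)  { return cpu_has_ecx_features(ECX_AESNI); }
static bool cpu_has_pclmul(void) { return cpu_has_ecx_features(ECX_PCLMULQDQ); }
```

This deliberately stops at the CPUID bits; AVX-based code paths additionally need an OS-support check (OSXSAVE/XGETBV), which is omitted here.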
Actual results:
A profiler shows that, without the patch, the functions provided by the intel-aes.s assembly are not used for non-GCM AES ciphers.
Expected results:
The blog post https://blog.mozilla.org/security/2017/09/29/improving-aes-gcm-performance and bug 1357670 imply that AES-NI and other CPU features should be used for all AES ciphers when available. Some clarity on how AES-NI and other CPU features are used on different OSes and CPU architectures would be appreciated.
Comment 1 (Reporter), 5 years ago
I reviewed the latest NSS source code, and the use of AES-NI for non-GCM ciphers on macOS appears to be unchanged.
Comment 2 (Reporter), 5 years ago
I should clarify that I don't expect the functions provided in the intel-aes.s assembly to be used for AES-GCM ciphers.
Comment 3, 5 years ago
Kevin, I know you looked at this recently, can you look again?
Comment 4 (Assignee), 5 years ago
Thanks for the report. This is similar to what I noticed in bug 1573672.
At a high level, NSS chooses an implementation in two steps. First, in aes_InitContext, intel-aes.s is used if USE_HW_AES is set. Otherwise, a source-level worker is assigned (in this example, rijndael_encryptECB). This worker checks for AES-NI at runtime and chooses either an intrinsics implementation (rather than assembly) or, if AES-NI is unavailable, a software implementation. Have you checked whether the intrinsics are being used in your builds, or are you just looking at the performance difference between intrinsics and asm?
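The two-step selection described above can be sketched as a function pointer chosen when the context is initialized, with the source-level worker making its own runtime decision on each call. All names here (aes_ctx, aes_init_ctx, worker_asm, worker_source) are hypothetical stand-ins rather than NSS code, the workers only report which path they would take, and the sketch assumes the assembly path is taken only when it was compiled in (cf. USE_HW_AES) and the CPU also reports AES-NI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the three implementations. */
typedef enum { IMPL_ASM, IMPL_INTRINSICS, IMPL_SOFTWARE } aes_impl;

typedef struct aes_ctx aes_ctx;
typedef aes_impl (*aes_worker)(const aes_ctx *ctx);

struct aes_ctx {
    aes_worker worker;   /* chosen once, at context-init time */
    bool cpu_has_aesni;  /* stand-in for a runtime CPUID query */
};

/* Stub workers: a real worker would encrypt; these only report the path. */
static aes_impl worker_asm(const aes_ctx *ctx) {
    (void)ctx;
    return IMPL_ASM;
}

/* Source-level worker: decides at call time between intrinsics and software. */
static aes_impl worker_source(const aes_ctx *ctx) {
    return ctx->cpu_has_aesni ? IMPL_INTRINSICS : IMPL_SOFTWARE;
}

/* Step 1: use the assembly if it was built in and usable, else the source-level worker. */
static void aes_init_ctx(aes_ctx *ctx, bool asm_built_in, bool cpu_has_aesni) {
    assert(ctx != NULL);
    ctx->cpu_has_aesni = cpu_has_aesni;
    ctx->worker = (asm_built_in && cpu_has_aesni) ? worker_asm : worker_source;
}

static aes_impl aes_encrypt(const aes_ctx *ctx) { return ctx->worker(ctx); }
```

In the terms of this sketch, the CBC problem reported in this bug corresponds to a source-level worker that has no intrinsics branch and always falls through to software.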
All that said, I just noticed that CBC does not actually use the intrinsics as it should. Mode-specific performance figures on Mac are below; the second trial for each mode disables AES-NI. Note that these are extra slow because this is an ASAN build.
Lastly, I'm not seeing a performance difference after applying your patch, but I may have imported it incorrectly (Mercurial fails to import it cleanly).
ECB:
kjacobs-44776:cipher kjacobs$ bltest -E -m aes_ecb -i /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_ecb.in -k /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_ecb.key -v /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_ecb.iv -p 10000 -o /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_ecb.out
# mode in symmkey opreps cxreps context op time(sec) thrgput
aes_ecb_e 78Mb 256 10T 0 0.000 342.000 0.342 228Mb
kjacobs-44776:cipher kjacobs$ NSS_DISABLE_HW_AES=1 bltest -E -m aes_ecb -i /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_ecb.in -k /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_ecb.key -v /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_ecb.iv -p 10000 -o /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_ecb.out
# mode in symmkey opreps cxreps context op time(sec) thrgput
aes_ecb_e 78Mb 256 10T 0 0.000 3501.000 3.502 22Mb
GCM:
kjacobs-44776:cipher kjacobs$ bltest -E -m aes_gcm -i /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_gcm.in -k /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_gcm.key -v /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_gcm.iv -p 10000 -o /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_gcm.out
# mode in symmkey opreps cxreps context op time(sec) thrgput
aes_gcm_e 78Mb 256 10T 0 0.000 1325.000 1.325 58Mb
kjacobs-44776:cipher kjacobs$ NSS_DISABLE_HW_AES=1 bltest -E -m aes_gcm -i /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_gcm.in -k /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_gcm.key -v /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_gcm.iv -p 10000 -o /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_gcm.out
# mode in symmkey opreps cxreps context op time(sec) thrgput
aes_gcm_e 78Mb 256 10T 0 0.000 4470.000 4.471 17Mb
CTR:
kjacobs-44776:common kjacobs$ bltest -E -m aes_ctr -i /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_ctr.in -k /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_ctr.key -v /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_ctr.iv -p 10000 -o /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_ctr.out
# mode in symmkey opreps cxreps context op time(sec) thrgput
aes_ctr_e 78Mb 256 10T 0 0.000 846.000 0.846 92Mb
kjacobs-44776:common kjacobs$ NSS_DISABLE_HW_AES=1 bltest -E -m aes_ctr -i /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_ctr.in -k /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_ctr.key -v /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_ctr.iv -p 10000 -o /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_ctr.out
# mode in symmkey opreps cxreps context op time(sec) thrgput
aes_ctr_e 78Mb 256 10T 0 0.000 4117.000 4.117 18Mb
CBC:
kjacobs-44776:common kjacobs$ bltest -E -m aes_cbc -i /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_cbc.in -k /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_cbc.key -v /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_cbc.iv -p 10000 -o /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_cbc.out
# mode in symmkey opreps cxreps context op time(sec) thrgput
aes_cbc_e 78Mb 256 10T 0 0.000 3946.000 3.946 19Mb
kjacobs-44776:common kjacobs$ NSS_DISABLE_HW_AES=1 bltest -E -m aes_cbc -i /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_cbc.in -k /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_cbc.key -v /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_cbc.iv -p 10000 -o /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_cbc.out
# mode in symmkey opreps cxreps context op time(sec) thrgput
aes_cbc_e 78Mb 256 10T 0 0.000 3824.000 3.826 20Mb
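To make the comparison concrete, this sketch divides each op time measured with NSS_DISABLE_HW_AES=1 by the corresponding op time with hardware support enabled, using the numbers from the runs above (the op column is in milliseconds). The bench struct and function names are illustrative only, not part of NSS or bltest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Op times in ms from the bltest runs: HW enabled vs. NSS_DISABLE_HW_AES=1. */
typedef struct {
    const char *mode;
    double hw_ms;
    double sw_ms;
} bench;

static const bench kRuns[] = {
    {"aes_ecb_e", 342.0, 3501.0},
    {"aes_gcm_e", 1325.0, 4470.0},
    {"aes_ctr_e", 846.0, 4117.0},
    {"aes_cbc_e", 3946.0, 3824.0},
};

/* How many times faster the hardware-enabled run was. */
static double hw_speedup(const bench *b) {
    assert(b->hw_ms > 0.0);
    return b->sw_ms / b->hw_ms;
}

static void print_speedups(void) {
    for (size_t i = 0; i < sizeof kRuns / sizeof kRuns[0]; i++) {
        double s = hw_speedup(&kRuns[i]);
        /* A ratio near 1 means enabling hardware support changed nothing. */
        printf("%s: %.2fx%s\n", kRuns[i].mode, s, s < 1.5 ? "  <- little or no gain" : "");
    }
}
```

Calling print_speedups() gives about 10.24x for ECB, 3.37x for GCM and 4.87x for CTR, but 0.97x for CBC, which matches the observation that CBC is not using AES-NI in this build.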
Comment 6 (Reporter), 5 years ago
My patch was against NSS 3.36.0, but I could rebase it onto the latest source and clean it up so that it only affects macOS. I wasn't sure there would be any interest in this change, so I held off on that until there was.
I profiled using Xcode Instruments and saw "intel_aes_encrypt_cbc_256" being used only after applying the patch. Decompiling also shows no use of "intel_aes_encrypt_cbc_256" before the patch. That, combined with the bltest throughput with and without the patch, is the basis for my conclusion.
Comment 7 (Reporter), 5 years ago
Testing was mainly focused on AES-CBC ciphers. After reviewing the code and the referenced blog post, I concluded that this poor performance probably applies to all non-GCM AES ciphers. I believe AES-GCM ciphers do use AES-NI and other CPU features, as your tests confirm.
Comment 8 (Assignee), 5 years ago
For decryption, CBC and ECB are not using AES-NI. For encryption, CBC is not, but ECB is. CTR and GCM use it for both encryption and decryption.
I think the right approach here is to build out the intrinsics rather than porting assembly. If you'd like to take this on, I can upload my WIP patch (which fixes ECB), otherwise I'll continue on to CBC over the next week or two.
Thanks again for the report, this will be a nice performance boost!
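As an illustration of what an intrinsics-based worker looks like, here is a minimal AES-128 ECB encryption sketch using the AES-NI intrinsics from `<wmmintrin.h>`. It is not NSS's implementation: it covers only 128-bit keys and encryption, assumes an x86 target and a recent GCC or Clang (the `target("aes")` attribute lets it compile without `-maes`), and must only be called after a runtime AES-NI check. The key expansion follows the common `_mm_aeskeygenassist_si128` pattern; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h> /* SSE2: _mm_xor_si128, _mm_loadu_si128, ... */
#include <wmmintrin.h> /* AES-NI: _mm_aesenc_si128, _mm_aeskeygenassist_si128 */

/* Compile these functions for AES-NI even if the file is not built with -maes. */
#define AESNI_FN __attribute__((target("aes,sse2")))

/* One AES-128 key-schedule step: prefix-XOR the four key words, then mix in
 * RotWord(SubWord(w3)) ^ rcon, which aeskeygenassist leaves in the top dword. */
AESNI_FN static inline __m128i expand_round(__m128i key, __m128i assist) {
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));
    return _mm_xor_si128(key, _mm_shuffle_epi32(assist, 0xff));
}

AESNI_FN static void aes128_expand_key(const uint8_t key[16], __m128i rk[11]) {
    rk[0] = _mm_loadu_si128((const __m128i *)key);
/* rcon must be a compile-time constant, hence the macro. */
#define ROUND(i, rcon) \
    rk[i] = expand_round(rk[(i) - 1], _mm_aeskeygenassist_si128(rk[(i) - 1], (rcon)))
    ROUND(1, 0x01); ROUND(2, 0x02); ROUND(3, 0x04); ROUND(4, 0x08); ROUND(5, 0x10);
    ROUND(6, 0x20); ROUND(7, 0x40); ROUND(8, 0x80); ROUND(9, 0x1b); ROUND(10, 0x36);
#undef ROUND
}

/* ECB: every 16-byte block is encrypted independently with the same round keys. */
AESNI_FN static void aes128_ecb_encrypt(const uint8_t key[16], const uint8_t *in,
                                        uint8_t *out, size_t len) {
    assert(len % 16 == 0);
    __m128i rk[11];
    aes128_expand_key(key, rk);
    for (size_t off = 0; off < len; off += 16) {
        __m128i b = _mm_loadu_si128((const __m128i *)(in + off));
        b = _mm_xor_si128(b, rk[0]);         /* initial AddRoundKey */
        for (int r = 1; r < 10; r++) {
            b = _mm_aesenc_si128(b, rk[r]);  /* rounds 1 to 9 */
        }
        b = _mm_aesenclast_si128(b, rk[10]); /* final round, no MixColumns */
        _mm_storeu_si128((__m128i *)(out + off), b);
    }
}
```

With the FIPS-197 Appendix C.1 key 000102030405060708090a0b0c0d0e0f and plaintext 00112233445566778899aabbccddeeff, the output should be 69c4e0d86a7b0430d8cdb78070b4c55a.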
Comment 9 (Reporter), 5 years ago
I agree that building out the intrinsics is the way to go. I'll let you finish your work. Thanks for looking at this, Kevin.
Updated (Assignee), 5 years ago
Comment 10 (Assignee), 5 years ago
AES-NI is currently not used for CBC or ECB decrypt when an assembly implementation (intel-aes.s or intel-aes-x86/64-masm.asm) is not available. Concretely, this is the case on macOS, Linux32, and other non-Linux OSes such as BSD. This patch adds the plumbing to use AES-NI intrinsics when available.
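For context on the CBC plumbing: CBC differs from ECB in its chaining. Each ciphertext block is XORed into the next plaintext block before encryption, so CBC encryption is inherently sequential, while each CBC decryption step depends only on ciphertext and can be computed independently. This sketch shows generic CBC over a hypothetical `block_fn` signature; `toy_enc`/`toy_dec` are a deliberately insecure stand-in so the example runs, and an AES-NI or software AES worker would take their place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 16

/* Any 16-byte block cipher (hypothetical signature). */
typedef void (*block_fn)(const void *key, const uint8_t in[BLOCK], uint8_t out[BLOCK]);

/* Toy cipher, NOT secure: rotate bytes by one position and XOR with the key. */
static void toy_enc(const void *key, const uint8_t in[BLOCK], uint8_t out[BLOCK]) {
    const uint8_t *k = key;
    for (int i = 0; i < BLOCK; i++) out[i] = (uint8_t)(in[(i + 1) % BLOCK] ^ k[i]);
}
static void toy_dec(const void *key, const uint8_t in[BLOCK], uint8_t out[BLOCK]) {
    const uint8_t *k = key;
    for (int i = 0; i < BLOCK; i++) out[(i + 1) % BLOCK] = (uint8_t)(in[i] ^ k[i]);
}

/* CBC encrypt: C_i = E(P_i ^ C_{i-1}), C_{-1} = IV. Each block waits on the previous one. */
static void cbc_encrypt(block_fn enc, const void *key, const uint8_t iv[BLOCK],
                        const uint8_t *in, uint8_t *out, size_t len) {
    assert(len % BLOCK == 0);
    uint8_t chain[BLOCK], tmp[BLOCK];
    memcpy(chain, iv, BLOCK);
    for (size_t off = 0; off < len; off += BLOCK) {
        for (int i = 0; i < BLOCK; i++) tmp[i] = in[off + i] ^ chain[i];
        enc(key, tmp, out + off);
        memcpy(chain, out + off, BLOCK);
    }
}

/* CBC decrypt: P_i = D(C_i) ^ C_{i-1}. Each D(C_i) needs only ciphertext,
 * so the block decryptions are independent. in and out must not overlap. */
static void cbc_decrypt(block_fn dec, const void *key, const uint8_t iv[BLOCK],
                        const uint8_t *in, uint8_t *out, size_t len) {
    assert(len % BLOCK == 0);
    for (size_t off = 0; off < len; off += BLOCK) {
        const uint8_t *prev = off ? in + off - BLOCK : iv;
        dec(key, in + off, out + off);
        for (int i = 0; i < BLOCK; i++) out[off + i] ^= prev[i];
    }
}
```

Because of this dependency, a CBC encryption worker cannot overlap several block encryptions the way ECB or CBC decryption can.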
Before:
mode in symmkey opreps cxreps context op time(sec) thrgput
aes_ecb_d 78Mb 256 10T 0 0.000 395.000 0.395 197Mb
aes_cbc_e 78Mb 256 10T 0 0.000 392.000 0.393 198Mb
aes_cbc_d 78Mb 256 10T 0 0.000 425.000 0.425 183Mb
After:
mode in symmkey opreps cxreps context op time(sec) thrgput
aes_ecb_d 78Mb 256 10T 0 0.000 39.000 0.039 1Gb
aes_cbc_e 78Mb 256 10T 0 0.000 94.000 0.094 831Mb
aes_cbc_d 78Mb 256 10T 0 0.000 74.000 0.075 1Gb
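As a quick reading of the Before/After figures, the per-mode speedup is the ratio of the two op times (both in milliseconds):

```latex
S = \frac{t_{\text{before}}}{t_{\text{after}}}, \qquad
S_{\text{ecb\_d}} = \frac{395}{39} \approx 10.1, \quad
S_{\text{cbc\_e}} = \frac{392}{94} \approx 4.2, \quad
S_{\text{cbc\_d}} = \frac{425}{74} \approx 5.7
```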
Updated, 5 years ago
Comment 11 (Assignee), 5 years ago
Description