Closed Bug 1608493 Opened 2 years ago Closed 2 years ago

AES-NI not used for non-GCM AES ciphers on macOS

Categories

(NSS :: Libraries, enhancement, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: arthur.ramsey, Assigned: kjacobs)

Attachments

(2 files)

Attached file aes-ni-macos.patch

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36

Steps to reproduce:

Run nss/tests/cipher/performance.sh with and without the attached patch on macOS 10.14.6 and note the difference in throughput. The hw-support tool provided by NSS indicates that all CPU features are available on the test device, which has an i7-8559U. The patch would need to be tweaked a bit to be usable on all OSes.

Actual results:

A profiler will reveal that without the patch the functions provided by the intel-aes.s assembly are not used for non-GCM AES ciphers.

Expected results:

The blog post https://blog.mozilla.org/security/2017/09/29/improving-aes-gcm-performance and bug 1357670 imply AES-NI and other CPU features should be used for all AES ciphers when available. Some clarity around the use of AES-NI and other CPU features on different OSs and CPU architectures would be appreciated.

I reviewed the latest NSS source code and there appears to be no difference in the use of AES-NI for non-GCM ciphers on macOS.

I should clarify I don't expect use of the functions provided in intel-aes.s assembly for AES GCM ciphers.

Kevin, I know you looked at this recently, can you look again?

Flags: needinfo?(kjacobs.bugzilla)

Thanks for the report. This is similar to what I noticed in bug 1573672.

At a high level, NSS chooses an implementation in two steps. First, in aes_InitContext, intel-aes.s is used if USE_HW_AES is set. Otherwise, a source-level worker is assigned (in this example, rijndael_encryptECB). At runtime, this worker checks for AES-NI and uses an intrinsics implementation (rather than assembly) if AES-NI is available, or a software implementation if it is not. Have you checked whether the intrinsics are being used in your builds, or are you just looking at the performance difference between intrinsics and asm?
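
As a rough illustration of the two-step selection described above, here is a minimal C sketch. All names (`aes_env`, `aes_impl`, `choose_impl`) are hypothetical and are not taken from the NSS source; `have_asm` stands in for the build-time USE_HW_AES condition, and `hw_disabled` stands in for an override such as NSS_DISABLE_HW_AES=1. It assumes the assembly path is only taken when the CPU also reports AES-NI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; this is an illustration, not NSS code. */
typedef enum { IMPL_ASM, IMPL_INTRINSICS, IMPL_SOFTWARE } aes_impl;

typedef struct {
    bool have_asm;      /* build time: an assembly file exists for this platform */
    bool cpu_has_aesni; /* run time: the CPU reports AES-NI support */
    bool hw_disabled;   /* run time: user override that turns hardware AES off */
} aes_env;

/* Step 1: prefer assembly when it was built in and the hardware is usable.
 * Step 2: otherwise the source-level worker picks intrinsics or software. */
static aes_impl choose_impl(const aes_env *env)
{
    assert(env != NULL);
    bool hw_ok = env->cpu_has_aesni && !env->hw_disabled;
    if (env->have_asm && hw_ok) {
        return IMPL_ASM;
    }
    if (hw_ok) {
        return IMPL_INTRINSICS;
    }
    return IMPL_SOFTWARE;
}
```

Under these assumptions, a platform without an assembly file but with AES-NI should land on the intrinsics path; the bug discussed here is that, for some modes, it did not.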

All that said, I just noticed that CBC actually does not use intrinsics as it should. Mode-specific performance figures on Mac are below, with the second trial disabling AES-NI. Note these are extra slow since it's an ASAN build.

Lastly, I'm not seeing a performance difference after your patch, but maybe I've imported it incorrectly (Mercurial fails to apply it cleanly).

ECB:

kjacobs-44776:cipher kjacobs$ bltest -E -m aes_ecb -i /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_ecb.in -k /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_ecb.key -v /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_ecb.iv -p 10000 -o /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_ecb.out
#     mode          in symmkey  opreps  cxreps     context          op   time(sec)     thrgput
 aes_ecb_e        78Mb     256     10T       0       0.000     342.000       0.342       228Mb
kjacobs-44776:cipher kjacobs$ NSS_DISABLE_HW_AES=1 bltest -E -m aes_ecb -i /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_ecb.in -k /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_ecb.key -v /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_ecb.iv -p 10000 -o /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_ecb.out
#     mode          in symmkey  opreps  cxreps     context          op   time(sec)     thrgput
 aes_ecb_e        78Mb     256     10T       0       0.000    3501.000       3.502        22Mb

GCM:

kjacobs-44776:cipher kjacobs$ bltest -E -m aes_gcm -i /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_gcm.in -k /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_gcm.key -v /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_gcm.iv -p 10000 -o /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_gcm.out
#     mode          in symmkey  opreps  cxreps     context          op   time(sec)     thrgput
 aes_gcm_e        78Mb     256     10T       0       0.000    1325.000       1.325        58Mb
kjacobs-44776:cipher kjacobs$ NSS_DISABLE_HW_AES=1  bltest -E -m aes_gcm -i /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_gcm.in -k /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_gcm.key -v /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_gcm.iv -p 10000 -o /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_gcm.out
#     mode          in symmkey  opreps  cxreps     context          op   time(sec)     thrgput
 aes_gcm_e        78Mb     256     10T       0       0.000    4470.000       4.471        17Mb

CTR:

kjacobs-44776:common kjacobs$ bltest -E -m aes_ctr -i /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_ctr.in -k /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_ctr.key -v /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_ctr.iv -p 10000 -o /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_ctr.out

#     mode          in symmkey  opreps  cxreps     context          op   time(sec)     thrgput
 aes_ctr_e        78Mb     256     10T       0       0.000     846.000       0.846        92Mb
kjacobs-44776:common kjacobs$ NSS_DISABLE_HW_AES=1  bltest -E -m aes_ctr -i /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_ctr.in -k /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_ctr.key -v /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_ctr.iv -p 10000 -o /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_ctr.out
#     mode          in symmkey  opreps  cxreps     context          op   time(sec)     thrgput
 aes_ctr_e        78Mb     256     10T       0       0.000    4117.000       4.117        18Mb

CBC:

kjacobs-44776:common kjacobs$ bltest -E -m aes_cbc -i /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_cbc.in -k /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_cbc.key -v /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_cbc.iv -p 10000 -o /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_cbc.out
#     mode          in symmkey  opreps  cxreps     context          op   time(sec)     thrgput
 aes_cbc_e        78Mb     256     10T       0       0.000    3946.000       3.946        19Mb
kjacobs-44776:common kjacobs$ NSS_DISABLE_HW_AES=1 bltest -E -m aes_cbc -i /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_cbc.in -k /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_cbc.key -v /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_cbc.iv -p 10000 -o /Users/kjacobs/repos/tests_results/security/localhost.118/cipher/aes_cbc.out
#     mode          in symmkey  opreps  cxreps     context          op   time(sec)     thrgput
 aes_cbc_e        78Mb     256     10T       0       0.000    3824.000       3.826        20Mb
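
One way to read the paired runs above is to divide the time with NSS_DISABLE_HW_AES=1 by the default time for the same mode. The C sketch below does that; the function names and the 1.5 cutoff are assumptions chosen for illustration, not part of bltest or NSS. With the figures above, ECB gives roughly 3.502 / 0.342 ≈ 10, while CBC gives roughly 3.826 / 3.946 ≈ 1, which is what suggests CBC encrypt is not using AES-NI.

```c
#include <assert.h>
#include <stdbool.h>

/* Ratio of software-only time to default time for one cipher mode.
 * A value well above 1 means the default run benefited from hardware AES. */
static double hw_speedup(double default_sec, double sw_only_sec)
{
    assert(default_sec > 0.0 && sw_only_sec > 0.0);
    return sw_only_sec / default_sec;
}

/* Hypothetical cutoff: treat a mode as accelerated only if disabling
 * hardware AES slows it down by at least 50%. */
static bool looks_accelerated(double default_sec, double sw_only_sec)
{
    return hw_speedup(default_sec, sw_only_sec) >= 1.5;
}
```

The cutoff is arbitrary; for noisy measurements (for example an ASAN build on a busy machine) repeated runs would be needed before drawing conclusions from a ratio near the threshold.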
Flags: needinfo?(kjacobs.bugzilla)
Duplicate of this bug: 1573672

My patch was against 3.36.0, but I could rebase on the latest version and clean it up to only affect macOS. I wasn't sure whether there would be any interest in this change, so I was holding off on that until there was.

I profiled using Xcode Instruments and found use of "intel_aes_encrypt_cbc_256" only after the patch. Decompiling also shows no use of "intel_aes_encrypt_cbc_256" prior to the patch. That, combined with the bltest throughput with and without the patch, is the basis for my conclusion.

Testing was mainly focused on AES CBC ciphers. After reviewing the code and the referenced blog post, I concluded that this poor performance probably applies to all non-GCM AES ciphers. I believe AES GCM ciphers are using AES-NI and other CPU features, as your tests confirm.

For decrypt, CBC and ECB are not using AES-NI. For encrypt, CBC is not, but ECB is. CTR and GCM use it for both encrypt and decrypt.

I think the right approach here is to build out the intrinsics rather than porting assembly. If you'd like to take this on, I can upload my WIP patch (which fixes ECB); otherwise I'll continue on to CBC over the next week or two.

Thanks again for the report, this will be a nice performance boost!

I agree that building out the intrinsics is the way to go. I'll let you finish your work. Thanks for looking at this, Kevin.

Assignee: nobody → kjacobs.bugzilla
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Priority: -- → P1
Target Milestone: --- → 3.50

AES-NI is currently not used for CBC or ECB decrypt when an assembly implementation (intel-aes.s or intel-aes-x86/64-masm.asm) is not available. Concretely, this is the case on macOS, Linux32, and other non-Linux OSes such as BSD. This patch adds the plumbing to use AES-NI intrinsics when available.

Before:

       mode          in symmkey  opreps  cxreps     context          op   time(sec)     thrgput
  aes_ecb_d        78Mb     256     10T       0       0.000     395.000       0.395       197Mb
  aes_cbc_e        78Mb     256     10T       0       0.000     392.000       0.393       198Mb
  aes_cbc_d        78Mb     256     10T       0       0.000     425.000       0.425       183Mb

After:

      mode          in symmkey  opreps  cxreps     context          op   time(sec)     thrgput
 aes_ecb_d        78Mb     256     10T       0       0.000      39.000       0.039         1Gb
 aes_cbc_e        78Mb     256     10T       0       0.000      94.000       0.094       831Mb
 aes_cbc_d        78Mb     256     10T       0       0.000      74.000       0.075         1Gb

Attachment #9121427 - Attachment description: Bug 1608493 - Use AES-NI intrinsics for CBC, ECB decrypt. → Bug 1608493 - Use AES-NI intrinsics for CBC and ECB decrypt when no assembly implementation is available.
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED