Avoid using the generic CTR_Update function for AES-GCM
Categories
(NSS :: Libraries, enhancement, P3)
Tracking
(Not tracked)
People
(Reporter: jschanck, Assigned: jschanck)
References
(Blocks 2 open bugs)
Details
Attachments
(1 file)
The CTR_Update function allows for generic counter sizes and block sizes. While investigating Bug 1935190, we found that a significant share of our AES-GCM cycles is spent in this generic code on aarch64. We should add a special variant of CTR_Update that is tuned for AES-GCM.
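To illustrate the kind of specialization meant here, the following is a minimal sketch, not the actual NSS code. It contrasts a generic counter increment, which handles any block size and counter width, with one that hard-codes the GCM layout of a 16-byte block whose last 32 bits are a big-endian counter. The function names `ctr_increment_generic` and `gcm_increment32` are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define AES_BLOCK_SIZE 16

/* Hypothetical generic path: increment the low `counter_bits` bits of a
 * big-endian counter block of arbitrary size, one byte at a time. */
static void
ctr_increment_generic(uint8_t *block, size_t block_size, unsigned counter_bits)
{
    assert(counter_bits > 0 && counter_bits <= block_size * 8);
    size_t i = block_size;
    unsigned remaining = counter_bits;
    while (remaining > 0) {
        i--;
        unsigned take = remaining < 8 ? remaining : 8;
        uint8_t mask = (uint8_t)((1u << take) - 1);
        uint8_t low = (uint8_t)((block[i] + 1) & mask);
        /* Keep the non-counter bits of this byte, replace the counter bits. */
        block[i] = (uint8_t)((block[i] & ~mask) | low);
        if (low != 0) {
            return; /* no carry into the next byte */
        }
        remaining -= take;
    }
}

/* Hypothetical GCM-only path: the counter is always the last 32 bits of a
 * 16-byte block, so it can be handled as a single big-endian word. */
static void
gcm_increment32(uint8_t block[AES_BLOCK_SIZE])
{
    uint32_t c = ((uint32_t)block[12] << 24) | ((uint32_t)block[13] << 16) |
                 ((uint32_t)block[14] << 8) | (uint32_t)block[15];
    c++; /* wraps modulo 2^32 */
    block[12] = (uint8_t)(c >> 24);
    block[13] = (uint8_t)(c >> 16);
    block[14] = (uint8_t)(c >> 8);
    block[15] = (uint8_t)c;
}
```

With the sizes fixed at compile time, the per-byte loop and its branches disappear, which is the sort of overhead a tuned variant would presumably avoid.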
Comment 1 (Assignee) • 8 months ago
Comment 2 (Assignee) • 8 months ago
Here are some before and after profiles from a Samsung Galaxy A15. In both profiles I downloaded a 500MB file from a caddy server on my local network.
Before: https://share.firefox.dev/3ZNiGYR
After: https://share.firefox.dev/4gbm2L3
We'll need to collect more data, but it looks like a 30% reduction in time spent in GCM_DecryptAEAD here. I don't see a lot of low-hanging fruit at this point. To go faster we'll probably need to drop to assembly and unroll the main loop to handle a few blocks at a time.
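As a rough picture of what "a few blocks at a time" could look like, here is a sketch in plain C rather than assembly. It is not the NSS implementation: `toy_block` is a made-up stand-in for a single-block AES encryption (it is not a cipher), and `ctr_xor_simple` and `ctr_xor_unrolled4` are hypothetical names. The unrolled version prepares four counter blocks up front so that the four block encryptions are independent of each other.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 16

/* Hypothetical single-block encryption callback. */
typedef void (*block_fn)(const uint8_t in[BLOCK], uint8_t out[BLOCK]);

/* Toy stand-in for AES, for illustration only. NOT a real cipher. */
static void toy_block(const uint8_t in[BLOCK], uint8_t out[BLOCK])
{
    for (int i = 0; i < BLOCK; i++) {
        out[i] = (uint8_t)(in[i] * 31 + i);
    }
}

static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static void store_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Reference loop: one counter block and one encryption per iteration. */
static void ctr_xor_simple(block_fn enc, uint8_t ctr[BLOCK],
                           const uint8_t *in, uint8_t *out, size_t nblocks)
{
    assert(enc != NULL);
    uint8_t ks[BLOCK];
    uint32_t c = load_be32(ctr + 12);
    for (size_t b = 0; b < nblocks; b++) {
        store_be32(ctr + 12, c++);
        enc(ctr, ks);
        for (int i = 0; i < BLOCK; i++) {
            out[b * BLOCK + i] = in[b * BLOCK + i] ^ ks[i];
        }
    }
    store_be32(ctr + 12, c); /* leave the next unused counter value */
}

/* Unrolled loop: four independent counter blocks per iteration. */
static void ctr_xor_unrolled4(block_fn enc, uint8_t ctr[BLOCK],
                              const uint8_t *in, uint8_t *out, size_t nblocks)
{
    assert(enc != NULL);
    uint8_t cb[4][BLOCK], ks[4][BLOCK];
    uint32_t c = load_be32(ctr + 12);
    size_t b = 0;
    for (; b + 4 <= nblocks; b += 4) {
        for (int j = 0; j < 4; j++) {
            memcpy(cb[j], ctr, 12);                  /* fixed 96-bit prefix */
            store_be32(cb[j] + 12, c + (uint32_t)j); /* wraps modulo 2^32 */
        }
        enc(cb[0], ks[0]);
        enc(cb[1], ks[1]);
        enc(cb[2], ks[2]);
        enc(cb[3], ks[3]);
        for (int j = 0; j < 4; j++) {
            for (int i = 0; i < BLOCK; i++) {
                out[(b + j) * BLOCK + i] = in[(b + j) * BLOCK + i] ^ ks[j][i];
            }
        }
        c += 4;
    }
    store_be32(ctr + 12, c);
    /* Handle the remaining 0 to 3 blocks with the simple loop. */
    ctr_xor_simple(enc, ctr, in + b * BLOCK, out + b * BLOCK, nblocks - b);
}
```

Both functions should produce identical output and leave the counter in the same state. Any speedup from the unrolled form would come from the hardware overlapping the independent block encryptions, and how much that helps on a given aarch64 core is something only profiling can show.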
Comment 3 • 7 months ago
Thanks - any word on when this might land? Sounds like if it reduces CTR by 30%, we'll drop from roughly 22% in the original report to maybe 15ish percent. Definitely useful.
How much would you guess it could be improved by ASM, and how hard is that to do? Are there any pragmas or similar tweaks that could affect the codegen without resorting to ASM?
Comment 4 (Assignee) • 7 months ago
This will land in NSS and get uplifted to M-C during the 136 cycle.
I'm not sure how much of an improvement we'd get with a full ASM implementation. We have an ASM implementation for x64 that would be a good baseline for comparison. Ideally we'd take an off-the-shelf implementation from, say, HACL*, and it would just require some review work.
Comment 5 (Assignee) • 7 months ago