Closed
Bug 600106
Opened 14 years ago
Closed 14 years ago
Use "REV" instruction or __builtin_bswap32 for byteswap code for ARM
Categories
(NSS :: Libraries, defect, P3)
Tracking
(Not tracked)
RESOLVED
FIXED
3.13
People
(Reporter: m_kato, Assigned: m_kato)
Details
Attachments
(2 files)
3.40 KB,
patch
|
nelson
:
review+
|
Details | Diff | Splinter Review |
6.67 KB,
patch
|
Details | Diff | Splinter Review |
From gcc 4.5.1, ARM gcc can inlined byteswap using "REV" by __builtin_bswap32. We should use it or inline assember for DES, SHA1 and SHA512.
REV instruction is supported by thumb2 and ARMv6
Assignee | ||
Comment 1•14 years ago
|
||
Assignee | ||
Comment 2•14 years ago
|
||
Test environment is Freescale i.MX515 (Cortext A8) with -mcpu=cortext-a8
SHA1 -> 10% improve
BEFORE
======
# mode in opreps cxreps context op time(sec) thrgput
sha1_e 154Mb 5M 0 0.000 10000.000 10.001 15Mb
AFTER
=====
# mode in opreps cxreps context op time(sec) thrgput
sha1_e 161Mb 5M 0 0.000 10000.000 10.000 16Mb
SHA256 -> 3% improve
BEFORE
======
# mode in opreps cxreps context op time(sec) thrgput
sha256_e 8Mb 3M 0 0.000 10000.000 10.000 881Kb
AFTER
=====
# mode in opreps cxreps context op time(sec) thrgput
sha256_e 8Mb 3M 0 0.000 10000.000 10.001 907Kb
DES CBC Encryption -> 2% improve
BEFORE
======
# mode in symmkey opreps cxreps context op time(sec) thrgput
des_cbc_e 88Mb 64 11M 0 0.000 10000.000 10.001 8Mb
AFTER
=====
# mode in symmkey opreps cxreps context op time(sec) thrgput
des_cbc_e 89Mb 64 11M 0 0.000 10000.000 10.000 8Mb
Comment 3•14 years ago
|
||
Makoto, a question for you.
Your patch adds this compound ifdef test in two places:
+#elif defined(__GNUC__) && (defined(__thumb2__) || \
+ (!defined(__thumb__) && \
+ (defined(__ARM_ARCH_6__) || \
+ defined(__ARM_ARCH_6J__) || \
+ defined(__ARM_ARCH_6K__) || \
+ defined(__ARM_ARCH_6Z__) || \
+ defined(__ARM_ARCH_6ZK__) || \
+ defined(__ARM_ARCH_6T2__) || \
+ defined(__ARM_ARCH_7__) || \
+ defined(__ARM_ARCH_7A__) || \
+ defined(__ARM_ARCH_7R__))))
Does that set of feature test macros include the ARM7TDMI CPU?
I ask because, AFAIK, the ARM7TDMI CPU does NOT feature the REV instruction.
Assignee: nobody → m_kato
Assignee | ||
Comment 4•14 years ago
|
||
(In reply to comment #3)
> Does that set of feature test macros include the ARM7TDMI CPU?
> I ask because, AFAIK, the ARM7TDMI CPU does NOT feature the REV instruction.
No, arm7tdmi is __ARM_ARCH_4T__. you know, REV is for ARMv6 later or thumb2.
Assignee | ||
Comment 5•14 years ago
|
||
Also, if gcc 4.5.1 or later, we can use __builtin_bswap32 for this usage.
Although this builtin function is supported from gcc 4.3, gcc of ARM version doesn't use REV until gcc 4.5.0.
Assignee | ||
Updated•14 years ago
|
Attachment #479037 -
Flags: review?(nelson)
Comment 6•14 years ago
|
||
Comment on attachment 479037 [details] [diff] [review]
patch
m_kato: thank you for the patch.
Can you come up with a solution that does not require
repeating the following long condition expression in three
files?
>+#elif defined(__GNUC__) && (defined(__thumb2__) || \
>+ (!defined(__thumb__) && \
>+ (defined(__ARM_ARCH_6__) || \
>+ defined(__ARM_ARCH_6J__) || \
>+ defined(__ARM_ARCH_6K__) || \
>+ defined(__ARM_ARCH_6Z__) || \
>+ defined(__ARM_ARCH_6ZK__) || \
>+ defined(__ARM_ARCH_6T2__) || \
>+ defined(__ARM_ARCH_7__) || \
>+ defined(__ARM_ARCH_7A__) || \
>+ defined(__ARM_ARCH_7R__))))
Perhaps that condition can be tested in a header
file (such as blapi.h) included by these three
files, or you can test the equivalent condition
in lib/freebl/Makefile or lib/freebl/config.mk.
Assignee | ||
Comment 7•14 years ago
|
||
(In reply to comment #6)
> Comment on attachment 479037 [details] [diff] [review]
> patch
>
> m_kato: thank you for the patch.
>
> Can you come up with a solution that does not require
> repeating the following long condition expression in three
> files?
No easy way that gcc mcpu flag is ARMv6 or later. But I should unify define such as __ARM_ARCH__ (ex. #if __ARM_ARCH__ >= 6) or ARM_REV_SUPPORTED.
Updated•14 years ago
|
Attachment #479037 -
Flags: review?(nelson) → review+
Comment 8•14 years ago
|
||
Bug 600106: Use "REV" instruction or __builtin_bswap32 for byteswap code for ARM
Patch contributed by Makoto Kato <m_kato@ga2.so-net.ne.jp>, r=nelson
Checking in des.c; new revision: 1.9; previous revision: 1.8
Checking in sha512.c; new revision: 1.17; previous revision: 1.16
Checking in sha_fast.h; new revision: 1.9; previous revision: 1.8
Status: NEW → RESOLVED
Closed: 14 years ago
Priority: -- → P3
Resolution: --- → FIXED
Target Milestone: --- → 3.13
Comment 9•14 years ago
|
||
Comment 10•14 years ago
|
||
Wan-Teh, Yes, I know. But in this case, I think the value of the improvement
outweighs the "harm" of the duplication. It can always be improved later.
Comment 11•14 years ago
|
||
This patch is for reference only. Do not check it in.
I found that lib/freebl doesn't have a header suitable
for defining internal configuration macros. The closest
thing is blapii.h. While working on this patch, I
realized that the best solution is to move the duplicated
SHA_HTONL and BYTESWAP macro definitions to a header
file.
You need to log in
before you can comment on or make changes to this bug.
Description
•