Closed Bug 600106 Opened 14 years ago Closed 14 years ago

Use "REV" instruction or __builtin_bswap32 for byteswap code for ARM

Categories

(NSS :: Libraries, defect, P3)

ARM
All
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: m_kato, Assigned: m_kato)

Details

Attachments

(2 files)

From gcc 4.5.1, ARM gcc can inlined byteswap using "REV" by __builtin_bswap32. We should use it or inline assember for DES, SHA1 and SHA512. REV instruction is supported by thumb2 and ARMv6
Attached patch patchSplinter Review
Test environment is Freescale i.MX515 (Cortext A8) with -mcpu=cortext-a8 SHA1 -> 10% improve BEFORE ====== # mode in opreps cxreps context op time(sec) thrgput sha1_e 154Mb 5M 0 0.000 10000.000 10.001 15Mb AFTER ===== # mode in opreps cxreps context op time(sec) thrgput sha1_e 161Mb 5M 0 0.000 10000.000 10.000 16Mb SHA256 -> 3% improve BEFORE ====== # mode in opreps cxreps context op time(sec) thrgput sha256_e 8Mb 3M 0 0.000 10000.000 10.000 881Kb AFTER ===== # mode in opreps cxreps context op time(sec) thrgput sha256_e 8Mb 3M 0 0.000 10000.000 10.001 907Kb DES CBC Encryption -> 2% improve BEFORE ====== # mode in symmkey opreps cxreps context op time(sec) thrgput des_cbc_e 88Mb 64 11M 0 0.000 10000.000 10.001 8Mb AFTER ===== # mode in symmkey opreps cxreps context op time(sec) thrgput des_cbc_e 89Mb 64 11M 0 0.000 10000.000 10.000 8Mb
Makoto, a question for you. Your patch adds this compound ifdef test in two places: +#elif defined(__GNUC__) && (defined(__thumb2__) || \ + (!defined(__thumb__) && \ + (defined(__ARM_ARCH_6__) || \ + defined(__ARM_ARCH_6J__) || \ + defined(__ARM_ARCH_6K__) || \ + defined(__ARM_ARCH_6Z__) || \ + defined(__ARM_ARCH_6ZK__) || \ + defined(__ARM_ARCH_6T2__) || \ + defined(__ARM_ARCH_7__) || \ + defined(__ARM_ARCH_7A__) || \ + defined(__ARM_ARCH_7R__)))) Does that set of feature test macros include the ARM7TDMI CPU? I ask because, AFAIK, the ARM7TDMI CPU does NOT feature the REV instruction.
Assignee: nobody → m_kato
(In reply to comment #3) > Does that set of feature test macros include the ARM7TDMI CPU? > I ask because, AFAIK, the ARM7TDMI CPU does NOT feature the REV instruction. No, arm7tdmi is __ARM_ARCH_4T__. you know, REV is for ARMv6 later or thumb2.
Also, if gcc 4.5.1 or later, we can use __builtin_bswap32 for this usage. Although this builtin function is supported from gcc 4.3, gcc of ARM version doesn't use REV until gcc 4.5.0.
Attachment #479037 - Flags: review?(nelson)
Comment on attachment 479037 [details] [diff] [review] patch m_kato: thank you for the patch. Can you come up with a solution that does not require repeating the following long condition expression in three files? >+#elif defined(__GNUC__) && (defined(__thumb2__) || \ >+ (!defined(__thumb__) && \ >+ (defined(__ARM_ARCH_6__) || \ >+ defined(__ARM_ARCH_6J__) || \ >+ defined(__ARM_ARCH_6K__) || \ >+ defined(__ARM_ARCH_6Z__) || \ >+ defined(__ARM_ARCH_6ZK__) || \ >+ defined(__ARM_ARCH_6T2__) || \ >+ defined(__ARM_ARCH_7__) || \ >+ defined(__ARM_ARCH_7A__) || \ >+ defined(__ARM_ARCH_7R__)))) Perhaps that condition can be tested in a header file (such as blapi.h) included by these three files, or you can test the equivalent condition in lib/freebl/Makefile or lib/freebl/config.mk.
(In reply to comment #6) > Comment on attachment 479037 [details] [diff] [review] > patch > > m_kato: thank you for the patch. > > Can you come up with a solution that does not require > repeating the following long condition expression in three > files? No easy way that gcc mcpu flag is ARMv6 or later. But I should unify define such as __ARM_ARCH__ (ex. #if __ARM_ARCH__ >= 6) or ARM_REV_SUPPORTED.
Attachment #479037 - Flags: review?(nelson) → review+
Bug 600106: Use "REV" instruction or __builtin_bswap32 for byteswap code for ARM Patch contributed by Makoto Kato <m_kato@ga2.so-net.ne.jp>, r=nelson Checking in des.c; new revision: 1.9; previous revision: 1.8 Checking in sha512.c; new revision: 1.17; previous revision: 1.16 Checking in sha_fast.h; new revision: 1.9; previous revision: 1.8
Status: NEW → RESOLVED
Closed: 14 years ago
Priority: -- → P3
Resolution: --- → FIXED
Target Milestone: --- → 3.13
Nelson: Makoto hasn't made my suggested change. (See comment 6 and comment 7.) It is bad to duplicated a complicated ifdef in three files.
Wan-Teh, Yes, I know. But in this case, I think the value of the improvement outweighs the "harm" of the duplication. It can always be improved later.
This patch is for reference only. Do not check it in. I found that lib/freebl doesn't have a header suitable for defining internal configuration macros. The closest thing is blapii.h. While working on this patch, I realized that the best solution is to move the duplicated SHA_HTONL and BYTESWAP macro definitions to a header file.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: