Use "REV" instruction or __builtin_bswap32 for byteswap code for ARM

RESOLVED FIXED in 3.13

Status

P3
normal
RESOLVED FIXED
8 years ago
8 years ago

People

(Reporter: m_kato, Assigned: m_kato)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments)

(Assignee)

Description

8 years ago
From gcc 4.5.1, ARM gcc can inlined byteswap using "REV" by __builtin_bswap32.  We should use it or inline assember for DES, SHA1 and SHA512.

REV instruction is supported by thumb2 and ARMv6
(Assignee)

Comment 2

8 years ago
Test environment is Freescale i.MX515 (Cortext A8) with -mcpu=cortext-a8

SHA1 -> 10% improve

BEFORE
======
#     mode          in  opreps  cxreps     context          op   time(sec)     thrgput
    sha1_e       154Mb      5M       0       0.000   10000.000      10.001        15Mb

AFTER
=====
#     mode          in  opreps  cxreps     context          op   time(sec)     thrgput
    sha1_e       161Mb      5M       0       0.000   10000.000      10.000        16Mb

SHA256 -> 3% improve

BEFORE
======
#     mode          in  opreps  cxreps     context          op   time(sec)     thrgput
  sha256_e         8Mb      3M       0       0.000   10000.000      10.000       881Kb

AFTER
=====
#     mode          in  opreps  cxreps     context          op   time(sec)     thrgput
  sha256_e         8Mb      3M       0       0.000   10000.000      10.001       907Kb


DES CBC Encryption -> 2% improve

BEFORE
======
#     mode          in symmkey  opreps  cxreps     context          op   time(sec)     thrgput
 des_cbc_e        88Mb      64     11M       0       0.000   10000.000      10.001         8Mb

AFTER
=====
#     mode          in symmkey  opreps  cxreps     context          op   time(sec)     thrgput
 des_cbc_e        89Mb      64     11M       0       0.000   10000.000      10.000         8Mb
Makoto, a question for you. 
Your patch adds this compound ifdef test in two places:

+#elif defined(__GNUC__) && (defined(__thumb2__) || \
+      (!defined(__thumb__) && \
+      (defined(__ARM_ARCH_6__) || \
+       defined(__ARM_ARCH_6J__) || \
+       defined(__ARM_ARCH_6K__) || \
+       defined(__ARM_ARCH_6Z__) || \
+       defined(__ARM_ARCH_6ZK__) || \
+       defined(__ARM_ARCH_6T2__) || \
+       defined(__ARM_ARCH_7__) || \
+       defined(__ARM_ARCH_7A__) || \
+       defined(__ARM_ARCH_7R__))))

Does that set of feature test macros include the ARM7TDMI CPU?
I ask because, AFAIK, the ARM7TDMI CPU does NOT feature the REV instruction.
Assignee: nobody → m_kato
(Assignee)

Comment 4

8 years ago
(In reply to comment #3)
> Does that set of feature test macros include the ARM7TDMI CPU?
> I ask because, AFAIK, the ARM7TDMI CPU does NOT feature the REV instruction.

No, arm7tdmi is __ARM_ARCH_4T__.  you know, REV is for ARMv6 later or thumb2.
(Assignee)

Comment 5

8 years ago
Also, if gcc 4.5.1 or later, we can use __builtin_bswap32 for this usage.

Although this builtin function is supported from gcc 4.3, gcc of ARM version doesn't use REV until gcc 4.5.0.
(Assignee)

Updated

8 years ago
Attachment #479037 - Flags: review?(nelson)

Comment 6

8 years ago
Comment on attachment 479037 [details] [diff] [review]
patch

m_kato: thank you for the patch.

Can you come up with a solution that does not require
repeating the following long condition expression in three
files?

>+#elif defined(__GNUC__) && (defined(__thumb2__) || \
>+      (!defined(__thumb__) && \
>+      (defined(__ARM_ARCH_6__) || \
>+       defined(__ARM_ARCH_6J__) || \
>+       defined(__ARM_ARCH_6K__) || \
>+       defined(__ARM_ARCH_6Z__) || \
>+       defined(__ARM_ARCH_6ZK__) || \
>+       defined(__ARM_ARCH_6T2__) || \
>+       defined(__ARM_ARCH_7__) || \
>+       defined(__ARM_ARCH_7A__) || \
>+       defined(__ARM_ARCH_7R__))))

Perhaps that condition can be tested in a header
file (such as blapi.h) included by these three
files, or you can test the equivalent condition
in lib/freebl/Makefile or lib/freebl/config.mk.
(Assignee)

Comment 7

8 years ago
(In reply to comment #6)
> Comment on attachment 479037 [details] [diff] [review]
> patch
> 
> m_kato: thank you for the patch.
> 
> Can you come up with a solution that does not require
> repeating the following long condition expression in three
> files?

No easy way that gcc mcpu flag is ARMv6 or later.  But I should unify define such as __ARM_ARCH__ (ex. #if __ARM_ARCH__ >= 6) or ARM_REV_SUPPORTED.
Attachment #479037 - Flags: review?(nelson) → review+
Bug 600106: Use "REV" instruction or __builtin_bswap32 for byteswap code for ARM
Patch contributed by Makoto Kato <m_kato@ga2.so-net.ne.jp>, r=nelson

Checking in des.c;      new revision: 1.9; previous revision: 1.8
Checking in sha512.c;   new revision: 1.17; previous revision: 1.16
Checking in sha_fast.h; new revision: 1.9; previous revision: 1.8
Status: NEW → RESOLVED
Last Resolved: 8 years ago
Priority: -- → P3
Resolution: --- → FIXED
Target Milestone: --- → 3.13

Comment 9

8 years ago
Nelson: Makoto hasn't made my suggested change.  (See
comment 6 and comment 7.)  It is bad to duplicated a
complicated ifdef in three files.
Wan-Teh, Yes, I know.  But in this case, I think the value of the improvement
outweighs the "harm" of the duplication.  It can always be improved later.

Comment 11

8 years ago
Created attachment 510937 [details] [diff] [review]
define NSS_ARM_REV_SUPPORTED in blapii.h (for reference only)

This patch is for reference only.  Do not check it in.

I found that lib/freebl doesn't have a header suitable
for defining internal configuration macros.  The closest
thing is blapii.h.  While working on this patch, I
realized that the best solution is to move the duplicated
SHA_HTONL and BYTESWAP macro definitions to a header
file.
You need to log in before you can comment on or make changes to this bug.