Closed Bug 572085 Opened 9 years ago Closed 9 years ago

ARM optimization for MPI

Categories

(NSS :: Libraries, defect, P3)

ARM
Linux
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: m_kato, Assigned: m_kato)

Details

Attachments

(2 files, 3 obsolete files)

Attached patch WIP (obsolete) — Splinter Review
Now although I test on Cortex-A8 600MHz Linux box, We can improve crypt performance using assemble.

Original C implementation
=========================
# ./rsaperf -n none -p 20

Using freebl with 1024 bits key.
Using hardcoded 1024 bits key.
843 iterations in 20.017 seconds
42.11 operations/s .
one operation every 23745 microseconds


Use assembler (this patch)
==========================

# ./rsaperf -n none -p 20
Using freebl with 1024 bits key.
Using hardcoded 1024 bits key.
1391 iterations in 20.003 seconds
69.54 operations/s .
one operation every 14380 microseconds
Which of the 49 flavors of ARM cpu does this work with?
Does with work with (say) an ARM7-Txxx CPU?
(In reply to comment #1)
> Which of the 49 flavors of ARM cpu does this work with?
> Does with work with (say) an ARM7-Txxx CPU?

If replacing .arch with armv4t, I believe that this will work on armv4t or later. (I don't have ARMv4t box).
For interworking, I will update a patch to use bx lr.
Attached patch patch v1 (obsolete) — Splinter Review
Attachment #451243 - Attachment is obsolete: true
In January of this year, I started working professionally with ARM CPUs for 
the first time.  I immediately learned that ARM is a family of CPUs with 
capabilities that vary immensely.   Some ARM family CPUs don't even have a 
hardware multiply instruction and do multiplication via a shift and add 
subroutine.  Others have advanced ALUs with SIMD capabilities.  There are at least 9 separate major sets of ARM capabilities.

We have just one arch named "arm".  It's not clear to me which of the 9 (or 
more) sets are covered by our arm code.  It IS clear to me that we should 
do either or both of:

1) Make our ARM code conditionally compiled for the particular flavor of 
ARM CPU on which the resultant code is expected to run.

2) Choose one (or a small number) of those ARM flavors, declare that our ARM 
code is only for that target set, and PROMINENTLY document this fact.
(In reply to comment #4)
> In January of this year, I started working professionally with ARM CPUs for 
> the first time.  I immediately learned that ARM is a family of CPUs with 
> capabilities that vary immensely.   Some ARM family CPUs don't even have a 
> hardware multiply instruction and do multiplication via a shift and add 
> subroutine.  Others have advanced ALUs with SIMD capabilities.  There are at
> least 9 separate major sets of ARM capabilities.
>
> We have just one arch named "arm".  It's not clear to me which of the 9 (or 
> more) sets are covered by our arm code.  It IS clear to me that we should 
> do either or both of:

ARM is various.  This code is for ARMv4 (ARM7) and later (also, this code supports interworking, so .arch should be armv4t).  But it doesn't support thumb mode.  It mean that this doesn't work on Cortex-Mx series (This series only supports Thumb/Thumb-2).  SIMD is from ARMv6 (ARM11) and later and NEON (like SSE2 on x86) is from ARMv7 (well known as Cortex-A8 or Qualcomm's Scorpion).

ARMv4 code is generic of ARM ISA 32bit except to thumb.
 
> 1) Make our ARM code conditionally compiled for the particular flavor of 
> ARM CPU on which the resultant code is expected to run.
> 
> 2) Choose one (or a small number) of those ARM flavors, declare that our ARM 
> code is only for that target set, and PROMINENTLY document this fact.

I think choose 2.  And users should choose using ASM or not, when compiling code.  Because ARM has no good cpu detection opcode like cpuid on x86.

This is an example of Makefile.  If NSS_ENABLE_ARMV4_ASM is 1 on Makefile, it uses ARMv4 assembler optimization.

ifdef NSS_ENABLE_ARMV4_ASM
    # Although assembler code for ARM supports thumb interworking, this doesn't support
    # thumb/thumb-2 only CPU such as Cortex-M series.
    # if you want to enable ARMv4 optimization, define NSS_ENABLE_ARMV4_ASM on make.
    ifeq ($(CPU_ARCH),arm)
        ASFILES  = mpi_armv4.s
        DEFINES += -DMP_ASSEMBLY_MULTIPLY -DMP_ASSEMBLY_SQUARE
    endif
endif
Attached patch v2 (obsolete) — Splinter Review
Attachment #451538 - Attachment is obsolete: true
Attachment #461979 - Flags: review?(nelson)
Comment on attachment 461979 [details] [diff] [review]
v2

cancel review.

I am working new patch to support arm v4 (or later) and thumb-2 platform.  On other platforms (16-bit thumb and arm v3), it will be turned off by compiler macro automatically.
Attachment #461979 - Flags: review?(nelson)
Attached patch fix v3Splinter Review
Attachment #461979 - Attachment is obsolete: true
Attachment #495447 - Flags: review?(nelson)
Comment on attachment 495447 [details] [diff] [review]
fix v3

This patch appears to be low risk for all non-ARM platforms.  r=nelson
Attachment #495447 - Flags: review?(nelson) → review+
Bug 572085: ARM optimization for MPI  
Patch contributed by Makoto Kato <m_kato@ga2.so-net.ne.jp>, r=nelson

Checking in Makefile;       new revision: 1.117; previous revision: 1.116
Checking in mpi/Makefile;   new revision: 1.27; previous revision: 1.26
Checking in mpi/mpi.c;      new revision: 1.49; previous revision: 1.48
Checking in mpi/mpi_arm.c;  initial revision: 1.1
Checking in mpi/target.mk;  new revision: 1.7; previous revision: 1.6
Status: NEW → RESOLVED
Closed: 9 years ago
Priority: -- → P3
Resolution: --- → FIXED
Target Milestone: --- → 3.13
Makoto: in your fix v3, you commented out two lines in mpi/target.mk.
I guess that was not intentional.  This patch uncomments those two lines.
Attachment #528238 - Flags: review?(m_kato)
Comment on attachment 528238 [details] [diff] [review]
Uncomment two lines in mpi/target.mk in fix v3

Thanks.  I forget removing it.
Attachment #528238 - Flags: review?(m_kato) → review+
Comment on attachment 528238 [details] [diff] [review]
Uncomment two lines in mpi/target.mk in fix v3

Patch checked in on the NSS trunk (NSS 3.13).

Checking in target.mk;
/cvsroot/mozilla/security/nss/lib/freebl/mpi/target.mk,v  <--  target.mk
new revision: 1.8; previous revision: 1.7
done
You need to log in before you can comment on or make changes to this bug.