Closed Bug 333601 Opened 18 years ago Closed 12 years ago

Use the x86 assembly code in mpi on Intel Macs

Categories

(NSS :: Libraries, defect, P2)

3.11
x86
macOS
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: wtc, Assigned: mark)

Details

Attachments

(3 files, 3 obsolete files)

The mpi library in NSS (mozilla/security/nss/lib/freebl/mpi)
has some x86 assembly code.  It is written in several formats
for various assemblers:
  mpi_x86.s  (GNU as, or gas)
  mpi_i86pc.s (Solaris x86 assembler)
  mpi_x86.asm (Microsoft MASM, or ml.exe)
  mpi_x86_asm.c (MSVC inline assembly code)

We should be able to use this assembly code on Intel Macs.
Hopefully mpi_x86.s (the gas version) can be used directly.
If the non-executable stack directives at the end of that file
cause problem on Intel Macs:

   # Magic indicating no need for an executable stack
  .section .note.GNU-stack, "", @progbits
  .previous

let us know.  We recently learned that we can use the
gas command-line option --noexecstack to accomplish the
same thing, so we can remove those directives from mpi_x86.s
if necessary.
(https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=180726#c17)

Note that the x86 assembly code in mpi_x86.s probably only
requires Pentium.  If all the Intel Mac processors support
certain instruction set extensions (e.g., MMX, SSE2), we can
take advantage of that, too.

To use assembly code on Intel Macs, you need to modify
mozilla/security/nss/lib/freebl/Makefile.  By searching for
mpi_x86.s in that file, you can figure out how to make the
necessary changes for Intel Macs.
The gas version can't be used directly for the following reasons:

1. .type @function does not work

2. numeric local labels must be in the range 0..9

3. .section @progbits does not work

If you want to use the existing mpi_x86.s:

#2 can be solved by either moving to local labels of the form Ln and eliminating the hints, or by renumbering the local labels to keep them in range.  Renumbering should be safe on all platforms because of the use of the 'f' and 'b' hints.

#1 can be solved by introducing in-asm #ifdefs, as long as the assembler is always called by the compiler driver after running the preprocessor.  The same can be done for #3, as an alternative to --noexecstack.

Any x86 Mac ever sold is guaranteed to support MMX, SSE, and SSE2.  SSE3 is available in all shipping CPUs, but Apple hasn't committed to it as they have to MMX, SSE, and SSE2.
Mark, You're invited to attach a patch to this bug to solve this for X86 macs.
:)
Attached patch Patch (obsolete) — Splinter Review
This implements the suggestions from comment 1.  It also adds leading underscores to symbol names, because Darwin as does not do this automatically.

I also made changes to reduce unnecessary allocation of stack space.  These changes can be brought over to the other x86 mpi implementations.  (Aside: I found it odd that div_2dx1d omits the frame pointer but the other functions don't.)

mp_digit was defined as unsigned long long (64-bit) but this asm code assumes that mp_digits are 32 bits wide, so MP_USE_UINT_DIGIT is defined to force mp_digit to 32 bits.

The patch also includes changes to the mpi tests to allow them to build (mpcpucache.o was missing) and run without requiring . to be in the $PATH.
Attachment #218305 - Flags: review?(nelsonbhchan)
Attachment #218305 - Flags: review?(nelsonbhchan) → review?(nelson)
Mark, thanks a lot for the patch.

In bug 326482 we have a patch for mpi_x86.s to allow it
to decide at run time if it can use SSE2 instructions.
The patch is titled "Use SSE2 multiply instructions on
intel processors." (attachment 215739 [details] [diff] [review])  So you can also
create a Mac-specific version of mpi_x86.s that just
uses the SSE2 assembly code in that patch.
Attached file raw mpi_sse2 code.
Mark, here is the original SSE2 code I did for intel processors if you don't want to split out the SSE2 code from the current mpi_x86.s code. It doesn't have your Mac specific changes.

bob
Attachment #218349 - Attachment is patch: false
Attached patch Patch, raw sse2 (obsolete) — Splinter Review
Thanks, Bob!  I took your raw sse2 version but kept the Darwin #ifdefs in case there are other platforms want to take advantage of the raw sse2 code.

It's an incredible pain to run the mpi tests with the asm implementations.  It looks like the Makefile in mpi has fallen out of a good state of repair since everything is built from freebl.  Is there interest in fixing that Makefile?  Would it be a better idea to move the mpi tests into the rest of the NSS test suite?
Attachment #218372 - Flags: review?
Attachment #218372 - Flags: review? → review?(nelson)
There are two files attached to this bug, both awaiting my review..
They seem to overlap each other, significantly.
Is one intended to replace the other?

Bob,  Your attachment 215739 [details] [diff] [review] to bug 326482 adds SSE2 code to mpi.  
It already has two reviews, but is not yet checked in.
Attachment 218372 [details] [diff] above seems to be essentially a duplicate of attachment 215739 [details] [diff] [review]. Was it your intent to duplicate that file for the MAC x86 platform?

I'd certainly rather that we have only ONE source file containing that
source code, perhaps ifdef'ed for different platforms if necessary.
I'd rather not have multiple copies of the file checked in with very
different names for slightly-different platforms.  
Comment on attachment 218305 [details] [diff] [review]
Patch

Nelson, please review the SSE2 one unless someone has a good reason to not check in an SSE2-only implementation.

In fact, it's possible, with a few more #ifdefs, to have a single asm source file cover many unixish platforms in any of three configurations:

 - never SSE2
 - dynamic SSE2 detection at runtime
 - always SSE2

We might even be able to roll the i86pc implementation, currently used on Solaris, into the same file.  Is that the way you'd like to proceed?
Attachment #218305 - Flags: review?(nelson)
Priority: -- → P3
(In reply to comment #9)

> In fact, it's possible, with a few more #ifdefs, to have a single asm source
> file cover many unixish platforms in any of three configurations:
> 
>  - never SSE2
>  - dynamic SSE2 detection at runtime
>  - always SSE2
> 
> We might even be able to roll the i86pc implementation, currently used on
> Solaris, into the same file.  Is that the way you'd like to proceed?

Yes.  Note that SSE2 is often a win on Intel CPUs, and seldom (er, much
less often) on AMD.  That's why we need to continue to support non-SSE2.
A one-source-fits-all solution would be potentially best, assuming it's 
not so ifdeffed that it's more difficult to understand than separate source
files.  
Comment on attachment 218372 [details] [diff] [review]
Patch, raw sse2

Oh, my previous comments were intended to convey r-.  Sorry.
Attachment #218372 - Flags: review?(nelson) → review-
Comment on attachment 218372 [details] [diff] [review]
Patch, raw sse2

>+.globl s_mpv_mul_d
>+TYPE_FUNCTION(s_mpv_mul_d)
>+s_mpv_mul_d:

For each of these functions, we also need to add
the .private_extern directive so that the symbols
do not become global.  See bug 568600.

Is .private_extern an Apple extension of GCC?
I just discovered that NSS is still not using x86 assembly code
on Mac OS X.

I updated Mark's "Patch, raw sse2" (attachment 218372 [details] [diff] [review])
to the current NSS trunk.  This patch will be used as a baseline
for further changes.
Attachment #218305 - Attachment is obsolete: true
Attachment #218372 - Attachment is obsolete: true
Mark, could you take a look at this before I ask Bob for a review?

I experimented with some flags.
1. -DMP_NO_MP_WORD hurts performance, especially if no assembly code
   is used.
2. -DMP_CHAR_STORE_SLOW -DMP_IS_LITTLE_ENDIAN makes little difference,
   seems slightly slower (roughly 1/740).

I made the following two changes to mpi_sse2.s:
1. I added the .private_extern directives to the functions so that
   they are not exported from libfreebl4.dylib.
2. I compared with the current SSE2 code in mpi_x86.s and modified
   mpi_sse2.s accordingly.
Attachment #650635 - Attachment is obsolete: true
Attachment #650651 - Flags: review?(mark)
For some reason, patch interdiffs don't work.  Here are the differences
between the mpi_sse2.s files in the two patches.
Attachment #650654 - Flags: review?(mark)
Comment on attachment 650651 [details] [diff] [review]
Patch, raw sse2, v2 (checked in)

I have compared mpi_sse2.s with the SSE2 assembly code in mpi_x86.s
carefully, and tested this patch.  I am now confident to ask Bob to
review it.

I tried to make mpi_x86.s compilable on Mac OS X, but it is nontrivial.
I suggest that we not let that block the use of x86 assembly code on
Mac OS X.

Here are the RSA performance numbers returned by the
    rsaperf -n none -p 20
command.

Before this patch:

Using freebl with 1024 bits key.
Using hardcoded 1024 bits key.
6997 iterations in 20.001 seconds
349.82 operations/s .
one operation every 2858 microseconds

After this patch:

Using freebl with 1024 bits key.
Using hardcoded 1024 bits key.
14876 iterations in 20 seconds
743.79 operations/s .
one operation every 1344 microseconds

So this patch doubles the RSA performance.
Attachment #650651 - Flags: superreview?(rrelyea)
Attachment #650651 - Flags: review?(mark) → review+
Attachment #650654 - Flags: review?(mark)
Comment on attachment 650651 [details] [diff] [review]
Patch, raw sse2, v2 (checked in)

Patch checked in on the NSS trunk (NSS 3.14).

Checking in mpi/all-tests;
/cvsroot/mozilla/security/nss/lib/freebl/mpi/all-tests,v  <--  all-tests
new revision: 1.5; previous revision: 1.4
done

Checking in Makefile;
/cvsroot/mozilla/security/nss/lib/freebl/Makefile,v  <--  Makefile
new revision: 1.122; previous revision: 1.121
done
RCS file: /cvsroot/mozilla/security/nss/lib/freebl/mpi/mpi_sse2.s,v
done
Checking in mpi/mpi_sse2.s;
/cvsroot/mozilla/security/nss/lib/freebl/mpi/mpi_sse2.s,v  <--  mpi_sse2.s
initial revision: 1.1
done
Attachment #650651 - Attachment description: Patch, raw sse2, v2 → Patch, raw sse2, v2 (checked in)
Assignee: nobody → mark
Status: NEW → RESOLVED
Closed: 12 years ago
Priority: P3 → P2
Hardware: PowerPC → x86
Resolution: --- → FIXED
Target Milestone: --- → 3.14
Attachment #650651 - Flags: superreview?(rrelyea)
You need to log in before you can comment on or make changes to this bug.