Closed Bug 493693 Opened 11 years ago Closed 11 years ago

SSE2 instructions for bignum are not implemented on OS/2


(NSS :: Libraries, enhancement, P3)



(Not tracked)



(Reporter: julien.pierre, Assigned: julien.pierre)


(Whiteboard: FIPS)


(2 files)

All of Nelson's old bignum optimizations that I had put in years ago only worked with the IBM VACPP compiler, which is no longer supported. So, right now, NSS is using a pure C implementation with gcc, and it is very slow.

rsaperf only manages about 140 1024-bit private key RSA ops/s on a 2.4 GHz AMD Phenom 9750 with a single thread. With 4 threads, using all 4 cores (I'm running the OS/2 SMP kernel), it gets to about 560 ops/s.

Porting the assembly code from Linux to OS/2 was very simple since they both use gcc and the same CPU.

The RSA performance went up to 389 ops/s with a single thread, and 1493 ops/s with all 4 cores. This is almost twice as good as Windows manages on the same hardware with the 32-bit build (I have multi-boot). That says to me that there is something suboptimal with our Windows SSE2 implementation. I may open a separate bug about that when I confirm the result.
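For reference, the ratios behind these figures can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function names are illustrative and are not part of NSS or rsaperf, and the inputs are the ops/s figures quoted above.

```python
def speedup(before_ops, after_ops):
    """Ratio of optimized throughput to baseline throughput."""
    return after_ops / before_ops


def scaling(single_thread_ops, multi_thread_ops):
    """How many times faster the multi-threaded run is than one thread."""
    return multi_thread_ops / single_thread_ops


# Figures quoted in this bug (ops/s, 1024-bit RSA private key operations).
pure_c_1, pure_c_4 = 140, 560   # pure C build, 1 and 4 threads
sse2_1, sse2_4 = 389, 1493      # SSE2 assembly build, 1 and 4 threads

print(round(speedup(pure_c_1, sse2_1), 2))  # single-thread gain
print(round(speedup(pure_c_4, sse2_4), 2))  # four-thread gain
print(round(scaling(sse2_1, sse2_4), 2))    # scaling across 4 cores
```

The single-thread gain comes out a little under a factor of 3, and the four-thread run scales to a bit under 4x the single-thread rate.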

Patch forthcoming for this OS/2 issue.
Unfortunately, even after review, this will have to wait for check-in due to the FIPS code freeze.
Attachment #378261 - Flags: review?(mozilla)
Severity: normal → enhancement
Priority: -- → P3
Whiteboard: FIPS [Awaiting Softoken's Thaw]

FYI, the command to check the performance change from this patch is

rsaperf -n none -p 30 -t x

Where x is the number of threads you want to use, typically 1 unless you have a multi-core CPU and the OS/2 SMP kernel.
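To compare several thread counts in one go, the command line can be generated rather than typed by hand. The sketch below only builds the argument list from the flags shown above (-n none -p 30 -t x); the helper name rsaperf_command is hypothetical, and it assumes rsaperf is on the PATH if you actually run the result.

```python
def rsaperf_command(threads, seconds=30, binary="rsaperf"):
    """Build the rsaperf argument list for a given thread count.

    Uses only the flags quoted in this bug: -n none -p <seconds> -t <threads>.
    """
    if threads < 1:
        raise ValueError("thread count must be at least 1")
    return [binary, "-n", "none", "-p", str(seconds), "-t", str(threads)]


# Print one command per thread count; run them yourself or pass each
# list to subprocess.run() on a machine where rsaperf has been built.
for t in (1, 4):
    print(" ".join(rsaperf_command(t)))
```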

rsaperf will be built only if you go to mozilla/security/nss/cmd and build from there. It will be in mozilla/dist/OBJDIR/bin

I just rebooted the machine into Vista, and the 32-bit build with SSE2 got about the same performance, slightly better, actually: 1521 ops/s with 4 cores. This is much better than I reported previously in - 775 ops/s. I have been tuning it a little bit in the BIOS and reinstalled the OS, but it's still the same motherboard and CPU, so it's hard to see what could have caused the performance to nearly double. Anyway, to make a long story short, SSE2 performance is now on par between OS/2 and 32-bit Vista on this AMD Phenom, so all should be good with this patch.
Comment on attachment 378261 [details] [diff] [review]
Add SSE2 optimizations to OS/2 gcc build

Julien, it's really great that you got OS/2 working again. :-)

Unfortunately, I still have problems building NSS. The only way that works is to copy it into a Firefox tree and run "make tier_toolkit", which takes awfully long and does not build rsaperf. If there is a trick for OS/2 that is not listed on could you add it? (If I follow that, it already fails when building now.c in NSPR. If I just try to build rsaperf from within the Firefox tree, it cannot find the NSPR headers.)

I'm not aware of any tricks on OS/2, except the need to run autoconf in NSPR, since the configure script that's checked in to NSPR does not support OS/2.
I suspect that's what you are running into. Perhaps the browser build automatically takes care of this requirement somehow.
I hope this helps.
Attachment #378261 - Flags: review?(mozilla) → review+
Comment on attachment 378261 [details] [diff] [review]
Add SSE2 optimizations to OS/2 gcc build

Great, thanks, that helped. Now I can build and indeed my rsaperf rates in terms of ops in 30s also go up by almost a factor of 3 (up from 5950/23600 to 17400/67609 for 1/4 cores on my Core2Quad 3GHz CPU). Cool. :-)
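These counts can be put on the same ops/s scale as the earlier comments. The sketch below assumes they are totals over a 30-second run (as the -p 30 option suggests); the helper name per_second is illustrative only.

```python
RUN_SECONDS = 30  # assumed run length, matching "rsaperf -p 30"


def per_second(total_ops, seconds=RUN_SECONDS):
    """Convert a total operation count for a run into ops/s."""
    return total_ops / seconds


# Totals quoted above: (before, after) for 1 core and for 4 cores.
runs = {"1 core": (5950, 17400), "4 cores": (23600, 67609)}

for label, (before, after) in runs.items():
    print(label,
          round(per_second(before), 1),   # ops/s before the patch
          round(per_second(after), 1),    # ops/s after the patch
          round(after / before, 2))       # speedup factor
```

Under that assumption the single-core rate goes from roughly 198 to 580 ops/s.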

Not knowing NSS, I wonder if (in my daily browser usage) I would see a real-world difference...
BTW, is lib/freebl/mpi/Makefile.os2 not obsolete? Perhaps this would be a chance to remove it. (Why didn't we remove it with VACPP support?)

The Makefiles under the mpi directory are used to build the mpi library standalone, separately from freebl and the rest of NSS. Most of them don't work right. I don't think it's worth fixing Makefile.os2 to build with gcc on OS/2 since nobody will ever use that Makefile. It should just be deleted.
Re: comment 6, no, I don't think you will see much difference for browser usage, except when generating keys for your own certificate when you sign up for, say, an SSL client cert or an S/MIME cert. RSA key generation tends to be a slow operation (up to several seconds, depending on the key size), and the patch will speed it up significantly. This is typically something you do about once a year or so, if at all. The patch is more likely to help SSL servers that use NSS, but I don't know of any that runs on OS/2 except our selfserv test server under mozilla/security/nss/cmd/selfserv. However, it's possible that somebody could port mod_nss to OS/2 and make Apache work with NSS instead of mod_ssl, which uses OpenSSL. Then this patch would help a lot.
Target Milestone: --- → 3.12.4
Checking in Makefile;
/cvsroot/mozilla/security/nss/lib/freebl/Makefile,v  <--  Makefile
new revision: 1.107; previous revision: 1.106
RCS file: /cvsroot/mozilla/security/nss/lib/freebl/mpi/mpi_x86_os2.s,v
Checking in mpi/mpi_x86_os2.s;
/cvsroot/mozilla/security/nss/lib/freebl/mpi/mpi_x86_os2.s,v  <--  mpi_x86_os2.s

initial revision: 1.1
Closed: 11 years ago
Resolution: --- → FIXED
Whiteboard: FIPS [Awaiting Softoken's Thaw] → FIPS
Attachment #378428 - Attachment mime type: application/octet-stream → application/zip