Closed Bug 293091 Opened 20 years ago Closed 8 years ago

Client-side SSL performance is 1/2 the speed for full handshakes as for restart handshakes

Categories

(NSS :: Libraries, defect, P3)

3.10
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: julien.pierre, Unassigned)

Details

Attachments

(1 file)

Problem :

I am using one big box with selfserv, which is capable of over 1000 full
handshakes/s and 4000 restart handshakes/s.

I am using a much smaller client box with strsclnt.

When using strsclnt with the -N option, I only get about 440 ops/s.

When using strsclnt without the -N option, I get about 930 ops/s.

In both cases, the client CPU is fully saturated, and the server CPU is not
close to being saturated. The ops/s are measured at the server.

This means our client-side SSL code is only half as fast for full handshakes as
for restart handshakes. There is no big difference in the protocol that should
account for this result. The expected result is that the two ops/s numbers
should be very close.
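
As a quick check of the "half the speed" figure, the ratio of the two reported
rates can be computed directly. This is only a sketch using the approximate
numbers quoted above; the variable names are illustrative and not taken from
strsclnt.

```python
# Approximate server-side rates reported above (ops/s).
full_rate = 440      # strsclnt with -N (full handshakes)
restart_rate = 930   # strsclnt without -N (restart handshakes)

# Fraction of the restart throughput achieved with full handshakes.
ratio = full_rate / restart_rate
print(f"full/restart throughput ratio: {ratio:.2f}")
```
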
Why do you think the 2 sides should be very close?

In full handshakes, the client does at least 2, probably 3, RSA public key
operations. These operations, on average, are small compared to the private key
operations done by the server, but they aren't exactly negligible. They would
certainly seem to be significant compared to simply running some random numbers
through a hash and starting MAC and RC4 operations.

That being said, the client side of NSS has not been tuned. Other than your test
case (where you are trying to simulate lots of clients going to the server), do
you have a need to make lots of outgoing client connections? (I can think of a
few possibilities, I just want to make sure they are needed).

Anyway, I wouldn't expect the client to do full handshakes nearly as fast as it
can do restart handshakes. What is interesting is that the server isn't
saturated. I'd still expect the server cost of a handshake to be greater than
the client's... are the machines particularly misbalanced? (A server which is 4x
faster than the client would cause that.)

bob
The interesting question is why the server isn't saturated. Is the se
Bob,

I forgot to mention I was using the -o parameter on the client side to skip
server cert verification. That should remove at least one RSA public key op per
SSL op. My client machine is capable of about 3000 RSA public key ops/s
(measured with 2 threads and the enhanced rsaperf with PKCS#11 session keys).

There is in fact a need to generate lots of connections to drive a server - this
is for benchmarking purposes. The better the client performs, the fewer client
machines are needed. We need to generate lots of full handshakes to stress boxes
that have high-speed RSA accelerators (in the 10,000 ops/s range) and multi-core
CPU.

In my little test, I was using software crypto, and the server cost of full
handshakes is of course higher than restarts. The 2 machines are misbalanced.
The client is a dual CPU UltraSPARC III @ 900 MHz (Sun Blade 2500). The server
is a dual CPU Opteron @ 2 GHz (Sun Java Workstation 2100z). I would say the
server is about 3x as fast as the client.

In the full handshake case, the client is running at 0% idle CPU, and the server
is at 60% idle, with a rate of 445 connections/s.

In the restart handshake case, the client is running at 0% idle CPU, and the
server is at 73% idle, with a rate of 945 connections/s.
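
For reference, the idle percentages above can be turned into a rough estimate
of what the server could sustain at 0% idle. This is a back-of-the-envelope
sketch that assumes server CPU use scales linearly with the connection rate,
which is an assumption and not something measured here; the function name is
made up for illustration.

```python
def estimated_capacity(rate, idle_fraction):
    """Extrapolate the rate at 0% idle, assuming CPU use scales linearly."""
    busy_fraction = 1.0 - idle_fraction
    return rate / busy_fraction

# Server-side figures quoted above: (connections/s, idle fraction).
full_capacity = estimated_capacity(445, 0.60)
restart_capacity = estimated_capacity(945, 0.73)

print(f"estimated full handshake capacity:    {full_capacity:.0f}/s")
print(f"estimated restart handshake capacity: {restart_capacity:.0f}/s")
```

Under that assumption the estimates land in the same general range as the
server figures given in the original report (over 1000 full and about 4000
restart handshakes/s).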

It looks like the last part of your comment #1 got truncated. Did you mean to
ask another question?
It turns out the -o flag doesn't actually skip the server cert verification; it
only makes it OK for the verification to fail. We need another option to skip
the actual check.
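
The distinction between "verify but ignore failure" and "do not verify at all"
can be sketched with a repeatable flag. The following is a hypothetical Python
illustration only, not strsclnt's actual C code; the mode names and the
function `cert_check_mode` are invented here, and the mapping of a repeated
-o to "skip" is inferred from the title of the patch attached to this bug.

```python
import argparse

def cert_check_mode(argv):
    """Map the number of -o flags to a (hypothetical) verification mode."""
    parser = argparse.ArgumentParser(prog="client-sketch")
    # action="count" counts how many times -o appears on the command line.
    parser.add_argument("-o", action="count", default=0)
    args = parser.parse_args(argv)
    if args.o == 0:
        return "verify"                     # normal verification
    if args.o == 1:
        return "verify-but-ignore-failure"  # check runs, result is overridden
    return "skip-verification"              # check is not run at all

print(cert_check_mode(["-o", "-o"]))
```

Only the last mode saves the CPU cost of the verification itself, which is
what matters for the throughput numbers discussed in this bug.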

When skipping the call to SSL_AuthCertificate, I can drive 600 ops/s from my
client instead of 445 ops/s. This is a good improvement, but still well below
the 945 ops/s of the restart case.
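
Since the client is CPU-bound in all three cases, the rates can be converted
into approximate client CPU time per handshake. This is a rough sketch that
assumes both client CPUs are fully consumed by handshake work; the helper name
is illustrative.

```python
CLIENT_CPUS = 2  # the client is described as a dual-CPU machine

def cpu_ms_per_handshake(rate, cpus=CLIENT_CPUS):
    """CPU milliseconds spent per handshake when all CPUs are saturated."""
    return cpus * 1000.0 / rate

full_with_verify = cpu_ms_per_handshake(445)  # full handshake, cert verified
full_no_verify = cpu_ms_per_handshake(600)    # full handshake, no SSL_AuthCertificate
restart = cpu_ms_per_handshake(945)           # restart handshake

verify_cost = full_with_verify - full_no_verify  # attributable to verification
remaining_gap = full_no_verify - restart         # still unexplained vs. restart

print(f"cert verification: ~{verify_cost:.2f} ms per handshake")
print(f"remaining gap:     ~{remaining_gap:.2f} ms per handshake")
```

Under these assumptions, skipping verification removes roughly half of the
extra per-handshake cost of a full handshake relative to a restart.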
Attachment #182807 - Flags: review?(nelson)
The truncated message was evidently a fragment I didn't remove when I rewrote
the previous paragraph.

So, yes, I would be worried if you had that big a performance difference and you
weren't doing certificate authentication. I'm still confused, however, about the
assumption that you expect client handshake performance to equal client restart
performance... It won't. You still need to do at least one RSA public key
operation to complete a full handshake... on a full handshake you also go through
the key derivation operation twice... once to generate the master key from the
pre-master secret (pms) and once to generate your keyset, so cryptographically
you are doing at least twice the work that you do on a restart handshake.

Anyway, the numbers you are getting sound exactly like what I would have expected,
though I could be convinced otherwise (if RSA Public KeyOps are

I am still mystified why the client is pegged and the server is idle. Even
though the client has to do more than twice the work on full handshakes, the
server has to do much more (it has to do all the work the client does, except
that it has to do an RSA private key op rather than a public key op).

Of course one way to look at this problem is to say "Gee, the full handshakes
aren't nearly as slow compared to the restart handshakes as I would expect
(based only on the Cryptographic work), there must be extra overhead in our code
that is washing out the cryptographic differences between the two;)."

bob
Bob,

I'm well aware the full handshake is more cryptographically intensive even for
the client. In the past, in the server case, our measurements showed that the
crypto was responsible for under half the CPU time. I haven't run any profiles
on the client side yet to find out what the share is there.

Regarding the question about why the client is pegged and the server idle, it is
no mystery. As I previously responded, the server is very fast. It's running
with hand-tuned AMD64 assembly for RSA that we contributed for 3.10. rsaperf
with 2 threads shows the server box is capable of 2000 RSA private key ops/s. I
am able to drive the server to 100% CPU, but I need multiple clients to achieve
that result. I opened this bug because I found the number of clients required to
peak one server to be excessive.

Regarding your last comment, we have done extensive work on the non-crypto
overhead problem on the server side. There is a fairly large amount of overhead
still left in 3.10. I'm not running the code from the
NSS_PERFORMANCE_HACKS_BRANCH, which would improve performance by about 30%, both
by reducing overhead and improving crypto.
Comment on attachment 182807
support multiple -o to completely skip server cert verification in strsclnt (checked in)

r=nelson.bolyard 
Looks OK
Attachment #182807 - Flags: review?(nelson) → review+
Thanks for the review. I checked the patch in to the tip:

Checking in strsclnt.c;
/cvsroot/mozilla/security/nss/cmd/strsclnt/strsclnt.c,v  <--  strsclnt.c
new revision: 1.40; previous revision: 1.39

And to NSS_PERFORMANCE_HACKS_BRANCH:

Checking in strsclnt.c;
/cvsroot/mozilla/security/nss/cmd/strsclnt/strsclnt.c,v  <--  strsclnt.c
new revision: 1.37.2.3; previous revision: 1.37.2.2
done
I did some further investigation using the code from
NSS_PERFORMANCE_HACKS_BRANCH instead of the tip on the client side. The SSL code
on that branch uses PKCS#11 for the RSA and key derivations, but goes direct to
freebl for everything else. The improvements from using that branch are
impressive. Using that code, the same client box was able to drive 875 full
handshakes per second, and 2010 restart handshakes per second. The server was
still not close to peaked CPU-wise - the client CPU was the limiting factor once
again. These results compare with 600 full handshakes and 940 restarts for the
tip, which uses PKCS#11 for every operation.

2010 / 875 is a 2.3:1 ratio in favor of the restart handshake on the client
side. I haven't run any profiles yet so I don't know where all this extra time
is going. However, I think it may still have to do in part with PKI. libssl
always creates CERTCertificate* for the incoming certs, even if the client app
doesn't want to verify the cert chain, as is the case here. So that would be a
penalty for the full handshake case. In theory the cert cache should mitigate
this effect, but perhaps it's not as efficient as it should be? I'll be
experimenting some more. Also, the comparison might be more fair if the client
had a full PKCS#11 bypass - in this case the full handshake code is going partly
through PKCS#11, and the restart handshake code is not going through PKCS#11 at all.
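
The comparison between the tip and the branch can be summarized with a short
calculation. This sketch only restates the client-driven rates quoted in this
comment (using the 2010 restarts/s figure given for the branch); the dictionary
layout and names are illustrative.

```python
# Handshakes per second driven by the same client box (from the comment above).
tip = {"full": 600, "restart": 940}
branch = {"full": 875, "restart": 2010}  # NSS_PERFORMANCE_HACKS_BRANCH

# Speedup of the branch over the tip for each handshake type.
speedup = {kind: branch[kind] / tip[kind] for kind in tip}

# How much faster restarts are than full handshakes within each code base.
restart_to_full = {
    "tip": tip["restart"] / tip["full"],
    "branch": branch["restart"] / branch["full"],
}

for kind, factor in speedup.items():
    print(f"{kind} handshakes: branch is {factor:.2f}x the tip")
for name, factor in restart_to_full.items():
    print(f"{name}: restart is {factor:.2f}x full")
```

The branch helps restart handshakes more than full handshakes, which is why
the restart-to-full gap widens on the branch even though both rates improve.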
QA Contact: bishakhabanerjee → jason.m.reid
Assignee: wtchang → nobody
QA Contact: jason.m.reid → libraries
Priority: -- → P3
Attachment #182807 - Attachment description: support multiple -o to completely skip server cert verification in strsclnt → support multiple -o to completely skip server cert verification in strsclnt (checked in)
Let's leave this at the one patch that landed 12 years ago.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED