Closed
Bug 293091
Opened 20 years ago
Closed 8 years ago
Client-side SSL performance is 1/2 the speed for full handshakes as for restart handshakes
Categories
(NSS :: Libraries, defect, P3)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: julien.pierre, Unassigned)
Details
Attachments
(1 file)
Problem: I am using one big box with selfserv, which is capable of over 1000 full handshakes/s and 4000 restart handshakes/s. I am using a much smaller client box with strsclnt. When using strsclnt with the -N option (no session reuse, so every handshake is a full one), I only get about 440 ops/s. When using strsclnt without the -N option, I get about 930 ops/s. In both cases, the client CPU is fully saturated, and the server CPU is not close to being saturated. The ops/s are measured at the server. This means our client-side SSL code is only half as fast for full handshakes as for restart handshakes. There is no big difference in the protocol that should account for this result. The expected result is that the two ops/s numbers should be very close.
Comment 1•20 years ago
Why do you think the 2 sides should be very close? In full handshakes, the client does at least 2, probably 3, RSA public key operations. These operations, on average, are small compared to the private key operations done by the server, but they aren't exactly negligible. They would certainly seem to be significant compared to simply running some random numbers through a hash and starting MAC and RC4 operations.

That being said, the client side of NSS has not been tuned. Other than your test case (where you are trying to simulate lots of clients going to the server), do you have a need to make lots of outgoing client connections? (I can think of a few possibilities, I just want to make sure they are needed.)

Anyway, I wouldn't expect the client to do full handshakes nearly as fast as it can do restart handshakes. What is interesting is that the server isn't saturated. I'd still expect the server's handshake cost to be greater than the client's... are the machines particularly mismatched? (A server which is 4x faster than the client would cause that.)

bob

The interesting question is why the server isn't saturated. Is the se
| Reporter | ||
Comment 2•20 years ago
Bob,

I forgot to mention I was using the -o parameter on the client side to skip server cert verification. That should remove at least one RSA public key op per SSL op. My client machine is capable of about 3000 RSA public key ops/s (measured with 2 threads and the enhanced rsaperf with PKCS#11 session keys).

There is in fact a need to generate lots of connections to drive a server: this is for benchmarking purposes. The better the client performs, the fewer client machines are needed. We need to generate lots of full handshakes to stress boxes that have high-speed RSA accelerators (in the 10,000 ops/s range) and multi-core CPUs. In my little test, I was using software crypto, and the server cost of full handshakes is of course higher than restarts.

The 2 machines are mismatched. The client is a dual-CPU UltraSPARC III @ 900 MHz (Sun Blade 2500). The server is a dual-CPU Opteron @ 2 GHz (Sun Java Workstation 2100z). I would say the server is about 3x as fast as the client.

In the full handshake case, the client is running at 0% idle CPU, and the server is at 60% idle, with a rate of 445 connections/s. In the restart handshake case, the client is running at 0% idle CPU, and the server is at 73% idle, with a rate of 945 connections/s.

It looks like the last part of your comment #1 got truncated. Did you mean to ask another question?
| Reporter | ||
Comment 3•20 years ago
It turns out the -o flag doesn't actually skip the server cert verification; it only lets the handshake proceed when verification fails. We need another option to skip the actual check. When skipping the call to SSL_AuthCertificate, I can drive 600 ops/s from my client instead of 445 ops/s. This is a good improvement, but still well below the 945 ops/s of the restart case.
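As a rough back-of-the-envelope check on the numbers reported so far, the sketch below converts the measured rates into client CPU time per handshake. It assumes both client CPUs are fully busy (0% idle, as reported) and that throughput scales linearly across the two CPUs. The function and variable names are illustrative only and do not come from NSS.

```python
# Rough estimate of client CPU time per operation from the rates in this bug.
# Assumption: both client CPUs are 100% busy, so the CPU-seconds available
# per wall-clock second equal the CPU count.
CLIENT_CPUS = 2


def cpu_ms_per_op(ops_per_second, cpus=CLIENT_CPUS):
    """CPU milliseconds consumed per operation at a saturated rate."""
    return cpus * 1000.0 / ops_per_second


full_with_verify = cpu_ms_per_op(445)  # full handshake, server cert verified
full_no_verify = cpu_ms_per_op(600)    # full handshake, SSL_AuthCertificate skipped
restart = cpu_ms_per_op(945)           # restart handshake
rsa_public = cpu_ms_per_op(3000)       # one RSA public key op (rsaperf, 2 threads)

verify_cost = full_with_verify - full_no_verify  # saved by skipping verification
remaining_gap = full_no_verify - restart         # still unexplained vs. restart

print(f"full handshake, verified:   {full_with_verify:.2f} ms")
print(f"full handshake, unverified: {full_no_verify:.2f} ms")
print(f"restart handshake:          {restart:.2f} ms")
print(f"one RSA public key op:      {rsa_public:.2f} ms")
print(f"cert verification cost:     {verify_cost:.2f} ms")
print(f"remaining full/restart gap: {remaining_gap:.2f} ms")
```

Under these assumptions, a single RSA public key operation (about 0.67 ms of CPU) covers only a bit more than half of the roughly 1.2 ms that still separates an unverified full handshake from a restart.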
| Reporter | ||
Comment 4•20 years ago
| Reporter | ||
Updated•20 years ago
Attachment #182807 -
Flags: review?(nelson)
Comment 5•20 years ago
The truncated message was evidently a fragment I didn't remove when I rewrote the previous paragraph. So, yes, I would be worried if you had that big a performance difference and you weren't doing certificate authentication.

I'm still confused, however, about the assumption that client full handshake performance should equal client restart performance... It won't. You still need to do at least one RSA public key operation to complete a full handshake... on a full handshake you also go through the key derivation operation twice: once to generate the master key from the pre-master secret (pms) and once to generate your keyset. So cryptographically you are doing at least twice the work that you do on a restart handshake. Anyway, the numbers you are getting sound exactly like what I would have expected, though I could be convinced otherwise (if RSA Public KeyOps are

I am still mystified why the client is pegged and the server is idle. Even though the client has to do more than twice the work on full handshakes, the server has to do much more (it has to do all the work the client does, plus it has to do an RSA private keyOp rather than a public keyOp).

Of course, one way to look at this problem is to say "Gee, the full handshakes aren't nearly as slow compared to the restart handshakes as I would expect (based only on the cryptographic work); there must be extra overhead in our code that is washing out the cryptographic differences between the two ;)."

bob
| Reporter | ||
Comment 6•20 years ago
Bob,

I'm well aware the full handshake is more cryptographically intensive even for the client. In the past, in the server case, our measurements showed that the crypto was responsible for under half the CPU time. I haven't run any profiles on the client side yet to find out what it is.

Regarding the question about why the client is pegged and the server idle, it is no mystery. As I previously responded, the server is very fast. It's running with hand-tuned AMD64 assembly for RSA that we contributed for 3.10. rsaperf with 2 threads shows the server box is capable of 2000 RSA private key ops/s. I am able to drive the server to 100% CPU, but I need multiple clients to achieve that result. I opened this bug because I found the number of clients required to peak one server to be excessive.

Regarding your last comment, we have done extensive work on the non-crypto overhead problem on the server side. There is a fairly large amount of overhead still left in 3.10. I'm not running the code from the NSS_PERFORMANCE_HACKS_BRANCH, which would improve performance by about 30%, both by reducing overhead and improving crypto.
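To put the "number of clients" concern in concrete terms, here is a small sketch that estimates how many client boxes of this class would be needed to reach a target full-handshake rate. It assumes client throughput adds up linearly and uses the per-client rates quoted in this bug (445 and 600 full handshakes/s). Treating the 10,000 ops/s accelerator figure as a handshake rate is an assumption made for illustration, and `clients_needed` is a hypothetical helper, not part of any NSS tool.

```python
import math


def clients_needed(target_rate, per_client_rate):
    """Smallest whole number of clients whose combined rate reaches target_rate."""
    return math.ceil(target_rate / per_client_rate)


# "over 1000 full handshakes/s" for the software-crypto server, so a lower bound.
SERVER_FULL_HANDSHAKES = 1000
# Assumption: an RSA accelerator in the 10,000 ops/s range sustains one full
# handshake per RSA private key op.
ACCELERATED_TARGET = 10_000

for per_client in (445, 600):
    print(
        f"{per_client} full handshakes/s per client: "
        f"{clients_needed(SERVER_FULL_HANDSHAKES, per_client)} client(s) for the "
        f"software server, {clients_needed(ACCELERATED_TARGET, per_client)} for "
        f"the accelerated target"
    )
```

Any improvement in the per-client rate reduces these counts proportionally, which is the practical motivation for tuning the client side.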
Comment 7•20 years ago
Comment on attachment 182807: support multiple -o to completely skip server cert verification in strsclnt (checked in)
r=nelson.bolyard Looks OK
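The behavior described for the patch can be modeled as a count of repeated -o flags. The sketch below is a Python illustration only: strsclnt itself is written in C, and the exact threshold (two or more -o flags meaning "skip verification") and the mode names are assumptions inferred from the attachment description, not taken from the patch.

```python
import argparse


def verification_mode(argv):
    """Map the number of -o flags to a hypothetical verification mode."""
    parser = argparse.ArgumentParser(add_help=False)
    # action="count" increments the value each time -o appears.
    parser.add_argument("-o", action="count", default=0)
    args, _unknown = parser.parse_known_args(argv)

    if args.o == 0:
        return "verify"                     # verify; a failure aborts the handshake
    if args.o == 1:
        return "verify-but-ignore-failure"  # still pays the verification cost
    return "skip-verification"              # assumed: no verification work at all


print(verification_mode([]))
print(verification_mode(["-o"]))
print(verification_mode(["-o", "-o"]))
```

The distinction matters for benchmarking because only the last mode removes the verification work from the client's CPU budget.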
Attachment #182807 -
Flags: review?(nelson) → review+
| Reporter | ||
Comment 8•20 years ago
Thanks for the review. I checked the patch in to the tip:

Checking in strsclnt.c;
/cvsroot/mozilla/security/nss/cmd/strsclnt/strsclnt.c,v <-- strsclnt.c
new revision: 1.40; previous revision: 1.39

And to NSS_PERFORMANCE_HACKS_BRANCH:

Checking in strsclnt.c;
/cvsroot/mozilla/security/nss/cmd/strsclnt/strsclnt.c,v <-- strsclnt.c
new revision: 1.37.2.3; previous revision: 1.37.2.2
done
| Reporter | ||
Comment 9•20 years ago
I did some further investigation using the code from NSS_PERFORMANCE_HACKS_BRANCH instead of the tip on the client side. The SSL code on that branch uses PKCS#11 for the RSA and key derivations, but goes direct to freebl for everything else.

The improvements from using that branch are impressive. Using that code, the same client box was able to drive 875 full handshakes per second, and 2010 restart handshakes per second. The server was still not close to peaked CPU-wise; the client CPU was the limiting factor once again. These results compare with 600 full handshakes and 940 restarts per second for the tip, which uses PKCS#11 for every operation.

2010 / 875 is a 2.3:1 ratio in favor of the restart handshake on the client side. I haven't run any profiles yet, so I don't know where all this extra time is going. However, I think it may still have to do in part with PKI. libssl always creates a CERTCertificate* for the incoming certs, even if the client app doesn't want to verify the cert chain, as is the case here. So that would be a penalty for the full handshake case. In theory the cert cache should mitigate this effect, but perhaps it's not as efficient as it should be? I'll be experimenting some more.

Also, the comparison might be fairer if the client had a full PKCS#11 bypass: in this case the full handshake code is going partly through PKCS#11, and the restart handshake code is not going through PKCS#11 at all.
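The comparison between the tip and the branch can be laid out as simple arithmetic. The sketch below uses only the four rates quoted above (600 and 940 per second for the tip, 875 and 2010 per second for the branch); the dictionary layout and helper names are illustrative.

```python
# Client-driven handshake rates (per second) quoted in this comment.
rates = {
    "tip": {"full": 600, "restart": 940},
    "NSS_PERFORMANCE_HACKS_BRANCH": {"full": 875, "restart": 2010},
}


def restart_to_full_ratio(branch):
    """How many restart handshakes the client drives per full handshake."""
    return rates[branch]["restart"] / rates[branch]["full"]


def speedup_over_tip(kind):
    """Branch rate divided by tip rate for 'full' or 'restart' handshakes."""
    return rates["NSS_PERFORMANCE_HACKS_BRANCH"][kind] / rates["tip"][kind]


for branch in rates:
    print(f"{branch}: restart/full = {restart_to_full_ratio(branch):.2f}")
for kind in ("full", "restart"):
    print(f"{kind} handshake speedup over tip: {speedup_over_tip(kind):.2f}x")
```

On these numbers the branch speeds up restart handshakes by about 2.1x but full handshakes by only about 1.5x, which is why the restart-to-full ratio widens from about 1.6:1 on the tip to about 2.3:1 on the branch.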
Updated•20 years ago
QA Contact: bishakhabanerjee → jason.m.reid
Updated•19 years ago
Assignee: wtchang → nobody
QA Contact: jason.m.reid → libraries
Updated•18 years ago
Priority: -- → P3
Updated•16 years ago
Attachment #182807 -
Attachment description: support multiple -o to completely skip server cert verification in strsclnt → support multiple -o to completely skip server cert verification in strsclnt (checked in)
Comment 10•8 years ago
Let's leave this at the one patch that landed 12 years ago.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED