Closed Bug 274518 Opened 21 years ago Closed 20 years ago

SSL close layer function is too CPU intensive

Categories

(NSS :: Libraries, enhancement, P1)

3.9.4
enhancement

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: julien.pierre, Assigned: nelson)

Details

Benchmarking of SSL on the server side shows that the SSL layer close function is much more expensive than the base NSPR layer close socket function. I used a dtrace script I wrote to collect data on a running selfserv process doing full handshakes, to measure where the CPU time is going, and I also decomposed the SSL close layer function into its subfunctions. Here is the interesting part of the results.

CPU time (us) - per CPU second:

  PR_PopIOLayer                    3826
  ssl3_FreeKeyPair                 4142
  SECKEY_DestroyPrivateKey         6230
  CERT_DestroyCertificateList      7100
  ssl_DestroyLocks                10150
  ssl_DestroySecurityInfo         16680
  ssl_DestroyGather               28216
  close                           33014
  pt_Close                        40390
  ssl3_DestroySSL3Info            70994
  ssl_DestroySocketContents      150391
  ssl_FreeSocket                 172364
  ssl_DefClose                   224974
  PR_Close                       255919
  Total                         1000665

Please ignore the fact that the total is not exactly 1 million us; a dtrace limitation prevents me from getting it exactly right (long story!), but the data is still accurate overall. Note that all the data above is inclusive.

The table shows that 25.5% of the selfserv process's CPU time went to PR_Close. Only 4% was in pt_Close, the NSPR layer function, and 3.3% in the close system call; the rest is all SSL-specific overhead. It would appear that we have a private copy of the entire server socket configuration in each connection socket, and destroying it is very expensive. Interestingly, the call that makes the copy of the config, SSL_ImportFD (not measured in the run above), is not nearly as expensive as the SSL close call, only about one fifth of its cost, so we should focus on the close first.

One way to get a big boost might be to refcount the model socket's configuration and refer to it from the connection sockets; a copy would only be needed if the socket config actually changes. Of course, this is only an expedient fix and still leaves overhead in many situations, though it may help specific benchmarks (SPECweb99) given the way they use NSS. The proper fix is probably to take a much deeper look at libssl and all the socket structures and come up with a more systematic approach to this problem.
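For context, here is a minimal sketch of the selfserv-style accept loop being measured, assuming NSS/NSPR are already initialized, the model socket is already configured, and serve_one_connection is a placeholder for the handshake and I/O (error handling omitted):

/* Each accepted connection gets its own SSL layer via SSL_ImportFD(),
 * which copies the model socket's configuration; PR_Close() then has
 * to tear all of that per-connection state down again, which is where
 * the 25.5% of CPU time shows up. */
#include "nspr.h"
#include "ssl.h"

static void serve_one_connection(PRFileDesc *ssl_fd);  /* handshake + I/O, not shown */

void accept_loop(PRFileDesc *listen_fd, PRFileDesc *model)
{
    for (;;) {
        PRNetAddr peer;
        PRFileDesc *tcp = PR_Accept(listen_fd, &peer, PR_INTERVAL_NO_TIMEOUT);
        if (!tcp)
            break;

        /* Copies the model's configuration (certs, keys, options, locks,
         * gather buffers, ...) into a fresh per-connection SSL socket. */
        PRFileDesc *ssl_fd = SSL_ImportFD(model, tcp);
        SSL_ResetHandshake(ssl_fd, PR_TRUE /* handshake as server */);

        serve_one_connection(ssl_fd);

        /* Destroys the per-connection copy: ssl_DefClose ->
         * ssl_FreeSocket -> ssl_DestroySocketContents and friends. */
        PR_Close(ssl_fd);
    }
}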
Sorry, the server was doing restart handshakes in the previous test. For full handshakes, the data looks different: the server spends more of its time in crypto and handles fewer sockets, and thus makes fewer calls to PR_Close. But overall, the ratios of time spent in the subfunctions within the close path look the same.

CPU time (us) - per CPU second:

  PR_PopIOLayer                    1220
  ssl3_FreeKeyPair                 1267
  SECKEY_DestroyPrivateKey         2076
  CERT_DestroyCertificateList      2284
  ssl_DestroyLocks                 3405
  ssl_DestroySecurityInfo          5623
  ssl_DestroyGather                7727
  close                           11215
  pt_Close                        13279
  ssl3_DestroySSL3Info            20652
  ssl_DestroySocketContents       42418
  ssl_FreeSocket                  49021
  ssl_DefClose                    65255
  PR_Close                        74620
  Total                         1000696
This is a critical bottleneck for SSL performance with NSS, especially with hardware tokens. Marking P1.
Severity: normal → enhancement
Priority: -- → P1
Target Milestone: --- → 3.10
The call to ssl3_FreeKeyPair destroys the step-down key pair, which gets copied into each connection socket. Once the fix for bug 148452 is in, this function will no longer be called. I found that lock creation/deletion accounts for about 0.7% of the time in a full-handshake test, so recycling SSL socket state objects would be useful to avoid constantly creating and freeing those locks. (For comparison, the cost of the lock operations themselves was 0.3% in the full-handshake test.) Most of the time spent in ssl3_DestroySSL3Info goes to destroying PK11Context objects. Again, we should recycle state objects so that we don't have to free and reallocate the PKCS#11 sessions associated with them.
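A purely illustrative sketch of the "recycle state objects" idea follows. None of these types or functions exist in libssl (a real implementation would hang the cache off the sslSocket path and also park the PK11Context objects); it only shows the free-list pattern that avoids re-creating locks and PKCS#11 sessions on every connection:

#include "nspr.h"

typedef struct RecycledState {
    struct RecycledState *next;
    PRLock *recvLock;        /* stands in for the sslSocket's various locks */
    PRLock *sendLock;
    /* a real version would also keep PK11Context objects alive here */
} RecycledState;

static RecycledState *freeList;      /* guarded by freeListLock */
static PRLock        *freeListLock;  /* assumed created once at library init */

RecycledState *acquire_state(void)
{
    RecycledState *s = NULL;
    PR_Lock(freeListLock);
    if (freeList) {                  /* reuse a retired state block if we can */
        s = freeList;
        freeList = s->next;
    }
    PR_Unlock(freeListLock);
    if (!s) {                        /* nothing to recycle: build a new one */
        s = PR_NEWZAP(RecycledState);
        if (s) {
            s->recvLock = PR_NewLock();
            s->sendLock = PR_NewLock();
        }
    }
    return s;
}

void release_state(RecycledState *s)
{
    /* At close time, park the object instead of PR_DestroyLock()ing its
     * locks and freeing the associated PKCS#11 sessions. */
    PR_Lock(freeListLock);
    s->next = freeList;
    freeList = s;
    PR_Unlock(freeListLock);
}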
Assignee: wtchang → nelson
QA Contact: bishakhabanerjee → jason.m.reid
Planned to be part of the performance work in 3.11
Target Milestone: 3.10 → 3.11
Per our meeting today, this was fixed by the bypass.
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED