Closed Bug 274518 Opened 16 years ago Closed 15 years ago

SSL close layer function is too CPU intensive

Categories

(NSS :: Libraries, enhancement, P1)

3.9.4
enhancement

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: julien.pierre, Assigned: nelson)

Details

Benchmarking of SSL on the server-side shows that the SSL layer close function
is much more expensive than the base NSPR layer close socket function.

I used a dtrace script I wrote to collect data on a running selfserv process
which was doing full handshakes to measure where the CPU is going . I also
proceeded to decompose the SSL close layer function . Here is the interesting
part of the results :

CPU time (us) - per CPU second :

  PR_PopIOLayer                                                  3826
  ssl3_FreeKeyPair                                               4142
  SECKEY_DestroyPrivateKey                                       6230
  CERT_DestroyCertificateList                                    7100
  ssl_DestroyLocks                                              10150
  ssl_DestroySecurityInfo                                       16680
  ssl_DestroyGather                                             28216
  close                                                         33014
  pt_Close                                                      40390
  ssl3_DestroySSL3Info                                          70994
  ssl_DestroySocketContents                                    150391
  ssl_FreeSocket                                               172364
  ssl_DefClose                                                 224974
  PR_Close                                                     255919
  Total                                                       1000665

Please ignore the fact that total is not exactly 1 million us - there is a
limitation of dtrace that prevents me from getting it completely right (long
story!), but the data is overall accurate still . Note that all the data above
is inclusive .

The table shows that 25.5% of the CPU time for the selfserv process went to
PR_Close .

Only 4% were in pt_Close, the NSPR layer function; and 3.3% in the close system
call.

The rest is all due to SSL specific overhead . It would appear that we have a
private copy of the entire server socket configuration in the connection socket,
and destroying it is very expensive .

Interestingly, the call which makes the copy of the config, SSL_ImportFD (not
measured in the above run), is not nearly as expensive as the SSL close call,
only about one fifth . So we should focus on the close first .

One way to possibly get a big boost would be maybe to refcount the model
socket's configuration somehow, and refer to it in the connection sockets. We
would only need to make a copy only if the socket config actually changes .

Of course, this is only an expedient fix and still leaves a lot of situations
with overhead, though it may help specific benchmarks (specweb 99) given the way
they use NSS .

The proper way to fix this is probably to take a much deeper look at libssl and
all the socket structures and figure out a better systematic approach to deal
with this problem .
Sorry, the server was doing restart handshakes in the previous test. For full
handshakes, the data looks different . For the most part, the server is spending
more time in crypto and has fewer sockets and thus fewer calls to PR_Close . But
overall, the ratios of time subfunctions within the close method look the same .

CPU time (us) - per CPU second :

  PR_PopIOLayer                                                  1220
  ssl3_FreeKeyPair                                               1267
  SECKEY_DestroyPrivateKey                                       2076
  CERT_DestroyCertificateList                                    2284
  ssl_DestroyLocks                                               3405
  ssl_DestroySecurityInfo                                        5623
  ssl_DestroyGather                                              7727
  close                                                         11215
  pt_Close                                                      13279
  ssl3_DestroySSL3Info                                          20652
  ssl_DestroySocketContents                                     42418
  ssl_FreeSocket                                                49021
  ssl_DefClose                                                  65255
  PR_Close                                                      74620
  Total                                                       1000696
This is a critical bottleneck to SSL performance with NSS, especially with
hardware tokens. Marking P1 .
Severity: normal → enhancement
Priority: -- → P1
Target Milestone: --- → 3.10
The call to ssl3_FreeKeyPair is to destroy the stepdown keypair, which gets
copied . When the fix for bug 148452 is in, this function will no longer be called.

I found that the lock creation/deletion accounts for about 0.7% of the time in a
full handshake test. Thus, recycling SSL socket state objects would be useful,
to avoid constantly creating and freeing those locks . FYI, the cost of the lock
operations themselves was 0.3% in the full handshake test.

Most of the time spent in ssl3_DestroySSL3Info is spent destroying PK11Context
objects . Again, we should recycle state objects, so that we don't have to free
and reallocate the PKCS#11 sessions associated with them .
Assignee: wtchang → nelson
QA Contact: bishakhabanerjee → jason.m.reid
Planned to be part of the performance work in 3.11
Target Milestone: 3.10 → 3.11
Per our meeting today, this was fixed by the bypass.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.