strsclnt should not start threads for each connection; and should allow specifying ratio of full handshakes

RESOLVED FIXED in 3.11

Status

NSS
Tools
P2
enhancement
RESOLVED FIXED
13 years ago
12 years ago

People

(Reporter: Julien Pierre, Assigned: Julien Pierre)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 5 obsolete attachments)

(Assignee)

Description

13 years ago
It has been observed that we need more client CPU hardware in order to peak an
SSL server when doing SSL restart handshakes with strsclnt . The amount of CPU
should in fact be similar on each side. I traced the problem to inefficient
logic with strsclnt . It starts a thread for every connection. A lot of time is
being spent in starting threads and waiting for them. Since the restart
connections don't take a lot of CPU, the ratio of thread startup overhead is
significant.

Another defficiency in strsclnt is that you can only do 100% full handshakes
with the -N option, or 0% without it . In order to simulate benchmarks, I
propose to add a -P <x> option where x is the percentage of full handshakes desired.

Patch forthcoming.
(Assignee)

Updated

13 years ago
Severity: normal → enhancement
Priority: -- → P2
Target Milestone: --- → 3.10.1
(Assignee)

Comment 1

13 years ago
Created attachment 182021 [details] [diff] [review]
strsclnt patch

This patch distributes the connections between max_threads, as opposed to
starting a thread for each new connection up to max_threads, waiting for
connections to end, and starting threads again, until all connections are done
.

The patch does not change the number of connections that strsclnt with do, as
passed to -c , they are only run more efficiently.

This has the effect of improving the ops/s by 16%. I measured the ops/s with
selfserv on an Opteron server, and strsclnt running on both Solaris Sparc and
Opteron, when doing restart handshakes . On a Win2k client and a Pentium4 with
hyperthreading, the improvement was 26% !
(Assignee)

Updated

13 years ago
Attachment #182021 - Flags: review?(nelson)
(Assignee)

Comment 2

13 years ago
Comment on attachment 182021 [details] [diff] [review]
strsclnt patch

Wan-Teh,

Would you mind reviewing this patch ?
Attachment #182021 - Flags: superreview?
Attachment #182021 - Flags: review?(wtchang)
(Assignee)

Comment 3

13 years ago
Created attachment 182839 [details] [diff] [review]
merged patch

Updated the patch to merge with the current tip.
Attachment #182021 - Attachment is obsolete: true
Attachment #182839 - Flags: superreview?(wtchang)
Attachment #182839 - Flags: review?(nelson)
(Assignee)

Updated

13 years ago
Attachment #182021 - Flags: superreview?
Attachment #182021 - Flags: review?(wtchang)
Attachment #182021 - Flags: review?(nelson)

Comment 4

13 years ago
Comment on attachment 182839 [details] [diff] [review]
merged patch

Julien, this patch has some minor problems.

> void
> thread_wrapper(void * arg)
> {
>     perThread * slot = (perThread *)arg;
> 
>     /* wait for parent to finish launching us before proceeding. */
>     PR_Lock(threadLock);
>     PR_Unlock(threadLock);
> 
>-    slot->rv = (* slot->startFunc)(slot->a, slot->b, slot->c);
>+    while (slot->connections--) {
>+        slot->rv = (* slot->startFunc)(slot->a, slot->b, slot->c);
>+    }
>+    
> 
>     /* Handle cleanup of thread here. */
>     PRINTF("strsclnt: Thread in slot %d returned %d\n", 
> 	   slot - threads, slot->rv);

The PRINTF statement should be moved inside the while loop.

>+    if (fullhs) {
>+        PRInt32 savid = PR_AtomicIncrement(&globalconid);
>+        PRInt32 conid = 1 + (savid % 100);
>+        /* don't set this option on the very first handshake, which is always
>+           a full, so the session gets stored into the client cache */
>+        if (conid <= fullhs && savid != 1) {
>+            rv = SSL_OptionSet(ssl_sock, SSL_NO_CACHE, 1);
>+        }
>+    }

This doesn't always give the correct percentage.  For example, if I
say -c 50 -P 50, I will get 49 full handshakes, which are 98%.	You
can avoid this problem by requiring the total number of connections
to be multiples of 100.

It also has an off-by-one error even if the total number of connections
is a multiple of 100.  For example, if I say -c 100 -P 1, I will get two
full handshakes, as opposed to one (1% of 100):
  the first handshake (always a full handshake)
  the 100th handshake (saveid=100,conid=1)

This off-by-one error can be fixed by defining conid as follows:

    PRInt32 conid = 1 + (savid - 1) % 100;
    if (conid <= fullhs && savid != 1) {

or

    PRInt32 conid = (savid - 1) % 100;
    ...
    if (conid < fullhs && savid != 1) {

>+	/* Start up the threads */
>+	for (i=0;i<max_threads;i++) {
>+            int todo = connections / max_threads;
>+            if (0 == i) {
>+                todo += connections % max_threads;
>+            }
>+	    rv = launch_thread(do_connects, &addr, model_sock, i, todo);
>+	}

since connections/max_threads is integer division, todo
may be 0.  So you need to handle that case.  (I believe it
will just work with your code.)

Since the number of threads is now at most max_threads, you
should remove the code that keeps the number of threads from
exceeding max_threads.	I believe it is just the following
code at the beginnign of launch_thread:

    while (numRunning >= max_threads) {
	PR_WaitCondVar(threadStartQ, PR_INTERVAL_NO_TIMEOUT);
    }
    for (i = 0; i < numUsed; ++i) {
	if (threads[i].running == rs_idle)
	    break;
    }
    if (i >= numUsed) {
	if (i >= MAX_THREADS) {
	    /* something's really wrong here. */
	    PORT_Assert(i < MAX_THREADS);
	    PR_Unlock(threadLock);
	    return SECFailure;
	}
	++numUsed;
	PORT_Assert(numUsed == i + 1);
    }

>+	case 'P': fullhs = PR_MIN(100, PORT_Atoi(optstate->value)); break;

This case should be moved up to where the capital letter cases
are.

It may be better to fail if the specified percentage is greater than
100.

You need to handle the interaction between -N (no session reuse) and
-P (full handshake percentage).  For example, how would you handle
-N -P 50?  Fail with a usage message, or honor the last specified
option (-P 50) with a warning message that -N is ignored?
Attachment #182839 - Flags: superreview?(wtchang) → superreview-

Comment 5

13 years ago
Just wanted to clarify an earlier comment of mine.

I said:
> Since the number of threads is now at most max_threads, you
> should remove the code that keeps the number of threads from
> exceeding max_threads.	I believe it is just the following
> code at the beginnign of launch_thread:
> [ code listing omitted ]

I didn't mean you should remove all of the code I listed.
You should remove some of it and possibly update the rest.
(Assignee)

Updated

13 years ago
Attachment #182839 - Flags: review?(nelson)
(Assignee)

Comment 6

13 years ago
Created attachment 183115 [details] [diff] [review]
updated patch

Wan-Teh,

Thanks for the review.
This patch fixes
1) PRINTF moved
2) your fix for the off-by-one error. Thanks !
For the case where connections is not a multiple of 100, I don't think it
should be considered an an error - the percentage just won't be exact because
it can't be.
3) regarding the connections / max_threads and the possible zero result, I'm
aware of it. That's why I made the first thread always perform a number of
connections equal to the remainder of that division.
I also added a short-circuit in launch-thread for the case where connections is
zero.
4) regarding the command-line options, I moved the processing for -P, and now
check for a range (valid is 0 to 100) . I made the -N option incompatible with
-P - usage will be displayed.
5) regarding the existing queue logic, I believe none of it made any sense
anymore. The throttling feature was broken, since it's purpose was to stop
starting new threads, but it couldn't slow down existing threads making
existing connections. This patch removes it altogether and changes reap_threads
to be a loop of PR_JoinThreads .
Attachment #182839 - Attachment is obsolete: true
(Assignee)

Updated

13 years ago
Attachment #183115 - Flags: review?(wtchang)
The throttling feature did work, and IMO was necessary.  I think it must not
be eliminated, but rather replaced with a new implementation.

Comment 8

13 years ago
You can get exact percentage even when "connections" is
not a multiple of 100.  For example, 50% of 50 connections
is 25, exactly.

With your algorithm, you just need to add a special case
to handle the remainder (connections % 100).  The following
is an outline of how to do this.  (The code can be improved
and rewritten to suit your taste.)

    int connections;  /* total number of connections */
    int connections_in_100s;
    int connections_remainder;

    /* get "connections" from the -c command-line argument */
    ...
    connections_remainder = connections % 100;
    connections_in_100s = connections - connections_remainder;

    ...
    if (fullhs != NO_FULLHS_PERCENTAGE) {
        PRInt32 savid = PR_AtomicIncrement(&globalconid);
        PRInt32 conid = 1 + (savid - 1) % 100;
        /* don't set this option on the very first handshake, which is always
           a full, so the session gets stored into the client cache */
        if (savid <= connections_in_100s) {
            /* your current algorithm */
            if (conid <= fullhs && savid != 1) {
                rv = SSL_OptionSet(ssl_sock, SSL_NO_CACHE, 1);
            }
        } else {
            /* if conid/connections_remainder <= fullhs/100 */
            if (conid*100 <= connections_remainder*fullhs && savid != 1) {
                rv = SSL_OptionSet(ssl_sock, SSL_NO_CACHE, 1);
            }
        }
    }

I won't have time to review your new patch right away.  If that's
a problem, could you ask someone else to review it?
(Assignee)

Comment 9

13 years ago
Wan-Teh,

Thanks for that code, I will incorporate it.

Nelson,

The existing throttling feature worked poorly. I don't think it should be
re-implemented in the same way. The old algorithm throttled down the number of
threads in the client, but it would never throttle it back up. The client has no
way of telling if the server is temporarily overloaded (such as is the case with
a full listen queue); if it simply went away temporarily, perhaps because it
crashed or because it was manually stopped; or if suddenly has more available
sockets in its queue. When using multiple client machines with strsclnt, which
aren't aware of each other, throttling up is a necessity. Otherwise, if a client
throttles down (say, because the server was manually restarted), it needs to be
restarted in order to provide full load again. Perhaps there should be separate
options for throttling down and throttling down/up.
(Assignee)

Comment 10

13 years ago
Created attachment 183243 [details] [diff] [review]
fix percentages when connections is not a multiple of 100

This patch just incorporates Wan-Teh's code; it does not yet add throttling
back.
Attachment #183115 - Attachment is obsolete: true
(Assignee)

Updated

13 years ago
Attachment #183115 - Flags: review?(wtchang)

Comment 11

13 years ago
Comment on attachment 183243 [details] [diff] [review]
fix percentages when connections is not a multiple of 100

>+        if (savid <= total_connections_in_100s) {
>+            /* your current algorithm */
>+            if (conid <= fullhs && savid != 1) {
>+                rv = SSL_OptionSet(ssl_sock, SSL_NO_CACHE, 1);
>+            }
>+        } else {
>+            /* if conid/connections_remainder <= fullhs/100 */
>+            if (conid*100 <= total_connections_remainder*fullhs && savid != 1) {
>+                rv = SSL_OptionSet(ssl_sock, SSL_NO_CACHE, 1);
>+            }
>+        }
>+    }

Julien, please remove the "your current algorithm" comment :-)
Also you should factor out the test "savid != 1" in both cases.
(Assignee)

Comment 12

13 years ago
Created attachment 183331 [details] [diff] [review]
add throttling down by default, and up with -U option

Also documented command-line options that were missing in the usage .

The -U throttling up feature kicks in at a rate of +2 threads/s if there hasn't
been a connect failure in the last 10s, and if there hasn't been a more recent
connect failure than connect success . The total threads will never exceed the
original value specified on the command-line with -t .
Attachment #183243 - Attachment is obsolete: true
(Assignee)

Updated

13 years ago
Attachment #183331 - Flags: superreview?(wtchang)
Attachment #183331 - Flags: review?(nelson)
Comment on attachment 183331 [details] [diff] [review]
add throttling down by default, and up with -U option

I haven't yet full reviewed this code, but I see one issue that will ensure an
r-, many lines substantially exceed 80 columns.
(Assignee)

Comment 14

13 years ago
Nelson,

Please review the code in the patch. If the code is correct and the columns are
the only issue, I can truncate the lines before the checkin without an extra
patch and review.
QA Contact: bishakhabanerjee → jason.m.reid
(Assignee)

Comment 15

13 years ago
Nelson, Wan-Teh, could you please review the code in the latest patch here ? I
did a lot of work in may to get this in. This is needed so we can better
simulate the specweb client. Thanks.
(Assignee)

Comment 16

13 years ago
Retargetting for 3.11 .
Target Milestone: 3.10.1 → 3.11

Comment 17

13 years ago
Comment on attachment 183331 [details] [diff] [review]
add throttling down by default, and up with -U option

First, some global comments.

1. You need to document the meaning of each new global variable
and which lock protects them.

By reading your patch, I gathered the following:

- globalconid is atomically incremented.
- threadLock protects remaining_connections, lastConnectFailure,
  lastConnectSuccess, lastThrottleUp, numUsed, numRunning,
  max_threads, and max_max_threads.  It would be good to declare
  those variables right below the threadLock declaration.

2. PR_Now is a system call, so you should call it before you
lock threadLock.  This is especially important because your
patch makes threadLock a heavily used lock.

3. The decrement of remaining_connections does not take into
account connect failures.  So the actual completed connections
may be fewer than what's specified with the -c option.	This
may be the intended behavior.

4. You write code like this:

   if (!expr) {
       A;
   } else {
       B;
   }

It is better to say:

   if (expr) {
       B;
   } else {
       A;
   }

Similarly,
   if (!(a < b))
should be changed to
   if (a >= b)

5. Please use NSS's block comment style (* at the beginning of
every comment line).

Now the specific comments.

> /* This global string is so that client main can see 
>  * which ciphers to use. 
>  */
> 
>+#define NO_FULLHS_PERCENTAGE -1
>+
> static const char *cipherString;

The macro definition should be moved before the comment
because the comment describes the global variable cipherString.

>+static PRBool ThrottleUp = PR_FALSE;

Why do we always throttle down, but only throttle up if the
-U option is specified?

> static void
> Usage(const char *progName)
> {
>     fprintf(stderr, 
>     	"Usage: %s [-n nickname] [-p port] [-d dbdir] [-c connections]\n"
>-	"          [-3DNTovq] [-2 filename]\n"
>+ 	"          [-3DTovq] [-2 filename] [-P fullhandshakespercentage | -N]\n"
> 	"          [-w dbpasswd] [-C cipher(s)] [-t threads] hostname\n"

You need to list the -U option.

>+ 	"       -P means do a specified percentage of full handshakes\n"

Would be nice to mention the range (0-100 or 0.0-1.0).

> PRLock    * threadLock;
>-PRCondVar * threadStartQ;
>-PRCondVar * threadEndQ;
> 
> int         numUsed;
> int         numRunning;
> PRInt32     numConnected;
> int         max_threads = 8;	/* default much less than max. */
>-
>-typedef enum { rs_idle = 0, rs_running = 1, rs_zombie = 2 } runState;
>+int         max_max_threads;    /* peak threads allowed */

It seems that numRunning is set but not used, and therefore can
be deleted.  It was previously used to reap the threads.  Now
you make the threads joinable and use PR_JoinThread to reap them.

max_threads and max_max_threads are confusing.	Please add comments.
By reading the code, I gathered that max_threads is a moving
maximum that is throttled up and down, and max_max_threads is a
static, absolute maximum; and that max_threads cannot exceed
max_max_threads.

>+                    if ( (now - lastConnectFailure > 10000) &&
>+                        ( (!lastThrottleUp) || ( (now - lastThrottleUp) >= 500 ) ) &&
>+                        (lastConnectSuccess > lastConnectFailure) ) {
>+                        /* try throttling up by one thread */
>+                        max_threads = PR_MIN(max_max_threads, max_threads+1);
>+                        fprintf(stderr,"max_threads set up to %d\n", max_threads);
>+                        lastThrottleUp = now;
>+                    }

The time unit of PRTime is wrong. PRTime is in microseconds,
not milliseconds. You can use the PR_USEC_PER_SEC and
PR_USEC_PER_MSEC macros.

>     if (!NoReuse) {
>-	rv = launch_thread(do_connects, &addr, model_sock, i);
>-	--connections;
>-	++i;
>+        remaining_connections = 1;
>+	rv = launch_thread(do_connects, &addr, model_sock, 0);
> 	/* wait for the first connection to terminate, then launch the rest. */
> 	reap_threads();
>+        remaining_connections = total_connections - 1 ;
>     }

It is better to handle NoReuse as fullhs==100 and replace the
above code by simply

      if (NoReuse) {
	  fullhs = 100;
      }

>+	for (i=0;i<max_threads;i++) {
> 	    rv = launch_thread(do_connects, &addr, model_sock, i);
>-	    ++i;
>-	} while (--connections > 0);
>+	}

Since max_threads is protected by threadLock, we can't
use max_threads without lock protection as soon as we
create threads.  So you should save max_threads in a
local variable and use the local variable in the for
loop above.

Comment 18

13 years ago
Please ignore my previous comment about if (!NoReuse).
I misunderstood your patch.

But the relation between fullhs and NoReuse is confusing.
I suggest that we only use fullhs in the code, and let
the -N option set fullhs to 100.  fullhs probably could
be initialize to 0 instead of -1.

Here is some explanation about the -U option, which enables the ability of 
the code to attempt to "throttle up" as well as down.  It is a compromise
between the original intended purpose of throttling and a new purpose that
Julien desired.  

When strsclnt was devised, the objective was to complete the specified 
number of connections, with as few wasted (failed) connection attempts as 
possible.  In the early days of strsclnt, it would hammer a server with
connection requests.  Often it would overflow the server's connection 
request queue, resulting in MANY connections appearing to be "refused" or 
"reset by peer".  A number of things were tried to reduce the number of 
failed connection attempts.

Changing strsclnt to always retry every failed connection attempt caused
it to hammer the poor server even harder, causing a large amount of network
bandwidth and (more importantly) server CPU time to be spent dealing with
failed connection attempts.  One attempt added an increasing delay time
before successive attempt to reconnect.  But the problem was that if strsclnt
fundamentally had more threads trying to work that the server could 
sustain, then all throughout the entire run, we would experience failed
connection attempts.  This resulted in widely varying test results, even
on otherwise idle systems.  We'd see big differences in the performance
results between two successive runs on the same client/server/network.

So, in rev 1.8, I changed strsclnt to "throttle back" the number of threads 
that would simultaneously pound on the server, until connection failres 
stopped happening.  Then it would sustain that number going forward.  
The result was a greatly reduced number of failed connection attempts,
AND a great increase in the repeatability of test results, getting the 
same values from run to run.  The code was specifically made to NOT 
attempt to throttle back up to avoid a continual ongoing stream of 
connection failures during the test.  I view that behavior as requisite
to obtaining useful performance numbers from strsclnt.

More recently, Julien has begun to use strsclnt in a very different way,
not as a performance measurement, but merely as an ongoing server test.
He wants it to continually pound the server, no matter how the server responds. 
He wants to be able to stop the server in a debugger for a while (minutes), 
during which time the server will accept no new connections, and will throttle
down to a level approaching zero, then when he continues the server from the
breakpoint, he wants the server to throttle back up.  In reviewing some of 
the earlier patches for this RFE, I insisted that the code preserve its 
existing throttling behavior (which is useful for performance measurement), 
and suggested that any new throttling behavior be made optional, enabled by
a command line option.  

That's how -U came to be.   
In reply to comment 14, I want to review the code that is going to be
potentially checked in.  I think the practice of making a change and 
checking in the changed results without additional code review is OK for
VERY small changes, e.g. one line or two, where the change is very clearly
spelled out before hand and agreed by reviewer and patch writer.

But in this case, there is MUCH code that has to be changed to make it
fit in 80 columns.  That change would be of such a great magnitude that
I'd insist on another review before it was committed.  So, when you've
addressed Wan-Teh's comments, please fix the line length problems in the
next patch, and submit it for review again.  Thanks.
(Assignee)

Comment 21

13 years ago
Nelson,

Re : comment #19, Actually, I'm using strsclnt as a performance tool, even with
-U . The servers I'm testing have a very large accept queue. In my experience,
the only condition that ever seems to cause strsclnt to throttle down is when
the server goes down completely, because either it crashes or I turn it off to
reconfigure it. At that point, all my strsclnt on multiple machines throttle
down, and never go back up. The strsclnt processes on each client machine all
have to be manually CTRL-C'ed and restarted. I want the -U option so that I no
longer have to restart all my clients when I make a server-side change. This is
very helpful when testing a big server which requires a large number of client
machines to peak.

Re: comment #20 . I understand your concern about wanting to review the new
format for a large patch. But I would like to minimize the number of patches
that have to be created and reviewed. Creating new patches for each review
comment would make the development process way more time consuming than it
actually needs to be. So, I will hold off generating the next patch until you
take a complete pass at the code, and will address both of your possible issues
and Wan-Teh's together in one update.
Comment on attachment 183331 [details] [diff] [review]
add throttling down by default, and up with -U option

1234567890123456789012345678901234567890123456789012345678901234567890123456789
0

Overall this approach to throttling seems OK, but I have a few comments.

1) With this patch, the variable named "max_threads" is in no way a maximum.
I suggest renaming it current_threads or target_threads, and renaming
the new variable max_max_threads to be max_threads, since it really is a 
maximum.

2) The code should not attempt to start more threads than there are connections 
to be performed.  

3) If the user specifies 100 connections, that should be 100 succesful 
connections, not merely 100 connections attempted (with some refused).
As Wan-Teh pointed out, the remaining_connections logic will count 
attempted connections whether they actually succeed in connecting or
not, and hence will effectively not retry connections that fail to connect.  

The remaining_connections logic probably needs two counters, one for 
connections that have (or have not yet) been initiated, and one for
connections that have (or have not yet) been completed.  If you're
trying to do (say) 100 connections, you don't want to count the 
connection done until it's done (and the connection actually connected),
but you don't want to start a 101st connection just because the 100th
isn't done.  

4) In function thread_wrapper, there are two places that indent unnecessarily.
Since this new code tends to exceed 80 columns, reducing unnecessary 
indentation could help that a lot.

>+            if (!ThrottleUp) {
>+                die = PR_TRUE;
>+            } else {
>+                if (remaining_connections > 0) {
     ...
>+                } else {
>+                    /* no more connections left, we are done */
>+                    die = PR_TRUE;
>+                }
>+            }

could correctly be 

>+            if (!ThrottleUp) {
>+                die = PR_TRUE;
>+            } else if (remaining_connections > 0) {
      ...
>+            } else {
>+                /* no more connections left, we are done */
>+                die = PR_TRUE;
>+            }

In fact, that whole thing could be reduced to a single if-else statement.

>+            if (!ThrottleUp || remaining_connections <= 0) {
>+                die = PR_TRUE;
>+            } else {
>+                /* no more connections left, we are done */
      ...


Later in thread_wrapper,

>+        } else {
>+            /* this thread should run */
>+            if (--remaining_connections >= 0) {
>+                doop = PR_TRUE;
>+            } else {
>+                die = PR_TRUE;
>+            }
>+        }

could correctly be

>+        } else if (--remaining_connections >= 0) {
>+            /* this thread should run */
>+            doop = PR_TRUE;
>+        } else {
>+            die = PR_TRUE;
>+        }

5. I found the variables named total_connections_in_100s perplexing.
Maybe "hundreds_of_connections" and "connections_mod_100" ?

6. For the new mode which is neither all restart handshakes nor all
restart handshakes, but rather a mixture, using the SSL_NO_CACHE option to 
force new handshakes has some undesirable properties with respect to 
simulating real world behavior.  It does not cause the newly created
SSL sessions to ever be reused, as they would normally be, but instead
continually restarts that same one connection over and over.  

You can make it reuse the newly created SSL sessions by separating
the sessions using the function SSL_SetSockPeerID.  Create a small
string, and set it into each SSL socket with SSL_SetSockPeerID.
Change the string contents each time you want to force a new full
handshake, but not when you don't.
Attachment #183331 - Flags: review?(nelson) → review-
(Assignee)

Comment 23

13 years ago
Re Wan-Teh's comment #17 :

I will address 1, 2 and 5 in my upcoming patch.

Regarding 3, which is also Nelson's #3, I believe the behavior is unchanged from
the existing strsclnt, which took no precautions to count failed connections
separately from successful ones.
Also, it is worth noting that the program retries forever if the PR_Connect
errors are PR_CONNECT_REFUSED_ERROR or PR_CONNECT_RESET_ERROR . Connections will
only be missed if the error is different .
If this is of concern, I suggest a separate bug/RFE be filed.

4. I believe this is a matter of style. I wrote the expressions in the way that
makes the most logical sense to me, even though they may not be the shortest
ones to read.

I will address all your other issues. Nelson already answered about -U and why
the default is what it is.

Thanks for catching the error with the time unit. I was wondering why strsclnt
was throttling back up so fast ...
(Assignee)

Comment 24

13 years ago
Re : Nelson's comment #22

Thanks for the review.
My upcoming patch will clean up the indentation . This wasn't that big of a
change. The patch only added 9 lines over 80 columns, including 3 of comments .

1) I will address this, which was also a concern of Wan-Teh's.
2) I will fix this too, thanks for noticing this problem.
3) Please see my response to Wan-Teh's review.
4) I will change this too for the first if, since it saves one logical indented
block . In the second case, the comment is about the thread still being runnable
(because the TID is less than active_threads) and needs to be placed before the
test of remaining_connections, so I will keep it the same. The code is much
shorter there so it's not a problem.
5) I will try to find better variable names.
6) Thanks for finding this. Indeed, after running for a while, the server
session cache will fill up and some "accidental" full handshakes will happen due
to restarts on the initial session failing. By creating new sessions and caching
them, this is less likely to happen - unless there are more client threads than
server cache entries. I will make this change.

I should also note that when testing against public SSL web sites with
load-balancers that are not SSL-session aware such as Microsoft, the actual
percentage results of full/restarts are very random to what was requested,
because one never knows after a new PR_Connect if the session will be reused or
not since it may go to another server. I don't intend to attempt to fix this at
the time, but it could be tried by examining the cache stats during the run, if
the cache stats were thread safe (see bug 292181) .

Another approach would be to use keep-alive HTTP requests rather than do another
PR_Connect, which would cause the connection to stay on the same back-end SSL
server, and just force another handshake. But keep-alive support hasn't been
added to strsclnt either.
(Assignee)

Comment 25

13 years ago
Nelson,

I implemented your suggestion #6. It works fine in single-threaded mode.
However, I'm not getting the expected results when running in multithreaded mode.

The problem is in the case I try to do a restart handshake and keep the sock
peer ID the same as it was. That fact alone isn't sufficient to guarantee that
the handshake will be a restart. There is a requirement that a full handshake
has previously been completed with that ID. When multiple threads get in to SSL
with the same peer ID, but no full handshake has been completed yet, they each
end up doing full handshakes, and immediately overwriting their cache entries as
soon as their respective handshakes complete. So, I'm getting a lot more full
handshakes than I should ! I would have expected some serialization to happen in
the SSL library in this case, but there is none. It's probably not desirable.

These are my results for 1000 connections with 100 threads vs 1 thread, trying
to do 50% full and 50% restarts :

strsclnt -d . -o -o -D -P 50 -p 3000 -c 1000 -t 100 variation
strsclnt: 0 cache hits; 1 cache misses, 0 cache not reusable
strsclnt: 183 cache hits; 817 cache misses, 0 cache not reusable

strsclnt -d . -o -o -D -P 50 -p 3000 -c 1000 -t 1 variation
strsclnt: 0 cache hits; 1 cache misses, 0 cache not reusable
strsclnt: 500 cache hits; 500 cache misses, 0 cache not reusable

I verified that the sequential pattern of sockPeerIDs is correct in both cases,
so my strsclnt code isn't the problem. FYI, the new code is below :

    if (fullhs != NO_FULLHS_PERCENTAGE) {
        char sockPeerIDString[512];
        static PRInt32 sockPeerID = 0; /* atomically incremented */
        PRInt32 thisPeerID = sockPeerID; /* reuse previous sockPeerID for
                                            restart handhsake */
        
        PRInt32 savid = PR_AtomicIncrement(&globalconid);
        PRInt32 conid = 1 + (savid - 1) % 100;
        /* don't change peer ID on the very first handshake, which is always
           a full, so the session gets stored into the client cache */
        if ( (savid != 1) &&
            ( ( (savid <= total_connections_rounded_down_to_hundreds) &&
                (conid <= fullhs) ) ||
              (conid*100 <= total_connections_modulo_100*fullhs ) ) ) {
            /* force a full handshake by changing the socket peer ID */
            thisPeerID = PR_AtomicIncrement(&sockPeerID);
        }
        PR_snprintf(sockPeerIDString, sizeof(sockPeerIDString), "ID%d",
                    thisPeerID);
        SSL_SetSockPeerID(ssl_sock, sockPeerIDString);
    }

I think I can fix this problem by using the handshake callback and remembering
the last sockPeerID for a successful handshake.

Another concern is that I'm incurring the cost of a PR_snprintf on my stack
buffer to create the peer ID string for every connection. This will bring down
the strsclnt performance vs just setting the SSL_NO_CACHE option . I haven't yet
measured by how much. If the SSL_SetSockPeerID API could take an integer instead
of a string, it would be more efficient. It would also remove the cost of a
PORT_Strdup within it.
(Assignee)

Comment 26

13 years ago
Nelson,

Is there anyway to tell within the handshake callback if the handshake that just
completed was a full or a restart ?
(Assignee)

Comment 27

13 years ago
Created attachment 188630 [details] [diff] [review]
implement changes from reviews

Nelson,

I encountered an insurmountable problem while implementing your suggestion #6.
The client cache only purges itself every 24 hours. When creating new peer IDs
repeatedly, this means the heap grows continuously. It isn't technically a
leak, since the memory will eventually be freed. However, the heap growth
causes performance to go down the toilet very rapidly. After 15 minutes it is
divided in half already. So, for all practical purposes, this approach is not
usable at the moment. We need a mechanism to set the SSL session expiration
timeout for client-side. I see that there is one in server mode, which is set
by calling the SSL_ConfigServerSessionIDCache function . But I don't see one
for the client cache. I will file a separate bug.

In the meantime, I believe that setting SSL_NO_CACHE to 1 is adequate. It will
work fine for all servers that have a sufficiently large session ID cache. The
patch I have attached does this. There is USE_SOCK_PEER_ID that can be set to
enable the code that uses Peer IDs. Once the SSL client cache issues are
resolved, that code can be turned on. I wouldn't want to hold this patch until
that happens, however.
Attachment #183331 - Attachment is obsolete: true
Attachment #188630 - Flags: superreview?(nelson)
Attachment #188630 - Flags: review?(wtchang)
Issues like heap growth are, IMO, secondary to the primary objective of 
applying a realistic load to the server.  While a growth of megabytes per
minute wouldn't be good, I think some amount of growth in strsclnt would be
acceptable.

The client cache timeout is settable, but the interface may not be public. :(
You should be able to set the timeout times for strsclnt connections down to 
mere seconds, if desired.  
Originally, in HCL, the cache timeout times were global variables, directly
settable by the application.  I think they are declared in one of SSL's two
public header files, but I'll bet they're not in nss.def.  I agree that
getter/setter functions are better, in part because of Windows' problem with
addresses of data items in other shared libs.  

Your code doesn't spread out the full handshakes uniformly over the connections,
but lumps them together within groups of 100 handshakes.  IINM it does N 
connections as restarts, followed by 100-N as fulls.  I consider that approach 
undesirable, but in this case, you can take advantage of it to mitigate your
cache growth problem.  You can use the string technique for connection N, and
use SSL_NO_CACHE for the subsequent 99-N connections.  
In answer to comment 26: not directly.  One way to tell is by the occurrance
(or not) of a prior callback to the application's cert validity callback.
The application could keep track for each socket of whether or not libSSL has
called the cert chain validation callback on that socket, and check it at the
time of the handshake completion callback.  
(Assignee)

Comment 30

13 years ago
Nelson,

Re: comment #28

Concerning the goal of applying a realistic load to the server, with the size
servers we have today, that goal is not achieved when using new SockPeerIDs to
provoke full handshakes.

I took the case of N = 99, ie. the client only does 1 full handshake followed by
99 restarts. In this case, your suggestion of using a different algorithm to do
only 1 out of 100 connections with a new peer ID and the following 99 - N with
the SSL_NO_CACHE option does not apply, since 99 - N = 0 .

Doing N = 99 (ie . strsclnt -P 1) should limit memory growth the most, since
only 1 out of 100 connections will cause a new SSL client cache entry to be created.

I tried a single opteron client doing 1% full handshakes hitting a single
opteron server, both running code from NSS_PERFORMANCE_HACKS_BRANCH, except the
strsclnt was running the code in the patch called "implement changes from
reviews" with USE_SOCK_PEER_ID defined .

Here is the performance in ops/s (equivalent to total handshakes/s) as measured
by the server for the last 20 second interval :

8080.39 ops/second, 100 threads

Here is the performance as measured by the server after running the test for 13
minutes :

5066.79 ops/second, 100 threads

All the 20s data points in between showed decreasing ops/s .

Ie. after running for 13 minutes, strsclnt only applies a load of about 5000
ops/s, even though it started at 8000 ops/s. If I CTRL-C the strsclnt process
and immediately restart it, the load goes back to 8000 ops/s. This is entirely
due to the client cache session growth.

At t = 13 mins, strsclnt resident memory size is 82MB, up from 17MB at t = 0 .
Also, the server has 40% idle time at t = 13 mins, vs 0% at t = 0 .

IMO, this data clearly shows that your suggestion of using SSL_SockPeerID to
cause full handshakes in a benchmark tool is not workable. Until something
changes in the SSL client cache implementation, I think strsclnt must use
SSL_NO_CACHE for doing full handshakes. I filed bug 300343 to track those changes.
(Assignee)

Comment 31

13 years ago
In the previous message, I meant that the *first* 20 seconds of the test yielded
8080 ops/s, not the last.

The bug for limiting the client cache size is actually bug 300043 .
I just filed bug 300163 to add a per-socket setter value for the SSL session
expiration timeout.
(Assignee)

Updated

13 years ago
Attachment #183331 - Flags: superreview?(wtchang)
I agree that the performance limitations of the client cache when used 
in this test are severe enough that we shouldn't attempt to implement
what I suggested in comment 22 point 6 until that the cache is better,
and should continue to use SSL_NO_CACHE until then.  

I will review the newest patch above RSN.
Comment on attachment 188630 [details] [diff] [review]
implement changes from reviews

This patch still has one comment that is too long.
Also, the replacement variables names seem even LONGER than the ones 
about which I previously lamented!  
But this patch resolves the important issues, IMO.
r+ = nelson@bolyard
Attachment #188630 - Flags: superreview?(nelson) → review+
(Assignee)

Comment 34

13 years ago
Thanks for the review, Nelson ! I checked the patch into the tip .

Checking in strsclnt.c;
/cvsroot/mozilla/security/nss/cmd/strsclnt/strsclnt.c,v  <--  strsclnt.c
new revision: 1.41; previous revision: 1.40
done
Status: NEW → RESOLVED
Last Resolved: 13 years ago
Resolution: --- → FIXED
(Assignee)

Comment 35

13 years ago
Comment on attachment 188630 [details] [diff] [review]
implement changes from reviews

Clearing review request. This patch was checked in several weeks ago.
Attachment #188630 - Flags: review?(wtchang)
You need to log in before you can comment on or make changes to this bug.