Closed Bug 292151 Opened 20 years ago Closed 19 years ago

strsclnt should not start threads for each connection; and should allow specifying ratio of full handshakes

Categories

(NSS :: Tools, enhancement, P2)

Version: 3.10

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: julien.pierre, Assigned: julien.pierre)

Details

Attachments

(1 file, 5 obsolete files)

It has been observed that we need more client CPU hardware in order to peak an SSL server when doing SSL restart handshakes with strsclnt. The amount of CPU consumed should in fact be similar on each side. I traced the problem to inefficient logic in strsclnt: it starts a thread for every connection, so a lot of time is spent starting threads and waiting for them. Since restart connections don't take much CPU, the thread startup overhead is a significant fraction of the total. Another deficiency in strsclnt is that you can only do 100% full handshakes with the -N option, or 0% without it. In order to simulate benchmarks, I propose to add a -P <x> option, where x is the desired percentage of full handshakes. Patch forthcoming.
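For illustration, an invocation under the proposed option might look like this (hypothetical port and hostname; the exact -P semantics were worked out in the review comments below):

    strsclnt -d . -p 3000 -c 1000 -t 100 -P 25 hostname

i.e. out of 1000 connections, roughly 25% would be full handshakes and the remaining 75% would be restart handshakes.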
Severity: normal → enhancement
Priority: -- → P2
Target Milestone: --- → 3.10.1
Attached patch strsclnt patch (obsolete) — Splinter Review
This patch distributes the connections among max_threads threads, as opposed to starting a thread for each new connection up to max_threads, waiting for connections to end, and starting threads again until all connections are done. The patch does not change the number of connections that strsclnt will do, as passed to -c; they are just run more efficiently. This has the effect of improving the ops/s by 16%. I measured the ops/s with selfserv on an Opteron server, and strsclnt running on both Solaris Sparc and Opteron, when doing restart handshakes. On a Win2k client with a hyperthreaded Pentium 4, the improvement was 26%!
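For illustration, the distribution scheme described above amounts to the following sketch (variable names match the hunks quoted in the review below; error handling omitted):

    /* Launch exactly max_threads threads, each with a precomputed share of
     * the total connection count; thread 0 also absorbs the remainder of
     * the integer division so that no connection is lost. */
    for (i = 0; i < max_threads; i++) {
        int todo = connections / max_threads;
        if (i == 0) {
            todo += connections % max_threads;
        }
        rv = launch_thread(do_connects, &addr, model_sock, i, todo);
    }

Each thread then loops over its own share inside thread_wrapper, instead of being created and reaped once per connection.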
Attachment #182021 - Flags: review?(nelson)
Comment on attachment 182021 [details] [diff] [review] strsclnt patch

Wan-Teh, would you mind reviewing this patch?
Attachment #182021 - Flags: superreview?
Attachment #182021 - Flags: review?(wtchang)
Attached patch merged patch (obsolete) — Splinter Review
Updated the patch to merge with the current tip.
Attachment #182021 - Attachment is obsolete: true
Attachment #182839 - Flags: superreview?(wtchang)
Attachment #182839 - Flags: review?(nelson)
Attachment #182021 - Flags: superreview?
Attachment #182021 - Flags: review?(wtchang)
Attachment #182021 - Flags: review?(nelson)
Comment on attachment 182839 [details] [diff] [review] merged patch

Julien, this patch has some minor problems.

> void
> thread_wrapper(void * arg)
> {
>     perThread * slot = (perThread *)arg;
>
>     /* wait for parent to finish launching us before proceeding. */
>     PR_Lock(threadLock);
>     PR_Unlock(threadLock);
>
>-    slot->rv = (* slot->startFunc)(slot->a, slot->b, slot->c);
>+    while (slot->connections--) {
>+        slot->rv = (* slot->startFunc)(slot->a, slot->b, slot->c);
>+    }
>+
>
>     /* Handle cleanup of thread here. */
>     PRINTF("strsclnt: Thread in slot %d returned %d\n",
>            slot - threads, slot->rv);

The PRINTF statement should be moved inside the while loop.

>+    if (fullhs) {
>+        PRInt32 savid = PR_AtomicIncrement(&globalconid);
>+        PRInt32 conid = 1 + (savid % 100);
>+        /* don't set this option on the very first handshake, which is always
>+           a full, so the session gets stored into the client cache */
>+        if (conid <= fullhs && savid != 1) {
>+            rv = SSL_OptionSet(ssl_sock, SSL_NO_CACHE, 1);
>+        }
>+    }

This doesn't always give the correct percentage. For example, if I say -c 50 -P 50, I will get 49 full handshakes, which is 98%. You can avoid this problem by requiring the total number of connections to be a multiple of 100.

It also has an off-by-one error even if the total number of connections is a multiple of 100. For example, if I say -c 100 -P 1, I will get two full handshakes, as opposed to one (1% of 100): the first handshake (always a full handshake) and the 100th handshake (savid=100, conid=1). This off-by-one error can be fixed by defining conid as follows:

    PRInt32 conid = 1 + (savid - 1) % 100;

    if (conid <= fullhs && savid != 1) {

or

    PRInt32 conid = (savid - 1) % 100;
    ...
    if (conid < fullhs && savid != 1) {

>+    /* Start up the threads */
>+    for (i=0;i<max_threads;i++) {
>+        int todo = connections / max_threads;
>+        if (0 == i) {
>+            todo += connections % max_threads;
>+        }
>+        rv = launch_thread(do_connects, &addr, model_sock, i, todo);
>+    }

Since connections/max_threads is integer division, todo may be 0, so you need to handle that case. (I believe it will just work with your code.)

Since the number of threads is now at most max_threads, you should remove the code that keeps the number of threads from exceeding max_threads. I believe it is just the following code at the beginning of launch_thread:

    while (numRunning >= max_threads) {
        PR_WaitCondVar(threadStartQ, PR_INTERVAL_NO_TIMEOUT);
    }
    for (i = 0; i < numUsed; ++i) {
        if (threads[i].running == rs_idle)
            break;
    }
    if (i >= numUsed) {
        if (i >= MAX_THREADS) {
            /* something's really wrong here. */
            PORT_Assert(i < MAX_THREADS);
            PR_Unlock(threadLock);
            return SECFailure;
        }
        ++numUsed;
        PORT_Assert(numUsed == i + 1);
    }

>+    case 'P': fullhs = PR_MIN(100, PORT_Atoi(optstate->value)); break;

This case should be moved up to where the capital letter cases are. It may be better to fail if the specified percentage is greater than 100.

You need to handle the interaction between -N (no session reuse) and -P (full handshake percentage). For example, how would you handle -N -P 50? Fail with a usage message, or honor the last specified option (-P 50) with a warning message that -N is ignored?
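To make the off-by-one concrete, here is a small standalone check (my own test harness, not part of the patch) that counts forced full handshakes for -c 100 -P 1 under both definitions of conid:

    #include <stdio.h>

    int main(void)
    {
        int fullhs = 1;
        int fulls = 1;              /* savid 1 is always a full handshake */
        int savid;
        for (savid = 2; savid <= 100; savid++) {
            int conid = 1 + (savid % 100);       /* original: prints 2 */
            /* int conid = 1 + (savid - 1) % 100;   fixed: prints 1 */
            if (conid <= fullhs)
                fulls++;
        }
        printf("full handshakes: %d\n", fulls);
        return 0;
    }

With the original definition, savid=100 yields conid=1 and gets forced to a full handshake in addition to the first one; with the fix, no extra connection qualifies.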
Attachment #182839 - Flags: superreview?(wtchang) → superreview-
Just wanted to clarify an earlier comment of mine. I said:

> Since the number of threads is now at most max_threads, you
> should remove the code that keeps the number of threads from
> exceeding max_threads. I believe it is just the following
> code at the beginning of launch_thread:
> [ code listing omitted ]

I didn't mean you should remove all of the code I listed. You should remove some of it and possibly update the rest.
Attachment #182839 - Flags: review?(nelson)
Attached patch updated patch (obsolete) — Splinter Review
Wan-Teh, thanks for the review. This patch fixes:

1) PRINTF moved.

2) Your fix for the off-by-one error. Thanks! For the case where connections is not a multiple of 100, I don't think it should be considered an error - the percentage just won't be exact, because it can't be.

3) Regarding connections / max_threads and the possible zero result: I'm aware of it. That's why I made the first thread always perform a number of connections equal to the remainder of that division. I also added a short-circuit in launch_thread for the case where connections is zero.

4) Regarding the command-line options: I moved the processing for -P, and now check for a range (valid is 0 to 100). I made the -N option incompatible with -P - usage will be displayed.

5) Regarding the existing queue logic: I believe none of it made any sense anymore. The throttling feature was broken, since its purpose was to stop starting new threads, but it couldn't slow down existing threads making connections. This patch removes it altogether and changes reap_threads to be a loop of PR_JoinThread calls.
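A sketch of what reap_threads becomes under this scheme (assuming the threads[] slot array and numUsed globals from strsclnt; not the literal patch text):

    int
    reap_threads(void)
    {
        int i;
        /* Threads are created joinable and each runs a fixed share of the
         * connections, so reaping reduces to joining each one in turn. */
        for (i = 0; i < numUsed; ++i) {
            if (threads[i].prThread != NULL) {
                PR_JoinThread(threads[i].prThread);
                threads[i].prThread = NULL;
            }
        }
        return 0;
    }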
Attachment #182839 - Attachment is obsolete: true
Attachment #183115 - Flags: review?(wtchang)
The throttling feature did work, and IMO was necessary. I think it must not be eliminated, but rather replaced with a new implementation.
You can get an exact percentage even when "connections" is not a multiple of 100. For example, 50% of 50 connections is exactly 25. With your algorithm, you just need to add a special case to handle the remainder (connections % 100). The following is an outline of how to do this. (The code can be improved and rewritten to suit your taste.)

    int connections; /* total number of connections */
    int connections_in_100s;
    int connections_remainder;

    /* get "connections" from the -c command-line argument */
    ...
    connections_remainder = connections % 100;
    connections_in_100s = connections - connections_remainder;
    ...
    if (fullhs != NO_FULLHS_PERCENTAGE) {
        PRInt32 savid = PR_AtomicIncrement(&globalconid);
        PRInt32 conid = 1 + (savid - 1) % 100;
        /* don't set this option on the very first handshake, which is always
           a full, so the session gets stored into the client cache */
        if (savid <= connections_in_100s) {
            /* your current algorithm */
            if (conid <= fullhs && savid != 1) {
                rv = SSL_OptionSet(ssl_sock, SSL_NO_CACHE, 1);
            }
        } else {
            /* if conid/connections_remainder <= fullhs/100 */
            if (conid*100 <= connections_remainder*fullhs && savid != 1) {
                rv = SSL_OptionSet(ssl_sock, SSL_NO_CACHE, 1);
            }
        }
    }

I won't have time to review your new patch right away. If that's a problem, could you ask someone else to review it?
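Checking that outline by hand (my arithmetic, not from the comment above): with -c 50 -P 50, connections_in_100s is 0 and connections_remainder is 50, so every connection takes the else branch. The test conid*100 <= 50*50 holds for conid 1 through 25, and savid != 1 excludes the first connection, which performs a full handshake anyway. So connections 2 through 25 get SSL_NO_CACHE: 24 forced fulls plus the initial full is exactly 25 of 50, i.e. 50%.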
Wan-Teh, thanks for that code, I will incorporate it.

Nelson, the existing throttling feature worked poorly. I don't think it should be re-implemented in the same way. The old algorithm throttled down the number of threads in the client, but it would never throttle back up. The client has no way of telling whether the server is temporarily overloaded (as is the case with a full listen queue); whether it simply went away temporarily, perhaps because it crashed or was manually stopped; or whether it suddenly has more available sockets in its queue. When using multiple client machines with strsclnt, which aren't aware of each other, throttling up is a necessity. Otherwise, if a client throttles down (say, because the server was manually restarted), it needs to be restarted in order to provide full load again. Perhaps there should be separate options for throttling down and throttling down/up.
This patch just incorporates Wan-Teh's code; it does not yet add throttling back.
Attachment #183115 - Attachment is obsolete: true
Attachment #183115 - Flags: review?(wtchang)
Comment on attachment 183243 [details] [diff] [review] fix percentages when connections is not a multiple of 100

>+    if (savid <= total_connections_in_100s) {
>+        /* your current algorithm */
>+        if (conid <= fullhs && savid != 1) {
>+            rv = SSL_OptionSet(ssl_sock, SSL_NO_CACHE, 1);
>+        }
>+    } else {
>+        /* if conid/connections_remainder <= fullhs/100 */
>+        if (conid*100 <= total_connections_remainder*fullhs && savid != 1) {
>+            rv = SSL_OptionSet(ssl_sock, SSL_NO_CACHE, 1);
>+        }
>+    }
>+ }

Julien, please remove the "your current algorithm" comment :-) Also, you should factor out the test "savid != 1", which is common to both cases.
Also documented command-line options that were missing from the usage message. The -U throttle-up feature kicks in at a rate of +2 threads/s if there hasn't been a connect failure in the last 10s, and if there hasn't been a connect failure more recent than the last connect success. The total number of threads will never exceed the original value specified on the command line with -t.
Attachment #183243 - Attachment is obsolete: true
Attachment #183331 - Flags: superreview?(wtchang)
Attachment #183331 - Flags: review?(nelson)
Comment on attachment 183331 [details] [diff] [review] add throttling down by default, and up with -U option

I haven't yet fully reviewed this code, but I see one issue that will ensure an r-: many lines substantially exceed 80 columns.
Nelson, please review the code in the patch. If the code is correct and the columns are the only issue, I can truncate the lines before the check-in, without an extra patch and review.
QA Contact: bishakhabanerjee → jason.m.reid
Nelson, Wan-Teh, could you please review the code in the latest patch here? I did a lot of work in May to get this in. This is needed so we can better simulate the SPECweb client. Thanks.
Retargeting for 3.11.
Target Milestone: 3.10.1 → 3.11
Comment on attachment 183331 [details] [diff] [review] add throttling down by default, and up with -U option

First, some global comments.

1. You need to document the meaning of each new global variable and which lock protects it. By reading your patch, I gathered the following:
- globalconid is atomically incremented.
- threadLock protects remaining_connections, lastConnectFailure, lastConnectSuccess, lastThrottleUp, numUsed, numRunning, max_threads, and max_max_threads.
It would be good to declare those variables right below the threadLock declaration.

2. PR_Now is a system call, so you should call it before you lock threadLock. This is especially important because your patch makes threadLock a heavily used lock.

3. The decrement of remaining_connections does not take into account connect failures. So the actual completed connections may be fewer than what's specified with the -c option. This may be the intended behavior.

4. You write code like this:

    if (!expr) {
        A;
    } else {
        B;
    }

It is better to say:

    if (expr) {
        B;
    } else {
        A;
    }

Similarly, if (!(a < b)) should be changed to if (a >= b).

5. Please use NSS's block comment style (* at the beginning of every comment line).

Now the specific comments.

> /* This global string is so that client main can see
>  * which ciphers to use.
>  */
>
>+#define NO_FULLHS_PERCENTAGE -1
>+
> static const char *cipherString;

The macro definition should be moved before the comment, because the comment describes the global variable cipherString.

>+static PRBool ThrottleUp = PR_FALSE;

Why do we always throttle down, but only throttle up if the -U option is specified?

> static void
> Usage(const char *progName)
> {
>     fprintf(stderr,
>         "Usage: %s [-n nickname] [-p port] [-d dbdir] [-c connections]\n"
>-        " [-3DNTovq] [-2 filename]\n"
>+        " [-3DTovq] [-2 filename] [-P fullhandshakespercentage | -N]\n"
>         " [-w dbpasswd] [-C cipher(s)] [-t threads] hostname\n"

You need to list the -U option.

>+ " -P means do a specified percentage of full handshakes\n"

It would be nice to mention the range (0-100 or 0.0-1.0).

> PRLock * threadLock;
>-PRCondVar * threadStartQ;
>-PRCondVar * threadEndQ;
>
> int numUsed;
> int numRunning;
> PRInt32 numConnected;
> int max_threads = 8; /* default much less than max. */
>-
>-typedef enum { rs_idle = 0, rs_running = 1, rs_zombie = 2 } runState;
>+int max_max_threads; /* peak threads allowed */

It seems that numRunning is set but not used, and therefore can be deleted. It was previously used to reap the threads. Now you make the threads joinable and use PR_JoinThread to reap them.

max_threads and max_max_threads are confusing. Please add comments. By reading the code, I gathered that max_threads is a moving maximum that is throttled up and down, and max_max_threads is a static, absolute maximum; and that max_threads cannot exceed max_max_threads.

>+    if ( (now - lastConnectFailure > 10000) &&
>+         ( (!lastThrottleUp) || ( (now - lastThrottleUp) >= 500 ) ) &&
>+         (lastConnectSuccess > lastConnectFailure) ) {
>+        /* try throttling up by one thread */
>+        max_threads = PR_MIN(max_max_threads, max_threads+1);
>+        fprintf(stderr,"max_threads set up to %d\n", max_threads);
>+        lastThrottleUp = now;
>+    }

The time unit of PRTime is wrong. PRTime is in microseconds, not milliseconds. You can use the PR_USEC_PER_SEC and PR_USEC_PER_MSEC macros.
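For reference, the same hunk rewritten under the microsecond interpretation the review asks for would look roughly like this (a sketch, not the checked-in code; PR_Now moved before the lock per point 2):

    PRTime now = PR_Now();  /* microseconds; call before taking threadLock */
    PR_Lock(threadLock);
    ...
    if ( (now - lastConnectFailure > 10 * PR_USEC_PER_SEC) &&
         ( !lastThrottleUp ||
           (now - lastThrottleUp) >= 500 * PR_USEC_PER_MSEC ) &&
         (lastConnectSuccess > lastConnectFailure) ) {
        /* try throttling up by one thread, never past the absolute cap;
         * one thread per 500 ms gives the +2 threads/s rate */
        max_threads = PR_MIN(max_max_threads, max_threads + 1);
        lastThrottleUp = now;
    }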
> if (!NoReuse) {
>-    rv = launch_thread(do_connects, &addr, model_sock, i);
>-    --connections;
>-    ++i;
>+    remaining_connections = 1;
>+    rv = launch_thread(do_connects, &addr, model_sock, 0);
>     /* wait for the first connection to terminate, then launch the rest. */
>     reap_threads();
>+    remaining_connections = total_connections - 1;
> }

It is better to handle NoReuse as fullhs==100 and replace the above code by simply:

    if (NoReuse) {
        fullhs = 100;
    }

>+    for (i=0;i<max_threads;i++) {
>         rv = launch_thread(do_connects, &addr, model_sock, i);
>-        ++i;
>-    } while (--connections > 0);
>+    }

Since max_threads is protected by threadLock, we can't use max_threads without lock protection as soon as we create threads. So you should save max_threads in a local variable and use the local variable in the for loop above.
Please ignore my previous comment about if (!NoReuse). I misunderstood your patch. But the relationship between fullhs and NoReuse is confusing. I suggest that we use only fullhs in the code, and let the -N option set fullhs to 100. fullhs probably could be initialized to 0 instead of -1.
Here is some explanation about the -U option, which enables the ability of the code to attempt to "throttle up" as well as down. It is a compromise between the original intended purpose of throttling and a new purpose that Julien desired.

When strsclnt was devised, the objective was to complete the specified number of connections, with as few wasted (failed) connection attempts as possible. In the early days of strsclnt, it would hammer a server with connection requests. Often it would overflow the server's connection request queue, resulting in MANY connections appearing to be "refused" or "reset by peer". A number of things were tried to reduce the number of failed connection attempts. Changing strsclnt to always retry every failed connection attempt caused it to hammer the poor server even harder, causing a large amount of network bandwidth and (more importantly) server CPU time to be spent dealing with failed connection attempts. One attempt added an increasing delay time before successive attempts to reconnect. But the problem was that if strsclnt fundamentally had more threads trying to work than the server could sustain, then all throughout the entire run we would experience failed connection attempts. This resulted in widely varying test results, even on otherwise idle systems. We'd see big differences in the performance results between two successive runs on the same client/server/network.

So, in rev 1.8, I changed strsclnt to "throttle back" the number of threads that would simultaneously pound on the server, until connection failures stopped happening. Then it would sustain that number going forward. The result was a greatly reduced number of failed connection attempts, AND a great increase in the repeatability of test results, getting the same values from run to run. The code was specifically made to NOT attempt to throttle back up, to avoid a continual ongoing stream of connection failures during the test. I view that behavior as requisite to obtaining useful performance numbers from strsclnt.

More recently, Julien has begun to use strsclnt in a very different way: not as a performance measurement, but merely as an ongoing server test. He wants it to continually pound the server, no matter how the server responds. He wants to be able to stop the server in a debugger for a while (minutes), during which time the server will accept no new connections and strsclnt will throttle down to a level approaching zero; then, when he continues the server from the breakpoint, he wants strsclnt to throttle back up.

In reviewing some of the earlier patches for this RFE, I insisted that the code preserve its existing throttling behavior (which is useful for performance measurement), and suggested that any new throttling behavior be made optional, enabled by a command-line option. That's how -U came to be.
In reply to comment 14: I want to review the code that is going to be potentially checked in. I think the practice of making a change and checking in the changed results without additional code review is OK for VERY small changes, e.g. one or two lines, where the change is very clearly spelled out beforehand and agreed to by reviewer and patch writer. But in this case, there is MUCH code that has to be changed to make it fit in 80 columns. That change would be of such a great magnitude that I'd insist on another review before it was committed. So, when you've addressed Wan-Teh's comments, please fix the line-length problems in the next patch, and submit it for review again. Thanks.
Nelson,

Re: comment #19. Actually, I'm using strsclnt as a performance tool, even with -U. The servers I'm testing have a very large accept queue. In my experience, the only condition that ever seems to cause strsclnt to throttle down is the server going down completely, because it either crashes or I turn it off to reconfigure it. At that point, all my strsclnt processes on multiple machines throttle down, and never go back up. The strsclnt processes on each client machine all have to be manually Ctrl-C'ed and restarted. I want the -U option so that I no longer have to restart all my clients when I make a server-side change. This is very helpful when testing a big server, which requires a large number of client machines to peak.

Re: comment #20. I understand your concern about wanting to review the new format for a large patch. But I would like to minimize the number of patches that have to be created and reviewed. Creating a new patch for each review comment would make the development process far more time-consuming than it needs to be. So, I will hold off generating the next patch until you take a complete pass at the code, and will address both of your possible issues and Wan-Teh's together in one update.
Comment on attachment 183331 [details] [diff] [review] add throttling down by default, and up with -U option

12345678901234567890123456789012345678901234567890123456789012345678901234567890

Overall this approach to throttling seems OK, but I have a few comments.

1) With this patch, the variable named "max_threads" is in no way a maximum. I suggest renaming it current_threads or target_threads, and renaming the new variable max_max_threads to be max_threads, since it really is a maximum.

2) The code should not attempt to start more threads than there are connections to be performed.

3) If the user specifies 100 connections, that should be 100 successful connections, not merely 100 connections attempted (with some refused). As Wan-Teh pointed out, the remaining_connections logic will count attempted connections whether or not they actually succeed in connecting, and hence will effectively not retry connections that fail to connect. The remaining_connections logic probably needs two counters: one for connections that have (or have not yet) been initiated, and one for connections that have (or have not yet) been completed. If you're trying to do (say) 100 connections, you don't want to count a connection as done until it's done (and the connection actually connected), but you don't want to start a 101st connection just because the 100th isn't done. (See the sketch after this comment.)

4) In function thread_wrapper, there are two places that indent unnecessarily. Since this new code tends to exceed 80 columns, reducing unnecessary indentation could help that a lot.

>+    if (!ThrottleUp) {
>+        die = PR_TRUE;
>+    } else {
>+        if (remaining_connections > 0) {
...
>+        } else {
>+            /* no more connections left, we are done */
>+            die = PR_TRUE;
>+        }
>+    }

could correctly be

>+    if (!ThrottleUp) {
>+        die = PR_TRUE;
>+    } else if (remaining_connections > 0) {
...
>+    } else {
>+        /* no more connections left, we are done */
>+        die = PR_TRUE;
>+    }

In fact, that whole thing could be reduced to a single if-else statement:

>+    if (!ThrottleUp || remaining_connections <= 0) {
>+        die = PR_TRUE;
>+    } else {
>+        /* no more connections left, we are done */
...

Later in thread_wrapper,

>+    } else {
>+        /* this thread should run */
>+        if (--remaining_connections >= 0) {
>+            doop = PR_TRUE;
>+        } else {
>+            die = PR_TRUE;
>+        }
>+    }

could correctly be

>+    } else if (--remaining_connections >= 0) {
>+        /* this thread should run */
>+        doop = PR_TRUE;
>+    } else {
>+        die = PR_TRUE;
>+    }

5) I found the variable name total_connections_in_100s perplexing. Maybe "hundreds_of_connections" and "connections_mod_100"?

6) For the new mode, which is neither all full handshakes nor all restart handshakes but rather a mixture, using the SSL_NO_CACHE option to force new handshakes has some undesirable properties with respect to simulating real-world behavior. It does not cause the newly created SSL sessions to ever be reused, as they would normally be, but instead continually restarts that same one session over and over. You can make it reuse the newly created SSL sessions by separating the sessions using the function SSL_SetSockPeerID. Create a small string, and set it into each SSL socket with SSL_SetSockPeerID. Change the string contents each time you want to force a new full handshake, but not when you don't.
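A sketch of the two-counter idea from point 3 (variable names hypothetical; both counters protected by threadLock, like remaining_connections):

    /* protected by threadLock */
    int connections_initiated = 0;  /* attempts started, successful or not */
    int connections_completed = 0;  /* attempts that actually connected */

    /* in the worker thread, before each connection attempt: */
    PR_Lock(threadLock);
    if (connections_initiated >= total_connections) {
        PR_Unlock(threadLock);
        return;  /* enough attempts in flight or done; don't start more */
    }
    ++connections_initiated;
    PR_Unlock(threadLock);

    /* ... PR_Connect and the SSL exchange happen here ... */

    PR_Lock(threadLock);
    if (connected)
        ++connections_completed;  /* counts only when it really connected */
    else
        --connections_initiated;  /* failed attempt frees a slot for retry */
    PR_Unlock(threadLock);

This way the run ends only when connections_completed reaches the -c value, yet no extra connection is started just because another is still pending.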
Attachment #183331 - Flags: review?(nelson) → review-
Re Wan-Teh's comment #17: I will address 1, 2 and 5 in my upcoming patch.

Regarding 3, which is also Nelson's #3: I believe the behavior is unchanged from the existing strsclnt, which took no precautions to count failed connections separately from successful ones. It is also worth noting that the program retries forever if the PR_Connect errors are PR_CONNECT_REFUSED_ERROR or PR_CONNECT_RESET_ERROR. Connections will only be missed if the error is different. If this is of concern, I suggest a separate bug/RFE be filed.

4. I believe this is a matter of style. I wrote the expressions in the way that makes the most logical sense to me, even though they may not be the shortest to read.

I will address all your other issues. Nelson already answered about -U and why the default is what it is. Thanks for catching the error with the time unit. I was wondering why strsclnt was throttling back up so fast ...
Re: Nelson's comment #22. Thanks for the review. My upcoming patch will clean up the indentation. This wasn't that big of a change - the patch only added 9 lines over 80 columns, including 3 of comments.

1) I will address this, which was also a concern of Wan-Teh's.

2) I will fix this too; thanks for noticing this problem.

3) Please see my response to Wan-Teh's review.

4) I will change this for the first if, since it saves one logical indented block. In the second case, the comment is about the thread still being runnable (because the TID is less than active_threads) and needs to be placed before the test of remaining_connections, so I will keep it the same. The code is much shorter there, so it's not a problem.

5) I will try to find better variable names.

6) Thanks for finding this. Indeed, after running for a while, the server session cache will fill up and some "accidental" full handshakes will happen due to restarts on the initial session failing. By creating new sessions and caching them, this is less likely to happen - unless there are more client threads than server cache entries. I will make this change.

I should also note that when testing against public SSL web sites with load-balancers that are not SSL-session aware, such as Microsoft's, the actual percentages of full/restart handshakes diverge randomly from what was requested, because after a new PR_Connect one never knows whether the session will be reused, since the connection may go to another server. I don't intend to attempt to fix this at this time, but it could be tried by examining the cache stats during the run, if the cache stats were thread-safe (see bug 292181). Another approach would be to use keep-alive HTTP requests rather than doing another PR_Connect, which would keep the connection on the same back-end SSL server and just force another handshake. But keep-alive support hasn't been added to strsclnt either.
Nelson,

I implemented your suggestion #6. It works fine in single-threaded mode. However, I'm not getting the expected results when running in multithreaded mode. The problem is in the case where I try to do a restart handshake and keep the sock peer ID the same as it was. That fact alone isn't sufficient to guarantee that the handshake will be a restart. There is a requirement that a full handshake has previously been completed with that ID. When multiple threads get into SSL with the same peer ID, but no full handshake has been completed yet, they each end up doing full handshakes, and immediately overwriting their cache entries as soon as their respective handshakes complete. So, I'm getting a lot more full handshakes than I should! I would have expected some serialization to happen in the SSL library in this case, but there is none. It's probably not desirable anyway.

These are my results for 1000 connections with 100 threads vs. 1 thread, trying to do 50% full and 50% restarts:

strsclnt -d . -o -o -D -P 50 -p 3000 -c 1000 -t 100 variation
strsclnt: 0 cache hits; 1 cache misses, 0 cache not reusable
strsclnt: 183 cache hits; 817 cache misses, 0 cache not reusable

strsclnt -d . -o -o -D -P 50 -p 3000 -c 1000 -t 1 variation
strsclnt: 0 cache hits; 1 cache misses, 0 cache not reusable
strsclnt: 500 cache hits; 500 cache misses, 0 cache not reusable

I verified that the sequential pattern of sockPeerIDs is correct in both cases, so my strsclnt code isn't the problem. FYI, the new code is below:

    if (fullhs != NO_FULLHS_PERCENTAGE) {
        char sockPeerIDString[512];
        static PRInt32 sockPeerID = 0; /* atomically incremented */
        PRInt32 thisPeerID = sockPeerID; /* reuse previous sockPeerID for
                                          * restart handshake */
        PRInt32 savid = PR_AtomicIncrement(&globalconid);
        PRInt32 conid = 1 + (savid - 1) % 100;
        /* don't change the peer ID on the very first handshake, which is
         * always a full, so the session gets stored into the client cache */
        if ( (savid != 1) &&
             ( ( (savid <= total_connections_rounded_down_to_hundreds) &&
                 (conid <= fullhs) ) ||
               (conid*100 <= total_connections_modulo_100*fullhs) ) ) {
            /* force a full handshake by changing the socket peer ID */
            thisPeerID = PR_AtomicIncrement(&sockPeerID);
        }
        PR_snprintf(sockPeerIDString, sizeof(sockPeerIDString), "ID%d",
                    thisPeerID);
        SSL_SetSockPeerID(ssl_sock, sockPeerIDString);
    }

I think I can fix this problem by using the handshake callback and remembering the last sockPeerID for a successful handshake.

Another concern is that I'm incurring the cost of a PR_snprintf on my stack buffer to create the peer ID string for every connection. This will bring down strsclnt performance vs. just setting the SSL_NO_CACHE option. I haven't yet measured by how much. If the SSL_SetSockPeerID API could take an integer instead of a string, it would be more efficient. It would also remove the cost of a PORT_Strdup within it.
Nelson, is there any way to tell, within the handshake callback, whether the handshake that just completed was a full or a restart?
Nelson,

I encountered an insurmountable problem while implementing your suggestion #6. The client cache only purges itself every 24 hours. When creating new peer IDs repeatedly, this means the heap grows continuously. It isn't technically a leak, since the memory will eventually be freed. However, the heap growth causes performance to go down the toilet very rapidly. After 15 minutes, throughput has already been cut in half. So, for all practical purposes, this approach is not usable at the moment.

We need a mechanism to set the SSL session expiration timeout on the client side. I see that there is one in server mode, which is set by calling the SSL_ConfigServerSessionIDCache function, but I don't see one for the client cache. I will file a separate bug.

In the meantime, I believe that setting SSL_NO_CACHE to 1 is adequate. It will work fine for all servers that have a sufficiently large session ID cache. The patch I have attached does this. There is a USE_SOCK_PEER_ID define that can be set to enable the code that uses peer IDs. Once the SSL client cache issues are resolved, that code can be turned on. I wouldn't want to hold this patch until that happens, however.
Attachment #183331 - Attachment is obsolete: true
Attachment #188630 - Flags: superreview?(nelson)
Attachment #188630 - Flags: review?(wtchang)
Issues like heap growth are, IMO, secondary to the primary objective of applying a realistic load to the server. While a growth of megabytes per minute wouldn't be good, I think some amount of growth in strsclnt would be acceptable.

The client cache timeout is settable, but the interface may not be public. :( You should be able to set the timeout times for strsclnt connections down to mere seconds, if desired. Originally, in HCL, the cache timeout times were global variables, directly settable by the application. I think they are declared in one of SSL's two public header files, but I'll bet they're not in nss.def. I agree that getter/setter functions are better, in part because of Windows' problem with addresses of data items in other shared libs.

Your code doesn't spread out the full handshakes uniformly over the connections, but lumps them together within groups of 100 handshakes. IINM, it does N connections as restarts, followed by 100-N as fulls. I consider that approach undesirable, but in this case you can take advantage of it to mitigate your cache growth problem. You can use the string technique for connection N, and use SSL_NO_CACHE for the subsequent 99-N connections.
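A sketch of that mitigation, building on the variables from the patch above (conid <= fullhs marks the connections that must be full handshakes): only the first full handshake of each group of 100 switches to a fresh peer ID string, so it produces a real, reusable cache entry, while the remaining fulls in the group use SSL_NO_CACHE and add nothing to the client cache:

    if (conid <= fullhs && savid != 1) {
        if (conid == 1) {
            /* first full of this group: create a genuinely new, cacheable
             * session by switching to a fresh peer ID */
            thisPeerID = PR_AtomicIncrement(&sockPeerID);
        } else {
            /* other fulls of the group: force a full handshake without
             * growing the client session cache */
            SSL_OptionSet(ssl_sock, SSL_NO_CACHE, 1);
        }
    }
    PR_snprintf(sockPeerIDString, sizeof sockPeerIDString, "ID%d", thisPeerID);
    SSL_SetSockPeerID(ssl_sock, sockPeerIDString);

This caps cache growth at one new entry per 100 connections, instead of one per forced full handshake.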
In answer to comment 26: not directly. One way to tell is by the occurrence (or not) of a prior callback to the application's cert validity callback. The application could keep track, for each socket, of whether or not libSSL has called the cert chain validation callback on that socket, and check it at the time of the handshake completion callback.
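A sketch of that bookkeeping (hypothetical per-connection flag; SSL_AuthCertificateHook, SSL_HandshakeCallback, and SSL_AuthCertificate are the real NSS entry points):

    #include <stdio.h>
    #include "ssl.h"
    #include "cert.h"

    typedef struct perConnStr {
        PRBool sawCertCallback;  /* set iff libSSL validated a cert chain */
    } perConn;

    static SECStatus
    myAuthCertificate(void *arg, PRFileDesc *fd, PRBool checkSig,
                      PRBool isServer)
    {
        /* libSSL only calls this on full handshakes, when a certificate
         * is actually exchanged */
        ((perConn *)arg)->sawCertCallback = PR_TRUE;
        return SSL_AuthCertificate(CERT_GetDefaultCertDB(), fd, checkSig,
                                   isServer);
    }

    static void
    myHandshakeCallback(PRFileDesc *fd, void *arg)
    {
        perConn *pc = (perConn *)arg;
        fprintf(stderr, "handshake was %s\n",
                pc->sawCertCallback ? "full" : "restart");
        pc->sawCertCallback = PR_FALSE;  /* reset for the next handshake */
    }

    /* registration, once per socket:
     *   SSL_AuthCertificateHook(ssl_sock, myAuthCertificate, &pc);
     *   SSL_HandshakeCallback(ssl_sock, myHandshakeCallback, &pc);
     */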
Nelson,

Re: comment #28. Concerning the goal of applying a realistic load to the server: with the size of servers we have today, that goal is not achieved when using new sockPeerIDs to provoke full handshakes.

I took the case of N = 99, i.e. the client only does 1 full handshake followed by 99 restarts. In this case, your suggestion of using a different algorithm to do only 1 out of 100 connections with a new peer ID and the following 99 - N with the SSL_NO_CACHE option does not apply, since 99 - N = 0. Doing N = 99 (i.e. strsclnt -P 1) should limit memory growth the most, since only 1 out of 100 connections will cause a new SSL client cache entry to be created.

I tried a single Opteron client doing 1% full handshakes hitting a single Opteron server, both running code from NSS_PERFORMANCE_HACKS_BRANCH, except that strsclnt was running the code in the patch called "implement changes from reviews" with USE_SOCK_PEER_ID defined.

Here is the performance in ops/s (equivalent to total handshakes/s) as measured by the server for the last 20-second interval:

8080.39 ops/second, 100 threads

Here is the performance as measured by the server after running the test for 13 minutes:

5066.79 ops/second, 100 threads

All the 20s data points in between showed decreasing ops/s. I.e., after running for 13 minutes, strsclnt only applies a load of about 5000 ops/s, even though it started at 8000 ops/s. If I Ctrl-C the strsclnt process and immediately restart it, the load goes back to 8000 ops/s. This is entirely due to the client session cache growth. At t = 13 min, the strsclnt resident memory size is 82MB, up from 17MB at t = 0. Also, the server has 40% idle time at t = 13 min, vs. 0% at t = 0.

IMO, this data clearly shows that your suggestion of using SSL_SetSockPeerID to cause full handshakes in a benchmark tool is not workable. Until something changes in the SSL client cache implementation, I think strsclnt must use SSL_NO_CACHE for doing full handshakes. I filed bug 300343 to track those changes.
In the previous message, I meant that the *first* 20 seconds of the test yielded 8080 ops/s, not the last. Also, the bug for limiting the client cache size is actually bug 300043. I just filed bug 300163 to add a per-socket setter for the SSL session expiration timeout.
Attachment #183331 - Flags: superreview?(wtchang)
I agree that the performance limitations of the client cache, when used in this test, are severe enough that we shouldn't attempt to implement what I suggested in comment 22, point 6, until the cache is better, and should continue to use SSL_NO_CACHE until then. I will review the newest patch above RSN.
Comment on attachment 188630 [details] [diff] [review] implement changes from reviews

This patch still has one comment that is too long. Also, the replacement variable names seem even LONGER than the ones about which I previously lamented! But this patch resolves the important issues, IMO. r+ = nelson@bolyard
Attachment #188630 - Flags: superreview?(nelson) → review+
Thanks for the review, Nelson! I checked the patch into the tip.

Checking in strsclnt.c;
/cvsroot/mozilla/security/nss/cmd/strsclnt/strsclnt.c,v <-- strsclnt.c
new revision: 1.41; previous revision: 1.40
done
Status: NEW → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
Comment on attachment 188630 [details] [diff] [review] implement changes from reviews Clearing review request. This patch was checked in several weeks ago.
Attachment #188630 - Flags: review?(wtchang)