Closed Bug 58624 Opened 24 years ago Closed 24 years ago

SSL Stress Test fails on FreeBSD 3.5

Categories

(NSS :: Tools, defect, P3)

x86
FreeBSD
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: lennox, Assigned: sonja.mirtitsch)

Details

The NSS 3.1 SSL Stress Tests fail for me on FreeBSD 3.5. The end of the output of './ssl.sh stress' looks like this:
    ********************* Stress Test ****************************
    ********************* Stress SSL2 RC4 128 with MD5 ****************************
    selfserv -p 8443 -d /local/llennox/NSS-PSM/mozilla/tests_results/security/conrail.20/server -n conrail.cs.columbia.edu -w nss -i /tmp/tests_pid.5505 &
    strsclnt -p 8443 -d . -w nss -c 1000 -C A conrail.cs.columbia.edu
    strsclnt: -- SSL: Server Certificate Validated.
    strsclnt: PR_NewTCPSocket returned error -5974: Insufficient system resources.
    Terminated
    ********************* Stress SSL3 RC4 128 with MD5 ****************************
    selfserv -p 8443 -d /local/llennox/NSS-PSM/mozilla/tests_results/security/conrail.20/server -n conrail.cs.columbia.edu -w nss -i /tmp/tests_pid.5505 &
    strsclnt -p 8443 -d . -w nss -c 1000 -C c conrail.cs.columbia.edu
    strsclnt: -- SSL: Server Certificate Validated.
    strsclnt: PR_NewTCPSocket returned error -5974: Insufficient system resources.
    Terminated
Running ktrace on the process (ktrace is a system-call tracer, the equivalent of Linux's strace) reveals that socket() failed with ENOBUFS after it was called for the 953rd time for the first test, and it failed after the 27th time it was called for the second test. The failure is consistent, both for debug and optimized builds; I haven't tested to see whether the count of socket() failures is consistent. All the other NSS tests pass successfully.
Nelson, please take a look at this bug and reassign to the appropriate person. Thanks.
Assignee: wtc → nelsonb
I see no indication of any error on NSS's part from this description. It sounds like an OS kernel configuration problem on the submitter's system. The stress test is just that. It stresses the server by pounding it with SSL connections. Apparently this test exhausts some kernel resource on the submitter's system. The only change to NSS that might be beneficial to this test would be to respond to this error by waiting and trying again for some limited number of times, rather than immediately treating it as a fatal error. However, while such a change might make the test appear to pass, it would merely be hiding a very serious problem, namely, chronic system resource exhaustion. So, I suggest that, in this case, the failure serves the useful purpose of revealing the system problem, which needs to be cured apart from any changes to NSS. I'll leave this bug open for a few more days, to give others a chance to persuade me that some NSS change would and should solve this problem.
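The change described above (wait and retry a limited number of times instead of failing at once) could look roughly like the following Python sketch. It only illustrates the idea under discussion, which the comment argues against adopting. strsclnt itself is not written in Python, and the function name, retry count, and delay here are hypothetical.

```python
import errno
import socket
import time


def open_socket_with_retry(make_socket=None, max_attempts=5, delay_seconds=1.0,
                           sleep=time.sleep):
    """Create a TCP socket, retrying a bounded number of times on ENOBUFS.

    All names and defaults are illustrative assumptions, not NSS code.
    """
    if make_socket is None:
        def make_socket():
            return socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    for attempt in range(1, max_attempts + 1):
        try:
            return make_socket()
        except OSError as exc:
            # Only ENOBUFS is treated as transient; any other error, or
            # ENOBUFS on the final attempt, is re-raised to the caller.
            if exc.errno != errno.ENOBUFS or attempt == max_attempts:
                raise
            sleep(delay_seconds)
```

Because the retry is bounded, persistent exhaustion still surfaces as an error, but short-lived exhaustion would be masked, which is the concern raised in the comment.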
Okay, some more investigation leads me to agree with you. What's happening is that the TCP connections from the stress test stick around in TIME_WAIT for two minutes; my kernel is only configured to support 1064 simultaneous open sockets, which isn't enough for the 2K sockets opened by the stress test plus the 100 or so normally in use on my system.
So I'd just suggest adding a note to the NSS test webpage to the effect of "The SSL stress test opens 2,048 TCP connections in quick succession. Kernel data structures may remain allocated for these connections for up to two minutes. Some systems may not be configured to allow this many simultaneous connections by default; if the stress tests fail, try increasing the number of simultaneous sockets supported."
On FreeBSD, you can display the maximum number of simultaneous sockets with the command
    sysctl kern.ipc.maxsockets
which on my system returns 1064. It looks like this can be fixed with the kernel config option
    options NMBCLUSTERS=[something-large]
or by increasing the 'maxusers' parameter. It looks like more recent FreeBSD implementations still have this limitation, and the same solutions apply, plus you can alternatively specify the maxsockets parameter in the boot loader.
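The arithmetic behind this diagnosis can be sketched in Python. The figures (a limit of 1064 sockets, 2,048 stress-test connections, about 100 sockets otherwise in use) come from the comment above. The function names are hypothetical, and the sketch assumes the worst case in which every stress-test connection is still held in TIME_WAIT at the same moment.

```python
import subprocess


def socket_shortfall(max_sockets, test_connections, baseline_in_use):
    """Return how many sockets the kernel limit falls short by (0 if it suffices)."""
    needed = test_connections + baseline_in_use
    return max(0, needed - max_sockets)


def read_freebsd_maxsockets():
    """Read kern.ipc.maxsockets via sysctl; only meaningful on FreeBSD."""
    result = subprocess.run(["sysctl", "-n", "kern.ipc.maxsockets"],
                            capture_output=True, text=True, check=True)
    return int(result.stdout.strip())


if __name__ == "__main__":
    # Values reported in this bug: limit 1064, 2048 connections, ~100 in use.
    print(socket_shortfall(1064, 2048, 100))  # prints 1084
```

With the reported numbers the limit is short by roughly 1,084 sockets. The first test failing at the 953rd socket() call would also be roughly consistent with a limit of 1064 minus the 100 or so sockets already in use.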
Thanks for your very useful explanation of how to fix this on freeBSD. I'm reassigning this to Sonja. She can update the test page.
Assignee: nelsonb → sonmi
Target Milestone: --- → 3.2
Sent email to Scott Carver to update the webpage. I personally am not so sure that this is a great idea, because he might end up having to describe a lot of kernel configurations and parameters. This would be useful information in a README file, or as a comment in the source. We could also add the kernel parameters we had to change on HP and (still have to change) on AIX.
I think the webpage should just mention what the problem is, the reason for it, and the general solution -- increase the size of your kernel's data structures. (Roughly, the quoted text in the second paragraph of my 11/02 comment.) I could slap together some appropriate text if this would be useful. Specific instructions for each OS as to how to accomplish this should then be in a README file in nss/tests/ssl or somewhere. (Probably just having it be a comment in the source code would be too obscure.) The webpage can have a link or a reference to this file.
Status: NEW → ASSIGNED
Checked the file platform_specific_problems into mozilla/security/nss/tests/doc, containing this bug report and the changed HP kernel parameters. Anyone willing to bring it into a more readable format is welcome to do so.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED