Closed
Bug 129701
Opened 23 years ago
Closed 22 years ago
Remove the sleep commands for Linux in kill_selfserv()
Categories
(NSS :: Test, enhancement, P2)
Tracking
(Not tracked)
RESOLVED
FIXED
3.6
People
(Reporter: wtc, Assigned: wtc)
Details
Attachments
(2 files)
4.24 KB,
patch
|
Details | Diff | Splinter Review | |
4.20 KB,
patch
|
Details | Diff | Splinter Review |
In tests/ssl/ssl.sh, kill_selfserv(), we sleep for 30 seconds on Linux after killing the selfserv process, otherwise the next selfserv process may fail to bind to the TCP port with an "Local Network address is in use" error. These sleep 30 commands significantly slow down the QA runs on Linux. In http://bugzilla.mozilla.org/show_bug.cgi?id=119340#c13, I proposed a hypothesis and an inelegant patch. That patch was backed out because of interaction with another problem (the 'ps' command doesn't seem to be reliable on Linux). Now that we are not using 'ps' to determine if a process is alive, I'd like to try my inelegant solution again.
Assignee | ||
Updated•23 years ago
|
Status: NEW → ASSIGNED
Priority: -- → P1
Target Milestone: --- → 3.4.1
Comment 1•23 years ago
|
||
I really hope we can remove the sleep 30. I was planing to do this after RTM, and I think even without the patch we can reduce it to 3-5 seconds - of course complete removal would be even better
Assignee | ||
Comment 2•23 years ago
|
||
Changed the QA contact to Bishakha.
QA Contact: sonja.mirtitsch → bishakhabanerjee
Assignee | ||
Comment 5•22 years ago
|
||
This is an updated version of the patch in bug 119340.
Assignee | ||
Comment 6•22 years ago
|
||
I found that my patch doesn't work on Red Hat Linux 7.3. I tried patching it further and different solutions, but none of them work reliably. So I'm going to mark this WONTFIX. I am afraid that we will need to solve this problem by redefining the problem. For example, we can modify selfserv so that it will sleep one second and retry PR_Bind if PR_Bind fails with the address in use error. Or we can modify our test scripts so that selfserv uses an arbitrary unused port.
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → WONTFIX
Comment 7•22 years ago
|
||
> Or we can modify our test scripts
> so that selfserv uses an arbitrary unused port.
I vote against this, because it will result in even more selfservs hanging on QA
machines after failed QA
Assignee | ||
Comment 8•22 years ago
|
||
Using a fixed port for selfserv has prevented us from doing simultaneous NSS QA runs on the same machine. Another possible solution is to shut down selfserv cleanly by sending a 'stop' command to it.
Assignee | ||
Comment 9•22 years ago
|
||
OK, I came up with a different solution. Instead of trying to figure out how to kill and wait for a multithreaded process on Linux reliably, I simply replace the 'sleep 30' command by a loop that tries to bind to the port. When we successfully bind to the port, we know the previous selfserv process has fully died and freed the port. I added a -b (bindOnly) option to selfserv so that I can use selfserv to do the "try bind". A possible improvement is to try to bind to the port for no more than a fixed number of times. Another possible improvement is to print a message before each "try bind" so that output.log shows what's going on behind the scene. Please review and test this patch. Thanks.
Comment 10•22 years ago
|
||
I'd like to suggest Yet Another Solution. We could pass an argument to selfserv that determines the number of connections it will handle. After that number, selfserv shuts down cleanly. This has the advantage of a clean shutdown, without the delay of sending a STOP request.
Assignee | ||
Comment 11•22 years ago
|
||
I am sure there is a reason why we are killing the selfserv processes instead of cleanly shutting them down. I guess it is to avoid leftover selfserv processes or hanging QA when there is a bug in selfserv. I think a clean shutdown solution (either by sending a STOP request or by handling a specified number of connections) is better. But we would still need to kill selfserv when it fails to shut down cleanly because of a bug. Between the two proposed clean shutdown solutions, it seems that handling a specified number of connections is more reliable and less vulnerable to a bug in our code. My proposed patch #2 is quite simple. How about if we use it at least as an interim solution?
Status: RESOLVED → REOPENED
Priority: P1 → P2
Resolution: WONTFIX → ---
Comment 12•22 years ago
|
||
I agree we still have to shutdown the server in the even the number of specified connections was not completed due to a bug. I think your patch is good.
Comment 13•22 years ago
|
||
> Using a fixed port for selfserv has prevented us
> from doing simultaneous NSS QA runs on the same
> machine.
I run 2 different tinderboxes plus nightly QA on an HP machine and a Solaris 8
machine without problems, just set the ENV PORT to a portnumber before starting
all.sh (ssl.sh) or use nssqa with -p <portnumber>. nssqa -t, as we start the
tinderbox qa will automatically pick a different portnumber too.
I think the loop is a good idea for now, but we should really implement Ian's
solution as soon as someone has the time.
Assignee | ||
Comment 14•22 years ago
|
||
I will check in my patch on Saturday to get some testing done on it during the weekend.
Status: REOPENED → ASSIGNED
Assignee | ||
Comment 15•22 years ago
|
||
Comment on attachment 98094 [details] [diff] [review] Proposed patch #2: replace 'sleep 30' by looping until the port is freed I just checked in this patch to have it tested on Tinderbox over the weekend.
Comment 16•22 years ago
|
||
problems on todays backward compatibility tests, see ftp://ftp.mozilla.org/pub/security/nss/daily_qa/20020911.1/bct/huey.2/results.html ftp://ftp.mozilla.org/pub/security/nss/daily_qa/20020911.1/bct/huey.2/output.log backward compatibility tests on Linux 2.4 run 331 scripts (do a sleep 30) and binaries and todays libraries init.sh init: Testing PATH /share/builds/mccrel3/nss/nsstip/../nss331/builds/20010928.2.331-RTM/booboo_Solaris8/mozilla/dist/../security/nss/tests:/share/builds/mccrel3/nss/nsstip/../nss331/builds/20010928.2.331-RTM/booboo_Solaris8/mozilla/dist/Linux2.4_x86_glibc_PTH_OPT.OBJ/bin:/lib:/usr/lib:/bin:/sbin:/usr/bin:/usr/sbin:.:/u/svbld/bin:/tools/ns/bin:/usr/ccs/bin:/usr/dist/local/exe:/usr/bin/X11:/usr/audio/bin:/u/svbld/ns/securityqa against LIB /share/builds/mccrel3/nss/nsstip/builds/20020911.1/booboo_Solaris8/mozilla/dist/Linux2.4_x86_glibc_PTH_OPT.OBJ/lib ssl.sh: TLS Require client auth (bad password) ---- selfserv -D -p 8443 -d ../server -n huey.red.iplanet.com \ -w nss -r -r -i ../tests_pid.30698 & selfserv started at Wed Sep 11 07:19:55 PDT 2002 tstclnt -p 8443 -h huey -q -d . < /share/builds/mccrel3/nss/nss331/builds/20010928.2.331-RTM/booboo_Solaris8/mozilla/security/nss/tests/ssl/sslreq.txt ssl.sh: Exit: 10 Fatal - selfserv process not detectable /share/builds/mccrel3/nss/nsstip/../nss331/builds/20010928.2.331-RTM/booboo_Solaris8/mozilla/dist/../security/nss/tests/all.sh: line 1: 7349 Terminated selfserv -D -p ${PORT} -d ${R_SERVERDIR} -n ${HOSTADDR} -w nss ${sparam} -i ${R_SERVERPID} $verbose (wd: /share/builds/mccrel3/nss/nsstip/builds/20020911.1/booboo_Solaris8/mozilla/tests_results/security/bct/huey.2/client)
Assignee | ||
Comment 17•22 years ago
|
||
Sonja, thanks a lot for reporting these failures. The failures in backward compatibility tests cannot have been caused by my checkin because I only modified init.sh, ssl.sh, and selfserv.exe. I think these were caused by the Linux 'ps' problem we identified in http://bugzilla.mozilla.org/show_bug.cgi?id=119340#c38. Maybe we should check in the "kill -0 $PID" fix into the backward compatibility test scripts.
Assignee | ||
Comment 18•22 years ago
|
||
Marked the bug fixed.
Status: ASSIGNED → RESOLVED
Closed: 22 years ago → 22 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•