Closed
Bug 81707
Opened 24 years ago
Closed 24 years ago
strsclnt problems on orville, exitcode still 0
Categories
(NSS :: Tools, defect, P2)
Tracking
(Not tracked)
RESOLVED
FIXED
3.3
People
(Reporter: sonja.mirtitsch, Assigned: larryh)
Details
Attachments
(1 file: patch, 893 bytes)
-5978 is PR_NOT_CONNECTED_ERROR
ssl.sh: SSL Stress Test ===============================
ssl.sh: Stress SSL2 RC4 128 with MD5 ----
selfserv -D -p 8443 -d ../server -n orville.red.iplanet.com \
-w nss -i ../tests_pid.6758 &
selfserv started at Fri May 18 06:05:57 PDT 2001
tstclnt -p 8443 -h orville -q -d . <
/h/hs-sca15c/export/builds/mccrel/nss/nsstip/builds/20010518.1/y2sun2_Solaris8/mozilla/security/nss/tests/ssl/sslreq.txt
strsclnt -q -p 8443 -d . -w nss -c 1000 -C A \
orville.red.iplanet.com
strsclnt started at Fri May 18 06:05:57 PDT 2001
strsclnt: -- SSL: Server Certificate Validated.
strsclnt: PR_Write returned error -5978:
Network file descriptor is not connected.
strsclnt: PR_Write returned error -5978:
Network file descriptor is not connected.
strsclnt: PR_Write returned error -5978:
Network file descriptor is not connected.
strsclnt: PR_Write returned error -5978:
Network file descriptor is not connected.
strsclnt: PR_Write returned error -5978:
Network file descriptor is not connected.
strsclnt: PR_Write returned error -5978:
Network file descriptor is not connected.
strsclnt: 1 server certificates tested.
strsclnt completed at Fri May 18 06:06:02 PDT 2001
ssl.sh: Stress SSL3 RC4 128 with MD5 ----
selfserv -D -p 8443 -d ../server -n orville.red.iplanet.com \
-w nss -i ../tests_pid.6758 &
selfserv started at Fri May 18 06:06:02 PDT 2001
tstclnt -p 8443 -h orville -q -d . <
/h/hs-sca15c/export/builds/mccrel/nss/nsstip/builds/20010518.1/y2sun2_Solaris8/mozilla/security/nss/tests/ssl/sslreq.txt
strsclnt -q -p 8443 -d . -w nss -c 1000 -C c \
orville.red.iplanet.com
strsclnt started at Fri May 18 06:06:03 PDT 2001
strsclnt: -- SSL: Server Certificate Validated.
strsclnt: PR_Write returned error -5978:
Network file descriptor is not connected.
strsclnt: PR_Write returned error -5978:
Network file descriptor is not connected.
strsclnt: PR_Write returned error -5978:
Network file descriptor is not connected.
strsclnt: PR_Write returned error -5978:
Network file descriptor is not connected.
strsclnt: PR_Write returned error -5978:
Network file descriptor is not connected.
strsclnt: PR_Write returned error -5978:
Network file descriptor is not connected.
strsclnt: PR_Write returned error -5978:
Network file descriptor is not connected.
strsclnt: PR_Write returned error -5978:
Network file descriptor is not connected.
strsclnt: PR_Write returned error -5978:
Network file descriptor is not connected.
strsclnt: PR_Write returned error -5978:
Network file descriptor is not connected.
strsclnt: PR_Write returned error -5978:
Network file descriptor is not connected.
strsclnt: PR_Write returned error -5978:
Network file descriptor is not connected.
strsclnt: 987 cache hits; 1 cache misses, 0 cache not reusable
strsclnt completed at Fri May 18 06:06:10 PDT 2001
-------------- working stress test on dump, same OS
ssl.sh: SSL Stress Test ===============================
ssl.sh: Stress SSL2 RC4 128 with MD5 ----
selfserv -D -p 8443 -d ../server -n dump.red.iplanet.com \
-w nss -i ../tests_pid.16154 &
selfserv started at Fri May 18 04:21:09 PDT 2001
tstclnt -p 8443 -h dump -q -d . <
/h/hs-sca15c/export/builds/mccrel/nss/nsstip/builds/20010518.1/y2sun2_Solaris8/mozilla/security/nss/tests/ssl/sslreq.txt
strsclnt -q -p 8443 -d . -w nss -c 1000 -C A \
dump.red.iplanet.com
strsclnt started at Fri May 18 04:21:11 PDT 2001
strsclnt: -- SSL: Server Certificate Validated.
strsclnt: 1 server certificates tested.
strsclnt completed at Fri May 18 04:21:27 PDT 2001
/h/hs-sca15c/export/builds/mccrel/nss/nsstip/builds/20010518.1/y2sun2_Solaris8/mozilla/security/nss/tests/all.sh[99]:
16778 Terminated
ssl.sh: Stress SSL3 RC4 128 with MD5 ----
selfserv -D -p 8443 -d ../server -n dump.red.iplanet.com \
-w nss -i ../tests_pid.16154 &
selfserv started at Fri May 18 04:21:27 PDT 2001
tstclnt -p 8443 -h dump -q -d . <
/h/hs-sca15c/export/builds/mccrel/nss/nsstip/builds/20010518.1/y2sun2_Solaris8/mozilla/security/nss/tests/ssl/sslreq.txt
strsclnt -q -p 8443 -d . -w nss -c 1000 -C c \
dump.red.iplanet.com
strsclnt started at Fri May 18 04:21:28 PDT 2001
strsclnt: -- SSL: Server Certificate Validated.
strsclnt: 999 cache hits; 1 cache misses, 0 cache not reusable
strsclnt completed at Fri May 18 04:21:52 PDT 2001
Comment 1•24 years ago
Larry, could you find out why strsclnt exits with status 0
after printing the error message below?
strsclnt: PR_Write returned error -5978:
Network file descriptor is not connected.
Assignee: wtc → larryh
Priority: -- → P2
Target Milestone: --- → 3.3
| Assignee | ||
Updated•24 years ago
Status: NEW → ASSIGNED
| Assignee | ||
Comment 2•24 years ago
Examining the code ...
The error message is emitted by function errWarn(), as called by
handle_connection(). After emitting the message, handle_connection() returns
SECFailure to its caller: do_connects(). do_connects() quietly ignores the
return value. That is the proximate cause.
There is some difference of opinion about the purpose of strsclnt. The
original author (nelsonb) declared the program to be a "unit
test, designed to help during development of SSL. ... that its result value is
zero when a PR_Write() failed, is not important". The submitter of this bug
is using strsclnt in automated QA tests, where the result value of the program
indicates whether the test failed. Because there is no real "specification" for
the correct behavior of strsclnt, I am in a quandary about "fixing" this at
all.
How-some-ever, were I going to make it return a failing result, I'd probably set
a global flag, such as "failed_already", when the error message is emitted, and
keep going. Then, at program exit, if failed_already is true, exit with a
result indicating failure.
I'll let management decide on whether strsclnt is a "QA test program" or an
"engineer's unit test".
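The approach described above can be sketched roughly as follows. This is a minimal illustration, not the attached patch: the helper names (warn_and_record, handle_one_connection, run_connections) and their signatures are hypothetical stand-ins for errWarn(), handle_connection() and do_connects(), and a negative integer stands in for a failed PR_Write().

```c
#include <stdio.h>

/* Hypothetical stand-ins: the real strsclnt functions have other signatures. */
static int failed_already = 0;  /* set once any connection reports an error */

/* Print a warning and remember the failure, but do not exit. */
static void warn_and_record(const char *what, int err)
{
    fprintf(stderr, "strsclnt: %s returned error %d\n", what, err);
    failed_already = 1;
}

/* Simulate one connection; write_result < 0 models a failed PR_Write(). */
static int handle_one_connection(int write_result)
{
    if (write_result < 0) {
        warn_and_record("PR_Write", write_result);
        return -1;  /* analogous to SECFailure */
    }
    return 0;       /* analogous to SECSuccess */
}

/* Run every connection to completion, then derive the exit status. */
static int run_connections(const int *results, int count)
{
    int i;
    for (i = 0; i < count; i++) {
        /* The return value is still ignored here; the flag carries the failure. */
        (void)handle_one_connection(results[i]);
    }
    return failed_already ? 1 : 0;
}
```

The point of the flag is that the loop keeps running for unit-test use, while the final status still tells an automated QA script that at least one write failed.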
| Assignee | ||
Comment 3•24 years ago
| Assignee | ||
Comment 4•24 years ago
proposed patch, as described in my previous diatribe. :-)
Should do the trick. Leaves the program running for folks wanting it to keep
running: unit testers. Returns non-zero on errors in that PR_Write() for QA
purposes.
Comments, please.
| Reporter | ||
Comment 5•24 years ago
solution and patch look good to me.
Comment 6•24 years ago
Larry:
The NSS tip QA failed on orville again today with the same error.
It is odd that PR_Write() failed with PR_NOT_CONNECTED_ERROR. I
am worried that PR_Connect() might have returned prematurely before
the connection was established. Could you look into that? It would
be a good idea to run the NSPR 4.1.1 test suite on orville first.
(PR_NOT_CONNECTED_ERROR is the equivalent of the Unix errno ENOTCONN
from send().)
orville is maintained by the HP engineers working with the web server
team. Therefore, it tends to have all the recommended kernel patches
installed. It is likely that orville behaves differently from the
other HP boxes we run our tests on (hp64 and dump).
We also need to understand the purpose of the strsclnt test. If it
is solely intended for generating client traffic for a server, it
makes sense that it exits with a success status when non-fatal
errors occur. The fact that the particular PR_Write() failure is only
handled with a function named "errWarn", and the fact that many other
failures are handled with "errExit" or exit() seem to indicate that
the PR_Write() failure in question is not a fatal error and only warrants
a warning message. If, after checking with Nelson, you decide that this
is a fatal error, I would suggest that you use errExit() or exit() to
terminate the strsclnt test rather than adding a new "failed_already"
global variable.
| Reporter | ||
Comment 7•24 years ago
About the orville failures: there are tons of old processes running on orville,
and I think a reboot would fix the problem there. The reason I have not requested a
reboot yet is that I would like to see the fix work while we still have the error.
If Nelson owns the QA stress test then make the decision between you and Nelson,
otherwise you might want to talk to me as well, since I also have some input and
suggestions to make.
| Reporter | ||
Comment 8•24 years ago
I do not think it is a "fatal" condition, so I prefer Larry's solution of letting
the test run to its end if possible, as opposed to quitting on the first failure.
However, I am certain that the PR_Write failure should produce a nonzero exit code,
because it indicates that something is not working as expected, even if it is
not "fatal".
I too would like to hear Nelson's input, maybe we can get together in the early
afternoon between Larry, Nelson and me. If some unit tests need it to exit with
0 after the failures I would like to discuss this too.
What I am afraid of is the "not-my-job" approach - for example if the stress
tests find a non stress related problem and fail to report it because we make
the focus so narrow, and have the tests only report "expected" stress related
problems.
| Assignee | ||
Comment 9•24 years ago
I re-ran NSPR's test suite on orville this morning, debug and optimized builds.
There were no errors indicating that PR_Connect() is misbehaving.
Sonja's observation that there are many strsclnt (or selfserv) processes running
on orville is consistent with my experience that multiple idling selfservs can
lead to unpredictable results when talking to strsclnt. I believe her suggestion
of killing off the hung processes or rebooting the box is sound. ... I have to
ask: How did orville end up with dangling selfservs in the first place?
| Reporter | ||
Comment 10•24 years ago
orville did not have old stressclient and selfserver processes, but a lot of
webserver related processes (at least that's what I think they are).
Also, there seems to be a constant tinderbox build going on, which at times
fills up the /tmp. It might not be the best machine for us to test on, I just
tried it again after not using it for 3 months or so, because Christian thought
it should work for us again.
Comment 11•24 years ago
Larry,
I would change the last line in your patch
exitVal = ( exitVal || failed_already )? 1 : 0;
to something like
if (!exitVal) {
exitVal = failed_already;
}
This way we don't lose the original value of exitVal
if it is nonzero.
You can go ahead and check in your patch.
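To make the difference between the two forms concrete, here is a small sketch. The function names exit_value_collapsed and exit_value_preserved are hypothetical wrappers introduced only for illustration; the expressions inside them are the ones quoted in this comment.

```c
/* Form from the proposed patch: any failure collapses to 1. */
static int exit_value_collapsed(int exitVal, int failed_already)
{
    return (exitVal || failed_already) ? 1 : 0;
}

/* Suggested form: keep a nonzero exitVal, otherwise fall back to the flag. */
static int exit_value_preserved(int exitVal, int failed_already)
{
    if (!exitVal) {
        exitVal = failed_already;
    }
    return exitVal;
}
```

With exitVal already set to, say, 3, the first form reports 1 and the second still reports 3. Both report 0 only when exitVal and failed_already are both zero.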
| Assignee | ||
Comment 12•24 years ago
Checked in parts.
| Assignee | ||
Comment 13•24 years ago
Marking fixed.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED