Closed Bug 83593 Opened 23 years ago Closed 23 years ago

stress tests fail on orville

Tracking

(Not tracked)

Status:

RESOLVED FIXED

Milestone:

3.3.1

People

(Reporter: sonja.mirtitsch, Assigned: wtc)

Attachments

(8 files, 1 obsolete file)

stressclient output 23 years ago Sonja Mirtitsch 509.22 KB, text/plain
selfserver output, corresponding to the attachment above 23 years ago Sonja Mirtitsch 589.95 KB, text/plain
script to reproduce the problem 23 years ago Sonja Mirtitsch 1.34 KB, text/plain
A possible workaround. Write 0 bytes to the socket and retry getpeername. 23 years ago Wan-Teh Chang 1.03 KB, patch
The bug report I submitted to HP about the getpeername() problem. 23 years ago Wan-Teh Chang 1.77 KB, text/plain
Test server server.c (bug report attachment 1) 23 years ago Wan-Teh Chang 1.86 KB, text/plain
Test client nbclient.c (bug report attachment 2) 23 years ago Wan-Teh Chang 4.97 KB, text/plain
Test server server.c (bug report attachment 1) 23 years ago Wan-Teh Chang 2.08 KB, text/plain
Second workaround: a patch for NSPR. Select the socket for writing with zero timeout after successful completion of a non-blocking connect. 23 years ago Wan-Teh Chang 1.07 KB, patch

Sonja Mirtitsch

Reporter

Description

23 years ago

The iWs team is willing to let us use orville again, which can finish our tests in 20 minutes (dump takes 3-5 hours). The stress tests on orville fail, but other HP systems pass. I filed bug #81707 on the fact that the stress tests did not return an error, but the QA stat did. When this was fixed I rebooted orville, but that did not clear the failures. Information about the failure is in every day's QA log. Would you have a few minutes to explain to me what the failure means exactly?

Sonja Mirtitsch

Reporter

Comment 1

23 years ago

Nelson, could you please have a look at it? I realize it seems low priority, but it constantly costs me time, since I have to double-check the QA reports every single day, and even if it is just 5 or 10 minutes every day, it adds up. It also makes the QA report harder to read for the rest of you.

Nelson Bolyard (seldom reads bugmail)

Comment 2

23 years ago

It is indeed a low priority until I get the SSL server session cache rewrite done.

strsclnt is reporting error -5978 PR_NOT_CONNECTED_ERROR, which is equivalent to Unix's ENOTCONN error, in response to a PR_Write call by the client. It appears that this error code is used only when an OS call returns ENOTCONN.

The number of errors varies from run to run, typically between 3 and 12. The number of errors reported is coincidentally equal to 1000 - (cache_hits + cache_missed) as reported by strsclnt when it is done. This raises a question: are these the last N connections that fail, or do these failures occur in the middle of the test, with more successful connections occurring afterwards?

The failures occur with both SSL2 and SSL3 ciphersuites, specifically suites A and c, which are SSL2 RC4 128 WITH MD5 and SSL3 RSA WITH RC4 128 MD5 respectively. I suspect these are the first SSL2 and SSL3 suites that are tried by the QA test, respectively.

Question: When one of these tests fails, do we stop trying any further suites from that protocol version? That is, when the first SSL2 suite fails, does the QA script go on and try the rest of the SSL2 suites, or does it stop there and go on to the SSL3 suites?

Unless someone else works on this, these questions will go unanswered at least until next week, possibly later.
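
For readers following the arithmetic, the two observations above can be written out as a rough C sketch. This is not NSPR or strsclnt code: the function and macro names are hypothetical, the only value taken from this comment is -5978 for PR_NOT_CONNECTED_ERROR, and the placeholder 0 for "any other error" is an assumption made purely for illustration.

```c
#include <assert.h>
#include <errno.h>

/* Value quoted in this comment for PR_NOT_CONNECTED_ERROR. */
#define SKETCH_PR_NOT_CONNECTED_ERROR (-5978L)
/* Hypothetical placeholder for errors this sketch does not map;
 * it is not an NSPR error code. */
#define SKETCH_UNMAPPED_ERROR 0L

/* Hypothetical helper: translate an OS errno into the NSPR-style code
 * discussed above. Only ENOTCONN is mapped. */
long sketch_map_os_error(int os_errno)
{
    if (os_errno == ENOTCONN) {
        return SKETCH_PR_NOT_CONNECTED_ERROR;
    }
    return SKETCH_UNMAPPED_ERROR;
}

/* Hypothetical helper: connections that the session cache statistics do
 * not account for, i.e. attempted - (cache_hits + cache_misses). */
int sketch_unaccounted_connections(int attempted, int cache_hits,
                                   int cache_misses)
{
    assert(attempted >= 0 && cache_hits >= 0 && cache_misses >= 0);
    return attempted - (cache_hits + cache_misses);
}
```

With made-up counts of 990 hits and 3 misses out of 1000 attempted connections, `sketch_unaccounted_connections(1000, 990, 3)` gives 7, which would sit inside the 3 to 12 range of errors seen per run.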

Sonja Mirtitsch

Reporter

Comment 3

23 years ago

CCing Wan-Teh, since we should have a look at this bug before the early release.

Sonja Mirtitsch

Reporter

Updated

23 years ago

Summary: machine / configuration problems on orville → stress tests fail on orville

Sonja Mirtitsch

Reporter

Comment 4

23 years ago

Nelson, could you please attach the old logs to verify that the failures on orville were occurring before the recent SSL server cache changes, and are no different now than before? Thanks.

Orville was used as our main HP-UX 32-bit QA machine until about 5 months ago, and I am not aware that it showed these failures then. We could not use it for a while because of a tinderbox running there. When I started using it again, about a month ago, I noticed this failure.