Closed Bug 14776 Opened 25 years ago Closed 25 years ago

Simple TestCases for nspr threads failing on HPUX 10.20

Categories

(Core :: XPCOM, defect, P3)

HP
HP-UX
defect

RESOLVED FIXED

People

(Reporter: pepper, Assigned: pepper)

Details

(Whiteboard: [Perf])

I dunno who should have a look at this but the simple testcase TestThreads in
the mozilla tree is failing on HPUX 10.20.  The error is:

1073757856[40003e68]: Loaded library a.out (init)
1073757856[40003e68]: nsIThread 400038f8 created
1073757856[40003e68]: nsIThread 400038f8 start join
1073892016[40024a78]: nsIThread 400038f8 start run 40003748
1073892016[40024a78]: nsIThread 400038f8 end run 40003748
1073757856[40003e68]: nsIThread 400038f8 exited
1073757856[40003e68]: nsIThread 400038f8 end join
1073757856[40003e68]: nsIThread 400038f8 destroyed
1073757856[40003e68]: nsIThread 400038f8 created
1073757856[40003e68]: nsIThread 400038f8 start join
1073757856[40003e68]: PR_JoinThread: 0x40024A78 not joinable | already smashed
1073757856[40003e68]: Assertion failure: !"Illegal thread join attempt", at ptthread.c:516
Assertion failure: !"Illegal thread join attempt", at ptthread.c:516
running 0

I know this is only a simple testcase, and for all I know it may not work right
on any platform, but we are having HUGE performance problems with nspr threading
on HPUX 10.20 and 11.00.  I'm running all the pr testcases on HP now and will
follow up shortly with the other testcases that currently fail on 10.20.  11.00
builds are completely hosed right now, so we'll try to get to that next.
Status: NEW → ASSIGNED
The assertion failure indicates that the thread being joined is not a valid
thread, i.e. the thread could have exited or the thread structure could have
been corrupted.

Is there a simple test program to reproduce this problem?

Also, more details than the statement "HUGE performance problems with nspr
threading" are needed to understand the performance issues.
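
To illustrate the join rule behind that assertion (the log reports the thread
as "not joinable | already smashed"), here is a minimal Python sketch. It is a
model only, not NSPR code: the names JoinOnceThread and IllegalJoin are
hypothetical, and the assumption is simply that a handle may be joined once,
and only if it was created joinable.

```python
import threading


class IllegalJoin(RuntimeError):
    """Raised when a handle is joined while it is not in a joinable state."""


class JoinOnceThread:
    """Hypothetical wrapper: a thread handle that may be joined exactly once."""

    def __init__(self, target, joinable=True):
        self._thread = threading.Thread(target=target)
        self._joinable = joinable
        self._lock = threading.Lock()
        self._thread.start()

    def join(self):
        # Claim the join under a lock so two callers cannot both succeed.
        with self._lock:
            if not self._joinable:
                raise IllegalJoin("thread not joinable or already joined")
            self._joinable = False
        # Block outside the lock until the thread function has returned.
        self._thread.join()
```

A second join() on the same handle, or any join() on a handle created with
joinable=False, raises IllegalJoin instead of touching the finished thread.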
Whiteboard: [Perf]
Putting on [Perf] radar.
I've run through most of the tests in nsprpub/pr/tests and everything seems to
work reasonably well.  The following tests hang indefinitely: select2, pipeping,
pipepong, thruput, and tmoacc.  There may be others.
acceptread fails with the following error:
Testing w/ write_dally = 2000 msec
PR_AcceptRead (server) failed: PR_IO_TIMEOUT_ERROR(-5990), oserror = 0
PR_Recv (client) succeeded: 0 bytes
GET ÿÿÿÿ
Testing w/ write_dally = 2500 msec
PR_AcceptRead (server) failed: PR_IO_TIMEOUT_ERROR(-5990), oserror = 0
PR_Recv (client) failed: PR_CONNECT_RESET_ERROR(-5961), oserror = 232
PR_Shutdown (client) failed: PR_INVALID_ARGUMENT_ERROR(-5987), oserror = 22

When loading apprunner or viewer after a clean build there is a massive delay
every time nsIThreadPool is referenced (logged with NSPR_LOG_MODULES=all:5):
1073766336[40005f88]: InMemoryDataSource(401264e0): MARK
1073766336[40005f88]:   [(40294470)chrome://related/]--
1073766336[40005f88]:   ---[(401ac2e8)http://www.w3.org/1999/02/22-rdf-syntax-ns#type]--
1073766336[40005f88]:   -->[(4011ef48)http://chrome.mozilla.org/rdf#chrome]
1075552360[401ba030]: nsIThreadPool thread 401b9f70 got request 401b9894
1075552360[401ba030]: nsIThreadPool thread 401b9f70 running 401b9894

Then, you'll sit and wait for a good long while before anything else happens.
It takes about 45 minutes to load the browser right now (conservatively).  If
you have to migrate a profile, expect it to take much longer.

I believe Mike has run the load through the debugger a few times and may be
able to provide additional information.

This is all HPUX 10.20 info.  We are having even bigger problems on 11.00 right
now which may be resolved by HP directly.
There are a few spots with long delays on HPUX.  The most noticeable
viewer/apprunner delays appear in xpcom/io/nsPipe2.cpp,
nsPipe::nsPipeInputStream::Fill().

Two things are going on here: nsPipe::GetReadSegment() and
nsAutoCMonitor.Notify()/.Wait().

Anyone know the appropriate behavior here?
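
For comparison, the usual monitor discipline for a blocking pipe reader can be
sketched as below. This is a generic Python illustration, not the nsPipe
implementation: TinyPipe and its methods are hypothetical names. The reader
waits in a loop on a predicate while holding the monitor, and the writer
notifies after every state change.

```python
import threading
from collections import deque


class TinyPipe:
    """Hypothetical pipe: a queue of segments guarded by one monitor."""

    def __init__(self):
        self._cond = threading.Condition()
        self._segments = deque()
        self._closed = False

    def write(self, data):
        with self._cond:
            self._segments.append(data)
            # Wake any reader blocked in read().
            self._cond.notify_all()

    def close(self):
        with self._cond:
            self._closed = True
            self._cond.notify_all()

    def read(self, timeout=None):
        with self._cond:
            # wait_for re-checks the predicate after every wakeup, so a
            # spurious or stale notification cannot return bad data.
            ready = self._cond.wait_for(
                lambda: self._segments or self._closed, timeout
            )
            if not ready:
                raise TimeoutError("no data arrived before the timeout")
            if self._segments:
                return self._segments.popleft()
            return b""  # closed and drained: end of stream
```

If a reader in this scheme stalls for a long time, the usual suspects are a
writer that changes state without notifying, or a wait that never re-checks
its predicate.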
There is still no information on what the NSPR problem is; leaving this
bug assigned to NSPR will probably result in no one looking at resolving this
bug.
This bug should be reassigned to the owner of the
test (mozilla/xpcom/tests/TestThreads.cpp).
Product: Browser.  Component: XPCOM.

pepper@netscape.com wrote:
> I've run through most of the tests in nsprpub/pr/tests and
> everything seems to work reasonably well.  The following
> tests hang indefinitely: select2, pipeping, pipepong,
> thruput, and tmoacc.  There may be others.

Warning: some of our tests are broken.  Our test harness
is mozilla/nsprpub/pr/tests/runtests.ksh.  You can use
runtests.ksh to run all the working tests, or look
at it to see what individual tests you can run manually.

select2 is an obsolete test.

pipeping and pipepong are a pair.  You only need to
invoke pipeping.  SeaMonkey is not using any of the
features tested by pipeping, so the failure of this
test is not critical to SeaMonkey.

thruput needs to run with another instance of
thruput.

tmoacc needs to run with either tmocon or writev.
If you invoke tmocon, it will invoke tmoacc for you.

> acceptread fails with the following error:
> Testing w/ write_dally = 2000 msec
> PR_AcceptRead (server) failed: PR_IO_TIMEOUT_ERROR(-5990), oserror = 0
> PR_Recv (client) succeeded: 0 bytes
> GET ÿÿÿÿ
> Testing w/ write_dally = 2500 msec
> PR_AcceptRead (server) failed: PR_IO_TIMEOUT_ERROR(-5990), oserror = 0
> PR_Recv (client) failed: PR_CONNECT_RESET_ERROR(-5961), oserror = 232
> PR_Shutdown (client) failed: PR_INVALID_ARGUMENT_ERROR(-5987), oserror = 22

This is expected error output.  It is the process
exit status that determines whether a test passes or
fails.  The runtests.ksh script examines the exit
status of the tests.  I agree the output generated
by this test is very confusing.
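
The pass/fail rule above can be sketched in a few lines of Python. This is
not runtests.ksh: run_test is a hypothetical helper, and the timeout used to
flag a hung test is an assumption added for illustration. The point is that
only the exit status decides the result, and the printed output is ignored.

```python
import subprocess


def run_test(argv, timeout_s=60.0):
    """Return 'PASS', 'FAIL', or 'HANG' for one test command."""
    try:
        proc = subprocess.run(
            argv,
            stdout=subprocess.PIPE,    # capture output so it is not
            stderr=subprocess.STDOUT,  # mistaken for the verdict
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # The child is killed by subprocess.run before this is raised.
        return "HANG"
    # Only the exit status decides; error text in the output is ignored.
    return "PASS" if proc.returncode == 0 else "FAIL"
```

A call such as run_test(["./acceptread"]) would then report PASS even when
the test prints PR_IO_TIMEOUT_ERROR lines, as long as it exits with status 0.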
Assignee: srinivas → warren
Status: ASSIGNED → NEW
Component: NSPR → XPCOM
Product: NSPR → Browser
Assignee: warren → pepper
I'm not going to work on this.
Status: NEW → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED