Bug 490451
Opened 16 years ago
Closed 2 years ago
Selfserv/strsclnt tests found hanged on memory leak Tinderbox.
Categories
(NSS :: Tools, defect, P5)
Tracking
(Not tracked)
RESOLVED
INACTIVE
People
(Reporter: slavomir.katuscak+mozilla, Unassigned)
Details
Build:
Tinderbox: harpsichord SunOS/sparc 32bit OPT, Started 2009/04/27 10:49
Scenario:
1. Selfserv started under DBX with memory leak checking enabled.
2. Strsclnt connects to selfserv and does some communication.
After strsclnt is done, it should disconnect from selfserv and other strsclnt tests should continue.
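The expectation above (the client finishes and exits, so the next test can start) can be checked mechanically by a test harness. A minimal Python sketch, assuming a hypothetical helper named `run_with_deadline`; it is not part of memleak.sh, and the deadline value is a placeholder:

```python
import subprocess
import sys


def run_with_deadline(cmd, deadline_s):
    """Run cmd and return its exit code, or None if it was still running at the deadline.

    Hypothetical helper for illustration; not part of memleak.sh.
    """
    try:
        return subprocess.run(cmd, timeout=deadline_s).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so no process is left behind.
        return None
```

For example, `run_with_deadline([sys.executable, "-c", "import time; time.sleep(60)"], 1)` returns None, which a harness could report as a hang instead of blocking the remaining tests.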
Last lines in logfile:
---
strsclnt -q -p 8222 -d /export/tinderlight/data/harpsichord_32_OPT/mozilla/tests_results/security/harpsichord.1/client_memleak -w nss -c 1000 -n TestUser harpsichord.red.iplanet.com -C g
strsclnt: -- SSL: Server Certificate Validated.
strsclnt: 0 cache hits; 1 cache misses, 0 cache not reusable
0 stateless resumes
strsclnt: 999 cache hits; 1 cache misses, 0 cache not reusable
0 stateless resumes
memleak.sh: -------- Trying cipher i:
strsclnt -q -p 8222 -d /export/tinderlight/data/harpsichord_32_OPT/mozilla/tests_results/security/harpsichord.1/client_memleak -w nss -c 1000 -n TestUser harpsichord.red.iplanet.com -C i
strsclnt: -- SSL: Server Certificate Validated.
strsclnt: 0 cache hits; 1 cache misses, 0 cache not reusable
0 stateless resumes
---
On the test machine this connection remained hung, with both selfserv and strsclnt still running. Because selfserv was already running under DBX, I could not attach to it again, so I attached to strsclnt instead:
(dbx) threads
> t@1 a l@1 ?() running in __lwp_wait()
t@4 a l@4 _pt_root() running in __pollsys()
(dbx) where t@1
current thread: t@1
=>[1] __lwp_wait(0x4, 0xffbfeab4, 0x7b028, 0x69, 0x5, 0x5), at 0xfef3d618
[2] lwp_wait(0x4, 0xffbfeab4, 0x0, 0xfef6bad4, 0x5, 0x5), at 0xfef2f25c
[3] _thrp_join(0x4, 0x0, 0xffbfeb1c, 0x1, 0xffbfeab4, 0xfef68bc0), at 0xfef3926c
[4] PR_JoinThread(0xafafafaf, 0xafafac00, 0x3d98c, 0x1, 0x3e6b0, 0x99060), at 0xff111d38
[5] client_main(0x3d8ac, 0x4088, 0x8, 0x4000, 0x40918, 0x24758), at 0x17f24
[6] main(0xffbfec80, 0x1, 0x40618, 0x3d8a8, 0x3b1a0, 0x24964), at 0x18d9c
(dbx) where t@4
current thread: t@4
=>[1] __pollsys(0xfe8fb708, 0x1, 0xfe8fb690, 0x0, 0x0, 0x0), at 0xfef3d1c4
[2] _pollsys(0xfe8fb708, 0x1, 0xfe8fb690, 0x0, 0x0, 0x1388), at 0xfef30790
[3] _poll(0xfe8fb708, 0x1, 0x1388, 0x10624c00, 0x0, 0x0), at 0xfeeda9b0
[4] pt_poll_now(0xfe8fb788, 0x0, 0x5, 0x0, 0xfe8fb708, 0xffffffff), at 0xff10b19c
[5] pt_Recv(0x40c18, 0x188cc4, 0x8, 0x0, 0xffffffff, 0x5), at 0xff10d088
[6] ssl_DefRecv(0x188a40, 0xff10cf60, 0x5, 0xff12b590, 0x18b9f0, 0x0), at 0xff36915c
[7] ssl3_GatherData(0x188a40, 0x188c84, 0x0, 0x1000, 0x4800, 0x1), at 0xff3636c0
[8] ssl3_GatherCompleteHandshake(0x188a40, 0x0, 0xc9c00, 0x4, 0x1, 0xfe8fb8dc), at 0xff36380c
[9] ssl_GatherRecord1stHandshake(0x188a40, 0xfffd4cb0, 0x90cc0000, 0x90cc0000, 0x8000, 0x188a40), at 0xff3651e0
[10] ssl_Do1stHandshake(0x0, 0xff3675c8, 0x178bc0, 0x188a40, 0x1, 0x0), at 0xff36c7b4
[11] ssl_SecureSend(0x0, 0x23d54, 0x15, 0x8000, 0x188a40, 0x91cc0000), at 0xff36e610
[12] ssl_Send(0x18b9f0, 0x23d54, 0x15, 0xff36e444, 0xffffffff, 0x188a40), at 0xff374320
[13] do_connects(0xfffe972c, 0x18b9f0, 0x15c00, 0xf000, 0x18c1d0, 0x3b1a4), at 0x16bbc
[14] thread_wrapper(0x3d8e4, 0x2400, 0x4688d, 0xc6bc5643, 0x3a628, 0x3cc20), at 0x151d4
[15] _pt_root(0x99060, 0x4, 0xff12b76c, 0xff12b780, 0x0, 0xa57d8), at 0xff111710
I'm not sure whether the problem is on the selfserv side or on the strsclnt side.
I left this DBX session attached in screen on the machine; feel free to connect there if you need more information. (Four test suites run in parallel, each in its own window of the screen command. This one is 32-bit OPT, and DBX has a dedicated screen window.)
Similar hangs were seen very often before the workaround for the Solaris/DBX/bash bug was added (see bug 464223 comment 33); since then I haven't seen hangs like this for more than 4 months. There was one very similar hang on the machine grow about a week ago, but I use that machine for experiments and the workaround for the mentioned bug was partially disabled there, so failures on that machine are expected.
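A note on reading the strsclnt trace for t@4 above: the third value passed to _poll is 0x1388, which is 5000 in decimal (poll takes milliseconds), and the 0xffffffff in the pt_Recv and pt_poll_now frames equals the 4294967295U timeout that the DBG trace in comment 2 prints. Assuming dbx shows real argument values in this OPT build, the client would be waiting for handshake data in 5-second poll slices with no overall deadline. A minimal Python sketch of the alternative, a sliced wait that gives up after a deadline; `recv_with_deadline` and its parameters are hypothetical and are not NSPR or strsclnt code:

```python
import select
import socket
import time


def recv_with_deadline(sock, nbytes, slice_s=5.0, deadline_s=30.0):
    """Wait for data in short slices; return the bytes, or None once deadline_s has passed."""
    give_up_at = time.monotonic() + deadline_s
    while True:
        remaining = give_up_at - time.monotonic()
        if remaining <= 0:
            # The peer never answered: report a timeout instead of waiting forever.
            return None
        # Sleep at most one slice, and never past the overall deadline.
        readable, _, _ = select.select([sock], [], [], min(slice_s, remaining))
        if readable:
            return sock.recv(nbytes)
```

With a connected pair from `socket.socketpair()`, the call returns None while the peer stays silent and returns the data once the peer sends something. Whether a finite timeout is appropriate for strsclnt itself is a separate question that this sketch does not answer.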
Reporter
Comment 1 • 16 years ago
The logfile from the DBX session that runs selfserv doesn't contain any error messages, only information about loading libraries and about enabling memory leak checking.
Reporter
Comment 2 • 16 years ago
The same (or a very similar) issue occurred today on the same machine, but on a DBG build. I found the test that connects strsclnt to selfserv hung; this time strsclnt was under DBX, so I was able to attach to selfserv:
$ dbx /export/tinderlight/data/harpsichord_32_DBG/mozilla/dist/SunOS5.10_DBG.OBJ/bin/selfserv 29229
Attached to process 29229 with 6 LWPs
t@1 (l@1) stopped in __pollsys at 0xfee3d1c4
0xfee3d1c4: __pollsys+0x0004: ta %icc,0x00000008
Current function is pt_poll_now
601 rv = poll(&tmp_pfd, 1, msecs);
(dbx) threads
> t@1 a l@1 ?() running in __pollsys()
t@2 a l@2 _pt_root() sleep on 0x4a540 in __lwp_park()
t@3 a l@3 _pt_root() sleep on 0x4a540 in __lwp_park()
t@4 a l@4 _pt_root() sleep on 0x4a540 in __lwp_park()
t@5 a l@5 _pt_root() sleep on 0x4a540 in __lwp_park()
t@6 a l@6 _pt_root() sleep on 0x4a540 in __lwp_park()
(dbx) where t@1
current thread: t@1
[1] __pollsys(0xffbfe700, 0x1, 0xffbfe690, 0x0, 0x0, 0x0), at 0xfee3d1c4
[2] _pollsys(0xffbfe700, 0x1, 0xffbfe690, 0x0, 0x0, 0x1388), at 0xfee30790
[3] _poll(0xffbfe700, 0x1, 0x1388, 0x10624c00, 0x0, 0x0), at 0xfedda9b0
=>[4] pt_poll_now(op = 0xffbfe7f4), line 601 in "ptio.c"
[5] pt_Continue(op = 0xffbfe7f4), line 724 in "ptio.c"
[6] pt_Accept(fd = 0x4b460, addr = 0xffbfe9b8, timeout = 4294967295U), line 1707 in "ptio.c"
[7] ssl_Accept(fd = 0x4b100, sockaddr = 0xffbfe9b8, timeout = 4294967295U), line 1243 in "sslsock.c"
[8] PR_Accept(fd = 0x4b100, addr = 0xffbfe9b8, timeout = 4294967295U), line 199 in "priometh.c"
[9] do_accepts(listen_sock = 0x4b100, model_sock = 0x4b100, requestCert = 0), line 1342 in "selfserv.c"
[10] server_main(listen_sock = 0x4b100, requestCert = 0, privKey = 0xffbfeb9c, cert = 0xffbfebb0), line 1658 in "selfserv.c"
[11] main(argc = 16, argv = 0xffbfec5c), line 2231 in "selfserv.c"
(dbx) where t@2
current thread: t@2
=>[1] __lwp_park(0x0, 0x0, 0x0, 0x0, 0xfed1a020, 0x1), at 0xfee3c4a0
[2] cond_sleep_queue(0x4a540, 0x13db18, 0x0, 0x0, 0x0, 0x0), at 0xfee36650
[3] cond_wait_queue(0x4a540, 0x13db18, 0x0, 0x0, 0x0, 0x0), at 0xfee3676c
[4] cond_wait(0x4a540, 0x13db18, 0xfee68bc0, 0xfffffff8, 0xfee82400, 0x0), at 0xfee36cec
[5] _pthread_cond_wait(0x4a540, 0x13db18, 0x0, 0x0, 0x0, 0x0), at 0xfee36d28
[6] PR_WaitCondVar(cvar = 0x4a538, timeout = 4294967295U), line 417 in "ptsynch.c"
[7] jobLoop(a = (nil), b = (nil), c = 0), line 516 in "selfserv.c"
[8] thread_wrapper(arg = 0x141d08), line 496 in "selfserv.c"
[9] _pt_root(arg = 0x141e70), line 228 in "ptthread.c"
Stacks from threads 3-6 were similar to thread 2.
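Taken together, the dump shows one thread polling inside pt_Accept and five threads parked on the same condition variable in jobLoop, so selfserv appears to be idle and waiting for a new connection. For longer dumps, the grouping can be done by a small script. A minimal Python sketch, assuming the `threads` output format shown above; `group_threads` is a hypothetical helper and not a dbx feature:

```python
import re
from collections import defaultdict

# Captures the thread id and the innermost function from one line of dbx "threads" output.
THREAD_LINE = re.compile(r"(t@\d+)\b.*\bin (\w+)\(\)\s*$")


def group_threads(dbx_threads_output):
    """Map innermost function name -> list of thread ids, in input order."""
    groups = defaultdict(list)
    for line in dbx_threads_output.splitlines():
        match = THREAD_LINE.search(line)
        if match:
            thread_id, function = match.groups()
            groups[function].append(thread_id)
    return dict(groups)
```

Feeding it the six `threads` lines from this comment puts t@1 under `__pollsys` and t@2 through t@6 under `__lwp_park`.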
Updated • 15 years ago
Assignee: nelson → nobody
Target Milestone: 3.12.5 → ---
Updated • 3 years ago
Severity: normal → S3
Updated • 2 years ago
Severity: S3 → S4
Status: NEW → RESOLVED
Closed: 2 years ago
Priority: -- → P5
Resolution: --- → INACTIVE