Closed
Bug 137800
Opened 22 years ago
Closed 21 years ago
tinderbox core dumps (y2sun1 Solaris 6, OSF 5.1)
Categories
(NSS :: Test, defect, P1)
Tracking
(Not tracked)
RESOLVED
WORKSFORME
People
(Reporter: sonja.mirtitsch, Assigned: bishakhabanerjee)
Details
The machine had hanging selfserv processes, which I killed a few times. Checking where they came from, I found cores in client and ronlydir on y2sun1 (4/14 22:52 and 4/16 01:29) and on dijkstra (4/16 00:37). Maybe there are problems in dbtest.
Comment 1•22 years ago
I saw the failures and built on axilla, which is also Solaris 2.6. I have been running SSL stress tests on it (100,000 connections in each test) and haven't seen the problem. You have a script that detects core dumps in the QA, right? In that case, maybe it should shut down the tinderbox on that machine, so that the state is frozen. Then we can go in and debug the core file. That may be harder to do than I realize, though...
Reporter
Comment 2•22 years ago
Put a stop-on-fail on dijkstra's tinderbox (selfserv core). I am not certain about the Solaris cores, since they look like shell core dumps. Some of our lab machines had trouble tonight; there might be something wrong with the network. Neither of the cores had anything to do with dbtest; this test just copies the client directory and makes it read-only.
Reporter
Comment 3•22 years ago
Copied http://cindercone.red.iplanet.com/share/builds/mccrel3/nss/nsstip/tinderbox/tests_results/security/dijkstra-20020417-05.26 to /u/sonmi/tmp
ssl.sh: SSL Stress Test Extended test ===============================
ssl.sh: skipping Stress SSL2 RC4 128 with MD5 for Extended test
ssl.sh: Stress SSL3 RC4 128 with MD5 ----
selfserv -D -p 8444 -d ../ext_server -n dijkstra.red.iplanet.com \
-w nss -i ../tests_pid.315559 &
selfserv started at Wed Apr 17 05:31:01 PDT 2002
tstclnt -p 8444 -h dijkstra -q -d ../ext_client < /usr2/nss_tbx_OSF1-5.1/builds/tinderbox/OSF1-5.1/mozilla/security/nss/tests/ssl/sslreq.txt \
strsclnt -q -p 8444 -d ../ext_client -w nss -c 1000 -C c \
dijkstra.red.iplanet.com
strsclnt started at Wed Apr 17 05:31:01 PDT 2002
strsclnt: -- SSL: Server Certificate Validated.
strsclnt: 0 cache hits; 1 cache misses, 0 cache not reusable
/usr2/nss_tbx_OSF1-5.1/builds/tinderbox/OSF1-5.1/mozilla/security/nss/tests/all.sh: 317397 Memory fault - core dumped
strsclnt completed at Wed Apr 17 05:31:19 PDT 2002
/usr2/nss_tbx_OSF1-5.1/builds/tinderbox/OSF1-5.1/mozilla/security/nss/tests/all.sh: 317160 Terminated
OS: Solaris → other
Hardware: Sun → Other
Summary: y2sun1 Solaris 6 tinderbox core dumps → tinderbox core dumps (y2sun1 Solaris 6, OSF 5.1)
Comment 4•22 years ago
Here is the trace from the core:
7 free(0x3ffc0087f58, 0x100000000, 0x3ffc0087390, 0xfffffffffffffff6, 0x300020219bc) [0x3ff800d3500]
8 PR_Free(ptr = 0x140219240) ["prmem.c":82, 0x300020219b8]
9 PR_DestroyLock(lock = 0x140219240) ["ptsynch.c":176, 0x30002035900]
10 pk11_CleanupFreeLists() ["pkcs11u.c":1789, 0x30002856da8]
11 NSC_Finalize(pReserved = (nil)) ["pkcs11.c":2409, 0x30002844308]
12 SECMOD_UnloadModule(mod = 0x14003d420) ["pk11load.c":285, 0x30000849324]
13 SECMOD_SlotDestroyModule(module = 0x14003d420, fromSlot = 1) ["pk11util.c":653, 0x30000861d00]
14 PK11_DestroySlot(slot = 0x1400fb000) ["pk11slot.c":483, 0x3000084cba0]
15 PK11_FreeSlot(slot = 0x1400fb000) ["pk11slot.c":515, 0x3000084cca4]
16 SECMOD_DestroyModule(module = 0x14003d420) ["pk11util.c":630, 0x30000861bd8]
17 SECMOD_DestroyModuleListElement(element = 0x14003ae00) ["pk11util.c":669, 0x30000861d94]
More (n if no)?
18 SECMOD_DestroyModuleList(list = 0x14003ae00) ["pk11util.c":684, 0x30000861e00]
19 SECMOD_Shutdown() ["pk11util.c":87, 0x300008608d0]
20 NSS_Shutdown() ["nssinit.c":462, 0x3000082e9e8]
21 main(argc = 13, argv = 0x11fffc018) ["strsclnt.c":1147, 0x1200078c0]
Comment 5•22 years ago
Unfortunately, I couldn't get any more information from the core file. Bob, does this tell you anything?
Comment 6•22 years ago
I'm not sure about that core file. Here is another, generated by my own build:
8 PR_Free(ptr = 0x14019ea80) ["prmem.c":82, 0x300020219b8]
9 PORT_Free(ptr = 0x14019ea80) ["secport.c":149, 0x30000884050]
10 SECITEM_FreeItem(zap = 0x14019ea80, freeit = 1) ["secitem.c":227, 0x30000882ef4]
11 PK11_DestroyContext(context = 0x14015fc80, freeit = 1) ["pk11skey.c":3250, 0x3000085c480]
12 ssl_ResetSecurityInfo(sec = 0x20000fd5458) ["sslsecur.c":805, 0x3ffbffeaee0]
13 ssl_DestroySecurityInfo(sec = 0x20000fd5458) ["sslsecur.c":845, 0x3ffbffeb060]
14 ssl_DestroySocketContents(ss = 0x20000fd5440) ["sslsock.c":354, 0x3ffbffef6b4]
15 ssl_FreeSocket(ss = 0x14014de00) ["sslsock.c":418, 0x3ffbffef8e8]
More (n if no)?y
16 ssl_DefClose(ss = 0x14014de00) ["ssldef.c":244, 0x3ffbffe7460]
17 ssl_SecureClose(ss = 0x14014de00) ["sslsecur.c":906, 0x3ffbffeb314]
18 ssl_Close(fd = 0x140122c40) ["sslsock.c":1178, 0x3ffbfff17c8]
19 PR_Close(fd = 0x140122c40) ["priometh.c":131, 0x3000201618c]
> 20 do_connects(a = 0x11fffbed0, b = 0x140115800, connection = 9) ["strsclnt.c":779, 0x1200069e0]
The interesting thing to note is that the SECITEM_FreeItem call in
PK11_DestroyContext immediately follows a call to PK11_FreeSymKey. Here is some
possibly useful information from context->slot:
(dbx) print *context->slot
struct {
functionList = 0x300428976d0
module = 0x14003d420
needTest = 0
isPerm = 1
isHW = 0
isInternal = 1
disabled = 0
reason = PK11_DIS_NONE
readOnly = 1
needLogin = 0
hasRandom = 1
defRWSession = 0
isThreadSafe = 1
flags = 32771
session = 1
sessionLock = 0x140111640
slotID = 1
defaultFlags = 2684370749
refCount = 33
refLock = 0x1400f6d80
freeListLock = 0x1401117c0
freeSymKeysHead = 0x1401d1f00
keyCount = 13
It appears this is a shutdown error, not a stress error. I've written a script
that runs strsclnt instances until a core file is generated. I must have been
very lucky to get this core; the script has been running for a while.
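The script mentioned above is not attached to this bug. A minimal sketch of such a run-until-core loop might look like the following, written in Python purely for illustration. The function names `find_cores` and `run_until_core` are hypothetical, and the sketch assumes core files are written to the working directory with names starting with `core`, which depends on the platform's core dump settings.

```python
import subprocess
from pathlib import Path


def find_cores(workdir):
    """Return regular files in workdir whose names start with 'core'.

    Assumption: the OS drops core files into the process's working
    directory as 'core' or 'core.<suffix>'.
    """
    return sorted(p for p in Path(workdir).glob("core*") if p.is_file())


def run_until_core(cmd, workdir, max_runs=1000):
    """Run cmd repeatedly in workdir until a core file appears.

    Returns (runs, cores). cores is an empty list if max_runs was
    reached without any core file being produced.
    """
    for run in range(1, max_runs + 1):
        # Output is discarded; only the presence of a core file matters here.
        subprocess.run(
            cmd,
            cwd=workdir,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        cores = find_cores(workdir)
        if cores:
            return run, cores
    return max_runs, []
```

With the strsclnt arguments from the test log in comment 3, a call could look like `run_until_core(["strsclnt", "-q", "-p", "8444", "-d", "../ext_client", "-w", "nss", "-c", "1000", "-C", "c", "dijkstra.red.iplanet.com"], workdir=".")`, with selfserv already running on that port. Stopping at the first core keeps the core file from being overwritten by a later crash.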
Comment 7•22 years ago
The first core file looks like shutdown; the second is not (we don't free contexts on shutdown). Note that both of these crash in free. I suspect that our problem is a double free or a free into the wrong heap. Since Ian can't reproduce it easily, it's probably a race condition. bob
Comment 8•22 years ago
Another core:
7 malloc(0x3ffc0086e90, 0x1000000a0, 0x3ff800d423c, 0x15c, 0x140148280) [0x3ff800d1ca0]
8 calloc(0x3ff800d423c, 0x15c, 0x140148280, 0x14003af80, 0x30002021948) [0x3ff800d4238]
9 PR_Calloc(nelem = 1, elsize = 348) ["prmem.c":64, 0x30002021944]
10 PORT_ZAlloc(bytes = 348) ["secport.c":137, 0x3000288c330]
11 SHA1_NewContext() ["sha_fast.c":328, 0x30002869714]
12 NSC_DigestInit(hSession = 709, pMechanism = 0x20000f45680) ["pkcs11c.c":1062, 0x3000284a0b8]
13 pk11_context_init(context = 0x14021e980, mech_info = 0x20000f45680) ["pk11skey.c":3364, 0x3000085c898]
14 pk11_CreateNewContextInSlot(type = 544, slot = 0x1400faa00, operation = 2164260864, symKey = (nil), param = 0x20000f456e8) ["pk11skey.c":3450, 0x3000085cc7c]
15 PK11_CreateDigestContext(hashAlg = SEC_OID_SHA1) ["pk11skey.c":3558, 0x3000085cfc4]
More (n if no)?y
16 ssl3_InitState(ss = 0x14014b700) ["ssl3con.c":7641, 0x3ffbffdc538]
17 ssl3_SendClientHello(ss = 0x14014b700) ["ssl3con.c":2590, 0x3ffbffd1190]
18 ssl2_BeginClientHandshake(ss = 0x14014b700) ["sslcon.c":3072, 0x3ffbffe5758]
19 ssl_Do1stHandshake(ss = 0x14014b700) ["sslsecur.c":155, 0x3ffbffe97a4]
20 ssl_SecureSend(ss = 0x14014b700, buf = 0x120004090 = "GET /abc HTTP/1.0\r\n\r\n", len = 21, flags = 0) ["sslsecur.c":1036, 0x3ffbffeb930]
21 ssl_SecureWrite(ss = 0x14014b700, buf = 0x120004090 = "GET /abc HTTP/1.0\r\n\r\n", len = 21) ["sslsecur.c":1070, 0x3ffbffeba90]
22 ssl_Write(fd = 0x14019e0c0, buf = 0x120004090, len = 21) ["sslsock.c":1260, 0x3ffbfff1bf0]
23 PR_Write(fd = 0x14019e0c0, buf = 0x120004090, amount = 21) ["priometh.c":141, 0x30002016214]
24 handle_connection(ssl_sock = 0x14019e0c0, connection = 47) ["strsclnt.c":645, 0x120006540]
25 do_connects(a = 0x11fffbed0, b = 0x140108040, connection = 47) ["strsclnt.c":772, 0x1200069a4]
Comment 10•22 years ago
Changed the QA contact to Bishakha.
QA Contact: sonja.mirtitsch → bishakhabanerjee
Comment 11•21 years ago
This is a crash, so it qualifies as P1 if and only if it is still happening; otherwise it should be resolved WORKSFORME. Sonja, Bishakha, does this still happen?
Priority: -- → P1
Reporter
Comment 12•21 years ago
Can't verify if it still happens at Sun. Solaris 6 is not supported anymore on NSS > 3.3.x, and neither is OSF.
Assignee
Comment 13•21 years ago
Have never seen this on Netscape Tinderboxes. We do run them on OSF/1 and Solaris 5.8. Have never run on Solaris 2.6 here. Resolving "WORKSFORME".
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → WORKSFORME