Closed Bug 400947 Opened 17 years ago Closed 16 years ago

thread unsafe operation in PKIX_PL_HashTable_Add cause selfserv to crash.

Categories

(NSS :: Libraries, defect, P1)

Sun
Solaris
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: slavomir.katuscak+mozilla, Assigned: alvolkov.bgs)

Details

(Whiteboard: PKIX NSS312B2)

Attachments

(1 file)

Branch: securitytip
Build: 20071023.1
Platform: SunOS5.10_i86pc_OPT.OBJ

Test category: SSL Stress Test - Server Bypass/Client Bypass - with ECC (PKIX)
Test failed: Stress TLS ECDH-RSA AES 128 CBC with SHA (no reuse, client auth)

ssl.sh: Stress TLS ECDH-RSA AES 128 CBC with SHA (no reuse, client auth) ----
selfserv starting at Tue Oct 23 05:42:12 PDT 2007
selfserv -D -p 8444 -d ../server -n mandela.red.iplanet.com -B -s \
         -e mandela.red.iplanet.com-ecmixed -w nss -r -r -c :C00E -i ../tests_pid.6990  &
trying to connect to selfserv at Tue Oct 23 05:42:12 PDT 2007
tstclnt -p 8444 -h mandela.red.iplanet.com -B -s -q \
        -d ../client < /share/builds/mccrel3/security/securitytip/builds/20071023.1/wozzeck_Solaris8/mozilla/security/nss/tests/ssl/sslreq.dat
kill -0 6068 >/dev/null 2>/dev/null
selfserv with PID 6068 found at Tue Oct 23 05:42:12 PDT 2007
selfserv with PID 6068 started at Tue Oct 23 05:42:12 PDT 2007
strsclnt -q -p 8444 -d ../client -B -s -w nss -c 10 -C :C00E -N -n TestUser-ecmixed \
          mandela.red.iplanet.com
strsclnt started at Tue Oct 23 05:42:12 PDT 2007
strsclnt: -- SSL: Server Certificate Validated.
strsclnt: -- SSL: Server Certificate Validated.
strsclnt: -- SSL: Server Certificate Validated.
strsclnt: -- SSL: Server Certificate Validated.
strsclnt: -- SSL: Server Certificate Validated.
strsclnt: -- SSL: Server Certificate Validated.
strsclnt: -- SSL: Server Certificate Validated.
strsclnt: -- SSL: Server Certificate Validated.
strsclnt: PR_Send returned error -5938:
Encountered end of file.
strsclnt: PR_Send returned error -5938:
Encountered end of file.
strsclnt: PR_Send returned error -5938:
Encountered end of file.
strsclnt: PR_Send returned error -5938:
Encountered end of file.
strsclnt: PR_Send returned error -5938:
Encountered end of file.
strsclnt: PR_Send returned error -5938:
Encountered end of file.
strsclnt: PR_Send returned error -5938:
Encountered end of file.
strsclnt: PR_Send returned error -5938:
Encountered end of file.
(pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed
(pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed
(pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed
(PKIX_PL_Cert_VerifySignature: PKIX_PL_HashTable_Add skipped: entry existed
(pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed
(pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed
(pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed
(pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed
(pkix_CacheCertChain_Add: PKIX_PL_HashTable_Add for CertChain skipped: entry existed
(pkix_CacheCertChain_Add: PKIX_PL_HashTable_Add for CertChain skipped: entry existed
(pkix_CacheCertChain_Add: PKIX_PL_HashTable_Add for CertChain skipped: entry existed
(pkix_CacheCertChain_Add: PKIX_PL_HashTable_Add for CertChain skipped: entry existed
(pkix_CacheCertChain_Add: PKIX_PL_HashTable_Add for CertChain skipped: entry existed
(pkix_CacheCertChain_Add: PKIX_PL_HashTable_Add for CertChain skipped: entry existed
(pkix_CacheCertChain_Add: PKIX_PL_HashTable_Add for CertChain skipped: entry existed
strsclnt: 0 cache hits; 8 cache misses, 0 cache not reusable
strsclnt: NoReuse - 8 server certificates tested.
6068 Segmentation Fault - core dumped
strsclnt completed at Tue Oct 23 05:42:15 PDT 2007
ssl.sh: Stress TLS ECDH-RSA AES 128 CBC with SHA (no reuse, client auth) produced a returncode of 1, expected is 0.  FAILED

Crashed only once. Core file not found.
Today I found this problem again in the nightly build results: same machine, same build, same test.
Slavo,

Why is there no core file? Has there never been a core file for this bug?
Your log says there was a segmentation fault and a core was dumped. We need to know where it is and get a stack. I haven't seen Solaris fail to write cores before.

As I read your problem report above, I see:

6068 Segmentation Fault - core dumped

and earlier:

selfserv with PID 6068 found at Tue Oct 23 05:42:12 PDT 2007
selfserv with PID 6068 started at Tue Oct 23 05:42:12 PDT 2007

So it is selfserv that crashed, not strsclnt.
Summary: Strsclnt crashed in PKIX tests. → Selfserv crashed in PKIX tests.
Julien,

ulimit -c was set to 0 (the default setting) on the Solaris machines. I fixed the test_nss script to set ulimit -c unlimited, so core files should now be dumped.
bash-3.00$ dbx /share/builds/mccrel3/security/securitytip/builds/20080223.1/wozzeck_Solaris8/mozilla/security/nss/cmd/selfserv/SunOS5.8_i86pc_DBG.OBJ/selfserv core

t@null (l@1) terminated by signal KILL (Killed)
0xd0a60aa5: __pollsys+0x0015:   jb       __cerror       [ 0xd09df790, .-0x81315 ]
Current function is pt_poll_now
  599                                   rv = poll(&tmp_pfd, 1, msecs);

(dbx) threads
      t@1  a  l@1   ?()   LWP suspended in  __pollsys() 
      t@2  a  l@2   _pt_root()   LWP suspended in  __pollsys() 
      t@3  a  l@3   _pt_root()   LWP suspended in  __pollsys() 
      t@4  a  l@4   _pt_root()   LWP suspended in  __pollsys() 
      t@5  a  l@5   _pt_root()   LWP suspended in  pkix_pl_GeneralName_Destroy() 
      t@6  a  l@6   _pt_root()   LWP suspended in  __pollsys() 
      t@7  a  l@7   _pt_root()   LWP suspended in  __pollsys() 
o     t@8  a  l@8   _pt_root()   signal SIGSEGV in  pkix_pl_PrimHashTable_GetBucketSize() 
      t@9  a  l@9   _pt_root()   LWP suspended in  __pollsys() 

(dbx) where t@1
current thread: t@1
=>[1] __pollsys(0x8046df8, 0x1, 0x8046dc8, 0x0), at 0xd0a60aa5 
  [2] _pollsys(0x8046df8, 0x1, 0x8046dc8, 0x0), at 0xd0a55229 
  [3] _poll(0x8046df8, 0x1, 0x1388), at 0xd0a0a672 
  [4] pt_poll_now(op = 0x8046e5c), line 599 in "ptio.c"
  [5] pt_Continue(op = 0x8046e5c), line 722 in "ptio.c"
  [6] pt_Accept(fd = 0x8180570, addr = 0x8046f64, timeout = 4294967295U), line 1696 in "ptio.c"
  [7] ssl_Accept(fd = 0x8085b40, sockaddr = 0x8046f64, timeout = 4294967295U), line 1227 in "sslsock.c"
  [8] PR_Accept(fd = 0x8085b40, addr = 0x8046f64, timeout = 4294967295U), line 199 in "priometh.c"
  [9] do_accepts(listen_sock = 0x8085b40, model_sock = 0x8085b40, requestCert = 2), line 1246 in "selfserv.c"
  [10] server_main(listen_sock = 0x8085b40, requestCert = 2, privKey = 0x80470c0, cert = 0x80470d4), line 1499 in "selfserv.c"
  [11] main(argc = 18, argv = 0x8047144), line 2082 in "selfserv.c"

(dbx) where t@2 (identical with other threads)
current thread: t@2
=>[1] __pollsys(0xd0568fd8, 0x1, 0xd0568fa8, 0x0), at 0xd0a60aa5 
  [2] _pollsys(0xd0568fd8, 0x1, 0xd0568fa8, 0x0), at 0xd0a55229 
  [3] _poll(0xd0568fd8, 0x1, 0x1388), at 0xd0a0a672 
  [4] pt_poll_now(op = 0xd056903c), line 599 in "ptio.c"
  [5] pt_Continue(op = 0xd056903c), line 722 in "ptio.c"
  [6] pt_Recv(fd = 0x824b360, buf = 0x81a87d0, amount = 5, flags = 0, timeout = 4294967295U), line 1863 in "ptio.c"
  [7] ssl_DefRecv(ss = 0x81a8538, buf = 0x81a87d0 "\x80^\^A", len = 5, flags = 0), line 94 in "ssldef.c"
  [8] ssl3_GatherData(ss = 0x81a8538, gs = 0x81a8790, flags = 0), line 90 in "ssl3gthr.c"
  [9] ssl3_GatherCompleteHandshake(ss = 0x81a8538, flags = 0), line 195 in "ssl3gthr.c"
  [10] ssl_GatherRecord1stHandshake(ss = 0x81a8538), line 1258 in "sslcon.c"
  [11] ssl_Do1stHandshake(ss = 0x81a8538), line 151 in "sslsecur.c"
  [12] ssl_SecureRecv(ss = 0x81a8538, buf = 0xd056951c "", len = 10239, flags = 0), line 1089 in "sslsecur.c"
  [13] ssl_SecureRead(ss = 0x81a8538, buf = 0xd056951c "", len = 10239), line 1108 in "sslsecur.c"
  [14] ssl_Read(fd = 0x824b320, buf = 0xd056951c, len = 10239), line 1452 in "sslsock.c"
  [15] PR_Read(fd = 0x824b320, buf = 0xd056951c, amount = 10239), line 141 in "priometh.c"
  [16] handle_connection(tcp_sock = 0x824b320, model_sock = 0x8085b40, requestCert = 2), line 969 in "selfserv.c"
  [17] jobLoop(a = (nil), b = (nil), c = 2), line 515 in "selfserv.c"
  [18] thread_wrapper(arg = 0x817c060), line 483 in "selfserv.c"
  [19] _pt_root(arg = 0x817c258), line 221 in "ptthread.c"
  [20] _thr_setup(0xd0aa2400), at 0xd0a5f708 
  [21] _lwp_start(), at 0xd0a5f9f0 

(dbx) where t@5
current thread: t@5
=>[1] pkix_pl_GeneralName_Destroy(object = 0x82a8b24, plContext = 0x816a848), line 502 in "pkix_pl_generalname.c"
  [2] PKIX_PL_Object_DecRef(object = 0x82a8b24, plContext = 0x816a848), line 911 in "pkix_pl_object.c"
  [3] pkix_List_Destroy(object = 0x82a571c, plContext = 0x816a848), line 121 in "pkix_list.c"
  [4] PKIX_PL_Object_DecRef(object = 0x82a571c, plContext = 0x816a848), line 911 in "pkix_pl_object.c"
  [5] pkix_List_Destroy(object = 0x82aaeac, plContext = 0x816a848), line 122 in "pkix_list.c"
  [6] PKIX_PL_Object_DecRef(object = 0x82aaeac, plContext = 0x816a848), line 911 in "pkix_pl_object.c"
  [7] pkix_ComCertSelParams_Destroy(object = 0x829a59c, plContext = 0x816a848), line 73 in "pkix_comcertselparams.c"
  [8] PKIX_PL_Object_DecRef(object = 0x829a59c, plContext = 0x816a848), line 911 in "pkix_pl_object.c"
  [9] pkix_CertSelector_Destroy(object = 0x829834c, plContext = 0x816a848), line 67 in "pkix_certselector.c"
  [10] PKIX_PL_Object_DecRef(object = 0x829834c, plContext = 0x816a848), line 911 in "pkix_pl_object.c"
  [11] pkix_ForwardBuilderState_Destroy(object = 0x8263ea4, plContext = 0x816a848), line 119 in "pkix_build.c"
  [12] PKIX_PL_Object_DecRef(object = 0x8263ea4, plContext = 0x816a848), line 911 in "pkix_pl_object.c"
  [13] PKIX_BuildChain(procParams = 0x82ba40c, pNBIOContext = 0xd024ad38, pState = 0xd024ad34, pBuildResult = 0xd024ad40, pVerifyNode = 0xd024ad3c, plContext = 0x816a848), line 4373 in "pkix_build.c"
  [14] cert_BuildAndValidateChain(procParams = 0x82ba40c, pResult = 0xd024ad80, pVerifyNode = 0xd024ad7c, plContext = 0x816a848), line 780 in "certvfypkix.c"
  [15] cert_VerifyCertChainPkix(cert = 0x82d0298, checkSig = 1, requiredUsage = certUsageSSLClient, time = 1203770408639433ULL, wincx = (nil), log = (nil), pSigerror = (nil), pRevoked = (nil)), line 1174 in "certvfypkix.c"
  [16] cert_VerifyCertChain(handle = 0x816c9e0, cert = 0x82d0298, checkSig = 1, sigerror = (nil), certUsage = certUsageSSLClient, t = 1203770408639433LL, wincx = (nil), log = (nil), revoked = (nil)), line 870 in "certvfy.c"
  [17] CERT_VerifyCertChain(handle = 0x816c9e0, cert = 0x82d0298, checkSig = 1, certUsage = certUsageSSLClient, t = 1203770408639433LL, wincx = (nil), log = (nil)), line 882 in "certvfy.c"
  [18] CERT_VerifyCert(handle = 0x816c9e0, cert = 0x82d0298, checkSig = 1, certUsage = certUsageSSLClient, t = 1203770408639433LL, wincx = (nil), log = (nil)), line 1479 in "certvfy.c"
  [19] CERT_VerifyCertNow(handle = 0x816c9e0, cert = 0x82d0298, checkSig = 1, certUsage = certUsageSSLClient, wincx = (nil)), line 1530 in "certvfy.c"
  [20] SSL_AuthCertificate(arg = 0x816c9e0, fd = 0x81a3270, checkSig = 1, isServer = 1), line 255 in "sslauth.c"
  [21] mySSLAuthCertificate(arg = 0x816c9e0, fd = 0x81a3270, checkSig = 1, isServer = 1), line 336 in "selfserv.c"
  [22] ssl3_HandleCertificate(ss = 0x81cacc8, b = 0x81f2843 "^P", length = 0), line 7135 in "ssl3con.c"
  [23] ssl3_HandleHandshakeMessage(ss = 0x81cacc8, b = 0x81f25dc "", length = 615U), line 7797 in "ssl3con.c"
  [24] ssl3_HandleHandshake(ss = 0x81cacc8, origBuf = 0x81caf24), line 7913 in "ssl3con.c"
  [25] ssl3_HandleRecord(ss = 0x81cacc8, cText = 0xd024b12c, databuf = 0x81caf24), line 8176 in "ssl3con.c"
  [26] ssl3_GatherCompleteHandshake(ss = 0x81cacc8, flags = 0), line 206 in "ssl3gthr.c"
  [27] ssl_GatherRecord1stHandshake(ss = 0x81cacc8), line 1258 in "sslcon.c"
  [28] ssl_Do1stHandshake(ss = 0x81cacc8), line 151 in "sslsecur.c"
  [29] ssl_SecureRecv(ss = 0x81cacc8, buf = 0xd024b51c "", len = 10239, flags = 0), line 1089 in "sslsecur.c"
  [30] ssl_SecureRead(ss = 0x81cacc8, buf = 0xd024b51c "", len = 10239), line 1108 in "sslsecur.c"
  [31] ssl_Read(fd = 0x81a3270, buf = 0xd024b51c, len = 10239), line 1452 in "sslsock.c"
  [32] PR_Read(fd = 0x81a3270, buf = 0xd024b51c, amount = 10239), line 141 in "priometh.c"
  [33] handle_connection(tcp_sock = 0x81a3270, model_sock = 0x8085b40, requestCert = 2), line 969 in "selfserv.c"
  [34] jobLoop(a = (nil), b = (nil), c = 2), line 515 in "selfserv.c"
  [35] thread_wrapper(arg = 0x817c0b4), line 483 in "selfserv.c"
  [36] _pt_root(arg = 0x817b5b0), line 221 in "ptthread.c"
  [37] _thr_setup(0xd0350800), at 0xd0a5f708 
  [38] _lwp_start(), at 0xd0a5f9f0 

(dbx) where t@8
current thread: t@8
=>[1] pkix_pl_PrimHashTable_GetBucketSize(ht = 0x8085290, hashCode = 186505469U, pBucketSize = 0xcff4abc8, plContext = 0x82488b0), line 552 in "pkix_pl_primhash.c"
  [2] PKIX_PL_HashTable_Add(ht = 0x81730cc, key = 0x82bb1f4, value = 0x82c51cc, plContext = 0x82488b0), line 240 in "pkix_pl_hashtable.c"
  [3] pkix_CacheCertChain_Add(targetCert = 0x82e42c4, anchors = 0x82dc89c, validityDate = 0x82bad64, buildResult = 0x82bb54c, plContext = 0x82488b0), line 807 in "pkix_tools.c"
  [4] PKIX_BuildChain(procParams = 0x81a2b2c, pNBIOContext = 0xcff4ad38, pState = 0xcff4ad34, pBuildResult = 0xcff4ad40, pVerifyNode = 0xcff4ad3c, plContext = 0x82488b0), line 4357 in "pkix_build.c"
  [5] cert_BuildAndValidateChain(procParams = 0x81a2b2c, pResult = 0xcff4ad80, pVerifyNode = 0xcff4ad7c, plContext = 0x82488b0), line 780 in "certvfypkix.c"
  [6] cert_VerifyCertChainPkix(cert = 0x82d0298, checkSig = 1, requiredUsage = certUsageSSLClient, time = 1203770408637866ULL, wincx = (nil), log = (nil), pSigerror = (nil), pRevoked = (nil)), line 1174 in "certvfypkix.c"
  [7] cert_VerifyCertChain(handle = 0x816c9e0, cert = 0x82d0298, checkSig = 1, sigerror = (nil), certUsage = certUsageSSLClient, t = 1203770408637866LL, wincx = (nil), log = (nil), revoked = (nil)), line 870 in "certvfy.c"
  [8] CERT_VerifyCertChain(handle = 0x816c9e0, cert = 0x82d0298, checkSig = 1, certUsage = certUsageSSLClient, t = 1203770408637866LL, wincx = (nil), log = (nil)), line 882 in "certvfy.c"
  [9] CERT_VerifyCert(handle = 0x816c9e0, cert = 0x82d0298, checkSig = 1, certUsage = certUsageSSLClient, t = 1203770408637866LL, wincx = (nil), log = (nil)), line 1479 in "certvfy.c"
  [10] CERT_VerifyCertNow(handle = 0x816c9e0, cert = 0x82d0298, checkSig = 1, certUsage = certUsageSSLClient, wincx = (nil)), line 1530 in "certvfy.c"
  [11] SSL_AuthCertificate(arg = 0x816c9e0, fd = 0x81a3250, checkSig = 1, isServer = 1), line 255 in "sslauth.c"
  [12] mySSLAuthCertificate(arg = 0x816c9e0, fd = 0x81a3250, checkSig = 1, isServer = 1), line 336 in "selfserv.c"
  [13] ssl3_HandleCertificate(ss = 0x824fde8, b = 0x8257493 "^P", length = 0), line 7135 in "ssl3con.c"
  [14] ssl3_HandleHandshakeMessage(ss = 0x824fde8, b = 0x825722c "", length = 615U), line 7797 in "ssl3con.c"
  [15] ssl3_HandleHandshake(ss = 0x824fde8, origBuf = 0x8250044), line 7913 in "ssl3con.c"
  [16] ssl3_HandleRecord(ss = 0x824fde8, cText = 0xcff4b12c, databuf = 0x8250044), line 8176 in "ssl3con.c"
  [17] ssl3_GatherCompleteHandshake(ss = 0x824fde8, flags = 0), line 206 in "ssl3gthr.c"
  [18] ssl_GatherRecord1stHandshake(ss = 0x824fde8), line 1258 in "sslcon.c"
  [19] ssl_Do1stHandshake(ss = 0x824fde8), line 151 in "sslsecur.c"
  [20] ssl_SecureRecv(ss = 0x824fde8, buf = 0xcff4b51c "", len = 10239, flags = 0), line 1089 in "sslsecur.c"
  [21] ssl_SecureRead(ss = 0x824fde8, buf = 0xcff4b51c "", len = 10239), line 1108 in "sslsecur.c"
  [22] ssl_Read(fd = 0x81a3250, buf = 0xcff4b51c, len = 10239), line 1452 in "sslsock.c"
  [23] PR_Read(fd = 0x81a3250, buf = 0xcff4b51c, amount = 10239), line 141 in "priometh.c"
  [24] handle_connection(tcp_sock = 0x81a3250, model_sock = 0x8085b40, requestCert = 2), line 969 in "selfserv.c"
  [25] jobLoop(a = (nil), b = (nil), c = 2), line 515 in "selfserv.c"
  [26] thread_wrapper(arg = 0x817c108), line 483 in "selfserv.c"
  [27] _pt_root(arg = 0x817b790), line 221 in "ptthread.c"
  [28] _thr_setup(0xd0351400), at 0xd0a5f708 
  [29] _lwp_start(), at 0xd0a5f9f0 

Slavo,

Could you please change the system settings on that machine (mandela) so that the core files are world-readable in the future?

Also, it seems coreadm is not properly configured: the core file is just named "core". If it were configured properly, you would not have been able to mistake a crash of selfserv for a crash of strsclnt, as you did earlier.
FYI:

(dbx) list 530,560
  530    *  Not Thread Safe - assumes exclusive access to "ht"
  531    *  (see Thread Safety Definitions in Programmer's Guide)
  532    * RETURNS:
  533    *  Returns NULL if the function succeeds.
  534    *  Returns a HashTable Error if the function fails in a non-fatal way.
  535    *  Returns a Fatal Error if the function fails in an unrecoverable way.
  536    */
  537   PKIX_Error *
  538   pkix_pl_PrimHashTable_GetBucketSize(
  539           pkix_pl_PrimHashTable *ht,
  540           PKIX_UInt32 hashCode,
  541           PKIX_UInt32 *pBucketSize,
  542           void *plContext)
  543   {
  544           pkix_pl_HT_Elem **elemPtr = NULL;
  545           pkix_pl_HT_Elem *element = NULL;
  546           PKIX_UInt32 bucketSize = 0;
  547
  548           PKIX_ENTER(HASHTABLE, "pkix_pl_PrimHashTable_GetBucketSize");
  549           PKIX_NULLCHECK_TWO(ht, pBucketSize);
  550
  551           for (elemPtr = &((ht->buckets)[hashCode%ht->size]), element = *elemPtr;
  552               element != NULL; elemPtr = &(element->next), element = *elemPtr) {
  553                   bucketSize++;
  554           }
  555
  556           *pBucketSize = bucketSize;
  557
  558           PKIX_RETURN(HASHTABLE);
  559   }
  560
(dbx) p ht
ht = 0x8085290
(dbx) p bucketSize
bucketSize = 2U
(dbx) p *ht
*ht = {
    buckets = 0x8173140
    size    = 32U
}
(dbx) p hashCode
hashCode = 186505469U
(dbx) p elemPtr
elemPtr = 0x30
(dbx) p element
element = 0x24
(dbx) examine ht->buckets,&ht->buckets[31]
0x08173140:      0x00000000 0x00000000 0x00000000 0x00000000
0x08173150:      0x00000000 0x00000000 0x00000000 0x00000000
0x08173160:      0x00000000 0x00000000 0x00000000 0x00000000
0x08173170:      0x00000000 0x00000000 0x00000000 0x00000000
0x08173180:      0x00000000 0x00000000 0x00000000 0x00000000
0x08173190:      0x00000000 0x00000000 0x00000000 0x00000000
0x081731a0:      0x00000000 0x00000000 0x00000000 0x00000000
0x081731b0:      0x00000000 0x08209eb0 0x00000000 0x00000000
(dbx) p *ht->buckets[29]
*ht->buckets[29] = {
    key      = 0x82d184c
    value    = 0x82c5194
    hashCode = 186505469U
    next     = (nil)
}
(dbx) p hashCode
hashCode = 186505469U
(dbx) p hashCode % ht->size
hashCode % ht->size = 29U
(dbx)

I don't see anything obviously wrong in this code, even though it's not the most readable. As I see it, with the current content of the hash table, bucketSize should not have been incremented to 2, because there is only one element in the bucket. My guess is that this hash table is not thread-safe and was modified from another thread while being read in this one.
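To make the reasoning above concrete, here is a minimal sketch of the same bucket walk on a simplified copy of the table state seen in the core. The field names follow the dbx dump; the type and function names (elem, prim_table, bucket_size, bucket_size_from_core_snapshot) are hypothetical and are not the NSS definitions. With hashCode 186505469 and size 32 the walk starts in bucket 29, and with the single element found there the count can only reach 1, not the 2 seen in the crashed thread.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the structures printed by dbx above.
 * Field names follow the dump; the type names are hypothetical. */
typedef struct elem {
    void *key;
    void *value;
    unsigned int hash_code;
    struct elem *next;
} elem;

typedef struct {
    elem **buckets;
    unsigned int size;
} prim_table;

/* Count the elements chained in the bucket selected by hash_code.
 * Like the original, this assumes exclusive access to the table. */
unsigned int bucket_size(const prim_table *ht, unsigned int hash_code)
{
    unsigned int n = 0;
    const elem *e;

    assert(ht != NULL && ht->buckets != NULL && ht->size > 0);
    for (e = ht->buckets[hash_code % ht->size]; e != NULL; e = e->next) {
        n++;
    }
    return n;
}

/* Rebuild the state seen in the core (32 buckets, one element with
 * hashCode 186505469, stored in bucket 29) and count that bucket. */
unsigned int bucket_size_from_core_snapshot(void)
{
    elem only = { NULL, NULL, 186505469u, NULL };
    elem *buckets[32] = { NULL };
    prim_table ht = { buckets, 32u };

    buckets[only.hash_code % ht.size] = &only;
    return bucket_size(&ht, only.hash_code);
}
```

A count of 2 combined with the small invalid pointer values in elemPtr and element is what one would expect if the chain was changed by another thread while this loop was following it.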
(In reply to comment #5)
> Slavo,
> 
> Could you please change the system settings on that machine (mandela) so that
> the core files are world-readable in the future ?

Core can be found at http://cindercone.red.iplanet.com/share/builds/mccrel3/security/securitytip/builds/20080223.1/wozzeck_Solaris8/mozilla/tests_results/security/mandela.1/pkix/client/core
 
> Also, it seems coreadm is not properly configured - the core file is just named
> "core" . If it was, you would not be able to mistake a crash of selfserv vs
> strsclnt like you did earlier.

All the core files are in /export/tests, and the same core is named core.selfserv.27343 there.
Thank you, Julien, for your analysis. It is a thread-unsafe operation.
PKIX_PL_HashTable_Add does not hold the hash table lock when it calls pkix_pl_PrimHashTable_GetBucketSize.
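The patch itself is not quoted in this thread, so the following is only a sketch of the general pattern, not the actual change: every access to the bucket chains, including the bucket-size read, happens while the table lock is held. A pthread mutex stands in for whatever lock the real hash table uses, and all names here (locked_table, locked_table_add, chain_length) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_BUCKETS 32u

typedef struct node {
    unsigned int hash_code;
    int value;
    struct node *next;
} node;

typedef struct {
    pthread_mutex_t lock;            /* guards every access to buckets */
    node *buckets[SKETCH_BUCKETS];
} locked_table;

void locked_table_init(locked_table *t)
{
    unsigned int i;

    assert(t != NULL);
    pthread_mutex_init(&t->lock, NULL);
    for (i = 0; i < SKETCH_BUCKETS; i++) {
        t->buckets[i] = NULL;
    }
}

/* Caller must hold t->lock: the chain may change under us otherwise. */
static unsigned int chain_length(const locked_table *t, unsigned int hash_code)
{
    unsigned int n = 0;
    const node *e;

    for (e = t->buckets[hash_code % SKETCH_BUCKETS]; e != NULL; e = e->next) {
        n++;
    }
    return n;
}

/* Insert an entry and report the resulting bucket size, all under a
 * single lock acquisition. Returns 0 on success, -1 if malloc fails. */
int locked_table_add(locked_table *t, unsigned int hash_code, int value,
                     unsigned int *bucket_size_out)
{
    node *n;

    assert(t != NULL);
    n = malloc(sizeof *n);
    if (n == NULL) {
        return -1;
    }
    n->hash_code = hash_code;
    n->value = value;

    pthread_mutex_lock(&t->lock);
    n->next = t->buckets[hash_code % SKETCH_BUCKETS];
    t->buckets[hash_code % SKETCH_BUCKETS] = n;
    if (bucket_size_out != NULL) {
        *bucket_size_out = chain_length(t, hash_code);  /* read under lock */
    }
    pthread_mutex_unlock(&t->lock);
    return 0;
}

/* Free all entries. Call only when no other thread uses the table. */
void locked_table_destroy(locked_table *t)
{
    unsigned int i;

    assert(t != NULL);
    for (i = 0; i < SKETCH_BUCKETS; i++) {
        node *e = t->buckets[i];
        while (e != NULL) {
            node *next = e->next;
            free(e);
            e = next;
        }
        t->buckets[i] = NULL;
    }
    pthread_mutex_destroy(&t->lock);
}
```

The point of the sketch is only the placement of the lock: if chain_length were called before pthread_mutex_lock or after pthread_mutex_unlock, a concurrent insert or removal could change the chain mid-walk, which matches the symptom in the core.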
Priority: -- → P1
Whiteboard: PKIX NSS312B2
Target Milestone: --- → 3.12
Alexei,

Great.

Slavo,

Re: comment 7,

Yes, I know where the core is. I am wondering why it wasn't called core.selfserv.27343 over there. A filename of "core" is not useful since it keeps getting overwritten. I also had to change the file permissions manually to run dbx as myself, because the user svbld does not have dbx in its PATH.

All core files in /export/test are readable only by root. Please change that so that new core files are world-readable in the future. Also, AFAIK, the NSS QA test script does not check for cores in /export/test. It is OK to keep a copy of the cores there, but to make it easier for us to debug, there should also be a copy in mozilla/tests_results with the right names and permissions.
Comment on attachment 308042
Set lock before calling pkix_pl_PrimHashTable_GetBucketSize

r=nelson
Attachment #308042 - Flags: review?(nelson) → review+
Summary: Selfserv crashed in PKIX tests. → thread unsafe operation in PKIX_PL_HashTable_Add cause selfserv to crash crashed.
Summary: thread unsafe operation in PKIX_PL_HashTable_Add cause selfserv to crash crashed. → thread unsafe operation in PKIX_PL_HashTable_Add cause selfserv to crash.
The patch is integrated. Slavo, please reopen the bug if you see the failure again.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED