Closed Bug 270689 Opened 21 years ago Closed 21 years ago

race condition in softoken list lock creation

Categories

(NSS :: Tools, defect, P2)

3.9.4
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jason.m.reid, Assigned: julien.pierre)

Details

Attachments

(3 files, 1 obsolete file)

User-Agent: Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.4) Gecko/20040414 Build Identifier: Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.4) Gecko/20040414 The program strsclnt had an assertion failure on 20041117 nightly build in the ssl_simple_rootca NISCC test. I have attached the logs, NISCC code fragment, and stack trace. When I reran this test, it passed normally. mahatma[svbld]:/niscc/archive/20041117.1> file core.061805 core.061805: ELF 32-bit LSB core file 80386 Version 1, from 'strsclnt' mahatma[svbld]:/niscc/archive/20041117.1> cat nisccLog05 using /niscc/ssl/testcases/simple_rootca/00000002 using /niscc/ssl/testcases/simple_rootca/00000004 using /niscc/ssl/testcases/simple_rootca/00000001 using /niscc/ssl/testcases/simple_rootca/00000003 using /niscc/ssl/testcases/simple_rootca/00000005 selfserv: normal termination mahatma[svbld]:/niscc/archive/20041117.1> cat nisccLog06 Assertion failure: _PT_PTHREAD_MUTEX_IS_LOCKED(lock->mutex), at ../../../../pr/src/pthreads/ptsynch.c:207 Bad server certificate: -8156, Issuer certificate is invalid. subject DN: CN=Simple Server,O=Simple Server,C=GB issuer DN: CN=Simple Root Authority,O=Simple Root Authority,C=GB 0 cache hits; 1 cache misses, 0 cache not reusable HTTP/1.0 200 OK Server: Generic Web Server Date: Tue, 26 Aug 1997 22:10:05 GMT Content-type: text/plain GET /stop HTTP/1.0 EOF tstclnt: write to SSL socket failed: TCP connection reset by peer. tstclnt: unable to connect: Connection refused by peer. tstclnt: unable to connect: Connection refused by peer. tstclnt: unable to connect: Connection refused by peer. mahatma[svbld]:/niscc/archive/20041117.1> # NISCC code fragment ssl_simple_rootca() { CLIENT=${NISCC_HOME}/niscc_ssl/simple_client; export CLIENT SERVER=${NISCC_HOME}/niscc_ssl/simple_server; export SERVER PORT=8445; export PORT START_AT=1; export START_AT STOP_AT=106190; export STOP_AT LD_LIBRARY_PATH=${HACKLIB}; export LD_LIBRARY_PATH NISCC_TEST=$TEST/simple_rootca; export NISCC_TEST ${HACKBIN}/selfserv -p $PORT -d $SERVER -n server_crt -t 5 -w test > \ $NISCC_HOME/nisccLog05 2>&1 & unset NISCC_TEST; export NISCC_TEST LD_LIBRARY_PATH=${TESTLIB}; export LD_LIBRARY_PATH ${TESTBIN}/strsclnt -d $CLIENT -p $PORT -t 4 -c 106190 -o -N $HOSTNAME > \ $NISCC_HOME/nisccLog06 2>&1 echo "starting tstclnt to shutdown simple rootca selfserv process" ${TESTBIN}/tstclnt -h $HOSTNAME -p $PORT -d $CLIENT -n client_crt -o -f \ -w test < $CLIENT/stop.txt >> nisccLog06 2>&1 ${TESTBIN}/tstclnt -h $HOSTNAME -p $PORT -d $CLIENT -n client_crt -o -f \ -w test < $CLIENT/stop.txt >> nisccLog06 2>&1 ${TESTBIN}/tstclnt -h $HOSTNAME -p $PORT -d $CLIENT -n client_crt -o -f \ -w test < $CLIENT/stop.txt >> nisccLog06 2>&1 ${TESTBIN}/tstclnt -h $HOSTNAME -p $PORT -d $CLIENT -n client_crt -o -f \ -w test < $CLIENT/stop.txt >> nisccLog06 2>&1 ${TESTBIN}/tstclnt -h $HOSTNAME -p $PORT -d $CLIENT -n client_crt -o -f \ -w test < $CLIENT/stop.txt >> nisccLog06 2>&1 unset LD_LIBRARY_PATH sleep 10 find_core } mahatma[svbld]:/niscc/archive/20041117.1> /usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx /share/builds/mccrel3/security/security39/builds/20041117.1/wozzeck_Solaris8/mozilla/dist/SunOS5.10_i86pc_DBG.OBJ/bin/strsclnt core.061805 Reading strsclnt core file header read successfully Reading ld.so.1 Reading libssl3.so Reading libsmime3.so Reading libnss3.so Reading libplc4.so Reading libplds4.so Reading libnspr4.so Reading libthread.so.1 Reading libnsl.so.1 Reading libsocket.so.1 Reading librt.so.1 Reading libdl.so.1 Reading libc.so.1 Reading libsoftokn3.so Reading libpthread.so.1 Reading libaio.so.1 Reading libmd5.so.1 Reading libmp.so.2 detected a multithreaded program t@0 (l@1) terminated by signal KILL (Killed) 0xd0ff1979: __lwp_park+0x0019: jae __lwp_park+0x27 <0xd0ff1987> Current function is PR_WaitCondVar 389 rv = pthread_cond_wait(&cvar->cv, &cvar->lock->mutex); (/usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx) whre whre: not found (/usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx) where [1] __lwp_park(0x808a8c4, 0x80bf3e8), at 0xd0ff1979 [2] cond_sleep_queue(0x808a8c4, 0x80bf3e8, 0x0), at 0xd0febff1 [3] cond_wait_queue(0x808a8c4, 0x80bf3e8, 0x0, 0x0), at 0xd0fec103 [4] _cond_wait(0x808a8c4, 0x80bf3e8), at 0xd0fec60c [5] cond_wait(0x808a8c4, 0x80bf3e8), at 0xd0fec651 [6] _pthread_cond_wait(0x808a8c4, 0x80bf3e8), at 0xd0fec68b =>[7] PR_WaitCondVar(cvar = 0x808a8c0, timeout = 4294967295U), line 389 in "ptsynch.c" [8] launch_thread(startFunc = 0x8052588 = &do_connects(void *a, void *b, int connection), a = 0x80475a0, b = 0x808a780, c = 5), line 416 in "strsclnt.c" [9] client_main(port = 8445U, connections = 106186, Cert_And_Key = 0x8047654, hostName = 0x808a620 "mahatma.red.iplanet.com"), line 1136 in "strsclnt.c" [10] main(argc = 12, argv = 0x80476cc), line 1309 in "strsclnt.c" (/usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx) Reproducible: Couldn't Reproduce Steps to Reproduce: 1. 2. 3. Here is the thread stack trace. mahatma[svbld]:/niscc/archive/20041117.1> /usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx /share/builds/mccrel3/security/security39/builds/20041117.1/wozzeck_Solaris8/mozilla/dist/SunOS5.10_i86pc_DBG.OBJ/bin/strsclnt core.061805 Reading strsclnt core file header read successfully Reading ld.so.1 Reading libssl3.so Reading libsmime3.so Reading libnss3.so Reading libplc4.so Reading libplds4.so Reading libnspr4.so Reading libthread.so.1 Reading libnsl.so.1 Reading libsocket.so.1 Reading librt.so.1 Reading libdl.so.1 Reading libc.so.1 Reading libsoftokn3.so Reading libpthread.so.1 Reading libaio.so.1 Reading libmd5.so.1 Reading libmp.so.2 detected a multithreaded program t@0 (l@1) terminated by signal KILL (Killed) 0xd0ff1979: __lwp_park+0x0019: jae __lwp_park+0x27 <0xd0ff1987> Current function is PR_WaitCondVar 389 rv = pthread_cond_wait(&cvar->cv, &cvar->lock->mutex); (/usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx) threads t@1 a l@1 ?() sleep on 0x808a8c4 in __lwp_park() t@2 a l@2 _pt_root() sleep on 0x80d4208 in __lwp_park() o t@3 a l@3 _pt_root() signal SIGABRT in _lwp_kill() t@4 a l@4 _pt_root() sleep on 0x808cbf8 in __lwp_park() t@5 a l@5 _pt_root() sleep on 0x80d4208 in __lwp_park() (/usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx) three t@3 three: not found (/usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx) thread t@3 Current function is PR_Assert 530 abort(); t@3 (l@3) stopped in _lwp_kill at 0xd0ff2e05 0xd0ff2e05: _lwp_kill+0x0015: jae _lwp_kill+0x23 <0xd0ff2e13> (/usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx) where current thread: t@3 [1] _lwp_kill(0x3, 0x6), at 0xd0ff2e05 [2] _thr_kill(0x3, 0x6), at 0xd0fefe1d [3] raise(0x6), at 0xd0f9ccaf [4] abort(), at 0xd0f7f1cd =>[5] PR_Assert(s = 0xd11911b0 "_PT_PTHREAD_MUTEX_IS_LOCKED(lock->mutex)", file = 0xd11910e8 "../../../../pr/src/pthreads/ptsynch.c", ln = 207), line 530 in "prlog.c" [6] PR_Unlock(lock = 0x80d4208), line 207 in "ptsynch.c" [7] pk11_GetObjectFromList(hasLocks = 0xd0d2f8b8, optimizeSpace = 0, list = 0xd0f1e6c4, hashSize = 32U, isSessionObject = 1), line 2101 in "pkcs11u.c" [8] pk11_NewObject(slot = 0x808d8a0), line 2208 in "pkcs11u.c" [9] NSC_GenerateKey(hSession = 18U, pMechanism = 0xd0d2fa64, pTemplate = 0xd0d2fa7c, ulCount = 2U, phKey = 0x8092214), line 2768 in "pkcs11c.c" [10] PK11_TokenKeyGen(slot = 0x80ae7f0, type = 880U, param = 0xd0d2fb34, keySize = 0, keyid = (nil), isToken = 0, wincx = (nil)), line 1554 in "pk11skey.c" [11] PK11_KeyGen(slot = 0x80ae7f0, type = 880U, param = 0xd0d2fb34, keySize = 0, wincx = (nil)), line 1576 in "pk11skey.c" [12] ssl3_GenerateRSAPMS(ss = 0x80c0d60, spec = 0x8106c58, serverKeySlot = (nil)), line 6861 in "ssl3con.c" [13] sendRSAClientKeyExchange(ss = 0x80c0d60, svrPubKey = 0x81255c0), line 3587 in "ssl3con.c" [14] ssl3_SendClientKeyExchange(ss = 0x80c0d60), line 4290 in "ssl3con.c" [15] ssl3_HandleServerHelloDone(ss = 0x80c0d60), line 5328 in "ssl3con.c" [16] ssl3_HandleHandshakeMessage(ss = 0x80c0d60, b = 0x8108fbe "", length = 0), line 8077 in "ssl3con.c" [17] ssl3_HandleHandshake(ss = 0x80c0d60, origBuf = 0x80c0f04), line 8165 in "ssl3con.c" [18] ssl3_HandleRecord(ss = 0x80c0d60, cText = 0xd0d2fd54, databuf = 0x80c0f04), line 8433 in "ssl3con.c" [19] ssl3_GatherCompleteHandshake(ss = 0x80c0d60, flags = 0), line 203 in "ssl3gthr.c" [20] ssl_GatherRecord1stHandshake(ss = 0x80c0d60), line 1260 in "sslcon.c" [21] ssl_Do1stHandshake(ss = 0x80c0d60), line 149 in "sslsecur.c" [22] ssl_SecureSend(ss = 0x80c0d60, buf = 0x806507c "GET /abc HTTP/1.0^M\n^M\n", len = 21, flags = 0), line 1033 in "sslsecur.c" [23] ssl_SecureWrite(ss = 0x80c0d60, buf = 0x806507c "GET /abc HTTP/1.0^M\n^M\n", len = 21), line 1067 in "sslsecur.c" [24] ssl_Write(fd = 0x808a9a0, buf = 0x806507c, len = 21), line 1286 in "sslsock.c" [25] PR_Write(fd = 0x808a9a0, buf = 0x806507c, amount = 21), line 143 in "priometh.c" [26] handle_connection(ssl_sock = 0x808a9a0, connection = 2), line 675 in "strsclnt.c" [27] do_connects(a = 0x80475a0, b = 0x808a780, connection = 2), line 802 in "strsclnt.c" [28] thread_wrapper(arg = 0x8087630), line 383 in "strsclnt.c" [29] _pt_root(arg = 0x80bf4e8), line 214 in "ptthread.c" [30] _thr_setup(0xd0dd8800), at 0xd0ff1730 [31] _lwp_start(), at 0xd0ff18f0 (/usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx)
Priority: -- → P2
Target Milestone: --- → 3.9.5
Version: unspecified → 3.9.4
The code in question looks like this: 2084 pk11_GetObjectFromList(PRBool *hasLocks, PRBool optimizeSpace, 2085 PK11ObjectFreeList *list, unsigned int hashSize, PRBool isSessionOb ject) 2086 { 2087 PK11Object *object; 2088 int size = 0; 2089 2090 if (!optimizeSpace) { 2091 if (list->lock == NULL) { 2092 list->lock = PZ_NewLock(nssILockObject); 2093 } 2094 2095 PK11_USE_THREADS(PZ_Lock(list->lock)); 2096 object = list->head; 2097 if (object) { 2098 list->head = object->next; 2099 list->count--; 2100 } 2101 PK11_USE_THREADS(PZ_Unlock(list->lock)); <=== 2102 if (object) { 2103 object->next = object->prev = NULL; 2104 *hasLocks = PR_TRUE; 2105 return object; 2106 } 2107 } The creation of list->lock is not thread-safe. Jason, could you get the stacks of all the other threads in that core file?
mahatma[svbld]:/niscc/archive/20041117.1> /usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx /share/builds/mccrel3/security/security39/builds/20041117.1/wozzeck_Solaris8/mozilla/dist/SunOS5.10_i86pc_DBG.OBJ/bin/strsclnt core.061805 Reading strsclnt core file header read successfully Reading ld.so.1 Reading libssl3.so Reading libsmime3.so Reading libnss3.so Reading libplc4.so Reading libplds4.so Reading libnspr4.so Reading libthread.so.1 Reading libnsl.so.1 Reading libsocket.so.1 Reading librt.so.1 Reading libdl.so.1 Reading libc.so.1 Reading libsoftokn3.so Reading libpthread.so.1 Reading libaio.so.1 Reading libmd5.so.1 Reading libmp.so.2 detected a multithreaded program t@0 (l@1) terminated by signal KILL (Killed) 0xd0ff1979: __lwp_park+0x0019: jae __lwp_park+0x27 <0xd0ff1987> Current function is PR_WaitCondVar 389 rv = pthread_cond_wait(&cvar->cv, &cvar->lock->mutex); (/usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx) threads t@1 a l@1 ?() sleep on 0x808a8c4 in __lwp_park() t@2 a l@2 _pt_root() sleep on 0x80d4208 in __lwp_park() o t@3 a l@3 _pt_root() signal SIGABRT in _lwp_kill() t@4 a l@4 _pt_root() sleep on 0x808cbf8 in __lwp_park() t@5 a l@5 _pt_root() sleep on 0x80d4208 in __lwp_park() (/usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx) thread t@1 Current function is PR_WaitCondVar 389 rv = pthread_cond_wait(&cvar->cv, &cvar->lock->mutex); t@1 (l@1) stopped in __lwp_park at 0xd0ff1979 0xd0ff1979: __lwp_park+0x0019: jae __lwp_park+0x27 <0xd0ff1987> (/usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx) where current thread: t@1 [1] __lwp_park(0x808a8c4, 0x80bf3e8), at 0xd0ff1979 [2] cond_sleep_queue(0x808a8c4, 0x80bf3e8, 0x0), at 0xd0febff1 [3] cond_wait_queue(0x808a8c4, 0x80bf3e8, 0x0, 0x0), at 0xd0fec103 [4] _cond_wait(0x808a8c4, 0x80bf3e8), at 0xd0fec60c [5] cond_wait(0x808a8c4, 0x80bf3e8), at 0xd0fec651 [6] _pthread_cond_wait(0x808a8c4, 0x80bf3e8), at 0xd0fec68b =>[7] PR_WaitCondVar(cvar = 0x808a8c0, timeout = 4294967295U), line 389 in "ptsynch.c" [8] launch_thread(startFunc = 0x8052588 = &do_connects(void *a, void *b, int connection), a = 0x80475a0, b = 0x808a780, c = 5), line 416 in "strsclnt.c" [9] client_main(port = 8445U, connections = 106186, Cert_And_Key = 0x8047654, hostName = 0x808a620 "mahatma.red.iplanet.com"), line 1136 in "strsclnt.c" [10] main(argc = 12, argv = 0x80476cc), line 1309 in "strsclnt.c" (/usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx) thread t@2 Current function is PR_Lock 190 rv = pthread_mutex_lock(&lock->mutex); t@2 (l@2) stopped in __lwp_park at 0xd0ff1979 0xd0ff1979: __lwp_park+0x0019: jae __lwp_park+0x27 <0xd0ff1987> (/usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx) where current thread: t@2 [1] __lwp_park(0xd0dd8400, 0x0), at 0xd0ff1979 [2] mutex_lock_queue(0xd0dd8400, 0x0, 0x80d4208, 0x0), at 0xd0fea904 [3] slow_lock(0xd0dd8400, 0x80d4208, 0x0), at 0xd0feb1ad [4] mutex_lock_impl(0x80d4208, 0x0), at 0xd0feb2a3 [5] pthread_mutex_lock(0x80d4208), at 0xd0feb3b1 =>[6] PR_Lock(lock = 0x80d4208), line 190 in "ptsynch.c" [7] pk11_GetObjectFromList(hasLocks = 0xd0d4f9bc, optimizeSpace = 0, list = 0xd0f1e6c4, hashSize = 32U, isSessionObject = 1), line 2095 in "pkcs11u.c" [8] pk11_NewObject(slot = 0x808d8a0), line 2208 in "pkcs11u.c" [9] NSC_CreateObject(hSession = 1U, pTemplate = 0xd0d4fa74, ulCount = 8U, phObject = 0xd0d4faec), line 3658 in "pkcs11.c" [10] PK11_CreateNewObject(slot = 0x80ae7f0, session = 0, theTemplate = 0xd0d4fa74, count = 8, token = 0, objectID = 0xd0d4faec), line 138 in "pk11skey.c" [11] PK11_ImportPublicKey(slot = 0x80ae7f0, pubKey = 0x811cd18, isToken = 0), line 628 in "pk11skey.c" [12] PK11_PubWrapSymKey(type = 1U, pubKey = 0x811cd18, symKey = 0x80e6c80, wrappedKey = 0xd0d4fb74), line 2437 in "pk11skey.c" [13] sendRSAClientKeyExchange(ss = 0x80bf6c8, svrPubKey = 0x811cd18), line 3602 in "ssl3con.c" [14] ssl3_SendClientKeyExchange(ss = 0x80bf6c8), line 4290 in "ssl3con.c" [15] ssl3_HandleServerHelloDone(ss = 0x80bf6c8), line 5328 in "ssl3con.c" [16] ssl3_HandleHandshakeMessage(ss = 0x80bf6c8, b = 0x811205d "", length = 0), line 8077 in "ssl3con.c" [17] ssl3_HandleHandshake(ss = 0x80bf6c8, origBuf = 0x80bf86c), line 8165 in "ssl3con.c" [18] ssl3_HandleRecord(ss = 0x80bf6c8, cText = 0xd0d4fd54, databuf = 0x80bf86c), line 8433 in "ssl3con.c" [19] ssl3_GatherCompleteHandshake(ss = 0x80bf6c8, flags = 0), line 203 in "ssl3gthr.c" [20] ssl_GatherRecord1stHandshake(ss = 0x80bf6c8), line 1260 in "sslcon.c" [21] ssl_Do1stHandshake(ss = 0x80bf6c8), line 149 in "sslsecur.c" [22] ssl_SecureSend(ss = 0x80bf6c8, buf = 0x806507c "GET /abc HTTP/1.0^M\n^M\n", len = 21, flags = 0), line 1033 in "sslsecur.c" [23] ssl_SecureWrite(ss = 0x80bf6c8, buf = 0x806507c "GET /abc HTTP/1.0^M\n^M\n", len = 21), line 1067 in "sslsecur.c" [24] ssl_Write(fd = 0x808a900, buf = 0x806507c, len = 21), line 1286 in "sslsock.c" [25] PR_Write(fd = 0x808a900, buf = 0x806507c, amount = 21), line 143 in "priometh.c" [26] handle_connection(ssl_sock = 0x808a900, connection = 1), line 675 in "strsclnt.c" [27] do_connects(a = 0x80475a0, b = 0x808a780, connection = 1), line 802 in "strsclnt.c" [28] thread_wrapper(arg = 0x8087610), line 383 in "strsclnt.c" [29] _pt_root(arg = 0x80bf448), line 214 in "ptthread.c" [30] _thr_setup(0xd0dd8400), at 0xd0ff1730 [31] _lwp_start(), at 0xd0ff18f0 (/usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx) thread t@4 Current function is PR_Lock 190 rv = pthread_mutex_lock(&lock->mutex); t@4 (l@4) stopped in __lwp_park at 0xd0ff1979 0xd0ff1979: __lwp_park+0x0019: jae __lwp_park+0x27 <0xd0ff1987> (/usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx) where current thread: t@4 [1] __lwp_park(0xd0dd8c00, 0x0), at 0xd0ff1979 [2] mutex_lock_queue(0xd0dd8c00, 0x0, 0x808cbf8, 0x0), at 0xd0fea904 [3] slow_lock(0xd0dd8c00, 0x808cbf8, 0x0), at 0xd0feb1ad [4] mutex_lock_impl(0x808cbf8, 0x0), at 0xd0feb2a3 [5] pthread_mutex_lock(0x808cbf8), at 0xd0feb3b1 =>[6] PR_Lock(lock = 0x808cbf8), line 190 in "ptsynch.c" [7] PK11_EnterSlotMonitor(slot = 0x80ae7f0), line 526 in "pk11slot.c" [8] PK11_CreateNewObject(slot = 0x80ae7f0, session = 0, theTemplate = 0xd0d0fa74, count = 8, token = 0, objectID = 0xd0d0faec), line 135 in "pk11skey.c" [9] PK11_ImportPublicKey(slot = 0x80ae7f0, pubKey = 0x811a418, isToken = 0), line 628 in "pk11skey.c" [10] PK11_PubWrapSymKey(type = 1U, pubKey = 0x811a418, symKey = 0x80d6510, wrappedKey = 0xd0d0fb74), line 2437 in "pk11skey.c" [11] sendRSAClientKeyExchange(ss = 0x80c2468, svrPubKey = 0x811a418), line 3602 in "ssl3con.c" [12] ssl3_SendClientKeyExchange(ss = 0x80c2468), line 4290 in "ssl3con.c" [13] ssl3_HandleServerHelloDone(ss = 0x80c2468), line 5328 in "ssl3con.c" [14] ssl3_HandleHandshakeMessage(ss = 0x80c2468, b = 0x810d855 "", length = 0), line 8077 in "ssl3con.c" [15] ssl3_HandleHandshake(ss = 0x80c2468, origBuf = 0x80c260c), line 8165 in "ssl3con.c" [16] ssl3_HandleRecord(ss = 0x80c2468, cText = 0xd0d0fd54, databuf = 0x80c260c), line 8433 in "ssl3con.c" [17] ssl3_GatherCompleteHandshake(ss = 0x80c2468, flags = 0), line 203 in "ssl3gthr.c" [18] ssl_GatherRecord1stHandshake(ss = 0x80c2468), line 1260 in "sslcon.c" [19] ssl_Do1stHandshake(ss = 0x80c2468), line 149 in "sslsecur.c" [20] ssl_SecureSend(ss = 0x80c2468, buf = 0x806507c "GET /abc HTTP/1.0^M\n^M\n", len = 21, flags = 0), line 1033 in "sslsecur.c" [21] ssl_SecureWrite(ss = 0x80c2468, buf = 0x806507c "GET /abc HTTP/1.0^M\n^M\n", len = 21), line 1067 in "sslsecur.c" [22] ssl_Write(fd = 0x808a940, buf = 0x806507c, len = 21), line 1286 in "sslsock.c" [23] PR_Write(fd = 0x808a940, buf = 0x806507c, amount = 21), line 143 in "priometh.c" [24] handle_connection(ssl_sock = 0x808a940, connection = 3), line 675 in "strsclnt.c" [25] do_connects(a = 0x80475a0, b = 0x808a780, connection = 3), line 802 in "strsclnt.c" [26] thread_wrapper(arg = 0x8087650), line 383 in "strsclnt.c" [27] _pt_root(arg = 0x80bf588), line 214 in "ptthread.c" [28] _thr_setup(0xd0dd8c00), at 0xd0ff1730 [29] _lwp_start(), at 0xd0ff18f0 (/usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx) thread t@5 Current function is PR_Lock 190 rv = pthread_mutex_lock(&lock->mutex); t@5 (l@5) stopped in __lwp_park at 0xd0ff1979 0xd0ff1979: __lwp_park+0x0019: jae __lwp_park+0x27 <0xd0ff1987> (/usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx) where current thread: t@5 [1] __lwp_park(0xd0dd9000, 0x0), at 0xd0ff1979 [2] mutex_lock_queue(0xd0dd9000, 0x0, 0x80d4208, 0x0), at 0xd0fea904 [3] slow_lock(0xd0dd9000, 0x80d4208, 0x0), at 0xd0feb1ad [4] mutex_lock_impl(0x80d4208, 0x0), at 0xd0feb2a3 [5] pthread_mutex_lock(0x80d4208), at 0xd0feb3b1 =>[6] PR_Lock(lock = 0x80d4208), line 190 in "ptsynch.c" [7] pk11_GetObjectFromList(hasLocks = 0xd0cef8b8, optimizeSpace = 0, list = 0xd0f1e6c4, hashSize = 32U, isSessionObject = 1), line 2095 in "pkcs11u.c" [8] pk11_NewObject(slot = 0x808d8a0), line 2208 in "pkcs11u.c" [9] NSC_GenerateKey(hSession = 21U, pMechanism = 0xd0cefa64, pTemplate = 0xd0cefa7c, ulCount = 2U, phKey = 0x811d5ec), line 2768 in "pkcs11c.c" [10] PK11_TokenKeyGen(slot = 0x80ae7f0, type = 880U, param = 0xd0cefb34, keySize = 0, keyid = (nil), isToken = 0, wincx = (nil)), line 1554 in "pk11skey.c" [11] PK11_KeyGen(slot = 0x80ae7f0, type = 880U, param = 0xd0cefb34, keySize = 0, wincx = (nil)), line 1576 in "pk11skey.c" [12] ssl3_GenerateRSAPMS(ss = 0x80c20e8, spec = 0x81060d8, serverKeySlot = (nil)), line 6861 in "ssl3con.c" [13] sendRSAClientKeyExchange(ss = 0x80c20e8, svrPubKey = 0x8124580), line 3587 in "ssl3con.c" [14] ssl3_SendClientKeyExchange(ss = 0x80c20e8), line 4290 in "ssl3con.c" [15] ssl3_HandleServerHelloDone(ss = 0x80c20e8), line 5328 in "ssl3con.c" [16] ssl3_HandleHandshakeMessage(ss = 0x80c20e8, b = 0x811e0bd "", length = 0), line 8077 in "ssl3con.c" [17] ssl3_HandleHandshake(ss = 0x80c20e8, origBuf = 0x80c228c), line 8165 in "ssl3con.c" [18] ssl3_HandleRecord(ss = 0x80c20e8, cText = 0xd0cefd54, databuf = 0x80c228c), line 8433 in "ssl3con.c" [19] ssl3_GatherCompleteHandshake(ss = 0x80c20e8, flags = 0), line 203 in "ssl3gthr.c" [20] ssl_GatherRecord1stHandshake(ss = 0x80c20e8), line 1260 in "sslcon.c" [21] ssl_Do1stHandshake(ss = 0x80c20e8), line 149 in "sslsecur.c" [22] ssl_SecureSend(ss = 0x80c20e8, buf = 0x806507c "GET /abc HTTP/1.0^M\n^M\n", len = 21, flags = 0), line 1033 in "sslsecur.c" [23] ssl_SecureWrite(ss = 0x80c20e8, buf = 0x806507c "GET /abc HTTP/1.0^M\n^M\n", len = 21), line 1067 in "sslsecur.c" [24] ssl_Write(fd = 0x808a980, buf = 0x806507c, len = 21), line 1286 in "sslsock.c" [25] PR_Write(fd = 0x808a980, buf = 0x806507c, amount = 21), line 143 in "priometh.c" [26] handle_connection(ssl_sock = 0x808a980, connection = 4), line 675 in "strsclnt.c" [27] do_connects(a = 0x80475a0, b = 0x808a780, connection = 4), line 802 in "strsclnt.c" [28] thread_wrapper(arg = 0x8087670), line 383 in "strsclnt.c" [29] _pt_root(arg = 0x80bf628), line 214 in "ptthread.c" [30] _thr_setup(0xd0dd9000), at 0xd0ff1730 [31] _lwp_start(), at 0xd0ff18f0 (/usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx)
Two other threads (t@2 and t@5) were also in pk11_GetObjectFromList at that time. They were at line 2095 of pkcs11u.c. This supports the theory that some other thread also saw a null list->lock, created list->lock, and overwrote the list->lock created by the thread that hit the assertion failure (t@3).
Status: NEW → ASSIGNED
Assigned the bug to Julien.
Assignee: wchang0222 → julien.pierre.bugs
Status: ASSIGNED → NEW
The same problem exists with both tokenObjectList and sessionObjectList . Those lists have their locks initialized too late. Fortunately, this problem can only show up in a multithreaded client application. In the server case, both lists get initialized early . The token list gets initialized when the server looks up its server cert while the server is still single-threaded. The session list gets initialized when the server does the RSA stepdown key generation, also while the server is still single-threaded. This is the only instance that causes this list to be initialized before the server starts worker threads and is ready to accept a connection. So at least now we can justify all the cycles spent doing that keygen - it prevents this race condition ;) Patch forthcoming, for 3.9.5 .
OS: SunOS → All
Hardware: PC → All
Attachment #166830 - Flags: superreview?(rrelyea0264)
Attachment #166830 - Flags: review?(nelson)
Comment on attachment 166830 [details] [diff] [review] Initialize list locks I think this patch is fine, as far as it goes, so I'm giving it r+. But while you're at it, maybe you should initialize ALL of softoken's free list locks. Your new function pk11_InitFreeLists() appears to be the complement to the existing function: pk11_CleanupFreeLists(). Perhaps you should also create (and call) a new function to complement nsslowcert_DestroyFreeLists() and/or nsslowcert_DestroyGlobalLocks() which are both called immediately after the call to pk11_CleanupFreeLists() However, that is just a suggestion.
Attachment #166830 - Flags: review?(nelson) → review+
Comment on attachment 166830 [details] [diff] [review] Initialize list locks >+static void >+pk11_InitFreeList(PK11ObjectFreeList *list) >+{ >+ list->lock = PZ_NewLock(nssILockObject); I'm wondering if this should be wrapped in a PK11_USE_THREADS macro.
Wan-Teh, I don't know. The original code that initialized this lock lazily did not have this macro. Bob ? Nelson, Thanks for noting that. I think fixing the other lists should be done in additional init functions for each source, and be tracked in a different bug.
re: PK11_USE_THREADS There is both a macro and and ifdef. The macro is for single lines (usually lock and unlock). The ifdef (PKCS11_USE_THREADS) is for blocks of code. Currently all the lock variables are conditionally compiled, so it's probably best to keep up the ifdefs. On the other hand, I doubt that we can actually build without PKCS11_USE_THREADS undefined. It may be time to remove them from the source code. bob
I added an ifdef for PKCS11_USE_THREADS and checked in this patch to the tip. Closing the bug . Checking in pkcs11.c; /cvsroot/mozilla/security/nss/lib/softoken/pkcs11.c,v <-- pkcs11.c new revision: 1.96; previous revision: 1.95 done Checking in pkcs11i.h; /cvsroot/mozilla/security/nss/lib/softoken/pkcs11i.h,v <-- pkcs11i.h new revision: 1.32; previous revision: 1.31 done Checking in pkcs11u.c; /cvsroot/mozilla/security/nss/lib/softoken/pkcs11u.c,v <-- pkcs11u.c new revision: 1.59; previous revision: 1.58 done
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Summary: strsclnt assertion failure during NISCC run → race condition in softoken list lock creation
Also checked in to NSS_3_9_BRANCH for 3.9.5 . Checking in pkcs11.c; /cvsroot/mozilla/security/nss/lib/softoken/pkcs11.c,v <-- pkcs11.c new revision: 1.88.4.2; previous revision: 1.88.4.1 done Checking in pkcs11i.h; /cvsroot/mozilla/security/nss/lib/softoken/pkcs11i.h,v <-- pkcs11i.h new revision: 1.30.16.1; previous revision: 1.30 done Checking in pkcs11u.c; /cvsroot/mozilla/security/nss/lib/softoken/pkcs11u.c,v <-- pkcs11u.c new revision: 1.54.2.4; previous revision: 1.54.2.3 done
Attachment #166830 - Flags: review+
Attachment #166830 - Flags: superreview?(rrelyea)
Mass re-assign of 3.9.5 fixed bugs to 3.9.6 , since we built 3.9.5 with the same source tree as 3.9.4 .
Target Milestone: 3.9.5 → 3.9.6
When I reviewed the patch "Initialize list locks", I found that the "lazy" lock creation code in pk11_PutObjectToList should also be removed. (The same code in pk11_GetObjectFromList was removed.) It is not urgent to fix this on the NSS_3_9_BRANCH because that code is dead code now that we create the lock in nsc_CommonInitialize.
Attachment #176647 - Flags: superreview?(nelson)
Attachment #176647 - Flags: review?(julien.pierre.bugs)
At the very beginning of pk11_CleanupFreeList, we have: if (!list->lock) { return; } Before, list->lock was created lazily, so a null list->lock indicated that the list was never used and therefore it needed not be cleaned up. This test may need to be updated after Julien's patch has been checked in. First, list->lock is now always created, so a null list->lock now indicates a softoken initialization failure. Second, if PKCS11_USE_THREADS is not defined, list->lock is NULL even if the softoken is initialized successfully (see pk11_InitFreeList). So it seems that testing list->lock is no longer correct. I suggest testing list->head instead.
Attachment #176650 - Flags: review?(rrelyea)
Comment on attachment 176650 [details] [diff] [review] Confusing code in pk11_CleanupFreeList Instead of this patch, it may be better to remove the if (!list->lock) { return; } test altogether.
Attachment #176647 - Flags: review?(julien.pierre.bugs) → review+
Attachment #176650 - Attachment is obsolete: true
Attachment #176694 - Flags: review?(wtchang)
Wan-Teh, I have checked in both https://bugzilla.mozilla.org/attachment.cgi?id=176647 and https://bugzilla.mozilla.org/attachment.cgi?id=176694 to the tip. Sorry, midnight is our build deadline . If you have a problem with one of the patches, please reopen this bug. Checking in pkcs11u.c; /cvsroot/mozilla/security/nss/lib/softoken/pkcs11u.c,v <-- pkcs11u.c new revision: 1.60; previous revision: 1.59 done For NSS_3_9_BRANCH : Checking in pkcs11u.c; /cvsroot/mozilla/security/nss/lib/softoken/pkcs11u.c,v <-- pkcs11u.c new revision: 1.54.2.5; previous revision: 1.54.2.4 done
Comment on attachment 176694 [details] [diff] [review] test list->lock only if PKCS11_USE_THREADS is defined In the current code, list->lock is always created (if PKCS11_USE_THREADS is defined), so the list->lock test is useless. If I understand it correctly, the purpose of that test used to be a performance optimization (to return immediately if the freelist wasn't created), rather than a crash protection. It now only serves the purpose of crash protection (due to a failed NSS initialization), but we'd need similar tests in pk11_PutObjectToList, etc.
Attachment #176694 - Flags: review?(wtchang) → review+
Attachment #176647 - Flags: superreview?(nelson)
Comment on attachment 176650 [details] [diff] [review] Confusing code in pk11_CleanupFreeList remove review request on an obsolete patch
Attachment #176650 - Flags: review?(rrelyea)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: