Closed Bug 270689 Opened 16 years ago Closed 16 years ago

race condition in softoken list lock creation

Categories

(NSS :: Tools, defect, P2)

3.9.4
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jason.m.reid, Assigned: julien.pierre)

Details

Attachments

(3 files, 1 obsolete file)

User-Agent:       Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.4) Gecko/20040414
Build Identifier: Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.4) Gecko/20040414

 The program strsclnt had an assertion failure on 20041117 nightly
build in the ssl_simple_rootca NISCC test. I have attached the logs,
NISCC code fragment, and stack trace. When I reran this test, it passed
normally.


mahatma[svbld]:/niscc/archive/20041117.1> file core.061805
core.061805:    ELF 32-bit LSB core file 80386 Version 1, from 'strsclnt'
mahatma[svbld]:/niscc/archive/20041117.1> cat nisccLog05
using /niscc/ssl/testcases/simple_rootca/00000002
using /niscc/ssl/testcases/simple_rootca/00000004
using /niscc/ssl/testcases/simple_rootca/00000001
using /niscc/ssl/testcases/simple_rootca/00000003
using /niscc/ssl/testcases/simple_rootca/00000005
selfserv: normal termination
mahatma[svbld]:/niscc/archive/20041117.1> cat nisccLog06
Assertion failure: _PT_PTHREAD_MUTEX_IS_LOCKED(lock->mutex), at
../../../../pr/src/pthreads/ptsynch.c:207
Bad server certificate: -8156, Issuer certificate is invalid.
subject DN: CN=Simple Server,O=Simple Server,C=GB
issuer  DN: CN=Simple Root Authority,O=Simple Root Authority,C=GB
0 cache hits; 1 cache misses, 0 cache not reusable
HTTP/1.0 200 OK
Server: Generic Web Server
Date: Tue, 26 Aug 1997 22:10:05 GMT
Content-type: text/plain

GET /stop HTTP/1.0

EOF


tstclnt: write to SSL socket failed: TCP connection reset by peer.
tstclnt: unable to connect: Connection refused by peer.
tstclnt: unable to connect: Connection refused by peer.
tstclnt: unable to connect: Connection refused by peer.
mahatma[svbld]:/niscc/archive/20041117.1>

# NISCC code fragment
ssl_simple_rootca()
{
  CLIENT=${NISCC_HOME}/niscc_ssl/simple_client; export CLIENT
  SERVER=${NISCC_HOME}/niscc_ssl/simple_server; export SERVER
  PORT=8445; export PORT
  START_AT=1; export START_AT
  STOP_AT=106190; export STOP_AT
  LD_LIBRARY_PATH=${HACKLIB}; export LD_LIBRARY_PATH
  NISCC_TEST=$TEST/simple_rootca; export NISCC_TEST
  ${HACKBIN}/selfserv -p $PORT -d $SERVER -n server_crt -t 5 -w test > \
$NISCC_HOME/nisccLog05 2>&1 &

  unset NISCC_TEST; export NISCC_TEST
  LD_LIBRARY_PATH=${TESTLIB}; export LD_LIBRARY_PATH
  ${TESTBIN}/strsclnt -d $CLIENT -p $PORT -t 4 -c 106190 -o -N $HOSTNAME > \
$NISCC_HOME/nisccLog06 2>&1

  echo "starting tstclnt to shutdown simple rootca selfserv process"
  ${TESTBIN}/tstclnt -h $HOSTNAME -p $PORT -d $CLIENT -n client_crt -o -f \
-w test < $CLIENT/stop.txt >> nisccLog06 2>&1
  ${TESTBIN}/tstclnt -h $HOSTNAME -p $PORT -d $CLIENT -n client_crt -o -f \
-w test < $CLIENT/stop.txt >> nisccLog06 2>&1
  ${TESTBIN}/tstclnt -h $HOSTNAME -p $PORT -d $CLIENT -n client_crt -o -f \
-w test < $CLIENT/stop.txt >> nisccLog06 2>&1
  ${TESTBIN}/tstclnt -h $HOSTNAME -p $PORT -d $CLIENT -n client_crt -o -f \
-w test < $CLIENT/stop.txt >> nisccLog06 2>&1
  ${TESTBIN}/tstclnt -h $HOSTNAME -p $PORT -d $CLIENT -n client_crt -o -f \
-w test < $CLIENT/stop.txt >> nisccLog06 2>&1

  unset LD_LIBRARY_PATH
  sleep 10

  find_core
}

mahatma[svbld]:/niscc/archive/20041117.1>
/usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx
/share/builds/mccrel3/security/security39/builds/20041117.1/wozzeck_Solaris8/mozilla/dist/SunOS5.10_i86pc_DBG.OBJ/bin/strsclnt
 core.061805
Reading strsclnt
core file header read successfully
Reading ld.so.1
Reading libssl3.so
Reading libsmime3.so
Reading libnss3.so
Reading libplc4.so
Reading libplds4.so
Reading libnspr4.so
Reading libthread.so.1
Reading libnsl.so.1
Reading libsocket.so.1
Reading librt.so.1
Reading libdl.so.1
Reading libc.so.1
Reading libsoftokn3.so
Reading libpthread.so.1
Reading libaio.so.1
Reading libmd5.so.1
Reading libmp.so.2
detected a multithreaded program
t@0 (l@1) terminated by signal KILL (Killed)
0xd0ff1979: __lwp_park+0x0019:  jae    __lwp_park+0x27 <0xd0ff1987>
Current function is PR_WaitCondVar
  389           rv = pthread_cond_wait(&cvar->cv, &cvar->lock->mutex);
(/usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx) whre
whre: not found
(/usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx) where
  [1] __lwp_park(0x808a8c4, 0x80bf3e8), at 0xd0ff1979
  [2] cond_sleep_queue(0x808a8c4, 0x80bf3e8, 0x0), at 0xd0febff1
  [3] cond_wait_queue(0x808a8c4, 0x80bf3e8, 0x0, 0x0), at 0xd0fec103
  [4] _cond_wait(0x808a8c4, 0x80bf3e8), at 0xd0fec60c
  [5] cond_wait(0x808a8c4, 0x80bf3e8), at 0xd0fec651
  [6] _pthread_cond_wait(0x808a8c4, 0x80bf3e8), at 0xd0fec68b
=>[7] PR_WaitCondVar(cvar = 0x808a8c0, timeout = 4294967295U), line 389 in
"ptsynch.c"
  [8] launch_thread(startFunc = 0x8052588 = &do_connects(void *a, void *b, int
connection), a = 0x80475a0, b = 0x808a780, c = 5), line 416 in "strsclnt.c"
  [9] client_main(port = 8445U, connections = 106186, Cert_And_Key = 0x8047654,
hostName = 0x808a620 "mahatma.red.iplanet.com"), line 1136 in "strsclnt.c"
  [10] main(argc = 12, argv = 0x80476cc), line 1309 in "strsclnt.c"
(/usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx)


Reproducible: Couldn't Reproduce
Steps to Reproduce:
1.
2.
3.




Here is the thread stack trace.
mahatma[svbld]:/niscc/archive/20041117.1>
/usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx
/share/builds/mccrel3/security/security39/builds/20041117.1/wozzeck_Solaris8/mozilla/dist/SunOS5.10_i86pc_DBG.OBJ/bin/strsclnt
core.061805
Reading strsclnt
core file header read successfully
Reading ld.so.1
Reading libssl3.so
Reading libsmime3.so
Reading libnss3.so
Reading libplc4.so
Reading libplds4.so
Reading libnspr4.so
Reading libthread.so.1
Reading libnsl.so.1
Reading libsocket.so.1
Reading librt.so.1
Reading libdl.so.1
Reading libc.so.1
Reading libsoftokn3.so
Reading libpthread.so.1
Reading libaio.so.1
Reading libmd5.so.1
Reading libmp.so.2
detected a multithreaded program
t@0 (l@1) terminated by signal KILL (Killed)
0xd0ff1979: __lwp_park+0x0019:  jae    __lwp_park+0x27 <0xd0ff1987>
Current function is PR_WaitCondVar
  389           rv = pthread_cond_wait(&cvar->cv, &cvar->lock->mutex);
(/usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx) threads
      t@1  a l@1  ?()   sleep on 0x808a8c4      in __lwp_park()
      t@2  a l@2  _pt_root()    sleep on 0x80d4208      in __lwp_park()
o     t@3  a l@3  _pt_root()    signal SIGABRT  in _lwp_kill()
      t@4  a l@4  _pt_root()    sleep on 0x808cbf8      in __lwp_park()
      t@5  a l@5  _pt_root()    sleep on 0x80d4208      in __lwp_park()
(/usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx) three t@3
three: not found
(/usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx) thread t@3
Current function is PR_Assert
  530       abort();
t@3 (l@3) stopped in _lwp_kill at 0xd0ff2e05
0xd0ff2e05: _lwp_kill+0x0015:   jae    _lwp_kill+0x23 <0xd0ff2e13>
(/usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx) where
current thread: t@3
  [1] _lwp_kill(0x3, 0x6), at 0xd0ff2e05
  [2] _thr_kill(0x3, 0x6), at 0xd0fefe1d
  [3] raise(0x6), at 0xd0f9ccaf
  [4] abort(), at 0xd0f7f1cd
=>[5] PR_Assert(s = 0xd11911b0 "_PT_PTHREAD_MUTEX_IS_LOCKED(lock->mutex)", file
= 0xd11910e8 "../../../../pr/src/pthreads/ptsynch.c", ln = 207), line 530 in
"prlog.c"
  [6] PR_Unlock(lock = 0x80d4208), line 207 in "ptsynch.c"
  [7] pk11_GetObjectFromList(hasLocks = 0xd0d2f8b8, optimizeSpace = 0, list =
0xd0f1e6c4, hashSize = 32U, isSessionObject = 1), line 2101 in "pkcs11u.c"
  [8] pk11_NewObject(slot = 0x808d8a0), line 2208 in "pkcs11u.c"
  [9] NSC_GenerateKey(hSession = 18U, pMechanism = 0xd0d2fa64, pTemplate =
0xd0d2fa7c, ulCount = 2U, phKey = 0x8092214), line 2768 in "pkcs11c.c"
  [10] PK11_TokenKeyGen(slot = 0x80ae7f0, type = 880U, param = 0xd0d2fb34,
keySize = 0, keyid = (nil), isToken = 0, wincx = (nil)), line 1554 in "pk11skey.c"
  [11] PK11_KeyGen(slot = 0x80ae7f0, type = 880U, param = 0xd0d2fb34, keySize =
0, wincx = (nil)), line 1576 in "pk11skey.c"
  [12] ssl3_GenerateRSAPMS(ss = 0x80c0d60, spec = 0x8106c58, serverKeySlot =
(nil)), line 6861 in "ssl3con.c"
  [13] sendRSAClientKeyExchange(ss = 0x80c0d60, svrPubKey = 0x81255c0), line
3587 in "ssl3con.c"
  [14] ssl3_SendClientKeyExchange(ss = 0x80c0d60), line 4290 in "ssl3con.c"
  [15] ssl3_HandleServerHelloDone(ss = 0x80c0d60), line 5328 in "ssl3con.c"
  [16] ssl3_HandleHandshakeMessage(ss = 0x80c0d60, b = 0x8108fbe "", length =
0), line 8077 in "ssl3con.c"
  [17] ssl3_HandleHandshake(ss = 0x80c0d60, origBuf = 0x80c0f04), line 8165 in
"ssl3con.c"
  [18] ssl3_HandleRecord(ss = 0x80c0d60, cText = 0xd0d2fd54, databuf =
0x80c0f04), line 8433 in "ssl3con.c"
  [19] ssl3_GatherCompleteHandshake(ss = 0x80c0d60, flags = 0), line 203 in
"ssl3gthr.c"
  [20] ssl_GatherRecord1stHandshake(ss = 0x80c0d60), line 1260 in "sslcon.c"
  [21] ssl_Do1stHandshake(ss = 0x80c0d60), line 149 in "sslsecur.c"
  [22] ssl_SecureSend(ss = 0x80c0d60, buf = 0x806507c "GET /abc
HTTP/1.0^M\n^M\n", len = 21, flags = 0), line 1033 in "sslsecur.c"
  [23] ssl_SecureWrite(ss = 0x80c0d60, buf = 0x806507c "GET /abc
HTTP/1.0^M\n^M\n", len = 21), line 1067 in "sslsecur.c"
  [24] ssl_Write(fd = 0x808a9a0, buf = 0x806507c, len = 21), line 1286 in
"sslsock.c"
  [25] PR_Write(fd = 0x808a9a0, buf = 0x806507c, amount = 21), line 143 in
"priometh.c"
  [26] handle_connection(ssl_sock = 0x808a9a0, connection = 2), line 675 in
"strsclnt.c"
  [27] do_connects(a = 0x80475a0, b = 0x808a780, connection = 2), line 802 in
"strsclnt.c"
  [28] thread_wrapper(arg = 0x8087630), line 383 in "strsclnt.c"
  [29] _pt_root(arg = 0x80bf4e8), line 214 in "ptthread.c"
  [30] _thr_setup(0xd0dd8800), at 0xd0ff1730
  [31] _lwp_start(), at 0xd0ff18f0
(/usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx)
Priority: -- → P2
Target Milestone: --- → 3.9.5
Version: unspecified → 3.9.4
The code in question looks like this:

   2084 pk11_GetObjectFromList(PRBool *hasLocks, PRBool optimizeSpace,
   2085      PK11ObjectFreeList *list, unsigned int hashSize, PRBool isSessionOb
        ject)
   2086 {
   2087     PK11Object *object;
   2088     int size = 0;
   2089
   2090     if (!optimizeSpace) {
   2091         if (list->lock == NULL) {
   2092             list->lock = PZ_NewLock(nssILockObject);
   2093         }
   2094
   2095         PK11_USE_THREADS(PZ_Lock(list->lock));
   2096         object = list->head;
   2097         if (object) {
   2098             list->head = object->next;
   2099             list->count--;
   2100         }
   2101         PK11_USE_THREADS(PZ_Unlock(list->lock));  <===
   2102         if (object) {
   2103             object->next = object->prev = NULL;
   2104             *hasLocks = PR_TRUE;
   2105             return object;
   2106         }
   2107     }

The creation of list->lock is not thread-safe.

Jason, could you get the stacks of all the other threads
in that core file?
mahatma[svbld]:/niscc/archive/20041117.1>
/usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx
/share/builds/mccrel3/security/security39/builds/20041117.1/wozzeck_Solaris8/mozilla/dist/SunOS5.10_i86pc_DBG.OBJ/bin/strsclnt
core.061805
Reading strsclnt
core file header read successfully
Reading ld.so.1
Reading libssl3.so
Reading libsmime3.so
Reading libnss3.so
Reading libplc4.so
Reading libplds4.so
Reading libnspr4.so
Reading libthread.so.1
Reading libnsl.so.1
Reading libsocket.so.1
Reading librt.so.1
Reading libdl.so.1
Reading libc.so.1
Reading libsoftokn3.so
Reading libpthread.so.1
Reading libaio.so.1
Reading libmd5.so.1
Reading libmp.so.2
detected a multithreaded program
t@0 (l@1) terminated by signal KILL (Killed)
0xd0ff1979: __lwp_park+0x0019:  jae    __lwp_park+0x27 <0xd0ff1987>
Current function is PR_WaitCondVar
  389           rv = pthread_cond_wait(&cvar->cv, &cvar->lock->mutex);
(/usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx) threads
      t@1  a l@1  ?()   sleep on 0x808a8c4      in __lwp_park()
      t@2  a l@2  _pt_root()    sleep on 0x80d4208      in __lwp_park()
o     t@3  a l@3  _pt_root()    signal SIGABRT  in _lwp_kill()
      t@4  a l@4  _pt_root()    sleep on 0x808cbf8      in __lwp_park()
      t@5  a l@5  _pt_root()    sleep on 0x80d4208      in __lwp_park()
(/usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx) thread t@1
Current function is PR_WaitCondVar
  389           rv = pthread_cond_wait(&cvar->cv, &cvar->lock->mutex);
t@1 (l@1) stopped in __lwp_park at 0xd0ff1979
0xd0ff1979: __lwp_park+0x0019:  jae    __lwp_park+0x27 <0xd0ff1987>
(/usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx) where
current thread: t@1
  [1] __lwp_park(0x808a8c4, 0x80bf3e8), at 0xd0ff1979
  [2] cond_sleep_queue(0x808a8c4, 0x80bf3e8, 0x0), at 0xd0febff1
  [3] cond_wait_queue(0x808a8c4, 0x80bf3e8, 0x0, 0x0), at 0xd0fec103
  [4] _cond_wait(0x808a8c4, 0x80bf3e8), at 0xd0fec60c
  [5] cond_wait(0x808a8c4, 0x80bf3e8), at 0xd0fec651
  [6] _pthread_cond_wait(0x808a8c4, 0x80bf3e8), at 0xd0fec68b
=>[7] PR_WaitCondVar(cvar = 0x808a8c0, timeout = 4294967295U), line 389 in
"ptsynch.c"
  [8] launch_thread(startFunc = 0x8052588 = &do_connects(void *a, void *b, int
connection), a = 0x80475a0, b = 0x808a780, c = 5), line 416 in "strsclnt.c"
  [9] client_main(port = 8445U, connections = 106186, Cert_And_Key = 0x8047654,
hostName = 0x808a620 "mahatma.red.iplanet.com"), line 1136 in "strsclnt.c"
  [10] main(argc = 12, argv = 0x80476cc), line 1309 in "strsclnt.c"
(/usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx) thread t@2
Current function is PR_Lock
  190       rv = pthread_mutex_lock(&lock->mutex);
t@2 (l@2) stopped in __lwp_park at 0xd0ff1979
0xd0ff1979: __lwp_park+0x0019:  jae    __lwp_park+0x27 <0xd0ff1987>
(/usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx) where
current thread: t@2
  [1] __lwp_park(0xd0dd8400, 0x0), at 0xd0ff1979
  [2] mutex_lock_queue(0xd0dd8400, 0x0, 0x80d4208, 0x0), at 0xd0fea904
  [3] slow_lock(0xd0dd8400, 0x80d4208, 0x0), at 0xd0feb1ad
  [4] mutex_lock_impl(0x80d4208, 0x0), at 0xd0feb2a3
  [5] pthread_mutex_lock(0x80d4208), at 0xd0feb3b1
=>[6] PR_Lock(lock = 0x80d4208), line 190 in "ptsynch.c"
  [7] pk11_GetObjectFromList(hasLocks = 0xd0d4f9bc, optimizeSpace = 0, list =
0xd0f1e6c4, hashSize = 32U, isSessionObject = 1), line 2095 in "pkcs11u.c"
  [8] pk11_NewObject(slot = 0x808d8a0), line 2208 in "pkcs11u.c"
  [9] NSC_CreateObject(hSession = 1U, pTemplate = 0xd0d4fa74, ulCount = 8U,
phObject = 0xd0d4faec), line 3658 in "pkcs11.c"
  [10] PK11_CreateNewObject(slot = 0x80ae7f0, session = 0, theTemplate =
0xd0d4fa74, count = 8, token = 0, objectID = 0xd0d4faec), line 138 in "pk11skey.c"
  [11] PK11_ImportPublicKey(slot = 0x80ae7f0, pubKey = 0x811cd18, isToken = 0),
line 628 in "pk11skey.c"
  [12] PK11_PubWrapSymKey(type = 1U, pubKey = 0x811cd18, symKey = 0x80e6c80,
wrappedKey = 0xd0d4fb74), line 2437 in "pk11skey.c"
  [13] sendRSAClientKeyExchange(ss = 0x80bf6c8, svrPubKey = 0x811cd18), line
3602 in "ssl3con.c"
  [14] ssl3_SendClientKeyExchange(ss = 0x80bf6c8), line 4290 in "ssl3con.c"
  [15] ssl3_HandleServerHelloDone(ss = 0x80bf6c8), line 5328 in "ssl3con.c"
  [16] ssl3_HandleHandshakeMessage(ss = 0x80bf6c8, b = 0x811205d "", length =
0), line 8077 in "ssl3con.c"
  [17] ssl3_HandleHandshake(ss = 0x80bf6c8, origBuf = 0x80bf86c), line 8165 in
"ssl3con.c"
  [18] ssl3_HandleRecord(ss = 0x80bf6c8, cText = 0xd0d4fd54, databuf =
0x80bf86c), line 8433 in "ssl3con.c"
  [19] ssl3_GatherCompleteHandshake(ss = 0x80bf6c8, flags = 0), line 203 in
"ssl3gthr.c"
  [20] ssl_GatherRecord1stHandshake(ss = 0x80bf6c8), line 1260 in "sslcon.c"
  [21] ssl_Do1stHandshake(ss = 0x80bf6c8), line 149 in "sslsecur.c"
  [22] ssl_SecureSend(ss = 0x80bf6c8, buf = 0x806507c "GET /abc
HTTP/1.0^M\n^M\n", len = 21, flags = 0), line 1033 in "sslsecur.c"
  [23] ssl_SecureWrite(ss = 0x80bf6c8, buf = 0x806507c "GET /abc
HTTP/1.0^M\n^M\n", len = 21), line 1067 in "sslsecur.c"
  [24] ssl_Write(fd = 0x808a900, buf = 0x806507c, len = 21), line 1286 in
"sslsock.c"
  [25] PR_Write(fd = 0x808a900, buf = 0x806507c, amount = 21), line 143 in
"priometh.c"
  [26] handle_connection(ssl_sock = 0x808a900, connection = 1), line 675 in
"strsclnt.c"
  [27] do_connects(a = 0x80475a0, b = 0x808a780, connection = 1), line 802 in
"strsclnt.c"
  [28] thread_wrapper(arg = 0x8087610), line 383 in "strsclnt.c"
  [29] _pt_root(arg = 0x80bf448), line 214 in "ptthread.c"
  [30] _thr_setup(0xd0dd8400), at 0xd0ff1730
  [31] _lwp_start(), at 0xd0ff18f0
(/usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx) thread t@4
Current function is PR_Lock
  190       rv = pthread_mutex_lock(&lock->mutex);
t@4 (l@4) stopped in __lwp_park at 0xd0ff1979
0xd0ff1979: __lwp_park+0x0019:  jae    __lwp_park+0x27 <0xd0ff1987>
(/usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx) where
current thread: t@4
  [1] __lwp_park(0xd0dd8c00, 0x0), at 0xd0ff1979
  [2] mutex_lock_queue(0xd0dd8c00, 0x0, 0x808cbf8, 0x0), at 0xd0fea904
  [3] slow_lock(0xd0dd8c00, 0x808cbf8, 0x0), at 0xd0feb1ad
  [4] mutex_lock_impl(0x808cbf8, 0x0), at 0xd0feb2a3
  [5] pthread_mutex_lock(0x808cbf8), at 0xd0feb3b1
=>[6] PR_Lock(lock = 0x808cbf8), line 190 in "ptsynch.c"
  [7] PK11_EnterSlotMonitor(slot = 0x80ae7f0), line 526 in "pk11slot.c"
  [8] PK11_CreateNewObject(slot = 0x80ae7f0, session = 0, theTemplate =
0xd0d0fa74, count = 8, token = 0, objectID = 0xd0d0faec), line 135 in "pk11skey.c"
  [9] PK11_ImportPublicKey(slot = 0x80ae7f0, pubKey = 0x811a418, isToken = 0),
line 628 in "pk11skey.c"
  [10] PK11_PubWrapSymKey(type = 1U, pubKey = 0x811a418, symKey = 0x80d6510,
wrappedKey = 0xd0d0fb74), line 2437 in "pk11skey.c"
  [11] sendRSAClientKeyExchange(ss = 0x80c2468, svrPubKey = 0x811a418), line
3602 in "ssl3con.c"
  [12] ssl3_SendClientKeyExchange(ss = 0x80c2468), line 4290 in "ssl3con.c"
  [13] ssl3_HandleServerHelloDone(ss = 0x80c2468), line 5328 in "ssl3con.c"
  [14] ssl3_HandleHandshakeMessage(ss = 0x80c2468, b = 0x810d855 "", length =
0), line 8077 in "ssl3con.c"
  [15] ssl3_HandleHandshake(ss = 0x80c2468, origBuf = 0x80c260c), line 8165 in
"ssl3con.c"
  [16] ssl3_HandleRecord(ss = 0x80c2468, cText = 0xd0d0fd54, databuf =
0x80c260c), line 8433 in "ssl3con.c"
  [17] ssl3_GatherCompleteHandshake(ss = 0x80c2468, flags = 0), line 203 in
"ssl3gthr.c"
  [18] ssl_GatherRecord1stHandshake(ss = 0x80c2468), line 1260 in "sslcon.c"
  [19] ssl_Do1stHandshake(ss = 0x80c2468), line 149 in "sslsecur.c"
  [20] ssl_SecureSend(ss = 0x80c2468, buf = 0x806507c "GET /abc
HTTP/1.0^M\n^M\n", len = 21, flags = 0), line 1033 in "sslsecur.c"
  [21] ssl_SecureWrite(ss = 0x80c2468, buf = 0x806507c "GET /abc
HTTP/1.0^M\n^M\n", len = 21), line 1067 in "sslsecur.c"
  [22] ssl_Write(fd = 0x808a940, buf = 0x806507c, len = 21), line 1286 in
"sslsock.c"
  [23] PR_Write(fd = 0x808a940, buf = 0x806507c, amount = 21), line 143 in
"priometh.c"
  [24] handle_connection(ssl_sock = 0x808a940, connection = 3), line 675 in
"strsclnt.c"
  [25] do_connects(a = 0x80475a0, b = 0x808a780, connection = 3), line 802 in
"strsclnt.c"
  [26] thread_wrapper(arg = 0x8087650), line 383 in "strsclnt.c"
  [27] _pt_root(arg = 0x80bf588), line 214 in "ptthread.c"
  [28] _thr_setup(0xd0dd8c00), at 0xd0ff1730
  [29] _lwp_start(), at 0xd0ff18f0
(/usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx) thread t@5
Current function is PR_Lock
  190       rv = pthread_mutex_lock(&lock->mutex);
t@5 (l@5) stopped in __lwp_park at 0xd0ff1979
0xd0ff1979: __lwp_park+0x0019:  jae    __lwp_park+0x27 <0xd0ff1987>
(/usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx) where
current thread: t@5
  [1] __lwp_park(0xd0dd9000, 0x0), at 0xd0ff1979
  [2] mutex_lock_queue(0xd0dd9000, 0x0, 0x80d4208, 0x0), at 0xd0fea904
  [3] slow_lock(0xd0dd9000, 0x80d4208, 0x0), at 0xd0feb1ad
  [4] mutex_lock_impl(0x80d4208, 0x0), at 0xd0feb2a3
  [5] pthread_mutex_lock(0x80d4208), at 0xd0feb3b1
=>[6] PR_Lock(lock = 0x80d4208), line 190 in "ptsynch.c"
  [7] pk11_GetObjectFromList(hasLocks = 0xd0cef8b8, optimizeSpace = 0, list =
0xd0f1e6c4, hashSize = 32U, isSessionObject = 1), line 2095 in "pkcs11u.c"
  [8] pk11_NewObject(slot = 0x808d8a0), line 2208 in "pkcs11u.c"
  [9] NSC_GenerateKey(hSession = 21U, pMechanism = 0xd0cefa64, pTemplate =
0xd0cefa7c, ulCount = 2U, phKey = 0x811d5ec), line 2768 in "pkcs11c.c"
  [10] PK11_TokenKeyGen(slot = 0x80ae7f0, type = 880U, param = 0xd0cefb34,
keySize = 0, keyid = (nil), isToken = 0, wincx = (nil)), line 1554 in "pk11skey.c"
  [11] PK11_KeyGen(slot = 0x80ae7f0, type = 880U, param = 0xd0cefb34, keySize =
0, wincx = (nil)), line 1576 in "pk11skey.c"
  [12] ssl3_GenerateRSAPMS(ss = 0x80c20e8, spec = 0x81060d8, serverKeySlot =
(nil)), line 6861 in "ssl3con.c"
  [13] sendRSAClientKeyExchange(ss = 0x80c20e8, svrPubKey = 0x8124580), line
3587 in "ssl3con.c"
  [14] ssl3_SendClientKeyExchange(ss = 0x80c20e8), line 4290 in "ssl3con.c"
  [15] ssl3_HandleServerHelloDone(ss = 0x80c20e8), line 5328 in "ssl3con.c"
  [16] ssl3_HandleHandshakeMessage(ss = 0x80c20e8, b = 0x811e0bd "", length =
0), line 8077 in "ssl3con.c"
  [17] ssl3_HandleHandshake(ss = 0x80c20e8, origBuf = 0x80c228c), line 8165 in
"ssl3con.c"
  [18] ssl3_HandleRecord(ss = 0x80c20e8, cText = 0xd0cefd54, databuf =
0x80c228c), line 8433 in "ssl3con.c"
  [19] ssl3_GatherCompleteHandshake(ss = 0x80c20e8, flags = 0), line 203 in
"ssl3gthr.c"
  [20] ssl_GatherRecord1stHandshake(ss = 0x80c20e8), line 1260 in "sslcon.c"
  [21] ssl_Do1stHandshake(ss = 0x80c20e8), line 149 in "sslsecur.c"
  [22] ssl_SecureSend(ss = 0x80c20e8, buf = 0x806507c "GET /abc
HTTP/1.0^M\n^M\n", len = 21, flags = 0), line 1033 in "sslsecur.c"
  [23] ssl_SecureWrite(ss = 0x80c20e8, buf = 0x806507c "GET /abc
HTTP/1.0^M\n^M\n", len = 21), line 1067 in "sslsecur.c"
  [24] ssl_Write(fd = 0x808a980, buf = 0x806507c, len = 21), line 1286 in
"sslsock.c"
  [25] PR_Write(fd = 0x808a980, buf = 0x806507c, amount = 21), line 143 in
"priometh.c"
  [26] handle_connection(ssl_sock = 0x808a980, connection = 4), line 675 in
"strsclnt.c"
  [27] do_connects(a = 0x80475a0, b = 0x808a780, connection = 4), line 802 in
"strsclnt.c"
  [28] thread_wrapper(arg = 0x8087670), line 383 in "strsclnt.c"
  [29] _pt_root(arg = 0x80bf628), line 214 in "ptthread.c"
  [30] _thr_setup(0xd0dd9000), at 0xd0ff1730
  [31] _lwp_start(), at 0xd0ff18f0
(/usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx)
Two other threads (t@2 and t@5) were also in pk11_GetObjectFromList
at that time.  They were at line 2095 of pkcs11u.c.  This supports
the theory that some other thread also saw a null list->lock, created
list->lock, and overwrote the list->lock created by the thread that
hit the assertion failure (t@3).
Status: NEW → ASSIGNED
Assigned the bug to Julien.
Assignee: wchang0222 → julien.pierre.bugs
Status: ASSIGNED → NEW
The same problem exists with both tokenObjectList and sessionObjectList .
Those lists have their locks initialized too late.

Fortunately, this problem can only show up in a multithreaded client application.

In the server case, both lists get initialized early .
The token list gets initialized when the server looks up its server cert while
the server is still single-threaded.
The session list gets initialized when the server does the RSA stepdown key
generation, also while the server is still single-threaded. This is the only
instance that causes this list to be initialized before the server starts worker
threads and is ready to accept a connection. So at least now we can justify all
 the cycles spent doing that keygen - it prevents this race condition ;)

Patch forthcoming, for 3.9.5 .
OS: SunOS → All
Hardware: PC → All
Attachment #166830 - Flags: superreview?(rrelyea0264)
Attachment #166830 - Flags: review?(nelson)
Comment on attachment 166830 [details] [diff] [review]
Initialize list locks

I think this patch is fine, as far as it goes, so I'm giving it r+.
But while you're at it, maybe you should initialize ALL of softoken's
free list locks.  Your new function pk11_InitFreeLists() appears to 
be the complement to the existing function:  
  pk11_CleanupFreeLists().  
Perhaps you should also create (and call) a new function to complement 
  nsslowcert_DestroyFreeLists()
and/or
  nsslowcert_DestroyGlobalLocks()
which are both called immediately after the call to 
  pk11_CleanupFreeLists()
However, that is just a suggestion.
Attachment #166830 - Flags: review?(nelson) → review+
Comment on attachment 166830 [details] [diff] [review]
Initialize list locks

>+static void
>+pk11_InitFreeList(PK11ObjectFreeList *list)
>+{
>+    list->lock = PZ_NewLock(nssILockObject);

I'm wondering if this should be wrapped in a
PK11_USE_THREADS macro.
Wan-Teh,

I don't know. The original code that initialized this lock lazily did not have
this macro. Bob ?

Nelson,

Thanks for noting that. I think fixing the other lists should be done in
additional init functions for each source, and be tracked in a different bug.
re: PK11_USE_THREADS

There is both a macro and and ifdef. The macro is for single lines (usually lock
and unlock). The ifdef (PKCS11_USE_THREADS) is for blocks of code. Currently all
the lock variables are conditionally compiled, so it's probably best to keep up
the ifdefs. On the other hand, I doubt that we can actually build without
PKCS11_USE_THREADS undefined. It may be time to remove them from the source code.

bob
I added an ifdef for PKCS11_USE_THREADS and checked in this patch to the tip.
Closing the bug .

Checking in pkcs11.c;
/cvsroot/mozilla/security/nss/lib/softoken/pkcs11.c,v  <--  pkcs11.c
new revision: 1.96; previous revision: 1.95
done
Checking in pkcs11i.h;
/cvsroot/mozilla/security/nss/lib/softoken/pkcs11i.h,v  <--  pkcs11i.h
new revision: 1.32; previous revision: 1.31
done
Checking in pkcs11u.c;
/cvsroot/mozilla/security/nss/lib/softoken/pkcs11u.c,v  <--  pkcs11u.c
new revision: 1.59; previous revision: 1.58
done
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Summary: strsclnt assertion failure during NISCC run → race condition in softoken list lock creation
Also checked in to NSS_3_9_BRANCH for 3.9.5 .

Checking in pkcs11.c;
/cvsroot/mozilla/security/nss/lib/softoken/pkcs11.c,v  <--  pkcs11.c
new revision: 1.88.4.2; previous revision: 1.88.4.1
done
Checking in pkcs11i.h;
/cvsroot/mozilla/security/nss/lib/softoken/pkcs11i.h,v  <--  pkcs11i.h
new revision: 1.30.16.1; previous revision: 1.30
done
Checking in pkcs11u.c;
/cvsroot/mozilla/security/nss/lib/softoken/pkcs11u.c,v  <--  pkcs11u.c
new revision: 1.54.2.4; previous revision: 1.54.2.3
done
Attachment #166830 - Flags: review+
Attachment #166830 - Flags: superreview?(rrelyea)
Mass re-assign of 3.9.5 fixed bugs to 3.9.6 , since we built 3.9.5 with the same
source tree as 3.9.4 .
Target Milestone: 3.9.5 → 3.9.6
When I reviewed the patch "Initialize list locks",
I found that the "lazy" lock creation code in
pk11_PutObjectToList should also be removed.
(The same code in pk11_GetObjectFromList was
removed.)

It is not urgent to fix this on the NSS_3_9_BRANCH
because that code is dead code now that we create
the lock in nsc_CommonInitialize.
Attachment #176647 - Flags: superreview?(nelson)
Attachment #176647 - Flags: review?(julien.pierre.bugs)
At the very beginning of pk11_CleanupFreeList, we have:

     if (!list->lock) {
	return;
     }

Before, list->lock was created lazily, so a null list->lock
indicated that the list was never used and therefore it needed
not be cleaned up.

This test may need to be updated after Julien's patch
has been checked in.  First, list->lock is now always
created, so a null list->lock now indicates a softoken
initialization failure.  Second, if PKCS11_USE_THREADS is
not defined, list->lock is NULL even if the softoken is
initialized successfully (see pk11_InitFreeList).  So it
seems that testing list->lock is no longer correct.  I
suggest testing list->head instead.
Attachment #176650 - Flags: review?(rrelyea)
Comment on attachment 176650 [details] [diff] [review]
Confusing code in pk11_CleanupFreeList

Instead of this patch, it may be better to
remove the

    if (!list->lock) {
	return;
    }

test altogether.
Attachment #176647 - Flags: review?(julien.pierre.bugs) → review+
Attachment #176650 - Attachment is obsolete: true
Attachment #176694 - Flags: review?(wtchang)
Wan-Teh,

I have checked in both https://bugzilla.mozilla.org/attachment.cgi?id=176647 and
https://bugzilla.mozilla.org/attachment.cgi?id=176694 to the tip.

Sorry, midnight is our build deadline . If you have a problem with one of the
patches, please reopen this bug.

Checking in pkcs11u.c;
/cvsroot/mozilla/security/nss/lib/softoken/pkcs11u.c,v  <--  pkcs11u.c
new revision: 1.60; previous revision: 1.59
done

For NSS_3_9_BRANCH :

Checking in pkcs11u.c;
/cvsroot/mozilla/security/nss/lib/softoken/pkcs11u.c,v  <--  pkcs11u.c
new revision: 1.54.2.5; previous revision: 1.54.2.4
done
Comment on attachment 176694 [details] [diff] [review]
test list->lock only if PKCS11_USE_THREADS is defined

In the current code, list->lock is always
created (if PKCS11_USE_THREADS is defined),
so the list->lock test is useless.

If I understand it correctly, the purpose of
that test used to be a performance optimization
(to return immediately if the freelist wasn't
created), rather than a crash protection. It now
only serves the purpose of crash protection (due
to a failed NSS initialization), but we'd need
similar tests in pk11_PutObjectToList, etc.
Attachment #176694 - Flags: review?(wtchang) → review+
Attachment #176647 - Flags: superreview?(nelson)
Comment on attachment 176650 [details] [diff] [review]
Confusing code in pk11_CleanupFreeList

remove review request on an obsolete patch
Attachment #176650 - Flags: review?(rrelyea)
You need to log in before you can comment on or make changes to this bug.