Closed Bug 299070 Opened 20 years ago Closed 19 years ago

NISCC strsclnt SECMOD_Shutdown assertion

Categories

(NSS :: Libraries, defect, P1)

3.10
Sun
Solaris
defect

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 331279
3.11.3

People

(Reporter: jason.m.reid, Assigned: julien.pierre)

Details

This morning's NISCC run on securitytip dropped core. Here is the DBX output.
mahatma[svbld]:/niscc/archive/20050628.1>
/usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx
$LD_LIBRARY_PATH/../bin/strsclnt core.065810
Reading strsclnt
core file header read successfully
Reading ld.so.1
Reading libssl3.so
Reading libsmime3.so
Reading libnss3.so
Reading libplc4.so
Reading libplds4.so
Reading libnspr4.so
Reading libthread.so.1
Reading libnsl.so.1
Reading libsocket.so.1
Reading librt.so.1
Reading libdl.so.1
Reading libc.so.1
Reading libsoftokn3.so
Reading libpthread.so.1
Reading libaio.so.1
Reading libmd5.so.1
Reading libmp.so.2
Reading libscf.so.1
Reading libdoor.so.1
Reading libuutil.so.1
detected a multithreaded program
program terminated by signal ABRT (Abort)
0xd0c1e875: _lwp_kill+0x0015:   jae    _lwp_kill+0x23 <0xd0c1e883>
Current function is PR_Assert
  538       abort();
(/usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx) where
  [1] _lwp_kill(0x1, 0x6), at 0xd0c1e875
  [2] _pthread_kill(0x1, 0x6), at 0xd0c1b71b
  [3] raise(0x6), at 0xd0bcadbb
  [4] abort(), at 0xd0bae909
=>[5] PR_Assert(s = 0xd0ee1760 "secmod_PrivateModuleCount == 0", file =
0xd0ee1780 "pk11util.c", ln = 119), line 538 in "prlog.c"
  [6] SECMOD_Shutdown(), line 119 in "pk11util.c"
  [7] NSS_Shutdown(), line 559 in "nssinit.c"
  [8] main(argc = 12, argv = 0x80476c0), line 1347 in "strsclnt.c"
(/usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx) up 3
Current function is main
 1347       if (NSS_Shutdown() != SECSuccess) {
(/usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx) print argv[0]
argv[0] = 0x8047828
"/share/builds/mccrel3/security/securitytip/builds/20050628.1/wozzeck_Solaris8/mozilla/dist/SunOS5.10_i86pc_DBG.OBJ/bin/strsclnt"
(/usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx) print argc
argc = 12
(/usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx) print argv[0..11]
argv[0..11] =
[0] = 0x8047828
"/share/builds/mccrel3/security/securitytip/builds/20050628.1/wozzeck_Solaris8/mozilla/dist/SunOS5.10_i86pc_DBG.OBJ/bin/strsclnt"
[1] = 0x80478a8 "-d"
[2] = 0x80478ab "/niscc/niscc_ssl/simple_client"
[3] = 0x80478ca "-p"
[4] = 0x80478cd "8444"
[5] = 0x80478d2 "-t"
[6] = 0x80478d5 "4"
[7] = 0x80478d7 "-c"
[8] = 0x80478da "106167"
[9] = 0x80478e1 "-o"
[10] = 0x80478e4 "-N"
[11] = 0x80478e7 "mahatma.red.iplanet.com"
(/usr/dist/share/forte_dev_i386,v6.2/SUNWspro/bin/dbx)

Here is the relevant KSH source that runs this test.
##############################################################
# do simple server auth tests
# Use an altered server against the client
##############################################################
ssl_simple_server_auth()
{
  CLIENT=${NISCC_HOME}/niscc_ssl/simple_client; export CLIENT
  SERVER=${NISCC_HOME}/niscc_ssl/simple_server; export SERVER
  PORT=8444; export PORT
  START_AT=1; export START_AT
  STOP_AT=106167; export STOP_AT
  LD_LIBRARY_PATH=${HACKLIB}; export LD_LIBRARY_PATH
  NISCC_TEST=$TEST/simple_server; export NISCC_TEST
  ${HACKBIN}/selfserv -p $PORT -d $SERVER -n server_crt -t 5 -w test > \
$NISCC_HOME/nisccLog03 2>&1 &

  unset NISCC_TEST; export NISCC_TEST
  LD_LIBRARY_PATH=${TESTLIB}; export LD_LIBRARY_PATH
  ${TESTBIN}/strsclnt -d $CLIENT -p $PORT -t 4 -c 106167 -o -N $HOSTNAME > \
$NISCC_HOME/nisccLog04 2>&1

I can provide the core file upon request.
This failure also occured on 20050625 securitytip NISCC run. 
This assertion failure means there is an NSS object reference
leak.

Jason, do you run NISCC tests daily or weekly?
I run them daily for both security310 and securitytip branches.
In that case, we should stop checkins on the NSS
tip and examine the checkins done yesterday (2005-06-27).
Jason, could you provide a time interval in which
this failure started?  I just saw your comment 1
that this failure also occured on 20050625 securitytip NISCC run.
So it wasn't introduced yesterday. 
This failure first occured the morning 20050625. NISCC tests were not run on 0626.
Tests passed on 0627. Failure occured again on 0628. I ran NISCC suite three
more times on tip on 0628 and was unable to reproduce the problem. Tests passed on 
0629. I am running NISCC again now to see if I can reproduce.
strsclnt is a multi-threaded test program.  Perhaps there is a race that 
sometimes leads to a leak, in which case the program results would be
non-deterministic, failing intermittently.

It would be great if we had a way to find the leaked object(s), since 
doing so might enable us to construct much smaller tests that reproduce 
the problem.  The present test does ~106K test cases. 
Hit this again on securytip 20050805 NISCC run.

mahatma[svbld]:/niscc/archive/20050805.1> mdb
/share/builds/mccrel3/security/securitytip/builds/20050805.1/wozzeck_Solaris8/mozilla/dist/SunOS5.10_i86pc_DBG.OBJ/bin/strsclnt
core
Loading modules: [ libc.so.1 libuutil.so.1 ld.so.1 ]
> $C
08047574 libc.so.1`_lwp_kill+0x15(1, 6)
0804758c libc.so.1`raise+0x1f(6)
080475d0 libc.so.1`abort+0xcd(d0ef786c, d0d639e1, 8047600, d0e735e9, d0ee2114,
d0ee2134)
080475e0 libnspr4.so`PR_Assert+0x6a(d0ee2114, d0ee2134, 77)
08047600 libnss3.so`SECMOD_Shutdown+0x199(808c020, d0e247c1, 0, 8047694,
d0ffd7f0, 8047694)
0804761c libnss3.so`NSS_Shutdown+0x41(804767c, d0ffb824, 8055eac, 1, 1, 50)
08047694 main+0x823(c, 80476c0, 80476f4)
080476b4 _start+0x7a(c, 8047828, 80478a8, 80478ab, 80478ca, 80478cd)
>
Another hit on the 20050902 tip run.

mahatma[svbld]:/niscc/archive/20050902.1> mdb
/share/builds/mccrel3/security/securitytip/builds/20050902.1/wozzeck_Solaris8/mozilla/dist/SunOS5.10_i86pc_DBG.OBJ/bin/strsclnt
core.
core.071456  core.backup
mahatma[svbld]:/niscc/archive/20050902.1> mdb
/share/builds/mccrel3/security/securitytip/builds/20050902.1/wozzeck_Solaris8/mozilla/dist/SunOS5.10_i86pc_DBG.OBJ/bin/strsclnt
core.071456
Loading modules: [ libc.so.1 libuutil.so.1 ld.so.1 ]
> $C
080475c4 libc.so.1`_lwp_kill+0x15(1, 6)
080475dc libc.so.1`raise+0x1f(6)
08047628 libc.so.1`abort+0xcd(d0f066d0, d0d738a9, 8047658, d0e829e9, d0ef0f68,
d0ef0f88)
08047638 libnspr4.so`PR_Assert+0x6a(d0ef0f68, d0ef0f88, 77)
08047658 libnss3.so`SECMOD_Shutdown+0x199(80890b0, d0e34001, 0, 80476ec,
d0ffd7b4, 80476ec)
08047674 libnss3.so`NSS_Shutdown+0x41(80476d4, d0ffb824, 8054b54, 1, 1, 4a)
080476ec 0x80552af(10, 8047718, 804775c)
0804770c 0x8052342(10, 8047894, 80478d5, 80478d8, 80478f9, 80478fc)
Another hit on 20050910.
Another hit on 20051006 on security310.
Another hit on 20051207 on securitytip.
Another hit on security310 on 20060103.
Another hit on 20060207 securitytip.
Another hit on 20060221 securitytip.
Assignee: wtchang → nobody
QA Contact: jason.m.reid → libraries
I suspect this is a duplicate of another bug we're working on now.
Julien, can you verify whether it is or is not a duplicate?
Assignee: nobody → julien.pierre.bugs
Priority: -- → P1
Target Milestone: --- → 3.11.1
Nelson,

I think you are talking about bug 331279 . Indeed the failures look very similar. Once I fix bug 331279, which I hope to be this week, we can see if this NISCC failure stops too. But the frequency is only 1 - 2 times a month, so it will take a while.
This bug didn't make 3.11.1 .
Target Milestone: 3.11.1 → 3.11.2
Not a regression. Changing target to 3.11.3 .
Target Milestone: 3.11.2 → 3.11.3
Based on comment 6, this bug is CLEARLY a regression!
Nelson,

Comment 6 implies that it might have been a regression as of a year ago. If you read the later comments, you'll see that I believe this is a dupe of bug 331279, which is an NSS 3.4 Stan bug. But I'm not 100% sure of that, so I kept this one open. The incidence of this bug is very low so proving that it is a dupe will take some time after bug 331279 is fixed.
I'm now convinced this is a dupe of bug 331279 .
Both are failures in strsclnt running in full handshake mode (-N).
This one was seen with RSA during NISCC tests, but very rarely.
331279 was seen with ECC, more frequently (every few nights).
I believe the only difference is the timing of these algorithms.
The race is in the cert code, when creating the temp cert for verification.
This bug was only reproducible about once a month - NISCC takes a long time,
Bug 331279 on the other hand is reproducible once a minute.
I'm going to close this as a dupe. We can reopen it later if we see a failure in NISCC again after I fix bug 331279 .

I'm removing bug 232386 as a dependency because the fix for this doesn't depend on the enhancement to track reference leaks. That enhancement would only ease diagnosis, but is not part of the requirements for the fix.


*** This bug has been marked as a duplicate of 331279 ***
Status: NEW → RESOLVED
Closed: 19 years ago
No longer depends on: 232386
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.