A number of crashes are reported in strsclnt in tinderbox subsequent to some of my checkins for bugs 399304, 403957, 353577 . Here is one of the logs with the crashes : http://tinderbox.mozilla.org/showlog.cgi?log=NSS-Stable-Branch/1195193040.1195196007.9191.gz&fulltext=1 This is not happening on all platforms. I am trying to reproduce the issue on the same platform so I can get a stack dump.
I reproduced the problem on NSS_3_11_BRANCH and Solaris sparc 64 bit. The crashes are in the stan code and don't make much sense. The problem seems to be due to the checkin for bug 353577, as it went away when I backed out the patch. I haven't found what's wrong with it yet. My guess is that it has something to do with the NSS_3_4_CODE changes. I'm still investigating.
This is one of the crash stacks . They are all identical. The problem is always that o2 is NULL .  nssCryptokiObject_Equal(o1 = 0x100132390, o2 = 0x132590), line 121 in "devutil.c" => nssPKIObject_AddInstance(object = 0x1002d7a10, instance = 0x132590), line 210 in "pkibase.c"  merge_object_instances(to = 0x1002d7a10, from = 0x1002d92d0), line 738 in "tdcache.c"  add_cert_to_cache(td = 0x10017bbf0, cert = 0x1002d92d0), line 781 in "tdcache.c"  nssTrustDomain_AddCertsToCache(td = 0x10017bbf0, certs = 0xffffffff7a6fa878, numCerts = 1U), line 879 in "tdcache.c"  cert_createObject(o = 0x1002d9260), line 1063 in "pkibase.c"  nssPKIObjectCollection_GetObjects(collection = 0x1002d6a40, rvObjects = 0xffffffff7a6fab48, rvSize = 1U), line 894 in "pkibase.c"  nssPKIObjectCollection_GetCertificates(collection = 0x1002d6a40, rvOpt = 0xffffffff7a6fab48, maximumOpt = 1U, arenaOpt = (nil)), line 1119 in "pkibase.c"  nssTrustDomain_FindCertificateByIssuerAndSerialNumber(td = 0x10017bbf0, issuer = 0xffffffff7a6fac68, serial = 0xffffffff7a6fac58), line 816 in "trustdomain.c"  nssTrustDomain_FindCertificateByEncodedCertificate(td = 0x10017bbf0, ber = 0xffffffff7a6fae30), line 867 in "trustdomain.c"  NSSTrustDomain_FindCertificateByEncodedCertificate(td = 0x10017bbf0, ber = 0xffffffff7a6fae30), line 879 in "trustdomain.c"  __CERT_NewTempCertificate(handle = 0x10017bbf0, derCert = 0xffffffff7a6faf68, nickname = (nil), isperm = 0, copyDER = 1), line 242 in "stanpcertdb.c"  ssl3_HandleCertificate(ss = 0x100257ea0, b = 0x10025ff5f "^N", length = 0), line 7063 in "ssl3con.c"  ssl3_HandleHandshakeMessage(ss = 0x100257ea0, b = 0x10025fa76 "", length = 1257U), line 7746 in "ssl3con.c"  ssl3_HandleHandshake(ss = 0x100257ea0, origBuf = 0x100258220), line 7862 in "ssl3con.c"  ssl3_HandleRecord(ss = 0x100257ea0, cText = 0xffffffff7a6fb430, databuf = 0x100258220), line 8125 in "ssl3con.c"  ssl3_GatherCompleteHandshake(ss = 0x100257ea0, flags = 0), line 206 in "ssl3gthr.c"  ssl_GatherRecord1stHandshake(ss = 0x100257ea0), line 1258 in "sslcon.c"  ssl_Do1stHandshake(ss = 0x100257ea0), line 151 in "sslsecur.c"  ssl_SecureSend(ss = 0x100257ea0, buf = 0x10001c694 "GET /abc HTTP/1.0^M\n^M\n", len = 21, flags = 0), line 1152 in "sslsecur.c"  ssl_Send(fd = 0x100132000, buf = 0x10001c694, len = 21, flags = 0, timeout = 4294967295U), line 1426 in "sslsock.c"  PR_Send(fd = 0x100132000, buf = 0x10001c694, amount = 21, flags = 0, timeout = 4294967295U), line 226 in "priometh.c"  handle_connection(ssl_sock = 0x100132000, tid = 5), line 696 in "strsclnt.c"  do_connects(a = 0xffffffff7fffeae0, b = 0x100131c80, tid = 5), line 887 in "strsclnt.c"  thread_wrapper(arg = 0x10012a008), line 439 in "strsclnt.c"  _pt_root(arg = 0x100194310), line 220 in "ptthread.c"
Removing the PURE_STAN_BUILD macro had zero risk because it is easy to verify that the PURE_STAN_BUILD macro wasn't defined. The NSS_3_4_CODE macro is a different story. In one or two directories, it was defined in a makefile, which was good. But in other directories (e.g., lib/pk11wrap), it was manually defined in the .c files, before including the Stan headers. So it is difficult to verify that we defined NSS_3_4_CODE in all the necessary places. It was possible that in a .c file we didn't define NSS_3_4_CODE before including the Stan headers, so that removing the #ifdef NSS_3_4_CODE did not result in the same code. I tried to look for such problems but gave up because it was too tedious.
Julien, fix all the warnings: warning: implicit function declaration xxx
On the NSS_3_11_BRANCH you missed two files: $ find . -name "*.h" -print | xargs grep NSS_3_4_CODE $ find . -name "*.c" -print | xargs grep NSS_3_4_CODE ./lib/pki/tdcache.c:#ifdef NSS_3_4_CODE ./lib/pki/tdcache.c:#ifdef NSS_3_4_CODE ./lib/pki/tdcache.c:#ifdef NSS_3_4_CODE ./lib/pki/tdcache.c:#ifdef NSS_3_4_CODE ./lib/pki/trustdomain.c:#ifdef NSS_3_4_CODE ./lib/pki/trustdomain.c:#ifdef NSS_3_4_CODE
Created attachment 289629 [details] [diff] [review] Proposed patch The changes to tdcache.c and trustdomain.c are required. The other changes are just formatting changes. We did have the problem I mentioned in comment 3 (we didn't define NSS_3_4_CODE in crl.c and nssinit.c before including devt.h), but there was no harm.
Comment on attachment 289629 [details] [diff] [review] Proposed patch I've checked in this patch on the NSS_3_11_BRANCH because this patch is just a missing piece of the patch that Nelson and I reviewed. In the CVS commit comment I noted "tbr=julien.pierre", which means "to be reviewed by Julien Pierre".
Wan-Teh, Yes, I noticed the problem in nssinit.c and crl.c yesterday (I changed the header files to have #else #error NSS_3_4_CODE not defined) but indeed there was no harm. Thanks for noticing the problem and checking in the fix.
Yes, that's also how I found out about nssinit.c and crl.c :-) I then made the NSSSlotStr and NSSTokenStr structures opaque in devt.h and verified that I could still compile nssinit.c and crl.c successfully and without compiler warnings.
The problem is now resolved in the tinderbox, so I am marking this bug fixed.