Closed
Bug 489188
Opened 15 years ago
Closed 12 years ago
object/memory leak in libpkix
Categories
(NSS :: Libraries, defect, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
3.13.6
People
(Reporter: alvolkov.bgs, Assigned: KaiE)
References
Details
(Keywords: memory-leak)
Attachments
(8 files, 1 obsolete file)
2.59 KB,
patch
|
nelson
:
review+
|
Details | Diff | Splinter Review |
1.42 KB,
patch
|
nelson
:
review+
|
Details | Diff | Splinter Review |
6.97 KB,
text/plain
|
Details | |
4.62 KB,
text/plain
|
Details | |
15.77 KB,
patch
|
Details | Diff | Splinter Review | |
1.65 KB,
patch
|
KaiE
:
review+
|
Details | Diff | Splinter Review |
1.02 KB,
patch
|
Details | Diff | Splinter Review | |
1.98 KB,
patch
|
KaiE
:
review+
|
Details | Diff | Splinter Review |
Yet another memory leak in libpkix. This time selfserv leaked list of chain checkers that were created in pkix_Build_CheckInCache. The actual reason of the error is not known, but the print out of the leaked object bring some light to the leak problem. ssl.sh: Stress TLS ECDH-ECDSA AES 128 CBC with SHA (no reuse) ---- selfserv starting at Mon Apr 20 00:16:47 PDT 2009 selfserv -D -p 8111 -d ../server -n GORIDE.red.iplanet.com -B -s \ -e GORIDE.red.iplanet.com-ec -w nss -c :C004 -i ../tests_pid.3120 & trying to connect to selfserv at Mon Apr 20 00:16:47 PDT 2009 tstclnt -p 8111 -h GORIDE.red.iplanet.com -B -s -q \ -d ../client -v < /export/tinderlight/data/goride_32_DBG/mozilla/security/nss/tests/ssl/sslreq.dat tstclnt.exe: connecting to GORIDE.red.iplanet.com:8111 (address=192.18.72.38) kill -0 1456 >/dev/null 2>/dev/null selfserv with PID 1456 found at Mon Apr 20 00:16:47 PDT 2009 selfserv with PID 1456 started at Mon Apr 20 00:16:47 PDT 2009 strsclnt -q -p 8111 -d ../client -B -s -w nss -2 -c 100 -C :C004 -N \ GORIDE.red.iplanet.com strsclnt started at Mon Apr 20 00:16:47 PDT 2009 .... strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. (pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed strsclnt: -- SSL: Server Certificate Validated. 0[b52898]: Class Cert leaked 2 objects of size 144 bytes, total = 288 bytes 0[b52898]: Class CertChainChecker leaked 4 objects of size 20 bytes, total = 80 bytes 0[b52898]: Class PolicyCheckerState leaked 1 objects of size 88 bytes, total = 88 bytes 0[b52898]: Class PolicyNode leaked 1 objects of size 28 bytes, total = 28 bytes 0[b52898]: Class CertSelector leaked 1 objects of size 12 bytes, total = 12 bytes 0[b52898]: Class ComCertSelParams leaked 1 objects of size 80 bytes, total = 80 bytes 0[b52898]: Class Date leaked 1 objects of size 8 bytes, total = 8 bytes 0[b52898]: Class Error leaked 1 objects of size 20 bytes, total = 20 bytes 0[b52898]: Class List leaked 23 objects of size 20 bytes, total = 460 bytes 0[b52898]: Class OID leaked 14 objects of size 12 bytes, total = 168 bytes 0[b52898]: Class PublicKey leaked 3 objects of size 4 bytes, total = 12 bytes 0[b52898]: Class SignatureCheckerState leaked 1 objects of size 20 bytes, total = 20 bytes 0[b52898]: Class TargetCertCheckerState leaked 1 objects of size 32 bytes, total = 32 bytes 0[b52898]: Class X500Name leaked 4 objects of size 24 bytes, total = 96 bytes 0[b52898]: Assertion failure: numLeakedObjects == 0, at C:/mozilla-build/msys/export/tinderlight/data/goride_32_DBG/mozilla/security/nss/lib/libpkix/pkix_pl_nss/system/pkix_pl_lifecycle.c:294 strsclnt completed at Mon Apr 20 00:16:59 PDT 2009 ssl.sh: #4271: Stress TLS ECDH-ECDSA AES 128 CBC with SHA (no reuse) produced a returncode of 3, expected is 0. - FAILED trying to kill selfserv with PID 1456 at Mon Apr 20 00:16:59 PDT 2009 kill 1456 ./all.sh: line 17: 1456 Terminated ${PROFTOOL} ${BINDIR}/selfserv -D -p ${PORT} -d ${P_R_SERVERDIR} -n ${HOSTADDR} ${SERVER_OPTIONS} ${ECC_OPTIONS} -w nss ${sparam} -i ${R_SERVERPID} $verbose selfserv with PID 1456 killed at Mon Apr 20 00:16:59 PDT 2009 Potential leak is at this line 3101 through 3104 of pkix_build.c rev 1.56. The cleanup of the list of checkers happens only upon successful execution of pkix_Build_ValidateEntireChain. They are leaked in case of it's failure. Patch fixes memory leak by destroying object right before they get assigned to a new value.
Attachment #373678 -
Flags: review?(nelson)
Updated•15 years ago
|
Attachment #373678 -
Flags: review?(nelson) → review+
Comment 1•15 years ago
|
||
Comment on attachment 373678 [details] [diff] [review] Patch v1 - resolt object leak problem r=nelson
Comment 2•15 years ago
|
||
Please verify also this leak (Tinderbox goride MINGW32_NT-5.2 32bit DBG 2009/04/20 16:45:36), if it's caused by the same reason or if it is something different: ssl.sh: Stress SSL3 ECDHE-ECDSA AES 128 CBC with SHA (no reuse) ---- selfserv starting at Mon Apr 20 18:45:38 PDT 2009 selfserv -D -p 8111 -d ../ext_server -n GORIDE.red.iplanet.com -B -s \ -e GORIDE.red.iplanet.com-ec -w nss -c :C009 -i ../tests_pid.3544 & trying to connect to selfserv at Mon Apr 20 18:45:38 PDT 2009 tstclnt -p 8111 -h GORIDE.red.iplanet.com -B -s -q \ -d ../ext_client -v < /export/tinderlight/data/goride_32_DBG/mozilla/security/nss/tests/ssl/sslreq.dat tstclnt.exe: connecting to GORIDE.red.iplanet.com:8111 (address=192.18.72.38) kill -0 3784 >/dev/null 2>/dev/null selfserv with PID 3784 found at Mon Apr 20 18:45:39 PDT 2009 selfserv with PID 3784 started at Mon Apr 20 18:45:39 PDT 2009 strsclnt -q -p 8111 -d ../ext_client -B -s -w nss -c 100 -C :C009 -N -T \ GORIDE.red.iplanet.com strsclnt started at Mon Apr 20 18:45:39 PDT 2009 (pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed (pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed (pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed (pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed (pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed (pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed (pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed (pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed (pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed (pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed (pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed (pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed (pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed (pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed (pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed (pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed (PKIX_PL_Cert_VerifySignature: PKIX_PL_HashTable_Add skipped: entry existed (PKIX_PL_Cert_VerifySignature: PKIX_PL_HashTable_Add skipped: entry existed (PKIX_PL_Cert_VerifySignature: PKIX_PL_HashTable_Add skipped: entry existed (pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed (pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed (PKIX_PL_Cert_VerifySignature: PKIX_PL_HashTable_Add skipped: entry existed (PKIX_PL_Cert_VerifySignature: PKIX_PL_HashTable_Add skipped: entry existed (PKIX_PL_Cert_VerifySignature: PKIX_PL_HashTable_Add skipped: entry existed strsclnt: -- SSL: Server Certificate Validated. (PKIX_PL_Cert_VerifySignature: PKIX_PL_HashTable_Add skipped: entry existed (pkix_CacheCertChain_Add: PKIX_PL_HashTable_Add for CertChain skipped: entry existed strsclnt: -- SSL: Server Certificate Validated. (PKIX_PL_Cert_VerifySignature: PKIX_PL_HashTable_Add skipped: entry existed (pkix_CacheCertChain_Add: PKIX_PL_HashTable_Add for CertChain skipped: entry existed strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. (PKIX_PL_Cert_VerifySignature: PKIX_PL_HashTable_Add skipped: entry existed (pkix_CacheCertChain_Add: PKIX_PL_HashTable_Add for CertChain skipped: entry existed strsclnt: -- SSL: Server Certificate Validated. (PKIX_PL_Cert_VerifySignature: PKIX_PL_HashTable_Add skipped: entry existed (pkix_CacheCertChain_Add: PKIX_PL_HashTable_Add for CertChain skipped: entry existed strsclnt: -- SSL: Server Certificate Validated. (PKIX_PL_Cert_VerifySignature: PKIX_PL_HashTable_Add skipped: entry existed (pkix_CacheCertChain_Add: PKIX_PL_HashTable_Add for CertChain skipped: entry existed strsclnt: -- SSL: Server Certificate Validated. strsclnt: 0 cache hits; 8 cache misses, 0 cache not reusable 0 stateless resumes (PKIX_PL_Cert_VerifySignature: PKIX_PL_HashTable_Add skipped: entry existed (pkix_CacheCertChain_Add: PKIX_PL_HashTable_Add for CertChain skipped: entry existed strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. (pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed (pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed (pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed (pkix_CacheCertChain_Add: PKIX_PL_HashTable_Add for CertChain skipped: entry existed strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. (pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed (pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed (pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed (pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed (pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed (pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed strsclnt: -- SSL: Server Certificate Validated. (pkix_CacheCertChain_Add: PKIX_PL_HashTable_Add for CertChain skipped: entry existed strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. 0[b528a0]: Class Error leaked 1 objects of size 20 bytes, total = 20 bytes 0[b528a0]: Assertion failure: numLeakedObjects == 0, at C:/mozilla-build/msys/export/tinderlight/data/goride_32_DBG/mozilla/security/nss/lib/libpkix/pkix_pl_nss/system/pkix_pl_lifecycle.c:294 strsclnt completed at Mon Apr 20 18:45:45 PDT 2009 ssl.sh: #4395: Stress SSL3 ECDHE-ECDSA AES 128 CBC with SHA (no reuse) produced a returncode of 3, expected is 0. - FAILED
Comment 3•15 years ago
|
||
While Alexei is away, Julien and I need to take some of his bugs. I'm going to ask Julien to take the ones that are obviously related to the CRL cache.
Assignee: alexei.volkov.bugs → julien.pierre.boogz
Priority: -- → P1
Target Milestone: 3.12.1 → 3.12.4
Comment 4•15 years ago
|
||
This seems to be some PKIX shutdown problem. In nightly tests from yesterday (builds 20090420.1) I have seen the same bug on 2 machines. AIX (failure in selfserv): ... strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. strsclnt: 0 cache hits; 100 cache misses, 0 cache not reusable 0 stateless resumes strsclnt: NoReuse - 100 server certificates tested. strsclnt completed at Tue Apr 21 09:47:54 PDT 2009 ssl.sh: #4701: Stress TLS RC4 128 with MD5 (no reuse, client auth) produced a returncode of 0, expected is 0. - PASSED trying to kill selfserv with PID 422004 at Tue Apr 21 09:47:54 PDT 2009 kill -USR1 422004 Assertion failure: numLeakedObjects == 0, at /h/guybrush.red.iplanet.com/export/mccrel3/security/securitytip/builds/20090420.1/wozzeck_Solaris8/mozilla/security/nss/lib/libpkix/pkix_pl_nss/system/pkix_pl_lifecycle.c:294 selfserv: 0 cache hits; 100 cache misses, 0 cache not reusable 0 stateless resumes, 0 ticket parse failures all.sh[12]: 422004 IOT/Abort trap(coredump) selfserv with PID 422004 killed at Tue Apr 21 09:47:59 PDT 2009 ssl.sh: #4702: kill_selfserv core detection step - Core file is detected - FAILED Solaris (failure in strsclnt): strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. Assertion failure: numLeakedObjects == 0, at pkix_pl_lifecycle.c:294 strsclnt: 0 cache hits; 100 cache misses, 0 cache not reusable 0 stateless resumes strsclnt: NoReuse - 100 server certificates tested. all.sh: line 326: 619 Abort (core dumped) ${PROFTOOL} ${BINDIR}/strsclnt -q -p ${PORT} -d ${P_R_CLIENTDIR} ${CLIENT_OPTIONS} -w nss $cparam $verbose ${HOSTADDR} strsclnt completed at Mon Apr 20 23:56:20 PDT 2009 ssl.sh: #4705: Stress TLS ECDH-ECDSA AES 128 CBC with SHA (no reuse) produced a returncode of 134, expected is 0. - Core file is detected - FAILED Maybe this can tell something more: Current function is PR_Assert 550 abort(); (dbx) where [1] _basename(0x1, 0x6, 0x0, 0xd0ec0000, 0x8046c88, 0xd0e503fe), at 0xd0e4f597 [2] _nss_netdb_aliases(0x6, 0xd1079d3c, 0x0, 0x0, 0x0, 0x0), at 0xd0e67d8d [3] confstr(0xd1283114, 0xd1032cd9, 0x8046cec, 0xd12306cc, 0xd12727ac, 0xd12727c4), at 0xd0e503fe =>[4] PR_Assert(s = 0xd12727ac "numLeakedObjects == 0", file = 0xd12727c4 "pkix_pl_lifecycle.c", ln = 294), line 550 in "prlog.c" [5] PKIX_PL_Shutdown(plContext = 0x80b6848), line 294 in "pkix_pl_lifecycle.c" [6] PKIX_Shutdown(plContext = 0x80b6848), line 232 in "pkix_lifecycle.c" [7] NSS_Shutdown(), line 884 in "nssinit.c" [8] main(argc = 17, argv = 0x8046e14), line 1532 in "strsclnt.c"
Comment 5•15 years ago
|
||
tinderbox: build: trunk standard ciotat SunOS/sparc 64bit DBG strsclnt: -- SSL: Server Certificate Validated. (PKIX_PL_Cert_VerifySignature: PKIX_PL_HashTable_Add skipped: entry existed (pkix_CacheCertChain_Add: PKIX_PL_HashTable_Add for CertChain skipped: entry existed (pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed (pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed (pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed (pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed (pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed (pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed (pkix_CacheCertChain_Add: PKIX_PL_HashTable_Add for CertChain skipped: entry existed strsclnt: -- SSL: Server Certificate Validated. (PKIX_PL_Cert_VerifySignature: PKIX_PL_HashTable_Add skipped: entry existed (pkix_CacheCertChain_Add: PKIX_PL_HashTable_Add for CertChain skipped: entry existed strsclnt: 0 cache hits; 10 cache misses, 0 cache not reusable 0 stateless resumes strsclnt: NoReuse - 10 server certificates tested. strsclnt completed at Fri May 8 07:51:17 PDT 2009 ssl.sh: #5054: Stress TLS ECDH-ECDSA AES 128 CBC with SHA (no reuse, client auth) produced a returncode of 0, expected is 0. - PASSED trying to kill selfserv with PID 29922 at Fri May 8 07:51:18 PDT 2009 kill -USR1 29922 1[100130a70]: Class Error leaked 1 objects of size 32 bytes, total = 32 bytes 1[100130a70]: Assertion failure: numLeakedObjects == 0, at pkix_pl_lifecycle.c:294 Assertion failure: numLeakedObjects == 0, at pkix_pl_lifecycle.c:294 selfserv: 0 cache hits; 10 cache misses, 0 cache not reusable 0 stateless resumes, 0 ticket parse failures ./all.sh: line 17: 29922 Abort (core dumped) ${PROFTOOL} ${BINDIR}/selfserv -D -p ${PORT} -d ${P_R_SERVERDIR} -n ${HOSTADDR} ${SERVER_OPTIONS} ${ECC_OPTIONS} -w nss ${sparam} -i ${R_SERVERPID} $verbose When viewed this bug I thought the approved v1 patch may not have been check in, but it was checked in 1.57 <alexei.volkov.bugs@sun.com> 2009-04-20 14:19 489188 - object/memory leak in libpkix.r=nelson.
Comment 6•15 years ago
|
||
I have seen some of these leaks in strsclnt as well. I recall that Alexei had a method to trace libpkix object leaks. Is it documented somewhere ?
Comment 7•15 years ago
|
||
This happened on Mac too : http://tinderbox.mozilla.org/showlog.cgi?log=NSS/1242386373.1242395594.6390.gz&fulltext=1 strsclnt: -- SSL: Server Certificate Validated. -1609533664[401f60]: Class Error leaked 1 objects of size 20 bytes, total = 20 bytes -1609533664[401f60]: Assertion failure: numLeakedObjects == 0, at pkix_pl_lifecycle.c:294 Assertion failure: numLeakedObjects == 0, at pkix_pl_lifecycle.c:294 strsclnt: 0 cache hits; 100 cache misses, 0 cache not reusable 0 stateless resumes strsclnt: NoReuse - 100 server certificates tested. ./ssl.sh: line 421: 14849 Abort trap ${PROFTOOL} ${BINDIR}/strsclnt -q -p ${PORT} -d ${P_R_CLIENTDIR} ${CLIENT_OPTIONS} -w nss $cparam $verbose ${HOSTADDR} strsclnt completed at Fri May 15 05:40:55 PDT 2009 ssl.sh: #5142: Stress TLS ECDH-ECDSA AES 128 CBC with SHA (no reuse) produced a returncode of 134, expected is 0. - FAILED trying to kill selfserv with PID 14829 at Fri May 15 05:40:55 PDT 2009
Comment 8•15 years ago
|
||
I'm giving this bug back to Alexei.
Assignee: julien.pierre.boogz → alexei.volkov.bugs
Status: ASSIGNED → NEW
Keywords: mlk
Whiteboard: PKIX SUN_MUST_HAVE
Reporter | ||
Comment 9•15 years ago
|
||
> strsclnt: -- SSL: Server Certificate Validated.
> -1609533664[401f60]: Class Error leaked 1 objects of size 20 bytes, total = 20
> bytes
> -1609533664[401f60]: Assertion failure: numLeakedObjects == 0, at
> pkix_pl_lifecycle.c:294
This leak appears to be an incorrect handling/overwrite of the pointer to an error that was returned by a cache indicating an object, that was tried to be added to the cache, is already there. The problematic code is either in pkix_CacheCert_Add or in pkix_CacheCertChain_Add. Still working on trying to figure out the exact problematic code area.
However, I'm attaching the patch that will resolve the problem by handling the error within protected by locks area.
Attachment #387014 -
Flags: review?(nelson)
Comment 10•15 years ago
|
||
Comment on attachment 387014 [details] [diff] [review] Patch v2 - fix the leak by handling error in the protected area [backed out] r=nelson. We will want to back this out when the real cause is found, so please say that in the checkin comment.
Attachment #387014 -
Flags: review?(nelson) → review+
Comment 11•15 years ago
|
||
I'm running into this assertion failure when running all.sh on Windows Vista. The value of numLeakedObjects is 1.
Reporter | ||
Comment 12•15 years ago
|
||
Wan-Teh, did you make any changes or is it a build from the trunk? If you have made some changes to lib pkix, this might mean, that you have leaked a pointer ref count somewhere.
Comment 13•15 years ago
|
||
Alexei, it is a build from the current NSS trunk, without any changes. I built it on Windows Vista with Visual Studio 2008. I set OS_TARGET=WIN95 and NSS_ENABLE_ECC=1 during the build, and set OS_TARGET=WIN95 when I ran all.sh. (That is, I unset NSS_ENABLE_ECC before I ran all.sh.)
Reporter | ||
Comment 14•15 years ago
|
||
Thanks for the report. I'll check it out.
Comment 15•15 years ago
|
||
strsclnt had this assertion failure on an NSS Tinderbox last night: trunk standard attic Linux 32bit DBG on 2010/01/21 20:18:13 Here is the excerpt of the failed SSL stress test from the test log file.
Assignee | ||
Comment 17•12 years ago
|
||
We frequently get this problem on tinderbox, especially on machines with multiple cores. This bug has a patch, it has r+, but apparently it never got checked in. I've done so now, let's see if it helps. Checking in lib/libpkix/pkix_pl_nss/system/pkix_pl_hashtable.c; /cvsroot/mozilla/security/nss/lib/libpkix/pkix_pl_nss/system/pkix_pl_hashtable.c,v <-- pkix_pl_hashtable.c new revision: 1.14; previous revision: 1.13 done
Assignee | ||
Comment 18•12 years ago
|
||
Marking fixed for now. I will reopen if I see the problem again...
Target Milestone: 3.12.4 → 3.14
Assignee | ||
Comment 19•12 years ago
|
||
Patch DIDN'T help. We still get the assertion on Linux/ppc64. Leaving bug open. Keep the patch, or backout?
Assignee | ||
Updated•12 years ago
|
Blocks: pkix-default
Assignee | ||
Comment 20•12 years ago
|
||
Whenever we run into this assertion, is's a stress test that sets "no session reuse" in strsclnt. With "session reuse enabled", we don't get this bug.
Assignee | ||
Comment 21•12 years ago
|
||
... also, by inspecting all the recent occurrences of this bug on tinderbox, we always had bypass mode enabled in both client and server. This bug is mysterious. On my Linux machine with 4 cores, using the following test execution command, the test suite runs into this bug at least once in three runs: NSS_TESTS="ssl" NSS_SSL_RUN="stress" NSS_CYCLES="pkix" \ NSS_SSL_TESTS="bypass_normal" ./all.sh I tried to reproduce this crash, by running the test that crashed, only. I assumed, if I run this crashing test multiple times, I should be able to see the crash frequently. But this assumption is wrong. For example, the following series of commands (taken from the logfile) ran into the assertion: selfserv -D -p 8443 -d ../server -n localhost.localdomain -B -s -e localhost.localdomain-ec -w nss -r -r tstclnt -p 8443 -h localhost.localdomain -B -s -q -d ../client -v < /home/kaie/moz/nss/head/mozilla/security/nss/tests/ssl/sslreq.dat strsclnt -q -p 8443 -d ../client -B -s -w nss -c 100 -C c -T -N -n TestUser localhost.localdomain I copied the client and db directories to "c" and "s" and I used a mininum script to run these commands: selfserv -D -p 8443 -d ../s -n localhost.localdomain -B -s -e localhost.localdomain-ec -w nss -r -r & PID=$! tstclnt -p 8443 -h localhost.localdomain -B -s -q -d ../c -v < /home/kaie/moz/nss/head/mozilla/security/nss/tests/ssl/sslreq.dat kill -0 $PID >/dev/null 2>/dev/null strsclnt -q -p 8443 -d ../c -B -s -w nss -c 100 -C c -T -N -n TestUser localhost.localdomain kill -USR1 $PID selfserv -b -p 8443 2>/dev/null; I looped through above commands, but it never crashed. (For testing purposes I ran "killall -11 selfserv", and that immediately crashed and produced a core file.)
Assignee | ||
Comment 22•12 years ago
|
||
After spending most of today on this bug, and after some black alleys, I made progress. I edited sslstress.txt to run the problematic tests more often, reliably giving me at least one crash each test run, and I added printf statements to the error-creation and error-destroy functions. Luckily the problamtic error code was unique. When crashing, the process has allocated a NULLARGUMENT error object, but not freed it. After adding some assertions to all places where this error is thrown, I got a core file with the stack causing the issue. Function PKIX_PL_Cert_GetExtendedKeyUsage will always call PKIX_List_SetImmutable on list cert->extKeyUsages, even if that list pointer is NULL. This causes a NULLARGUMENT error object to be created. I have not checked why this error object doesn't get freed. However, we can avoid the creation of the error object, by avoiding this unnecessary function call if the list is NULL. In my local testing, this simple NULL check fixes the problem.
Assignee | ||
Comment 23•12 years ago
|
||
Assignee | ||
Updated•12 years ago
|
Attachment #624222 -
Flags: review?(wtc)
Assignee | ||
Comment 24•12 years ago
|
||
Comment on attachment 387014 [details] [diff] [review] Patch v2 - fix the leak by handling error in the protected area [backed out] Wan-Teh, are you OK to backout this patch, which I landed yesterday, but which didn't help?
Attachment #387014 -
Flags: feedback?(wtc)
Assignee | ||
Updated•12 years ago
|
Attachment #387014 -
Attachment description: fix the leak by handling error in the protected area. → Patch v2 - fix the leak by handling error in the protected area - propose to backout
Assignee | ||
Comment 25•12 years ago
|
||
Comment 26•12 years ago
|
||
Comment on attachment 387014 [details] [diff] [review] Patch v2 - fix the leak by handling error in the protected area [backed out] Yes, we should back out this patch. (I didn't review it.)
Assignee | ||
Comment 27•12 years ago
|
||
Assignee | ||
Comment 28•12 years ago
|
||
Comment on attachment 387014 [details] [diff] [review] Patch v2 - fix the leak by handling error in the protected area [backed out] backed out Checking in lib/libpkix/pkix_pl_nss/system/pkix_pl_hashtable.c; /cvsroot/mozilla/security/nss/lib/libpkix/pkix_pl_nss/system/pkix_pl_hashtable.c,v <-- pkix_pl_hashtable.c new revision: 1.15; previous revision: 1.14 done
Attachment #387014 -
Attachment description: Patch v2 - fix the leak by handling error in the protected area - propose to backout → Patch v2 - fix the leak by handling error in the protected area [backed out]
Attachment #387014 -
Flags: feedback?(wtc)
Assignee | ||
Updated•12 years ago
|
Whiteboard: PKIX SUN_MUST_HAVE → [fix frequent tinderbox orange]
Assignee | ||
Comment 29•12 years ago
|
||
This bug is now waiting for review of patch v3, which fixes the bug according to my testing.
Comment 30•12 years ago
|
||
Why does this leak occur only intermittently?
Assignee | ||
Comment 31•12 years ago
|
||
(In reply to Brian Smith (:bsmith) from comment #30) > Why does this leak occur only intermittently? I haven't researched that.
Comment 32•12 years ago
|
||
Kai, please review and test this patch. I studied the code related to cert->extKeyUsages and cert->extKeyUsagesAbsent. I found that this is where we construct cert->extKeyUsages, so we only need to call PKIX_List_SetImmutable after we just constructed cert->extKeyUsages. This is essentially the same as your patch v3, with the PKIX_List_SetImmutable call moved to a better place. Note that this fix merely suppresses the creation of the PKIX_NULLARGUMENT error object. We still need to track down why the error object is leaked. If we don't do that, we will still leak an error object if a different error object is created elsewhere.
Attachment #624222 -
Attachment is obsolete: true
Attachment #624956 -
Flags: review?(kaie)
Attachment #624222 -
Flags: review?(wtc)
Comment 33•12 years ago
|
||
Kai, please review and test this patch. To test it, you need to back out patch v4. By studying all the functions on the call stack that allocates the leaked NULLARGUMENT error, I believe this is where the error object is leaked: http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/security/nss/lib/libpkix/pkix/top/pkix_build.c&rev=1.64&mark=3078-3082,3096,3102-3106,3109#3078 We know the pkix_Build_ValidateEntireChain call returns a fatal NULLARGUMENT error. So the PKIX_CHECK_ONLY_FATAL macro causes us to goto cleanup. Under cleanup:, I believe that we enter the if statement and call pkix_CacheCertChain_Remove inside the PKIX_CHECK_FATAL. This causes us to lose the error object stored in the pkixErrorResult variable. I don't know if we should preserve the original error object. This patch merely fixes the error object leak. It does not preserve the original error object if the pkix_CacheCertChain_Remove call succeeds.
Attachment #624962 -
Flags: review?(kaie)
Comment 34•12 years ago
|
||
This is an alternative fix for the error object leak. This patch preserves the original error object if the pkix_CacheCertChain_Remove doesn't fail. Since I didn't find out whether we need to preserve the original error object, I think it is safer to maintain the current behavior of dropping the original error object and just fix the error object leak.
Comment 35•12 years ago
|
||
Comment on attachment 624966 [details] [diff] [review] Fix pkixErrorResult leak, approach 2 (preserves the original error object) There is only one caller of pkix_Build_CheckInCache, pkix_Build_InitiateBuildChain: ... PKIX_CHECK_ONLY_FATAL( pkix_Build_CheckInCache(state, &buildResult, &nbioContext, plContext), PKIX_UNABLETOBUILDCHAIN); ... So, whether we preserve the original error or not doesn't matter in the CURRENT state of the code. However, is better to preserve the error when the call to pkix_CacheCertChain_Remove doesn't fail, because AFAICT that is how functions in libpkix are "supposed" to work. So, I think this patch (approach 2) is better than approach 1, as far as making the code understandable. So, my understanding is that if we were to check in this patch (approach 2), we would start seeing intermittent NULLARGUMENT errors, and if we check in your other patch then those intermittent NULLARGUMENT errors would go away. If we were to check in approach 1, then the bug fixed in Patch v4 would be not result in a detectable failure, right? If so then I think the approach 1 patch is not just worse than approach 2 because of understandability, but really wrong.
Comment 36•12 years ago
|
||
Just to clarify: I recommend checking in two patches: - patch v4 - Fix pkixErrorResult leak, either approach 1 or approach 2 Brian: your understanding in comment 35 is correct. It is true that the current code and also the "approach 1" patch both suppress the NULLARGUMENT errors. I now agree that is wrong. Kai: please test "approach 2" instead of "approach 1". Thanks. It would also be nice to track down why the NULLARGUMENT errors are intermittent. But I understand you may not have time to do that.
Updated•12 years ago
|
Attachment #624966 -
Flags: review?(kaie)
Assignee | ||
Comment 37•12 years ago
|
||
Comment on attachment 624956 [details] [diff] [review] Patch v4 [checked in] Wan-Teh, I suspect you made an unintentional copy/paste mistake. Your patch doesn't simply move the code - you modify a different list.
Attachment #624956 -
Flags: review?(kaie) → review-
Assignee | ||
Comment 38•12 years ago
|
||
Comment on attachment 624956 [details] [diff] [review] Patch v4 [checked in] (In reply to Kai Engert (:kaie) from comment #37) > > Your patch doesn't simply move the code - you modify a different list. Sorry, my mistake. You do modify the correct list, which is named differently in this scope.
Attachment #624956 -
Flags: review- → review?(kaie)
Assignee | ||
Comment 39•12 years ago
|
||
Wan-Teh, neither your "approach 1" nor your "approach 2" fixes the issue. Here are the test results: For all testing described in this comment, the following applies: - patch from bug 761715 was reverted - attachment 624227 [details] [diff] [review] from this bug was applied (test enhancements) - test execution with: NSS_TESTS="ssl" NSS_SSL_RUN="stress" NSS_CYCLES="pkix" \ NSS_SSL_TESTS="bypass_normal" ./all.sh no patches test-run #1: fail test-run #2: ok test-run #3: fail only patch v4: test-run #1: ok test-run #2: ok test-run #3: ok only approach 1: test-run #1: ok test-run #2: fail ! test-run #3: fail ! only approach 2: test-run #1: fail ! test-run #2: ok test-run #3: fail
Assignee | ||
Comment 40•12 years ago
|
||
Sorry. I've performed the testing without sufficient focus. Of course, when using ONLY approach 1 or ONLY approach 2, we still fail, because my testing modifications cause us to assert/abort whenever the PKIX_NULLARGUMENT objects gets created. In order to test your approaches, I must remove those additional asserts...
Assignee | ||
Comment 41•12 years ago
|
||
Ignore comment 39.
Assignee | ||
Comment 42•12 years ago
|
||
Comment on attachment 624962 [details] [diff] [review] Fix pkixErrorResult leak, approach 1 Patch "approach 1" fixes the leak. No other failures in the remainder of the test suite.
Assignee | ||
Comment 43•12 years ago
|
||
Comment on attachment 624966 [details] [diff] [review] Fix pkixErrorResult leak, approach 2 (preserves the original error object) This patch "approach 2" introduces a new problem. Probably whenever we have the NULL argument error object, we get another error, like the following example: strsclnt: -- SSL: Server Certificate Invalid, err -8187. security library: invalid arguments. strsclnt: PR_Send returned error -8187: security library: invalid arguments. strsclnt: -- SSL: Server Certificate Validated. strsclnt: -- SSL: Server Certificate Validated. selfserv: HDX PR_Read returned error -12271: SSL peer cannot verify your certificate. strsclnt: 0 cache hits; 12 cache misses, 0 cache not reusable 0 stateless resumes strsclnt: NoReuse - 12 server certificates tested. strsclnt completed at Wed Jun 6 11:51:06 CEST 2012 ssl.sh: #595: Stress SSL3 RC4 128 with MD5 (no reuse, client auth) produced a returncode of 1, expected is 0. - FAILED
Attachment #624966 -
Flags: review?(kaie) → review-
Assignee | ||
Comment 44•12 years ago
|
||
When combining patch v4 and approach 2, the errors shown in comment 43 are gone. Does this mean, whenever we saw a NULLARGUMENT error, it was caused by a certificate validation failure?
Assignee | ||
Comment 45•12 years ago
|
||
When using the diagnostic printf statements for file pkix_error.c found in attachment 624227 [details] [diff] [review], we get the attached output. With the help of a patch from bug 762057 I translated those error numbers into meaning... The set of error codes seen is: code 702 NULLARGUMENT code 865 PRIMHASHTABLEADDFAILED code 19 ATTEMPTTOADDDUPLICATEKEY But errors 865 and 19 are commonly seen, in all runs of strsclnt, even in the succeeding runs. The diagnostic output has statements meaning "did't add, because entry was already there" - meaning those errors aren't fatal but are supposed to be ignored. Whenever we get certificate validation failure -8187, we can see that the NULLARGUMENT error code was produced immediately before.
Assignee | ||
Comment 46•12 years ago
|
||
Comment on attachment 624956 [details] [diff] [review] Patch v4 [checked in] r=kaie
Attachment #624956 -
Flags: review?(kaie) → review+
Assignee | ||
Comment 47•12 years ago
|
||
Comment on attachment 624956 [details] [diff] [review] Patch v4 [checked in] I think we agree that this change is good. I've checked it in to make the tinderbox orange stop. Checking in pkix_pl_cert.c; /cvsroot/mozilla/security/nss/lib/libpkix/pkix_pl_nss/pki/pkix_pl_cert.c,v <-- pkix_pl_cert.c new revision: 1.31; previous revision: 1.30 done If you decide to further investigate this bug, you must back it out for further testing.
Attachment #624956 -
Attachment description: Patch v4 → Patch v4 [checked in]
Assignee | ||
Comment 48•12 years ago
|
||
Comment on attachment 624956 [details] [diff] [review] Patch v4 [checked in] also checked in to NSS_3_13_4_BRANCH (for potential 3.13.6) Checking in pkix_pl_cert.c; /cvsroot/mozilla/security/nss/lib/libpkix/pkix_pl_nss/pki/pkix_pl_cert.c,v <-- pkix_pl_cert.c new revision: 1.28.2.2; previous revision: 1.28.2.1 done
Comment 49•12 years ago
|
||
Kai, thank you for checking in patch v4 and testing the "approach 1" and "approach 2" patches. The behaviors you reported in comment 42 and comment 43 are expected: - "approach 1" should fix the object leak and preserve the current behavior (suppress the NULLARGUMENT error). - "approach 2" should also fix the object leak, but it will expose the NULLARGUMENT error intentionally. Since it seems wrong to suppress an error, if combining patch v4 and approach 2 fixes the object leak, I suggest that we check in approach 2 instead of approach 1.
Assignee | ||
Comment 50•12 years ago
|
||
Comment on attachment 624966 [details] [diff] [review] Fix pkixErrorResult leak, approach 2 (preserves the original error object) Wan-Teh, thanks for the additional clarificaions. r=kaie
Attachment #624966 -
Flags: review- → review+
Assignee | ||
Updated•12 years ago
|
Attachment #624962 -
Flags: review?(kaie)
Assignee | ||
Comment 51•12 years ago
|
||
Comment on attachment 624966 [details] [diff] [review] Fix pkixErrorResult leak, approach 2 (preserves the original error object) trunk: Checking in pkix_build.c; /cvsroot/mozilla/security/nss/lib/libpkix/pkix/top/pkix_build.c,v <-- pkix_build.c new revision: 1.65; previous revision: 1.64 done 3_13_4 branch for 3.13.6: Checking in pkix_build.c; /cvsroot/mozilla/security/nss/lib/libpkix/pkix/top/pkix_build.c,v <-- pkix_build.c new revision: 1.63.2.1; previous revision: 1.63 done
Assignee | ||
Updated•12 years ago
|
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Updated•12 years ago
|
Assignee: alvolkov.bgs → kaie
Whiteboard: [fix frequent tinderbox orange] → [fix frequent tinderbox orange][3.13.6]
Updated•12 years ago
|
Whiteboard: [fix frequent tinderbox orange][3.13.6] → [fix frequent tinderbox orange]
Target Milestone: 3.14 → 3.13.6
Comment 52•12 years ago
|
||
Cleaning up/fixing whiteboards containing 'orange' but not '[orange]' in advance of bug 791758.
Whiteboard: [fix frequent tinderbox orange]
You need to log in
before you can comment on or make changes to this bug.
Description
•