Closed Bug 123081 Opened 24 years ago Closed 24 years ago

SSL client doing client auth in second handshake leaks memory with NSS 3.4

Categories

(NSS :: Libraries, defect, P1)

Sun
Solaris
defect

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: julien.pierre, Assigned: bugz)

References

Details

Attachments

(2 files, 3 obsolete files)

This is the NSS 3.4 counterpart to bug 123079 : Using the web server GAT httptest client in the following way causes a large memory leak : httptest -s -x url51 -H 1 -h strange:1890 -P -e 3600 -L 30 -p 20 This is a simple test where the client presents a client cert to the server, which accepts it. The server does not have SSL client required, just allowed, and the ACL of the server checks the client cert. The dbx checkleaks output for the client says : (dbx) showleaks Checking for memory leaks... Actual leaks report (actual leaks: 1061 total size: 457316 bytes) Total Num of Leaked Allocation call stack Size Blocks Block Address ====== ====== ========== ======================================= 423120 205 - PR_Malloc < PL_ArenaAllocate < PORT_ArenaAlloc < EncodeDBCertKey < ReadDBCertEntry < FindCertByKey < nsslowcert_FindCertByKey < nsslowcert_TraversePermCertsForSubject < pk11_searchCertsAndTrust < pk11_searchTokenList < NSC_FindObjectsInit < traverse_objects_by_template < nssToken_TraverseCertificatesBySubject < NSSTrustDomain_FindBestCertificateBySubject < NSSCertificate_BuildChain < CERT_CertChainFromCert 18040 205 - calloc < PR_Calloc < PR_NewLock < PORT_NewArena < CERT_CertChainFromCert < ssl3_HandleCertificateRequest < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord < ssl3_GatherCompleteHandshake < ssl3_GatherAppDataRecord < DoRecv < ssl_SecureRecv < ssl_Recv < PR_Recv < PRSocket::empty 9020 205 - calloc < PR_Calloc < PORT_ZAlloc < PORT_NewArena < CERT_CertChainFromCert < ssl3_HandleCertificateRequest < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord < ssl3_GatherCompleteHandshake < ssl3_GatherAppDataRecord < DoRecv < ssl_SecureRecv < ssl_Recv < PR_Recv < PRSocket::empty 3568 223 - calloc < PR_Calloc < nss_ZAlloc < NSSTime_SetPRTime < NSSTime_Now < nssBestCertificate_SetArgs < NSSTrustDomain_FindBestCertificateBySubject < NSSCertificate_BuildChain < CERT_CertChainFromCert < ssl3_HandleCertificateRequest < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord < ssl3_GatherCompleteHandshake < ssl3_GatherAppDataRecord < DoRecv 3568 223 - calloc < PR_Calloc < nss_ZAlloc < NSSTime_SetPRTime < NSSTime_Now < nssBestCertificate_SetArgs < NSSCertificate_BuildChain < CERT_CertChainFromCert < ssl3_HandleCertificateRequest < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord < ssl3_GatherCompleteHandshake < ssl3_GatherAppDataRecord < DoRecv < ssl_SecureRecv Possible leaks report (possible leaks: 90 total size: 113832 bytes) Total Num of Leaked Allocation call stack Size Blocks Block Address ====== ====== ========== ======================================= 49536 24 - PR_Malloc < PL_ArenaAllocate < PORT_ArenaAlloc < PORT_ArenaZAlloc < SEC_ASN1DecoderStart < SEC_ASN1Decode < SEC_ASN1DecodeItem < nsspkcs5_AlgidToParam < seckey_decrypt_private_key < seckey_decode_encrypted_private_key < seckey_get_private_key < nsslowkey_FindKeyByPublicKey < pk11_GetPrivateKey < pk11_FindPrivateKeyAttribute < pk11_FindTokenAttribute < pk11_FindAttribute 37152 18 - PR_Malloc < PL_ArenaAllocate < PORT_ArenaAlloc < EncodeDBCertKey < ReadDBCertEntry < FindCertByKey < nsslowcert_FindCertByKey < nsslowcert_TraversePermCertsForSubject < pk11_searchCertsAndTrust < pk11_searchTokenList < NSC_FindObjectsInit < traverse_objects_by_template < nssToken_TraverseCertificatesBySubject < NSSTrustDomain_FindBestCertificateBySubject < NSSCertificate_BuildChain < CERT_CertChainFromCert 18576 9 - PR_Malloc < PL_ArenaAllocate < PORT_ArenaAlloc < PORT_ArenaZAlloc < SEC_ASN1DecoderStart < SEC_ASN1Decode < SEC_ASN1DecodeItem < nsslowcert_DecodeDERCertificate < DecodeACert < FindCertByKey < nsslowcert_FindCertByKey < nsslowcert_TraversePermCertsForSubject < pk11_searchCertsAndTrust < pk11_searchTokenList < NSC_FindObjectsInit < traverse_objects_by_template 6192 3 - PR_Malloc < PL_ArenaAllocate < PORT_ArenaAlloc < PORT_ArenaZAlloc < SEC_ASN1DecoderStart < SEC_ASN1Decode < SEC_ASN1DecodeItem < nsslowcert_DecodeDERCertificate < DecodeACert < FindCertByKey < nsslowcert_FindCertByKey < nsslowcert_TraversePermCertsForSubject < nsslowcert_TraversePermCertsForNickname < pk11_searchCertsAndTrust < pk11_searchTokenList < NSC_FindObjectsInit 1584 18 - calloc < PR_Calloc < PR_NewLock < PORT_NewArena < CERT_CertChainFromCert < ssl3_HandleCertificateRequest < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord < ssl3_GatherCompleteHandshake < ssl3_GatherAppDataRecord < DoRecv < ssl_SecureRecv < ssl_Recv < PR_Recv < PRSocket::empty 792 18 - calloc < PR_Calloc < PORT_ZAlloc < PORT_NewArena < CERT_CertChainFromCert < ssl3_HandleCertificateRequest < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord < ssl3_GatherCompleteHandshake < ssl3_GatherAppDataRecord < DoRecv < ssl_SecureRecv < ssl_Recv < PR_Recv < PRSocket::empty
Priority: -- → P1
The stacks shown above that contain ssl3_HandleCertificateRequest all have the same cause. The fix has been crafted by Ian and myself. The leak goes back to day 1 in NSS, and is not a recent regression. The stacks for the other leaks may be too short to determine their cause. I'd expect that other leaks with the exact same block counts as the ssl3_HandleCertificateRequest leaks will also go away when the leak in ssl3_HandleCertificateRequest is fixed.
Blocks: 123094
Setting TFV to 3.4. I don't think we can release 3.4 with these leaks.
Target Milestone: --- → 3.4
This is the patch that Ian and I co-developed. I intend to check this in tonight.
Thanks, Nelson, I will be trying the patch shortly. I never ran my stress test program with client auth until today, which is why I never noticed the leak before even though it existed from day 1.
Nelson, I tried the patch, but it doesn't resolve the problem. Here is the output of dbx checkleaks on the client : (dbx) showleaks Checking for memory leaks... Actual leaks report (actual leaks: 4921 total size: 78736 bytes) Total Num of Leaked Allocation call stack Size Blocks Block Address ====== ====== ========== ======================================= 39376 2461 - calloc < PR_Calloc < nss_ZAlloc < NSSTime_SetPRTime < NSSTime_Now < nssBestCertificate_SetArgs < NSSCertificate_BuildChain < CERT_CertChainFromCert < ssl3_HandleCertificateRequest < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord < ssl3_GatherCompleteHandshake < ssl_GatherRecord1stHandshake < ssl_Do1stHandshake < ssl_SecureSend 39360 2460 - calloc < PR_Calloc < nss_ZAlloc < NSSTime_SetPRTime < NSSTime_Now < nssBestCertificate_SetArgs < NSSTrustDomain_FindBestCertificateBySubject < NSSCertificate_BuildChain < CERT_CertChainFromCert < ssl3_HandleCertificateRequest < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord < ssl3_GatherCompleteHandshake < ssl_GatherRecord1stHandshake < ssl_Do1stHandshake Possible leaks report (possible leaks: 304 total size: 621312 bytes) Total Num of Leaked Allocation call stack Size Blocks Block Address ====== ====== ========== ======================================= 598560 290 - PR_Malloc < PL_ArenaAllocate < PORT_ArenaAlloc < PORT_ArenaZAlloc < SEC_ASN1DecoderStart < SEC_ASN1Decode < SEC_ASN1DecodeItem < nsslowcert_DecodeDERCertificate < DecodeACert < FindCertByKey < nsslowcert_FindCertByKey < nsslowcert_TraversePermCertsForSubject < nsslowcert_TraversePermCertsForNickname < pk11_searchCertsAndTrust < pk11_searchTokenList < NSC_FindObjectsInit 14448 7 - PR_Malloc < PL_ArenaAllocate < PORT_ArenaAlloc < PORT_ArenaZAlloc < SEC_ASN1DecoderStart < SEC_ASN1Decode < SEC_ASN1DecodeItem < nsslowcert_DecodeDERCertificate < DecodeACert < FindCertByKey < nsslowcert_FindCertByKey < nsslowcert_TraversePermCertsForSubject < pk11_searchCertsAndTrust < pk11_searchTokenList < NSC_FindObjectsInit < traverse_objects_by_template 8256 4 - PR_Malloc < PL_ArenaAllocate < PORT_ArenaAlloc < ReadDBCertEntry < FindCertByKey < nsslowcert_FindCertByKey < nsslowcert_TraversePermCertsForSubject < pk11_searchCertsAndTrust < pk11_searchTokenList < NSC_FindObjectsInit < traverse_objects_by_template < nssToken_TraverseCertificatesBySubject < NSSTrustDomain_FindBestCertificateBySubject < NSSCertificate_BuildChain < CERT_CertChainFromCert < ssl3_HandleCertificateRequest 32 2 - calloc < PR_Calloc < nss_ZAlloc < NSSTime_SetPRTime < NSSTime_Now < nssBestCertificate_SetArgs < NSSTrustDomain_FindBestCertificateBySubject < NSSCertificate_BuildChain < CERT_CertChainFromCert < ssl3_HandleCertificateRequest < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord < ssl3_GatherCompleteHandshake < ssl_GatherRecord1stHandshake < ssl_Do1stHandshake 16 1 0x146e60 calloc < PR_Calloc < nss_ZAlloc < NSSTime_SetPRTime < NSSTime_Now < nssBestCertificate_SetArgs < NSSCertificate_BuildChain < CERT_CertChainFromCert < ssl3_HandleCertificateRequest < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord < ssl3_GatherCompleteHandshake < ssl_GatherRecord1stHandshake < ssl_Do1stHandshake < ssl_SecureSend
Further analysis of these stacks reveals that there are at least two (possibly 3) separate leaks here. I believe one of them was plugged by my patch, but there are others, including at least ONE that is NOT in SSL code. The original list of leaks includes blocks of 4 sizes: 2064 byte blocks (in an arena) 88 byte blocks (a PRLock allocated for an arenapool) 44 byte blocks (PLArenaPool structures) 16 byte blocks (allocated by NSSTime_SetPRTime, NOT IN ANY ARENA) The patch above was expected to eliminate the leaks of arenapools, arenas, and the locks associated with arenapools, when those arenapools are allocated for the chain of DER certs returned by CERT_CertChainFromCert. The leak of 16-byte blocks allocate by NSSTime_SetPRTime, is a separate leak. Since those blocks are not in the arenapool that contains the returned DER certs, it cannot be fixed by freeing that arenapool. This leak is down inside some new NSS code (not in SSL). Because this patch reportedly did NOT eliminate all the arenapool leaks, I suspect one or both of the following two explanations: 1. The code in (and under) CERT_CertChainFromCert may be allocating more than one arenapool, and leaking all but the one arenapool whose address is part of the struct it returns. 2. Perhaps we're seeing arenas that are on the arena free list. I suspect this because in the second run, we don't see any of the 44 or 88 byte leaks, which are the arenapool and the arenapool's lock. When an arenapool is freed, the arenas associated with it go to a free list, but the arenapool struct itself, and the lock associated with it are both really freed. So, the fact that, in the patched test, we see arenas but no arenapools suggests that the arenas are on the free list. So, I suggest that a) Ian (or someone) try to plug the leak of blocks allocated in NSSTime_SetPRTime, and b) rerun the test, also using the patch that disables the arena free list. Then if arenas still appear to be leaked, we know there's a separate leak of arenas from the one that Ian found in SSL last week.
Nelson, By browsing the checkins, I saw that Ian has plugged some more leaks in the cert code, in lib/certhigh. Here are the results with a combination of your patch and his : (dbx) showleaks Checking for memory leaks... Actual leaks report (actual leaks: 0 total size: 0 bytes) Possible leaks report (possible leaks: 197 total size: 406608 bytes) Total Num of Leaked Allocation call stack Size Blocks Block Address ====== ====== ========== ======================================= 348816 169 - PR_Malloc < PL_ArenaAllocate < PORT_ArenaAlloc < PORT_ArenaZAlloc < SEC_ASN1DecoderStart < SEC_ASN1Decode < SEC_ASN1DecodeItem < nsslowcert_DecodeDERCertificate < DecodeACert < FindCertByKey < nsslowcert_FindCertByKey < pk11_getCert < pk11_FindCertAttribute < pk11_FindTokenAttribute < pk11_FindAttribute < NSC_GetAttributeValue 57792 28 - PR_Malloc < PL_ArenaAllocate < PORT_ArenaAlloc < PORT_ArenaZAlloc < SEC_ASN1DecoderStart < SEC_ASN1Decode < SEC_ASN1DecodeItem < nsslowcert_DecodeDERCertificate < DecodeACert < FindCertByKey < nsslowcert_FindCertByKey < nsslowcert_FindCertByIssuerAndSN < pk11_searchCertsAndTrust < pk11_searchTokenList < NSC_FindObjectsInit < find_object_by_template (dbx)
Please ignore the last test output. I didn't run a client auth test, just a regular SSL test. The info I entered at the beginning of the defect was wrong - it is not test url51 but another test, aclS01d.
FYI, right now, that test is dumping core the following way on my 3.4 tree. No core dump on the 3.3 tree. I'm going to rebuild the whole 3.4 tree because there may be some inconsistency due to the patches I applied from Nelson and Ian. detected a multithreaded program t@6 (l@3) terminated by signal BUS (invalid address alignment) Current function is nssPKIObject_AddRef 89 PZ_Lock(object->lock); (dbx) where current thread: t@6 =>[1] nssPKIObject_AddRef(object = 0xdadadada), line 89 in "certdecode.c" [2] nssCertificate_AddRef(c = 0xdadadada), line 71 in "certificate.c" [3] CERT_DupCertificate(c = 0xe08e8), line 1169 in "certdb.c" [4] ssl3_SendCertificate(ss = 0xb8fd8), line 6347 in "ssl3con.c" [5] ssl3_HandleServerHelloDone(ss = 0xb8fd8), line 4702 in "ssl3con.c" [6] ssl3_HandleHandshakeMessage(ss = 0xb8fd8, b = 0xcffe0 "", length = 0), line 7186 in "ssl3con.c" [7] ssl3_HandleHandshake(ss = 0xb8fd8, origBuf = 0xaf994), line 7273 in "ssl3con.c" [8] ssl3_HandleRecord(ss = 0xb8fd8, cText = 0xfe9713bc, databuf = 0xaf994), line 7538 in "ssl3con.c" [9] ssl3_GatherCompleteHandshake(ss = 0xb8fd8, flags = 0), line 204 in "ssl3gthr.c" [10] ssl_GatherRecord1stHandshake(ss = 0xb8fd8), line 1300 in "sslcon.c" [11] ssl_Do1stHandshake(ss = 0xb8fd8), line 156 in "sslsecur.c" [12] ssl_SecureSend(ss = 0xb8fd8, buf = 0x9ebf0 "GET / HTTP/1.0\n\n", len = 16, flags = 0), line 1100 in "sslsecur.c" [13] ssl_Send(fd = 0x57c70, buf = 0x9ebf0, len = 16, flags = 0, timeout = 360000000U), line 1212 in "sslsock.c" [14] PR_Send(fd = 0x57c70, buf = 0x9ebf0, amount = 16, flags = 0, timeout = 360000000U), line 221 in "priometh.c" [15] SunRequest::send(this = 0xbbc10, sock = 0x57c70), line 308 in "request.cpp" [16] SunEngine::makeRequest(this = 0xfe971b64, request = CLASS, server = CLASS, ignoresenderror = 1), line 413 in "engine.cpp" [17] SunTest::run(this = 0xc4280), line 446 in "tests.cpp" [18] TestInstance::execute_test(this = 0xabec8), line 144 in "tests.cpp" [19] thread_entry(arg = (nil)), line 1676 in "tests.cpp" [20] _pt_root(arg = 0xcd5c0), line 214 in "ptthread.c"
For my own records, the actual command to run this test is httptest -s -x acl/aclS01d -g SSL -g COMMON -H 1 -h strange:2010 -P -e 3600 -L 30 -p 20
OK, now that I updated the tree and rebuilt it from scratch, it no longer dumps core. It turns out I had forgotten to update some headers that Ian changed. My 3.4 tree is currently the tip + Nelson's patch which I didn't check in as it did not resolve the problem fully in the SSL code. Here is the output of dbx checkleaks with this client auth test after a hundred runs or so : (dbx) showleaks Checking for memory leaks... Actual leaks report (actual leaks: 683 total size: 10928 bytes) Total Num of Leaked Allocation call stack Size Blocks Block Address ====== ====== ========== ======================================= 5472 342 - calloc < PR_Calloc < nss_ZAlloc < NSSTime_SetPRTime < NSSTime_Now < nssBestCertificate_SetArgs < NSSTrustDomain_FindBestCertificateBySubject < NSSCertificate_BuildChain < CERT_CertChainFromCert < ssl3_HandleCertificateRequest < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord < ssl3_GatherCompleteHandshake < ssl_GatherRecord1stHandshake < ssl_Do1stHandshake 5456 341 - calloc < PR_Calloc < nss_ZAlloc < NSSTime_SetPRTime < NSSTime_Now < nssBestCertificate_SetArgs < NSSCertificate_BuildChain < CERT_CertChainFromCert < ssl3_HandleCertificateRequest < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord < ssl3_GatherCompleteHandshake < ssl_GatherRecord1stHandshake < ssl_Do1stHandshake < ssl_SecureSend Possible leaks report (possible leaks: 117 total size: 239440 bytes) Total Num of Leaked Allocation call stack Size Blocks Block Address ====== ====== ========== ======================================= 229104 111 - PR_Malloc < PL_ArenaAllocate < PORT_ArenaAlloc < EncodeDBCertKey < ReadDBCertEntry < FindCertByKey < nsslowcert_FindCertByKey < nsslowcert_TraversePermCertsForSubject < pk11_searchCertsAndTrust < pk11_searchTokenList < NSC_FindObjectsInit < traverse_objects_by_template < nssToken_TraverseCertificatesBySubject < NSSTrustDomain_FindBestCertificateBySubject < NSSCertificate_BuildChain < CERT_CertChainFromCert 4128 2 - PR_Malloc < PL_ArenaAllocate < PORT_ArenaAlloc < PORT_ArenaZAlloc < SEC_ASN1DecoderStart < SEC_ASN1Decode < SEC_ASN1DecodeItem < nsslowcert_DecodeDERCertificate < DecodeACert < FindCertByKey < nsslowcert_FindCertByKey < pk11_getCert < pk11_FindCertAttribute < pk11_FindTokenAttribute < pk11_FindAttribute < NSC_GetAttributeValue 4128 2 - PR_Malloc < PL_ArenaAllocate < PORT_ArenaAlloc < PORT_ArenaZAlloc < SEC_ASN1DecoderStart < SEC_ASN1Decode < SEC_ASN1DecodeItem < nsspkcs5_AlgidToParam < seckey_decrypt_private_key < seckey_decode_encrypted_private_key < seckey_get_private_key < nsslowkey_FindKeyByPublicKey < pk11_GetPrivateKey < pk11_FindPrivateKeyAttribute < pk11_FindTokenAttribute < pk11_FindAttribute 2064 1 0x145258 PR_Malloc < PL_ArenaAllocate < PORT_ArenaAlloc < PORT_ArenaZAlloc < SEC_ASN1DecoderStart < SEC_ASN1Decode < SEC_ASN1DecodeItem < nsslowcert_DecodeDERCertificate < DecodeACert < FindCertByKey < nsslowcert_FindCertByKey < nsslowcert_TraversePermCertsForSubject < pk11_searchCertsAndTrust < pk11_searchTokenList < NSC_FindObjectsInit < traverse_objects_by_template 16 1 0x128038 calloc < PR_Calloc < nss_ZAlloc < NSSTime_SetPRTime < NSSTime_Now < nssBestCertificate_SetArgs < NSSCertificate_BuildChain < CERT_CertChainFromCert < ssl3_HandleCertificateRequest < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord < ssl3_GatherCompleteHandshake < ssl_GatherRecord1stHandshake < ssl_Do1stHandshake < ssl_SecureSend
I checked in a fix for the leak from NSSTime_Now.
I checked in the patch for the leak in ssl3con.c. Assuming that the 16-byte block leak is fixed, there appears to be one potential leak remaining. It's an arena of size SEC_ASN1_DEFAULT_ARENA_SIZE, allocated in SEC_ASN1DecoderStart(). There are several different calls stacks that show this same leak. They all share the following part of the stack in common: PR_Malloc < PL_ArenaAllocate < PORT_ArenaAlloc < PORT_ArenaZAlloc < SEC_ASN1DecoderStart < SEC_ASN1Decode < SEC_ASN1DecodeItem < (different callers of SEC_ASN1DecodeItem) Looking at SEC_ASN1Decode, I don't see any path that fails to free the arenapool allocated in SEC_ASN1DecoderStart, which makes me wonder if these arenas are really on the free list instead. I'll bet we get different results with the arena free list disabled.
Ian & Nelson, When you make a patch related to this defect or others, please include the full list of files with pathnames, so that I can apply them quickly. Pasting the cvs commit output would be great. It will also make it easier for me to track the patches, and to backport the fix to 3.3 if it needs to be.
Here is the output with the current tree : (dbx) showleaks Checking for memory leaks... Actual leaks report (actual leaks: 0 total size: 0 bytes) Possible leaks report (possible leaks: 891 total size: 1839024 bytes) Total Num of Leaked Allocation call stack Size Blocks Block Address ====== ====== ========== ======================================= 1791552 868 - PR_Malloc < PL_ArenaAllocate < PORT_ArenaAlloc < EncodeDBCertKey < ReadDBCertEntry < FindCertByKey < nsslowcert_FindCertByKey < nsslowcert_TraversePermCertsForSubject < pk11_searchCertsAndTrust < pk11_searchTokenList < NSC_FindObjectsInit < traverse_objects_by_template < nssToken_TraverseCertificatesBySubject < NSSTrustDomain_FindBestCertificateBySubject < NSSCertificate_BuildChain < CERT_CertChainFromCert 20640 10 - PR_Malloc < PL_ArenaAllocate < PORT_ArenaAlloc < PORT_ArenaZAlloc < SEC_ASN1DecoderStart < SEC_ASN1Decode < SEC_ASN1DecodeItem < nsspkcs5_AlgidToParam < seckey_decrypt_private_key < seckey_decode_encrypted_private_key < seckey_get_private_key < nsslowkey_FindKeyByPublicKey < pk11_GetPrivateKey < pk11_FindPrivateKeyAttribute < pk11_FindTokenAttribute < pk11_FindAttribute 14448 7 - PR_Malloc < PL_ArenaAllocate < PORT_ArenaAlloc < PORT_ArenaZAlloc < SEC_ASN1DecoderStart < SEC_ASN1Decode < SEC_ASN1DecodeItem < nsslowcert_DecodeDERCertificate < DecodeACert < FindCertByKey < nsslowcert_FindCertByKey < nsslowcert_TraversePermCertsForSubject < pk11_searchCertsAndTrust < pk11_searchTokenList < NSC_FindObjectsInit < traverse_objects_by_template 12384 6 - PR_Malloc < PL_ArenaAllocate < PORT_ArenaAlloc < PORT_ArenaZAlloc < SEC_ASN1DecoderStart < SEC_ASN1Decode < SEC_ASN1DecodeItem < nsslowcert_DecodeDERCertificate < DecodeACert < FindCertByKey < nsslowcert_FindCertByKey < pk11_getCert < pk11_FindCertAttribute < pk11_FindTokenAttribute < pk11_FindAttribute < NSC_GetAttributeValue (dbx) (and yes, those "possible leaks" are real, according to the top output)
Do these latest results have the arena free list disabled, perhaps with the patch in bug http://bugzilla.mozilla.org/show_bug.cgi?id=122907 ?? I suspect the free list was enabled for these results. Whenever the arena free list is enabled, we always see the stack of the function that FIRST allocated the block (before it went onto the free list from which it was subsequently taken by another function). The leaker is the function that LAST function that allocated it (i.e. from the free list), and we won't see that while the free list remains enabled.
Nelson, You are right, I didn't disable the free list. I applied the patch and set the environment variable. The result is : (dbx) showleaks Checking for memory leaks... Actual leaks report (actual leaks: 0 total size: 0 bytes) Possible leaks report (possible leaks: 924 total size: 1907136 bytes) Total Num of Leaked Allocation call stack Size Blocks Block Address ====== ====== ========== ======================================= 1399392 678 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < nssItem_Create < nssItem_Duplicate < add_subject_entry < add_cert_to_cache < nssTrustDomain_AddCertsToCache < NSSTrustDomain_FindCertificateByEncodedCertificate < __CERT_NewTempCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord < ssl3_GatherCompleteHandshake 309600 150 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < nssList_Create < nssList_Clone < nssList_CreateIterator < nssTrust_GetCERTCertTrustForCert < fill_CERTCertificateFields < STAN_GetCERTCertificate < __CERT_NewTempCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord < ssl3_GatherCompleteHandshake 66048 32 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < nsslist_add_element < nssList_AddUnique < add_subject_entry < nssCertificateStore_Add < NSSCryptoContext_ImportCertificate < __CERT_NewTempCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord < ssl3_GatherCompleteHandshake < ssl_GatherRecord1stHandshake 66048 32 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < nss_arena_hash_alloc_entry < PL_HashTableRawAdd < PL_HashTableAdd < nssHash_Add < add_certificate_entry < nssCertificateStore_Add < NSSCryptoContext_ImportCertificate < __CERT_NewTempCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord 66048 32 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < add_certificate_entry < nssCertificateStore_Add < NSSCryptoContext_ImportCertificate < __CERT_NewTempCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord < ssl3_GatherCompleteHandshake < ssl_GatherRecord1stHandshake < ssl_Do1stHandshake < ssl_SecureSend (dbx) As you can see there is still a leak, and some arena functions are still getting called.
Thanks! Now we're getting somewhere. There are 5 different stacks here, which all have in common this portion: PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < (5 different callers of nss_ZAlloc) It's possible this is a bug in nss_ZAlloc, but I'd guess there are simply 5 different leaks here.
There may be more than 5 leaks. From reading the documentation of dbx check -leaks, there are settings to display the number of frames and "match" the leaks by number of frames. Both are limited to 16. I have set them both to 16. The problem is that if there are leak is in frame #17 or greater, but the top of the stack is the same, then the leaks will be grouped under the same line of dbx check -leaks output ... We may have no choice but to try Purify or some other tool. Does anyone have a license of Purify for Solaris, and know if it works with NSS ? I remember the assembly code causing problems with it.
Or it could be a leak of the CERT (somehere deep down we keep a reference around). The single cert could be holding down pointers or references to all the data structures shown in the leak. bob
All but one of the blocks of memory that was allocated above and declared "leaked" by dbx was used within either the trust domain certificate cache or the crypto context cert store. The blocks were allocated for the keys used in the hashtables. All of the blocks were allocated within arenas (naturally, that has been revealed already). In both cases (the cache and temp store), the arenas will persist for the lifetime of NSS. They will be freed at shutdown. The stacktrace that does seem to show a leak is the one related to getting the trust of a temp cert. I'm looking into that. I've been running tstclnt under purify, and find that the internal crypto slot has seven additional references at shutdown time. I'm still trying to track those down, in order to get a clean run of client auth.
Ian, As I discussed via e-mail, I was using the "showleaks" command to see the blocks leaked between any two dates within the program. Every time you use showleaks, it resets the debugger's memory counter. I did not take the measurement from the start of the program, but rather after several requests had already been completed, so that the one-time overhead should have been eliminated. In NSS 3.3 I can run this way for as long as I want, with the client doing loops indefinitely, without memory ever growing. I can break in with CTRL C, do a stop at the same line I always do showleaks, continue, then showleaks, and it always shows 0 blocks and 0 bytes leaked. In NSS 3.3.4 that isn't the case. It continues to leak, even if I break in after a long run, and use showleaks to reset the counter. There are always some more blocks leaked, even if the number of blocks is sometimes less than the number of sessions/requests. It just seems that some of the leaks are not in a common codepath.
Here are the results with the current tree, arenas disabled, and a 1000 client auth runs : Checking for memory leaks... Actual leaks report (actual leaks: 1 total size: 32 bytes) Total Num of Leaked Allocation call stack Size Blocks Block Address ====== ====== ========== ======================================= 32 1 0x1b1540 PR_Malloc < PR_NewCondVar < PR_Sleep < clockthread_main < _pt_root < _thread_start Possible leaks report (possible leaks: 431 total size: 889584 bytes) Total Num of Leaked Allocation call stack Size Blocks Block Address ====== ====== ========== ======================================= 59856 29 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < nsslist_add_element < nssList_AddUnique < add_subject_entry < add_cert_to_cache < nssTrustDomain_AddCertsToCache < NSSTrustDomain_FindCertificateByEncodedCertificate < __CERT_NewTempCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord < ssl3_GatherCompleteHandshake 57792 28 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < nssItem_Create < nssItem_Duplicate < add_subject_entry < add_cert_to_cache < nssTrustDomain_AddCertsToCache < NSSTrustDomain_FindCertificateByEncodedCertificate < __CERT_NewTempCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord < ssl3_GatherCompleteHandshake 57792 28 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < nssItem_Create < nssItem_Duplicate < add_subject_entry < add_cert_to_cache < nssTrustDomain_AddCertsToCache < NSSTrustDomain_FindCertificateByEncodedCertificate < __CERT_NewTempCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord < ssl3_GatherCompleteHandshake 57792 28 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < nss_arena_hash_alloc_entry < PL_HashTableRawAdd < PL_HashTableAdd < nssHash_Add < add_nickname_entry < add_cert_to_cache < nssTrustDomain_AddCertsToCache < NSSTrustDomain_FindCertificateByEncodedCertificate < __CERT_NewTempCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake 57792 28 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < nss_arena_hash_alloc_entry < PL_HashTableRawAdd < PL_HashTableAdd < nssHash_Add < add_subjec eMessage < ssl3_HandleHandshake 57792 28 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < nss_arena_hash_alloc_entry < PL_HashTableRawAdd < PL_HashTableAdd < nssHash_Add < add_subject_entry < add_cert_to_cache < nssTrustDomain_AddCertsToCache < NSSTrustDomain_FindCertificateByEncodedCertificate < __CERT_NewTempCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake 57792 28 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < nss_arena_hash_alloc_entry < PL_HashTableRawAdd < PL_HashTableAdd < nssHash_Add < add_issuer_and_serial_entry < add_cert_to_cache < nssTrustDomain_AddCertsToCache < NSSTrustDomain_FindCertificateByEncodedCertificate < __CERT_NewTempCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake 47472 23 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < nss_arena_hash_alloc_entry < PL_HashTableRawAdd < PL_HashTableAdd < nssHash_Add < add_certificate_entry < nssCertificateStore_Add < NSSCryptoContext_ImportCertificate < __CERT_NewTempCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord 30960 15 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < new_cache_entry < add_subject_entry < add_cert_to_cache < nssTrustDomain_AddCertsToCache < NSSTrustDomain_FindCertificateByEncodedCertificate < __CERT_NewTempCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord < ssl3_GatherCompleteHandshake < ssl3_GatherAppDataRecord 28896 14 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < nssList_Create < add_subject_entry < add_cert_to_cache < nssTrustDomain_AddCertsToCache < NSSTrustDomain_FindCertificateByEncodedCertificate < __CERT_NewTempCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord < ssl3_GatherCompleteHandshake < ssl_GatherRecord1stHandshake 28896 14 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < new_cache_entry < add_nickname_entry < add_cert_to_cache < nssTrustDomain_AddCertsToCache < NSSTrustDomain_FindCertificateByEncodedCertificate < __CERT_NewTempCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord < ssl3_GatherCompleteHandshake < ssl3_GatherAppDataRecord 28896 14 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < new_cache_entry < add_nickname_entry < add_cert_to_cache < nssTrustDomain_AddCertsToCache < NSSTrustDomain_FindCertificateByEncodedCertificate < __CERT_NewTempCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord < ssl3_GatherCompleteHandshake < ssl_GatherRecord1stHandshake 28896 14 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < new_cache_entry < add_issuer_and_serial_entry < add_cert_to_cache < nssTrustDomain_AddCertsToCache < NSSTrustDomain_FindCertificateByEncodedCertificate < __CERT_NewTempCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord < ssl3_GatherCompleteHandshake < ssl3_GatherAppDataRecord 28896 14 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < nssUTF8_Duplicate < add_nickname_entry < add_cert_to_cache < nssTrustDomain_AddCertsToCache < NSSTrustDomain_FindCertificateByEncodedCertificate < __CERT_NewTempCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord < ssl3_GatherCompleteHandshake < ssl_GatherRecord1stHandshake 28896 14 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < nssUTF8_Duplicate < add_nickname_entry < add_cert_to_cache < nssTrustDomain_AddCertsToCache < NSSTrustDomain_FindCertificateByEncodedCertificate < __CERT_NewTempCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord < ssl3_GatherCompleteHandshake < ssl3_GatherAppDataRecord 28896 14 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < nssList_Create < add_subject_entry < add_cert_to_cache < nssTrustDomain_AddCertsToCache < NSSTrustDomain_FindCertificateByEncodedCertificate < __CERT_NewTempCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord < ssl3_GatherCompleteHandshake < ssl3_GatherAppDataRecord 28896 14 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < new_cache_entry < add_issuer_and_serial_entry < add_cert_to_cache < nssTrustDomain_AddCertsToCache < NSSTrustDomain_FindCertificateByEncodedCertificate < __CERT_NewTempCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord < ssl3_GatherCompleteHandshake < ssl_GatherRecord1stHandshake 28896 14 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < new_cache_entry < add_subject_entry < add_cert_to_cache < nssTrustDomain_AddCertsToCache < NSSTrustDomain_FindCertificateByEncodedCertificate < __CERT_NewTempCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord < ssl3_GatherCompleteHandshake < ssl_GatherRecord1stHandshake 28896 14 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < nssList_CreateIterator < collect_subject_certs < nssTrustDomain_GetCertsForSubjectFromCache < NSSTrustDomain_FindBestCertificateBySubject < NSSCertificate_BuildChain < CERT_CertChainFromCert < ssl3_HandleCertificateRequest < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord < ssl3_GatherCompleteHandshake < ssl3_GatherAppDataRecord 28896 14 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < nssList_CreateIterator < collect_subject_certs < nssTrustDomain_GetCertsForNicknameFromCache < PK11_FindCertFromNickname < ownGetClientAuthData < ssl3_HandleCertificateRequest < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord < ssl3_GatherCompleteHandshake < ssl3_GatherAppDataRecord < DoRecv 24768 12 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < nssList_CreateIterator < nssTrust_GetCERTCertTrustForCert < fill_CERTCertificateFields < STAN_GetCERTCertificate < __CERT_NewTempCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord < ssl3_GatherCompleteHandshake < ssl3_GatherAppDataRecord < DoRecv 24768 12 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < nsslist_add_element < nssList_AddUnique < add_subject_entry < nssCertificateStore_Add < NSSCryptoContext_ImportCertificate < __CERT_NewTempCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord < ssl3_GatherCompleteHandshake < ssl_GatherRecord1stHandshake 24768 12 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < add_certificate_entry < nssCertificateStore_Add < NSSCryptoContext_ImportCertificate < __CERT_NewTempCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord < ssl3_GatherCompleteHandshake < ssl3_GatherAppDataRecord < DoRecv < ssl_SecureRecv 24768 12 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < add_certificate_entry < nssCertificateStore_Add < NSSCryptoContext_ImportCertificate < __CERT_NewTempCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord < ssl3_GatherCompleteHandshake < ssl_GatherRecord1stHandshake < ssl_Do1stHandshake < ssl_SecureSend 22704 11 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < nssList_CreateIterator < nssTrust_GetCERTCertTrustForCert < fill_CERTCertificateFields < STAN_GetCERTCertificate < __CERT_NewTempCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord < ssl3_GatherCompleteHandshake < ssl_GatherRecord1stHandshake < ssl_Do1stHandshake 22704 11 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < nsslist_add_element < nssList_AddUnique < add_subject_entry < nssCertificateStore_Add < NSSCryptoContext_ImportCertificate < __CERT_NewTempCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord < ssl3_GatherCompleteHandshake < ssl3_GatherAppDataRecord (dbx)
I see the problem now. While all of the memory in question is being allocated within the arena (there is more than just initialization cost involved with the cache -- it is allocating more blocks for keys as the program runs), the arena is growing to meet the needs of the cache. This memory isn't leaked in the sense that is has not been lost track of, but it is causing wanton expansion of the cache's arena. I'm going to implement a freelist for the cache, and try some other enhancements to bring this under control.
Ian, this leak is related to the leak in bug 120651. When tracing through the code, I found that we are allocating new Hash entries out of the arena, then discarding them when we remove elements out of the cache. We should just allocate the entries and free them when we remove them. This complicates shutdown (we need to walk the Hashtable and free each entry explicitly). This is also way Purify doesn't see the leak (the memory get's freed up when we blow the arena away). bob
Blocks: 120651
That would be a lot of allocating and freeing, no? I think a freelist should bring this under control. The cache entries are all the same size, and once a critical mass have been allocated the freelist should be able to handle any new entries.
If we are worried about this, we can allocate and place the entries on a free list, but they should not be part of the arena (doing this abuses the original purposes of arenas). The cost of allocating the entry is less than the cost of allocating the Cert to begin with.... Actually an idea: Allocate the entry out of the Certificate's arena! The entry is only created to store the certificate and removed when the certificate is going away! bob
Assigned the bug to Ian.
Assignee: wtc → ian.mcgreer
Bob, That is only true for one of the four hashes in the cache. As an example, imagine you have three certs, all with the same email address, and two with the same subject. If they are all cached at the same time, the cache would look like so: issuer-and-serial: [ der1, der2, der3 ] subject: [ subject1 -> { der1, der2 }, subject2 -> { der3 } ] nickname: [ nickname1 -> subjectList1, nickname2 -> subjectList2 ] email: [ email1 -> { subjectList1, subjectList2 } ] It is true that the issuer-and-serial entries could be safely allocated within the corresponding cert's arena, but that is not true of the others. If the cert der1 is the first to be added to the cache, and all of the entries are allocated within its arena, then the cache will become corrupted when that cert is removed but der2 and der3 remain. But you are right about correcting the way arenas are used here. I believe the correct behavior should be: 1) issuer-and-serial entries are allocated within the cert's arena. 2) subject and nickname entries create their own arena, the same arena used by the subjectList. This arena is destroyed when the subjectList has no more entries (that is, there are no more entries for that subject in the cache). 3) email entries create their own arena, and also use that arena for the list of subjectLists. The arena is freed when there are no more entries for the email address. Using this scheme, the following sequence of events is one possibility for the above example: 1> der1 is removed from the cache. Thus subjectList1 only has one cert. Nothing else changes. 2> der2 is removed from the cache. This means subjectList1 should be destroyed, along with the cache entries for subject1 and nickname1 (all can be freed by destroying one arena). email1's list now only has one entry. 3> der3 is removed from the cache. subjectList2 is destroyed, along with nickname2 (again, all in one arena). The email1 entry is also destroyed, and the cache is empty.
No longer blocks: 120651
block on bug 120651 got lost somehow
Blocks: 120651
As described, this separates the cache entries into three arenas.
Attached patch do the same in the temp store (obsolete) — Splinter Review
separate arena usage in the temp store entries as well
Attachment #67959 - Attachment is obsolete: true
Comments on the patch: all nonfunctional comments: if we were starting from scratch, it would be cleaner, to have the new_cache_entry take a NULL pointer when it wants to own the arena. Unfortunately this would require restructuring some of the code that allocates additional information from that arena. We need a comment that we assume all the certs have been removed from the cache before we call nssTrustDomain_DestroyCache() [or we will hash table entries]. The reuse of the single arena variable in add_cert_to_cache() makes it difficult to verify that there is indeed no leaks on errors (which there isn't). bob
I'm reluctant to see any more free lists introduced into NSS. Now that we have the "zone allocator", it should act as the one and only free list for all NSS allocations. I'll even go so far as to suggest that we eliminate the use of the PL arena free list when the zone allocator is in use.
Julien- Some of the stacks from your last showleaks test are different than the ones addressed by the two patches above. I found the place that causes this, again it's the list arena taking on too much responsibility. Checkin log: /cvsroot/mozilla/security/nss/lib/base/list.c,v <-- list.c new revision: 1.14; previous revision: 1.13 Nelson- Agreed. I retracted my proposal after Bob nominated something better.
The zone allocator cannot replace the freelists in NSS. Many products are sloppy about matching PR_Malloc() with PR_Free() and malloc() with free(). These products must be modified before they can turn on the zone allocator. Although the zone allocator attempts to detect and free memory blocks allocated by malloc() correctly, it cannot protect against an application passing a memory block allocated by PR_Malloc() to free(). Our Directory Server dumps core on Solaris with the zone allocator enabled. We can avoid this problem by moving the zone allocator back to NSS. The downside is that NSPR functions such as PR_NewLock() and PR_NewCondVar() won't be able to benefit from it. Maybe it makes more sense to have NSPR use freelists for PRLock's and PRCondVar's.
Attachment #67966 - Attachment is obsolete: true
Attachment #67968 - Attachment is obsolete: true
Changed summary to show this bug is about SSL clients.
Summary: SSL client auth application leaks memory with NSS 3.4 → SSL client doing client auth leaks memory with NSS 3.4
After applying patch67989, here are my results with this test in the client, full handshakes - arena free lists disabled , client auth , 1000 runs: (dbx) sl Checking for memory leaks... Actual leaks report (actual leaks: 0 total size: 0 bytes) Possible leaks report (possible leaks: 141 total size: 291024 bytes) Total Num of Leaked Allocation call stack Size Blocks Block Address ====== ====== ========== ======================================= 49536 24 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < nss_arena_hash_alloc_entry < PL_HashTableRawAdd < PL_HashTableAdd < nssHash_Add < add_subject_entry < add_cert_to_cache < nssTrustDomain_AddCertsToCache < NSSTrustDomain_FindCertificateByEncodedCertificate < __CERT_NewTempCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake 49536 24 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < nss_arena_hash_alloc_entry < PL_HashTableRawAdd < PL_HashTableAdd < nssHash_Add < add_issuer_and_serial_entry < add_cert_to_cache < nssTrustDomain_AddCertsToCache < NSSTrustDomain_FindCertificateByEncodedCertificate < __CERT_NewTempCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake 47472 23 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < nss_arena_hash_alloc_entry < PL_HashTableRawAdd < PL_HashTableAdd < nssHash_Add < add_nickname_entry < add_cert_to_cache < nssTrustDomain_AddCertsToCache < NSSTrustDomain_FindCertificateByEncodedCertificate < __CERT_NewTempCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake 47472 23 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < nss_arena_hash_alloc_entry < PL_HashTableRawAdd < PL_HashTableAdd < nssHash_Add < add_certificate_entry < nssCertificateStore_Add < NSSCryptoContext_ImportCertificate < __CERT_NewTempCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord 24768 12 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < nsslist_add_element < nssList_AddUnique < add_subject_entry < nssCertificateStore_Add < NSSCryptoContext_ImportCertificate < __CERT_NewTempCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord < ssl3_GatherCompleteHandshake < ssl_GatherRecord1stHandshake 24768 12 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < nss_arena_hash_alloc_entry < PL_HashTableRawAdd < PL_HashTableAdd < nssHash_Add < remove_subject_entry < nssCertificateStore_Remove < CERT_DestroyCertificate < ssl_DestroySID < ssl_FreeLockedSID < ssl_FreeSID < ssl_DestroySecurityInfo < ssl_FreeSocket 24768 12 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < nss_arena_hash_alloc_entry < PL_HashTableRawAdd < PL_HashTableAdd < nssHash_Add < remove_subject_entry < nssCertificateStore_Remove < CERT_DestroyCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord < ssl3_GatherCompleteHandshake 22704 11 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < nsslist_add_element < nssList_AddUnique < add_subject_entry < nssCertificateStore_Add < NSSCryptoContext_ImportCertificate < __CERT_NewTempCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord < ssl3_GatherCompleteHandshake < ssl3_GatherAppDataRecord (dbx) There are now 8 distinct leaks reported, compared to 27 in the last report.
The first four all go through nss_arena_hash_alloc_entry. This is the callback function supplied to NSPR's hashtable implementation for allocating hashtable entries. It currently allocates the entries in an arena that is common to the entire hashtable (this is the nssHash implementation). Should this be changed also, to not use arena's? I didn't write this code, so I don't know the reasons behind it. I'm still thinking about the second four.
Julien, is it possible for you to call the new nss_DumpCertificateCacheInfo function just before your breakpoint? That would reveal if there are any leaked cert references, which would confuse the results.
Do you mean that I should call it at the end of my run, or inside the loop between two SSL sessions ?
Once at the end should be enough. After the socket has been closed and before NSS_Shutdown(). I actually doubt that any certs are being leaked. I've tested client auth myself to make sure. And if that were the case, I would expect the situation to be much worse. Actually, I now remember that when I purified tstclnt I had to add a call to SSL_ClearSessionCache() to make sure any certs associated with the session id cache were freed. I notice some of the traces showing ssl_FreeSID, perhaps there are references to session id cache certs in memory, and that is what dbx is finding?
My client never calls NSS_Shutdown :) I added the call right after the last requast is made. This is the output (strange)/export/home/jpierre/60/SunOS_5.8_depend/ns/server/internal/B1/SunOS5.6_DBG.OBJ/tests/httptests{128} ./cltest info: Running 1 test . info: Concurrency : 1 . info: Repeat rate : 1000 . info: Total client threads : 1 info: Performance mode enabled. info: Running acl/aclS01a # get clientauth protected file with correct cert thread 1 , repeats = 1000 Certificates in the cache: [ 2] "E=alpha@testcentral.com, CN=Franzl Alpha, UID=alpha, OU=People, O=TestCentral, C=US" Certificates in the temporary store: pass: acl/aclS01a # get clientauth protected file with correct cert thread 1 passed - 1000 out of 1000 passes info: 1000 operations and 0 failure in 168.41 seconds - 5.94 ops/second info: 100 % of all test operations were successful --------------------------------- Tests passed : 1 / 1 Tests failed : 0 / 1 Ops passed : 1000 / 1000 Ops failed : 0 / 1000 --------------------------------- 89.31u 1.37s 2:49.46 53.5% (strange)/export/home/jpierre/60/SunOS_5.8_depend/ns/server/internal/B1/SunOS5.6_DBG.OBJ/tests/httptests{129}
FYI, this is what we now get on this test after 1000 runs : (dbx) sl Checking for memory leaks... Actual leaks report (actual leaks: 0 total size: 0 bytes) Possible leaks report (possible leaks: 68 total size: 140352 bytes) Total Num of Leaked Allocation call stack Size Blocks Block Address ====== ====== ========== ======================================= 24768 12 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < nss_arena_hash_alloc_entry < PL_HashTableRawAdd < PL_HashTableAdd < nssHash_Add < remove_subject_entry < nssCertificateStore_Remove < CERT_DestroyCertificate < ssl_DestroySID < ssl_FreeLockedSID < ssl_FreeSID < ssl_DestroySecurityInfo < ssl_FreeSocket 24768 12 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < nss_arena_hash_alloc_entry < PL_HashTableRawAdd < PL_HashTableAdd < nssHash_Add < add_subject_entry < add_cert_to_cache < nssTrustDomain_AddCertsToCache < NSSTrustDomain_FindCertificateByEncodedCertificate < __CERT_NewTempCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake 22704 11 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < nss_arena_hash_alloc_entry < PL_HashTableRawAdd < PL_HashTableAdd < nssHash_Add < add_certificate_entry < nssCertificateStore_Add < NSSCryptoContext_ImportCertificate < __CERT_NewTempCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord 22704 11 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < nsslist_add_element < nssList_AddUnique < add_subject_entry < nssCertificateStore_Add < NSSCryptoContext_ImportCertificate < __CERT_NewTempCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord < ssl3_GatherCompleteHandshake < ssl_GatherRecord1stHandshake 22704 11 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < nss_arena_hash_alloc_entry < PL_HashTableRawAdd < PL_HashTableAdd < nssHash_Add < add_issuer_and_serial_entry < add_cert_to_cache < nssTrustDomain_AddCertsToCache < NSSTrustDomain_FindCertificateByEncodedCertificate < __CERT_NewTempCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake 22704 11 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < nss_arena_hash_alloc_entry < PL_HashTableRawAdd < PL_HashTableAdd < nssHash_Add < add_nickname_entry < add_cert_to_cache < nssTrustDomain_AddCertsToCache < NSSTrustDomain_FindCertificateByEncodedCertificate < __CERT_NewTempCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake (dbx)
Okay, all but 1 of the leaks reported last are all related to the way entries for the PLHashTable are allocated. This is done if the file lib/base/hashops.c. As I mentioned, the allocations are done within a single arena, causing the same growth we saw before. If we don't want to see this behavior, we will either have to allocate the entries outside of an arena, or maintain freelists. The answer before was the former, should we do that again? The other one really confuses me. Julien- Would you have expected there to be a cert in the cache at shutdown? Or is that likely to be a leaked reference?
Ian, That cert that is in the cache is the cert that the client presented to the server during each of the 1000 requests. I suppose it should be in the cache once, so that makes sense. Unless the deletion of the socket should cause it to be removed from the cache, and since that deletion happens at every iteration in the loop, then it would be a bug for the cert to still be in the cache at the end. Looking at the code in my client, the only thing that deals with certs is the client auth callback, which does a PK11_FindCertByNickname in the client auth callback. The code looks like this : static SECStatus ownGetClientAuthData(void *arg, PRFileDesc *socket, CERTDistNames *caNames, CERTCertificate **pRetCert,/*return */ SECKEYPrivateKey **pRetKey) { CERTCertificate * cert; SECKEYPrivateKey * privKey; void * proto_win = NULL; SECStatus rv = SECFailure; char * localNickName = (char *)arg; proto_win = SSL_RevealPinArg(socket); if (localNickName) { cert = PK11_FindCertFromNickname(localNickName, proto_win); if (cert) { privKey = PK11_FindKeyByAnyCert(cert, proto_win); if (privKey) { rv = SECSuccess; } else { CERT_DestroyCertificate(cert); }; } if (rv == SECSuccess) { *pRetCert = cert; *pRetKey = privKey; }; //free(localNickName); return rv; }; char* chosenNickName = certName ? (char *)strdup(certName) : NULL; if (chosenNickName) { cert = PK11_FindCertFromNickname(chosenNickName, proto_win); if (cert) { privKey = PK11_FindKeyByAnyCert(cert, proto_win); if (privKey) { rv = SECSuccess; } else { CERT_DestroyCertificate(cert); }; } } else { /* no nickname given, automatically find the right cert */ CERTCertNicknames * names; int i; names = CERT_GetCertNicknames( CERT_GetDefaultCertDB(), SEC_CERT_NICKNAMES_USER, proto_win); if (names != NULL) { for( i=0; i < names->numnicknames; i++ ) { cert = PK11_FindCertFromNickname(names->nicknames[i], proto_win); if (!cert) { continue; }; /* Only check unexpired certs */ if (CERT_CheckCertValidTimes(cert, PR_Now(), PR_FALSE) != secCertTimeValid) { CERT_DestroyCertificate(cert); continue; }; rv = NSS_CmpCertChainWCANames(cert, caNames); if (rv == SECSuccess) { privKey = PK11_FindKeyByAnyCert(cert, proto_win); if (privKey) { // got the key break; }; // cert database password was probably wrong rv = SECFailure; break; }; }; /* for loop */ CERT_FreeNicknames(names); }; // names }; // no nickname chosen if (rv == SECSuccess) { *pRetCert = cert; *pRetKey = privKey; }; if (chosenNickName) free(chosenNickName); return rv; };
Technically, this bug was originally opened where the web server would request the client cert after the initial SSL handshake due to an ACL, causing a second SSL handshake to occur. When taking my measurements in this defect, I have sometimes had my server configured for client auth globally. In that case, there is no second handshake occurring in that case, only one handshake, where the client cert is already presented by the client, and available when the server gets to the ACL to check it. In particular, the last measure I took at 17:08 today was with a single SSL handshake (SSL client auth required on server). What's interesting to note is that the stacks for that case are very similar to the no-clientauth case in bug 122907. The number of bytes is very close, and all the stacks match. Therefore, I am going to assume that the leak with client auth and a single SSL handshake and the leak with SSL without client auth are the same leaks. This bug will be used strictly to track the case of client auth with double handshake and I'm changing the title to clear things up.
Summary: SSL client doing client auth leaks memory with NSS 3.4 → SSL client doing client auth in second handshake leaks memory with NSS 3.4
Depends on: 122907
Sorry to have to correct the test procedure so many times, but : The testcase for this is to setup the web server for GAT with SSL tests and leave the SSLPARAMS as default - with client auth disabled on the SSL socket - and have client auth required by ACL, to force a second SSL handshake to occur, then running test acl/aclS01a .
This is the dbx check leaks output after 1000 double handshakes (SSL in first, client auth in 2nd) : (dbx) sl Checking for memory leaks... Actual leaks report (actual leaks: 0 total size: 0 bytes) Possible leaks report (possible leaks: 141 total size: 291024 bytes) Total Num of Leaked Allocation call stack Size Blocks Block Address ====== ====== ========== ======================================= 49536 24 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < nss_arena_hash_alloc_entry < PL_HashTableRawAdd < PL_HashTableAdd < nssHash_Add < add_subject_entry < add_cert_to_cache < nssTrustDomain_AddCertsToCache < NSSTrustDomain_FindCertificateByEncodedCertificate < __CERT_NewTempCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake 49536 24 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < nss_arena_hash_alloc_entry < PL_HashTableRawAdd < PL_HashTableAdd < nssHash_Add < add_issuer_and_serial_entry < add_cert_to_cache < nssTrustDomain_AddCertsToCache < NSSTrustDomain_FindCertificateByEncodedCertificate < __CERT_NewTempCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake 47472 23 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < nss_arena_hash_alloc_entry < PL_HashTableRawAdd < PL_HashTableAdd < nssHash_Add < add_certificate_entry < nssCertificateStore_Add < NSSCryptoContext_ImportCertificate < __CERT_NewTempCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord 47472 23 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < nss_arena_hash_alloc_entry < PL_HashTableRawAdd < PL_HashTableAdd < nssHash_Add < add_nickname_entry < add_cert_to_cache < nssTrustDomain_AddCertsToCache < NSSTrustDomain_FindCertificateByEncodedCertificate < __CERT_NewTempCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake 24768 12 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < nsslist_add_element < nssList_AddUnique < add_subject_entry < nssCertificateStore_Add < NSSCryptoContext_ImportCertificate < __CERT_NewTempCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord < ssl3_GatherCompleteHandshake < ssl_GatherRecord1stHandshake 24768 12 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < nss_arena_hash_alloc_entry < PL_HashTableRawAdd < PL_HashTableAdd < nssHash_Add < remove_subject_entry < nssCertificateStore_Remove < CERT_DestroyCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord < ssl3_GatherCompleteHandshake 24768 12 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < nss_arena_hash_alloc_entry < PL_HashTableRawAdd < PL_HashTableAdd < nssHash_Add < remove_subject_entry < nssCertificateStore_Remove < CERT_DestroyCertificate < ssl_DestroySID < ssl_FreeLockedSID < ssl_FreeSID < ssl_DestroySecurityInfo < ssl_FreeSocket 22704 11 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < nsslist_add_element < nssList_AddUnique < add_subject_entry < nssCertificateStore_Add < NSSCryptoContext_ImportCertificate < __CERT_NewTempCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord < ssl3_GatherCompleteHandshake < ssl3_GatherAppDataRecord
Using the current tip of NSS 3.4 as of 4pm, and deleting the root cert module, I get the following leak after 1000 runs of the client doing client auth in a double handshake : (dbx) sl Checking for memory leaks... Actual leaks report (actual leaks: 0 total size: 0 bytes) Possible leaks report (possible leaks: 24 total size: 47496 bytes) Total Num of Leaked Allocation call stack Size Blocks Block Address ====== ====== ========== ======================================= 24768 12 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < nsslist_add_element < nssList_AddUnique < add_subject_entry < nssCertificateStore_Add < NSSCryptoContext_ImportCertificate < __CERT_NewTempCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord < ssl3_GatherCompleteHandshake < ssl_GatherRecord1stHandshake 22704 11 - PR_Malloc < PL_ArenaAllocate < nss_zalloc_arena_locked < nss_ZAlloc < nsslist_add_element < nssList_AddUnique < add_subject_entry < nssCertificateStore_Add < NSSCryptoContext_ImportCertificate < __CERT_NewTempCertificate < ssl3_HandleCertificate < ssl3_HandleHandshakeMessage < ssl3_HandleHandshake < ssl3_HandleRecord < ssl3_GatherCompleteHandshake < ssl3_GatherAppDataRecord 24 1 0xe9428 calloc < PR_Calloc < nss_ZAlloc < nss_arena_hash_alloc_entry < PL_HashTableRawAdd < PL_HashTableAdd < nssHash_Add < remove_subject_entry < nssCertificateStore_Remove < CERT_DestroyCertificate < ssl_DestroySID < ssl_FreeLockedSID < ssl_FreeSID < ssl_DestroySecurityInfo < ssl_FreeSocket < ssl_DefClose If you compare this with what's posted in 122907 for no client auth, you'll see that the last 2 leaks are exactly the same, in terms of number of blocks and size. The double handshake causes the additional first leak above to show up.
I reran this double handshake client auth test, this time with the built-in module installed. The leak was still exactly the same. Therefore I conclude the built-ins are not the cause of this leak.
My stress test over the week-end shows that this leak is gone. Marking fixed.
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Marking verified.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: