Closed Bug 308275 Opened 19 years ago Closed 17 years ago

Leaks related to nssCKFWInstance_CreateMutex

Categories

(NSS :: Libraries, defect, P2)

3.11
defect

Tracking

(Not tracked)

RESOLVED FIXED
3.11.8

People

(Reporter: nelson, Assigned: rrelyea)

References

Details

(Keywords: memory-leak)

Attachments

(3 files, 1 obsolete file)

I ran selfserv from NSS trunk under purify tonight , and found two nasty leaks
which have a number of things in common in their stacks.  Here's the info and
their stack traces.  I do not know if this is a regression since 3.10 or not.
Bob, please take a look.

[W] MLK: leak of 12408 bytes from 94 blocks allocated in PR_Calloc [NSPR4.DLL]
    Distribution of leaked blocks
     12408 bytes from 94 blocks of 132 bytes (first block: 0x0343e090)
    Allocation location
     [C:\WINDOWS\system32\MSVCRT.dll ip=0x57C2F221]
    PR_Calloc      [pr/src/malloc/prmem.c:484]
    PR_NewLock     [pr/src/threads/combined/prulock.c:194]
    nssCKFWMutex_Create [lib/ckfw/mutex.c:146]
    nssCKFWInstance_CreateMutex [lib/ckfw/instance.c:514]
    nssCKFWObject_Create [lib/ckfw/object.c:205]
    nssCKFWFindObjects_Next [lib/ckfw/find.c:375]
    NSSCKFWC_FindObjects [lib/ckfw/wrap.c:2577]
    builtinsC_FindObjects [dist/public/nss\nssck.api:742]
    nssToken_TraverseCertificates [lib/dev/devtoken.c:1658]
    NSSTrustDomain_TraverseCertificates [lib/pki/trustdomain.c:1072]
    PK11_TraverseSlotCerts [lib/pk11wrap/pk11cert.c:490]
    CERT_GetSSLCACerts [lib/certhigh/certhigh.c:631]
    SSL_ConfigSecureServer [lib/ssl/sslsecur.c:720]
    server_main    [cmd/selfserv/selfserv.c:1428]
    main           [cmd/selfserv/selfserv.c:1972]
    mainCRTStartup [.\build\intel\dll_obj\crtexe.obj]


[W] MLK: leak of 12144 bytes from 92 blocks allocated in PR_Calloc [NSPR4.DLL]
    Distribution of leaked blocks
	 12144 bytes from 92 blocks of 132 bytes (first block: 0x01fd26d8)
    Allocation location
	 [C:\WINDOWS\system32\MSVCRT.dll ip=0x57C2F221]
    PR_Calloc      [pr/src/malloc/prmem.c:484]
    PR_NewLock     [pr/src/threads/combined/prulock.c:194]
    nssCKFWMutex_Create [lib/ckfw/mutex.c:146]
    nssCKFWInstance_CreateMutex [lib/ckfw/instance.c:514]
    nssCKFWObject_Create [lib/ckfw/object.c:205]
    nssCKFWFindObjects_Next [lib/ckfw/find.c:375]
    NSSCKFWC_FindObjects [lib/ckfw/wrap.c:2577]
    builtinsC_FindObjects [dist/public/nss\nssck.api:742]
    find_objects   [lib/dev/devtoken.c:419]
    find_objects_by_template [lib/dev/devtoken.c:539]
    nssToken_FindTrustForCertificate [lib/dev/devtoken.c:1288]
    nssTrustDomain_FindTrustForCertificate [lib/pki/trustdomain.c:1217]
    nssTrust_GetCERTCertTrustForCert [lib/pki/pki3hack.c:594]
    fill_CERTCertificateFields [lib/pki/pki3hack.c:760]
    stan_GetCERTCertificate [lib/pki/pki3hack.c:813]
    STAN_GetCERTCertificate [lib/pki/pki3hack.c:837]
    convert_cert   [lib/pk11wrap/pk11cert.c:92]
    nssPKIObjectCollection_Traverse [lib/pki/pkibase.c:897]
    NSSTrustDomain_TraverseCertificates [lib/pki/trustdomain.c:1080]
    PK11_TraverseSlotCerts [lib/pk11wrap/pk11cert.c:490]
    CERT_GetSSLCACerts [lib/certhigh/certhigh.c:631]
    SSL_ConfigSecureServer [lib/ssl/sslsecur.c:720]
    server_main    [cmd/selfserv/selfserv.c:1428]
    main           [cmd/selfserv/selfserv.c:1972]
    mainCRTStartup [.\build\intel\dll_obj\crtexe.obj]
BTW, this leak behavior is the same in both bypass and non-bypass modes. 
It is apparently not a function of the new bypass code.
Priority: -- → P2
Target Milestone: --- → 3.11
QA Contact: jason.m.reid → libraries
Target Milestone: 3.11 → 3.11.7
After adding root certs module to test suite this leak occurs also there (on Solaris platform).

DBX log:
Memory Leak (mel):
Found 109 leaked blocks with total size 9592 bytes
At time of each allocation, the call stack was:
	[1] calloc() at 0xb4d30860 
	[2] PR_Calloc() at line 475 in "prmem.c"
	[3] PR_NewLock() at line 174 in "ptsynch.c"
	[4] 0xa991fac9
	[5] 0xa991e7d2
	[6] 0xa9920034
	[7] 0xa991d93d
	[8] 0xa9915876
	[9] 0xa9907cb6
	[10] find_objects() at line 419 in "devtoken.c"
	[11] find_objects_by_template() at line 539 in "devtoken.c"
	[12] nssToken_FindTrustForCertificate() at line 1288 in "devtoken.c"
	[13] nssTrustDomain_FindTrustForCertificate() at line 1222 in "trustdomain.c"
	[14] nssTrust_GetCERTCertTrustForCert() at line 607 in "pki3hack.c"
	[15] fill_CERTCertificateFields() at line 793 in "pki3hack.c"
	[16] stan_GetCERTCertificate() at line 850 in "pki3hack.c"
OS: Windows XP → All
Hardware: PC → All
There is also another leak (but probably the same bug). This leak is already reported in bug 169313. Please check for duplicates.

DBX log:
Memory Leak (mel):
Found leaked block of size 88 bytes at address 0x8170b38
At time of allocation, the call stack was:
        [1] calloc() at 0xb4d30860 
        [2] PR_Calloc() at line 475 in "prmem.c"
        [3] PR_NewLock() at line 174 in "ptsynch.c"
        [4] 0xa991fac9
        [5] 0xa991e7d2
        [6] 0xa9920034
        [7] 0xa991d93d
        [8] 0xa9915876
        [9] 0xa9907cb6
        [10] pk11_FindObjectByTemplate() at line 1377 in "pk11obj.c"
        [11] pk11_isRootSlot() at line 1462 in "pk11slot.c"
        [12] PK11_InitSlot() at line 1529 in "pk11slot.c"
        [13] SECMOD_LoadPKCS11Module() at line 377 in "pk11load.c"
        [14] SECMOD_LoadModule() at line 323 in "pk11pars.c"
        [15] SECMOD_LoadModule() at line 338 in "pk11pars.c"
        [16] nss_Init() at line 486 in "nssinit.c"


Purify stack from Windows (from bug 169313):

               Distribution of leaked blocks
                Allocation location
                     [MSVCRT.DLL ip=0x58003925]
                    PR_Calloc      [prmem.c:484]
                    PR_NewLock     [prulock.c:194]
                    secmodCreateMutext [pk11load.c:61]
                    nssCKFWMutex_Create [mutex.c:200]
                    nssCKFWInstance_CreateMutex [instance.c:494]
                    nssCKFWObject_Create [object.c:202]
                    nssCKFWFindObjects_Next [find.c:372]
                    NSSCKFWC_FindObjects [wrap.c:2533]
                    builtinsC_FindObjects [nssck.api:739]
                    pk11_FindObjectByTemplate [pk11cert.c:183]
                    pk11_isRootSlot [pk11slot.c:1866]
                    PK11_InitSlot  [pk11slot.c:1933]
                    SECMOD_LoadPKCS11Module [pk11load.c:275]
                    SECMOD_LoadModule [pk11pars.c:306]
                    SECMOD_LoadModule [pk11pars.c:319]
                    nss_Init       [nssinit.c:456]
                    NSS_Initialize [nssinit.c:525]
                    main           [crlutil.c:430]
                    mainCRTStartup [crtexe.obj]
Leaks update - trunk:

Checking in ignored;
/cvsroot/mozilla/security/nss/tests/memleak/ignored,v  <--  ignored
new revision: 1.6; previous revision: 1.5
done
Slavo,

The stack in comment 2 is way too shallow.  
It may or may not be a duplicate of the second stack in comment 0.
We cannot tell because the stack is truncated.  Need more stack depth.
Also, from the stacks in comment 2 and comment 3, it appears that the 
nssckbi.so library was built without symbols.  

Perhaps an optimized lib was used instead of a debug build?  
Could the process have loaded an nssckbi from a system directory rather
than from the directory of test binaries?   That would explain it.  
(it would also invalidate these results.)  Or do we have a makefile problem 
that causes nssckbi to be optimized even in debug builds of NSS?  
Attached patch untested patch (obsolete) — Splinter Review
The assertion that I put into new function serverCAListShutdown is bogus,
and I plan to remove that line before checkin (assuming this passes my
tests and gets a positive review when I ask).
Target Milestone: 3.11.7 → 3.11.8
Nelson,

I believe your patch goes to another bug? (I think you pulled a bob;).


The new leaks slavo attached are clearly in the builtins C_FindObjects function somewhere. It's very likely they are the same ones listed in this bug.
Status: NEW → ASSIGNED
Bob, My patch contained two parts, a fix for bug 366553 and a fix (?) for
this bug.  Both bugs relate to CERT_GetSSLCACerts.

The stacks in comment 0 above both have CERT_GetSSLCACerts in common.
CERT_GetSSLCACerts calls PK11_TraverseSlotCerts with the callback 
function CollectDistNames.  This is the ONLY use of PK11_TraverseSlotCerts 
from within NSS shared libs.  (The only other user is signtool.)

I am not yet entirely certain, but it appears to me that the callback 
function passed to PK11_TraverseSlotCerts is supposed to destroy the cert 
reference that it is given by its caller, but CollectDistNames does not 
destroy that reference.  (Can you confirm or refute that hypothesis?)

It appears to me that, during the execution of PK11_TraverseSlotCerts and 
its many calls back to CollectDistNames, every cert reference passed 
to CollectDistNames is leaked.  If that analysis is correct, then the cause 
of the leaks in comment 0 is not in ckfw at all, but much higher up in the 
stack, in functions called by CERT_GetSSLCACerts.  

I am not certain about this analysis, however, because it seems to me that 
if we had cert reference leaks in CERT_GetSSLCACerts, then we would see 
strict shutdown failures in selfserv, and perhaps in signtool.  OTOH, maybe 
selfserv never shuts down cleanly, but is always killed, in which case we'd 
never see a strict shutdown failure.

My patch is intended to fix the (unproven) leak in CollectDistNames, as well 
as the (proven) leak (in libSSL) of the memory allocated and returned by 
CERT_GetSSLCACerts (which is bug 366553).  

The stack in comment 2 is probably a duplicate of the second stack
in comment 0, but we cannot be sure because it is too shallow, and 
because of the absence of symbols for libckfw on Solaris.

The leak stacks in comment 3 are not the same stacks as any other stacks
in this bug, and I suspect they belong in a separate bug.  They are leaks
that occur during nss_init.  But you could be right that these leaks 
really are in ckfw, and that the cause of the stacks in comment 3 is also
the cause of the stacks in comment 0.  

If & when I become sure of my analysis above, then I will take this bug, 
and will change the subject of this bug to show that it is about leaks 
that occur during CERT_GetSSLCACerts, consistent with comment 0, and will 
ask Slavo to file a separate bug about the stack in comment 3.

Of course, if you fix this leak in the meantime ... :-)
Summary: Leaks related to builtinsC_FindObjects → Leaks related to CERT_GetSSLCACerts
Comment on attachment 268339 [details] [diff] [review]
untested patch

No, is appears that CollectDistNames is NOT leaking cert references. 
So, bye bye to this patch.
Attachment #268339 - Attachment is obsolete: true
    OK, Now I see what you think the patch is releavant....

    So the certlist references will leak things at the NSS layer, not the PKCS #11
    layer.

    What FindObjects returns is handles to objects. There is not a "I'm through
    with this handle" call for PKCS #11. Handles persist for 1) as long as the
    process runs, or 2) until the referenced object no longer exists. For builtins
    2 is never relavant, so the only way these leaks exist outside ckfw itself is
    C_Finalize is not being called (which means NSS_shutdown is not called or it
    fails).

    More likely, the locks are getting freed up on C_Finalize, or worse, the Locks
    are more ephemeral locks which are leaked to every call to C_FindObjects.

    bob

BTW I think the reason CERT_GetSSLCACerts is always on the stack is because it's the first caller of C_FindObject in the builtins. This artues that the locks in question are of the startup variety.
OK, Bob, I'm glad I didn't take it. :)
Summary: Leaks related to CERT_GetSSLCACerts → Leaks related to nssCKFWInstance_CreateMutex
Nelson,

Re: comment 6, unfortunately, the dbx check leaks tool has a hard limit of 16 frames per stack. We need to file an RFE against the Studio product. I think I may have done that in the past but I'm not sure.
Looks like I never filed that RFE. I just tried the new Studio 12 dbx, and it still has the 16 frames limitation. Time to file it for studio 13 ...
RE: Missing symbols...

I now know why the symbols are missing. I saw the same thing in the shared db code. What is happening is the code is properly closing the shared library, so by the time the leak program goes and displays the results the shared library is gone. This tells us we are definately shutting down the PKCS #11 module correctly.

For diagnostic purposes we could get the symbols back by commenting out the PR_UnloadLibrary call in pk11wrap. (it will also add some leaks related to the shared library, But it will give the symbols of where these leaks are comming from).
In reply to comment 18,  
Bob, your explanation of the lost symbols is that unloading the library 
causes its symbols to go away.  That seems logical and reasonable.
But, why do we continue to see the symbols on Windows?
We unload the library on that platform, too.
Stacks with full symbolic names (see bug 384639):

Memory Leak (mel):
Found 111 leaked blocks with total size 9768 bytes
At time of each allocation, the call stack was:
    [1] calloc() at 0xb4d308a0
    [2] PR_Calloc() at line 475 in "prmem.c"
    [3] PR_NewLock() at line 174 in "ptsynch.c"
    [4] nssCKFWMutex_Create() at line 146 in "mutex.c"
    [5] nssCKFWInstance_CreateMutex() at line 514 in "instance.c"
    [6] nssCKFWObject_Create() at line 205 in "object.c"
    [7] nssCKFWFindObjects_Next() at line 385 in "find.c"
    [8] NSSCKFWC_FindObjects() at line 2622 in "wrap.c"
    [9] builtinsC_FindObjects() at line 742 in "nssck.api"
    [10] find_objects() at line 419 in "devtoken.c"
    [11] find_objects_by_template() at line 539 in "devtoken.c"
    [12] nssToken_FindTrustForCertificate() at line 1288 in "devtoken.c"
    [13] nssTrustDomain_FindTrustForCertificate() at line 1222 in "trustdomain.c"
    [14] nssTrust_GetCERTCertTrustForCert() at line 607 in "pki3hack.c"
    [15] fill_CERTCertificateFields() at line 793 in "pki3hack.c"
    [16] stan_GetCERTCertificate() at line 850 in "pki3hack.c"

Memory Leak (mel):
Found 110 leaked blocks with total size 9680 bytes
At time of each allocation, the call stack was:
    [1] calloc() at 0xb4d308a0
    [2] PR_Calloc() at line 475 in "prmem.c"
    [3] PR_NewLock() at line 174 in "ptsynch.c"
    [4] nssCKFWMutex_Create() at line 146 in "mutex.c"
    [5] nssCKFWInstance_CreateMutex() at line 514 in "instance.c"
    [6] nssCKFWObject_Create() at line 205 in "object.c"
    [7] nssCKFWFindObjects_Next() at line 385 in "find.c"
    [8] NSSCKFWC_FindObjects() at line 2622 in "wrap.c"
    [9] builtinsC_FindObjects() at line 742 in "nssck.api"
    [10] nssToken_TraverseCertificates() at line 1658 in "devtoken.c"
    [11] NSSTrustDomain_TraverseCertificates() at line 1077 in "trustdomain.c"
    [12] PK11_TraverseSlotCerts() at line 490 in "pk11cert.c"
    [13] CERT_GetSSLCACerts() at line 632 in "certhigh.c"
    [14] SSL_ConfigSecureServer() at line 744 in "sslsecur.c"
    [15] server_main() at line 1437 in "selfserv.c"
    [16] main() at line 2028 in "selfserv.c"

Memory Leak (mel):
Found leaked block of size 88 bytes at address 0x8175998
At time of allocation, the call stack was:
    [1] calloc() at 0xb4d308a0
    [2] PR_Calloc() at line 475 in "prmem.c"
    [3] PR_NewLock() at line 174 in "ptsynch.c"
    [4] nssCKFWMutex_Create() at line 146 in "mutex.c"
    [5] nssCKFWInstance_CreateMutex() at line 514 in "instance.c"
    [6] nssCKFWObject_Create() at line 205 in "object.c"
    [7] nssCKFWFindObjects_Next() at line 385 in "find.c"
    [8] NSSCKFWC_FindObjects() at line 2622 in "wrap.c"
    [9] builtinsC_FindObjects() at line 742 in "nssck.api"
    [10] pk11_FindObjectByTemplate() at line 1377 in "pk11obj.c"
    [11] pk11_isRootSlot() at line 1462 in "pk11slot.c"
    [12] PK11_InitSlot() at line 1529 in "pk11slot.c"
    [13] SECMOD_LoadPKCS11Module() at line 376 in "pk11load.c"
    [14] SECMOD_LoadModule() at line 323 in "pk11pars.c"
    [15] SECMOD_LoadModule() at line 338 in "pk11pars.c"
    [16] nss_Init() at line 486 in "nssinit.c"
The implication of comment 20 is that the PKCS#11 module that uses 
NSSCKFWC_FindObjects() does not clean up its locks when it is unloaded.
That's nssckbi, right?  

Bob, can you investigate that?
Keywords: mlk
This patch walks the hash list of objects and calls finalize on those objects when we are shutting down the token.

This logic is similiar to our existing logic to walk down the sessions and close them. Note the flag added to the finalize object. This is because we can't access the hash while we are walking it. This is safe because we blow way the full hash table once we've completed the walk.
Attachment #283403 - Flags: review?(nelson)
I've tested the memory leak runs with these stacks removed.
Attachment #283404 - Flags: review?(nelson)
Comment on attachment 283404 [details] [diff] [review]
Remove the ignore values from the stack.

The common element to all the stacks being removed is this sequence of calls:
nssCKFWObject_Create/nssCKFWInstance_CreateMutex/nssCKFWMutex_Create/PR_NewLock

unfortunately, in most of the stacks being removed here, that vital information
was replaced with */*/*/*  

Slavo, we need to ensure that dynamically loaded shared libs don't get
unloaded in memory leak runs, not even in optimized builds.
(This may be a change from what I've said before, I'm not sure.)
Attachment #283404 - Flags: review?(nelson) → review+
Comment on attachment 283403 [details] [diff] [review]
Call finalize on persistant objects when shutting down the token.

I have some concerns about non-atomic operations and races, but those are pre-existing conditions, and this patch doesn't make it any worse.  
So, r=nelson
Attachment #283403 - Flags: review?(nelson) → review+
(In reply to comment #24)
> Slavo, we need to ensure that dynamically loaded shared libs don't get
> unloaded in memory leak runs, not even in optimized builds.
> (This may be a change from what I've said before, I'm not sure.)
> 
Yes, it's a change. I reopened bug 384639 where this was solved and attached new patch there.
this as been checked in.
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: