Closed Bug 370536 Opened 17 years ago Closed 16 years ago

Memory leaks in pointer tracker code in DEBUG builds only

Categories

(NSS :: Libraries, defect, P2)

3.11.4
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: slavomir.katuscak+mozilla, Assigned: nelson)

References

Details

(Keywords: memory-leak)

Attachments

(3 files, 1 obsolete file)

There was detected some memory still reachable after strsclnt exit. Probably it's not real leak and should be ignored, but rather look at it. 

NSS: securityjes5, 20070214.1, DBG version
OS: Red Hat Enterprise Linux AS release 3 (Taroon Update 7)

Started selfserv:

selfserv -D -p 8443 -d /share/builds/mccrel3/security/securityjes5/builds/20070214.1/wozzeck_Solaris8/mozilla/tests_results/security/touquet.1/server_memleak -n touquet.red.iplanet.com -e touquet.red.iplanet.com-ec -w nss -c ABCDEF:C001:C002:C003:C004:C005:C006:C007:C008:C009:C00A:C00B:C00C:C00D:C00E:C00F:C010:C011:C012:C013:C014cdefgijklmnvyz -t 5

Started strsclnt (normal mode):

/usr/local/bin/valgrind --tool=memcheck --leak-check=yes --show-reachable=yes --num-callers=50 strsclnt -q -p 8443 -d /share/builds/mccrel3/security/securityjes5/builds/20070214.1/wozzeck_Solaris8/mozilla/tests_results/security/touquet.1/client_memleak -w nss -c 1000 -C :C001 touquet.red.iplanet.com

==22797== 128 bytes in 4 blocks are still reachable in loss record 8 of 10
==22797==    at 0x442572F: malloc (vg_replace_malloc.c:149)
==22797==    by 0x4585966: PR_Malloc (../../../../pr/src/malloc/prmem.c:467)
==22797==    by 0x456A699: DefaultAllocTable (../../../lib/ds/plhash.c:72)
==22797==    by 0x456AE0A: PL_HashTableRawRemove (../../../lib/ds/plhash.c:340)
==22797==    by 0x456AF90: PL_HashTableRemove (../../../lib/ds/plhash.c:380)
==22797==    by 0x453C178: nssPointerTracker_remove (tracker.c:472)
==22797==    by 0x453A758: arena_remove_pointer (arena.c:165)
==22797==    by 0x453ABDF: nssArena_Destroy (arena.c:531)
==22797==    by 0x453AB7A: NSSArena_Destroy (arena.c:490)
==22797==    by 0x453D067: nssList_Destroy (list.c:165)
==22797==    by 0x453D808: nssListIterator_Destroy (list.c:393)
==22797==    by 0x452AB3E: NSSTrustDomain_Destroy (trustdomain.c:136)
==22797==    by 0x45317D4: STAN_Shutdown (pki3hack.c:229)
==22797==    by 0x44C3235: NSS_Shutdown (nssinit.c:792)
==22797==    by 0x804EAEC: main (strsclnt.c:1511)

Started strsclnt (bypass mode):

/usr/local/bin/valgrind --tool=memcheck --leak-check=yes --show-reachable=yes --num-callers=50 strsclnt -q -p 8443 -d /share/builds/mccrel3/security/securityjes5/builds/20070214.1/wozzeck_Solaris8/mozilla/tests_results/security/touquet.1/client_memleak -w nss -c 1000 -C A touquet.red.iplanet.com

==10861== 128 bytes in 4 blocks are still reachable in loss record 9 of 11
==10861==    at 0x442572F: malloc (vg_replace_malloc.c:149)
==10861==    by 0x4585966: PR_Malloc (../../../../pr/src/malloc/prmem.c:467)
==10861==    by 0x456A699: DefaultAllocTable (../../../lib/ds/plhash.c:72)
==10861==    by 0x456AE0A: PL_HashTableRawRemove (../../../lib/ds/plhash.c:340)
==10861==    by 0x456AF90: PL_HashTableRemove (../../../lib/ds/plhash.c:380)
==10861==    by 0x453C178: nssPointerTracker_remove (tracker.c:472)
==10861==    by 0x453A758: arena_remove_pointer (arena.c:165)
==10861==    by 0x453ABDF: nssArena_Destroy (arena.c:531)
==10861==    by 0x453AB7A: NSSArena_Destroy (arena.c:490)
==10861==    by 0x453D067: nssList_Destroy (list.c:165)
==10861==    by 0x453D808: nssListIterator_Destroy (list.c:393)
==10861==    by 0x452AB3E: NSSTrustDomain_Destroy (trustdomain.c:136)
==10861==    by 0x45317D4: STAN_Shutdown (pki3hack.c:229)
==10861==    by 0x44C3235: NSS_Shutdown (nssinit.c:792)
==10861==    by 0x804EAEC: main (strsclnt.c:1511)

In both cases there was set NSS_DISABLE_ARENA_FREE_LIST="1".
One more stack (bypass mode):

==15678== 2,251 bytes in 6 blocks are still reachable in loss record 14 of 15
==15678==    at 0x43D972F: malloc (vg_replace_malloc.c:149)
==15678==    by 0x45874D0: PR_Malloc (prmem.c:467)
==15678==    by 0x456C624: DefaultAllocTable (plhash.c:72)
==15678==    by 0x456CDB1: PL_HashTableRawRemove (plhash.c:340)
==15678==    by 0x456CF3A: PL_HashTableRemove (plhash.c:380)
==15678==    by 0x62BF8E4: nssPointerTracker_remove (tracker.c:472)
==15678==    by 0x62BDF9E: arena_remove_pointer (arena.c:165)
==15678==    by 0x62BE3EE: nssArena_Destroy (arena.c:531)
==15678==    by 0x62B78BA: nssToken_Destroy (devtoken.c:161)
==15678==    by 0x62AE966: token_destructor (trustdomain.c:124)
==15678==    by 0x62C0821: nssList_Clear (list.c:200)
==15678==    by 0x62AE9D5: NSSTrustDomain_Destroy (trustdomain.c:140)
==15678==    by 0x62B5301: STAN_Shutdown (pki3hack.c:229)
==15678==    by 0x624516A: NSS_Shutdown (nssinit.c:809)
==15678==    by 0x804E849: main (strsclnt.c:1528)
The problem here seems to be that no code in NSS ever calls 
nssPointerTracker_finalize to destroy the pointer tracker table.
Keywords: mlk
OS: Linux → All
Priority: -- → P2
Hardware: PC → All
Summary: Possible leak in arena destroy functions. → Memory leaks in pointer tracker code in DEBUG builds only
Target Milestone: --- → 3.12
Taking.  
Assignee: nobody → nelson
Attached patch monster cleanup patch v1 (obsolete) — Splinter Review
I set out to write a patch to call nssPointerTracker_finalize to 
plug the leak.  I discovered many problems with the Stan pointer 
tracking code, and I decided that fixing that code is not an objective
of NSS 3.12.  So, rather than fix it (to fix the leaks), I decided to
ifdef it all out with a new "feature test macro" named DEBUG_TRACKER.

Along the way, I discovered a bunch of other issues, including

- inconsistent ifdefs.  Some function definitions are inside of an 
ifdef that uses one symbol, but the invocations of those functions 
are inside of an ifdef for a different symbol.  I attempted to fix
that by putting all the pointer tracker definitions and invocations
inside of #ifdef DEBUG_TRACKER.

- the CKFW code exports all its NSPR-like "stubs" from any windows DLL
into which it is linked.  The solution is to remove PR_IMPLEMENT from
all the function definitions.

- The coding style used in Stan code, which separates function names
in all function definitions from the left parenthesis that follows,
defeats ctags, the utility that produces tags files for vi and emacs, 
making those editors unable to find Stan functions by name.  The 
solution is to join the line with the function name with the following
line that contains only a left parenthesis.

- Some (not much) function definitions put the opening left brace 
on the end of the line that names the function, rather than on a 
new line by itself.  This defeats vi and emacs ability to skip 
forward or backword to the next (or previous) function in the source.
Solution is to break those lines, putting the brace on its own separate
line.  

- tons of "#ifdef notdef" code that we're pointlessly maintaining.
Solution is to remove it.

So, I made those changes in all the files that I had to touch to ifdef
the pointer tracker code.  The resulting patch is pretty big, but it's
all mechanical, and should be easy to review.  The only logic changes 
are the ifdef changes, which remove the pointer tracking stuff. 

I will review the patch myself in bugzilla before asking others to review
it.
Comment on attachment 282911 [details] [diff] [review]
monster cleanup patch v1

Bob, please read comment 5 above and then review the first two patches for this bug.
Attachment #282911 - Flags: review?(rrelyea)
Attachment #282912 - Flags: review?(rrelyea)
In the course of writing up bug 398090, it occurred to me that there is a 
muck smaller quicker simpler HACK solution to this particular leak. 

It RELIES on the fact that only one of the 5 pointer tracker objects ever
gets initialized successfully, and we know which one that is, so we don't
have to worry about the other 4.  We can leave the existing tracker code
running for the one tracker instance that already works, and not worry 
about the other 4 instances that don't.  I'll work up a patch for that.
This patch needs more testing before I ask for review.
If it works, it will solve the same pointer tracker stacks as 
those listed in the second patch (attachment 282912 [details] [diff] [review]) above.
Comment on attachment 282911 [details] [diff] [review]
monster cleanup patch v1

r+ This patch and the one marked 'HACK v1 do not appear to be mutually exclusive.

I think it's reasonable to clean up the pointer stuff and put it under it's own #define, which this patch does.

I would at least like to see the PR_OnceWithArg chnage from the HACK v1 patch as well.
Attachment #282911 - Flags: review?(rrelyea) → review+
Comment on attachment 282912 [details] [diff] [review]
remove pointer tracker stacks from ignored file (checked in)

r+  relyea
Attachment #282912 - Flags: review?(rrelyea) → review+
Comment on attachment 282938 [details] [diff] [review]
Alternative HACK patch, v1 (checked in)

Bob, do you mind if I ask you to review this before I'm done testing it?
Attachment #282938 - Flags: review?(rrelyea)
Comment on attachment 282938 [details] [diff] [review]
Alternative HACK patch, v1 (checked in)

r+ I should have given it when I reviewed it last time.
Attachment #282938 - Flags: review?(rrelyea) → review+
Comment on attachment 282938 [details] [diff] [review]
Alternative HACK patch, v1 (checked in)

Patch committed on trunk
lib/base/arena.c;   new revision: 1.10; previous revision: 1.9
lib/base/base.h;    new revision: 1.19; previous revision: 1.18
lib/base/tracker.c; new revision: 1.7; previous revision: 1.6
lib/nss/nssinit.c;  new revision: 1.91; previous revision: 1.90
Attachment #282938 - Attachment description: Alternative HACK patch, v1 → Alternative HACK patch, v1 (checked in)
The so-called "alternative HACK" patch replaced calls to NSS's "call_once" 
function with calls to the essentially identical PR_CallOnce function.  

It also added a code path that attempted to free the values allocated by 
the pointer tracker code through that PR_CallOnce code.

When that patch was checked in, the amount of memory leaked (as reported by
tinderbox) was reduced significantly (more than I expected), but not all 
the items allocated by the pointer tracker code through that PR_CallOnce code
are being freed.  There are still two leak stacks that appear in selfserv,
strsclnt and ocspclnt.  The only difference is that these stacks now show
PR_CallOnce where they previously showed call_once.  So, I need to figure out
why those allocations are not being freed by the new nssArena_Shutdown 
function.  The two new stacks are:

SECMOD_LoadModule/
 SECMOD_LoadModule/
  SECMOD_LoadPKCS11Module/
   secmod_ModuleInit/
    builtinsC_Initialize/
     NSSCKFWC_Initialize/
      nssCKFWInstance_Create/
       NSSArena_Create/
        nssArena_Create/
         arena_add_pointer/
          nssPointerTracker_initialize/
           PR_CallOnceWithArg/
            trackerOnceFunc/
             PL_NewHashTable/
              DefaultAllocTable/
               PR_Malloc
and
SECMOD_LoadModule/
 SECMOD_LoadModule/
  SECMOD_LoadPKCS11Module/
   secmod_ModuleInit/
    builtinsC_Initialize/
     NSSCKFWC_Initialize/
      nssCKFWInstance_Create/
       NSSArena_Create/
        nssArena_Create/
         arena_add_pointer/
          nssPointerTracker_initialize/
           PR_CallOnceWithArg/trackerOnceFunc/
            PR_NewLock/
             PR_Calloc/
              calloc
OK, I figured it out.  
There are TWO copies of the Stan nssPointerTracker code being linked 
into NSS libraries, one into nss3, the other into nssckbi. 
Each copy has its own PRCallOnce structure, its own lock and its own 
hash table.  

The nssckbi allocation stacks are shown above.  The nss3 stacks are:

NSS_Initialize/
 nss_Init/
  STAN_LoadDefaultNSS3TrustDomain/
   NSSTrustDomain_Create/
    NSSArena_Create/
     nssArena_Create/
      arena_add_pointer/
       nssPointerTracker_initialize/
        PR_CallOnceWithArg/
         trackerOnceFunc/
          ...

The patch that was committed calls nssArena_Shutdown in nss3, but the
patch does not provide for calling it in nssckbi.  So, the remaining
fix is to call it in nssckbi somewhere.
Here's a proposal to fix the leak in nssckbi.  Bob, please review.

I'm not entirely satisfied with this patch.  I'm not sure this is the right approach.  But off hand I haven't thought of a better one.

CKFW allows the existence of multiple "instances".  We want to destroy the 
pointer tracker table for CKFW after (when) the last NSSArena that was 
created for a CKFW instance is destroyed.  The trouble is that we cannot 
accurately tell when that has occurred because we cannot accurately tell  
when an instance has created an NSSArena, nor when it has destroyed one.

Part way through the creation of an instance, it calls NSSArena_Create,
which will call PR_CallOnce to initialize the hash table of nssArenas.
CKFW's instance creation function may fail before, or after, that call to 
NSSArena_Create, so we cannot tell, by looking at the return value from 
the instance creation function, whether it created an NSSArena or not.
Likewise, we cannot tell from the destruction function whether it has 
destroyed an NSSArena or not.  

So, in this patch, I count the number of successful instance creations and 
decrement the count with each successful instance destruction.  If & when
the count of instances goes to zero, it destroys the pointer tracker stuff.

This approach has the property that it will be correct as long as no attempts
to initialize or finalize a CKFW instance ever fail, and it will be correct
(that is, will not leak) for SOME cases where failure to initialize or destroy an instance occurs.  But in other cases where failure to initialize or destroy an instance occurs, it will still leak.  For example, if an instance creation
fails after the pointer tracker has been initialized, there will be no 
corresponding call to the instance destructor, and consequently the code to 
destroy the pointer tracker will never be called.

An alternative approach to this one occurs to me.  Rather than burdening each
shared library that uses NSSArenas with the responsiblity to clean up pointer 
tracker stuff, we could try to put the tracker destruction logic into the 
NSSArena code itself.  Each time an NSSArena is created, we could increment
a counter, and decrement it on each NSSArena destruction.  When it goes to 
zero, we could destroy the pointer tracker.  

But I see problems with that alternative approach, too.  
1) Destruction (resetting) of a PRCallOnce structure is not atomic, so we 
   really don't want to do it anyplace where PR_CallOnce might get called on 
   a PRCallOnce structure near the same time that it is being destroyed/reset.
   There is a chance that one NSSArena could be being created on one thread
   at about the same time as the last NSSArena is being destroyed on another
   thread.  
2) It is possible for NSSArena creation to fail after the pointer tracker
   has been created.  If the NSSArena creation fails, then there will be no 
   matching call to the NSSArena destructor, and consequently the function to 
   destroy the pointer tracker will not be called.  
So, I actually think that alternative is worse than the approach in this patch.

I think this proposed patch is good enough.  Pointer trackers are only used
in debug builds, so there can be no pointer tracker leaks in optimized 
builds.  

Finally, I want to forestall any arguments about late/lazy initialization.
nssckbi is a PKCS#11 module.  We are already doing the initialization in the
earliest possible function (C_Initialize) and the destruction in the latest 
possible destructor (C_Finalize).  There is no opportunity to create yet 
another even earlier initialization function, or even later destruction 
function than these, because the PKCS#11 API doesn't allow for them.  
If all platforms supported a shared library initialization function that is 
always called when a shared library is loaded, and a shared library destruction function that is always called when a shared library is unloaded, those functions could be used to handle pointer tracker allocation and deallocation.
But AFAIK, not all supported platforms have such load/unload functions for 
shared libs.
Attachment #305266 - Flags: review?(rrelyea)
Hmm. Here's another interesting observation.
CKFW has a file of "stub" functions for NSPR.  Included in that file is a 
stub implementation of the pointer tracker stuff.  I gather than Bob 
intended for ckfw's stub version of the pointer tracker stuff to be used
in nssckbi, rather then the real implementation in nss/lib/base.  But the 
leak stacks being reported show that the nss/lib/base version is being used,
not the stub version.  I don't yet know why the stub versions are not used.

Is THAT the real problem here?  Is the real solution to cause nssckbi to 
use the stub version of the pointer tracker stuff?
RE the stub ---

I believe the stub is old code, back when we were trying to keep PKCS #11 modules independent of NSPR. I don't believe the stubs are still in use. Instead, we simply link directly witn NSPR.

The original purpose of the stubs were 1) to support explicit pkcs #11 locking semantics, and 2) worries about incompatible versions of NSPR in the address space. Both problems are anachronistic and do not play out in today's environments (where NSPR has binary compatibility guarentees and where OS's typically provide system level threads that all applications use).

bob
Comment on attachment 305266 [details] [diff] [review]
patch for lib/ckfw, v1 (checked in)

r+.

The primary problem with these memory leaks is noise in our leak detection. The fact that error paths may cause these leaks to reappear should not be an issue since these are all one-time startup leaks.

bob
Attachment #305266 - Flags: review?(rrelyea) → review+
Bob, Should we remove that file of stubs, if we no longer intend to use it?

If we no longer intend to use it, it would be bad if we accidentally did 
use it.  
Comment on attachment 305266 [details] [diff] [review]
patch for lib/ckfw, v1 (checked in)

Checking in wrap.c; new revision: 1.16; previous revision: 1.15
Attachment #305266 - Attachment description: patch for lib/ckfw, v1 → patch for lib/ckfw, v1 (checked in)
Attachment #282911 - Attachment is obsolete: true
Comment on attachment 282912 [details] [diff] [review]
remove pointer tracker stacks from ignored file (checked in)

Checking in ignored; new revision: 1.63; previous revision: 1.62
Attachment #282912 - Attachment description: remove pointer tracker stacks from ignored file → remove pointer tracker stacks from ignored file (checked in)
Nelson,

Re: comment 21, I would say yes. Let's remove this dead code.
re comment 21 and comment 24... I concur. I just needed to verify that I was in fact correct about it's usage before I agreed it could be removed (which I have now done).
Slavo, do there remain any pointerTracker leaks in the "ignored" file?
Or is this bug fixed?
(In reply to comment #26) 
> Slavo, do there remain any pointerTracker leaks in the "ignored" file?
> Or is this bug fixed?

There is no pointerTracker bug in ignored file (I just checked logs and you removed it in February), so this bugs seems to be fixed.
Then I declare this bug to have been fixed.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: