3.35 KB, patch
|Details | Diff | Splinter Review|
5.23 KB, patch
|Details | Diff | Splinter Review|
1.50 KB, patch
|Details | Diff | Splinter Review|
There was detected some memory still reachable after strsclnt exit. Probably it's not real leak and should be ignored, but rather look at it. NSS: securityjes5, 20070214.1, DBG version OS: Red Hat Enterprise Linux AS release 3 (Taroon Update 7) Started selfserv: selfserv -D -p 8443 -d /share/builds/mccrel3/security/securityjes5/builds/20070214.1/wozzeck_Solaris8/mozilla/tests_results/security/touquet.1/server_memleak -n touquet.red.iplanet.com -e touquet.red.iplanet.com-ec -w nss -c ABCDEF:C001:C002:C003:C004:C005:C006:C007:C008:C009:C00A:C00B:C00C:C00D:C00E:C00F:C010:C011:C012:C013:C014cdefgijklmnvyz -t 5 Started strsclnt (normal mode): /usr/local/bin/valgrind --tool=memcheck --leak-check=yes --show-reachable=yes --num-callers=50 strsclnt -q -p 8443 -d /share/builds/mccrel3/security/securityjes5/builds/20070214.1/wozzeck_Solaris8/mozilla/tests_results/security/touquet.1/client_memleak -w nss -c 1000 -C :C001 touquet.red.iplanet.com ==22797== 128 bytes in 4 blocks are still reachable in loss record 8 of 10 ==22797== at 0x442572F: malloc (vg_replace_malloc.c:149) ==22797== by 0x4585966: PR_Malloc (../../../../pr/src/malloc/prmem.c:467) ==22797== by 0x456A699: DefaultAllocTable (../../../lib/ds/plhash.c:72) ==22797== by 0x456AE0A: PL_HashTableRawRemove (../../../lib/ds/plhash.c:340) ==22797== by 0x456AF90: PL_HashTableRemove (../../../lib/ds/plhash.c:380) ==22797== by 0x453C178: nssPointerTracker_remove (tracker.c:472) ==22797== by 0x453A758: arena_remove_pointer (arena.c:165) ==22797== by 0x453ABDF: nssArena_Destroy (arena.c:531) ==22797== by 0x453AB7A: NSSArena_Destroy (arena.c:490) ==22797== by 0x453D067: nssList_Destroy (list.c:165) ==22797== by 0x453D808: nssListIterator_Destroy (list.c:393) ==22797== by 0x452AB3E: NSSTrustDomain_Destroy (trustdomain.c:136) ==22797== by 0x45317D4: STAN_Shutdown (pki3hack.c:229) ==22797== by 0x44C3235: NSS_Shutdown (nssinit.c:792) ==22797== by 0x804EAEC: main (strsclnt.c:1511) Started strsclnt (bypass mode): /usr/local/bin/valgrind --tool=memcheck --leak-check=yes --show-reachable=yes --num-callers=50 strsclnt -q -p 8443 -d /share/builds/mccrel3/security/securityjes5/builds/20070214.1/wozzeck_Solaris8/mozilla/tests_results/security/touquet.1/client_memleak -w nss -c 1000 -C A touquet.red.iplanet.com ==10861== 128 bytes in 4 blocks are still reachable in loss record 9 of 11 ==10861== at 0x442572F: malloc (vg_replace_malloc.c:149) ==10861== by 0x4585966: PR_Malloc (../../../../pr/src/malloc/prmem.c:467) ==10861== by 0x456A699: DefaultAllocTable (../../../lib/ds/plhash.c:72) ==10861== by 0x456AE0A: PL_HashTableRawRemove (../../../lib/ds/plhash.c:340) ==10861== by 0x456AF90: PL_HashTableRemove (../../../lib/ds/plhash.c:380) ==10861== by 0x453C178: nssPointerTracker_remove (tracker.c:472) ==10861== by 0x453A758: arena_remove_pointer (arena.c:165) ==10861== by 0x453ABDF: nssArena_Destroy (arena.c:531) ==10861== by 0x453AB7A: NSSArena_Destroy (arena.c:490) ==10861== by 0x453D067: nssList_Destroy (list.c:165) ==10861== by 0x453D808: nssListIterator_Destroy (list.c:393) ==10861== by 0x452AB3E: NSSTrustDomain_Destroy (trustdomain.c:136) ==10861== by 0x45317D4: STAN_Shutdown (pki3hack.c:229) ==10861== by 0x44C3235: NSS_Shutdown (nssinit.c:792) ==10861== by 0x804EAEC: main (strsclnt.c:1511) In both cases there was set NSS_DISABLE_ARENA_FREE_LIST="1".
One more stack (bypass mode): ==15678== 2,251 bytes in 6 blocks are still reachable in loss record 14 of 15 ==15678== at 0x43D972F: malloc (vg_replace_malloc.c:149) ==15678== by 0x45874D0: PR_Malloc (prmem.c:467) ==15678== by 0x456C624: DefaultAllocTable (plhash.c:72) ==15678== by 0x456CDB1: PL_HashTableRawRemove (plhash.c:340) ==15678== by 0x456CF3A: PL_HashTableRemove (plhash.c:380) ==15678== by 0x62BF8E4: nssPointerTracker_remove (tracker.c:472) ==15678== by 0x62BDF9E: arena_remove_pointer (arena.c:165) ==15678== by 0x62BE3EE: nssArena_Destroy (arena.c:531) ==15678== by 0x62B78BA: nssToken_Destroy (devtoken.c:161) ==15678== by 0x62AE966: token_destructor (trustdomain.c:124) ==15678== by 0x62C0821: nssList_Clear (list.c:200) ==15678== by 0x62AE9D5: NSSTrustDomain_Destroy (trustdomain.c:140) ==15678== by 0x62B5301: STAN_Shutdown (pki3hack.c:229) ==15678== by 0x624516A: NSS_Shutdown (nssinit.c:809) ==15678== by 0x804E849: main (strsclnt.c:1528)
The problem here seems to be that no code in NSS ever calls nssPointerTracker_finalize to destroy the pointer tracker table.
Created attachment 282911 [details] [diff] [review] monster cleanup patch v1 I set out to write a patch to call nssPointerTracker_finalize to plug the leak. I discovered many problems with the Stan pointer tracking code, and I decided that fixing that code is not an objective of NSS 3.12. So, rather than fix it (to fix the leaks), I decided to ifdef it all out with a new "feature test macro" named DEBUG_TRACKER. Along the way, I discovered a bunch of other issues, including - inconsistent ifdefs. Some function definitions are inside of an ifdef that uses one symbol, but the invocations of those functions are inside of an ifdef for a different symbol. I attempted to fix that by putting all the pointer tracker definitions and invocations inside of #ifdef DEBUG_TRACKER. - the CKFW code exports all its NSPR-like "stubs" from any windows DLL into which it is linked. The solution is to remove PR_IMPLEMENT from all the function definitions. - The coding style used in Stan code, which separates function names in all function definitions from the left parenthesis that follows, defeats ctags, the utility that produces tags files for vi and emacs, making those editors unable to find Stan functions by name. The solution is to join the line with the function name with the following line that contains only a left parenthesis. - Some (not much) function definitions put the opening left brace on the end of the line that names the function, rather than on a new line by itself. This defeats vi and emacs ability to skip forward or backword to the next (or previous) function in the source. Solution is to break those lines, putting the brace on its own separate line. - tons of "#ifdef notdef" code that we're pointlessly maintaining. Solution is to remove it. So, I made those changes in all the files that I had to touch to ifdef the pointer tracker code. The resulting patch is pretty big, but it's all mechanical, and should be easy to review. The only logic changes are the ifdef changes, which remove the pointer tracking stuff. I will review the patch myself in bugzilla before asking others to review it.
Created attachment 282912 [details] [diff] [review] remove pointer tracker stacks from ignored file (checked in)
Comment on attachment 282911 [details] [diff] [review] monster cleanup patch v1 Bob, please read comment 5 above and then review the first two patches for this bug.
In the course of writing up bug 398090, it occurred to me that there is a muck smaller quicker simpler HACK solution to this particular leak. It RELIES on the fact that only one of the 5 pointer tracker objects ever gets initialized successfully, and we know which one that is, so we don't have to worry about the other 4. We can leave the existing tracker code running for the one tracker instance that already works, and not worry about the other 4 instances that don't. I'll work up a patch for that.
Created attachment 282938 [details] [diff] [review] Alternative HACK patch, v1 (checked in) This patch needs more testing before I ask for review. If it works, it will solve the same pointer tracker stacks as those listed in the second patch (attachment 282912 [details] [diff] [review]) above.
Comment on attachment 282911 [details] [diff] [review] monster cleanup patch v1 r+ This patch and the one marked 'HACK v1 do not appear to be mutually exclusive. I think it's reasonable to clean up the pointer stuff and put it under it's own #define, which this patch does. I would at least like to see the PR_OnceWithArg chnage from the HACK v1 patch as well.
Comment on attachment 282912 [details] [diff] [review] remove pointer tracker stacks from ignored file (checked in) r+ relyea
Comment on attachment 282938 [details] [diff] [review] Alternative HACK patch, v1 (checked in) Bob, do you mind if I ask you to review this before I'm done testing it?
Comment on attachment 282938 [details] [diff] [review] Alternative HACK patch, v1 (checked in) r+ I should have given it when I reviewed it last time.
Comment on attachment 282938 [details] [diff] [review] Alternative HACK patch, v1 (checked in) Patch committed on trunk lib/base/arena.c; new revision: 1.10; previous revision: 1.9 lib/base/base.h; new revision: 1.19; previous revision: 1.18 lib/base/tracker.c; new revision: 1.7; previous revision: 1.6 lib/nss/nssinit.c; new revision: 1.91; previous revision: 1.90
The so-called "alternative HACK" patch replaced calls to NSS's "call_once" function with calls to the essentially identical PR_CallOnce function. It also added a code path that attempted to free the values allocated by the pointer tracker code through that PR_CallOnce code. When that patch was checked in, the amount of memory leaked (as reported by tinderbox) was reduced significantly (more than I expected), but not all the items allocated by the pointer tracker code through that PR_CallOnce code are being freed. There are still two leak stacks that appear in selfserv, strsclnt and ocspclnt. The only difference is that these stacks now show PR_CallOnce where they previously showed call_once. So, I need to figure out why those allocations are not being freed by the new nssArena_Shutdown function. The two new stacks are: SECMOD_LoadModule/ SECMOD_LoadModule/ SECMOD_LoadPKCS11Module/ secmod_ModuleInit/ builtinsC_Initialize/ NSSCKFWC_Initialize/ nssCKFWInstance_Create/ NSSArena_Create/ nssArena_Create/ arena_add_pointer/ nssPointerTracker_initialize/ PR_CallOnceWithArg/ trackerOnceFunc/ PL_NewHashTable/ DefaultAllocTable/ PR_Malloc and SECMOD_LoadModule/ SECMOD_LoadModule/ SECMOD_LoadPKCS11Module/ secmod_ModuleInit/ builtinsC_Initialize/ NSSCKFWC_Initialize/ nssCKFWInstance_Create/ NSSArena_Create/ nssArena_Create/ arena_add_pointer/ nssPointerTracker_initialize/ PR_CallOnceWithArg/trackerOnceFunc/ PR_NewLock/ PR_Calloc/ calloc
OK, I figured it out. There are TWO copies of the Stan nssPointerTracker code being linked into NSS libraries, one into nss3, the other into nssckbi. Each copy has its own PRCallOnce structure, its own lock and its own hash table. The nssckbi allocation stacks are shown above. The nss3 stacks are: NSS_Initialize/ nss_Init/ STAN_LoadDefaultNSS3TrustDomain/ NSSTrustDomain_Create/ NSSArena_Create/ nssArena_Create/ arena_add_pointer/ nssPointerTracker_initialize/ PR_CallOnceWithArg/ trackerOnceFunc/ ... The patch that was committed calls nssArena_Shutdown in nss3, but the patch does not provide for calling it in nssckbi. So, the remaining fix is to call it in nssckbi somewhere.
Created attachment 305266 [details] [diff] [review] patch for lib/ckfw, v1 (checked in) Here's a proposal to fix the leak in nssckbi. Bob, please review. I'm not entirely satisfied with this patch. I'm not sure this is the right approach. But off hand I haven't thought of a better one. CKFW allows the existence of multiple "instances". We want to destroy the pointer tracker table for CKFW after (when) the last NSSArena that was created for a CKFW instance is destroyed. The trouble is that we cannot accurately tell when that has occurred because we cannot accurately tell when an instance has created an NSSArena, nor when it has destroyed one. Part way through the creation of an instance, it calls NSSArena_Create, which will call PR_CallOnce to initialize the hash table of nssArenas. CKFW's instance creation function may fail before, or after, that call to NSSArena_Create, so we cannot tell, by looking at the return value from the instance creation function, whether it created an NSSArena or not. Likewise, we cannot tell from the destruction function whether it has destroyed an NSSArena or not. So, in this patch, I count the number of successful instance creations and decrement the count with each successful instance destruction. If & when the count of instances goes to zero, it destroys the pointer tracker stuff. This approach has the property that it will be correct as long as no attempts to initialize or finalize a CKFW instance ever fail, and it will be correct (that is, will not leak) for SOME cases where failure to initialize or destroy an instance occurs. But in other cases where failure to initialize or destroy an instance occurs, it will still leak. For example, if an instance creation fails after the pointer tracker has been initialized, there will be no corresponding call to the instance destructor, and consequently the code to destroy the pointer tracker will never be called. An alternative approach to this one occurs to me. Rather than burdening each shared library that uses NSSArenas with the responsiblity to clean up pointer tracker stuff, we could try to put the tracker destruction logic into the NSSArena code itself. Each time an NSSArena is created, we could increment a counter, and decrement it on each NSSArena destruction. When it goes to zero, we could destroy the pointer tracker. But I see problems with that alternative approach, too. 1) Destruction (resetting) of a PRCallOnce structure is not atomic, so we really don't want to do it anyplace where PR_CallOnce might get called on a PRCallOnce structure near the same time that it is being destroyed/reset. There is a chance that one NSSArena could be being created on one thread at about the same time as the last NSSArena is being destroyed on another thread. 2) It is possible for NSSArena creation to fail after the pointer tracker has been created. If the NSSArena creation fails, then there will be no matching call to the NSSArena destructor, and consequently the function to destroy the pointer tracker will not be called. So, I actually think that alternative is worse than the approach in this patch. I think this proposed patch is good enough. Pointer trackers are only used in debug builds, so there can be no pointer tracker leaks in optimized builds. Finally, I want to forestall any arguments about late/lazy initialization. nssckbi is a PKCS#11 module. We are already doing the initialization in the earliest possible function (C_Initialize) and the destruction in the latest possible destructor (C_Finalize). There is no opportunity to create yet another even earlier initialization function, or even later destruction function than these, because the PKCS#11 API doesn't allow for them. If all platforms supported a shared library initialization function that is always called when a shared library is loaded, and a shared library destruction function that is always called when a shared library is unloaded, those functions could be used to handle pointer tracker allocation and deallocation. But AFAIK, not all supported platforms have such load/unload functions for shared libs.
Hmm. Here's another interesting observation. CKFW has a file of "stub" functions for NSPR. Included in that file is a stub implementation of the pointer tracker stuff. I gather than Bob intended for ckfw's stub version of the pointer tracker stuff to be used in nssckbi, rather then the real implementation in nss/lib/base. But the leak stacks being reported show that the nss/lib/base version is being used, not the stub version. I don't yet know why the stub versions are not used. Is THAT the real problem here? Is the real solution to cause nssckbi to use the stub version of the pointer tracker stuff?
RE the stub --- I believe the stub is old code, back when we were trying to keep PKCS #11 modules independent of NSPR. I don't believe the stubs are still in use. Instead, we simply link directly witn NSPR. The original purpose of the stubs were 1) to support explicit pkcs #11 locking semantics, and 2) worries about incompatible versions of NSPR in the address space. Both problems are anachronistic and do not play out in today's environments (where NSPR has binary compatibility guarentees and where OS's typically provide system level threads that all applications use). bob
Comment on attachment 305266 [details] [diff] [review] patch for lib/ckfw, v1 (checked in) r+. The primary problem with these memory leaks is noise in our leak detection. The fact that error paths may cause these leaks to reappear should not be an issue since these are all one-time startup leaks. bob
Bob, Should we remove that file of stubs, if we no longer intend to use it? If we no longer intend to use it, it would be bad if we accidentally did use it.
Comment on attachment 305266 [details] [diff] [review] patch for lib/ckfw, v1 (checked in) Checking in wrap.c; new revision: 1.16; previous revision: 1.15
Comment on attachment 282912 [details] [diff] [review] remove pointer tracker stacks from ignored file (checked in) Checking in ignored; new revision: 1.63; previous revision: 1.62
Nelson, Re: comment 21, I would say yes. Let's remove this dead code.
re comment 21 and comment 24... I concur. I just needed to verify that I was in fact correct about it's usage before I agreed it could be removed (which I have now done).
Slavo, do there remain any pointerTracker leaks in the "ignored" file? Or is this bug fixed?
(In reply to comment #26) > Slavo, do there remain any pointerTracker leaks in the "ignored" file? > Or is this bug fixed? There is no pointerTracker bug in ignored file (I just checked logs and you removed it in February), so this bugs seems to be fixed.
Then I declare this bug to have been fixed.