Last Comment Bug 431805 - leak in NSSArena_Destroy()
: leak in NSSArena_Destroy()
Status: RESOLVED FIXED
:
Product: NSS
Classification: Components
Component: Libraries (show other bugs)
: unspecified
: All All
: -- normal (vote)
: 3.12.1
Assigned To: nobody
:
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2008-05-01 22:32 PDT by Boying Lu
Modified: 2008-05-16 20:47 PDT (History)
1 user (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
patch (1.63 KB, patch)
2008-05-01 22:34 PDT, Boying Lu
no flags Details | Diff | Splinter Review
new patch (1.68 KB, patch)
2008-05-03 07:17 PDT, Boying Lu
wtc: review+
Details | Diff | Splinter Review
patch v3 (as checked in) (2.57 KB, patch)
2008-05-09 18:06 PDT, Wan-Teh Chang
no flags Details | Diff | Splinter Review
patch v4 (4.12 KB, patch)
2008-05-09 20:00 PDT, Wan-Teh Chang
no flags Details | Diff | Splinter Review
patch v4.1 (4.15 KB, patch)
2008-05-09 20:12 PDT, Wan-Teh Chang
nelson: review-
Details | Diff | Splinter Review
patch v5 (4.25 KB, patch)
2008-05-16 19:37 PDT, Wan-Teh Chang
nelson: review+
Details | Diff | Splinter Review

Description Boying Lu 2008-05-01 22:32:49 PDT
Using mdb+libumem.so provided in solaris 10, I found a leak in NSSArena_Destroy()

Re-produce steps:
Same with the steps in bug 431628

libumem.so reports a leak in NSSArena_Destroy() and following is the call stack
                 libumem.so.1`umem_cache_alloc_debug+0x14f
                 libumem.so.1`umem_cache_alloc+0x144
                 libumem.so.1`umem_alloc+0xc5
                 libumem.so.1`malloc+0x27
                 libumem.so.1`calloc+0x47
                 libnspr4.so`PR_Calloc+0x65
                 libnss3.so`error_get_my_stack+0xfc
                 libnss3.so`nss_ClearErrorStack+0x26
                 libnss3.so`NSSArena_Create+0x26
                 libnss3.so`NSSTrustDomain_Create+0x26
                 libnss3.so`STAN_LoadDefaultNSS3TrustDomain+0x79
                 libnss3.so`nss_Init+0x44d
                 libnss3.so`NSS_NoDB_Init+0x95
                 libpipnss.so`unsigned nsNSSComponent::InitializeNSS+0x1c9
                 libpipnss.so`unsigned nsNSSComponent::Init+0x331
                 libpipnss.so`unsigned nsNSSComponentConstructor+0xbf
                 libxpcom_core.so`unsigned nsGenericFactory::CreateInstance+0x46
                 
                 libxpcom_core.so`unsigned nsComponentManagerImpl::CreateInstanc
                 eByContractID+0xda   
                 ...

The root cause is in follow codes NSSArena_Destroy()
(
  NSSArena *arena
)
{
  nss_ClearErrorStack();
  ...
}

nss_ClearErrorStack() doesn't free the thread private data which is set in error_get_my_stack(). Though in error_once_function(), this data is 
set to be freed when thread terminates, it doesn't work for main thread

So the data (stack) which is created in main thread is leaked
Comment 1 Boying Lu 2008-05-01 22:34:11 PDT
Created attachment 318969 [details] [diff] [review]
patch
Comment 2 Wan-Teh Chang 2008-05-02 10:20:53 PDT
Boying, could you look at the full call stack and see if
it is the main thread that calls NSS_NoDB_Init?

If it is the main thread, then I mayhave to resolve this bug
WONTFIX.  The per-thread error stack is supposed to
be freed by the NSPR  thread private data destructor
when the thread terminates.  However, for the main
thread this doesn't happen unless the main thread
calls PR_Cleanup.

Because NSPR may be used in the destructors of
static C++ objects, many C++ applications such as
Firefox cannot call PR_Cleanup.

I tried to fix this problem by adding a shared library
"fini" routine to libnspr4.so (bug 362768), but I had
to remove the calls to NSPR  thread private data
destructors because they caused crashes
(bug 383977).  Strictly speaking those crashes are
application bugs, but NSPR has a backward
compatibility requirement, so in this case we have
to be "bug compatible".

An alternative approach to free the error stack of
the calling thread in NSS_Shutdown.  Would you
like to try that?
Comment 3 Boying Lu 2008-05-03 06:34:59 PDT
(In reply to comment #2)
> Boying, could you look at the full call stack and see if
> it is the main thread that calls NSS_NoDB_Init?

Yes, the NSS_NoDB_Init() is called in main thread
> 
> If it is the main thread, then I mayhave to resolve this bug
> WONTFIX.  The per-thread error stack is supposed to
> be freed by the NSPR  thread private data destructor
> when the thread terminates.  However, for the main
> thread this doesn't happen unless the main thread
> calls PR_Cleanup.
> 
> Because NSPR may be used in the destructors of
> static C++ objects, many C++ applications such as
> Firefox cannot call PR_Cleanup.
> 
> I tried to fix this problem by adding a shared library
> "fini" routine to libnspr4.so (bug 362768), but I had
> to remove the calls to NSPR  thread private data
> destructors because they caused crashes
> (bug 383977).  Strictly speaking those crashes are
> application bugs, but NSPR has a backward
> compatibility requirement, so in this case we have
> to be "bug compatible".
> 
> An alternative approach to free the error stack of
> the calling thread in NSS_Shutdown.  Would you
> like to try that?
> 
ok, I'll give it a try tomorrow.

Comment 4 Boying Lu 2008-05-03 07:17:26 PDT
Created attachment 319166 [details] [diff] [review]
new patch

Add nss_DestroyErrorStack() in NSS_Shutdown() and the leak is gone
Comment 5 Wan-Teh Chang 2008-05-08 11:25:10 PDT
Comment on attachment 319166 [details] [diff] [review]
new patch

r=wtc.

Sorry about the delay.  I was looking into what other
files need this fix because security/nss/lib/base is
used by two .so's -- libnss3.so and libnssckbi.so.
I suspect libnssckbi.so may also need to call
nss_DestroyErrorStack() in its shutdown/finalize
function.
Comment 6 Nelson Bolyard (seldom reads bugmail) 2008-05-08 15:02:59 PDT
Wan-Teh, will you commit the patch for Boying?
Comment 7 Wan-Teh Chang 2008-05-09 18:06:23 PDT
Created attachment 320313 [details] [diff] [review]
patch v3 (as checked in)

I checked in the patch on the NSS trunk (NSS 3.12.1).

In nss_DestroyErrorStack, I added a check "0 != error_stack_index"
to determine if error_stack_index is valid.  Strictly speaking this
check is incorrect because 0 is the first valid index for NSPR thread
private data.  Boying, please see if this breaks your memory leak fix.

Checking in base/base.h;
/cvsroot/mozilla/security/nss/lib/base/base.h,v  <--  base.h
new revision: 1.20; previous revision: 1.19
done
Checking in base/error.c;
/cvsroot/mozilla/security/nss/lib/base/error.c,v  <--  error.c
new revision: 1.8; previous revision: 1.7
done
Checking in nss/nssinit.c;
/cvsroot/mozilla/security/nss/lib/nss/nssinit.c,v  <--  nssinit.c
new revision: 1.95; previous revision: 1.94
done
Comment 8 Wan-Teh Chang 2008-05-09 20:00:03 PDT
Created attachment 320324 [details] [diff] [review]
patch v4

Initialize the thread-private data index to -1 (an invalid
index), so that we can reliably detect whether the index is
valid.
Comment 9 Wan-Teh Chang 2008-05-09 20:12:16 PDT
Created attachment 320327 [details] [diff] [review]
patch v4.1

Improved the comment a bit.
Comment 10 Nelson Bolyard (seldom reads bugmail) 2008-05-10 22:11:33 PDT
Comment on attachment 320327 [details] [diff] [review]
patch v4.1


>+ * Thread-private data indexes are in the range [0, 127], so
>+ * -1 is an invalid index.
>  */
> 
>-static PRUintn error_stack_index;
>+static PRIntn error_stack_index = -1;

Wan-Teh, what would you think about leaving error_stack_index 
as a PRUintn and using MAX_UINT instead of -1 ?
Comment 11 Wan-Teh Chang 2008-05-12 18:30:39 PDT
I can do that.  I use -1 because it is a simple constant
commonly used to indicate an error/invalid value.  If I
use MAX_UINT, I'll need to define a macro:

#define INVALID_TPD_INDEX MAX_UINT

and use INVALID_TPD_INDEX.  Is it the (PRUintn *) typecast
in my patch that bothers you?  I can avoid that typecast
like this:

    PRUintn index;
    PRStatus rv = PR_NewThreadPrivateIndex(&index, PR_Free);
    if (rv == PR_SUCCESS)
        error_stack_index = index;
    return rv;

Please let me which you prefer.  Thanks!
Comment 12 Nelson Bolyard (seldom reads bugmail) 2008-05-12 18:51:18 PDT
That cast is one issue, and your proposed alternative solution in 
comment 11 is also an acceptable solution to that issue.  

The other issue is this new statement in nss_DestroyErrorStack:

>+    PR_SetThreadPrivate(error_stack_index, NULL);

I think that will generate a warning, since the function expects unsigned
and the argument is signed. 

I suggest that nss_DestroyErrorStack should also set error_stack_index
back to the "invalid" value.

If you keep -1 as the invalid value, I'd suggest changing the "!= -1" tests 
to ">= 0" tests instead.  The thought of passing ANY negative numbers to that
function disturbs me.  I think I prefer the INVALID_TPD_INDEX approach.
Comment 13 Nelson Bolyard (seldom reads bugmail) 2008-05-15 20:59:26 PDT
Comment on attachment 320327 [details] [diff] [review]
patch v4.1

Please see preceding bug comments for review comments.
Comment 14 Wan-Teh Chang 2008-05-16 19:37:51 PDT
Created attachment 321349 [details] [diff] [review]
patch v5

I implemented the INVALID_TPD_INDEX approach.  Regrettably
NSPR doesn't define PR_UINTN_MAX, so I have to use UINT_MAX
from the Standard C library header <limits.h>

I don't set error_stack_index back to INVALID_TPD_INDEX in
nss_DestroyErrorStack for two reasons:

1. This function only destroys the error stack of the calling
thread.  It is not a shutdown function for the error stack
component.

2. Even if this function were the shutdown function for the
error stack component, it would not make sense to reset
error_stack_index because there is no way to unregister an
NSPR TPD index.  If we ever reinitialize NSS, we can just
use the error_stack_index that was allocated in the previous
NSS session.
Comment 15 Nelson Bolyard (seldom reads bugmail) 2008-05-16 20:05:16 PDT
Comment on attachment 321349 [details] [diff] [review]
patch v5

r=nelson.  Thanks.
Comment 16 Wan-Teh Chang 2008-05-16 20:47:35 PDT
Patch v5 checked in (as a supplment to patch v3) on
the NSS trunk (NSS 3.12.1.)

Checking in base/error.c;
/cvsroot/mozilla/security/nss/lib/base/error.c,v  <--  error.c
new revision: 1.9; previous revision: 1.8
done
Checking in nss/nssinit.c;
/cvsroot/mozilla/security/nss/lib/nss/nssinit.c,v  <--  nssinit.c
new revision: 1.96; previous revision: 1.95
done

Note You need to log in before you can comment on or make changes to this bug.