Closed Bug 397486 Opened 17 years ago Closed 16 years ago

Session cache locks not freed on strsclnt shutdown.

Categories

(NSS :: Libraries, defect, P2)

defect

Tracking

(Not tracked)

RESOLVED FIXED
3.11.10

People

(Reporter: slavomir.katuscak+mozilla, Assigned: julien.pierre)

References

Details

(Keywords: memory-leak)

Attachments

(3 files, 4 obsolete files)

Client session cache lock not freed on strsclnt shutdown.

Block in use (biu):
Found block of size 88 bytes at address 0x80f0160 (2.16% of total)
At time of allocation, the call stack was:
    [1] calloc() at 0xb4d308a0
    [2] PR_Calloc() at line 475 in "prmem.c"
    [3] PR_NewLock() at line 174 in "ptsynch.c"
    [4] __nss_InitLock() at line 67 in "nsslocks.c"
    [5] ssl_InitClientSessionCacheLock() at line 74 in "sslnonce.c"
    [6] lock_cache() at line 81 in "sslnonce.c"
    [7] ssl_LookupSID() at line 166 in "sslnonce.c"
    [8] ssl2_BeginClientHandshake() at line 3060 in "sslcon.c"
    [9] ssl_Do1stHandshake() at line 151 in "sslsecur.c"
    [10] ssl_SecureSend() at line 1152 in "sslsecur.c"
    [11] ssl_Send() at line 1432 in "sslsock.c"
    [12] PR_Send() at line 226 in "priometh.c"
    [13] handle_connection() at line 696 in "strsclnt.c"
    [14] do_connects() at line 887 in "strsclnt.c"
    [15] thread_wrapper() at line 439 in "strsclnt.c"
    [16] _pt_root() at line 221 in "ptthread.c"
Component: Test → Libraries
Priority: -- → P2
Target Milestone: --- → 3.12
Assignee: nobody → nelson
Attached patch patch v1 (obsolete) — Splinter Review
also eliminates a warning.
Attachment #282920 - Flags: review?(alexei.volkov.bugs)
Slavo, This bug report contains only a single stack, but there are 4 stacks
in the ignored file.  For each and every memory leak bug, please ensure that 
a) all the stacks in the ignored file also appear in the bug report
b) there are no stacks dupliated (same stack appears in multiple places in 
the ignored file).
Nelson, these stacks differs only in small details dependent on architecture, here are the next:

Stack from Solaris - DBG -> in description

Stack from Solaris/Sparc - OPT:

Block in use (biu):
Found block of size 88 bytes at address 0x86b40 (2.44% of total)
At time of allocation, the call stack was:
    [1] calloc() at 0xde141028
    [2] PR_NewLock() at 0xe592952c
    [3] __nss_InitLock() at 0xe956af18
    [4] ssl_LookupSID() at 0xebd1a658
    [5] ssl2_BeginClientHandshake() at 0xebd18068
    [6] ssl_Do1stHandshake() at 0xebd1ae54
    [7] ssl_SecureSend() at 0xebd1cd0c
    [8] ssl_Send() at 0xebd22e00
    [9] do_connects() at 0x167d4
    [10] thread_wrapper() at 0x14e78
    [11] _pt_root() at 0xe5930bf8
    [12] _lwp_start() at 0xf35c04f4

Stack from Solaris/X86 - OPT:

Block in use (biu):
Found block of size 88 bytes at address 0x80cbe28 (2.66% of total)
At time of allocation, the call stack was:
    [1] calloc() at 0xa1531853
    [2] PR_Calloc() at 0xa8d12576
    [3] PR_NewLock() at 0xa8d23bf5
    [4] __nss_InitLock() at 0xac96f53e
    [5] ssl_LookupSID() at 0xaf11ce57
    [6] ssl2_BeginClientHandshake() at 0xaf11a352
    [7] ssl_Do1stHandshake() at 0xaf11d67f
    [8] ssl_SecureSend() at 0xaf11f6fd
    [9] ssl_Send() at 0xaf124b84
    [10] PR_Send() at 0xa8d0a676
    [11] do_connects() at 0x8056652
    [12] thread_wrapper() at 0x805517c
    [13] _pt_root() at 0xa8d2b365
    [14] _thr_setup() at 0xa159fd36
    [15] _lwp_start() at 0xa15a0020

Stacks from Linux - DBG:
==30263== 88 bytes in 1 blocks are still reachable in loss record 51 of 117
==30263==    at 0x43B96BF: calloc (vg_replace_malloc.c:279)
==30263==    by 0x44FE6EC: PR_Calloc (prmem.c:474)
==30263==    by 0x4510193: PR_NewLock (ptsynch.c:174)
==30263==    by 0x61CB71A: __nss_InitLock (nsslocks.c:67)
==30263==    by 0x43F1150: ssl_InitClientSessionCacheLock (sslnonce.c:74)
==30263==    by 0x43F116D: lock_cache (sslnonce.c:80)
==30263==    by 0x43F1366: ssl_LookupSID (sslnonce.c:165)
==30263==    by 0x43EDE56: ssl2_BeginClientHandshake (sslcon.c:3059)
==30263==    by 0x43F1D9F: ssl_Do1stHandshake (sslsecur.c:151)
==30263==    by 0x43F3EE7: ssl_SecureSend (sslsecur.c:1152)
==30263==    by 0x43F9E0C: ssl_Send (sslsock.c:1432)
==30263==    by 0x44F3D70: PR_Send (priometh.c:226)
==30263==    by 0x804CBAF: handle_connection (strsclnt.c:696)
==30263==    by 0x804D262: do_connects (strsclnt.c:887)
==30263==    by 0x804C434: thread_wrapper (strsclnt.c:439)
==30263==    by 0x451854D: _pt_root (ptthread.c:221)
==30263==    by 0x55D370: start_thread (in /lib/tls/libpthread-2.3.4.so)
==30263==    by 0x4BDFFD: clone (in /lib/tls/libc-2.3.4.so)

I don't understand the reason of patch from comment 3. 
Why to remove stacks of these bug from ignored file ? Because of missing stacks from this comment ?
And why to remove stack of bug 397478 ?
Comment on attachment 282920 [details] [diff] [review]
patch v1

Since we consider SSL_ClearSessionCache to be thread save(derives from use of LOCK_CACHE/UNLOCK_CACHE), we need to take care of destruction of cacheLock in thread save manner(like the one that is used in nss_InitLock).
Attachment #282920 - Flags: review?(alexei.volkov.bugs) → review-
Comment on attachment 282923 [details] [diff] [review]
patch to file of ignored leak stacks (checked in)

r=alexei
Attachment #282923 - Flags: review?(alexei.volkov.bugs) → review+
In reply to comment 4:
> I don't understand the reason of patch from comment 3. 
> Why to remove stacks of these bug from ignored file ?

That patch removes the stacks that (I predict) will be fixed (eliminated)
by the first patch.

In reply to comment 5:
As you know, there is no thread safe way to destroy a lock.  The code that
destroys a lock must be called in a circumstance in which the caller knows
that there is no longer any remaining use of that lock, such as at Shutdown
time.  

Having said that, your review comment made me wonder if SSL_ClearSessionCache 
is only called during shutdown.  Perhaps the browser calls it at other times. 
So, I will create a new cache destruction function and register it to be
called by NSS_Shutdown.  Thanks for the review.
Attachment #282920 - Attachment is obsolete: true
Attached patch patch v2 (backed out) (obsolete) — Splinter Review
This patch attempts to ensure that the client cache lock only gets freed 
when NSS is shutdown, which presumably is a safe time to free it.
Attachment #303224 - Flags: review?(alexei.volkov.bugs)
Comment on attachment 303224 [details] [diff] [review]
patch v2 (backed out)

r=alexei
Attachment #303224 - Flags: review?(alexei.volkov.bugs) → review+
Comment on attachment 303224 [details] [diff] [review]
patch v2 (backed out)

Checking in sslnonce.c; new revision: 1.18; previous revision: 1.17
Attachment #303224 - Attachment description: patch v2 → patch v2 (checked in)
Comment on attachment 282923 [details] [diff] [review]
patch to file of ignored leak stacks (checked in)

Checking in ignored; new revision: 1.58; previous revision: 1.57
Attachment #282923 - Attachment description: patch to file of ignored leak stacks → patch to file of ignored leak stacks (checked in)
I figured out the problem as we were on the way to dinner.  
The problem is that in some cases, the server cache is initialized
before NSS is initialized.  When the server session cache is initialized,
it also initializes the lock for the client session cache.  The new code
for initializing and destroying the client session cache lock wants to 
register a shutdown callback, but the code to register a shutdown 
callback fails if NSS is not initialized.  

I think MAYBE that's a bug/mistake.  Maybe registering a shutdown callback
handler should work OK even if NSS has not yet been initialized.  

I know why my testing didn't catch this.  I didn't test the multi-process
server configuration, which triggers this bug, because I forgot that the
client session cache lock is also used by the server session cache.  

So, it's back to the drawing board for this bug.  Maybe in the end I'll
recommit the patches committed above, along with a change to the function
that registers shutdown callbacks.  Or maybe I'll go to plan B.
Nelson,

In patchv2, ssl_InitClientSessionCacheLock uses PR_CallOnce incorrectly. There shouldn't be an

if (!cacheLock)

before the call to PR_CallOnce .

I think many, if not most servers initialize the session cache before calling NSS_Initialize, whether they are single-process or multi-process. Currently, our QA only tests the single-process case. But it is still done before in selfserv.
So, we cannot rely on an NSS shutdown function to free the locks in the server case. This works for the client case, however. I think we have to treat the two cases separately :
1) for servers, free all the SSL locks at the same time as the SSL session cache
2) for clients, free all the SSL locks in an NSS shutdown callback

This requires knowing which case you are in before trying to register the callback.

I strongly suggest that we fix this bug after 403240 . My patch there has a global variable called LocksInitializedEarly, which you can simply test to decide whether or not to register the shutdown callback to free the locks.

Any server that created the locks early during session cache init will be expected to call the proper API to free the SSL session cache.

Any client (apps that didn't create an SSL server session cache) will have the locks freed at NSS_Shutdown time.
Assignee: nelson → julien.pierre.boogz
Depends on: 403240
Target Milestone: 3.12 → 3.11.10
Attachment #303224 - Attachment description: patch v2 (checked in) → patch v2 (backed out)
Attachment #303224 - Attachment is obsolete: true
I am updating the description, since both the client and server and client cache locks are now always being created, and thus leaked in the client case.
Attachment 304633 [details] [diff] takes care of freeing both locks in either the client or server case.
Summary: Client session cache lock not freed on strsclnt shutdown. → Session cache locks not freed on strsclnt shutdown.
OS: Solaris → All
Hardware: Sun → All
Attachment #304633 - Flags: review?(nelson)
Comment on attachment 304633 [details] [diff] [review]
Free locks, differently depending on server or client case

If you don't care about the return value of ssl_FreeLocks()
and ssl_ShutdownLocks(), you can combine them into one
function and just use NSS_RegisterShutdown to free the
locks.  The early init code already makes the init code more
complex, we shouldn't add more complexity to the shutdown
code.
Wan-Teh,

I cannot combine ssl_FreeLocks and ssl_ShutdownLocks . The early init case usually happens before NSS is initialized, and thus I cannot call NSS_RegisterShutdown - it will fail. That is the problem that Nelson ran into with attachment 303224 [details] [diff] [review] .
Comment on attachment 304633 [details] [diff] [review]
Free locks, differently depending on server or client case

1. The new lock handling functions all have ambiguous names.
Please insert the words "SessionCache" into the names of all these 
functions that initialize locks for both caches.  LibSSL contains 
many locks for many purposes, and I want the function names to be 
unambiguous.  So please change function names as follows:

initLocks	-> initSessionCacheLocks	
freeLocks	-> freeSessionCacheLocks	
ssl_FreeLocks	-> ssl_FreeSessionCacheLocks	
initLocksLate	-> initSessionCacheLocksLate	
ssl_InitLocks	-> ssl_InitSessionCacheLocks	

I will use these names in the comments below.

2. In bug 403240 comment 36, Wan-Teh suggested that you add a comment to 
ssl_InitSessionCacheLocks describing the argument to that function.  
He suggested using something like this:

"lateInit means that the call is not happening during a 1-time 
initialization function but rather during dynamic, lazy initialization".

I like that use of the word "lazy", and think it is more accurate than "late",
so I suggest you rename that argument "lazyInit".  I might also suggest that
you rename initSessionCacheLocksLate to something like 
lazyInitSessionCacheLocks.

3. Several functions that can return SECFailure have multiple paths 
that can do so.  Some of those paths set an error code.  Others do not.
Please ensure that ALL error paths set an error code.  The functions
that do this are:

a) ssl_FreeSymWrapKeysLock 
b) ssl_FreeClientSessionCacheLock 
c) ssl_FreeSessionCacheLocks 

4. function initSessionCacheLocks calls two other lock initialization functions.
It is possible that one will succeed but the other will fail.  In that case,
the lock that was successfully created will be leaked.  

I propose a simple solution.  At the end of initSessionCacheLocks, before
returning SECFailure, call freeSessionCacheLocks() (but ignore its result).  
I also suggest that you save the error code before calling 
freeSessionCacheLocks() and restore it afterwords, so that the error code 
that is in place when initSessionCacheLocks returns will be the initialization 
error and not some error code that occurs while trying to destroy the locks.

Otherwise this patch looks good. :)
Attachment #304633 - Flags: review?(nelson) → review-
I agree that a good term to describe the two types of initialization
will help someone to understand the code quicker.  "Dynamic initialization"
comes from the pthread_once man page.  Like Nelson, I like "lazy init",
which is why I used "dynamic, lazy init".

I was able to follow Julien's patch quickly in spite of the function name
issue because all the functions are short and right next to each other.
Better names, or a comment describing how these functions are related,
would certainly help.

Julien, it's not clear why you want these functions to return a status.
It seems that you want to detect calling these free functions twice.
If you allow these functions to be called more than once, these functions
can be made much shorter.
Attached patch Update (obsolete) — Splinter Review
Nelson,

Re: comment 18,

1) ssl_initLocks was added in a previous patch, and you didn't object to the naming when it was checked in :). But I don't object to the renaming, so it is in this patch.

2) I had already included Wan-Teh's  comment. I renamed the argument too.
I think these terms designate the same thing. I don't care much one way or the other.

3) I have added the PORT_SetError in this patch.

4) I have plugged that leak. But I hope that's not a real-world error case (running out of memory half way through the lock creation).

Wan-Teh,

Re: comment 19,

I added a couple of comments in this patch. Combined with the function renaming, I think this should be sufficient to understand the code.

I don't want to allow the functions to be called twice because this would go against the whole premise of the patch. I especially care about the early init case for servers and the subsequent code path being fast (not having to run through a lock or memory barrier to create the locks). If we run into problems with this code again, I think we'll be glad there are some failure conditions that we can check for when debugging.
Attachment #304633 - Attachment is obsolete: true
Attachment #304888 - Flags: superreview?(wtc)
Attachment #304888 - Flags: review?(nelson)
Comment on attachment 304888 [details] [diff] [review]
Update

Thanks, Julien.  Looks good.
Attachment #304888 - Flags: review?(nelson) → review+
Comment on attachment 304888 [details] [diff] [review]
Update

Function names should begin with a capital
letter, except for the prefix.  For example,

SSL_FooBar
ssl_FooBar
FooBar

Names beginning with a lowercase letter are
usually variables. For example,

fooBar

This naming convention is quite clear in
sslsnce.c, although there are a few exceptions.

I'll review the patch more carefully tomorrow.
Once upon a time, the convention was:
- public function names began with a capitalized prefix and an underscore, 
and the first letter after the underscore was capitalized, e.g.,  SSL_Foo
- public variable names began with a capitalized prefix but no underscore, 
e.g., SSLFoo
- non-public extern function names began with lower case prefix and underscore
and the first character after the underscore was capitalized.  ssl_Foo
- non-public extern variable names began with lower case prefix but no 
underscore, e.g., sslFoo
- static functions and variables typically have no prefix and the first character
is lower case.  
Comment on attachment 304888 [details] [diff] [review]
Update

Julien, the problem with not allowing these free
functions to be called twice is that it is different from what NSS has been doing.  This
inconsistency is evident in ssl3con.c.  Right
below the new ssl_FreeSymWrapKeysLock function
you added, we see SSL3_ShutdownServerCache return
SECSuccess if the cache wasn't initialized.  In
contrast, your new function fails with the
SEC_ERROR_NOT_INITIALIZED error under the same
condition.  When a new NSS developer or external
contributor needs to add a new free function in
the future, he won't know which convention to
follow.

Rename (capitalize after ssl_) the following
functions:

ssl_freeSessionCacheLocks => ssl_FreeSessionCacheLocks

ssl_initSessionCacheLocks => ssl_InitSessionCacheLocks

I would also capitalize freeSessionCacheLocks
and initSessionCacheLocks to follow the prevalent
style in sslnonce.c.

In sslnonce.c, list "nss.h" next to the other
NSS headers rather than below the system header
<time.h>.

freeSessionCacheLocks should be declared static.

It seems that ssl_ShutdownLocks should set the
SEC_ERROR_LIBRARY_FAILURE error if it fails.

Keep lockOnce and initSessionCacheLocksLazily
next to each other because they're used together
as arguments to PR_CallOnce.

Move initSessionCacheLocks down to right above
initSessionCacheLocksLazily so that all the free
functions are together and all the init functions
are together.

The comment I asked for is one that describes
the relations between the free functions, giving
an overview of your plan of attack  -- a core
free function that does the real work, a free
function for the early init case, and a free
function (NSS shutdown handler) for the lazy
init case.  But what you did is fine.  Since
these functions are short and next to each
other, it's easy to figure out what the overall
plan of attack is.
Attachment #304888 - Flags: superreview?(wtc) → superreview+
Nelson, Wan-Teh,

Thanks for the reviews.

I made all the changes from comment 24 except :

- I didn't keep lockOnce and initSessionCacheLocksLazily together . That's because ssl_shutdownLocks depends on lockOnce, and initSessionCacheLocksLazily depends on ssl_shutdownLocks . To change the order would have meant an extraneous foward function declaration for ssl_shutdownLocks . I think that was more confusing. I feel the forward declaration should only be necessary when there is a mutual dependency, which isn't the case here.

- I also didn't move initSessionCacheLocks up, because it depends on initSessionCacheLocksLazily, and that meant the later would have to be forward-declared.

I checked this in on the trunk :

Checking in ssl3con.c;
/cvsroot/mozilla/security/nss/lib/ssl/ssl3con.c,v  <--  ssl3con.c
new revision: 1.108; previous revision: 1.107
done
Checking in sslimpl.h;
/cvsroot/mozilla/security/nss/lib/ssl/sslimpl.h,v  <--  sslimpl.h
new revision: 1.64; previous revision: 1.63
done
Checking in sslnonce.c;
/cvsroot/mozilla/security/nss/lib/ssl/sslnonce.c,v  <--  sslnonce.c
new revision: 1.23; previous revision: 1.22
done
Checking in sslsnce.c;
/cvsroot/mozilla/security/nss/lib/ssl/sslsnce.c,v  <--  sslsnce.c
new revision: 1.43; previous revision: 1.42
done

And to NSS_3_11_BRANCH :

Checking in ssl3con.c;
/cvsroot/mozilla/security/nss/lib/ssl/ssl3con.c,v  <--  ssl3con.c
new revision: 1.76.2.24; previous revision: 1.76.2.23
done
Checking in sslimpl.h;
/cvsroot/mozilla/security/nss/lib/ssl/sslimpl.h,v  <--  sslimpl.h
new revision: 1.42.2.16; previous revision: 1.42.2.15
done
Checking in sslnonce.c;
/cvsroot/mozilla/security/nss/lib/ssl/sslnonce.c,v  <--  sslnonce.c
new revision: 1.17.2.2; previous revision: 1.17.2.1
done
Checking in sslsnce.c;
/cvsroot/mozilla/security/nss/lib/ssl/sslsnce.c,v  <--  sslsnce.c
new revision: 1.36.2.6; previous revision: 1.36.2.5
done
Attachment #304888 - Attachment is obsolete: true
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Julien, this is what I meant.  I also capitalized
initSessionCacheLocksLazily.

Feel free to ignore me.

Good work!
I updated the ignored leak stack list on both the trunk and branch .

Checking in ignored;
/cvsroot/mozilla/security/nss/tests/memleak/ignored,v  <--  ignored
new revision: 1.1.2.24; previous revision: 1.1.2.23
done

Checking in ignored;
/cvsroot/mozilla/security/nss/tests/memleak/ignored,v  <--  ignored
new revision: 1.64; previous revision: 1.63
done
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: