Closed Bug 397486 Opened 14 years ago Closed 14 years ago
Session cache locks not freed on strsclnt shutdown
2.15 KB, patch
|Details | Diff | Splinter Review|
6.74 KB, patch
|Details | Diff | Splinter Review|
1.67 KB, patch
|Details | Diff | Splinter Review|
Client session cache lock not freed on strsclnt shutdown. Block in use (biu): Found block of size 88 bytes at address 0x80f0160 (2.16% of total) At time of allocation, the call stack was:  calloc() at 0xb4d308a0  PR_Calloc() at line 475 in "prmem.c"  PR_NewLock() at line 174 in "ptsynch.c"  __nss_InitLock() at line 67 in "nsslocks.c"  ssl_InitClientSessionCacheLock() at line 74 in "sslnonce.c"  lock_cache() at line 81 in "sslnonce.c"  ssl_LookupSID() at line 166 in "sslnonce.c"  ssl2_BeginClientHandshake() at line 3060 in "sslcon.c"  ssl_Do1stHandshake() at line 151 in "sslsecur.c"  ssl_SecureSend() at line 1152 in "sslsecur.c"  ssl_Send() at line 1432 in "sslsock.c"  PR_Send() at line 226 in "priometh.c"  handle_connection() at line 696 in "strsclnt.c"  do_connects() at line 887 in "strsclnt.c"  thread_wrapper() at line 439 in "strsclnt.c"  _pt_root() at line 221 in "ptthread.c"
Component: Test → Libraries
Priority: -- → P2
Target Milestone: --- → 3.12
also eliminates a warning.
Attachment #282920 - Flags: review?(alexei.volkov.bugs)
Slavo, This bug report contains only a single stack, but there are 4 stacks in the ignored file. For each and every memory leak bug, please ensure that a) all the stacks in the ignored file also appear in the bug report b) there are no stacks dupliated (same stack appears in multiple places in the ignored file).
Nelson, these stacks differs only in small details dependent on architecture, here are the next: Stack from Solaris - DBG -> in description Stack from Solaris/Sparc - OPT: Block in use (biu): Found block of size 88 bytes at address 0x86b40 (2.44% of total) At time of allocation, the call stack was:  calloc() at 0xde141028  PR_NewLock() at 0xe592952c  __nss_InitLock() at 0xe956af18  ssl_LookupSID() at 0xebd1a658  ssl2_BeginClientHandshake() at 0xebd18068  ssl_Do1stHandshake() at 0xebd1ae54  ssl_SecureSend() at 0xebd1cd0c  ssl_Send() at 0xebd22e00  do_connects() at 0x167d4  thread_wrapper() at 0x14e78  _pt_root() at 0xe5930bf8  _lwp_start() at 0xf35c04f4 Stack from Solaris/X86 - OPT: Block in use (biu): Found block of size 88 bytes at address 0x80cbe28 (2.66% of total) At time of allocation, the call stack was:  calloc() at 0xa1531853  PR_Calloc() at 0xa8d12576  PR_NewLock() at 0xa8d23bf5  __nss_InitLock() at 0xac96f53e  ssl_LookupSID() at 0xaf11ce57  ssl2_BeginClientHandshake() at 0xaf11a352  ssl_Do1stHandshake() at 0xaf11d67f  ssl_SecureSend() at 0xaf11f6fd  ssl_Send() at 0xaf124b84  PR_Send() at 0xa8d0a676  do_connects() at 0x8056652  thread_wrapper() at 0x805517c  _pt_root() at 0xa8d2b365  _thr_setup() at 0xa159fd36  _lwp_start() at 0xa15a0020 Stacks from Linux - DBG: ==30263== 88 bytes in 1 blocks are still reachable in loss record 51 of 117 ==30263== at 0x43B96BF: calloc (vg_replace_malloc.c:279) ==30263== by 0x44FE6EC: PR_Calloc (prmem.c:474) ==30263== by 0x4510193: PR_NewLock (ptsynch.c:174) ==30263== by 0x61CB71A: __nss_InitLock (nsslocks.c:67) ==30263== by 0x43F1150: ssl_InitClientSessionCacheLock (sslnonce.c:74) ==30263== by 0x43F116D: lock_cache (sslnonce.c:80) ==30263== by 0x43F1366: ssl_LookupSID (sslnonce.c:165) ==30263== by 0x43EDE56: ssl2_BeginClientHandshake (sslcon.c:3059) ==30263== by 0x43F1D9F: ssl_Do1stHandshake (sslsecur.c:151) ==30263== by 0x43F3EE7: ssl_SecureSend (sslsecur.c:1152) ==30263== by 0x43F9E0C: ssl_Send (sslsock.c:1432) ==30263== by 0x44F3D70: PR_Send (priometh.c:226) ==30263== by 0x804CBAF: handle_connection (strsclnt.c:696) ==30263== by 0x804D262: do_connects (strsclnt.c:887) ==30263== by 0x804C434: thread_wrapper (strsclnt.c:439) ==30263== by 0x451854D: _pt_root (ptthread.c:221) ==30263== by 0x55D370: start_thread (in /lib/tls/libpthread-2.3.4.so) ==30263== by 0x4BDFFD: clone (in /lib/tls/libc-2.3.4.so) I don't understand the reason of patch from comment 3. Why to remove stacks of these bug from ignored file ? Because of missing stacks from this comment ? And why to remove stack of bug 397478 ?
Comment on attachment 282920 [details] [diff] [review] patch v1 Since we consider SSL_ClearSessionCache to be thread save(derives from use of LOCK_CACHE/UNLOCK_CACHE), we need to take care of destruction of cacheLock in thread save manner(like the one that is used in nss_InitLock).
Attachment #282920 - Flags: review?(alexei.volkov.bugs) → review-
Comment on attachment 282923 [details] [diff] [review] patch to file of ignored leak stacks (checked in) r=alexei
Attachment #282923 - Flags: review?(alexei.volkov.bugs) → review+
In reply to comment 4: > I don't understand the reason of patch from comment 3. > Why to remove stacks of these bug from ignored file ? That patch removes the stacks that (I predict) will be fixed (eliminated) by the first patch. In reply to comment 5: As you know, there is no thread safe way to destroy a lock. The code that destroys a lock must be called in a circumstance in which the caller knows that there is no longer any remaining use of that lock, such as at Shutdown time. Having said that, your review comment made me wonder if SSL_ClearSessionCache is only called during shutdown. Perhaps the browser calls it at other times. So, I will create a new cache destruction function and register it to be called by NSS_Shutdown. Thanks for the review.
Attachment #282920 - Attachment is obsolete: true
This patch attempts to ensure that the client cache lock only gets freed when NSS is shutdown, which presumably is a safe time to free it.
Attachment #303224 - Flags: review?(alexei.volkov.bugs)
Comment on attachment 303224 [details] [diff] [review] patch v2 (backed out) r=alexei
Attachment #303224 - Flags: review?(alexei.volkov.bugs) → review+
Comment on attachment 303224 [details] [diff] [review] patch v2 (backed out) Checking in sslnonce.c; new revision: 1.18; previous revision: 1.17
Attachment #303224 - Attachment description: patch v2 → patch v2 (checked in)
Comment on attachment 282923 [details] [diff] [review] patch to file of ignored leak stacks (checked in) Checking in ignored; new revision: 1.58; previous revision: 1.57
Attachment #282923 - Attachment description: patch to file of ignored leak stacks → patch to file of ignored leak stacks (checked in)
I figured out the problem as we were on the way to dinner. The problem is that in some cases, the server cache is initialized before NSS is initialized. When the server session cache is initialized, it also initializes the lock for the client session cache. The new code for initializing and destroying the client session cache lock wants to register a shutdown callback, but the code to register a shutdown callback fails if NSS is not initialized. I think MAYBE that's a bug/mistake. Maybe registering a shutdown callback handler should work OK even if NSS has not yet been initialized. I know why my testing didn't catch this. I didn't test the multi-process server configuration, which triggers this bug, because I forgot that the client session cache lock is also used by the server session cache. So, it's back to the drawing board for this bug. Maybe in the end I'll recommit the patches committed above, along with a change to the function that registers shutdown callbacks. Or maybe I'll go to plan B.
Nelson, In patchv2, ssl_InitClientSessionCacheLock uses PR_CallOnce incorrectly. There shouldn't be an if (!cacheLock) before the call to PR_CallOnce . I think many, if not most servers initialize the session cache before calling NSS_Initialize, whether they are single-process or multi-process. Currently, our QA only tests the single-process case. But it is still done before in selfserv. So, we cannot rely on an NSS shutdown function to free the locks in the server case. This works for the client case, however. I think we have to treat the two cases separately : 1) for servers, free all the SSL locks at the same time as the SSL session cache 2) for clients, free all the SSL locks in an NSS shutdown callback This requires knowing which case you are in before trying to register the callback. I strongly suggest that we fix this bug after 403240 . My patch there has a global variable called LocksInitializedEarly, which you can simply test to decide whether or not to register the shutdown callback to free the locks. Any server that created the locks early during session cache init will be expected to call the proper API to free the SSL session cache. Any client (apps that didn't create an SSL server session cache) will have the locks freed at NSS_Shutdown time.
Assignee: nelson → julien.pierre.boogz
Depends on: 403240
I am updating the description, since both the client and server and client cache locks are now always being created, and thus leaked in the client case. Attachment 304633 [details] [diff] takes care of freeing both locks in either the client or server case.
Summary: Client session cache lock not freed on strsclnt shutdown. → Session cache locks not freed on strsclnt shutdown.
Comment on attachment 304633 [details] [diff] [review] Free locks, differently depending on server or client case If you don't care about the return value of ssl_FreeLocks() and ssl_ShutdownLocks(), you can combine them into one function and just use NSS_RegisterShutdown to free the locks. The early init code already makes the init code more complex, we shouldn't add more complexity to the shutdown code.
Wan-Teh, I cannot combine ssl_FreeLocks and ssl_ShutdownLocks . The early init case usually happens before NSS is initialized, and thus I cannot call NSS_RegisterShutdown - it will fail. That is the problem that Nelson ran into with attachment 303224 [details] [diff] [review] .
Comment on attachment 304633 [details] [diff] [review] Free locks, differently depending on server or client case 1. The new lock handling functions all have ambiguous names. Please insert the words "SessionCache" into the names of all these functions that initialize locks for both caches. LibSSL contains many locks for many purposes, and I want the function names to be unambiguous. So please change function names as follows: initLocks -> initSessionCacheLocks freeLocks -> freeSessionCacheLocks ssl_FreeLocks -> ssl_FreeSessionCacheLocks initLocksLate -> initSessionCacheLocksLate ssl_InitLocks -> ssl_InitSessionCacheLocks I will use these names in the comments below. 2. In bug 403240 comment 36, Wan-Teh suggested that you add a comment to ssl_InitSessionCacheLocks describing the argument to that function. He suggested using something like this: "lateInit means that the call is not happening during a 1-time initialization function but rather during dynamic, lazy initialization". I like that use of the word "lazy", and think it is more accurate than "late", so I suggest you rename that argument "lazyInit". I might also suggest that you rename initSessionCacheLocksLate to something like lazyInitSessionCacheLocks. 3. Several functions that can return SECFailure have multiple paths that can do so. Some of those paths set an error code. Others do not. Please ensure that ALL error paths set an error code. The functions that do this are: a) ssl_FreeSymWrapKeysLock b) ssl_FreeClientSessionCacheLock c) ssl_FreeSessionCacheLocks 4. function initSessionCacheLocks calls two other lock initialization functions. It is possible that one will succeed but the other will fail. In that case, the lock that was successfully created will be leaked. I propose a simple solution. At the end of initSessionCacheLocks, before returning SECFailure, call freeSessionCacheLocks() (but ignore its result). I also suggest that you save the error code before calling freeSessionCacheLocks() and restore it afterwords, so that the error code that is in place when initSessionCacheLocks returns will be the initialization error and not some error code that occurs while trying to destroy the locks. Otherwise this patch looks good. :)
Attachment #304633 - Flags: review?(nelson) → review-
I agree that a good term to describe the two types of initialization will help someone to understand the code quicker. "Dynamic initialization" comes from the pthread_once man page. Like Nelson, I like "lazy init", which is why I used "dynamic, lazy init". I was able to follow Julien's patch quickly in spite of the function name issue because all the functions are short and right next to each other. Better names, or a comment describing how these functions are related, would certainly help. Julien, it's not clear why you want these functions to return a status. It seems that you want to detect calling these free functions twice. If you allow these functions to be called more than once, these functions can be made much shorter.
Nelson, Re: comment 18, 1) ssl_initLocks was added in a previous patch, and you didn't object to the naming when it was checked in :). But I don't object to the renaming, so it is in this patch. 2) I had already included Wan-Teh's comment. I renamed the argument too. I think these terms designate the same thing. I don't care much one way or the other. 3) I have added the PORT_SetError in this patch. 4) I have plugged that leak. But I hope that's not a real-world error case (running out of memory half way through the lock creation). Wan-Teh, Re: comment 19, I added a couple of comments in this patch. Combined with the function renaming, I think this should be sufficient to understand the code. I don't want to allow the functions to be called twice because this would go against the whole premise of the patch. I especially care about the early init case for servers and the subsequent code path being fast (not having to run through a lock or memory barrier to create the locks). If we run into problems with this code again, I think we'll be glad there are some failure conditions that we can check for when debugging.
Comment on attachment 304888 [details] [diff] [review] Update Thanks, Julien. Looks good.
Attachment #304888 - Flags: review?(nelson) → review+
Comment on attachment 304888 [details] [diff] [review] Update Function names should begin with a capital letter, except for the prefix. For example, SSL_FooBar ssl_FooBar FooBar Names beginning with a lowercase letter are usually variables. For example, fooBar This naming convention is quite clear in sslsnce.c, although there are a few exceptions. I'll review the patch more carefully tomorrow.
Once upon a time, the convention was: - public function names began with a capitalized prefix and an underscore, and the first letter after the underscore was capitalized, e.g., SSL_Foo - public variable names began with a capitalized prefix but no underscore, e.g., SSLFoo - non-public extern function names began with lower case prefix and underscore and the first character after the underscore was capitalized. ssl_Foo - non-public extern variable names began with lower case prefix but no underscore, e.g., sslFoo - static functions and variables typically have no prefix and the first character is lower case.
Comment on attachment 304888 [details] [diff] [review] Update Julien, the problem with not allowing these free functions to be called twice is that it is different from what NSS has been doing. This inconsistency is evident in ssl3con.c. Right below the new ssl_FreeSymWrapKeysLock function you added, we see SSL3_ShutdownServerCache return SECSuccess if the cache wasn't initialized. In contrast, your new function fails with the SEC_ERROR_NOT_INITIALIZED error under the same condition. When a new NSS developer or external contributor needs to add a new free function in the future, he won't know which convention to follow. Rename (capitalize after ssl_) the following functions: ssl_freeSessionCacheLocks => ssl_FreeSessionCacheLocks ssl_initSessionCacheLocks => ssl_InitSessionCacheLocks I would also capitalize freeSessionCacheLocks and initSessionCacheLocks to follow the prevalent style in sslnonce.c. In sslnonce.c, list "nss.h" next to the other NSS headers rather than below the system header <time.h>. freeSessionCacheLocks should be declared static. It seems that ssl_ShutdownLocks should set the SEC_ERROR_LIBRARY_FAILURE error if it fails. Keep lockOnce and initSessionCacheLocksLazily next to each other because they're used together as arguments to PR_CallOnce. Move initSessionCacheLocks down to right above initSessionCacheLocksLazily so that all the free functions are together and all the init functions are together. The comment I asked for is one that describes the relations between the free functions, giving an overview of your plan of attack -- a core free function that does the real work, a free function for the early init case, and a free function (NSS shutdown handler) for the lazy init case. But what you did is fine. Since these functions are short and next to each other, it's easy to figure out what the overall plan of attack is.
Attachment #304888 - Flags: superreview?(wtc) → superreview+
Nelson, Wan-Teh, Thanks for the reviews. I made all the changes from comment 24 except : - I didn't keep lockOnce and initSessionCacheLocksLazily together . That's because ssl_shutdownLocks depends on lockOnce, and initSessionCacheLocksLazily depends on ssl_shutdownLocks . To change the order would have meant an extraneous foward function declaration for ssl_shutdownLocks . I think that was more confusing. I feel the forward declaration should only be necessary when there is a mutual dependency, which isn't the case here. - I also didn't move initSessionCacheLocks up, because it depends on initSessionCacheLocksLazily, and that meant the later would have to be forward-declared. I checked this in on the trunk : Checking in ssl3con.c; /cvsroot/mozilla/security/nss/lib/ssl/ssl3con.c,v <-- ssl3con.c new revision: 1.108; previous revision: 1.107 done Checking in sslimpl.h; /cvsroot/mozilla/security/nss/lib/ssl/sslimpl.h,v <-- sslimpl.h new revision: 1.64; previous revision: 1.63 done Checking in sslnonce.c; /cvsroot/mozilla/security/nss/lib/ssl/sslnonce.c,v <-- sslnonce.c new revision: 1.23; previous revision: 1.22 done Checking in sslsnce.c; /cvsroot/mozilla/security/nss/lib/ssl/sslsnce.c,v <-- sslsnce.c new revision: 1.43; previous revision: 1.42 done And to NSS_3_11_BRANCH : Checking in ssl3con.c; /cvsroot/mozilla/security/nss/lib/ssl/ssl3con.c,v <-- ssl3con.c new revision: 22.214.171.124; previous revision: 126.96.36.199 done Checking in sslimpl.h; /cvsroot/mozilla/security/nss/lib/ssl/sslimpl.h,v <-- sslimpl.h new revision: 188.8.131.52; previous revision: 184.108.40.206 done Checking in sslnonce.c; /cvsroot/mozilla/security/nss/lib/ssl/sslnonce.c,v <-- sslnonce.c new revision: 220.127.116.11; previous revision: 18.104.22.168 done Checking in sslsnce.c; /cvsroot/mozilla/security/nss/lib/ssl/sslsnce.c,v <-- sslsnce.c new revision: 22.214.171.124; previous revision: 126.96.36.199 done
Attachment #304888 - Attachment is obsolete: true
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Julien, this is what I meant. I also capitalized initSessionCacheLocksLazily. Feel free to ignore me. Good work!
I updated the ignored leak stack list on both the trunk and branch . Checking in ignored; /cvsroot/mozilla/security/nss/tests/memleak/ignored,v <-- ignored new revision: 188.8.131.52; previous revision: 184.108.40.206 done Checking in ignored; /cvsroot/mozilla/security/nss/tests/memleak/ignored,v <-- ignored new revision: 1.64; previous revision: 1.63 done
You need to log in before you can comment on or make changes to this bug.