Closed Bug 685592 Opened 13 years ago Closed 13 years ago

File descriptor leak after "service httpd reload"

Categories

(NSS :: Libraries, defect, P2)

3.12.10
x86_64
Linux
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: elio.maldonado.batiz, Assigned: rrelyea)

Details

Attachments

(1 file, 2 obsolete files)

As originally reported aginst RHEL 5

Pavel Moravec 2011-06-15 03:55:12 EDT

Description of problem:
When mod_nss of version >=1.0.8 installed, service httpd restart causes FD leak
- some descriptors for pipes are not properly closed.


Version-Release number of selected component (if applicable):
latest httpd + mod_nss-1.0.3-4.el5 - no issue
latest httpd + mod_nss-1.0.3-8.el5.x86_64 - No issue.
latest httpd + mod_nss-1.0.8-3.el5.x86_64 - Issue is seen
latest httpd + mod_nss-1.0.8-4.el5_6.1.x86_64  - Issue is seen


How reproducible:
100%

Steps to Reproduce:
1. install mod_nss (of version at least 1.0.8-3) and httpd
2. service httpd start
3. Check FD count for httpd processes
4. service httpd reload
5. Check FD count for httpd processes

Actual results:
Number of FD increased by 198 (e.g. from 1686 to 1884). Any repetition of step
4. causes the same increase.

Expected results:
FD leaks doesn't happen. 

Additional info:
Due to dependency of the FD leak on mod_nss version, the bug seems to be
introduced by rebasing mod_nss to 1.0.8 (#588858). nss_engine_init.c seems be
the core stuff, nss_init_Child, nss_init_ModuleKill and nss_init_ChildKill in
particular.
Additional info
he bug is in nss instead of mod_nss. service http reload causes
SSL_ShutdownServerSessionIDCache and SSL_ConfigMPServerSIDCache being called.
However SSL_ShutdownServerSessionIDCache and CloseCache called from it does not
destroy mutexes, as everInherited variable is set to true there (and it can be
set to PR_FALSE only in InitCache method, nowhere else).

Proposed patch (also attached):

# cat nss-3.12.8-fdleak.patch
diff -up nss-3.12.8/mozilla/security/nss/lib/ssl/sslsnce.c.fdleak
nss-3.12.8/mozilla/security/nss/lib/ssl/sslsnce.c
--- nss-3.12.8/mozilla/security/nss/lib/ssl/sslsnce.c.fdleak 2011-06-29
12:45:17.000000000 -0400
+++ nss-3.12.8/mozilla/security/nss/lib/ssl/sslsnce.c 2011-06-29
12:46:05.000000000 -0400
@@ -1379,6 +1379,7 @@ SSL_ConfigServerSessionIDCache( int     
 SECStatus
 SSL_ShutdownServerSessionIDCacheInstance(cacheDesc *cache)
 {
+    cache->sharedCache->everInherited=PR_FALSE;
     CloseCache(cache);
     return SECSuccess;
 }
#
This a naive patch attempt and will show a better one.
Bob Relyea 2011-08-22 19:35:38 EDT

So the entire purpose of the 'everInherited' flag is to prevent wanton
destruction of sslMutex locks by one process that closes out while other
processes continue to run. The patch basically defeats that purpose (see
comment at line 1030 of sslsnce.c).

That being said, sslMutex_Destroy on Unix platforms does the following:
   1) close our process's copies of the pipe used to do the locking.
   2) clears out the Mutex state in our copy of the mutex.

The down side here is what if Mutex is in shared memory -- which we see is the
case in line 1253 of sslsnce.c. When we clear out the state, we clear out the
state for all Mutexes. If a single process shuts down, then all the mutexes
will go way, so that means this patch, as is, is unacceptable.

We need a new function, sslMutex_Free() which frees this process's resources
associated with the mutex, but leaves the rest of the mutext alone. This can be
called when everInherited is set.
Created attachment 519371 [details]
Patch with does not clobber the mutex, but only frees process specific
resources.

This patch accomplishes the goals of the previous patches without breaking
certain instances where a server may shutdown only one of it's processes and
keep the other processes running.

---------
I'll attach Bob's patch next.
This Bob Relyea's patch for nss 3.12.10 code.
Attachment #559214 - Flags: review?(nelson)
Comment on attachment 559214 [details] [diff] [review]
Patch with does not clobber the mutex, but only frees process specific resources.

Adding Wan-Teh in case Nelson is tied up at the moment.
Attachment #559214 - Flags: superreview?(wtc)
Assignee: nobody → emaldona
Priority: -- → P2
Target Milestone: --- → 3.13
Comment on attachment 559214 [details] [diff] [review]
Patch with does not clobber the mutex, but only frees process specific resources.

Elio: thanks for the patch.  I suggest some changes.

1. sslmutex.c

>+j
>+SECStatus
>+sslMutex_Free(sslMutex *pMutex)

Remove the "j" before the "SECStatus" line.

It is unfortunate that sslMutex_Free has to duplicate
a lot of code, but I discovered that it requires
refactoring to share code with sslMutex_Destroy.
It may be worth doing though.

I believe the Windows version of sslMutex_Free should
call sslMutex_2LevelDestroy because pMutex->u.sslLock
(a PRLock) is a process-owned resource.

2. sslmutex.h

> extern SECStatus sslMutex_Destroy(sslMutex *sem);
> 
>+/* free process owned resources only */
>+extern SECStatus sslMutex_Free(sslMutex *sem);

The comment should also say "without modifying shared memory".

Nit: we should pick a better name because "Free" and "Destroy"
are too similar.  Perhaps "DestroyInherited" or "FreeInherited"?

3. sslsnce.c

> 	** be) in use by multiple processes.  We do not wish to destroy
> 	** the mutexes while they are still in use.  
> 	*/

This comment should be updated, and moved inside the for loop below,
because it describes the cache->sharedCache->everInherited check.

> 	    for (; locks_initialized > 0; --locks_initialized, ++pLock ) {
>-		sslMutex_Destroy(&pLock->mutex);
>+		if (PR_TRUE == cache->sharedCache->everInherited) {
>+		    /* free process owned resources ownly */
>+		    sslMutex_Free(&pLock->mutex);
>+		} else {
>+		    sslMutex_Destroy(&pLock->mutex);
>+		}

Remove "PR_TRUE == ".

Fix the typo "ownly".
Attachment #559214 - Flags: superreview?(wtc) → superreview-
Comment on attachment 559214 [details] [diff] [review]
Patch with does not clobber the mutex, but only frees process specific resources.

In sslmutex.c, there are two more implementations of sslMutex_Destroy:
http://mxr.mozilla.org/security/ident?i=sslMutex_Destroy

I think we should add two more implementations of sslMutex_Free to
avoid undefined symbols on those platforms.
Elio, Bob: when a process terminates, all the file descriptors are
closed automatically by the OS.  So could you clarify what you meant
by the FD leak?  I guess a tool such as Purify or valgrind reports
the FD leak, but the FDs are still closed by the OS, right?
Attachment #559214 - Attachment is patch: true
Assignee: emaldona → rrelyea
The leak is coming when the main process shuts everything down then restarts. The process itself never shuts down, So the file descriptors hang around forever. In the actual application, this happens when we do an httpd restart, which happens whenever the customer changes the configuration of server. Eventually the sever runs out of file descriptors after a number of restarts.

Unfortunately it's not a valgrind leak, which I would have actively ignored.

bob
> I believe the Windows version of sslMutex_Free should
> call sslMutex_2LevelDestroy because pMutex->u.sslLock
> (a PRLock) is a process-owned resource.

With pMutex->u.sslLock in shared memory, so are you sure? The init code does the following...

    PR_ASSERT(sem);
    if (sem) {
        /* we need to reset the sslLock in the children or the single_process init
           function below will assert */
        sem->u.sslLock = NULL;
    }
    return single_process_sslMutex_Init(sem);

This means there is only one Mutex (the last one created). If any process frees it, the rest of the processes are hosed if they try to use it.

> It is unfortunate that sslMutex_Free has to duplicate
> a lot of code, but I discovered that it requires
> refactoring to share code with sslMutex_Destroy.
> It may be worth doing though.

right. I discovered the same thing, (my initial attempt was to have sslMutex_Destroy call sslMutex_Free. The refactor got pretty ugly pretty quickly, so I preferred the simpler patch (particularly since the duplicated code wasn't all the much code).

> The comment should also say "without modifying shared memory".

OK..

> Nit: we should pick a better name because "Free" and "Destroy"
>are too similar.  Perhaps "DestroyInherited" or "FreeInherited"?

I think Destroy is fine. I'm open for suggestions for a better name for the 'Free' call.

> Remove "PR_TRUE == ".

I prefer the style with PR_TRUE == removed, I was just trying to follow the prevailing style in lib/ssl (though I now see it's a mix).

> Fix the typo "ownly".
ok

> I think we should add two more implementations of sslMutex_Free to
> avoid undefined symbols on those platforms.

Ok
Incorporate wtc's review comments:

1) I've merged sslMutex_Free back into sslMutex_Destroy, and gave sslMutex_Destroy a single boolean to tell us whether or not to destroy the whole mutex, or just the process local pieces. This solved 2 of wtc's issues: duplicate code and naming. It also helped with the 2 othter sslMutex_Destroy functions... In one case we simply needed to add the parameter.

2) I've maintained not calling the level2 mutex destroy for the the reasons outlined in my comment 10.

3) The call to sslMutex_Destroy in sslsnce.c no longer needs to actually check toe state of the everInherited flag, is simply passes it to mutex_Destroy which treats is at the localProcess flag.

4) I've updated the comment.
Attachment #559214 - Attachment is obsolete: true
Attachment #559214 - Flags: review?(nelson)
Attachment #563795 - Flags: review?(wtc)
Comment on attachment 563795 [details] [diff] [review]
V2: Patch does not clober the mutex, only frees process specific resources

Thanks for the patch.  The patch has one bug.  You can check in after
fixing the bug.

1. lib/ssl/sslmutex.c

> #ifdef WINNT
>-    /* on NT, get rid of the PRLock used for fibers within a process */
>-    retvalue = sslMutex_2LevelDestroy(pMutex);
>+    if (processLocal) {
>+	/* Even though the NT mutex is created by each process, we threw away all but
>+         * one mutex. This means we can't destroy the one remaining one. */
>+	/* On NT, get rid of the PRLock used for fibers within a process. */
>+	retvalue = sslMutex_2LevelDestroy(pMutex);
>+    }
> #endif

BUG: please revert this change.  The only thing sslMutex_2LevelDestroy
does is a PR_DestroyLock call.  Here is the relevant code:

http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/security/nss/lib/ssl/sslmutex.h&rev=1.12&mark=74,79#66

http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/security/nss/lib/ssl/sslmutex.c&rev=1.25&mark=362,364#362

http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/security/nss/lib/ssl/sslmutex.c&rev=1.25&mark=56,64#56

NSPR locks are process local.  Also, I believe pMutex does NOT point to
shared memory for WINNT, as this code in sslsnce.c shows:

http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/security/nss/lib/ssl/sslsnce.c&rev=1.56&mark=1627,1633-1637,1644,1647-1649#1627

>+    /* are semephores global resources. See SEM_DESTROY(3) man page */

Nit: change "are semephores" to "semephores are".

2. lib/ssl/sslsnce.c

>+		sslMutex_Destroy(&pLock->mutex,cache->sharedCache->everInherited);

Nit: Add a space after the comma (,).
Attachment #563795 - Flags: review?(wtc) → review-
Checking in sslmutex.c;
/cvsroot/mozilla/security/nss/lib/ssl/sslmutex.c,v  <--  sslmutex.c
new revision: 1.26; previous revision: 1.25
done
Checking in sslmutex.h;
/cvsroot/mozilla/security/nss/lib/ssl/sslmutex.h,v  <--  sslmutex.h
new revision: 1.13; previous revision: 1.12
done
Checking in sslsnce.c;
/cvsroot/mozilla/security/nss/lib/ssl/sslsnce.c,v  <--  sslsnce.c
new revision: 1.57; previous revision: 1.56
done

Checked in will all wtc's suggested changes.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
I fixed comment problems in Bob's checkin.

Checking in sslmutex.c;
/cvsroot/mozilla/security/nss/lib/ssl/sslmutex.c,v  <--  sslmutex.c
new revision: 1.27; previous revision: 1.26
done
Checking in sslsnce.c;
/cvsroot/mozilla/security/nss/lib/ssl/sslsnce.c,v  <--  sslsnce.c
new revision: 1.58; previous revision: 1.57
done

This patch is a cumulative patch of what's checked in.
Attachment #563795 - Attachment is obsolete: true
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: