Closed
Bug 385465
Opened 17 years ago
Closed 17 years ago
safe_pclose() fails to wait for child causing a deadlock w/PATCH
Categories
(NSS :: Libraries, defect)
Tracking
(Not tracked)
People
(Reporter: kurt, Unassigned)
References
()
Details
Attachments
(1 file, 1 obsolete file)
880 bytes,
patch
|
Details | Diff | Splinter Review |
User-Agent: Mozilla/5.0 (X11; U; OpenBSD i386; en-US; rv:1.8.1.4) Gecko/20070609 Firefox/2.0.0.4
Build Identifier: 3.11.5
As it currently is written safe_pclose() is nether portable nor robust. It has three problems 1) assumes kill() on zombie processes will not return ESRCH, 2) doesn't handle EINTR errors from waitpid() 3) gives up waiting for the child after an arbitrary number of attempts.
When one of the above problems occurs safe_pclose() returns leaving the zombie process around. Later when a process is created using nspr interfaces the WaitPidDaemonThread() wakes up, calls waitpid() which returns the nss zombie status and then goes to sleep forever (since pr_wp.numProcs is now 0). Finally when PR_WaitProcess() is called on the UI thread it deadlocks waiting for the status of the new child, but WaitPidDaemonThread() will not wake up again.
Reproducible: Sometimes
Steps to Reproduce:
One way to reproduce is to use thunderbird w/engimail on an operating system that returns ESRCH when kill() is called on zombie processes. Then you need to be lucky and win a race where the first waitpid() call returns 0, the child terminates and then kill() is called on the zombie process returning ESRCH, safe_pclose() bails from the loop leaving the zombie remaining. The remaining zombie sets up the conditions for the deadlock between nspr's WaitPidDaemonThread() and PR_WaitProcess().
While POSIX recommends that kill() on a zombie process not return ESRCH, it does acknowledge that many systems implement this differently. To write portable code you can not make an assumption about what kill() will return on a zombie process.
Reporter | ||
Comment 1•17 years ago
|
||
Comment 3•17 years ago
|
||
You guys are looking at OLD code.
This bug was fixed 4+ months ago. See
http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/security/nss/lib/freebl/unix_rand.c&rev=1.17.10.6#859
Martynas, you should not "confirm" bugs that are filed against old
versions of the code. Before confirming a bug, please check that it is
still present in the current sources.
Severity: critical → normal
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → DUPLICATE
Version: unspecified → 3.0
Reporter | ||
Comment 4•17 years ago
|
||
Thanks for the pointer to the new code. It has the exact same race I described in this bug report. first waitpid() returns 0, child terminates, kill on zombie returns -1 with errno ESRCH causing the blocking waitpid not to be called.
The code should look like this to be portable (see comments above about what POSIX says about kill() on terminated processes):
60 /* if the child hasn't exited, kill it -- we're done with its output */
861 while ((rv = waitpid(pid, &status, WNOHANG)) == -1 && errno == EINTR)
862 ;
863 if (rv == 0) {
kill(pid, SIGKILL);
864 while ((rv = waitpid(pid, &status, 0)) == -1 && errno == EINTR)
865 ;
866 }
Should this bug report be reopened or the other one?
Comment 5•17 years ago
|
||
The other one.
Reporter | ||
Comment 6•17 years ago
|
||
I can't find a way to reopen the other bug report. There isn't an option to reopen available on the other report, so I'm reopening this report. Please see comment 4 for a proposed patch that fixes the remaining portability issue with safe_pclose(). This problem leads to deadlock of thunderbird w/engimail.
Severity: normal → critical
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Comment 8•17 years ago
|
||
Nelson, i don't think the other one should be reopened, because this is a different issue, a new bug, that is.
Comment 9•17 years ago
|
||
Abolutely a duplicate of bug 51429. It says that the supposed fix for
bug 51429 was insufficient. I will reopen 51429.
Status: REOPENED → RESOLVED
Closed: 17 years ago → 17 years ago
Resolution: --- → DUPLICATE
You need to log in
before you can comment on or make changes to this bug.
Description
•