safe_pclose() fails to wait for child causing a deadlock w/PATCH

RESOLVED DUPLICATE of bug 51429

Status

--
critical
RESOLVED DUPLICATE of bug 51429
12 years ago
12 years ago

People

(Reporter: kurt, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(URL)

Attachments

(1 attachment, 1 obsolete attachment)

(Reporter)

Description

12 years ago
User-Agent:       Mozilla/5.0 (X11; U; OpenBSD i386; en-US; rv:1.8.1.4) Gecko/20070609 Firefox/2.0.0.4
Build Identifier: 3.11.5

As it currently is written safe_pclose() is nether portable nor robust. It has three problems 1) assumes kill() on zombie processes will not return ESRCH, 2) doesn't handle EINTR errors from waitpid() 3) gives up waiting for the child after an arbitrary number of attempts.

When one of the above problems occurs safe_pclose() returns leaving the zombie process around. Later when a process is created using nspr interfaces the WaitPidDaemonThread() wakes up, calls waitpid() which returns the nss zombie status and then goes to sleep forever (since pr_wp.numProcs is now 0). Finally when PR_WaitProcess() is called on the UI thread it deadlocks waiting for the status of the new child, but WaitPidDaemonThread() will not wake up again.

Reproducible: Sometimes

Steps to Reproduce:
One way to reproduce is to use thunderbird w/engimail on an operating system that returns ESRCH when kill() is called on zombie processes. Then you need to be lucky and win a race where the first waitpid() call returns 0, the child terminates and then kill() is called on the zombie process returning ESRCH, safe_pclose() bails from the loop leaving the zombie remaining. The remaining zombie sets up the conditions for the deadlock between nspr's WaitPidDaemonThread() and PR_WaitProcess().



While POSIX recommends that kill() on a zombie process not return ESRCH, it does acknowledge that many systems implement this differently. To write portable code you can not make an assumption about what kill() will return on a zombie process.
(Reporter)

Comment 1

12 years ago
Created attachment 269376 [details] [diff] [review]
patch to correct portablity and robustnes issues in safe_pclose

Comment 2

12 years ago
Confirming.
Status: UNCONFIRMED → NEW
Ever confirmed: true
You guys are looking at OLD code.  
This bug was fixed 4+ months ago.  See
http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/security/nss/lib/freebl/unix_rand.c&rev=1.17.10.6#859

Martynas, you should not "confirm" bugs that are filed against old 
versions of the code.  Before confirming a bug, please check that it is 
still present in the current sources.  
Severity: critical → normal
Status: NEW → RESOLVED
Last Resolved: 12 years ago
Resolution: --- → DUPLICATE
Version: unspecified → 3.0
Duplicate of bug: 51429
(Reporter)

Comment 4

12 years ago
Thanks for the pointer to the new code. It has the exact same race I described in this bug report. first waitpid() returns 0, child terminates, kill on zombie returns -1 with errno ESRCH causing the blocking waitpid not to be called.

The code should look like this to be portable (see comments above about what POSIX says about kill() on terminated processes):

60     /* if the child hasn't exited, kill it -- we're done with its output */
861     while ((rv = waitpid(pid, &status, WNOHANG)) == -1 && errno == EINTR)
862         ;
863     if (rv == 0) {
            kill(pid, SIGKILL);
864         while ((rv = waitpid(pid, &status, 0)) == -1 && errno == EINTR)
865             ;
866     }

Should this bug report be reopened or the other one?
The other one.
(Reporter)

Comment 6

12 years ago
I can't find a way to reopen the other bug report. There isn't an option to reopen available on the other report, so I'm reopening this report. Please see comment 4 for a proposed patch that fixes the remaining portability issue with safe_pclose(). This problem leads to deadlock of thunderbird w/engimail. 
Severity: normal → critical
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---

Comment 7

12 years ago
Created attachment 269706 [details] [diff] [review]
unix_rand.diff, diff against trunk

from comment 4
Attachment #269376 - Attachment is obsolete: true

Comment 8

12 years ago
Nelson, i don't think the other one should be reopened, because this is a different issue, a new bug, that is.
Abolutely a duplicate of bug 51429.  It says that the supposed fix for
bug 51429 was insufficient.  I will reopen 51429.
Status: REOPENED → RESOLVED
Last Resolved: 12 years ago12 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 51429
You need to log in before you can comment on or make changes to this bug.