Closed Bug 385465 Opened 18 years ago Closed 18 years ago

safe_pclose() fails to wait for child causing a deadlock w/PATCH

Tracking

(Not tracked)

Status:

RESOLVED DUPLICATE of bug 51429

People

(Reporter: kurt, Unassigned)

References

(
URL
)

Details

Attachments

(1 file, 1 obsolete file)

patch to correct portablity and robustnes issues in safe_pclose 18 years ago Kurt Miller 955 bytes, patch		Details \| Diff \| Splinter Review
unix_rand.diff, diff against trunk 18 years ago Martynas Venckus 880 bytes, patch		Details \| Diff \| Splinter Review

Kurt Miller

Reporter

Description

•

18 years ago

User-Agent: Mozilla/5.0 (X11; U; OpenBSD i386; en-US; rv:1.8.1.4) Gecko/20070609 Firefox/2.0.0.4 Build Identifier: 3.11.5 As it currently is written safe_pclose() is nether portable nor robust. It has three problems 1) assumes kill() on zombie processes will not return ESRCH, 2) doesn't handle EINTR errors from waitpid() 3) gives up waiting for the child after an arbitrary number of attempts. When one of the above problems occurs safe_pclose() returns leaving the zombie process around. Later when a process is created using nspr interfaces the WaitPidDaemonThread() wakes up, calls waitpid() which returns the nss zombie status and then goes to sleep forever (since pr_wp.numProcs is now 0). Finally when PR_WaitProcess() is called on the UI thread it deadlocks waiting for the status of the new child, but WaitPidDaemonThread() will not wake up again. Reproducible: Sometimes Steps to Reproduce: One way to reproduce is to use thunderbird w/engimail on an operating system that returns ESRCH when kill() is called on zombie processes. Then you need to be lucky and win a race where the first waitpid() call returns 0, the child terminates and then kill() is called on the zombie process returning ESRCH, safe_pclose() bails from the loop leaving the zombie remaining. The remaining zombie sets up the conditions for the deadlock between nspr's WaitPidDaemonThread() and PR_WaitProcess(). While POSIX recommends that kill() on a zombie process not return ESRCH, it does acknowledge that many systems implement this differently. To write portable code you can not make an assumption about what kill() will return on a zombie process.

Kurt Miller

Reporter

Comment 1

•

18 years ago

Attached patch patch to correct portablity and robustnes issues in safe_pclose (obsolete) — Details — Splinter Review

Martynas Venckus

Comment 2

•

18 years ago

Confirming.

Status: UNCONFIRMED → NEW

Ever confirmed: true

Nelson Bolyard (seldom reads bugmail)

Comment 3

•

18 years ago

You guys are looking at OLD code. This bug was fixed 4+ months ago. See http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/security/nss/lib/freebl/unix_rand.c&rev=1.17.10.6#859 Martynas, you should not "confirm" bugs that are filed against old versions of the code. Before confirming a bug, please check that it is still present in the current sources.

URL: http://bonsai.mozilla.org/cvsblame.cg...

Severity: critical → normal

Status: NEW → RESOLVED

Closed: 18 years ago

Resolution: --- → DUPLICATE

Version: unspecified → 3.0

Kurt Miller

Reporter

Comment 4

•

18 years ago

Thanks for the pointer to the new code. It has the exact same race I described in this bug report. first waitpid() returns 0, child terminates, kill on zombie returns -1 with errno ESRCH causing the blocking waitpid not to be called. The code should look like this to be portable (see comments above about what POSIX says about kill() on terminated processes): 60 /* if the child hasn't exited, kill it -- we're done with its output */ 861 while ((rv = waitpid(pid, &status, WNOHANG)) == -1 && errno == EINTR) 862 ; 863 if (rv == 0) { kill(pid, SIGKILL); 864 while ((rv = waitpid(pid, &status, 0)) == -1 && errno == EINTR) 865 ; 866 } Should this bug report be reopened or the other one?

Nelson Bolyard (seldom reads bugmail)

Comment 5

•

18 years ago

The other one.

Kurt Miller

Reporter

Comment 6

•

18 years ago

I can't find a way to reopen the other bug report. There isn't an option to reopen available on the other report, so I'm reopening this report. Please see comment 4 for a proposed patch that fixes the remaining portability issue with safe_pclose(). This problem leads to deadlock of thunderbird w/engimail.

Severity: normal → critical

Status: RESOLVED → REOPENED

Resolution: DUPLICATE → ---

Martynas Venckus

Comment 7

•

18 years ago

Attached patch unix_rand.diff, diff against trunk — Details — Splinter Review

from comment 4

Attachment #269376 - Attachment is obsolete: true

Martynas Venckus

Comment 8

•

18 years ago

Nelson, i don't think the other one should be reopened, because this is a different issue, a new bug, that is.

Nelson Bolyard (seldom reads bugmail)

Comment 9

•

18 years ago

Abolutely a duplicate of bug 51429. It says that the supposed fix for bug 51429 was insufficient. I will reopen 51429.

Status: REOPENED → RESOLVED

Closed: 18 years ago → 18 years ago

Resolution: --- → DUPLICATE

You need to log in before you can comment on or make changes to this bug.

Bugzilla

safe_pclose() fails to wait for child causing a deadlock w/PATCH

Categories

(NSS :: Libraries, defect)

Tracking

(Not tracked)

People

(Reporter: kurt, Unassigned)

References

(
URL
)

Details

Crash Data

Security

(public)

User Story

Attachments

(1 file, 1 obsolete file)

Description

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Attachment

General

Description

File Name

Content Type