Closed
Bug 51429
Opened 24 years ago
Closed 17 years ago
RNG_SystemInfoForRNG possible "netstat" zombie process
Categories
(NSS :: Libraries, defect, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
3.11.8
People
(Reporter: nelson, Assigned: wtc)
References
()
Details
Attachments
(9 files, 7 obsolete files)
1.09 KB,
patch
|
nelson
:
review+
julien.pierre
:
review+
wtc
:
superreview+
|
Details | Diff | Splinter Review |
553 bytes,
text/plain
|
Details | |
388 bytes,
patch
|
nelson
:
review+
wtc
:
superreview-
|
Details | Diff | Splinter Review |
29 bytes,
text/plain
|
Details | |
3.38 KB,
text/plain
|
Details | |
1003 bytes,
patch
|
Details | Diff | Splinter Review | |
1.35 KB,
patch
|
Details | Diff | Splinter Review | |
737 bytes,
patch
|
Details | Diff | Splinter Review | |
1.62 KB,
patch
|
wtc
:
review+
|
Details | Diff | Splinter Review |
This bug was formerly bugsplat 300544.
It applies to all UNIX systems, to the unix implementation of
RNG_SystemInfoForRNG().
jgmysers@netscape.com wrote:
In ns/security/lib/util/unix_rand.c:safe_pclose(), if waitpid()
fails with EINTR, the child process could be left as a zombie.
Assignee | ||
Updated•24 years ago
|
QA Contact: wtc → sonmi
Updated•24 years ago
|
Target Milestone: 3.2 → 3.3
Assignee | ||
Updated•23 years ago
|
Target Milestone: 3.3 → 3.4
Assignee | ||
Comment 2•23 years ago
|
||
Changed the QA contact to Bishakha.
QA Contact: sonja.mirtitsch → bishakhabanerjee
Comment 4•23 years ago
|
||
I'm seeing [a] netstat zombie[s] on Red Hat Linux 7.3 using the 2002052409 Linux
build of Mozilla 1.0RC3.
Let me know if I can provide any useful information about this somehow.
Comment 5•22 years ago
|
||
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2a) Gecko/2002091318
I see something like this all the time:
22483 ? Z 0:00 [netstat <defunct>]
Is this what this bug is about?
It does not seem to be a problem.
pi
Assignee | ||
Comment 6•22 years ago
|
||
Assigned the bug to myself. Target NSS 3.7.
Assignee: ian.mcgreer → wtc
Priority: P3 → P2
Target Milestone: 3.5 → 3.7
Assignee | ||
Comment 7•22 years ago
|
||
*** Bug 183778 has been marked as a duplicate of this bug. ***
Comment 8•22 years ago
|
||
Isn't there some much, much, MUCH more sane way to get entropy?
This bug should be considered a bit higher priority that it seems, since a
browser screwing with netstat will look to many people like spyway or a trojan.
Assignee | ||
Comment 9•22 years ago
|
||
Moved to target milestone 3.8 because the original
NSS 3.7 release has been renamed 3.8.
Target Milestone: 3.7 → 3.8
Comment 10•22 years ago
|
||
It's still there in Mozilla 1.2.1.
Comment 11•22 years ago
|
||
15728 S 19:18 0:24 \_ /home/alex/programs/mozilla/mozilla-bin
15740 Z 19:18 0:00 \_ [netstat <defunct>]
Yup, it's always there after a while. Mozilla 1.2.1
Comment 12•22 years ago
|
||
I'm having the same problem also with this. Mozilla 1.3a
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3a) Gecko/20021212
Linux schloss 2.4.18-10 #1 Wed Aug 7 10:26:48 EDT 2002 i686 unknown
winzig 25910 1476 0 19:12 ? 00:00:00 /bin/sh /usr/local/mozilla/run-m
winzig 25916 25910 2 19:12 ? 00:02:08 /usr/local/mozilla/mozilla-bin
winzig 25918 25916 0 19:12 ? 00:00:00 /usr/local/mozilla/mozilla-bin
winzig 25919 25918 0 19:12 ? 00:00:01 /usr/local/mozilla/mozilla-bin
winzig 25920 25918 0 19:12 ? 00:00:00 /usr/local/mozilla/mozilla-bin
winzig 25922 25918 0 19:12 ? 00:00:00 /usr/local/mozilla/mozilla-bin
winzig 26157 25916 0 20:15 ? 00:00:00 [netstat <defunct>]
winzig 26243 25918 0 20:36 ? 00:00:00 /usr/local/mozilla/mozilla-bin
Comment 13•22 years ago
|
||
Looks like netstat hangs when it's stuck resolving an ip address into a hostname.
Comment 14•22 years ago
|
||
Netstat hanging would be a different bug.
Reporter | ||
Comment 15•22 years ago
|
||
Remove target milestone of 3.8, since these bugs didn't get into that release.
Target Milestone: 3.8 → ---
Comment 16•21 years ago
|
||
it's still there. Isn't it possible to use /dev/urandom?
Or is this not available on some systems?
Comment 17•21 years ago
|
||
zombified netstat still there in 1.6 and 1.7b. Wouldnt it be feasable to
incorporate something like "'EGD' http://egd.sourceforge.net/" into mozilla and
have mozilla read from a socket instead of using a system process that could be
exploitable to gain intelligence about running systems and maybe possible other
info or even gain a shell on a system. PS: When i see netstat being zombified
after a browser is using it, makes me think about the ole days of Internet
explorer reporting home "Phoning Home" like E.T. And knowing that you guys are
pretty respectable ppl i trust you to use netstat but there is allways that
slight possiblity of someone sneaking around doing malicious stuff. Please look
into how EGD handles getting more data for entropy.
Comment 18•21 years ago
|
||
Bug still there, with 1.7rc1 according to
http://qa.mandrakesoft.com/show_bug.cgi?id=9427
Reporter | ||
Comment 19•20 years ago
|
||
Neil, please have a look at this. Take it if you want it.
Reporter | ||
Updated•20 years ago
|
QA Contact: bishakhabanerjee → jason.m.reid
Comment 20•20 years ago
|
||
This patch has been suggested by others. It avoids a theoretically possible
bug, one that I have been unable to force on Solaris 10 x86. Briefly, it is
unlikely to happen because the parent reads all of the output of the netstat
command before issuing the waitpid() call. Since the child's stdout is closed
by its exit routine it will always be defunct immediately thereafter.
Furthermore, even when the parent is forced to truncate the input file and
while the child is explicitly held in the stopped state, waitpid() still
returns 0 rather than EINTR.
I suspect this last behavior may be different under different OSes.
Would some of the respondents who reported seeing this problem be kind enough
to try the patch?
Assignee: wtchang → neil.williams
Status: NEW → ASSIGNED
Attachment #185311 -
Flags: superreview?(wtchang)
Attachment #185311 -
Flags: review?(nelson)
Reporter | ||
Comment 21•20 years ago
|
||
If waitpid gets EINTR, do you want to kill pid? or just waitpid again?
Comment 22•20 years ago
|
||
Comment on attachment 185311 [details] [diff] [review]
patch to avoid zombie netstat
Nelson, I agree that it would be more correct to call waitpid again. I'll
submit a revised patch.
Attachment #185311 -
Attachment is obsolete: true
Comment 23•20 years ago
|
||
Incorporating Nelson's suggestion.
I note that most of the bug citations mention Linux. Also note that the
Mandrake bug mentioned in comment 18 has been fixed, apparently by a Mandrake
release.
Attachment #185508 -
Flags: superreview?(wtchang)
Attachment #185508 -
Flags: review?(nelson)
Reporter | ||
Comment 24•20 years ago
|
||
Comment on attachment 185508 [details] [diff] [review]
patch to avoid zombie netstat
That counter increment/check needs to happen in every path that repeats this
loop.
Attachment #185508 -
Flags: superreview?(wtchang)
Attachment #185508 -
Flags: review?(nelson)
Attachment #185508 -
Flags: review-
Reporter | ||
Comment 25•20 years ago
|
||
Comment on attachment 185311 [details] [diff] [review]
patch to avoid zombie netstat
Forgot to mark this patch with my previous review comment.
Attachment #185311 -
Flags: superreview?(wtchang)
Attachment #185311 -
Flags: review?(nelson)
Attachment #185311 -
Flags: review-
Comment 26•20 years ago
|
||
To be honest, I just applied a Debian patch for Mandriva/Mandrake package which
commented the netstat call, which isn't really needed on Linux system, since
/dev/random randomness is good enough.
Updated•20 years ago
|
Attachment #185508 -
Attachment is obsolete: true
Comment 27•20 years ago
|
||
Reviewer's last comments incorporated.
Attachment #185630 -
Flags: superreview?(wtchang)
Attachment #185630 -
Flags: review?(nelson)
Reporter | ||
Comment 28•20 years ago
|
||
Comment on attachment 185630 [details] [diff] [review]
third try
Looks good to me. Since we haven't found a reproducible test case with which
to test this patch, please get a second review on this. Thanks.
Attachment #185630 -
Flags: review?(nelson) → review+
Updated•20 years ago
|
Target Milestone: --- → 3.10.1
Comment 29•19 years ago
|
||
(In reply to comment #26)
> To be honest, I just applied a Debian patch for Mandriva/Mandrake package which
> commented the netstat call, which isn't really needed on Linux system, since
> /dev/random randomness is good enough.
You mean /dev/urandom, right ? On Linux, /dev/random might block. Not on BSD though.
While I'm not a reviewer, the latest patch looks good. EINTR will cause waipid
to be repeated (max 1000 times). All other errors will elave the loop
I'm running Gentoo 1.4.16, and I'm getting a zombie netstat process for both
Firefox (1.0.6 - Mozilla/5.0 - 20050722) and Thunderbird (1.0.6 (20050722)).
Originally, before upgrading to the latest Firefox and Thunderbird build, this
problem only used to occur with instances of Firefox.
/bin/bash /usr/libexec/mozilla-launcher
\_ /usr/lib/mozilla-firefox/firefox-bin
\_ /usr/lib/mozilla-firefox/firefox-bin
| \_ /usr/lib/mozilla-firefox/firefox-bin
| \_ /usr/lib/mozilla-firefox/firefox-bin
| \_ /usr/lib/mozilla-firefox/firefox-bin
| \_ /usr/lib/mozilla-firefox/firefox-bin
\_ [netstat] <defunct>
/bin/bash /usr/libexec/mozilla-launcher
\_ /usr/lib/mozilla-thunderbird/thunderbird-bin -mail
\_ /usr/lib/mozilla-thunderbird/thunderbird-bin -mail
| \_ /usr/lib/mozilla-thunderbird/thunderbird-bin -mail
| \_ /usr/lib/mozilla-thunderbird/thunderbird-bin -mail
| \_ /usr/lib/mozilla-thunderbird/thunderbird-bin -mail
| \_ /usr/lib/mozilla-thunderbird/thunderbird-bin -mail
| \_ /usr/lib/mozilla-thunderbird/thunderbird-bin -mail
\_ [netstat] <defunct>
Assignee | ||
Updated•19 years ago
|
Target Milestone: 3.10.1 → 3.11
Reporter | ||
Comment 31•19 years ago
|
||
Comment on attachment 185630 [details] [diff] [review]
third try
This bug needs a second review. Julien, will you review it?
Attachment #185630 -
Flags: review?(julien.pierre.bugs)
Assignee | ||
Comment 32•19 years ago
|
||
Comment on attachment 185630 [details] [diff] [review]
third try
r=wtc. This patch may not fix this bug though.
I would rename the variable r as rv.
Attachment #185630 -
Flags: superreview?(wtchang)
Attachment #185630 -
Flags: superreview+
Attachment #185630 -
Flags: review?(julien.pierre.bugs)
Comment 33•19 years ago
|
||
Comment on attachment 185630 [details] [diff] [review]
third try
Like Wan-Teh, I'm not sure if this bug will fix the problem, but I don't think
it will hurt.
Re: the kill() call, I think we should probably break out of the loop on any
error, and not just on ESRCH. In the Solaris man pages, the other possible
errors are EINVAL (invalid signal number) and EPERM (no permission to kill the
process), which should also be fatal, since subsequent calls to kill aren't
going to fix those problems, and the only point of continuing the waitpid loop
is to work gracefully if the process exits quickly within 1000 iterations. On
the other hand, these other errors should never happen, since we are passing a
hardcoded signal value, and we started the process, so we should have
permission - unless it changed user. I hope netstat doesn't do that ;).
Attachment #185630 -
Flags: review+
Reporter | ||
Comment 34•19 years ago
|
||
Hmm. Netstat could be setuid root!
Comment 35•19 years ago
|
||
I've appplied the patch, but unfortunately there are still sometimes hanging
zombie netstat processes:
├─firefox───run-mozilla.sh───firefox-bin───netstat
Reporter | ||
Comment 36•19 years ago
|
||
Hmm. I wonder. Perhaps the netstat processes are not zombies,
but are actually live hung processes that cannot be killed by
the parent process because they are running as root and the
parent isn't. A full long ps output might help diagnose that.
Please attach it rather than adding it inline as a comment.
Comment 37•19 years ago
|
||
(In reply to comment #36)
> Hmm. I wonder. Perhaps the netstat processes are not zombies,
> but are actually live hung processes that cannot be killed by
> the parent process because they are running as root and the
No, the process is a real zombie and is not running under a different UID and
netstat is not SETUID.
> parent isn't. A full long ps output might help diagnose that.
> Please attach it rather than adding it inline as a comment.
I'll attached a file with the relevant part of the ps output.
Comment 38•19 years ago
|
||
Assignee | ||
Comment 39•19 years ago
|
||
Christian: what is the Linux distribution you are
using? What's the kernel version and do you know
the glibc version and whether you are using LinuxThreads
or NPTL? This will help us find a similar system to
reproduce this zombie netstat process.
Do you know how to use strace on firefox-bin to find out if
waitpid was called at all? If not, I can supply a patch
to add logging statements. (Or you can try to do that.)
Comment 40•19 years ago
|
||
The problem is most likely that waitpid() is being called with WNOHANG. The
function can loop the 1000 times before the child process is scheduled to
receive the signal.
I think we need a blocking waitpid() call, at least after the first attempt.
Things to prevent a deadlock would include setting a timer to cause the blocking
waitpid() to return EINTR and making sure the pipe is drained to EOF and/or closed.
Comment 41•19 years ago
|
||
Assignee | ||
Comment 42•19 years ago
|
||
Comment on attachment 197901 [details] [diff] [review]
Proposed fix (untested)
We should only need to make the kill(pid, SIGKILL) call
once. If so, we can do that before calling waitpid.
It should be harmless to send SIGKILL to a zombie.
John, your patch never calls waitpid with WNOHANG. Is
this intentional?
In safe_popen, if fork succeeds but fdopen fails,
safe_popen will return NULL, which will cause safe_pclose
and therefore waitpid to not be called. We should be
able to fix that by moving the fdopen call before the
fork call. (fdopen is unlikely to fail though.)
Comment 43•19 years ago
|
||
I think the SIGKILL is unnecessary. I put the patch together quickly, feel free
to tweak it.
Comment 44•19 years ago
|
||
(In reply to comment #41)
> Created an attachment (id=197901) [edit]
> Proposed fix (untested)
>
(In reply to comment #42)
> (From update of attachment 197901 [details] [diff] [review] [edit])
> We should only need to make the kill(pid, SIGKILL) call
> once. If so, we can do that before calling waitpid.
> It should be harmless to send SIGKILL to a zombie.
>
> John, your patch never calls waitpid with WNOHANG. Is
> this intentional?
>
> In safe_popen, if fork succeeds but fdopen fails,
> safe_popen will return NULL, which will cause safe_pclose
> and therefore waitpid to not be called. We should be
> able to fix that by moving the fdopen call before the
> fork call. (fdopen is unlikely to fail though.)
>
I agree with Wan-Teh. Waitpid without WNOHANG could cause indefinite wait.
Should probably try WHNOHANG first, then if waitpid() returns 0, kill() and
waitpid(pid, &status, 0). I'll try to work up a patch.
Comment 45•19 years ago
|
||
Please add this patch as well, since the code is useless on Linux, where /dev/urandom is used.
The patch is from Wichert Akkerman (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=241200 )
Updated•19 years ago
|
Attachment #216373 -
Flags: review?(nelson)
Reporter | ||
Updated•19 years ago
|
QA Contact: jason.m.reid → libraries
Reporter | ||
Comment 46•19 years ago
|
||
Neil, contact Wichert Akkerman and Eric Dorland (see the URL above for their
email addresses), and see if they can reproduce at will, and are willing to
test your patch.
Reporter | ||
Comment 47•19 years ago
|
||
Comment on attachment 216373 [details] [diff] [review]
workaround for Linux
Our first preference is to fix this bug "the right way" which uses netstat but doesn't leave it as a zombie. If that can be achieved, then we want to use netstat even on Linux.
But this needs resolution now. So, if the "right fix" cannot be verified before 3.11.1-Beta, then I think we must deploy this workaround at that time.
Attachment #216373 -
Attachment description: additional patch → workaround for Linux
Attachment #216373 -
Flags: review?(nelson) → review+
Reporter | ||
Updated•19 years ago
|
Target Milestone: 3.11 → 3.11.1
Reporter | ||
Updated•19 years ago
|
Priority: P2 → P1
Target Milestone: 3.11.1 → 3.11.2
Comment 49•18 years ago
|
||
This seems to still be present in Firefox 2.0 derived from Debian. The #ifdef LINUX patch seems to be applied (I'm not sure whether that's a Debian patch or if it was merged), but doesn't seem to be taking effect. "linux" is what the preprocessor provides, not "LINUX", though perhaps that's supposed to be defined somewhere in the tree.
Comment 50•18 years ago
|
||
(In reply to comment #49)
> This seems to still be present in Firefox 2.0 derived from Debian. The #ifdef
> LINUX patch seems to be applied (I'm not sure whether that's a Debian patch or
> if it was merged), but doesn't seem to be taking effect. "linux" is what the
> preprocessor provides, not "LINUX", though perhaps that's supposed to be
> defined somewhere in the tree.
The #ifdef LINUX in Firefox 2.0 in Ubuntu is something from Ubuntu, since there have been changes to the unix_rand.c file between the latest firefox in debian experimental (2.0b2) and 2.0 final. Though this new Ubuntu patch should work, I can confirm that the one that was applied in the Debian version does what it is supposed to do.
Status: ASSIGNED → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
Reporter | ||
Updated•18 years ago
|
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Updated•18 years ago
|
Assignee: neil.williams → wtchang
Status: REOPENED → NEW
Assignee | ||
Comment 51•18 years ago
|
||
I reproduced this zombie process on Red Hat Enterprise Linux 4
Update 4 using two small test programs, and understood why the
code fails to reap the child process.
Here is the first test program, mystat.c. Compile it with
gcc mystat.c -o mystat
It is to be used in place of netstat. You can also use netstat,
but for some reason it's easier to reproduce the bug with mystat.
The mystat program does nothing.
Assignee | ||
Comment 52•18 years ago
|
||
Here is the second test program, popen.c. Compile it with
gcc popen.c -o popen
Then, run the test with
./popen
On RHEL 4 U4, it is not hard to see something like this:
kill(23585, SIGKILL)=0
[repeated 1000 times]
count equals 1000
If you see this, run this command immediately (within 10
seconds) to see the zombie process:
$ ps -eaf | grep mystat
wtchang 23585 23584 0 17:05 pts/4 00:00:00 [mystat] <defunct>
wtchang 23589 16153 0 17:05 pts/5 00:00:00 grep mystat
I found that waitpid(..., WNOHANG) repeatedly returns 0 because
the OS is only running the popen process and is not giving the
mystat process a chance to run and terminate. The evidence is
that if the popen program calls sched_yield() between the
waitpid(..., WNOHANG) calls (there is one such call in popen.c
that you can uncomment out), then the bug cannot be reproduced,
and you will only see at most one line like this:
kill(23608, SIGKILL)=0
Assignee | ||
Comment 53•18 years ago
|
||
John's description in comment 40 of what causes the problem
is right on.
This patch is based on John's patch in comment 41 and implements
what Neil outlined in comment 44.
Attachment #197901 -
Attachment is obsolete: true
Attachment #250408 -
Flags: superreview?(rrelyea)
Attachment #250408 -
Flags: review?(neil.williams)
Assignee | ||
Comment 54•18 years ago
|
||
This patch fixes the corner case in safe_popen that I described
at the end of comment 42.
Attachment #250410 -
Flags: review?(neil.williams)
Comment 55•18 years ago
|
||
Comment on attachment 250410 [details] [diff] [review]
Patch for a minor problem in safe_popen
This looks correct. Is it not correct to close(p[0]) when fork() returns -1? Why?
Attachment #250410 -
Flags: review?(neil.williams) → review+
Comment 56•18 years ago
|
||
Comment on attachment 250408 [details] [diff] [review]
Proposed fix v2
Thanks for tracking this down, Wan-Teh. It finally makes good sense what's happening. I was going to use sleep(1) but this is better.
Attachment #250408 -
Flags: review?(neil.williams) → review+
Comment 57•18 years ago
|
||
I noticed non-printable characters in unix_rand.c. Turns out they're form feed characters--at lines 283,324 and others. As a matter of policy should these be removed when working up a patch?
Reporter | ||
Comment 58•18 years ago
|
||
Neil, Feel free to remove FF characters in your next patch.
Assignee | ||
Comment 59•18 years ago
|
||
(In reply to comment #55)
> (From update of attachment 250410 [details] [diff] [review])
> This looks correct. Is it not correct to close(p[0]) when fork() returns -1?
> Why?
p[0] has been associated with fp, so fclose(fp) will call close(p[0])
under the cover, according to the fclose man page in the Single Unix
Specification:
http://www.opengroup.org/onlinepubs/009695399/functions/fclose.html
The fclose() function shall perform the equivalent of a close()
on the file descriptor that is associated with the stream pointed
to by stream.
I should add a comment to clarify this. How about something like this:
fclose(fp); /* this closes p[0], the fd associated with fp */
Assignee | ||
Comment 60•18 years ago
|
||
Note that the fdopen function associates a stream with a file descriptor.
Our code has
fp = fdopen(p[0], "r");
Comment 61•18 years ago
|
||
Comment on attachment 250408 [details] [diff] [review]
Proposed fix v2
r+ = relyea
Did you want to add the yield or sleep(0) in this version of the patch as well?
Attachment #250408 -
Flags: superreview?(rrelyea) → superreview+
Assignee | ||
Comment 62•18 years ago
|
||
I checked in this patch on the NSS trunk (NSS 3.12) with
a comment to explain that fclose(fp) closes p[0].
Checking in unix_rand.c;
/cvsroot/mozilla/security/nss/lib/freebl/unix_rand.c,v <-- unix_rand.c
new revision: 1.23; previous revision: 1.22
done
Attachment #250410 -
Attachment is obsolete: true
Assignee | ||
Comment 63•18 years ago
|
||
I added a PR_Sleep(0) call instead, which calls sched_yield().
Calling sched_yield() directly would have porting issues because
not all Unix platforms have sched_yield() and it would require
linking libfreebl3.so with -lrt. Calling PR_Sleep(0) allows us
to let NSPR handle these porting issues.
I considered calling sleep(0), which is available everywhere and
defined in libc, but I can't find anything in the sleep man page
in the Single Unix Specification that says sleep(0) yields the
processor:
http://www.opengroup.org/onlinepubs/009695399/functions/sleep.html
The man page for the related function usleep says usleep(0) "has no
effect":
http://www.opengroup.org/onlinepubs/009695399/functions/usleep.html
So I decided to not use sleep(0).
I checked in this patch on the NSS trunk (NSS 3.12).
Checking in unix_rand.c;
/cvsroot/mozilla/security/nss/lib/freebl/unix_rand.c,v <-- unix_rand.c
new revision: 1.24; previous revision: 1.23
done
Attachment #250408 -
Attachment is obsolete: true
Assignee | ||
Updated•18 years ago
|
Status: NEW → RESOLVED
Closed: 18 years ago → 18 years ago
Resolution: --- → FIXED
Reporter | ||
Comment 64•18 years ago
|
||
Wan-Teh, Can the kill() call fail with EINTR?
Should the kill call be retried if it fails with EINTR,
as the patch now does for the waitpid() calls?
Assignee | ||
Comment 65•18 years ago
|
||
Nelson, kill() cannot fail with EINTR:
http://www.opengroup.org/onlinepubs/009695399/functions/kill.html
Actually waitpid(..., WNOHANG) most likely can't fail with EINTR
either, but I still handle it as defensive programming. In general
only blocking system calls can fail with EINTR.
Assignee | ||
Comment 66•18 years ago
|
||
I checked in the "proposed patch (as checked in)" on the
NSS_3_11_BRANCH (NSS 3.11.6).
Checking in unix_rand.c;
/cvsroot/mozilla/security/nss/lib/freebl/unix_rand.c,v <-- unix_rand.c
new revision: 1.17.10.6; previous revision: 1.17.10.5
done
Target Milestone: 3.12 → 3.11.6
Comment 67•18 years ago
|
||
does anybody supposed to commit https://bugzilla.mozilla.org/attachment.cgi?id=216373
?
Reporter | ||
Comment 68•18 years ago
|
||
Comment on attachment 216373 [details] [diff] [review]
workaround for Linux
requesting sr
Attachment #216373 -
Flags: superreview?(wtchang)
Assignee | ||
Comment 69•18 years ago
|
||
Comment on attachment 216373 [details] [diff] [review]
workaround for Linux
Let's come up with a better way to turn off the
safe_popen(netstat_ni_cmd) call. We already use
two different methods for BSD/OS and Solaris. The
sample code omits the irrelevant code.
Method 1 (used by BSD/OS): if we succeeded in getting
data from /dev/urandom, then we skip the
safe_popen(netstat_ni_cmd) call.
bytes = RNG_FileUpdate("/dev/urandom", SYSTEM_RNG_SEED_COUNT);
#ifdef BSDI
if (bytes)
return;
#endif
fp = safe_popen(netstat_ni_cmd);
if (fp != NULL) {
while ((bytes = fread(buf, 1, sizeof(buf), fp)) > 0)
RNG_RandomUpdate(buf, bytes);
safe_pclose(fp);
}
Method 2 (used by Solaris): turn off the safe_popen(netstat_ni_cmd)
call by conditional compilation, equivalent to this patch.
#define DO_NETSTAT 1
#ifdef SOLARIS
#undef DO_NETSTAT
#endif
#ifdef DO_NETSTAT
fp = safe_popen(netstat_ni_cmd);
if (fp != NULL) {
while ((bytes = fread(buf, 1, sizeof(buf), fp)) > 0)
RNG_RandomUpdate(buf, bytes);
safe_pclose(fp);
}
#endif
So, this patch could be replaced by a patch that undefines DO_NETSTAT
for Linux. But I think we should generalize the new patch to cover
all the platforms that have /dev/urandom. It may be better to do this
in a new bug report.
Attachment #216373 -
Flags: superreview?(wtchang) → superreview-
Comment 70•18 years ago
|
||
What dou think about this in nspr?
static PRStatus OpenDevRandom( void )
{
- fdDevRandom = open( "/dev/random", O_RDONLY );
+ fdDevRandom = open( "/dev/urandom", O_RDONLY );
return((-1 == fdDevRandom)? PR_FAILURE : PR_SUCCESS );
} /* end OpenDevRandom() */
Assignee | ||
Comment 71•18 years ago
|
||
romaxa: good idea. NSPR should also use /dev/urandom instead of
/dev/random.
Comment 72•18 years ago
|
||
(In reply to comment #69)
> So, this patch could be replaced by a patch that undefines DO_NETSTAT
> for Linux.
FWIW, we've been doing this in debian for a while to solve the netstat zombies problem.
Comment 73•18 years ago
|
||
I'm confused whether this bug is really fixed.
Wan-Teh, I do not understand the purpose of your comment 69 and why you discuss attachment 216373 [details] [diff] [review] proposes to undefine DO_NETSTAT for Linux.
What is the motivation for a new patch? Do you want to "optimize" the fix?
Is is still necessary to undefine DO_NETSTAT for Linux in order to fix this bug?
Or did Wan-Teh's checkin of attachment 250666 [details] [diff] [review], comment 66, to the 3.11.6 branch already fix the reported problems?
Did someone test/verify?
Thanks in advance for a summary of the current status.
Reporter | ||
Comment 74•18 years ago
|
||
Kai, I wouldn't try to fix this any more than it is already fixed unless
someone complains that it is not actually fixed at present. I see no such
complaint above.
Comment 75•18 years ago
|
||
Nelson, the motivation for my question:
I would like to understand the best approach to backport a fix to Firefox/SeaMonkey in RHEL 4.
Should we go with "#undef DO_NETSTAT"
or should we take the patch that was checked in.
Because Wan-Teh continued to discuss, I was concerned the checked in patch might not be complete.
Comment 76•18 years ago
|
||
FWIW, we do #undef DO_NETSTAT on Debian (and Ubuntu does too, AFAIK). Also see my comment #45. I still don't understand obstination as in comment #47. Why continue to use netstat for the sake of it ?
Comment 77•18 years ago
|
||
Kai: FYI "#undef DO_NETSTAT" has also been tried on RHEL4/firefox and appears to fix the netstat zombie issue there, see IssueTracker#119883.
Assignee | ||
Comment 78•18 years ago
|
||
Kai, for your backporting, take the patch that was checked in.
We should consider removing the netstat (safe_popen and safe_pclose)
code on the NSS trunk. Let's do that in a new bug report.
Comment 79•18 years ago
|
||
Thanks for the clarifications and your recommendations.
I want to follow Wan-Teh's recommendation to use the patch as checked in.
That way we will use the same patch now and in future NSS updates.
Reporter | ||
Comment 82•17 years ago
|
||
Bug 385465 alleges that the fix for this bug was/is incomplete, and continues
to fail in certain reproducible cases. A patch attached to that bug,
attachment 269706 [details] [diff] [review], contains a proposed fix for the trunk.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: 3.11.6 → 3.11.8
Reporter | ||
Comment 83•17 years ago
|
||
Comment on attachment 250662 [details] [diff] [review]
Patch for a minor problem in safe_popen (as checked in)
This patch is now also on the branch:
unix_rand.c; new revision: 1.17.10.7; previous revision: 1.17.10.6
unix_rand.c on the branch now matches the trunk.
Comment 84•17 years ago
|
||
Is there a reason this bug is still opened? Looks like the other problems were fixed, but maybe I'm missing something.
Comment 85•17 years ago
|
||
Yes the problem described in Bug 385465 still remains. There is a proposed patch to correct the issue attachment 269706 [details] [diff] [review] that has not been committed. It would be great if this could be reviewed and committed.
In summary the issue is that the return value of kill() on a zombie process is not standardized by POSIX. It recommends that kill() on a zombie not return ESRCH but then acknowledges that many systems do. safe_pclose() on those systems has a race that leads to deadlock which I have hit quite a bit.
Regardless of how the operating system implements kill() on zombie processes, I don't see the logic in bailing out of safe_pclose() if kill() fails for any reason. waitpid() should still be called so that the conditions for the deadlock are avoided (i.e. don't exit safe_pclose() leaving the zombie around).
Reporter | ||
Comment 86•17 years ago
|
||
Kurt,
Attachment 269706 [details] [diff] is attached to a resolved bug, and has no review request,
so it's not getting any attention because it doesn't show up on any searches
for patches awaiting review on unresolved bugs.
I suggest attaching it to this bug, and requesting review from Wan-Teh.
Assignee | ||
Comment 87•17 years ago
|
||
This bug was reopened on 2007-07-01. It's better to close this bug
and restore its 3.11.6 target milestone, and use bug 385465 for the
remaining issue.
Reporter | ||
Comment 88•17 years ago
|
||
I'm not sure that I agree that it's better.
Bug 385465 proves that this bug was marked "fixed" prematurely.
A checkin was done, but it seems that this bug is still not actually fixed.
I'd like to keep all the history of this bug in one bug report until it
is really fixed.
Comment 89•17 years ago
|
||
Patch to correct remaining issue (thanks Nelson for the pointer). Requesting review from Wan-Teh.
I'm not sure what if anything I should do with the flags option on this form.
Nelson or Wan-Teh perhaps you could fix them if I got it wrong.
Reporter | ||
Updated•17 years ago
|
Attachment #284642 -
Flags: review?(wtc)
Assignee | ||
Comment 90•17 years ago
|
||
Comment on attachment 284642 [details] [diff] [review]
patch to correct issue described in comment #85
NSS 3.11.6 is already released and has a partial fix for
this bug. The partial fix works on all the platforms that
the NSS team tested on, therefore it is still useful.
Since we didn't write release notes for NSS 3.11.6,
the target milestone of this bug is a good way to document
the fact that NSS 3.11.6 has a partial fix. Reopening
this bug and changing the target milestone makes it
hard to discover that information.
Kurt, your patch is correct. I considered changing the
code to call kill() only once, but decided not to make the
change because "it ain't broke".
Is OpenBSD your platform? I don't have POSIX, so I use
Single Unix Specification to help me write portable code.
Its kill() man page doesn't mention the issue with zombies
and ESRCH:
http://opengroup.org/onlinepubs/007908799/xsh/kill.html
The best fix for this bug is for NSS to not exec
processes and mess with signal handlers and waitpid.
Attachment #284642 -
Flags: review?(wtc) → review+
Comment 91•17 years ago
|
||
Thanks for the review. Yes OpenBSD is my platform. I guess I should have said SUSv3 instead of POSIX. Indeed the SUSv2/POSIX doesn't address how zombies are handled which is how I suppose different 1003.1-1988 compliant kill() implementations came about.
Here's the url to the SUSv3 version that I was referring to:
http://www.opengroup.org/onlinepubs/009695399/functions/kill.html
There's a paragraph in the Rational section beginning with "Existing implementations..."
Comment 92•17 years ago
|
||
(In reply to comment #90)
> The best fix for this bug is for NSS to not exec
> processes and mess with signal handlers and waitpid.
Especially on platforms that already have good random number generation.
Comment 93•17 years ago
|
||
Comment on attachment 284642 [details] [diff] [review]
patch to correct issue described in comment #85
Can we check in this patch?
Comment 94•17 years ago
|
||
FWIW, In Fedora's experimental sandbox (called Rawhide) we started to use NSS 3.12 alpha for testing purposes.
In our OS package, I removed all patches that worked around this bug.
Now, I got new complaints that we are still executing netstat, which is causing some unwanted side effects for SELinux. We'd like to avoid executing netstat.
I would like to do this on trunk:
#ifdef LINUX
#undef DO_NETSTAT
#endif
Comment 95•17 years ago
|
||
Attachment #290839 -
Flags: superreview?(rrelyea)
Attachment #290839 -
Flags: review?(wtc)
Assignee | ||
Comment 96•17 years ago
|
||
Comment on attachment 290839 [details] [diff] [review]
Patch to undef DO_NETSTAT on Linux
Kai, I suggest you do the following:
1. Verify that we don't define DO_PS on any platform:
http://lxr.mozilla.org/security/search?string=DO_PS
Then remove the dead code ifdef'd with DO_PS.
2. Add LINUX to the code here and update the comment:
978 /*
979 * Bug 100447: On BSD/OS 4.2 and 4.3, we have problem calling safe_popen
980 * in a pthreads environment. Therefore, we call safe_popen last and on
981 * BSD/OS we do not call safe_popen when we succeeded in getting data
982 * from /dev/urandom.
983 */
984
985 #ifdef BSDI
986 if (bytes)
987 return;
988 #endif
Assignee | ||
Comment 97•17 years ago
|
||
I checked in Kurt Miller's patch on the NSS trunk for NSS 3.12.
I restored the use of tab characters to make the patch smaller.
Checking in unix_rand.c;
/cvsroot/mozilla/security/nss/lib/freebl/unix_rand.c,v <-- unix_rand.c
new revision: 1.26; previous revision: 1.25
done
Attachment #284642 -
Attachment is obsolete: true
Comment 98•17 years ago
|
||
(In reply to comment #96)
> (From update of attachment 290839 [details] [diff] [review])
> 2. Add LINUX to the code here and update the comment:
>
> 978 /*
> 979 * Bug 100447: On BSD/OS 4.2 and 4.3, we have problem calling safe_popen
> 980 * in a pthreads environment. Therefore, we call safe_popen last and on
> 981 * BSD/OS we do not call safe_popen when we succeeded in getting data
> 982 * from /dev/urandom.
> 983 */
> 984
> 985 #ifdef BSDI
> 986 if (bytes)
> 987 return;
> 988 #endif
Note that there's a patch attached to bug 174993 (attachment 272917 [details] [diff] [review]), which achieves that effect. It hasn't been checked in so far, but got the necessary reviews.
Comment 99•17 years ago
|
||
Kaspar, thanks for the pointer. I asked in bug 174993 to give people a chance to veto, but if we don't hear anything within the next day, we should check that in.
Comment 100•17 years ago
|
||
(In reply to comment #96)
> 1. Verify that we don't define DO_PS on any platform:
> http://lxr.mozilla.org/security/search?string=DO_PS
>
> Then remove the dead code ifdef'd with DO_PS.
Wan-Teh, I remove you made this proposal for cleanup purposes, only.
Sure, I can do that. Will attach a patch.
Comment 101•17 years ago
|
||
Attachment #291228 -
Flags: review?(wtc)
Assignee | ||
Comment 102•17 years ago
|
||
Comment on attachment 291228 [details] [diff] [review]
remove unused DO_PS code [checked in]
r=wtc.
Attachment #291228 -
Flags: review?(wtc) → review+
Updated•17 years ago
|
Attachment #291228 -
Attachment description: remove unused DO_PS code → remove unused DO_PS code [checked in]
Comment 103•17 years ago
|
||
Comment on attachment 291228 [details] [diff] [review]
remove unused DO_PS code [checked in]
This patch checked in:
/cvsroot/mozilla/security/nss/lib/freebl/unix_rand.c,v <-- unix_rand.c
new revision: 1.27; previous revision: 1.26
Comment 104•17 years ago
|
||
Comment on attachment 290839 [details] [diff] [review]
Patch to undef DO_NETSTAT on Linux
r+ holding my nose....
I totally agree with undefining DO_NETSTAT for Linux.
I don't like the #undef method of setting it, unfortunately that's because there is an explicit #define DO_NETSTAT in the file.
I would much rather that #define be ifdef out rather than undef'ing DO_NETSTAT explicitly.
I would happily enteratain a patch that does that.
bob
Attachment #290839 -
Flags: superreview?(rrelyea) → superreview+
Reporter | ||
Comment 105•17 years ago
|
||
Comment on attachment 290839 [details] [diff] [review]
Patch to undef DO_NETSTAT on Linux
This patch is obsolete. This has already been fixed by another
patch that has already been checked in:
1.28 <kaie@kuix.de> 2007-12-03 13:07
Bug 174993, don't fork netstat if data has been gathered from /dev/urandom
r=nelson
That fix is better than the one proposed here in one important
respect. If the attempt to read /dev/urandom should fail for some
reason, that other fix (already committed) does not leave the PRNG
unseeded.
Attachment #290839 -
Flags: review-
Assignee | ||
Updated•17 years ago
|
Attachment #290839 -
Flags: review?(wtc)
Comment 106•17 years ago
|
||
Comment on attachment 290839 [details] [diff] [review]
Patch to undef DO_NETSTAT on Linux
Nelson said this patch is obsolete. Setting the flag.
Attachment #290839 -
Attachment is obsolete: true
Reporter | ||
Comment 107•17 years ago
|
||
I believe this bug is now fixed by the checkin documentd in comment 105.
Status: REOPENED → RESOLVED
Closed: 18 years ago → 17 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•