100447 - safe_popen fails under BSD/OS 4.2, BSD/OS 4.3

Reporter

Description

•

24 years ago

It was pointed out to me that the builds of mozilla that I have made for BSD/OS4.2 core dump on exit. This was rather curious, I thought, but after some investigation, I was able to trace back the failure to the safe_popen() function in mozilla/security/nss/lib/util/unix_rand.c After looking at this problem a little closer, it finally became clear to me that this was also the problem that prevented mozilla from running on BSD/OS 4.3 if I compiled with --enable-crypto! The safe_popen() function takes a program to run, makes a pipe, forks and then closes down the majority of the fds for the child process. It execs the command and sends back the results to the parent -- results which are used to try to garner some randomness from the machine. The command that is attemtped to run is "netstat -in", which produces a more or less static output for a given machine under BSD/OS. If you wanted something that counts packets, so it changes often, it would have to be "netstat -sin". Here's what a "netstat -in" looks like on BSD/OS 4.2: lidl@sloar-40: netstat -in Name Index MTU Speed Mtrc Address Network exp0 1 1500 100M 0 00:90:27:c1:bb:3c exp0 1 0 192.111.45.25 192.111.45.0/26 exp0 1 0 fe80::290:27ff:fec1:bb3c%exp0 fe80::/64 lo0 2 4352 0 lo0 2 0 fe80::1%lo0 fe80::/64 lo0 2 0 ::1 ::1/128 lo0 2 0 127.0.0.1 127 ppp0* 3 1500 0 tun0* 4 1500 0 vlan0* 5 0 0 00:00:00:00:00:00 vlan1* 6 0 0 00:00:00:00:00:00 gif0* 7 1280 0 gif1* 8 1280 0 gif2* 9 1280 0 gif3* 10 1280 0 Not very random. Under BSD/OS, the safe_popen() routine causes the child process to coredump. Under 4.2, it coredumps after running the exec'd program. This shows itself as a completely "regular" run of mozilla, but after it exits normally, a core file is written from the child process. Under 4.3, the child exits before any data is written to the parent, so the parent process hangs forever in the fread() of the expected data from the (no longer running) child process. Removing the attempt to call safe_popen() (heck, I removed the entire function from my local copy) makes the program: 1) Not dump core under BSD/OS 4.2. 2) Initialize successfully under BSD/OS 4.3. To compenstate for this lack of randomness, I added "/dev/urandom" to the list of files that gets read, more or less applying the patch that is attached to this bug: http://bugzilla.mozilla.org/show_bug.cgi?id=96626 I marked this bug as "critical" in that it core-dumps on 4.2 (although only when exiting the application) and prevents it from running on 4.3 entirely. I believe you should be able to replicate this problem by examing the directory where the tests on host 'otaku' (from the tinderbox-ports page) are run -- look for the mozilla-bin.core file. I'll attach my diffs for this problem, which disable the safe_popen() code entirely for BSD/OS.

lidl

Reporter

Comment 1

•

24 years ago

Attached patch patch to remove safe_popen() function if compiled on BSD/OS (obsolete) — Details — Splinter Review

Wan-Teh Chang

Assignee

Comment 2

•

24 years ago

Nelson, could you please take a look at this bug? Thanks.

Assignee: wtc → nelsonb

Keyser Sose

Comment 3

•

24 years ago

Marking NEW adding keywords.

Status: UNCONFIRMED → NEW

Ever confirmed: true

Keywords: patch, review

Wan-Teh Chang

Assignee

Comment 4

•

24 years ago

Instead of removing the call to safe_popen on BSD/OS, we should find out why safe_popen causes the child process to coredump. Could you post the stack trace in the core file?

Priority: -- → P1

Target Milestone: --- → 3.4

lidl

Reporter

Comment 5

•

24 years ago

Adding a traceback will be difficult for me -- I've upgraded most my machines to BSD/OS 4.3, which as I explained in the original bug report fails differently than under 4.2. Because the browser won't even start under 4.3 without this patch, I've already had to modify all my local copies of the CVS tree to have this patch included. However, since there is a BSD/OS 4.2 tinderbox running (otaku), and the bug manifests itself as a core dump after the main browser exits, I expect that you will find a core dump on otaku, in whatever the working directory of the browser is when the browser tests are run. If not, cd to a writable directory, run the browser, exit it (you don't even have to visit any sites) and examine the core file. If you don't have the core file on the tinderbox machine, I can, if I must, reload one of my workstations with 4.2 and get that trace. However, as I pointed out, the netstat that is called doesn't give a particularly random number under BSD/OS. You would literally be better off calling random() and using that the netstat that is run.

Nelson Bolyard (seldom reads bugmail)

Comment 6

•

24 years ago

On most (some? many?) Unix platforms, netstat -in gives interface statistics for all interfaces. On BSD, it gives only static configuration info, and therefore is useless as a source of entropy. I suspect that there is some alternate set of flags that produces interface statistics for all interfaces on BSD. Can you tell us what those flags are and show some sample output of such a command?

lidl

Reporter

Comment 7

•

24 years ago

Well, if you run 'netstat -in', you get the useless (for randomness purposes) list of interfaces on the machine, which is pretty static across reboots. To get the statistics, you would need to run 'netstat -sin', which looks like the historical 'netstat -in' output: lidl@nohost-77: uname -a BSD/OS nohost.pix.net 4.3 BSDI BSD/OS 4.3 Kernel #2: Tue Oct 23 13:59:52 EDT 2001 root@nohost.pix.net:/usr/src/sys/compile/MEKETREX i386 (mac address sanitized to protect the innocent) lidl@nohost-78: netstat -sin Name Index Address Ipkts Ierrs Opkts Oerrs Coll Drop exp0 1 xx:xx:xx:xx:xx:xx 364945 0 340750 0 0 0 exp0 1 192.168.168.1 365571 336982 exp0 1 fe80::2xx:xxff:fexx:xxxx%exp0 0 1 exp1 2 xx:xx:xx:xx:xx:xx 6630526 0 5851870 1 0 0 exp1 2 192.168.169.1 6509562 5731715 exp1 2 fe80::2xx:xxff:fexx:xxxx%exp1 0 1 exp1 2 192.168.169.2 112487 141258 wx0* 3 xx:xx:xx:xx:xx:xx 0 0 0 0 0 0 lo0 4 293072 0 293072 0 0 0 lo0 4 ::1 0 0 lo0 4 fe80::1%lo0 0 0 lo0 4 127.0.0.1 267669 267669 ppp0* 5 0 0 0 0 0 0 tun0* 6 0 0 0 0 0 0 vlan0* 7 00:00:00:00:00:00 0 0 0 0 0 0 gif0* 8 0 0 0 0 0 0

Nelson Bolyard (seldom reads bugmail)

Comment 8

•

24 years ago

The patch give above disables the use of the netstat command on ALL unix platforms, and adds the use of /dev/urandom only on BSDI. I don't object to replacing netstat with /dev/urandom on platforms that support /dev/urandom, but I do object to eliminating the use of netstat without replacing it with something that provides at least as much entropy as is being lost.

Nelson Bolyard (seldom reads bugmail)

Comment 9

•

24 years ago

In http://lists.insecure.org/linux-kernel/2000/Mar/1739.html the author explains that some systems may run out of swap space on one of the first Copy-On-Write (COW) events after a fork, causing a SIGSEGV to the child. This seems like a very likely cause for bug 100447, since mozilla is renowned for using lots of memory. So, submittor, please try substituting vfork in place of fork in safe_popen on your BSD box and see if that solves the problem. While you're at it, please make the command use the -s flag on BSD. If this works, please attach the patch to this bug report. While it is likely that mozilla will begin to use /dev/urandom on some platforms in the future, that is not the solution to this problem. Please follow bug 96626 on the introduction of /dev/urandom to mozilla.

Whiteboard: awaiting trial of vfork by submittor

lidl

Reporter

Comment 10

•

24 years ago

I'll address two things in this email, then I'll go build the test cases that I was asked to test. (Even though I'm 99% sure that it isn't going to help.) 1) I didn't mean to disable the netstat code on all machines, just BSD/OS. There is a missing chunk of code in the attachment that addresses this problem. #ifndef BSDI #define DO_NETSTAT #endif 2) I'm quite certain that I'm not running out of swap. The 4.2 machine that I was testing on had in excess of 600MB of swap. Mozilla may be a memory pig, but even it is not that big. The two different 4.3 machines that I've been testing on have, respectively, 512MB RAM + 2GB swap, and 1.25GB RAM + 2.25GB of swap. I'm going to checkout a fresh tree and try the changes suggested. I'll report back later on the results.

lidl

Reporter

Comment 11

•

24 years ago

Well, as I suspected, changing the call from fork() to vfork() didn't make it work. So that's not the problem. And, as a point of reference, even when running the debugging build under gdb, I still had in excess of 1.5GB of available swap space, so the running out swap problem that was referenced isn't the problem either. I've cleaned up my earlier patch a little so that it doesn't disable the netstat code on most machines, and more or less predicates whether or not safe_popen() gets used on whether or not /dev/urandom is in play. While this doesn't address all of the commentary in bug 96626, I don't feel it needs to either. Some of that commentary is quite flawed, as it assumes deep knowledge about the internals of the /dev/urandom implementation. Those assumptions may hold true for Linux, but they don't for many flavors of the various BSD systems available. All one should assume about /dev/urandom is that you can open it and read a random or pretty pseudo-random stream of bytes for it. One should NOT assume that is does SHA-1 internally and therefore anything more than 160 bits pointless. I'll attach my current patch next.

lidl

Reporter

Comment 12

•

24 years ago

Attached patch Improved patch to turn off safe_popen and use /dev/urandom (obsolete) — Details — Splinter Review

Nelson Bolyard (seldom reads bugmail)

Comment 13

•

24 years ago

Thank you for testing vfork. This bug reports a problem with safe_popen on BSDI. Another bug, bug 108708, reports a similar problem on Solaris, (although it's not clear that these problems have the same cause). Rather than fixing the root cause of the problem, the patches in this bug report would simply avoid using safe_popen and substitute the use of /dev/urandom instead. This report also points out that /dev/urandom is of varying quality on various platforms, which is precisely why I think the use of /dev/urandom should be in addition to, rather than instead of, other sources of entropy. There are two separate issues: 1) fix safe_popen on BSDI and Solaris, (bugs 100447 and 108708) 2) add /dev/urandom as another source. (bug 96626) I have a patch for 96626 that attempts to use /dev/urandom on ANY unix/Linux that supports it (not a compile-time decision). It addresses the other concerns expressed in 96626 by trying /dev/urandom first, and only trying 1KB of it. I'll add that patch to 96626 when it's ready (probably today). In the meantime, let's get to the bottom of what's wrong with safe_popen on BSDI (and Solaris). I have no BSDI system on which to investigate, so further progress on this issue depends on BSDI developers.

Keywords: patch, review

Whiteboard: awaiting trial of vfork by submittor

lidl

Reporter

Comment 14

•

24 years ago

Getting to the bottom of what's wrong with the code in safe_popen() is going to be a pretty messy problem, for a variety of reasons: 1) That code is written without any thought to the reality of running on systems that use pthreads as the threading model. Signal handling under pthreads doesn't guarantee that any particular thread will get a particular signal delivered to it. Remember that signals are still process-scope, not thread scope. So, diddling the signals with sigaction() before and after forking is at best a potentially haphazard way of doing things. At worst, doomed to failure. 2) The use of strtok() is a no-no in a treaded program, at least without a mutex to protect the state of it. Given that safe_popen() makes no attempt to protect mutexes that might be held across the fork() (one of the few things that are carried over from parent->child when a pthread'd program forks), I wouldn't care to guess what odd failures might exhibit themselves due to some residual mutex being held in libc or another library that the child process encounters already locked by the parent. I would guess that the safe_popen() routines should be using pthread_atfork() to handle a reasonable cleanup of the child space process context. Without that rewrite, I think this code is doomed to be buggy. All that being said, I'm building a debug version of the suspect code under 4.2 (where the child actually drops a core file on application exit), and I'll post a backtrace when it finishes.

Nelson Bolyard (seldom reads bugmail)

Comment 15

•

24 years ago

safe_popen is not a public function. It is intended for use only with the software crypto token. Prevention of simultaneous use of safe_popen is handled at higher levels within the software crypto token (or are supposed to be). Additional redundant locks at this level should not be added. Having said that, contributions for the improvement of safe_popen are welcome.

lidl

Reporter

Comment 16

•

24 years ago

Your comments about safe_open() being a private function -- sure, but that doesn't address any of the points I just raised. Given that this function runs in the context of a threaded application, and on many of unix-like flavors, it uses pthreads, the points that I raised are entirely valid and haven't been addressed. Strtok is unsafe in a multi-threaded application. Irregardless of whether or not the safe_popen() function is only running once, any other thread that is operating could smash the static strtok buffer, corrupting the buffer before safe_popen is done with it. Since you cannot force other threads to not operate while this is running, you have an insurmountable problem with the code as it exists currently. The signal handling code is likewise bound to give problems without additional protection. While there may be protection with regard to only one instance of safe_open() operating due to the higher levels of the code, there is no protection (nor can there be) as the effects of other threads that may be interleaved or running simultaenously on another processor in the system. As for contributions to the code, I've contributed twice. In both cases the problems are avoided by effectively removing this code from the program. Spending a huge amount of time in order to rewrite this code to make it proof against pthreads effects is not something that I am willing to do. The root issue here is how to get randomness into the program, not on how to get fork(), pipe() and a host of string functions to operate perfectly in a pthreads environment. I believe that my patch (removal of safe_popen and all the dangerous things it does) and the reading of a system source of randomness is an effective solution to the problem. As a final concern, since part of the goals of the mozilla project is to make it run faster (and given that it is already a very large application), using fork() is no way to make it run faster. Much less running fork() twice.

Wan-Teh Chang

Assignee

Comment 17

•

24 years ago

I noticed the sigaction(SIGCHLD) calls in safe_popen() and safe_pclose() and I agree with Mr. Lidl that they have many potential problems in a pthreads program. I just opened bug 109971 for this issue. (I don't have a solution though.) However, I don't think those two sigaction() calls caused this bug. Their purpose is to make waitpid() work. To verify that they are not the cause of this bug, we can do this experiment: - safe_popen: comment out the sigaction stuff. - safe_pclose: comment out the sigaction call and the waitpid while loop. On the issue of strtok(), yes, it's not thread-safe. Looking at the code, I think it can be rewritten to avoid the strtok() call. But there are other problems with our child process. In the child process, we can only call async-signal-safe functions between fork() and execvp(). We definitely violated this restriction. I don't know if this is the cause of the problem though. I will attach a patch to address this issue.

Nelson Bolyard (seldom reads bugmail)

Comment 18

•

24 years ago

NSS uses NSPR. NSPR uses pthreads on all **ix platforms. Ergo, I believe NSS threads are always pthreads. When a pthread thread calls fork, the only thread that is replicated in the newly forked process is the thread that called fork. That is, the new process initially has only one thread. strtok is used in the only thread in the new process. It is safe in that context. One could replace strtok with something like strtok_r, but IMO that's pointless in that single-threaded process environment. If BSDI's pthread library is reproducing more than one thread in the newly forked process, I believe that is a flaw in that pthread library. A greater concern is that processes that use NSS MUST be able to succesfully fork other processes. If there is a flaw in NSS/NSPR that makes NSS-based processes from being able to fork children, such a flaw MUST be fixed. To simply eliminate the calls to fork rather than fixing the real flaw is to stick one's head in the sand. I do not object, in principle, to replacing the popen with data from /dev/urandom on those platforms where /dev/urandom is known to be more than a trivial LFSR PRNG. ONCE the flaw(s) that affect fork on BSDI and Solaris are resolved, then I have no problem with eliminating the use of safe_popen on BSDI.

Wan-Teh Chang

Assignee

Comment 19

•

24 years ago

Attached patch Proposed patch to call only async-signal-safe functions in the child process. — Details — Splinter Review

Wan-Teh Chang

Assignee

Comment 20

•

24 years ago

lidl@pix.net wrote: > Under 4.3, the child exits before any data is written to the parent, so > the parent process hangs forever in the fread() of the expected data from > the (no longer running) child process. Could you attach a debugger to the child process and post its backtrace? I am wondering if its backtrace would look similar to the one in bug 108708, seen on Solaris.

Nelson Bolyard (seldom reads bugmail)

Comment 21

•

24 years ago

Why does the parent hang, in this case, instead of getting zero from fread, due to reading EOF? Is it possibly the case that if the FD is already at EOF before the parent calls fdopen, then fread doesn't detect EOF?

lidl

Reporter

Comment 22

•

24 years ago

Here's the stack backtrace for the failure on BSD/OS 4.2. lidl@sloar-39: gdb ./mozilla-bin mozilla-bin.core GNU gdb Copyright 1998 Free Software Foundation, Inc. [...] Reading symbols from /data/obj/mozilla-096/dist/bin/components/libimpText.so...done. Reading symbols from /data/obj/mozilla-096/dist/bin/components/libabsyncsvc.so...done. #0 0x486bcb6d in kill () from /shlib/libc.so.2 (gdb) where #0 0x486bcb6d in kill () from /shlib/libc.so.2 #1 0x487436c3 in abort () from /shlib/libc.so.2 #2 0x48753624 in _thread_aio_entry_destroy () from /shlib/libc.so.2 #3 0x48753e8c in _thread_sys_close () from /shlib/libc.so.2 #4 0x4a833d8c in safe_popen (cmd=0x4a885a28 "netstat -nis") at unix_rand.c:666 #5 0x4a834038 in RNG_SystemInfoForRNG () at unix_rand.c:790 #6 0x4a7896c5 in nss_Init (configdir=0x8450f40 "/homes/elves/lidl/.mozilla/lidl/vipfc2ii.slt", certPrefix=0x4a86b47b "", keyPrefix=0x4a86b47b "", secmodName=0x4a86b49d "secmod.db", readOnly=0, noCertDB=0, noModDB=0, forceOpen=0) at nssinit.c:250 #7 0x4a789867 in NSS_InitReadWrite ( configdir=0x8450f40 "/homes/elves/lidl/.mozilla/lidl/vipfc2ii.slt") at nssinit.c:307 #8 0x4a7155fd in nsNSSComponent::InitializeNSS (this=0x824e700) at /scratch/proj/src/mozilla-096/mozilla/security/manager/ssl/src/nsNSSComponent.cpp:547 #9 0x4a715a35 in nsNSSComponent::Init (this=0x824e700) at /scratch/proj/src/mozilla-096/mozilla/security/manager/ssl/src/nsNSSComponent.cpp:609 #10 0x4a71d859 in nsNSSComponentConstructor (aOuter=0x0, aIID=@0x8060c08, aResult=0x8047510) at /scratch/proj/src/mozilla-096/mozilla/security/manager/ssl/src/nsNSSModule.cpp:51 #11 0x48235064 in nsGenericFactory::CreateInstance (this=0x8255ce0, aOuter=0x0, aIID=@0x8060c08, aResult=0x8047510) at /scratch/proj/src/mozilla-096/mozilla/xpcom/components/nsGenericFactory.cpp:74 #12 0x4822f686 in nsComponentManagerImpl::CreateInstanceByContractID (this=0x806a200, aContractID=0x8061072 "@mozilla.org/security/entropy;1", aDelegate=0x0, aIID=@0x8060c08, aResult=0x8047510) at /scratch/proj/src/mozilla-096/mozilla/xpcom/components/nsComponentManager.cpp:1673 #13 0x48230620 in nsComponentManagerImpl::GetServiceByContractID (this=0x806a200, aContractID=0x8061072 "@mozilla.org/security/entropy;1", aIID=@0x8060c08, result=0x80475b4) at /scratch/proj/src/mozilla-096/mozilla/xpcom/components/nsComponentManager.cpp:2119 #14 0x4822ba07 in nsGetServiceByContractID::operator() (this=0x8047710, aIID=@0x8060c08, aInstancePtr=0x80475b4) at /scratch/proj/src/mozilla-096/mozilla/xpcom/components/nsComponentManager.cpp:236 #15 0x8058ea3 in nsCOMPtr<nsIEntropyCollector>::assign_from_helper (this=0x8047720, helper=@0x8047710, aIID=@0x8060c08) at ../../dist/include/xpcom/nsCOMPtr.h:972 #16 0x805be1b in nsCOMPtr<nsIEntropyCollector>::nsCOMPtr (this=0x8047720, helper=@0x8047710) at ../../dist/include/xpcom/nsCOMPtr.h:553 #17 0x8052f08 in VerifyPsmAbsentOrSane (argc=1, argv=0x804793c) at /scratch/proj/src/mozilla-096/mozilla/xpfe/bootstrap/nsAppRunner.cpp:1029 #18 0x8053ae7 in main1 (argc=1, argv=0x804793c, nativeApp=0x0) at /scratch/proj/src/mozilla-096/mozilla/xpfe/bootstrap/nsAppRunner.cpp:1254 #19 0x8054de3 in main (argc=1, argv=0x804793c) at /scratch/proj/src/mozilla-096/mozilla/xpfe/bootstrap/nsAppRunner.cpp:1630 #20 0x804f28e in __start () (gdb) up 3 #3 0x48753e8c in _thread_sys_close () from /shlib/libc.so.2 (gdb) up #4 0x4a833d8c in safe_popen (cmd=0x4a885a28 "netstat -nis") at unix_rand.c:666 666 for (fd = getdtablesize(); --fd > 2; close(fd)) Current language: auto; currently c (gdb) list 661 case 0: 662 /* dup write-side of pipe to stderr and stdout */ 663 if (p[1] != 1) dup2(p[1], 1); 664 if (p[1] != 2) dup2(p[1], 2); 665 close(0); 666 for (fd = getdtablesize(); --fd > 2; close(fd)) 667 ; 668 669 /* clean up environment in the child process */ 670 putenv("PATH=/bin:/usr/bin:/sbin:/usr/sbin:/etc:/usr/etc"); (gdb) p fd $1 = 7 -Kurt

Wan-Teh Chang

Assignee

Comment 23

•

24 years ago

Mr. Lidl, thank you for the stack backtrace for the failure on BSD/OS 4.2. So the child process crashed while closing fd 7. Some observations. 1. It would be nice to find out what fd 7 was. In particular, was it p[0] or p[1] (the pipe)? 2. All the calls made by the child process up to that point (dup2, close, and sysconf) were async-signal-safe. So this crash in close() could be a BSD/OS bug. 3. One possible workaround is to comment out the for loop that closes all the unnecessary fd's in the child process: for (fd = getdtablesize(); --fd > 2; close(fd)) ; This might introduce security vulnerability though. On the other hand, the commands NSS executes, 'ps' and 'netstat', are not arbitrary user code, so perhaps this security precaution of preventing access to unnecessary fd's is not really necessary.

lidl

Reporter

Comment 24

•

24 years ago

Well, I've spent quite a bit of time dinking around with various test setups trying to see what works and what doesn't work. The long and short of it is that I cannot make the exec() stuff work under BSD/OS, either due to an error in how fork() is implemented, or some other interaction that I don't understand. The traceback that I appended showed the child process, "hung" when attempting to close fd 7. "hung" is a bad description -- it was busy flipping between the threads of the process, because it thought that the close call would block. But the observant reader will note that this was in the child process, after the fork. Why are there multiple threads in the child process after a fork? There shouldn't be, according the rules of pthreading. But clearly there are (see my previous debugger output). 1) I wasn't able to figure out what fd 7 was, I probably could do this with a little more effort. But not tonight. 2) As above, it's not a *crash* in close, but rather where the thread goes into a loop trying to switch to the other threads. They don't have any work to do, so they switch back. Repeat, lather, rinse. 3) Commenting out the loop that closes the fds of the child process causes the program to exit with a GDK error. What is really happening is the process is getting a SIGPIPE writing to the X11 socket. The application is acting as if FD_CLOEXEC is set on all the fds when the command is run. Weird. Modifying the code so that it only closes down to fd 8 gives the same error. (SIGPIPE) Finally, a small modification of changing the exec in the child process to a call to _exit(0) shows that child process exits cleanly and the parent process goes happily on its way. (Except, of course, that it doesn't get the random data it was looking for.) So, I'm drawing a blank as to what to test next -- there is definately something squirrely with the pthreads, fork() and attaching to the child process under BSD/OS. I don't know if it is a real problem, or just something that looks like a problem. I just don't know. Anyhow, I'm going to attach the latest version of my patch, which has my debugging stuff conditionalized, rolls together my previous patch, integrates wtc's patch to avoid certain unsafe operations and disables the safe_popen code for just BSD/OS.

lidl

Reporter

Comment 25

•

24 years ago

Attached patch Latest version of the patch, union of previous patches and more! — Details — Splinter Review

Nelson Bolyard (seldom reads bugmail)

Comment 26

•

24 years ago

This latest patch is apparently applied to an older version of the sources. The code on the trunk and the code on the tip of the NSS_3_3_BRANCH (which which PSM currently takes the sources) already use /dev/urandom. Perhaps the PSM team has not yet moved the static NSS_CLIENT_TAG forward to include these recent changes to unix_rand.c, so I'll add some members of the PSM team to this cc list I'll attach a patch that builds on the existing code that uses /dev/urandom and that doesn't call fork on BSDI systems when data is obtained from /dev/urandom.

Assignee: nelsonb → wtc

Nelson Bolyard (seldom reads bugmail)

Comment 27

•

24 years ago

Attached patch patch for NSS_3_3_BRANCH (obsolete) — Details — Splinter Review

This patch applies to the current tip of the NSS_3_3_BRANCH. It doesn't attempt to fork either PS or netstat on BSDI systems when it succeeded in getting data from /dev/urandom.

Wan-Teh Chang

Assignee

Comment 28

•

24 years ago

Comment on attachment 58802 [details] [diff] [review] patch for NSS_3_3_BRANCH I think +#if defined(BSDI) + if (bytes) + return; +#endif should simply be +#if defined(BSDI) + return; +#endif because we can't call safe_popen() no matter what. Calling safe_popen() will not get us more entropy; instead the child process will crash.

Nelson Bolyard (seldom reads bugmail)

Comment 29

•

24 years ago

In the absence of both netstat and /dev/urandom input, this function does not collect enough entropy to securely seed the PRNG, IMO. So, perhaps NSS initialization should fail in this case.

Wan-Teh Chang

Assignee

Comment 30

•

24 years ago

Mr. Lidl: If you comment out the loop that closes the fds of the child process, is it the parent process or the child process that gets a SIGPIPE writing to the X11 socket? It is worrisome that the child process can't even call exec after we removed all the unsafe functions. This seems to mean that a process using pthreads and X11 can't spawn another process. Can you have someone at BSDI look at this problem? Maybe the pthreads library on BSD/OS is not ready to be used by the Mozilla client?

Status: NEW → ASSIGNED

Wan-Teh Chang

Assignee

Comment 31

•

24 years ago

Attached patch patch for NSS tip. — Details — Splinter Review

This is Nelson's patch (attachment 58802 [details] [diff] [review]) plus a comment describing this bug and the use of "netstat -nis" on BSD/OS (when the bug that prevents us from calling safe_popen in a pthreads environment is fixed).

Attachment #49865 - Attachment is obsolete: true

Attachment #57608 - Attachment is obsolete: true

Attachment #58802 - Attachment is obsolete: true

Wan-Teh Chang

Assignee

Comment 32

•

24 years ago

Fix checked into the tip of NSS.

Status: ASSIGNED → RESOLVED

Closed: 24 years ago

Resolution: --- → FIXED

Wan-Teh Chang

Assignee

Comment 33

•

23 years ago

Mr. Lidl, Yesterday a customer sent me a simple test program that can cause safe_popen to fail on Linux in a way similar to the stack trace you provided in comment #22. I looked into it and found that it was a shared library linking order problem. I invoked the 'ldd' command on the executable and found that libc.so was before libpthread.so. When I built the test program with the -pthread option of g++/gcc so that libpthread.so is before libc.so in the linking order, the test program completed successfully. If BSD/OS has 'ldd' or a similar command that can report the shared library linking order of an executable program, could you invoke it on the mozilla executable and see the relative order of libc.so and libpthread.so? If the C++ compiler has a -pthread or -pthreads option, are we using that option when linking the mozilla executable on BSD/OS? If the C++ compiler doesn't have such an option, we will need to link with -lpthread explicitly to ensure the correct linking order of libc.so and libpthread.so.

lidl

Reporter

Comment 34

•

23 years ago

There is not a seperate pthreads library on BSD/OS. The pthreads functions have been integrated into libc, directly. However, as you requested, here is the ldd output: lidl-58: export LD_LIBRARY_PATH=/usr/local/mozilla lidl-59: cd /usr/local/mozilla lidl-60: ldd mozilla-bin libgkgfx.so => /usr/local/mozilla/libgkgfx.so (0x4808e000) libjsj.so => /usr/local/mozilla/libjsj.so (0x480b8000) libmozjs.so => /usr/local/mozilla/libmozjs.so (0x480d1000) libxpcom.so => /usr/local/mozilla/libxpcom.so (0x48143000) libplds4.so => /usr/local/mozilla/libplds4.so (0x4823f000) libplc4.so => /usr/local/mozilla/libplc4.so (0x48242000) libnspr4.so => /usr/local/mozilla/libnspr4.so (0x48247000) libdl.so => /shlib/libdl.so (0x48274000) libgtk-1.2.so.0 => /usr/contrib/lib/libgtk-1.2.so.0 (0x48277000) libgdk-1.2.so.0 => /usr/contrib/lib/libgdk-1.2.so.0 (0x48395000) libgmodule-1.2.so.0 => /usr/contrib/lib/libgmodule-1.2.so.0 (0x483c5000) libglib-1.2.so.0 => /usr/contrib/lib/libglib-1.2.so.0 (0x483c8000) libXext.so.6 => /usr/X11R6/lib/libXext.so.6 (0x483ea000) libX11.so.6 => /usr/X11R6/lib/libX11.so.6 (0x483f8000) libm.so => /shlib/libm.so.0.0 (0x484d6000) libstdc++.so.1 => /usr/lib/libstdc++.so.1 (0x484e7000) libgcc.so.1 => /shlib/libgcc.so.1 (0x48526000) libc.so.2 => /shlib/libc.so.2 (0x48532000) I will add one other comment about a possibly related problem that I have heard of that causes problems on BSD/OS when linking pthreads problems. Basically, the problem is that giving libc as an explicit library on the link line (via a -lc) can cause problems that will not occur if the program is linked such that the compiler will implicitly add the library to the link line. e.g.: gcc -o foo foo.o -lc (might cause pthreads problems) gcc -o foo foo.o (will not cause pthreads problems) I don't think that is the issue here, as the final link line on my BSD/OS 4.3 host goes like this: g++ -o mozilla-bin -I/usr/X11R6/include -fno-rtti -fno-exceptions -Wall -Wconver sion -Wpointer-arith -Wbad-function-cast -Wcast-align -Woverloaded-virtual -Wsyn th -Wno-ctor-dtor-privacy -pthread -pipe -DNDEBUG -DTRIMMED -DWIDGET_DLL=\"libw idget_gtk.so\" -DGFXWIN_DLL=\"libgfx_gtk.so\" -I/usr/contrib/lib/glib/include -I /usr/contrib/include -I/usr/X11R6/include nsAppRunner.o nsWindowCreator.o showO SAlert.o nsSigHandlers.o nsNativeAppSupportGtk.o nsNativeAppSupportBase.o -L.. /../dist/bin -L../../dist/lib -lgkgfx -lmpfilelocprovider_s -ljsj -L../../dist /bin -lmozjs -L../../dist/bin -lxpcom -L/usr/obj/data/mozilla/dist/lib -lplds4 - lplc4 -lnspr4 -ldl -L/usr/contrib/lib -L/usr/X11R6/lib -lgtk -lgdk -lgmodule - lglib -ldl -lXext -lX11 -lm -ldl -lm

Wan-Teh Chang

Assignee

Comment 35

•

23 years ago

I found that NSS (mozilla/security/nss) is not compiled with the -pthread option. That might be a problem. To fix this, edit mozilla/security/coreconf/BSD_OS.mk and define CC as 'gcc -pthread'.

patch to remove safe_popen() function if compiled on BSD/OS 24 years ago lidl 1.53 KB, patch		Details \| Diff \| Splinter Review
Improved patch to turn off safe_popen and use /dev/urandom 24 years ago lidl 2.92 KB, patch		Details \| Diff \| Splinter Review
Proposed patch to call only async-signal-safe functions in the child process. 24 years ago Wan-Teh Chang 2.97 KB, patch		Details \| Diff \| Splinter Review
Latest version of the patch, union of previous patches and more! 24 years ago lidl 6.81 KB, patch		Details \| Diff \| Splinter Review
patch for NSS_3_3_BRANCH 24 years ago Nelson Bolyard (seldom reads bugmail) 1.76 KB, patch		Details \| Diff \| Splinter Review
patch for NSS tip. 24 years ago Wan-Teh Chang 2.32 KB, patch		Details \| Diff \| Splinter Review