Last Comment Bug 182758 - freebl PRNG hashes netstat and /dev/urandom data rather than just using /dev/urandom
: freebl PRNG hashes netstat and /dev/urandom data rather than just using /dev/...
Status: RESOLVED FIXED
: perf
Product: NSS
Classification: Components
Component: Libraries (show other bugs)
: 3.11.2
: Sun Solaris
P2 normal (vote)
: 3.11.3
Assigned To: Julien Pierre
:
:
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2002-11-29 20:55 PST by Aaron Lehmann
Modified: 2006-09-01 15:17 PDT (History)
4 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
On Solaris, use only /dev/urandom if it is available. If not, use libkstat (5.04 KB, patch)
2006-08-25 20:35 PDT, Julien Pierre
wtc: superreview-
Details | Diff | Splinter Review
Update (5.61 KB, patch)
2006-08-26 12:37 PDT, Julien Pierre
wtc: review+
nelson: review-
Details | Diff | Splinter Review
Update with feedback from Nelson and Wan-Teh (5.68 KB, patch)
2006-08-30 16:33 PDT, Julien Pierre
nelson: review-
Details | Diff | Splinter Review
update (6.82 KB, patch)
2006-08-31 17:47 PDT, Julien Pierre
nelson: review+
wtc: superreview+
Details | Diff | Splinter Review
Patch as checked in (6.79 KB, patch)
2006-09-01 15:11 PDT, Julien Pierre
no flags Details | Diff | Splinter Review

Description User image Aaron Lehmann 2002-11-29 20:55:04 PST
In security/nss/lib/freebl/unix_rand.c, bytes are read from /dev/urandom and
from the output of netstat to seed the random number generator. I do not see why
this code invokes netstat in the first place. If the OS has a decent random
device, it should be a much better source for entropy than processes like
netstat. In fact, on OS' where /dev/urandom is well-implemented, I don't see why
Mozilla should even use its own PRNG. On Linux, /dev/urandom is a similar PRNG
to the one used in Mozilla, but its entropy comes from things like interrupts.
The Mozilla PRNG uses /dev/urandom and netstat as entropy sources. Why not just
read from /dev/urandom when random bytes are required, when running on an OS
with a cryptographically secure random device? Doing so would not only be
faster, but should be even more secure because nothing is really gained by
hashing output from /dev/urandom a second time, and netstat output has
questionable randomness.
Comment 1 User image Wan-Teh Chang 2002-12-02 13:43:31 PST
Assigned the bug to Nelson.
Comment 2 User image Nelson Bolyard (seldom reads bugmail) 2002-12-02 14:15:00 PST
I see no bug here.  The code is presently working as intended.  
The code works, and produces correct results, whether the OS has a 
well-implemented /dev/urandom or not.

There are ***x OSes that have no /dev/urandom.  It has been reported that
there are ***x OSes that have a bad implementation of /dev/urandom.

How does mozilla know whether an OS's implementation of /dev/urandom is
a good one?  

Should mozilla assume that all /dev/urandom implementations are good?
Comment 3 User image John Unruh 2002-12-12 14:21:08 PST
Can this bug be marked invalid?
Comment 4 User image Nelson Bolyard (seldom reads bugmail) 2002-12-12 15:49:05 PST
In comment 2 I asked a non-rhetorical question.  I expected the answer to
be something like "Yes, for <my favorite platform> mozilla/NSS should assume
that /dev/urandom is present and well implemented.".  

I'm not opposed to changing the code so that, on platforms that we believe
normally have good secure /dev/urandom implementations, NSS will attempt
to use /dev/urandom, and if that succeeds, it will skip the netstat step.

Submittor, is that what you want?
Comment 5 User image Aaron Lehmann 2002-12-12 17:30:57 PST
Linux is the only kernel I am familiar enough with to advocate using its RNG
device. What port maintainers would be good CC candidates for this bug?
Comment 6 User image Julien Pierre 2006-08-25 20:30:06 PDT
This has become a hot issue because the libldap library in Solaris 10 uses NSS, and can be transparently invoked in unknowing applications. If an application has for example enforced a no-forking policy through pthread_atfork handler, but calls some LDAP library calls that invokes NSS, we try to fork netstat, even though /dev/urandom is available. We just don't need to do that.

We still have a requirement to suport Solaris 8, which doesn't have /dev/urandom . We still need an entropy source on Solaris 8. For that case, I have written new code that uses libkstat. Patch forthcoming.
Comment 7 User image Julien Pierre 2006-08-25 20:35:10 PDT
Created attachment 235522 [details] [diff] [review]
On Solaris, use only /dev/urandom if it is available. If not, use libkstat

In the libkstat case, I am feeding all kernel statistics to the PRNG, 4 KB at a time. This can be about 800 - 1200 KB of data on the machines I have tested. I don't have time to sort through which parts really change, but many do. I have benchmarked that this code path takes between 65 and 100% of the CPU and wall time of forking netstat, in other words, it is no worse, and sometimes better, depending on the machiine. I came to that conclusion by timing a script that executes certutil -L on a blank DB 100 times. I tried both on Sparc and x86 .
Comment 8 User image Wan-Teh Chang 2006-08-25 23:03:10 PDT
Comment on attachment 235522 [details] [diff] [review]
On Solaris, use only /dev/urandom if it is available. If not, use libkstat

The last #ifdef SOLARIS block in your patch should be
moved forward, before the #ifdef DO_PS block.  It is
best if it can be combined with the #ifdef BSDI block.
Comment 9 User image Wan-Teh Chang 2006-08-25 23:12:12 PDT
Comment on attachment 235522 [details] [diff] [review]
On Solaris, use only /dev/urandom if it is available. If not, use libkstat

These are just some minor problems.  But since there
are many, I want to ask you to submit a new patch.

In freebl you should use PORT_Assert or PR_ASSERT
(they are synomyms).

The libc assert is controlled by the NDEBUG macro,
which is different from the macros that control
PR_ASSERT.

Declare all the variables and functions static.

>+#include "/usr/include/kstat.h"

This should be #include <kstat.h>.

entropy_collected should not be a global variable.
It should be a local variable in RNG_kstat and passed
as argument to CollectEntropy.

Is this based on some sample code?  This doesn't look
like your coding style.
Comment 10 User image Julien Pierre 2006-08-26 12:37:04 PDT
Created attachment 235586 [details] [diff] [review]
Update

Wan-Teh,

This was written from scratch, it wasn't sample code.

I switched from assert to PORT_Assert, as well as from malloc/free to PORT_Alloc/PORT_Free .

I made both entropy_collected and entropy_buf static locals in RNG_kstat.

I fixed the include path.

Regarding comment 8 and the ifdef, I had put it at the end rather than near the BSDI ifdef, because if I did the later, I would get a compiler warning about unreachable statement in the call to safe_popen . I agree it makes more logical sense to group the BSDI and SOLARIS ifdefs, but getting rid of the compiler warning requires an extra "ifndef SOLARIS" on the netstat forking code, which I have included in this patch.
Comment 11 User image Wan-Teh Chang 2006-08-29 15:12:39 PDT
Comment on attachment 235586 [details] [diff] [review]
Update

Julien, this patch has a bug, and I want to suggest some changes.
So please submit a new patch.

1. In general I don't like leaving debug printf statements in the
code.

2. I suggest that max_entropy_len be renamed entropy_buf_size to
more clearly indicate its relation to entropy_buf.

3. In CollectEntropy, we have:

>+    PRUint32 buffered;
...
>+    buffered += tocopy;

'buffered' is read uninitialized.  Change += to =.

4. Also in CollectEntropy, we have:

>+    if (inlen) {
>+        buffered += CollectEntropy(inbuf, inlen, entropy_buf,
>+                                   entropy_collected);
>+    }
>+    return buffered;

You really like recursion.  I find it harder to understand.
Tail recursions can be easily converted into a loop.

5. In RNG_kstat, we have

>+    kstat_t*        ksp = NULL;
...
>+            total += CollectEntropy((char*)ksp, sizeof(struct kstat),
>+                                    entropy_buf, &entropy_collected);

Is struct kstat the same as kstat_t?

Since I am not familiar with the contents of kstat_t, I can't really
review the two CollectEntropy calls other than that they look correct.

6. It would be nice to add a one-line comment to summarize what
CollectEntropy and RNG_kstat do.


7. In RNG_SystemInfoForRNG, we have:

>+    if (!bytes) {
>+        /* On Solaris 8, /dev/urandom isn't available, so we use libkstat. */
>+        RNG_kstat();
>+    }
>+    return;

Unless you are worried about the cost of calling RNG_kstat(), I
suggest you always call it (i.e., remove if (!bytes)), so that the
code is exercised on your development machine.

>+#ifndef SOLARIS
>+    /* As mentioned above, we never want to fork netstat on Solaris */
>     fp = safe_popen(netstat_ni_cmd);
>     if (fp != NULL) {
> 	while ((bytes = fread(buf, 1, sizeof(buf), fp)) > 0)
> 	    RNG_RandomUpdate(buf, bytes);
> 	safe_pclose(fp);
>     }
>+#endif

Seems like you can emulate the code for the "ps" command
and prepare for other platforms that don't want to run the
"netstat" command:

#ifndef SOLARIS
#define DO_NETSTAT 1
#endif

...

#ifdef DO_NETSTAT
    fp = safe_popen(netstat_ni_cmd);
    ...
#endif
Comment 12 User image Nelson Bolyard (seldom reads bugmail) 2006-08-30 01:27:42 PDT
Comment on attachment 235586 [details] [diff] [review]
Update

I have some minor quibbles with this patch.

1) rather than seeing all the new code be "ifdef solaris" and "ifndef solaris"
which emphasizes the platform-specific nature of the code, 
I'd rather see it use "feature test macros", e.g. "if defined(DO_NETSTAT)" 
and "if !defined(DO_NETSTAT)", where the macro (#define) name is the name of
a feature, not a platform, and then have other #ifdefs that conditionally 
define that feature test macro, or not.  

That way, the Netstat code becomes a feature, to be considered on a platform 
by platform basis, not a platform specific hack, and it is more obvious that 
other platforms may wish to incorporate it, too.  I think all the 
conditionally compiled features for obtaining entropy should be coded that way.

2) I'd like to see the code structured in such a way that there is some 
run time check that ensures that in no build, at no time, is it possible
for this function to "fall out" the bottom without having succesfully 
gotten any entropy from any of its sources.  Perhaps we should have a 
counter of bytes of entropic data recieved/digested, and at the bottom of 
the function abort() if that number is zero.  The code should be structured 
not to bail out early, but to conditionally do subsequent parts only if 
previous ones failed, so that it always gets to the bottom of the function 
and always tests to see if any was succesfully gotten.  

If the code is structured that way, then the next person to come along with
another platform specific hack is likely to follow the established pattern
in the code.  We REALLY don't want this function ever returning without 
succeeding!
Comment 13 User image Nelson Bolyard (seldom reads bugmail) 2006-08-30 01:30:46 PDT
Oh, I meant to add, that doing kstat should be a separate feature  with a 
separate feature test macro, from doing netstat.  The code might be:

#ifdef solaris
#define DO_KSTAT 1
#under DO_NETSTAT
#endif

and the kstat code would be conditioned by 
#ifdef DO_KSTAT
and the netstat code by 
#ifdef DO_NETSTAT
Comment 14 User image Julien Pierre 2006-08-30 12:57:55 PDT
Wan-Teh,

In response to comment 11 :

1-4 : I will resolve these in an upcoming patch.
5 : yes, kstat_t is the same as struct kstat . But I will change the code for consistency to use kstat_t in both places.
6 : will do
7 : the cost of RNG_kstat is about the same or lower than safe_popen, but it is still non-zero . I don't see a good reason to execute it if we have /dev/urandom . I tested the code on my machine by adding an environment variable to always trigger it. I also have a Solaris 8 box which doesn't have /dev/urandom that I tested on, and I verified it took this code path. We also have one nightly QA box, mintchip, which is running Solaris 8, so this code will be exercised.
8 and Nelson's comment 12 1) and 13 :
I will add a DO_NETSTAT feature macro, the safe_popen code is shared between several platforms. However, for kstat, this code is completely specific to Solaris, so it belongs within an #ifdef SOLARIS . And I wouldn't call it a hack. unix_rand.c is full of platform-specific code surrounded by platform #ifdefs . Even the source file itself is conditionally compiled - we have separate mac_rand.c, os2_rand.c and win_rand.c.

Nelson,
In response to comment 12 :
1 : see the end of my response to Wan-Teh above
2 : I agree that the code should not return without having collected some entropy. However, this is a general problem that is common to all platforms well beyond the scope of this particular fix. RNG_SystemInfoForRNG returns void, so there is no possibility for softoken to check if anything went wrong with the RNG during C_Initialize . That is a serious problem, but it needs to be fixed for all platforms in a separate bug. For now, my upcoming patch will change RNG_kstat to not be a void function, and return the number of entropy bytes actually collected, which will make it easier to convert its caller to a non-void function in the future.
Comment 15 User image Julien Pierre 2006-08-30 16:33:03 PDT
Created attachment 236154 [details] [diff] [review]
Update with feedback from Nelson and Wan-Teh

- remove kprintf statements
- rename max_entropy_len to max_entropy_buf_len
- initialize buffered to zero
- replace recursion with a while loop
- replace struct kstat with kstat_t
- add one-liners for CollectEntropy and RNG_kstat
- add and use DO_NETSTAT feature macro
- make RNG_kstat return number of entropy bytes and assert that it is non-zero (the caller returns void, so this is the most I can do in this bug).

Wan-Teh, the doc on libkstat is available at http://developers.sun.com/solaris/articles/kstatc.html .
Comment 16 User image Nelson Bolyard (seldom reads bugmail) 2006-08-30 23:00:52 PDT
Comment on attachment 236154 [details] [diff] [review]
Update with feedback from Nelson and Wan-Teh

This patch is all OK except for the behavior of function CollectEntropy.
It should never call RNG_RandomUpdate with zero bytes of input.   And
I think it should call RNG_RandomUpdate 0, 1 or 2 times (at most) per call.  

The variable names "buffered" and "entropy_collected" seem to be interchanged.  
That is, "entropy_collected" counts the amount in the buffer, and "buffered" 
counts the amount input to the PRNG.  Seems backwards to me, but that's not 
a huge deal.  I just have to remember that buffered means collected by PRNG
and entropy_collected means buffered.  :-/

Consider if CollectEntropy gets called for the first time with a inlen==12k.
The code will call RNG_RandomUpdate with zero bytes immediately, then it
will loop, copying 4KB at a time into a buffer, and calling RNG_RandomUpdate
once each time in the loop.  In that case, it seems to me that (a) it 
shouldn't call RNG_RandomUpdate with zero bytes, and (b) it can call 
RNG_RandomUpdate just once to swallow the whole 12KB input in a single shot.
No need to break it up into smaller bites first.  CollectEntropy can do the 
job in just two calls to RNG_RandomUpdate, with no zero length calls.

Let me suggest this algorithm psuedo code:
   if (input len >= max buf len)
       if (buffer is not empty)
           RNG_RandomUpdate what's in the buffer
           mark buffer emtpy
       RNG_RandomUpdate the whole input
       return
   while (there remains unconsumed input)
       append as much of the unconsumed input into the buffer as will fit.
       if (buffer is full)
           RNG_RandomUpdate what's in the buffer
           mark buffer emtpy
   return

That while loop will execute 0, 1 or 2 times 
and will call RNG_RandomUpdate 0 or 1 times.
Comment 17 User image Julien Pierre 2006-08-31 17:24:12 PDT
Nelson,

I have done a fair amount of benchmarking with the code. I found that buffering 4KB at a time gave indistinguishable performance from having a single big buffer (4 megabytes) allocated upfront, buffering all the kstat data into it, and feeding it to the RNG in one call. I benchmarked that by simply changing the size of my buffer. Given this finding, I'm inclined to keep the code as simple as possible, even if there are possible algorithmic improvements that don't result in an actual difference.

That said, the fact that my code called RNG_RandomUpdate with a zero-length was a bug, which I will correct.
Comment 18 User image Julien Pierre 2006-08-31 17:47:08 PDT
Created attachment 236342 [details] [diff] [review]
update

This patch contains several changes :
1) More comments
2) CollectEntropy and RNG_kstat are both changed to return a SECStatus . This is to ease future work in bug 350798 to check for success
3) CollectEntropy has an extra PRUint32 *total_fed argument which gets incremented
4) RNG_kstat has an extra PRUInt32* fed argument to return the number of bytes of entropy that it generated and fed to the RNG
5) CollectEntropy now only feeds data to the RNG if the buffer is full, and will never pass a zero length to RNG_RandomUpdate
6) CollectEntropy local variable buffered renamed to processed
7) CollectEntropy argument entropy_collected renamed to entropy_buffered. Same for RNG_kstat local
Comment 19 User image Nelson Bolyard (seldom reads bugmail) 2006-08-31 18:21:44 PDT
Comment on attachment 236342 [details] [diff] [review]
update

r=nelson
Please file a separate bug about the fact that RNG_SystemInfoForRNG can fail silently in optimized builds.
Comment 20 User image Julien Pierre 2006-08-31 18:27:41 PDT
Nelson,

Thanks for the review. I think your request from comment 19 is covered under bug 350798, which I filed yesterday .
Comment 21 User image Wan-Teh Chang 2006-09-01 09:55:04 PDT
Comment on attachment 236342 [details] [diff] [review]
update

r=wtc.

Please remove the 4 extraneous semicolons after closing curly
braces.  Just search for "};" in the file and remove the ";".

The following are optional suggested changes and comments.
You can ignore them, especially 7, 8, and 9.

1. max_entropy_buf_len should be renamed entropy_buf_len.
The length of entropy_buf is fixed, so "max" doesn't make
sense.

2. In this comment:

>+/* Buffer entropy data, and feed it to the RNG, 4 KB at a time.
>+ * Returns error if RNG refuses data.

Change "4 KB" to "max_entropy_buf_len bytes".

Change "RNG refuses data" to "RNG_RandomUpdate fails".

3. Since entropy_collected has been renamed entropy_buffered,
perhaps CollectEntropy should also be renamed BufferEntropy?

4. In CollectEntropy, the local variable 'processed' is not
necessary.  You can update *total_fed directly, i.e., replace

        processed += tocopy;

by

        *total_fed += tocopy;

at the end of the while loop.

5. In RNG_kstat, you have:

>+            if (-1 == kstat_read(kc, ksp, NULL)) {

and

>+    if (kstat_close(kc)) {

It would look nicer to use the same method to test for the failure
of kstat_read and kstat_close.

6. The local variable 'total_fed' in RNG_kstat is not necessary.
RNG_kstat can actually pass 'fed' through to the two CollectEntropy
calls (instead of &total_fed).

7

>+    if (!bytes) {
>+        /* On Solaris 8, /dev/urandom isn't available, so we use libkstat. */
>+        PRUint32 kstat_bytes = 0;
>+        if (SECSuccess != RNG_kstat(&kstat_bytes)) {
>+            PORT_Assert(0);
>+        };
>+        bytes += kstat_bytes;
>+        PORT_Assert(bytes);
>+    }

Since 'bytes' is 0, you can just pass &bytes instead of
&kstat_bytes to RNG_kstat.  Since the type of 'bytes' is
size_t, this change would require propagating that type
to the 'fed' parameter of RNG_kstat and possibly the
'total_fed' parameter of CollectEntropy.

8. RNG_kstat can call RNG_RandomUpdate directly.  It doesn't need to
buffer the entropy in CollectEntropy.  Another way to say this is that
if the CollectEntropy function is a good idea, then we should use it
throughout the RNG_SystemInfoForRNG function.

For example, the while loop that steps through all the environment
variables in the 'environ' array is very similar to your RNG_kstat
function.  It should use the same method (either buffered or
non-buffered) as RNG_kstat to feed its data to RNG_RandomUpdate.

9. RNG_SystemInfoForRNG already has a buffer 'buf'.  Ideally you should
be able to use that buffer instead of 'entropy_buf' in RNG_kstat.
Comment 22 User image Julien Pierre 2006-09-01 15:11:14 PDT
Created attachment 236470 [details] [diff] [review]
Patch as checked in

Wan-Teh,

Thanks for the review.

1) This is a leftover from my Pascal days that I will probably never get rid of.
Speaking of which, this code would have been so much cleaner in that language which permits declaration of not only local variables, but also local functions, thus avoiding the need for any pointers to locals in this case.
I fixed this.

3) Yes, I had actually considered that, but decided that maybe 1 or 2 symbols should be unchanged from the previous patch ;) But you are correct, so I made this change.

4) This change simplifies the code, so I made it too.

5) From the man pages for kstat_read and kstat_close :

RETURN VALUES
     Upon successful completion, kstat_read()  and  kstat_write()
     return  the  current  kstat chain ID (KCID). Otherwise, they
     return -1 and set errno to indicate the error.

     Upon successful completion, kstat_close() returns 0.  Other-
     wise, -1 is returned and errno is set to indicate the error.

For kstat_close, the only correct value is zero, so checking for if (kstat_close()) is best IMO. Even if the failure case is only supposed to be -1, if there is a bug, it could be something else, and my code detects that.

For kstat_read, there is no unique known good value, so I have to check for -1 .
So, I prefer to keep the 2 different checks here.

6) Only if *fed is initialized to 0, which it isn't guaranteed to be . An extra *fed = 0 statement is needed to replace total_fed = 0 . However, looking at this, I found that *fed wasn't getting reset at all if kstat_open failed upfront and the function returned. So I have changed the code to initialize *fed first . I also added a NULL-pointer check for the fed argument.

7) This change is not necessary.

8) Buffering the entropy data makes sense only if there may be lots of small chunks, as there happens to be always the case for the kstat structures themselves, and some of the data they point to. The code would need to be reviewed for each platform to see where it makes sense to buffer.

The buffering could be moved into RNG_RandomUpdate itself, if there was an RNG_RandomFinish or RNG_RandomFlush to go with it.

9) That is not necessary and ties the code a lot more to its caller.

I checked this patch in to the tip .
Checking in config.mk;
/cvsroot/mozilla/security/nss/lib/freebl/config.mk,v  <--  config.mk
new revision: 1.16; previous revision: 1.15
done
Checking in unix_rand.c;
/cvsroot/mozilla/security/nss/lib/freebl/unix_rand.c,v  <--  unix_rand.c
new revision: 1.19; previous revision: 1.18
done

And NSS_3_11_BRANCH :

Checking in config.mk;
/cvsroot/mozilla/security/nss/lib/freebl/config.mk,v  <--  config.mk
new revision: 1.15.2.1; previous revision: 1.15
done
Checking in unix_rand.c;
/cvsroot/mozilla/security/nss/lib/freebl/unix_rand.c,v  <--  unix_rand.c
new revision: 1.17.10.2; previous revision: 1.17.10.1
done
Comment 23 User image Julien Pierre 2006-09-01 15:17:53 PDT
P2 is more appropriate, since only a subset of particularly messed-up applications suffer from the use of fork.

Note You need to log in before you can comment on or make changes to this bug.