Closed Bug 287116 Opened 19 years ago Closed 18 years ago

valgrind reports UMR in alg_fips186_1_x3_1

Categories

(NSS :: Libraries, defect, P3)

3.9.3
x86
Linux
defect

Tracking

(Not tracked)

RESOLVED FIXED
3.11.1

People

(Reporter: wtc, Assigned: wtc)

References

Details

(Keywords: valgrind)

Attachments

(1 file)

A customer submitted the following bug report.

He's using valgrind 2.4 on Red Hat Enterprise Linux 3.
He's seeing the following error when running NSS 3.9.3.

==3587== Conditional jump or move depends on uninitialised value(s)
==3587==    at 0x1B9049C8: memcmp (mac_replace_strmem.c:325)
==3587==    by 0x1BAD107B: alg_fips186_1_x3_1 (prng_fips1861.c:199)
==3587==    by 0x1BAD13E8: prng_RandomUpdate (prng_fips1861.c:329)
==3587==    by 0x1BAD1455: RNG_RandomUpdate (prng_fips1861.c:347)
==3587==    by 0x1BAD1F6B: RNG_FileUpdate (unix_rand.c:959)
==3587==    by 0x1BAD1E2D: RNG_SystemInfoForRNG (unix_rand.c:846)
==3587==    by 0x1BABBEF0: nsc_CommonInitialize (pkcs11.c:2781)
==3587==    by 0x1BABC03C: NSC_Initialize (pkcs11.c:2835)
==3587==    by 0x1BA2762C: secmod_ModuleInit (pk11load.c:108)
==3587==    by 0x1BA27A16: SECMOD_LoadPKCS11Module (pk11load.c:263)
==3587==    by 0x1BA2A285: SECMOD_LoadModule (pk11pars.c:322)
==3587==    by 0x1BA2A2FA: SECMOD_LoadModule (pk11pars.c:337)
==3587==    by 0x1BA09827: nss_Init (nssinit.c:456)
==3587==    by 0x1BA099CE: NSS_Initialize (nssinit.c:525)

Is this UMR benign?
This bug report requires more investigation. It's not immediately obvious
how the code could arrive at that memcmp call with either of its inputs
being uninitialized.

The memcmp call at line 199 in NSS 3.9.3 (now at line 202 on trunk, due
to license block changes, but no others) compares two arrays of 20 bytes each
(BSIZE is 20).  

199            if (memcmp(x_j, rng->Xj, BSIZE) == 0)

If either of those arrays contains uninitialized data at that point, then 
something is very broken.  This is FIPS validated code, so we'd better check.

The array rng->Xj is in the "global rng context", which is allocated via
PORT_ZAlloc, so if it contains a UMR data pattern, then I'd have to 
conclude that the UMR data got copied into it some time after it was 
created.

The array x_j was filled in by the immediately preceeding call to 
RNG_UpdateAndEnd_FIPS186_1 (which is a variant of SHA1_End), and was 
possibly modified by the call to dsa_reduce_mod_q().  

Perhaps a bug in the special SHA1-like algoirthm RNG_UpdateAndEnd_FIPS186_1
(if it existed) might cause the contents of ctx->H to contain UMR data,
but it would have to have been byte swapped UMR data.  

Were any other UMRs reported immediately prior to this one, in any SHA1
or RNG code?  If so, that might explain other UMR data being copied into 
one of the arrays input to memcmp.

I see one other potential issue.   Perhaps this customer's program 
initialized NSS, shut it down, and then reinitialized it, or tried to 
initialize it, failed, and then tried again?  

RNG_RNGInit uses PR_CallOnce() to call function rng_init.  

rng_init can fail in several ways.  It can fail to allocate globalrng, and
it can fail to allocate the PRLock for the globalrng.  In the latter case,
it leaves the globalRNG uninitialized (other than having been zeroed by 
PORT_ZAlloc), and does not get globalrng->isValid.  RNG_RNGInit tests to
see if globalrng is NULL, but does not also test to see if globalrng->isValid
is set.  So, a failure to allocate that lock goes undetected.  

Whether detected or not, if rng_init fails, the PRCallOnce structure that 
was used to call it is not reset, so it will not be called again unless 
and until RNG_RNGShutdown is called (it clears the CallOnce structure).  
Could RNG_RNGInit have been called twice without an intervening call to 
RNG_RNGShutdown?
I just noticed RNG_FileUpdate was in the stack.  Perhaps there is some file
whose contents lead to a repeatable intermediate state where the data in
the RNG context buffer happens to look like the UMR pattern.  Is that possible?

Is this UMR reproducible?
The product is Netscape Directory Server, so it's not doing
the initialize, shutdown, reinitialize sequence.  The two
functions at the bottom of the stack were

==3587==    by 0x1B970EEB: slapd_nss_init (ssl.c:472)
==3587==    by 0x80691F9: main (main.c:800) 

The bug reporter thinks the UMR is likely to be benign or even
a false alarm:

    valgrind complains about a lot of bit shifting operations.
    It's as if, when a shift creates "new bits", the new bits
    are marked as uninitialized. 
QA Contact: bishakhabanerjee → jason.m.reid
FWIW, I also see this UMR reported when running Firefox under valgrind.
Target Milestone: --- → 3.11
Attached patch patchSplinter Review
So this does seem to be benign, but the fact that this patch fixes the warnings (tons of them!) indicates that it's not a bogus warning.

I'm assuming this is because struct stat has holes in it, although it could be because stat doesn't fill in the entire struct.  I hope it's the former.

This does very slightly reduce the amount of "random" data, but I sure hope we're not depending on a few bits of probably-somewhat-predictable stack from holes in a stack allocated structure for anything critical.

It would be very helpful for those running Mozilla/Firefox under valgrind -- this causes tons of warnings, since (without the patch) valgrind seems to taint the contents of the random number generator as derived from uninitialized data and thus warn every time it's updated.
Attachment #210988 - Flags: review?(wtchang)
On FC4, /usr/include/bits/stat.h shows that struct stat has a number of members like:
    unsigned short int __pad1;
which look like intentional padding.  It wouldn't surprise me if some of these were uninitialized.

It does not appear to have any unintentional padding due to alignment requirements, though.  I tested this with a C program using the macro:

#define PRINT_INFO(n_) printf("stat::%s offset=%d size=%d\n", #n_, ((char*)&(s.n_)) - ((char*)&s), sizeof(s.n_))

on each member of struct stat.
Comment on attachment 210988 [details] [diff] [review]
patch

Thanks, David.  I hadn't imagined that the stat struct would contain uninitialized data.
Attachment #210988 - Flags: review+
Comment on attachment 210988 [details] [diff] [review]
patch

r=wtc.  NSS should not depend on the random values
in the unused/reserved fields of struct stat because
the existence of those fields is not documented and
they may not even exist on some platforms or in a
future version of the OS.
Attachment #210988 - Flags: review?(wtchang) → review+
Ah.. are we loosing entropy here. The function that we are running is trying to feed more entropy into the pool. We do care if the memory doesn't exist, but reading UMR data here is actually a good thing.

Yes, the os could start initializing these fields, or they may always be some fixed value, but initializing the data structure is guarrenteed to give 0 entropy from those fields.

In this case the UMR is more than benign, its beneficial.

bob
I believe t is a mistake leave this as-is, on two grounds:

  - Bad style. The content of gaps in a |struct stat| at a particular stack location is not "high entropy": it is uninitialized. For the purpose of cryptanalysis, undefined. It happens to be the case that those numbers are the same run-to-run on my system, but who's checking? Not us. Our PRNG-seeding code folds in everything it can imagine as "not being a compile-time constant", without regard to entropy quantity or quality, depletion, attack model, re-seed rate, etc. Some of the things it does are actively detremental to security seeding from OS-version constants?), other parts simply add code without adding security. We ought to resist/correct this style: making a security system too much of a jumble to follow doesn't make its output cryptographically strong.

  - Defeating important debugging. Tracking defined-ness is a valuable debugging method, and our current behavior interferes with it. We could tell everyone who wants to use valgrind (or similar tools) to write suppressions for the initial context: that is work for N > 1 developers. Or we could remove it, and make the debugging tools work for N developers without such customization: that is work for 1 developer.
I agree that this is a case wherein the value of the questionable extra
entropy is not worth the pain of the UMRs.  There are places in NSS where
optimized algorithms get speed at the expense of UMRs, UMRs which are 
certain never to cause any problems (doing word-aligned word reads of 
sections of strings, for example).  But there is so little benefit to 
the UMRs in this function that I'm quite willing to forego them.
(In reply to comment #11)
> There are places in NSS where
> optimized algorithms get speed at the expense of UMRs, UMRs which are 
> certain never to cause any problems (doing word-aligned word reads of 
> sections of strings, for example).

While other tools may do so, valgrind would not report such things as UMRs.  It tracks which memory is uninitialized and only complains if the uninitialized data is used as the basis for a conditional jump or move, as a parameter to a system call, etc.

Any chance that this patch will be checked in?
> Any chance that this patch will be checked in?
Only question is when.  

Glen, if this patch gets checked in now, 
will we have to redo the FIPS tests for the PRNG? 
(I think not, but want to be sure.)
Priority: -- → P3
Target Milestone: 3.11 → 3.11.1
QA Contact: jason.m.reid → libraries
Comment on attachment 210988 [details] [diff] [review]
patch

Wan-teh and I discussed this bug today. The method RNG_FileUpdate does not get envoked by the FIPS algorthm tests for PRNG, and therefore we do not have to rerun the tests. Wan-teh also asked me to check this in  on the 3.11 and 3.12 branches. 
cvs commit -m "David Baron fix for valgrind report of UMR r=wtchang sr=Nelson" unix_rand.c
Checking in unix_rand.c;
/cvsroot/mozilla/security/nss/lib/freebl/unix_rand.c,v  <--  unix_rand.c
new revision: 1.18; previous revision: 1.17
done

new revision: 1.17.10.1; previous revision: 1.17
done
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
*** Bug 217558 has been marked as a duplicate of this bug. ***
Keywords: valgrind
You need to log in before you can comment on or make changes to this bug.