Last Comment Bug 353910 - memory leak in RNG_RNGInit
: memory leak in RNG_RNGInit
Status: RESOLVED FIXED
:
Product: NSS
Classification: Components
Component: Libraries (show other bugs)
: 3.11.3
: All All
: P2 normal (vote)
: 3.11.4
Assigned To: Julien Pierre
:
:
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2006-09-22 18:24 PDT by Julien Pierre
Modified: 2006-09-26 15:27 PDT (History)
7 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
change default macro (3.92 KB, patch)
2006-09-22 19:31 PDT, Julien Pierre
nelson: review+
Details | Diff | Splinter Review
patch with Wan-Teh's suggestion (3.73 KB, patch)
2006-09-23 15:03 PDT, Nelson Bolyard (seldom reads bugmail)
wtc: review-
Details | Diff | Splinter Review
Implements Wan-Teh's suggestion in comment 5 (5.05 KB, patch)
2006-09-25 19:00 PDT, Julien Pierre
nelson: review+
wtc: superreview+
Details | Diff | Splinter Review

Description Julien Pierre 2006-09-22 18:24:12 PDT
This freebl memory leak is seen on every NSS initialization .

        88       1  0x813c648  calloc < PR_Calloc < PR_NewLock < 0xd15041f9< PR_CallOnce < 0xd15042ea< RNG_RNGInit < nsc_CommonInitialize < NSC_Initialize < secmod_ModuleInit < SECMOD_LoadPKCS11Module < SECMOD_LoadModule < SECMOD_LoadModule < nss_Init < NSS_Initialize < main
        88       1  0x813c1a8  calloc < PR_Calloc < 0xd154d5aa< 0xd15041bf< PR_CallOnce < 0xd15042ea< RNG_RNGInit < nsc_CommonInitialize < NSC_Initialize < secmod_ModuleInit < SECMOD_LoadPKCS11Module < SECMOD_LoadModule < SECMOD_LoadModule < nss_Init < NSS_Initialize < main
Comment 1 Julien Pierre 2006-09-22 19:31:06 PDT
Created attachment 239755 [details] [diff] [review]
change default macro

I was pleasantly surprised that the shutdown code for the RNG context was already written. But it was #ifdef LEAK_TEST . IMO, this was the wrong macro name to use. The Mozilla browser is no longer the main user of softoken. softoken gets delivered as part of several operating systems and used in many kinds of applications, many of which load and unload it dynamically. This memory needs to be cleaned up for these applications, and not just for testing purposes. I kept the code under ifdef KEEP_ENTROPY if the browser wants to have the existing behavior, but it should be the exception rather than the rule.
Comment 2 Nelson Bolyard (seldom reads bugmail) 2006-09-22 23:52:49 PDT
Comment on attachment 239755 [details] [diff] [review]
change default macro

r=nelson, with one change:

>-static PRCallOnceType coRNGInit = { 0, 0, 0 };
>+static PRCallOnceType pristineCallOnce;
>+static PRCallOnceType coRNGInit;

Declare pristineCallOnce to be const.
Comment 3 Wan-Teh Chang 2006-09-23 08:07:55 PDT
I know about this leak.  I don't know why I looked at it.
I can't find an existing bug, so perhaps I found it when
I reviewed the code for FIPS validation.

When I looked at this issue, I came up with the following
solution.

Add a static RNGContext structure and make globalrng point
to it:

  static RNGContext therngcxt;

Then, change rng_init to set globalrng to &therngcxt,
and change freeRNGContext to not call
  PORT_ZFree(globalrng, sizeof *globalrng);
Instead, manually zero the globalrng->xxx fields except
the globalrng->XKEY buffer, which holds entropy.

(Note: we can simply declare globalrng as:

  static RNGContext globalrng;

But this requires changing global->xxx to globalrng.xxx
and globalrng to &globalrng throughout the file, which
makes the patch larger.  It also requires a new way to
test if rng_init succeeded.  Right now we simply test
globalrng != NULL.)

Finally, remove the #ifdef LEAK_TEST from the softoken.
Comment 4 Nelson Bolyard (seldom reads bugmail) 2006-09-23 15:03:44 PDT
Created attachment 239823 [details] [diff] [review]
patch with Wan-Teh's suggestion

I really like Wan-Teh's suggestion of using a static RNGContext, and
having globalrng be set to its address in rng_init.  So I wrote the
attached patch, which incorporates Julien's previous patch plus 
Wan-Teh's suggestions (I think).  I have not done extensive testing
with this patch.
Comment 5 Wan-Teh Chang 2006-09-23 15:30:57 PDT
Comment on attachment 239823 [details] [diff] [review]
patch with Wan-Teh's suggestion

Thank you, Nelson.  I just thought of another issue.
Since the internal state of the RNG is a critical security
parameter (CSP) of the crypto module, let's hash it with
B_HASH_BUF as a form of zeroization without losing the
entropy.

The 'Xj', 'avail' and 'seedCount' fields also need to be
zeroed during RNG shutdown or initialization.
Comment 6 Nelson Bolyard (seldom reads bugmail) 2006-09-23 16:27:22 PDT
I'll let you write that patch, Wan-Teh.
Comment 7 Julien Pierre 2006-09-25 14:10:26 PDT
It occurred to me over the week-end that as part of bug 115951, libfreebl3.so will be unloaded. Thus, all global variables relating to the RNG context, static or not, will be freed from memory during C_Finalize, and there will be no way to keep the entropy around after C_Finalize . When C_Initialize gets called a second time, it will load libfreebl again, and get a new "static" variable with a pristine state. Thus, attachment 239755 [details] [diff] [review] can be simplified to just call RNG_RNGShutdown unconditionally in pkcs11.c, and the long comment can be removed .

Even if libfreebl didn't get unloaded, in a properly-written NSS and PKCS#11 application, NSS_Initialize or softoken's C_Initialize should not be called concurrently from multiple threads. And C_Finalize's behavior is undefined if there are any pending calls to other PKCS#11 APIs in other threads, except for C_WaitForSlotEvent . As long as C_WaitForSlotEvent doesn't need to access the RNG context, I think we should be OK with the simpler patch.

I think the dynamic unloading of libfreebl will automatically take care of the zeroing of the context mentioned by Wan-Teh in issue 5, so there should be no zero'ing code needed to write if we fix bug 115951 .
Comment 8 Julien Pierre 2006-09-25 19:00:35 PDT
Created attachment 240089 [details] [diff] [review]
Implements Wan-Teh's suggestion in comment 5

Note that with the current architecture of softoken with a freebl shared library on all platforms, the entropy will always be lost in the  BL_Cleanup call that follows RNG_RNGShutdown, which will dlclose freebl once bug 115951 is fixed . So, it's not particularly useful that RNG_RNGShutdown preserves the entropy . But as a CSP, the entropy still needed to be zero'ed/reset since it is a CSP, even if it got unloaded.

If we ever change back to freebl static libraries, or some other architecture such as filtee library that doesn't require unloading freebl in C_Finalize, then this code will allow the entropy to be reused in programs that call C_Initialize / C_Finalize / C_Initialize again without unloading the softoken .
Comment 9 Wan-Teh Chang 2006-09-26 09:24:07 PDT
Comment on attachment 240089 [details] [diff] [review]
Implements Wan-Teh's suggestion in comment 5

r=wtc.

I found that we actually documented that PRCallOnceType
variables should be initialized to all zeros:
http://www.mozilla.org/projects/nspr/reference/html/prinit.html#15926
http://developer.mozilla.org/en/docs/PR_CallOnce

So, it's okay to get rid of pristineCallOnce andjust use
memset to reset coRNGInit.

You may want to change the word "free" in the comment
and function name here:

 /*
  * Free the global RNG context
  */
 static void
 freeRNGContext()

because it's no longer accurate.  Perhaps we can use
"clean up" or "shut down".
Comment 10 Nelson Bolyard (seldom reads bugmail) 2006-09-26 10:11:01 PDT
Let's just stay with the "pristine" copy and doing a structure copy.
It's more efficient than using memset for such a small struct.
Comment 11 Julien Pierre 2006-09-26 15:27:00 PDT
Thanks for the reviews. I changed the comment for freeRNGContext to say "Clean up" .

I checked the patch in to the tip :

Checking in freebl/prng_fips1861.c;
/cvsroot/mozilla/security/nss/lib/freebl/prng_fips1861.c,v  <--  prng_fips1861.c
new revision: 1.25; previous revision: 1.24
done
Checking in softoken/pkcs11.c;
/cvsroot/mozilla/security/nss/lib/softoken/pkcs11.c,v  <--  pkcs11.c
new revision: 1.133; previous revision: 1.132
done

And to NSS_3_11_BRANCH :
Checking in freebl/prng_fips1861.c;
/cvsroot/mozilla/security/nss/lib/freebl/prng_fips1861.c,v  <--  prng_fips1861.c
new revision: 1.22.2.2; previous revision: 1.22.2.1
done
Checking in softoken/pkcs11.c;
/cvsroot/mozilla/security/nss/lib/softoken/pkcs11.c,v  <--  pkcs11.c
new revision: 1.112.2.18; previous revision: 1.112.2.17
done

Christophe, this is also an init-time leak that we had chosen to ignore in previous leak tests. Please remove stacks that include RNG_RNGInit from the ignore list from future leak tests.

Note You need to log in before you can comment on or make changes to this bug.