Closed Bug 115951 Opened 23 years ago Closed 17 years ago

freebl dynamic library is never unloaded by libsoftoken or libssl. Also tiny one-time leak in freebl's loader.c

Categories

(NSS :: Libraries, defect, P3)

Tracking

(Not tracked)

RESOLVED FIXED
3.11.4

People

(Reporter: nelson, Assigned: julien.pierre)

References

Details

(Keywords: coverity, Whiteboard: FIPS [CID 499])

Attachments

(5 files, 3 obsolete files)

On platforms where freebl is a separate DSO from libNSS3, 
(e.g. Solaris for Sparc and HPUX for PARisc)
the code that loads the freebl DSO allocates and then leaks a 
BLLibrary structure.  

I can see several ways to fix this:

a) free it in freebl_LoadDSO prior to returning PR_SUCCESS, or 

b) have a static BLLibrary structure instead of dynamically allocating 
and freeing it, or 

c) keep a static copy of the pointer to the BLLibrary, and have 
BL_Cleanup() (a function down at the bottom of loader.c) call 
bl_UnloadLibrary with it.  In this case, the PRCallOnce flag in
freebl_RunLoaderOnce should also be cleared by BL_Cleanup, so that 
if an application reinitializes NSS after calling BL_Cleanup, the 
library will get reloaded properly.

I think c is the cleanest way to do this, but I have no strong preferences.
Marking PR3 because this leak is trivial at worst.
Priority: -- → P3
Target Milestone: --- → 3.4
Changed the QA contact to Bishakha.
QA Contact: sonja.mirtitsch → bishakhabanerjee
Set target milestone to NSS 3.5.
Target Milestone: 3.4 → 3.5
Moved to 3.7 and assigned to Nelson.
Assignee: wtc → nelsonb
Target Milestone: 3.5 → 3.7
Moved to target milestone 3.8 because the original
NSS 3.7 release has been renamed 3.8.
Target Milestone: 3.7 → 3.8
Attached patch patch v1 (obsolete) — Splinter Review
This approach unloads the freebl shared library when BL_Cleanup is called.
It continues to use PRCallOnceType to avoid loading the freebl shared library
more than once.  When the shared library is unloaded, the PRCallOnceType is
reset by memsetting it.  Wan-Teh, is this acceptable?
Comment on attachment 113765 [details] [diff] [review]
patch v1

> void 
> BL_Cleanup(void)
> {
>   if (!vector && PR_SUCCESS != freebl_RunLoaderOnce())
>       return;
>   (vector->p_BL_Cleanup)();
>+  bl_UnloadLibrary(&blLib);
>+  memset(&once, 0, sizeof once);
> }

The behavior after zeroing the PRCallOnceType variable
is undefined.  In the current NSPR implementation, it
does what you want.
I wanted to check in the patch for this bug so it would be fixed in NSS 3.8.
But I am concerned that it might conceivably break profile switching for 
mozilla on the affected platforms, even though the NSS test programs pass
with this patch.  NSS test programs do not test the ability to restart NSS
after doing a shutdown, so if this patch left NSS (or the blapi shared lib)
in some state where it could not be restarted, that would not be detected 
with the NSS test programs.  

So, if someone case test that this patch doesn't break profile switching on
solaris and hpux, then I'd like to see this patch applied for NSS 3.8.  
I think we should implement option a or b.  It is not
necessary to unload the freebl DSO during NSS_Shutdown.
Unloading the freebl DSO is equivalent to unloading
the softoken, which we don't do.

Perhaps we should have two flavors of NSS_Shutdown:
shutdown for profile switching, and shutdown for
program termination.  It is simply not necessary to
unload the freebl DSO for profile switching.  It would
be nice but not necessary to unload the freebl DSO for
program termination.
Remove target milestone of 3.8, since these bugs didn't get into that release.
Target Milestone: 3.8 → ---
*** Bug 274005 has been marked as a duplicate of this bug. ***
This leak was reported internally at Sun against Solaris 10.
I was in favor of option a) also, until I read this discussion .
I agree with Wan-Teh that unloading libfreebl*.so is the equivalent of unloading
the softoken.

We don't ever do that in NSS, as libnss3 is implicitly linked with libsoftokn3 .
However, the softoken module can be used independently of libnss3, for example
with the JDK 1.5 PKCS#11 engine . In this type of environment, it would be
reasonable to expect C_Finalize to unload libfreebl.so .

I see that PR_CallOnce is being used here to load freebl only once. I'm
wondering what the behavior will be in an application that does not link with
NSPR, for example a Java application . In theory, in that case, unloading
libsoftokn3.so from memory should also unload libnspr4.so, right ? And thus
there should be no problem trying to reload the softoken, as NSPR would get
initialized again . Or is something else going to happen ?
It would be good to provide a fix for this in 3.11, especially as we are
planning on reworking freebl .
Target Milestone: --- → 3.11
QA Contact: bishakhabanerjee → jason.m.reid
(In reply to comment #12) wherein Julien wrote:
> I was in favor of option a) also, until I read this discussion .

what option do you now favor?  option b?

You mentioned having c_finalize also unload freebl.  
Does doing so, or not doing so, change the right fix for this?
Right now, I favor leaving this P3/minor bug alone.
As ugly as this is, maybe we should make it a WONTFIX. I don't think it makes much sense to fix the leak for the freebl string and still leak the entire freebl shared library ;)

If we want to fix this right, we need to:

1) unload libfreebl in C_Finalize

2) make sure libfreebl is refcounted in the loader when it's loaded from libssl so it doesn't get unloaded early

3) think about the Java case some more. NSPR isn't safe for multiple shutdown/reinitializations. See the many bugs on PR_Cleanup in bugzilla for more info . When softoken is loaded from java, it would be a problem if libnspr also gets unloaded from the process. What happens is that when threads that had previously called NSPR functions end (which could have been created from the Java side), a thread termination callback gets invoked, but it points to an area of memory that has been freed, and then we get a coredump. The workaround for this problem that we have used in existing applications that don't implicitly load NSPR was to ... leak NSPR by dlopen'ing it so it never gets unloaded ... :) It doesn't make much sense to fix the libfreebl leak if we still have to leak libnspr, IMO . Right now libnspr is leaked by virtue of libfreebl being leaked.
This is not an issue in C applications whose main executable implicitly links with NSPR, but it is an issue in Java applications that dynamically load softoken.
So, IMO, fixing this right means that we need to fix many NSPR shutdown issues, and that is going to be a lot of work.
QA Contact: jason.m.reid → libraries
The new shutdown callback registration function for bug 326482
will be part of the solution for this bug, too, I think.
Depends on: 326482
Target Milestone: 3.11 → 3.11.1
retarget to 3.12 because of FIPS
Whiteboard: FIPS
Target Milestone: 3.11.1 → 3.12
Coverity CID 499
Whiteboard: FIPS → FIPS [CID 499]
When using the SSL bypass mode, this leak occurs a second time.
Re: comments 15/2 and 20 :
This leak now exists twice, once in libsoftoken, and once in libssl for bypass . But there is no need for refcounting in the loader, since there are actually two instances of the loader code as well. refcounting is done for us in the OS within dlopen () when libfreebl is loaded more than once .

NSC_Finalize would be a valid entry point to free the library structure and dlclose() from libsoftoken . In this case, there is no need for registering a shutdown callback as suggested in comment 16 .

Unfortunately, there is no equivalent shutdown entry point in libssl to do what's needed. We would need to register some kind of SSL shutdown function to do this, if we don't already.

Re: comment 15, I now think that this might make sense to fix even if we leak NSPR and separately from that problem. This leak shows up on every NSS init in a program that repeatedly initializes NSS and shuts down . The NSPR "leak" would only show as a one-time leak. Since NSPR doesn't support shutdown and unloading properly and may never do so, I would even advocate that NSPR be changed to always stay resident upon initialization (eg. by dlopen'ing itself to bump the refcount).
Wan-Teh,

Re: comment 7, do you foresee a solution that doesn't depend on the current implementation of PR_CallOnce / PRCallOnceType ? Either this should be made public, or maybe we could add a PR_ResetOnce(PRCallOnceType*) which would currently be implemented as memset ?
To fix this leak in the ssl bypass case is much harder. Even though we have an NSS facility to register a callback function at NSS shutdown time, there is no initialization entry point in libssl to register that callback !!! The way that freebl gets loaded in libssl is through the function table in loader.c which uses PR_CallOnce on freebl_RunLoaderOnce().

That function would be an appropriate place to call NSS_RegisterShutdown, except that we only want to do it when loading freebl from libssl, but not from softoken, since libsoftoken cannot depend on libnss . Right now the same freebl.a gets linked into both libsoftoken and libssl . So, this would entail having two slightly different versions of the freebl loader :-(

I think we can using a statically linked function callback to solve this. Each of libssl and libsoftokn would have their own. The libsoftokn one wouldn't do anything. I will attach a patch that does that.
NSS_RegisterShutdown will need to behave properly for this to be fixed, so I'm adding bug 353608 as a dependency of this bug.
Depends on: 353608
The best solution is to use platform-dependent techniques
such as filtee libraries or or load/unload the freebl3 DLL/shared
library in the DllMain and _init/_fini routines of the
softokn3 and ssl3 DLLs/shared libraries.  But this is a lot
of work.  So you can zero the PRCallOnceType variable to
restore it to the pristine state.  I suggest that you do it
like this:

    static PRCallOnce pristineCallOnce;
    PRCallOnce callOnce;

    ....
    /* reset */
    callOnce = pristineCallOnce;

The reason is that in pthreads, the equivalent pthread_once_t
needs to be initialized with the macro PTHREAD_ONCE_INITIALIZER.
So the equivalent pthread code would look like:

    static pthread_once_t pristineOnce = PTHREAD_ONCE_INITIALIZER;
    pthread_once_t once = PTHREAD_ONCE_INITIALIZER;

    ...
    /* reset */
    once = pristineOnce;
I'm changing the platform to "all/all", since we now have a libfreebl dynamic library on all platforms. I'm also changing the title to reflect the real issue which is not just leaking the little BLLib structure but the entire dynamic library.
OS: Solaris → All
Hardware: Sun → All
Summary: tiny one-time leak in freebl's loader.c → freebl dynamic library is never unloaded by libsoftoken or libssl. Also tiny one-time leak in freebl's loader.c
Attached patch Fix structure leak, and more (obsolete) — Splinter Review
1) Use a static structure for BLLib

2) Unload the freebl dynamic library in BL_Cleanup

3) Add an init callback functionality in the freebl loader called FREEBL_InitCallback . Both ssl and softoken must define this function in order to link. softoken defines an empty function since it already calls BL_Cleanup in C_Finalize . ssl uses this feature to register an NSS shutdown callback with NSS_RegisterShutdown to call BL_Cleanup .
Note that I chose arbitrary name and source files for the callback function names and source file locations.  Feel free to suggest better ones.

4) reset the vector pointer to NULL in BL_Cleanup . Otherwise, the sequence of C_Initialize, C_Finalize, C_Initialize will crash in the 2nd C_Initialize since vector is non-NULL, and freebl doesn't get reloaded . I found this out with pk11mode - no other NSS program tests this code sequence. We should definitely include Glen's pk11mode program in our QA .

However, this change may not be the right fix . I think the following code sequence which is repeated multiple times in loader.c doesn't make sense :

  if (!vector && PR_SUCCESS != freebl_RunLoaderOnce())
      return ;
  return (vector->p_RNG_RNGInit)();

If vector is non-NULL, we don't execute freebl_RunLoaderOnce , and thus we don't call PR_CallOnce. We just go directly to the next line and dereference vector. However, on platforms without total store order, vector may be non-NULL, but vector->p_RNG_RNGInit may be NULL . The only way to ensure that vector->p_RNG_RNGInit is also non-NULL is to always call freebl_RunLoaderOnce() first. So, the order of the 2 tests needs to be reversed . This will obviously impact performance, but it is the only thing that will make the code work reliably on non-TSO platforms, assuming freebl_RunLoaderOnce and PR_CallOnce are implemented correctly on all platforms (which the later is not : see bug 273649).

However, when considering the softoken's use of the loader only, we may decide after careful inspection of the PKCS#11 specification that a thread must first call C_Initialize before any other operations can succeed. Thus, it may be OK to test only vector and not call freebl_RunLoaderOnce() at all, since C_Initialize will always call it, and the program isn't supposed to have any pending PKCS#11 operations before C_Initialize succeeds.

Unfortunately, the same does not hold true for libssl's use of the loader in the bypass case. libssl does not have an initialization function like C_Initialize, and thus must truly be able to do a late dynamic loading of freebl . This freebl load from ssl currently happens only in programs that use bypass, the first time a socket makes use of the bypass feature. And it could happen in any number of threads - there is no SSL API requirement to begin doing a handshake first in one thread before other threads can do so . So, to fix this properly for ssl on non-TSO platforms, the loader linked to ssl should always call PR_CallOnce for each function in the table. This would obviously be a performance hit where it is least wanted, but it is needed for correctness, unless we add a libssl initialization function to always do the dynamic freebl load upfront .

5) I tested this fix and it passes all.sh on Solaris and Windows, both OS_TARGET=WINNT and OS_TARGET=WIN95. I also tested it with the pk11mode test program that Glen has been working on.

6) By plugging the leak of freebl3.dll, this fix also causes nspr4.dll to be unloaded from memory if softoken is used in non-NSPR programs .

Non-NSPR programs that unload the softoken on Windows will crash in one of the NSPR background threads since they are running code that is no longer in memory.
This is not technically a regression - this bug already existed in the NSS 3.10 softoken, and was only accidentally hidden in 3.11 by loading and leaking freebl3.dll, which caused nspr4.dll to be leaked also . It is not confirmed if unloading NSPR on other platforms than Windows with OS_TARGET=WINNT has any ill effects and which ones. I tested with OS_TARGET=WIN95 and it was OK . I couldn't test Unix because pk11mode doesn't have native code to load softoken - it uses NSPR's PR_LoadLibrary, and thus it isn't possible to get NSPR unloaded in pk11mode on Unix to check the effect of NSPR being unloaded .

We might choose to treat this NSPR unloading issue in a different bug than this one, as there are multiple possible solutions to it - either a change in softoken to leak NSPR, or a change in NSPR to leak itself. If we choose to fix it in NSPR, the NSPR bug should be made a dependency of this one. If we choose to fix it in softoken, the change can be made in another patch as part of this bug. Wan-Teh, I would like your opinion on this.
Attachment #239608 - Flags: superreview?(wtchang)
Attachment #239608 - Flags: review?(nelson)
Assignee: nelson → julien.pierre.bugs
Comment on attachment 239608 [details] [diff] [review]
Fix structure leak, and more

The callback related code in this patch is hard to
understand.  It's a high price to pay for this
leak, which is a one-time leak unless an
application loads and unloads libsoftokn3.so or
libssl3.so repeatedly (not to be confused with
calling C_Initialize and C_Finalize on
libsoftokn3.so repeatedly).

To relinquish control on key3db and cert8.db,
it's sufficient to call C_Finialize.  You don't
need to unload libsoftokn3.so.

I would call NSS_RegisterShutdown(SSL_BypassShutdown, NULL)
in SSL_ImportFD, which I believe any user of the
SSL library must call.  Alternatively, call
NSS_RegisterShutdown(SSL_BypassShutdown, NULL)
in SSL_OptionSet etc. when the SSL_BYPASS_PKCS11
option is enabled.

In freebl/blapi.h, we have

> typedef struct {
>     PRLibrary *dlh;
> } BLLibrary;
> 
>+static BLLibrary blLib;

It's time to eliminate the BLLibrary type and
just use PRLibrary.

> static BLLibrary *
> bl_LoadLibrary(const char *name)

Change this to return PRLibrary *.

> static PRFuncPtr
> bl_FindSymbol(BLLibrary *lib, const char *name)
> {
>     PRFuncPtr f;
> 
>     f = PR_FindFunctionSymbol(lib->dlh, name);
>     return f;
> }

Delete bl_FindSymbol.  Replace by direct
PR_FindFunctionSymbol calls.
 
> static PRStatus
> bl_UnloadLibrary(BLLibrary *lib)
> {
>     if (PR_SUCCESS != PR_UnloadLibrary(lib->dlh)) {
>         return PR_FAILURE;
>     }
>-    PR_Free(lib);
>+    lib->dlh = NULL;
>     return PR_SUCCESS;
> }

Similarly, delete bl_UnloadLibrary and call
PR_UnloadLibary directly.
Wan-Teh,

Re: comment 28,

I agree with you that it is not strictly required to unload libfreebl if the application only calls C_Initialize and C_Finalize without repeatedly loading/unloading the libsoftokn3 PKCS#11 module. Another possibility would be to unload freebl in a library unload callback (DllMain, etc) as you  mentioned in comment 25 . But it is also much more complicated to implement that way.
Also, the PKCS#11 specification defines C_Finalize's job to be "clean up miscellaneous Cryptoki-associated resources" in section 6.9, table 8 . I think that libfreebl3.so qualifies as one of those resources to be cleaned up . So IMO, it is appropriate and desirable to unload libfreebl in C_Finalize .

I like your suggestions about calling NSS_RegisterShutdown somewhere else for SSL_BypassShutdown are good . I know the callback mechanism in the patch is not elegant. But registering in SSL_OptionSet or SSL_ImportFD runs the risk of having repeated acquisitions of the shutdown function list lock, as opposed to doing it as part of BL_Init which will be done only once. The solution may be to do a PR_CallOnce on the routine that will call NSS_RegisterShutdown, and have SSL_BypassShutdown reset the once object . It is probably better to do it in SSL_OptionSet(SSL_BYPASS_PKCS11, ...) which will be less common than SSL_ImportFD . This way, applications that don't use bypass (most or all of them today) will continue not to unnecessarily load libfreebl from libssl.

Re: BLLibrary vs PRLibrary, that change isn't required to fix the leak, but I agree with you that it should be done - it probably should have been done last year when I converted the loader to be 100% NSPR and eliminated the native code .
Attached patch Incorporate Wan-Teh's feedback (obsolete) — Splinter Review
1) This patch no longer has the freebl init callback mechanism that the previous one did.
2) I cleaned up the unnecessary wrapper functions in loader.c
3) I added some error checking for the new functions in ssl, which are now in sslsock.c and invoked only when SSL_OptionSet(sock, SSL_BYPASS_PKCS11, PR_TRUE) or SSL_OptionSetDefault(SSL_BYPASS_PKCS11, PR_TRUE) are called . If servers are using the model socket option with SSL_OptionSet, SSL_BypassSetup should be a one-time call only .
Attachment #239608 - Attachment is obsolete: true
Attachment #239727 - Flags: superreview?(wtchang)
Attachment #239727 - Flags: review?(nelson)
Attachment #239608 - Flags: superreview?(wtchang)
Attachment #239608 - Flags: review?(nelson)
Comment on attachment 239727 [details] [diff] [review]
Incorporate Wan-Teh's feedback

I agree with 99% of this patch, but there are a few things that 
must be changed.

1. Any time we assert that a pointer is non-NULL, and that assertion
is NOT followed by code that tests the same condition as the assert 
and handles it, we create a klocwork bug report.

So in BL_Cleanup, we need a line that reads "if (blLib)" following
this new assertion:

>+  PORT_Assert(blLib);
>+  PR_UnloadLibrary(blLib);
>+  blLib = NULL;

2.  SSL_OptionSet should treat any non-zero value as true, and
not only treat PR_TRUE as true.  Tests like this one:

>+            if (PR_TRUE == on) {

should either be 
              if (on) {
or
              if (PR_FALSE != on) {

I stronly perfer the former.  Remember, the c compiler doesn't 
enforce that PRBools must contain only PR_TRUE or PR_FALSE.  
Remember the principle of least astonishment.
Attachment #239727 - Flags: review?(nelson) → review-
Nelson,

re: comment 31

1) BL_Cleanup is a function that returns void . I thought about adding an if (blLib) test after the assertion I added, but there just isn't any action I could think of doing for optimized code, and I didn't want to go through the trouble of changing the BL_Cleanup return type. IMO, this is one of the cases where it is appropriate to have an assertion without a test. We should find a way to teach klocwork about that, or mark its report as invalid if it comes.

2)

a) Even though the C compiler does not strictly enforce enum values, IMO, any code that assigns values other than PR_TRUE and PR_FALSE to a PRBool, or depends on them, should be changed to use a different type that is explicitly defined to allow other values. That includes some broken code in SSL_OptionSet for certain. SSL options. Unfortunately, this is a public API, so the type argument cannot easily be changed. However, the only legal values defined for SSL_BYPASS_PKCS11 are also the only legal values for PRBool, which are PR_TRUE and PR_FALSE .

b) if (on)
is the equivalent of
if (0 != on)
That code is making an assumption that 0 is one of the legal values in the enum, which may or may not be the case. The code shouldn't be aware of the particular values in the enum, and only name the constants. If the values assigned to enum were ever to change, or if 0 is not one of the legal values, if (on) will not do what's intended.

c) I could live with

if (PR_FALSE != on)

I assume you prefer it because it means :

if (PR_TRUE == on) || (any_other_value_defined_in_the_future_for_SSL_BYPASS_PKCS11_but_also_illegal_value_for_PRBool == on)

IMO, it is not good practice to use other values than PR_TRUE and PR_FALSE . The precedent set for SSL_REQUIRE_NEVER / SSL_REQUIRE_ALWAYS / SSL_REQUIRE_FIRST_HANDSHAKE / SSL_REQUIRE_NO_ERROR should be an exception, one that should not be perpetuated in other SSL options .

But if we ever want to do that horrible thing again, then the code that examines the value of "on" in SSL_OptionSet and SSL_OptionSetDefault will have to be changed in order to allow for other values and do other things with them, and that means the code in my patch will have to be changed anyway. We can make that change then, and hopefully never. In the meantime, for SSL_BYPASS_PKCS11,
if (PR_FALSE != on)
is the equivalent of
if (PR_TRUE == on)
Julien, many text books on good programming teach against the practice
of having two boolean values in a variable that can hold more than two 
values.  They include examples that look approximately like this:

   if (value == 0) 
         do false case
   if (value == 1)
         do true case
and then point out: what happens if value == 2?
They teach that the proper solution is to pick one value and always test
for equality or inequality to that.  All values equal to that value are
treated one way, all values not equal to it are treated the other. 

Therefore, for SSL, I have chosen the value zero as the one value.
PR_FALSE is zero by definition.  Don't ask me to consider that PR_FALSE
might be defined to be some non-zero value.
Julien, Are you saying that it is appropriate to crash if blLib is NULL
when BL_Cleanup is called in non-debug builds?
Re: comment 34, there is no crash in non-debug builds that would be avoided by testing blLib . If blLib is NULL, PR_UnloadLibrary will fail and set the PR_INVALID_ARGUMENT_ERROR error code. Thus, blLib being NULL is only an error for BL_Cleanup, but it has no way to report that error anywhere given that it is a void function.
1) I decided to still add the check for blLib, because the implementation for PR_UnloadLibrary is in a different module, and may still crash in a different version of NSPR than the one for which I checked the source, however unlikely that may be.

2) I added assertions in loader.c that PR_UnloadLibrary succeeded. I hope no code checking tools will complain about that !

3) re: comment 33, the previous patch didn't include
if (on == false_value ) { do_false_case }
if (on == true_value ) { do_true_case }

But rather :

if (on == true_value ) { do_true_case } else { do_false_case }

This means either do_true_case or do_false_case are always executed.

You object and recommend using instead :

if (on != false_value { do_true_case } else { do _false_case }

In both cases, either the true or false code path is always executed.
If on is a random_value distinct from true_value and false_value, then my test construct would call do_false case, while yours would call do_true case.

Given that 2 (or any random_value not PR_TRUE or PR_FALSE) is an undefined value for PRBool, I don't agree that it is preferrable to call do_true_case over do_false_case for undefined values . I have implemented your suggestion in the interest of getting the patch out quickly. It is no worse and no better than the original code.

If you want to be strict, an undefined input value needs to be handled as a third case, an error case, ie:

if (on == true_value ) { do_true_case } else if (on == false_value) { do_false_case } else { do_error_case };

I think that would clearly be a better solution than either my original patch or your suggestion, but it is also a change that should be made for all options that take a PRBool, ie. all of them except SSL_REQUIRE_CERTIFICATE, I believe. I only wanted to change the implementation for SSL_BYPASS_PKCS11 in this patch, so I abstained from making that change.

Lastly, not everyone commits the numeric value of every enum to memory. That's the compiler's job. I personally find the following to be unreadable code :
if (on)
When on is a PRBool, this actually means :
if ((PRBool) 0 != on)
That is confusing unless you know the definition of PRBool . To make sense of this expression requires the reader to check prtypes.h to figure out that PR_FALSE is 0, and that the expression actually means :
if (PR_FALSE != on )

IMO, all code using PRBool should be independent of the numeric values assigned to PR_FALSE and PR_TRUE. The only assumption that is reasonable to make which is obvious from the names PR_FALSE and PR_TRUE is that they are distinct values. I realize a lot of existing NSPR code doesn't comply with this, but for new code, it is easy to comply by always spelling out the enum names rather than referring to numeric values such as 0.
Attachment #113765 - Attachment is obsolete: true
Attachment #239727 - Attachment is obsolete: true
Attachment #240377 - Flags: superreview?(wtchang)
Attachment #240377 - Flags: review?(nelson)
Attachment #239727 - Flags: superreview?(wtchang)
Comment on attachment 240377 [details] [diff] [review]
Incorporate Nelson's feedback

Thanks, Julien.  r=nelson
Attachment #240377 - Flags: review?(nelson) → review+
Comment on attachment 240377 [details] [diff] [review]
Incorporate Nelson's feedback

r=wtc.  Please make the following changes.

In freebl/loader.c

>+static PRLibrary* blLib;

(Optional) in C, it's more common to say "PRLibrary *blLib".

In bl_LoadLibrary

>-    if (NULL == lib->dlh) {
>+    if (NULL == lib) {
> #ifdef DEBUG_LOADER
>         PR_fprintf(PR_STDOUT, "\nLoading failed : %s.\n", name);
> #endif
>-        PR_Free(lib);
>         lib = NULL;
>     }

Delete "lib = NULL;".

>+static PRCallOnceType once;

(Optional) use a longer variable name for 'once' since now it's in
the file scope.
 
In BL_Cleanup

>+  PORT_Assert(blLib);
>+  if (blLib) {
>+      PRStatus status = PR_UnloadLibrary(blLib);
>+      PORT_Assert(PR_SUCCESS == status);
>+      blLib = NULL;
>+  }

(Optional but recommended) I would remove the assertion and the
if (blLib) test.  You set 'vector' and 'blLib' together in
freebl_LoadDSO, so it doesn't make sense to assert/test 'blLib'
but not assert/test 'vector' in BL_Cleanup.

In ssl/sslsock.c

>+static PRCallOnceType pristineCallOnce;

Add 'const'.

>+static PRCallOnceType once;

(Optional) use a longer variable name.

>-	    ss->opt.bypassPKCS11   = on;
>+            if (PR_FALSE != on) {
>+                if (PR_SUCCESS == SSL_BypassSetup() ) {
>+                    ss->opt.bypassPKCS11   = on;
>+                } else {
>+                    rv = SECFailure;
>+                }
>+            } else {
>+                ss->opt.bypassPKCS11   = PR_FALSE;
>+            }

(Optional) you can rewrite the code like this:

    if (PR_FALSE != on && PR_SUCCESS != SSL_BypassSetup()) {
        rv = SECFailure;
        break;
    }
    ss->opt.bypassPKCS11 = on;

>-	ssl_defaults.bypassPKCS11   = on;
>+        if (PR_FALSE != on) {
>+            if (PR_SUCCESS == SSL_BypassSetup()) {
>+                ssl_defaults.bypassPKCS11   = on;
>+            } else {
>+                return SECFailure;
>+            }
>+        } else {
>+            ssl_defaults.bypassPKCS11   = PR_FALSE;
>+        }

(Optional) similarly, you can rewrite the code like this:

    if (PR_FALSE != on && PR_SUCCESS != SSL_BypassSetup()) {
        return SECFailure;
    }
    ssl_defaults.bypassPKCS11 = on;
Attachment #240377 - Flags: superreview?(wtchang) → superreview+
Comment on attachment 240377 [details] [diff] [review]
Incorporate Nelson's feedback

I also recommend that you change "PR_FALSE != on" to "on".

When we edit someone else's code, we should imitate his style.
If you look the code before and after your changes, you will
see a lot of tests like "if (on)" and "if (on && ...)".  Now
the new "PR_FALSE != on" tests stand out.
Thanks for the reviews, Nelson and Wan-Teh .

I removed the unnecessary lib = NULL assignment, and renamed the once variables.

Checked in to the trunk :

Checking in freebl/loader.c;
/cvsroot/mozilla/security/nss/lib/freebl/loader.c,v  <--  loader.c
new revision: 1.30; previous revision: 1.29
done
Checking in ssl/sslsock.c;
/cvsroot/mozilla/security/nss/lib/ssl/sslsock.c,v  <--  sslsock.c
new revision: 1.49; previous revision: 1.48
done

There is a conflict on the NSS_3_11_BRANCH in loader.c which I'm trying to resolve. I will also attach the patch for that branch when checked in.
Do you need to fix this bug on the NSS_3_11_BRANCH?
I'll leave it up to you.  Just wanted to remind you
that we need to be strict on NSS 3.11 branch checkins.
Wan-Teh,

Yes, I would like to fix it on the branch as well because a lot of application groups are doing leak checks. It is better if they don't see this leak and constantly come back to complain to us. I found that the conflict in loader.c is due to bug 338798 . This is very confusing because the bug was marked as fixed in 3.11.2, but only parts of the changes went to NSS_3_11_BRANCH, and other parts to the trunk only. The loader patch only went to the trunk.
Checking in freebl/loader.c;
/cvsroot/mozilla/security/nss/lib/freebl/loader.c,v  <--  loader.c
new revision: 1.26.2.3; previous revision: 1.26.2.2
done
Checking in ssl/sslsock.c;
/cvsroot/mozilla/security/nss/lib/ssl/sslsock.c,v  <--  sslsock.c
new revision: 1.44.2.5; previous revision: 1.44.2.4
done
I'm marking this bug fixed, 5 years after it was first opened, finally ;). But the story does not end here. I opened several bugs for the pre-existing problems found during the creation of this fix :

1) NSPR unloading will cause a crash
Since we didn't decide how to proceed to fix this, I created 2 bugs
Bug 354613 is about fixing this problem only for softoken .
Bug 354614 is about fixing it the problem in NSPR itself.
If we fix the later, it would automatically fix the former. We may fix the former first and the later in the future.

2) problem with incorrect PR_CallOnce usage in loader.c
Bug 354609
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
Target Milestone: 3.12 → 3.11.4
Comment on attachment 240377 [details] [diff] [review]
Incorporate Nelson's feedback

This patch will result in the real BL_Cleanup (defined
in lib/freebl/rsa.c) being called twice at NSS shutdown
if SSL_BYPASS_PKCS11 is enabled.

Right now the real BL_Cleanup is not reference counted.
Fortunately, the only function that BL_Cleanup calls,
RSA_Cleanup, has a test to prevent double frees.  We
may want to add a comment to the real BL_Cleanup to
remind ourselves that BL_Cleanup may be called twice at
NSS shutdown and any function BL_Cleanup calls must
prevent double frees.

We also need to verify that the softoken cannot call BL_Cleanup
while the SSL library is using freebl, and vice versa.
(One scenario I checked is that the application has set
the ssl_defaults.bypassPKCS11 option to true and performs
a FIPS/non-FIPS mode switch.)  My code inspection showed
that we're also safe here.

If you didn't consider these issues when you wrote or
reviewed this patch, you may want to do another review.
Another idea is to add a new function BL_Unload.

 void 
 BL_Unload(void)
 {
   if (blLib) {
       vector = NULL;    
       PRStatus status = PR_UnloadLibrary(blLib);
       PORT_Assert(PR_SUCCESS == status);
       blLib = NULL;
       once = pristineCallOnce;
   }
 }

BL_Cleanup would be changed back.

The softoken would call BL_Unload after calling BL_Cleanup.

The SSL library would call BL_Unload instead of BL_Cleanup.
Wan-Teh,

I did consider the issue and concluded that calling BL_Cleanup twice was safe. The SSL library only needs to unload freebl at NSS_Shutdown time, it does not need to cleanup the freebl internal state which is shared with softoken. This may cause problems in the future depending on the order of execution if BL_Cleanup is ever made a non-void function to check for errors. I like the idea of a separate BL_Unload function and will implement this, so I'm reopening this bug.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Now that the unloading happens in a separate function, the freebl library is no longer guaranteed to have been previously loaded when BL_Unload gets invoked, as was the case in BL_Cleanup.

SSL_BypassShutdown may legitimately call BL_Unload without freebl having ever been loaded, because it gets registered when an SSL socket gets configured for SSL_BYPASS_PKCS11, rather than when libssl actually uses bypass and calls one of the freebl functions (as in attachment 239608 [details] [diff] [review]). If the SSL socket is configured for SSL_BYPASS_PKCS11 and then destroyed, SSL_BypassShutdown gets registered, but freebl is never loaded.

So, at NSS_Shutdown time, SSL_BypassShutdown will call BL_Cleanup but blLib will be NULL. Because of this case, I had to remove the PORT_Assert(blLib) which existed in BL_Cleanup from BL_Unload.

The upside is that freebl will no longer get loaded and immediately unloaded in SSL_BypassShutdown / BL_Cleanup in this case.
Attachment #240536 - Flags: superreview?(wtchang)
Attachment #240536 - Flags: review?(nelson)
Attachment #240536 - Flags: superreview?(wtchang) → superreview+
It's not necessary to separate BL_Unload from BL_Cleanup.  
Just change BL_CLeanup to do this:

-  if (!vector && PR_SUCCESS != freebl_RunLoaderOnce())
-      return;
-  (vector->p_BL_Cleanup)();
+  if (vector)
+     (vector->p_BL_Cleanup)();

then to the unload code in the function after that.  This also solves
the problem seen where we sometimes load freebl just to clean it up. :)
Nelson,

Re: comment 49,

No, it isn't strictly required currently. However, your patch does not solve the problem that we don't want to call 

(vector->p_BL_Cleanup)();

from SSL_BypassShutdown .

Even though it is currently safe to call this function twice, logically that operation is only supposed to be performed once, from softoken's C_Finalize.

attachment 240536 [details] [diff] [review] solves that problem .
Comment on attachment 240536 [details] [diff] [review]
Incremental patch. Move unloading code out of BL_Cleanup and into BL_Unload

I wish we didn't have to add another new public function to blapi.h to solve this.  
But I don't have a better solution to offer at this time.  
So, r=nelson
Attachment #240536 - Flags: review?(nelson) → review+
Wan-Teh, Nelson,

Thanks for the reviews. Nelson, it would be possible not to add a separate function if BL_Cleanup was changed to take an argument such as PRBool unloadOnly . But this would amount to have BL_Cleanup not call the real cleanup function, so I think it would be confused, and thus attachment 240536 [details] [diff] [review] is better, IMO .

I checked it in to the tip :

Checking in freebl/blapi.h;
/cvsroot/mozilla/security/nss/lib/freebl/blapi.h,v  <--  blapi.h
new revision: 1.25; previous revision: 1.24
done
Checking in freebl/loader.c;
/cvsroot/mozilla/security/nss/lib/freebl/loader.c,v  <--  loader.c
new revision: 1.31; previous revision: 1.30
done
Checking in softoken/pkcs11.c;
/cvsroot/mozilla/security/nss/lib/softoken/pkcs11.c,v  <--  pkcs11.c
new revision: 1.134; previous revision: 1.133
done
Checking in ssl/sslsock.c;
/cvsroot/mozilla/security/nss/lib/ssl/sslsock.c,v  <--  sslsock.c
new revision: 1.50; previous revision: 1.49
done

And to NSS_3_11_BRANCH :

Checking in freebl/blapi.h;
/cvsroot/mozilla/security/nss/lib/freebl/blapi.h,v  <--  blapi.h
new revision: 1.23.2.2; previous revision: 1.23.2.1
done
Checking in freebl/loader.c;
/cvsroot/mozilla/security/nss/lib/freebl/loader.c,v  <--  loader.c
new revision: 1.26.2.4; previous revision: 1.26.2.3
done
Checking in softoken/pkcs11.c;
/cvsroot/mozilla/security/nss/lib/softoken/pkcs11.c,v  <--  pkcs11.c
new revision: 1.112.2.19; previous revision: 1.112.2.18
done
Checking in ssl/sslsock.c;
/cvsroot/mozilla/security/nss/lib/ssl/sslsock.c,v  <--  sslsock.c
new revision: 1.44.2.6; previous revision: 1.44.2.5
done

I'm closing this bug once again.
Status: REOPENED → RESOLVED
Closed: 18 years ago18 years ago
Resolution: --- → FIXED
A drawback of adding BL_Unload is that it is tied to the
current implementation, which dynamically loads the freebl
shared library on all platforms.  It is possible to have
a different implementation that doesn't need to load the
freebl shared library dynamically.  For example
- if the platform has only one freebl shared library, we
  could directly link with it.
- on Solaris we could link with a default freebl shared
  library and use the filter/filtee library method.

Also the commands such as bltest and fipstest that use
the freebl API would also need to call BL_Unload.  (Right
now they don't even call BL_Cleanup.)

If we don't add BL_Unload, we need to add a comment that explains
the rules of calling BL_Cleanup when there are more than one
users of the freebl shared library.  It must be safe to
call BL_Cleanup more than once.
There are two kinds of shared libraries on Mac OS X.
1. Dynamic shared library (.dylib).  We usually link
"statically" with a dynamic shared library even though
it can also be loaded at run time.  All of our shared
libraries except libnssckbi.dylib are dynamic shared
libraries.

2. Loadable bundle.  This can only be loaded at run time.

NSPR 4.6.x or earlier has a bug in PR_UnloadLibrary.  If
the shared library is a dynamic shared library, PR_UnloadLibrary
incorrectly returns PR_FAILURE (bug 351609).  This bug has
been fixed on the NSPR trunk (NSPR 4.7 pre-release).

Since libfreebl3.dylib is now built as a dynamic shared library,
we get an assertion failure in BL_Unload at loader.c:996.

There are three solutions.

1. comment out the assertion for Mac OS X.

2. backport the NSPR fix to the NSPR_4_6_BRANCH.  This requires
releasing NSPR 4.6.4 and getting mozilla.org drivers' approval
for MOZILLA_1_8_BRANCH checkin (because MOZILLA_1_8_BRANCH is
kept in sync with NSPR_4_6_BRANCH right now).

3. build libfreebl3.dylib as a loadable bundle.  I am nervous
about making such a change in a patch release even though my
testing shows this change works.

This patch contains solutions 2 and 3.  Let me know which solution
(1, 2, or 3) you prefer.
Attachment #241391 - Flags: superreview?(nelson)
Attachment #241391 - Flags: review?(julien.pierre.bugs)
Wan-Teh,

I think we should keep the assertion in the loader because it detects a real bug.
I would prefer if you just backported the fix from NSPR 4.7 to NSPR_4_6_BRANCH . We can release NSPR 4.6.4 when NSS 3.11.4 or 3.11.5 ships . There are a few other bugs that need to be fixed in NSPR 4.6.4 as well that I'm working on.
Comment on attachment 241391 [details] [diff] [review]
Patch for Mac OS X

Either of the changes in this patch is acceptable to me, but I prefer the prlink.c solution .
Attachment #241391 - Flags: review?(julien.pierre.bugs) → review+
Comment on attachment 241391 [details] [diff] [review]
Patch for Mac OS X

OK, I checked in the NSPR change in this patch on the
NSPR_4_6_BRANCH for NSPR 4.6.4.  See bug 351609.
Attachment #241391 - Flags: superreview?(nelson)
Comment on attachment 241391 [details] [diff] [review]
Patch for Mac OS X

I checked in the NSS change in this patch on the NSS
trunk (NSS 3.12) only.  libfreebl3.dylib should be
built as a loadable bundle just like libnssckbi.dylib
because it is only dynamically loaded.  We don't generate
the import library freebl3.lib for freebl3.dll on Windows
for the same reason.
Comment on attachment 241391 [details] [diff] [review]
Patch for Mac OS X

I backed out the NSS change in this patch on the NSS trunk (NSS 3.12)
because the test program bltest couldn't locate and load libfreebl3.dylib.

Checking in config.mk;
/cvsroot/mozilla/security/nss/lib/freebl/config.mk,v  <--  config.mk
new revision: 1.18; previous revision: 1.17
done
(In reply to comment #59)
> (From update of attachment 241391 [details] [diff] [review])
> I backed out the NSS change in this patch on the NSS trunk (NSS 3.12)
> because the test program bltest couldn't locate and load libfreebl3.dylib.
> 
> Checking in config.mk;
> /cvsroot/mozilla/security/nss/lib/freebl/config.mk,v  <--  config.mk
> new revision: 1.18; previous revision: 1.17
> done
> 
I verified that bltest tests now run in 32 bit mode on MAC OS X 10.4.8 on 2x2.66 GHz Dual-Core Intel xeon. The bltest runs in 64 bit mode except all of the RC4 tests fail.

in both the 32 bit and 64 bit builds I found failures with 
tstclnt: error looking up host: A directory lookup on a network address has failed. These errors may be due to my environment I am using DHCP, and will look into this.

I will open a MAC OS X 64 bit RC4 bug and also a bug for 64 bit pk11mode failure. 


comment 59 says the fix was backed out, so I am reopening this bug. 
If it really still is fixed, please explain how that can be with the 
fix backed out.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
The patch (attachment 241391 [details] [diff] [review]) contains two independent changes.
Either one is sufficient to fix the PR_UnloadLibrary failure on
Mac OS X.  So it is fine to back out one of the changes because
it introduced a new problem with the test program bltest.
Status: REOPENED → RESOLVED
Closed: 18 years ago17 years ago
Resolution: --- → FIXED
does this fix unload a dll on windolls?

if a dll is unloaded on windows, this may be exploitable via any crash like null deref and exceptions handlers.

in a bug dveditz claimed you *never unload dlls*.
Georgi, NSS has a way to unload shared libraries on all platforms, 
and (IINM) FireFox has UI with which to do it.  If you have an issue
with this, please file a new bug.
(In reply to comment #64)
> Georgi, NSS has a way to unload shared libraries on all platforms, 
> and (IINM) FireFox has UI with which to do it.  If you have an issue
> with this, please file a new bug.
> 

well, i don't do windows, so i can't file a bug.

just wanted to point out that *iirc* dveditz claimed in a bug i can't access anymore firefox doesn't unload DLLs.

Article:
http://uninformed.org/index.cgi?v=4&a=5
Exploiting the Otherwise Non-exploitable on Windows

From the article:
http://uninformed.org/index.cgi?v=4&a=5&p=9
To gain control of the top-level UEF, Fx and Gx will need to be deregistered asymmetrically. To accomplish this, DLL #1 must be unloaded before DLL #2.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: