Closed Bug 385151 Opened 17 years ago Closed 17 years ago

Remove the link time dependency from NSS to Softoken

Categories

(NSS :: Libraries, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: KaiE, Assigned: KaiE)

Details

Attachments

(1 file, 6 obsolete files)

For several scnearios it helps to remove the link time dependency from NSS to Softoken.

More explanations later.
Attached patch Patch v1 (obsolete) — Splinter Review
This patch works for me on Linux and Mac.
Not yet tested on Windows.
Comment on attachment 269053 [details] [diff] [review]
Patch v1

I don't object in principle to building nss3 wihtout linking it
to softoken, and to dynamically loading softoken by nss3.
But I do object to doing it this way, with this loader.c-like code.  

There already is code to dynamically load PKCS#11 modules.  It's in
function NSC_ModuleDBFunc.  IMO, the right solution is to move
that function out of libsoftoken into libnss3.  If NSC_ModuleDBFunc
is moved to libnss3, then there is no need for this loader-like
code in libnss3.  That's the right architectural design, IMO.

The only reason that NSC_ModuleDBFunc was in libsoftoken in the first
place was that libDBM was linked into libsoftoken, and we didn't want
to link two copies of it.  But now, dbm is in its own shared lib.
So, there is no reason remaining to keep NSC_ModuleDBFunc in libsoftken.
And it can be moved to libnss3.  

I am really opposed to using a loader-like hack to load softoken, so 
that softoken can then do the dynamic loading of all OTHER PKCS#11 
modules.  Clearly, the dynamic loading of all PKCS#11 modules should
all be done the same way, in one place, especially now tht there remains
no reason to continue to special case softoken (because dbm is in its 
own shared lib).

Again, the issue isn't "do we or don't we dynamically load softoken?"
It is: do we implement YET ANOTHER dynamic linking method when we 
already have a well written one for loading PKCS#11 modules?  
Why not use the good one to load softoken in the first place ?
Comment on attachment 269053 [details] [diff] [review]
Patch v1

This patch seems to contain no code to unload softoken at shutdown.  
For programs that load and unload NSS during execution, that would 
be a rather huge leak of process address space.
Attached patch Patch v2 (obsolete) — Splinter Review
This patch only activates the decoupling of libsoftoken from libnss if NSS_DECOUPLE_SOFTOKEN is defined.
Attachment #269053 - Attachment is obsolete: true
Attached patch PSM patch v3 (obsolete) — Splinter Review
While this is a NSS bug, I take the liberty to attach a patch against PSM that enables building of this decoupled mode.
Attached patch PSM Patch v4 (obsolete) — Splinter Review
Better PSM patch.

We should make sure that Mozilla no longer specifies softokn on its link command lines. Therefore this patch changes mozilla/configure.in
Attachment #269060 - Attachment is obsolete: true
I tested the combination patch v2 + PSM patch v4 

I was able to build successfully on:
- Windows
- Mac PPC (build universal)
- Mac Intel (build univeral)

I also did a Thunderbird build on Mac, and that worked, too.

I confirmed that binaries on Windows and Mac no longer reference the softoken library.
(In reply to comment #3)
> This patch seems to contain no code to unload softoken at shutdown.  
> For programs that load and unload NSS during execution, that would 
> be a rather huge leak of process address space.  

It is correct, the patch has no code to unload softoken.

However, it does have code that ensure it will only get loaded once:
  status = PR_CallOnce(&loadSoftokenOnce, &softoken_LoadDSO);

When NSS gets shutdown and initialized again, we will still have the DSO loaded and resuse it.

When searching the NSS sources, I see various locations that have a comment saying
 /* do not unload the internal module */
Kai, if NSS3 itself gets unloaded, then the memory that contains the content
of &loadSoftokenOnce will be deallocated, causing the knowledge that
the function was already called once to be lost.  When NSS is reloaded
that function will get called again.  Now, the good news is that the OS
will not load the same OS multiple times (IINM).  But while NSS is 
unloaded, the space occupied by the softoken shared library and its 
associated data segment will be effectively leaked.

I will say again here what I said in IM last night:
I think that dynamically loading softoken, just like any other PKCS#11 module,
by NSS3 would be a good thing, on all platforms, and not just in PSM builds.  
But it needs to be done just like other PKCS#11 modules, and it needs to be 
unloaded when we're done with it.
I agree with Nelson in that if we load softoken dynamically, we also need to unload it explicitly just like any other PKCS#11 module at shutdown.

I also don't object to doing this decoupling all the time.

Re: the loader-like approach do dynamic loading, I am afraid we may need something like this in order to properly locate the softoken in the same location as libnss.

However, I am against duplicating the code from loader.c because that is not sustainable. There should be only one source for the loader, and it should be usable by either softoken to load freebl, or by nss to load softoken.

There is also the question of whether we want to be able to operate NSS without or softoken at all. We have actually had some internal requests to do that and use the Solaris Crypto Framework exclusively instead. This decoupling would be a good opportunity to have options to allow it.

I think this probably calls for 

1) another NSS_Initialize flag . NSS_INIT_NO_NSS_SOFTOKEN ?
2) if NSS is in the default mode where it needs NSS softoken, and it tries to load NSS softoken, but it can't be successfully found through secmod.db, then it needs to fall back to the loader-type logic to find it
Attached patch NSS Patch v5 (obsolete) — Splinter Review
a) 
We all seem to agree that "decouple by default" is a good idea.

(In this new patch I still drive it using a define NSS_DECOUPLE_SOFTOKEN, but we can easily remove that)


b)
I added code to unload the library.
When I worked on it, I saw that NSS is loading the internal module twice on init.
(I guess NSS has a good reason to do so, and I don't want to change that now.)

Therefore I introduced a load counter. We'll unload when the load counter reaches zero.


c)
As Bob suggested, I tested this patch in the client and switched between FIPS and non-FIPS a couple of times during the same sessions, and I had SSL pages loading while I switched. Seems to work fine.


d)
In this new patch I'm no longer duplicating functions.
I generalized the parameters and comments in the existing loader functions and moved them to a new file genload.c

The file genload.c is meant to be included like
  #include "genload.h"

genload.c lives in security/nss/lib/freebl/
I believe that's right, because it must be part of the crypto code.

These functions are required in security/nss/lib/pk11wrap, so I'm using a cross-directory include statement:
  #include "../freebl/genload.c"


e)
At this time I would like to defer Julien's proposal to introduce a mechanism to support a mode without softoken, I think this can be worked on as a separate task.
Assignee: nobody → kengert
Attachment #269059 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #269285 - Flags: review?(nelson)
It's not at all clear to me that there is an absolute requirement that 
softoken be co-located with libnss3.  

Loader contains code to ensure that freebl is co-located with softoken,
and that's appropriate because those two files are released together and
are tightly bound together (in terms of interface features).  One cannot 
mix softoken from one release with freebl of another.

But we've already agreed that it IS POSSIBLE and DESIRABLE to sometimes
use softoken/freebl from OLDER releases with newer NSS3 libraries.  That 
says to me that there is no need for softoken to be tightly co-resident with 
libnss3. There is no point in separating the versions of NSS and Softoken
if the separate versions cannot be used separately.

I think the requrement is that we must be able to find the softoken that
the user wants to use.  We give the user the tools with which to configure
the location of that, or any other, PKCS#11 module.  Why should we not 
honor the user's configuration, even if it says that softoken is elsewhere
than libnss3?  
Nelson,

I believe we should first try to load softoken from secmod.db, and if we can't find it there, then try to load it from the same location as libnss. This is what I proposed in comment 10 part 2. Any objection to that ?
Major objections:
1) No secmod.db to date has every identified where softoken is.
2) Secmod.db is going away.

If you have a requirement to 'run without softoken', you need to write a bug and explain *Exactly* what it is you want to do and why. Running without softoken is a lot harder than mucking with what gets loaded at startup. The current architecture allows us enough flexibility to tweak these things relatively easily, but the rest of NSS would not be happy is softoken isn't there unless it's replaced by a completely compatible duplicate.


Comment on attachment 269285 [details] [diff] [review]
NSS Patch v5

I'm passing this review request to Julien.  I won't be available to do any reviews until July 2.
Attachment #269285 - Flags: review?(nelson) → review?(julien.pierre.boogz)
Bob, Kai,

Re: comment 11 e and 14, I agree the "NSS mode without softoken" should be a separate bug.

Re: the currently attached patch, I am not sure why we would want to do this decoupling optionally. Bob, Is there any reason not to always do it that way ?

Re: comment 12, Nelson makes a good point that we may not want to always load the softoken from the libnss location. A customer may want to use a different version of softoken, like the FIPS-approved version. If secmod.db is going away, which other way do you suggest we locate it ? 
(In reply to comment #16)
> Re: the currently attached patch, I am not sure why we would want to do this
> decoupling optionally. Bob, Is there any reason not to always do it that way ?

I'm find to do it always.
I've only made it optional for my personal enjoyment, but I'm fine to switch it to always.


> Re: comment 12, Nelson makes a good point that we may not want to always load
> the softoken from the libnss location. A customer may want to use a different
> version of softoken, like the FIPS-approved version 

Fine with me, but why do everything at one step?
As of today, there is a hard tie between libnss3 and libsoftokn3.
This patch makes a helpful first step to separate them and works as is.

If you would like to enhance it further and make the libs completely independent, why not postpone that enhancement to a separate step in a separate bug?

Kai,

What's the benefit of checking in your patch if we are still always loading the very same softoken library from the same place as before, except on-demand rather than implicitly ?
Julien, as I understand it, this patch solves two separate problems with
one blow.  They are (in no particular order):

1) The rebase-like problem on RH Linux, and 
2) A linker problem for firefox on some platforms (Mac & ?), where it appears 
that two copies of sqlite3 are being linked into the address space.  

1) Linux has a tool analogous to Windows' "rebase" and Irix's "requickstart".
In linux this is known as "prelinking" (as I understand it), which (IMO) is
a rather unfortunate choice of terms.  It modifies the address at which a 
shared library is loaded into VM by default.  It writes the modified address 
into the shared library, and thus breaks pre-existing signatures on the shared 
lib.  When a program is run that has been rebased, and all its dependent 
shared libraries have been rebased, it starts up more quickly (perhaps MUCH
more quickly) than if ANY component has not been rebased.

If a library has not been "prelinked", then no other shared lib or executable 
that is linked against it can be prelinked.  So, if softokn3 cannot be prelinked, then libnss3 cannot be prelinked, and libSSL3 cannot be prelinked, 
and any programs that link with NSS cannot be prelinked.  

When a system library is not prelinked, it may cause a very large part of
the system's code to be inelligible for prelinking, and as a result, the 
whole system may be slowed down significantly (or so I have been told).  
NSS is a system library on some versions of linux, and not prelinking it 
causes a detrimental effect.  So, it is considered desirable for all 
non-prelinked libraries to be dynamically loaded rather than linked in.

Other possible solutions to this problem might include:

a) making NSS's signature on shared libraries be aware of the format of 
those shared libraries, so that the signed data does not include the 
small part of the file header that is modified by the rebase-like tool.
Of course, this makes the signing tool and the signature verification 
tool very platform dependent.  But this is how all native OS code signing
tools work.  (It would also be possible to recort the signature in the 
file itself).  

b) Change the platform's tools to allow "prelinking" of programs and libraries
even when some of their dependents are not prelinked.  This is how Windows'
rebase and Irix's requickstart tools worked.  This might entail changing the
OS's loader to only relocate those shared libs that are/were not prelinked 
at process load time.

2) softoken now includes (that is, links in) a copy of sqlite3.  The symbols
of sqlite3 should be private to softken3, and should not be available to 
code outside of softoken3.  But alas, on some systems (reportedly, MAC),
those symbols ARE available outside of softoken3.  Since FireFox also links
against its own copy of sqlite3, when it links against both sqlite3 and 
NSS+softoken3 (which includes sqlite3), the linker complains that there 
are duplicate symbols being loaded.  Dynamically loading softoken3 also
solves this problem.  But I think a better solution is to figure out why
softoken's private symbols are not private on the MAC and fix that.
Nelson,

1) Why can softoken not be prelinked on RH linux ? If it it's only because of the signature file being broken, why can't shlibsign be re-run after the pre-linking ? That would be easier than a) and b) to implement.

2) Are you saying some sqlite symbols are being re-exported from softoken3 on Mac ? If so, that should be fixed, and we shouldn't need this patch to do it.
Julien, 
AFAIK, the broken signature is the only reason why prelinking isn't being
done on RH Linux, but one of the RH guys should confirm that.  I suspect
this is an issue with their corporate sw packaging process, and may be 
out of the RH NSS team's control.

Given that the Mac linker reports duplicate symbols, it evidently must be
the case that the symbols are in some way exported or otherwise available
to the linker from within softoken.  I agree that's an egregious bug and 
really should be fixed, whether or not we use dynamic loading of softoken.

Kai or Bob should speak to these questions.
That is correct.

We had two possible fixes, one was to run shlibsign after the prelink, the other was to prevent the prelinking. The former was deemed to difficult to implement reliably in the existing preling infrastructure.
Update about the linking issue on MacOS.  Seems that sqlite3 is built as a
shared library.  softoken links with the sqlite3 shared lib, so sqlite3's
symbols are NOT private to softoken (neither are they exported by softoken).

If FireFox was being linked with NSS+softoken+sqlite3 shared libs, and with
its own non-shared sqlite archive (.a), then it's no wonder that it detected
duplicate symbols.  The question in that case is: why aren't duplicate symbols
reported on all platforms?  My guess is that FireFox is NOT linked with NSS's
sqlite3 shared lib on all platforms.  Perhaps the MacOS linker is doing a full 
transitive closure on the symbols from all shared libs and their dependents,
but other OS's linkers are not doing the transitive closure.  Or perhaps the 
FireFox makefile only lists the sqlite3 shared lib dependency on the MAC.
I don't know.  

FireFox plans to go to linking with a shared lib version of sqlite3 rather
than with an archive.  I think that should solve the problem, because both
FireFox and softoken can be linked against the same sqlite3 shared lib.  

This means that, when FireFox is converted to link with a sqlite3 shared lib,
then this issue will cease to be a reason to want to dynamically load 
softoken from nss3.  (Note that I don't oppose dynamically loading softoken3 
on any platform.)
Nelson,

Re: comment 23, RedHat still needs this patch for the Linux rebasing problem, even if the Mac linker issue is resolved independently, right ?

I would like to see a separate bug on that one and get it fixed first properly before we check in attachment 269285 [details] [diff] [review] .
Julien, I think you are suggesting that bug 306907 should block this one. Yes?
Nelson, that bug is not on NSS, so I don't think it should block this one. I was under the impression that we had link errors on Mac during the NSS build. It looks like I misunderstood. I still wish they would fix their issue independently of this bug and preferably, but we can't make that bug a blocker.
(In reply to comment #26)
> I was under the impression that we had link errors on Mac during 
> the NSS build. It looks like I misunderstood.

It should be obvious that we are not proposing to check in a patch that will break building NSS.
Comment on attachment 269285 [details] [diff] [review]
NSS Patch v5

I gave this review request to Julien because I was leaving for vacation.
Now that I am back from vacation, I am taking this review request back upon myself.  
I am reviewing it now.
Attachment #269285 - Flags: review?(julien.pierre.boogz) → review?(nelson)
Comment on attachment 269285 [details] [diff] [review]
NSS Patch v5

r=nelson, with the following changes:

>Index: mozilla/security/nss/lib/freebl/genload.c
>===================================================================
>RCS file: mozilla/security/nss/lib/freebl/genload.c
>diff -N mozilla/security/nss/lib/freebl/genload.c
>--- /dev/null	1 Jan 1970 00:00:00 -0000
>+++ mozilla/security/nss/lib/freebl/genload.c	21 Jun 2007 21:02:03 -0000
>@@ -0,0 +1,142 @@
>+#ifdef XP_UNIX

This new file needs to have the usual triple-license boilnerplate at the
top, like the other source files.

>+/*
>+ * We use PR_GetLibraryFilePathname to get the pathname of the loaded 
>+ * shared lib that contains this function, and then do a PR_LoadLibrary
>+ * with an absolute pathname for the softoken shared library.
>+ */
>+
>+static PRLibrary *
>+loader_LoadLibrary(const char *nameToLoad, 
>+                   PRFuncPtr funcInAlreadyLoadedLib,
>+                   const char *nameOfAlreadyLoadedLib)
>+{
>+    PRLibrary *lib = NULL;
>+    char* fullPath = NULL;
>+    PRLibSpec libSpec;
>+
>+    /* Get the pathname for nameOfAlreadyLoadedLib, i.e. /usr/lib/libnss3.so
>+     * PR_GetLibraryFilePathname works with either the base library name or a
>+     * function pointer, depending on the platform. We can't query an exported
>+     * symbol such as NSC_GetFunctionList, because on some platforms we can't
>+     * find symbols in loaded implicit dependencies.
>+     * But we can just get the address of this function !
>+     */
>+    fullPath = PR_GetLibraryFilePathname(nameOfAlreadyLoadedLib, 
>+                                         funcInAlreadyLoadedLib);

Please get rid of the function parameter named "funcInAlreadyLoadedLib".
Make it an automatic variable in the function, the way it was in the old
code in loader.c.  E.g.

>-    PRFuncPtr fn_addr = (PRFuncPtr) &bl_LoadLibrary;

Please rename "nameOfAlreadyLoadedLib" to "NameOfThisSharedLib".
Or Eliminate that parameter, too, and just reference a name string that will
be defined in the .c file that includes this one, before this file is included.
(Think of that name string as an argument to this .c file, rather than as an
argument to this function.  You could even use a #define'd symbol for this name.)

>Index: mozilla/security/nss/lib/pk11wrap/pk11load.c
>===================================================================
>RCS file: /cvsroot/mozilla/security/nss/lib/pk11wrap/pk11load.c,v
>retrieving revision 1.20
>diff -u -p -r1.20 pk11load.c
>--- mozilla/security/nss/lib/pk11wrap/pk11load.c	29 Sep 2006 19:53:07 -0000	1.20
>+++ mozilla/security/nss/lib/pk11wrap/pk11load.c	21 Jun 2007 21:02:04 -0000

>+static PRStatus
>+softoken_RunLoaderOnce( void )
>+{
>+  PRStatus status;
>+
>+  status = PR_CallOnce(&loadSoftokenOnce, &softoken_LoadDSO);
>+  return status;
>+}

>+#ifdef NSS_DECOUPLE_SOFTOKEN
>+    /*
>+     * Workaround code that loads softoken as a dynamic library,
>+     * even though the rest of NSS assumes this as the "internal" module.
>+     */
>+    if (!softokenLib && PR_SUCCESS != softoken_RunLoaderOnce())
>+        return SECFailure;

Please get rid of function softoken_RunLoaderOnce, and replace the call to
it (above) with the call to PR_CallOnce that is now in softoken_RunLoaderOnce.
Attachment #269285 - Flags: review?(nelson) → review+
Kai,

Re: comment 27, what I meant was that I thought the main purpose of this patch was to resolve link errors on Mac, but that wasn't the case since there was also another reason to reason Linux shared lib signing issues.
Nelson, thanks a lot for your review+

But I think we need another round of reviews. Sorry. Although it's probably a small round.


First, I think I should follow your and Julien's proposal to "always" do this decoupling. I removed the word "optionally" from this bug's summary.


Second, you requested a couple of changes and I adressed most of them.
However, I would like to recommend that I do not follow this proposal:

> Please get rid of the function parameter named "funcInAlreadyLoadedLib".
> Make it an automatic variable in the function, the way it was in the old
> code in loader.c.

I would like to explain why I think the calling code should still explicitly provide a pointer to a function in the reference library.

Yes, as of today, this executable code will be present in each library that includes it, it will therefore reside in the reference library, and therefore the loader could simply use a function pointer to itself.

But I'm worried that this might change in the future. This code might eventually get moved to somewhere else, and maybe then it's no longer in the reference library?

I propose to give the caller the full control over setting the correct values.

I added the following comment to the included file, in order to document the required parameters:

/*
 * This file is meant to be included by other .c files.
 * This file takes two "parameters", the scope which includes this
 * code shall declare provide variables:
 *   const char *NameOfThisSharedLib;
 *   PRFuncPtr FunctionInThisSharedLib;
 *
 * NameOfThisSharedLib:
 *   The file name of the shared library that shall be used as the 
 *   "reference library". The loader will attempt to load the requested
 *   library from the same directory as the refernece library.
 *
 * FunctionInThisSharedLib:
 *   On some platforms we require a function pointer within the
 *   reference library in order to locate it. This might be necessary
 *   if the reference library got loaded as an implicit dependency.
 *   (This loader code will not simply use a function pointer to itself,
 *    as the loader code might be located elsewhere. 
 */

New patch coming up.
Summary: Optionally remove the link time dependency from NSS to Softoken → Remove the link time dependency from NSS to Softoken
Attached patch NSS Patch v6 (obsolete) — Splinter Review
Attachment #269067 - Attachment is obsolete: true
Attachment #269285 - Attachment is obsolete: true
Attachment #272965 - Flags: review?(nelson)
Kai, we WANT this loader function to ONLY be used from within the library 
that is loading the dependency.  That is one of the reasons why the loader
function is a static function.  Allowing the caller to pass the address of
any function means that we're allowing the caller to pass in incorrect 
values, whereas the address of the loader function itself should always 
be the correct value.  

IIRC, there is an NSPR function that you can call with the address of a 
function, and it returns the name of the shared library that contains that 
address.  Perhaps we should use that, too, so that the caller does not need 
to supply the function address or the shared lib name.  The loader function
can always supply its own address and ask for its own shared lib name.
However, I'm not sure that that get-library-name function works properly
on all platforms.  I seem to remember there is some issue with it.
Perhaps Wan-Teh will remember the details.
Right.  PR_GetLibraryFilePathname is not implemented on all platforms.
Thanks for the clarification Nelson. I missed that the function is declared static and therefore voids my concern.

Ok, as PR_GetLibraryFilePathname does not work properly on all platforms, let's keep the current code which provides the library name.
Attached patch Patch v7Splinter Review
When compared with the reviewed patch v5, this new patch

- addresses all of Nelson's requests

- removes the NSS_DECOUPLE_SOFTOKEN flag, and all the code
  the did the "internal" loading in the past,
  which is no longer necessary

I think this patch is what you want, but I want to give you a chance
to have another look before I check it in.
Attachment #272965 - Attachment is obsolete: true
Attachment #273095 - Flags: review?(nelson)
Attachment #272965 - Flags: review?(nelson)
Comment on attachment 273095 [details] [diff] [review]
Patch v7

Thanks ,Kai.  Looks good.
Attachment #273095 - Flags: review?(nelson) → review+
Thanks for the reviews, Nelson!
Checked in, marking fixed.

done
Checking in freebl/genload.c;
/cvsroot/mozilla/security/nss/lib/freebl/genload.c,v  <--  genload.c
initial revision: 1.1
done
Checking in freebl/loader.c;
/cvsroot/mozilla/security/nss/lib/freebl/loader.c,v  <--  loader.c
new revision: 1.33; previous revision: 1.32
done
Checking in nss/config.mk;
/cvsroot/mozilla/security/nss/lib/nss/config.mk,v  <--  config.mk
new revision: 1.30; previous revision: 1.29
done
Checking in pk11wrap/manifest.mn;
/cvsroot/mozilla/security/nss/lib/pk11wrap/manifest.mn,v  <--  manifest.mn
new revision: 1.18; previous revision: 1.17
done
Checking in pk11wrap/pk11load.c;
/cvsroot/mozilla/security/nss/lib/pk11wrap/pk11load.c,v  <--  pk11load.c
new revision: 1.21; previous revision: 1.20
done
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Target Milestone: --- → 3.12
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: