Closed Bug 302416 Opened 19 years ago Closed 19 years ago

NSS root cert module & fortezza should not be using NSPR static libraries

Categories

(NSS :: Build, defect, P2)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: caillon, Assigned: rrelyea)

References

Details

Attachments

(2 files, 3 obsolete files)

NSPR static libraries are deprecated and unsupported, yet NSS is still a
consumer.  Red Hat currently ships the NSPR static libs because of NSS's
requirement.  As far as I am aware, once we fix this issue, we can stop shipping
the NSPR static libs.
This seems to get firefox and friends to build (and run), at least,
--with-system-nspr on a system without static libs present in the NSPR_LIBS
dir.
Attachment #190776 - Flags: review?(wtchang)
Attachment #190776 - Attachment description: Proposed patch, v1.0 → Proposed patch, v1.0 from Tom Callaway <tcallawa@redhat.com>
Comment on attachment 190776 [details] [diff] [review]
Proposed patch, v1.0
from Tom Callaway <tcallawa@redhat.com>

The fundamental issue here is whether these two PKCS #11
libraries should depend on NSPR at run time.  The concern
is that the application that loads these PKCS #11 libraries
may be linked with an incompatible version of NSPR.

We avoid the runtime dependency on NSPR by linking with
the NSPR static libraries.  We could also accomplish this
by not using NSPR at all, but that's a lot of work.  (I
tried at least twice but never had the time to finish it.)

Right now we should revisit this issue: is it still necessary
for these libraries, especially libnssckbi.so, to not depend
on NSPR at run time?

If it is now ok to depend on NSPR, then this patch is in the
right direction.  But it is incomplete: the nsprstub.c and
stub.c files should also be removed.  (Just remove them from
mozilla/security/nss/lib/ckfw/manifest.mn and
mozilla/security/nss/lib/fortcrypt/swfort/pkcs11/manifest.mn.)
This is why I marked the patch r-.  (The patch should also
handle the Windows cases in those two makefiles.)
Attachment #190776 - Flags: review?(wtchang) → review-
The only credible argument against NSS's PKCS11 modules using NSPR, that I 
recall hearing, had to do with use of locks.  PKCS11 modules are supposed 
to be able to use caller-supplied locking primitives, rather than using 
locking primitives of their own choosing.  If the PKCS11 modules call an 
NSPR function that blocks using locking primitives other than those 
supplied by the caller, that likely leads to potential races.  

Recently we've been talking about PKCS11 modules that effectively require
that the application allow them to use the system locking primitives, and 
not require that the module use application-supplied primitives.  Perhaps
we should just declare that NSS's PKCS11 modules are all like that.  
If we did that, wouldn't that allow us to use NSPR and the system's native
locking primitives?

Are there any other reasons not to use NSPR in PKCS11 modules?
Have I perhaps forgotten others?  If so, please remind of them here.

Do we need to schedule another NSS team meeting for this subject?
I believe nelson hit the main issue -- the locks. There is also a historic issue
with mixing of different versions of NSPR. I believe that's no longer a problem,
and if it is static linking makes the problem worse. The lock issue is primarily
a problem when NSPR uses non-system locks (that is the system does not have it's
own threads library, or the NSPR uses some addition abstraction, such as NT
Fibers. The former is no longer a consideration [NSPR no longer supports any
systems which do not support threads], which just leaves the fibers version as
an issue.... but then if we use the version of NSPR linked in to the
application, we will still get the correct behavior.

Anyway I agree, we should either go with dynamic NSPR or remove it from the
builtins completely. Going completely dynamic would be easiest... we remove the
stubs and make the toolkit init function return 'OS_LOCKING'.

bob

Removing NSPR dependency from lib/ckfw is possible but
is a lot of work and requires creating copies of the
following .c files for lib/ckfw:

mozilla/nsprpub/lib/ds/plarena.c
mozilla/nsprpub/lib/ds/plhash.c

mozilla/security/nss/lib/base/arena.c
mozilla/security/nss/lib/base/errorval.c
mozilla/security/nss/lib/base/libc.c
mozilla/security/nss/lib/base/utf8.c

We also need to create copies of the .h files these
.c files include.

I demonstrated this is feasible, but asked myself,
"is it worth it?"
I'll attach a patch which goes the other way (removing the need to make NSPR
static) as soon as I can test builtin. (my patch currently builds).

bob
Assignee: caillon → rrelyea
OS: Linux → All
Hardware: PC → All
Attached patch Link NSPR dynamically (obsolete) — Splinter Review
Not only link NSPR dynamically, but make sure we don't use the stub files.
Attachment #190776 - Attachment is obsolete: true
OK a quick review shows a bug in unlock (not checking for mutex->lock).

I'll attack a new version with just this fix. In the meantime a review of the
rest of the patch would be appriciated;).

bob
Attachment #190866 - Flags: superreview?
Attachment #190866 - Flags: review?
Comment on attachment 190866 [details] [diff] [review]
Link NSPR dynamically

The locks are an issue.
PKCS#11 allows 4 ways of initializing the module.

1) single-threaded

This is done by passing setting pInitArgs to NULL in C_Initialize, or by
setting all four lock function pointers to NULL in CK_C_INITIALIZE_ARGS, and
not setting the CKF_OS_LOCKING_OK flag.

This patch breaks in the later case; it will erroneously return CKR_CANT_LOCK. 

2) multi-threaded with application-provided lock functions.

This is done by passing 4 lock pointers in CK_C_INITIALIZE_ARGS , and not
setting the CKF_OS_LOCKING_OK flag.

This mode of operation was previously supported by nssckbi . This patch breaks
it. I consider this a regression .

3) multi-threaded with system locks.

This is set by setting the CKF_OS_LOCKING_OK flag, and not passing any lock
function pointers.

This patch enables this mode, which previously didn't work.

4) multi-threaded with system locks or application-provided locks.

This mode worked previously, and the module chose the application-provided
locks. Now, it still works, but uses system locks. This is fine.

Re: Nelson's comment #3 .
I don't recall that we talked about making PKCS#11 modules that require the use
of system locks. We talked about applications with mixed NSS and Java code that
could only work correctly if all the PKCS#11 modules were using system locks
(ie. case 3 above).   This doesn't mean the modules should be incapable of
supporting the other 3 modes for other applications which have different
requirements.
Attachment #190866 - Flags: superreview? → superreview-
Bob,

BTW, you may want to look at the comments in
https://bugzilla.mozilla.org/show_bug.cgi?id=297734 for a closely related problem .
Comment on attachment 190866 [details] [diff] [review]
Link NSPR dynamically

I am fine with shipping the NSS 3.11 nssckbi module
with that regression and backing out this change only
if a customer needs the new root CA list without
changing their code.

Since the nssckbi module can be used independent
of NSS, a customer that initializes nssckbi using
method 2 has the option of not upgrading nssckbi
when they upgrade to NSS 3.11.
Not all PKCS #11 modules support all modes. I know of certain PKCS #11 modules
form some vendor which already have exactly the same semantics that I checked in
for nssckbi.

Currently nssckbi will fail if OS_Locking is set and the call backs are used.
Use of the callbacks preclude linking dynamically with NSPR because we have to
override other callers which already call NSPR directly. This was why this token
was linked statically to begin with.

Case 1 does need to be dealt with and I'll include that in the patch.

RE: comment 10. This patch fixes that bug.

bob

RE using non-system locking. The bug in question says that two different
libraries calling the same PKCS #11 function can only effectively share that
PKCS #11 module if it supplies the OS_LOCKING flag. If the applications *ONLY*
supply the locking parameters, the application is implying that it is there are
application level threads. In this case sharing between modules is highly
problematic.

Oops the line:

Currently nssckbi will fail if OS_Locking is set and the call backs are used.

should read:

Currently nssckbi will fail if OS_locking is set and the call backs are not suppied.

Of course this is the subject of the bug Julien points to.

In general it's very difficult for a given PKCS #11 module to support both.
Softoken only support 'OS_Locking' (by direct NSPR calls) and completely ignores
the function list.

This fixes Julien's issue with case 1. I makes the explicit case why case 3 can
no longer be supported.
Attachment #190866 - Attachment is obsolete: true
Attachment #190892 - Flags: superreview?
Attachment #190892 - Flags: review?
Attachment #190866 - Flags: superreview-
Wan-teh,

re: your comment #11,

I don't think shipping with this regression in 3.11 is a compelling option. A
customer could start to depend on the OS_LOCKING after we ship 3.11, then
another customer could ask us to revert to the old behavior, and we would then
break them . We shouldn't break anybody. We need to do it right, and add the
support for OS locking without removing support for application locks. I have
discussed many of the details involved in supporting both in bug 297734 .
Bob,

Re: your comment in the patch,

The root cert module link with nssbase, which calls NSPR function names
directly. But in fact, this worked correctly before with application locks,
because it used the NSPR stub functions, not the real NSPR APIs. nssbase should
be enhanced to be able to select the lock functions it uses. Instead of
referencing PR_NewLock / PR_DestroyLock / PR_Lock / PR_Unlock, it could
reference _pr_ ... and then we would put a stub implementation in ckbi with a
switch statement to use PR_ or not, and another in NSS which would call PR_ .

I successfully just removed plc4.lib and plds4.lib from the link line of
nssckbi, without adding the dynamic libraries . Since our linker options are not
set right (bugzilla 297802) , it linked fine, even though there were unresolved
symbols. I was then able to use this library with modutil successfully, and it
used the NSPR symbols already loaded in the process ...

nm on this libnssckbi.so reveals that the only unresolved symbols were :

[1875]  |         0|       0|FUNC |GLOB |0    |UNDEF  |PL_ArenaAllocate
[1859]  |         0|       0|FUNC |GLOB |0    |UNDEF  |PL_ArenaRelease
[1876]  |         0|       0|FUNC |GLOB |0    |UNDEF  |PL_CompareValues
[1872]  |         0|       0|FUNC |GLOB |0    |UNDEF  |PL_FinishArenaPool
[1878]  |         0|       0|FUNC |GLOB |0    |UNDEF  |PL_HashTableAdd
[1883]  |         0|       0|FUNC |GLOB |0    |UNDEF  |PL_HashTableDestroy
[1857]  |         0|       0|FUNC |GLOB |0    |UNDEF  |PL_HashTableEnumerateEntries
[1881]  |         0|       0|FUNC |GLOB |0    |UNDEF  |PL_HashTableLookup
[1879]  |         0|       0|FUNC |GLOB |0    |UNDEF  |PL_HashTableRemove
[1861]  |         0|       0|FUNC |GLOB |0    |UNDEF  |PL_InitArenaPool
[1867]  |         0|       0|FUNC |GLOB |0    |UNDEF  |PL_NewHashTable
[1880]  |         0|       0|FUNC |GLOB |0    |UNDEF  |PL_strcasecmp
[1860]  |         0|       0|FUNC |GLOB |0    |UNDEF  |PL_strlen

Calling the PL hash table and string functions should always be OK in nssckbi,
as long as we define them not to ever use locks .

I believe the arena functions do use locks (arena freelist), so they indeed
cause a problem. But if the arena freelist is disabled, then no locks should
ever be used, and the libnssckbi.so truly does not use any OS locking. So, the
arena code in libnssbase.a at this point is the main obstacle to being able to
support both OS locking and application locks in ckbi right now.

We have several options when this regression actually affects
a customer.

We can ask the customer to use CKF_OS_LOCKING_OK.

We can ask the customer to upgrade to NSS 3.11 without upgrading
the nssckbi module.

We can release a new nssckbi module from the NSS_3_10_BRANCH for
this customer.

We can lie and ignore the mutex callbacks and just use PRLocks.
It is safe to lie because all NSS customers today use native
threads and all the OS's that NSS supports have native threads.

If none of these are acceptable to the customer, we can back out
this patch and add CKF_OS_LOCKING_OK support to the nssckbi module.

Implementing OS locking support only requires four short
functions for pthreads (which covers Unix/Linux/Mac OS X), Windows,
and OS/2 each.  Other platforms (BeOS and OpenVMS) don't need OS
locking support. This is much simpler than looking up the PR_NewLock,
PR_Lock, PR_Unlock, and PR_DestroyLock symbols at run time.
NSS ckbi is not part of the NSS public API. It's a separate PKCS #11 module. If
someone wants to use it as a separate PKCS #11 module, that's fine, but that has
nothing to do with the NSS public interface. It's the same as if the link
directly with freebl. We can and do change the freebl interface as needed.

So...

1. We've already determined that NSS always set the CKF_OS_LOCKING_OK flag.
2. Most libraries actually dislike the use of passing locking primatives.
3. Softoken cannot easily support both passed primatives and CKF_OS_LOCKING_OK.
4. The PKCS #11 spec says that it is perfectly acceptable to PKCS #11 modules to
reject CKF_OS_LOCKING_OK or locking primatives (or both for that matter).

The "regression" is *very* unlikely to cause any effect in applications (most
libraries either supply CKF_OS_LOCKING_OK or both the flag and the primatives),
and it's a "regression" in a non-public API.

bob
Is "OS Locking" well defined by PKCS#11 for all platforms?

IINM, for OS Locking to work as intended in the presence of multiple 
PKCS11 modules, those modules must all agree on its definition, or else
we have the worst case of different locking schemes being used, the 
very problem that the lock callbacks were intended to solve.

There is one platform about which I am particularly concerned WRT the 
definition of OS Locking: Win NT.  As I understand it, there are two 
different methods, one for fibers and one for Win32 threads.  What is 
the right definition?
In reply to comment 18, PKCS#11 is a public API.
Any PKCS#11 module produced in NSS has that public API.
1) yes, PKCS #11 is a public API, but the PKCS #11 modules in NSS are not part
of the NSS public API. Use of these modules outside of NSS are 'ad hoc' uses. 

2) the builtin's module are compliant with the PKCS #11 spec.

The NT Fibers issue is a good issue to review, though NSS currently must assume
that OS locking is 'fine' for NT Fibers because NSS already sets the
CKF_OS_LOCKING_OK flag.

bob
Bob,

Sun doesn't consider the PKCS#11 API to be an "ad hoc" use of NSS . We
explicitly support applications, such as the JDK, that access our PKCS#11
modules directly. There are other such applications as well, for example the
Solaris crypto framework (libpkcs11 wrapper PKCS#11 library). We don't know if
there will be more or not, but we are prepared to support them . So for us,
PKCS#11 is part of the public API. This is different from freebl, which is
clearly a private interface.
Attachment #190866 - Flags: review?
When the "WINNT" configuration of NSPR is used, it is fine
for the softoken to claim it does OS locking because native
threads can use PRLocks without any problems.

On the other hand, when the "WINNT" configuration of NSPR
is used, an NSPR local thread (implemented as an NT fiber)
that calls into a truly OS locking PKCS #11 module will
block the underlying native thread when it waits to acquire
an OS lock inside the PKCS #11 module.  This could be
problematic.  So NSPR-based apps should create NSPR global
threads (implemented as native threads) to call into a
truly OS locking PKCS #11 module in the "WINNT" configuration.
I was told that NES uses NSPR global threads to call into
HSMs.
Wan-Teh,

Re: comment #23,

I believe in the first paragraph, you meant "When using the WIN95 configuration".

I'm not sure if NES does anything differently when calling into HSMs. You can't
turn a fiber into a global thread while you are already running on it, can you ?
So you would have to do this at server initialization time, when starting the
server's worker threads. However, NES/iWS have dynamic reconfiguration
capability, which is triggered by an external signal. The server might initially
not use SSL, or use it without an HSM, and then start using an HSM on the fly. I
don't recall any facility to reap all the web server fibers and restart them as
global threads. This would be quite messy for an application to deal with ...
Summary: NSS should not be using NSPR static libraries → NSS root cert module & fortezza should not be using NSPR static libraries
Comment on attachment 190892 [details] [diff] [review]
Fix the lock check for the unlock call. Handle the full explicit 'case 1' of the PKCS #11 locking semantics.

This patch only has one serious problem and some minor problems.

1. ckfwm.h

>-NSS_EXTERN CK_RV
>-nssSetLockArgs(
>-   CK_C_INITIALIZE_ARGS_PTR pInitArgs,
>-   CryptokiLockingState* returned
>-);

It seems that you replaced this function by the new function
nssCKFW_GetThreadSafeState.  If so, you should either declare
the new function here, or make it a static function in wrap.c.

2. mutex.c

> struct NSSCKFWMutexStr {
>-  CK_VOID_PTR etc;
>-
>-  CK_DESTROYMUTEX Destroy;
>-  CK_LOCKMUTEX Lock;
>-  CK_UNLOCKMUTEX Unlock;
>+   PRLock *lock;
> };

The indentation level should be two spaces.

3. wrap.c

>+/* figure out out locking semantics */
>+CK_RV
>+nssCKFW_GetThreadSafeState(CK_C_INITIALIZE_ARGS_PTR pInitArgs,
>+                           CryptokiLockingState *pLocking_state) {
>+  int functionCount;

This needs to be initialized to 0.  This is the serious problem.

>+  /* CKF_OS_LOCKING_OK is not set, and not functions supplied, 

"not" => "no"

>+  /* OS_LOCKING_OK is not set and functions have been supplied. Since
>+   * ckfw uses nssbase library which explicitly calls NSPR, and because
>+   * we can't link with NSPR statically, then we can reliably capture 
>+   * those PR_Lock() calls, so we can't support applications which have
>+   * their own threading module.

This sentence is hard to understand.  I suggest simply saying this:

>+  /* OS_LOCKING_OK is not set and functions have been supplied. Since
>+   * ckfw uses nssbase library which explicitly calls NSPR, we can't
>+   * support applications which have their own threading module.

4. builtins/Makefile

>+else 
>+EXTRA_SHARED_LIBS += \
>+        $(DIST)/lib/$(NSPR31_LIB_PREFIX)plc4.lib \
>+        $(DIST)/lib/$(NSPR31_LIB_PREFIX)plds4.lib \
>+        $(DIST)/lib/$(NSPR31_LIB_PREFIX)nspr4.lib \
>+        $(NULL)
>+endif # NS_USE_GCC

You need to replace "$(DIST)/lib" by "$(NSPR_LIB_DIR)".
Attachment #190892 - Flags: superreview?
Attachment #190892 - Flags: review?
Attachment #190892 - Flags: review-
In our meeting today, we came to an agreement to change the root cert module and
drop support for application-supplied lock callbacks, and add support for OS
locking.

If we were to run into a problem with any customer after dropping the callbacks,
Wan-Teh promised to implement support for both OS locking and application callbacks.
Blocks: 297734
Priority: -- → P2
Target Milestone: --- → 3.11
Attachment #190892 - Attachment is obsolete: true
Attachment #192568 - Flags: superreview?(julien.pierre.bugs)
Attachment #192568 - Flags: review?(wtchang)
Comment on attachment 192568 [details] [diff] [review]
Incorporate wtc's comments

r=wtc.

You missed one comment change I suggested.

In wrap.c, we have

>+  /* CKF_OS_LOCKING_OK is not set, and not functions supplied, 
>+   * explicit case 1 */

Change the second "not" to "no".

I found another minor problem.	In builtins/Makefile,
now that we are linking with NSPR *shared* libraries,
they should be listed in the variable EXTRA_SHARED_LIBS
instead of EXTRA_LIBS.	You changed one of the occurrences
(Windows and not using GCC) but we should change the other two
occurrences (Windows and using GCC; not Windows).
Attachment #192568 - Flags: review?(wtchang) → review+
Comment on attachment 192568 [details] [diff] [review]
Incorporate wtc's comments

Another question:

>+   * therefore we can't support applications which have their own threading 
>+   * module.

Did you mean "threading module" or "threading model"?

The sentence reads better with "therefore" removed.
Comment on attachment 192568 [details] [diff] [review]
Incorporate wtc's comments

Bob,

I have a few issues with this patch

1) (low priority)
You added some dead code  in mutex.c after calls to mutex_add_pointer, which
always returns CKR_OK .

2) more serious - the reason for the r-

You are only checking LockingState in nssCKFWMutex_Create, but not in Lock,
Unlock or Destroy . In those functions, you rely upon the value of mutex->lock
being non-NULL - otherwise you return CKR_OK .

If the lock creation previously failed, but the application code still called
Lock, Unlock, or Destroy methods, this will result in the PKCS#11 module
silently not locking, and may lead to hard to debug race conditions.

I would suggest checking the LockingState in all 4 functions, and returning
CKR_OK early if LockingState is SingleThreaded . If it's MultiThreaded, do the
proper error checking for the value of mutex->lock and fail if it's NULL,
rather than return CKR_OK.
Attachment #192568 - Flags: superreview?(julien.pierre.bugs) → superreview-
Julien,

If lock recreation fails, then a NULL mutex is returned. Any application which
blindly passes null to the locking function will be met with the same effect as
passing NULL to PR_Lock, that is a crash. What actually happens in this code is
the operation trying to allocate the lock will fail, so lock and unlock are
never called.

So it's not necessary to separately check the locking state. You cannot have a
valid mutex pointer if locking_state is MultiThreaded and the lock creation
failed, so mutex->lock completely describes the current threadstate.

About the 'dead-code'. Even though the mutex_add_pointer() currently returns
CKR_OK, it is defined to possibly fail. If all the code which calls this
function 'knows' that it can't fail, it increases the cost of allowing it to do
so in the future (I have enough code like that already;). On the other hand, it
might be we want to remove the pointer tracking code altogether. In that case,
that is a separate bug and should be tackled on it's own.

bob
Comment on attachment 192568 [details] [diff] [review]
Incorporate wtc's comments

Requesting rereview in the light of my comments.
Attachment #192568 - Flags: superreview- → superreview?
Attachment #192568 - Flags: superreview? → superreview+
Checking in ckfwm.h;
/cvsroot/mozilla/security/nss/lib/ckfw/ckfwm.h,v  <--  ckfwm.h
new revision: 1.6; previous revision: 1.5
done
Checking in manifest.mn;
/cvsroot/mozilla/security/nss/lib/ckfw/manifest.mn,v  <--  manifest.mn
new revision: 1.8; previous revision: 1.7
done
Checking in mutex.c;
/cvsroot/mozilla/security/nss/lib/ckfw/mutex.c,v  <--  mutex.c
new revision: 1.7; previous revision: 1.6
done
Checking in wrap.c;
/cvsroot/mozilla/security/nss/lib/ckfw/wrap.c,v  <--  wrap.c
new revision: 1.13; previous revision: 1.12
done
Checking in builtins/Makefile;
/cvsroot/mozilla/security/nss/lib/ckfw/builtins/Makefile,v  <--  Makefile
new revision: 1.16; previous revision: 1.15
done
Status: NEW → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
Note that you don't need to link nspr4, since synbols from nspr4 are not used. Only those of plds4 and plc4 are.
Mike,

That is not correct, PR_Lock and PR_DestroyLock are used, and they are symbols from libnspr4 .
(In reply to comment #35)
> Mike,
> 
> That is not correct, PR_Lock and PR_DestroyLock are used, and they are symbols
> from libnspr4 .

Aaaaaah that's been added by the patch.
Mike,

Yes, it has. Read comment 9 to understand the 4 possible ways a PKCS#11 module does locking. We used to support case 2 and dropped it for case 3; and the type of locks used in case 4 changed.

Previously, pointers to lock functions were being passed in from C_Initialize by libnss (which were actually functions operating on PRLock) . Now, the root cert module itself operates on PRLock directly, and no longer supports the lock callback functions.
Note the link to nspr, plds and plc is not an absolute necessity, though. The module being loaded by libnss, which is linked to nspr and friends. Well, it's fine either way
We forgot to cvs remove nsprstub.c when we removed it
from lib/ckfw/manifest.mn.
Attachment #319705 - Flags: review?(rrelyea)
Attachment #319705 - Flags: review?(rrelyea) → review+
Comment on attachment 319705 [details] [diff] [review]
cvs remove lib/ckfw/nsprstub.c

I removed nsprstub.c from CVS on the NSS trunk (NSS 3.12.1).

Removing nsprstub.c;
/cvsroot/mozilla/security/nss/lib/ckfw/nsprstub.c,v  <--  nsprstub.c
new revision: delete; previous revision: 1.7
done
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: