This bug was reported by the JDK 1.5 team. The JDK 1.5 PKCS#11 support can load PKCS#11 modules directly, not through NSS. They attempted to load our root cert module, libnssckbi.so, and it crashed. The problem is that the the JDK provided NULL function pointers for CreateMutex, DestroyMutex, LockMutex, and UnlockMutex. nssckbi, unlike softoken, does not have a built-in implementation of those functions that it can fall back to, in case the caller wants multithreaded usage but does not supply the functions itself, as permitted by the PKCS#11 spec. Here is a quote from the JDK team below : "We call with CKF_OS_LOCKING_OK set and the function pointers set to null, as allowed to the spec. This is case 2 in PKCS#11 section 11.4 C_Initialize(), page 141 in the PDF: === 2. If the flag is set, and the function pointer fields aren’t supplied (i.e., they all have the value NULL_PTR), that means that the application will be performing multi-threaded Cryptoki access, and the library needs to use the native operating system primitives to ensure safe multi-threaded access. If the library is unable to do this, C_Initialize should return with the value CKR_CANT_LOCK. === If the CKBI module does not support this case, it must return CKR_CANT_LOCK as per spec and we will fall back to single threaded access." There are several possible fixes for this : 1) return CKR_CANT_LOCK, as suggested. However, we probably would have to change other code too, in order to make sure we don't attempt to call the locking functions at all 2) link with NSPR implicitly and use the PRLock implementation . I'm guessing there is some reason we don't do this already - what is it ? 3) load NSPR on-demand only if we get into the situation where the caller requests multithreaded access without providing callback functions
There is an option 4) - create a fallback OS direct locking call. I would argue against 2 and 3. Linking loadable PKCS #11 modules with NSPR is something to be avoided. We have ran into problems in the past with mismatched version of the NSPR used by PKCS #11 modules and their host applications. Softoken is the exception because it is not really loaded, but linked with the application. In general we try to make the PKCS #11 module as standalone as possible. libnssckbi is very carefull to stub out NSPR calls to accomplish this. I would suggest options 1 or 4. (Note, for JSS it's probably best for it to use the same scheme that NSS uses in cause the module ends up being shared). To soften the case 2 and 3 a bit, this policy was created because we had a number of applications (Netscape Communicator, old Netscape Servers, etc) which had statically linked versions of NSPR. Loading newer PKCS #11 modules into these older applications invariabled caused crashes. With shared library versions of NSPR, this may not be quite the problem, but there still is the question of what happens when we go to NSPR version 5, and you again have mismatched NSPR interfaces. bob
Thanks for that comment, Bob. I don't particularly like option 4, because we will end up duplicating much code from NSPR, and we will have to maintain this code for new platforms. I can see why option 2 can cause problems with applications that already use NSPR themselves, particularly static versions. I'm not sure if the same problems apply to option 3, where NSPR is dynamically loaded. I think it might not. We will still need to re-implement native code to load the library and symbols, though ... So I think option 1 is probably the preferred way.
The crash is in NSSArena_Create, which gets called in nssCKFWInstance_Create before the arguments have been checked. Patch forthcoming. ---- called from signal handler with signal 4 (SIGILL) ------  0x54ab8958(0x1927a8, 0x0, 0xffbfddd8, 0x44f9a8, 0xe8, 0x5d81e8), at 0x54ab8957 => PR_NewLock(), line 375 in "nsprstub.c"  nssArena_Create(), line 418 in "arena.c"  NSSArena_Create(), line 386 in "arena.c"  nssCKFWInstance_Create(pInitArgs = 0x16fd98, mdInstance = 0x7ec52ee4, pError = 0xffbfdfb0), line 214 in "instance.c"  NSSCKFWC_Initialize(pFwInstance = 0x7ec50edc, mdInstance = 0x7ec52ee4, pInitArgs = 0x16fd98), line 162 in "wrap.c"  builtinsC_Initialize(pInitArgs = 0x16fd98), line 114 in "nssck.api"  Java_sun_security_pkcs11_wrapper_PKCS11_C_1Initialize(0x36b90, 0xffbfe1e4, 0x7ec16340, 0x7f15831c, 0xfffffcf8, 0x7ec50ee0), at 0x7edd5f58
Actually, the flags in CK_C_INITIALIZE_ARGS are 0 in this case, as well as all 4 function pointers. This would mean it is case 1 in the PKCS#11 doc for C_Initialize - which is that the application won't be performing concurrent access . That's rather odd but perhaps it is the way the Java PKCS#11 provider is written ... In any case we still need to prevent this crash, which right now happens whenever the function pointers aren't supplied by the application.
(In reply to comment #4) To clarify, the SunPKCS11 code currently first calls C_Initialize() with an CK_C_INITIALIZE_ARGS that has the four mutex pointers set to NULL and CKF_OS_LOCKING_OK in flags set (case 2 in the spec). If that fails with CKR_CANT_LOCK, we call C_Initialize() again with the same CK_C_INITIALIZE_ARGS structure but with its flags = 0, i.e. no multi-threaded access (case 1).
Created attachment 154424 [details] [diff] [review] evaluate CK_C_INITIALIZE_ARGS only once, and allow working in single-threaded mode
Comment on attachment 154424 [details] [diff] [review] evaluate CK_C_INITIALIZE_ARGS only once, and allow working in single-threaded mode Saul, Bob, could you pleae review this patch ? Thanks.
I forgot to note that this patch also includes changes from Bob's first ckfw patch in bug 248981 .
Julien- I have looked at this code. It looks correct to me (you can switch it to r=mcgreer if you want).
Thanks, Ian ! I would still like to get a review from Bob or Nelson before checking in.
Comment on attachment 154424 [details] [diff] [review] evaluate CK_C_INITIALIZE_ARGS only once, and allow working in single-threaded mode I'm not really happy with the disjoint but related globals. The LockingState global in mutex is set for a particular pInitArgs by code in nsprstub.c. If we have some code that passes a different pInitArgs into NSSCKFWMutex_Create(), then Create might pick the wrong values. I believe it would be better to change the signature of NSSCKFWMutex_Create to take locking state as an argument. nsprstub.c could then pass it's calculated global version of LockingState that it knows matches the pInitArgs. Another way would be to create a datastructure with both pInitArg and the cached LockingState in a single dataStructure. The rest of the patch looks fine to me. I'm all in favor the the idea of caching the calculation of LockingState. bob
Created attachment 154611 [details] [diff] [review] address Bob's concerns Bob, thanks for the review. Here is a new patch . The changes are : 1) keep all the globals in nsprstub.c - the locking state is nssstub_LockingState 2) add a CryptokiLockingState argument to nssCKFWMutex_Create 3) add the CryptokiLockingState to the NSSCKFWInstance 4) add a CryptokiLockingState argument to NSSCKFWInstance_Create 5) return computed CryptokiLockingState from nssSetLockArgs 6) in NSSCKFWC_Initialize, pass the CryptokiLockingState from nssSetLockArgs to NSSCKFWInstance_Create These changes were all necessary in order to avoid referencing the nsprstub globals .
Comment on attachment 154611 [details] [diff] [review] address Bob's concerns Bob, please review.
Comment on attachment 154611 [details] [diff] [review] address Bob's concerns Yes, much better. sr=relyea
Checked in to NSS_3_9_BRANCH and tip : Checking in ckfw.h; /cvsroot/mozilla/security/nss/lib/ckfw/ckfw.h,v <-- ckfw.h new revision: 1.5; previous revision: 1.4 done Checking in ckfwm.h; /cvsroot/mozilla/security/nss/lib/ckfw/ckfwm.h,v <-- ckfwm.h new revision: 1.4; previous revision: 1.3 done Checking in instance.c; /cvsroot/mozilla/security/nss/lib/ckfw/instance.c,v <-- instance.c new revision: 1.9; previous revision: 1.8 done Checking in mutex.c; /cvsroot/mozilla/security/nss/lib/ckfw/mutex.c,v <-- mutex.c new revision: 1.4; previous revision: 1.3 done Checking in nsprstub.c; /cvsroot/mozilla/security/nss/lib/ckfw/nsprstub.c,v <-- nsprstub.c new revision: 1.6; previous revision: 1.5 done Checking in nssckfwt.h; /cvsroot/mozilla/security/nss/lib/ckfw/nssckfwt.h,v <-- nssckfwt.h new revision: 1.3; previous revision: 1.2 done Checking in wrap.c; /cvsroot/mozilla/security/nss/lib/ckfw/wrap.c,v <-- wrap.c new revision: 1.11; previous revision: 1.10 done
Created attachment 155123 [details] [diff] [review] incremental patch Testing found that the crash was resolved, but the library still didn't work in single-threaded mode. This was due to the error code not being set to OK in the single threaded case. This one-line patch fixes the problem.
Since Bob is away and we have an urgent need, I have taken the liberty to check in this fix. Checked in to the tip : Checking in mutex.c; /cvsroot/mozilla/security/nss/lib/ckfw/mutex.c,v <-- mutex.c new revision: 1.5; previous revision: 1.4 done And NSS_3_9_BRANCH : Checking in mutex.c; /cvsroot/mozilla/security/nss/lib/ckfw/mutex.c,v <-- mutex.c new revision: 18.104.22.168; previous revision: 22.214.171.124 done