User-Agent: Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.7.5) Gecko/20041109 Firefox/1.0 Build Identifier: Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.7.5) Gecko/20041109 Firefox/1.0 NSPR by design needs to be initialized on the primordial thread. If is not initialized on the primoridial thread unpredictable crashes can occur. Currently Java applications need to initialize JSS (which inits NSPR/NSS) in the main method of the java app. A new method, loadNSPR(), is required that will allow NSPR (NSS/JSS libraries) to be loaded in the main primordial thread of a java application, and then the java application can choose to initialize JSS/NSS if needed in a separate thread. Reproducible: Sometimes Steps to Reproduce: 1.load JSS app that does not init JSS in the main method 2.run app several times, at some point JVM will crash. 3. Actual Results: A DESCRIPTION OF THE PROBLEM : # # An unexpected error has been detected by HotSpot Virtual Machine: # # SIGSEGV (0xb) at pc=0xb74f8311, pid=10216, tid=2630290352 # # Java VM: Java HotSpot(TM) Server VM (1.5.0_01-b08 mixed mode) # Problematic frame: # C [libc.so.6+0x72311] # # An error report file with more information is saved as hs_err_pid10216.log # # If you would like to submit a bug report, please visit: # http://java.sun.com/webapps/bugreport/crash.jsp # Expected Results: loadNSPR() will allow apps that want to use JSS, to not have to init JSS in the main primordial thread.
Attachment #184135 - Flags: review?(saul.edwards.bugs) → review+
Do you know exactly what caused the crash? It is better to eliminate this NSPR requirement if possible.
I'm also wondering why the apps don't want to initialize JSS on the primordial thread but can initialize NSPR on the primordial thread. Could you give a scenario where this is necessary?
The case is a product wishes to only init JSS, if and when they have need to later on in there multi-thread based app. We've shown that if they init JSS in there main, that this problem would not occur, but they would rather just init NSPR, and then init JSS. here is a stack trace #0 0xb75c88c1 in __waitpid_nocancel () from /lib/tls/libpthread.so.0 #1 0xb6ebb76a in VMError::fork_and_exec () from /usr/jdk1.5.0_04/jre/lib/i386/server/libjvm.so #2 0xb6ebb8b5 in VMError::show_message_box () from /usr/jdk1.5.0_04/jre/lib/i386/server/libjvm.so #3 0xb6ebb51a in VMError::report_and_die () from /usr/jdk1.5.0_04/jre/lib/i386/server/libjvm.so #4 0xb6df5bda in JVM_handle_linux_signal () from /usr/jdk1.5.0_04/jre/lib/i386/server/libjvm.so #5 0xb6df3164 in signalHandler () from /usr/jdk1.5.0_04/jre/lib/i386/server/libjvm.so #6 <signal handler called> #7 0x86c38d63 in pt_AttachThread () from /opt/sun/private/lib/libnspr4.so #8 0x86c3926c in PR_GetCurrentThread () from /opt/sun/private/lib/libnspr4.so #9 0x86c2bd68 in PR_SetError () from /opt/sun/private/lib/libnspr4.so #10 0x86dc3952 in PORT_SetError () from /opt/sun/private/lib/libnss3.so #11 0x86d9eafc in PK11_GetBestSlotMultiple () from /opt/sun/private/lib/libnss3.so #12 0x86d9ec8a in PK11_GetBestSlot () from /opt/sun/private/lib/libnss3.so #13 0x86da86a8 in PK11_CreateDigestContext () from /opt/sun/private/lib/libnss3.so #14 0x86def6af in Java_org_mozilla_jss_pkcs11_PK11MessageDigest_initDigest ()
Glen, could you try this NSPR patch and see if it allows you to initialize NSPR on any thread?
Wan-Teh, The choice of thread for the NSPR initialization wasn't the only reason we wanted to expose PR_Init through JNI . The applications also couldn't ensure there would be only a single thread doing the initialization . Allowing NSPR init in any thread wouldn't help them unless NSPR is also made thread-safe, which it is currently not. I'm not sure if it could even be done on all platforms, but it would certainly have a negative performance impact on every NSPR call .
NSS initialization isn't thread-safe, either. Why don't you also need to initialize NSS in the main method of the Java app?
Good point. Perhaps the issues are unrelated, but I know they came up together in the discussion with the application team. I'll let Glen handle it and test your patch.
not that it's likely to work well, but should you expose PR_Cleanup? :)
Glen, please try this patch. Sorry I didn't test the previous patch.
Attachment #184149 - Attachment is obsolete: true
Wan-Teh, What about the platforms that don't use pthreads, such as Windows ? Do they have any dependency on NSPR being initialized in the primordial thread ?
For the "WINNT" configuration of NSPR, this is such a requirement in NSPR 4.1.x and earlier because the primordial thread is used to process the I/O completion port. I fixed that in NSPR 4.2. So you should be okay. Of course we need to test it and fix any problems we find. The requirement is not that NSPR must be initialized on the primordial thread, but that the thread that NSPR was initialized on must not terminate before all uses of NSPR are done. We should try to eliminate this requirement.
I tested the 2nd patch and it fails every time nspr is not initialized on the primordial thread. # # An unexpected error has been detected by HotSpot Virtual Machine: # # SIGSEGV (0xb) at pc=0x9af88957, pid=13909, tid=2601515952 # # Java VM: Java HotSpot(TM) Server VM (1.5.0_01-b08 mixed mode) # Problematic frame: # C [libnspr4.so+0x30957] # # An error report file with more information is saved as hs_err_pid13909.log # # If you would like to submit a bug report, please visit: # http://java.sun.com/webapps/bugreport/crash.jsp #
Glen, could you get a stack trace like the one in comment 4?
C [libnspr4.so+0x30957] C [libnspr4.so+0x31162] PR_GetCurrentThread+0x42 C [libnspr4.so+0x1d192] PR_SetError+0x18 C [libnss3.so+0x50952] PORT_SetError+0x1e C [libnss3.so+0x2bafc] PK11_GetBestSlotMultiple+0x7c C [libnss3.so+0x2bc8a] PK11_GetBestSlot+0x22 C [libnss3.so+0x356a8] PK11_CreateDigestContext+0x2c C [libjss3.so+0x112de] Java_org_mozilla_jss_pkcs11_PK11MessageDigest_initDigest+0x5a
Glen, can you use a debug libnspr4.so build so we have more information in the stack trace? I'm beginning to suspect that the crash is caused by the NSPR initialization thread safety problem Julien mentioned in comment 6. I will attach an experimental patch next that will help us determine if this is the case.
This patch uses pthread_once to call the actual NSPR initialization routine (_PR_InitStuff). Glen, please apply this patch to NSPR and see if it fixes the crash. If it does, then undo patch v2 (attachment 184373 [details] [diff] [review]) and try again.
Glen, Isn't JSS supposed to provide a thread-safe initialization on top of NSS/NSPR init , using Java locking semantics ? If so, then the NSPR non thread-safety couldn't be the root cause .
I tried pthread_once to initialize NSPR and I get the same stack trace. I am using the debug bits. JSS will only init NSPR/NSS once. The init method is synchronized so, as Julien states the NSPR thread initialization should not be an issue with JSS. Wan-Teh I will send you the hs_err_pid log in a separate email.
Glen, please give this patch a try.
Attachment #184373 - Attachment is obsolete: true
NSPR, v3 patch works, I ran the test a 100 times with no error. Will you be able to check this into NSPR 4.6, if yes, when do you expect to RTM?
Component: Library → NSPR
Product: JSS → NSPR
Version: 4.0 → 4.6
Assignee: glen.beasley → wtchang
Status: ASSIGNED → NEW
QA Contact: glen.beasley
The only difference from NSPR patch v3 is that in PR_Cleanup, we set the thread-specific data for the PRThread structure to NULL after we destroy the PRThread structure (using _pt_thread_death).
Attachment #184527 - Attachment is obsolete: true
Comment on attachment 184543 [details] [diff] [review] Patch for NSPR, v3.1 I haven't started the QA of NSPR 4.6, so it is possible to put this in NSPR 4.6. Could you and Julien review this patch? Let me explain what this patch does. NSPR (pthreads-based version) maintains a doubly-linked list of all the threads known to NSPR. Each node on this list has a prev pointer and a next pointer. pt_book.first points to the head of this list, and pt_book.last points to the tail of this list. This patch does two things. 1. It allows the "primordial" thread to terminate early, while NSPR is still in use. Here the primordial thread is not the thread that runs the main() program, but the thread that initialized NSPR. This thread's PRThread structure has the PT_THREAD_PRIMORD flag. So in _pt_thread_death, we also remove the PRThread structure for the "primordial" thread from the doubly-linked list. 2, Once we allow #1 to happen, one invariant no longer hold. Before, the doubly-linked list is never empty because the "primordial" thread's PRThread structure is always on the list. Now the list may be empty, which means pt_book.first and pt_book.last may be NULL. So our code needs to handle this new possibility. PR_Cleanup also needs to call _pt_thread_death on the current thread *before* it destroys the pt_book.ml lock, because _pt_thread_death now removes the "primordial" thread from the list, so the lock (pt_book.ml) that protects the list cannot be destroyed before the _pt_thread_death call.
Attachment #184543 - Flags: review?(glen.beasley) → review+
Comment on attachment 184543 [details] [diff] [review] Patch for NSPR, v3.1 I checked in this patch on the NSPR trunk for NSPR 4.6. This patch needs to be carefully reviewed and tested. This type of linked list code often has bugs even though it looks simple.
Marked the bug fixed. NSPR 4.6 is released today.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Target Milestone: --- → 4.6
Version: 4.6 → 4.5.1
Updating the description to reflect the fix.
Summary: new method CryptoManager.loadNSPR() needed → NSPR should allow initialization in a thread other than the primordial one
Comment on attachment 184543 [details] [diff] [review] Patch for NSPR, v3.1 Wan-Teh, The patch v3.1 looks correct. I would add some assertions for increased confidence. In pt_root, when adding the thread to the list, if pt_book.last is NULL, assert that pt_book.first is NULL as well before reassigning pt_book.first. Similarly, in pt_thread_death, you may want to always assert that pt_book.first is non-NULL.
Attachment #184543 - Flags: superreview?(julien.pierre.bugs) → superreview+
You need to log in before you can comment on or make changes to this bug.