Closed Bug 294955 Opened 19 years ago Closed 19 years ago

NSPR should allow initialization in a thread other than the primordial one

Categories

(NSPR :: NSPR, defect)

4.5.1
Sun
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: glenbeasley, Assigned: wtc)

Details

Attachments

(2 files, 4 obsolete files)

User-Agent:       Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.7.5) Gecko/20041109 Firefox/1.0
Build Identifier: Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.7.5) Gecko/20041109 Firefox/1.0

NSPR by design needs to be initialized on the primordial thread. If is not 
initialized on the primoridial thread unpredictable crashes can occur.  
Currently Java applications need to initialize JSS (which inits NSPR/NSS) in the
main method of the java app. 

A new method, loadNSPR(), is required that will allow NSPR (NSS/JSS libraries)
to be loaded in the main primordial thread of a java application, and then the
java application can choose to initialize JSS/NSS if needed in a separate thread.

Reproducible: Sometimes

Steps to Reproduce:
1.load JSS app that does not init JSS in the main method
2.run app several times, at some point JVM will crash. 
3.

Actual Results:  
A DESCRIPTION OF THE PROBLEM :
#
# An unexpected error has been detected by HotSpot Virtual Machine:
#
#  SIGSEGV (0xb) at pc=0xb74f8311, pid=10216, tid=2630290352
#
# Java VM: Java HotSpot(TM) Server VM (1.5.0_01-b08 mixed mode)
# Problematic frame:
# C  [libc.so.6+0x72311]
#
# An error report file with more information is saved as hs_err_pid10216.log
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
#

Expected Results:  
loadNSPR() will allow apps that want to use JSS, to not have to init JSS in 
the main primordial thread.
Assignee: wtchang → glen.beasley
Status: NEW → ASSIGNED
Attachment #184135 - Flags: superreview?(wtchang)
Attachment #184135 - Flags: review?(saul.edwards.bugs)
Attachment #184135 - Flags: review?(saul.edwards.bugs) → review+
Do you know exactly what caused the crash?  It is
better to eliminate this NSPR requirement if possible.
I'm also wondering why the apps don't want to initialize
JSS on the primordial thread but can initialize NSPR on
the primordial thread.  Could you give a scenario where
this is necessary?
The case is a product wishes to only init JSS, if and when they have 
need to later on in there multi-thread based app. 
We've shown that if they init JSS in there main, that this problem
would not occur, but they would rather just init NSPR, and then init 
JSS. 

here is a stack trace

#0  0xb75c88c1 in __waitpid_nocancel () from /lib/tls/libpthread.so.0
#1  0xb6ebb76a in VMError::fork_and_exec () from
/usr/jdk1.5.0_04/jre/lib/i386/server/libjvm.so
#2  0xb6ebb8b5 in VMError::show_message_box () from
/usr/jdk1.5.0_04/jre/lib/i386/server/libjvm.so
#3  0xb6ebb51a in VMError::report_and_die () from
/usr/jdk1.5.0_04/jre/lib/i386/server/libjvm.so
#4  0xb6df5bda in JVM_handle_linux_signal () from
/usr/jdk1.5.0_04/jre/lib/i386/server/libjvm.so
#5  0xb6df3164 in signalHandler () from
/usr/jdk1.5.0_04/jre/lib/i386/server/libjvm.so
#6  <signal handler called>
#7  0x86c38d63 in pt_AttachThread () from /opt/sun/private/lib/libnspr4.so
#8  0x86c3926c in PR_GetCurrentThread () from /opt/sun/private/lib/libnspr4.so
#9  0x86c2bd68 in PR_SetError () from /opt/sun/private/lib/libnspr4.so
#10 0x86dc3952 in PORT_SetError () from /opt/sun/private/lib/libnss3.so
#11 0x86d9eafc in PK11_GetBestSlotMultiple () from /opt/sun/private/lib/libnss3.so
#12 0x86d9ec8a in PK11_GetBestSlot () from /opt/sun/private/lib/libnss3.so
#13 0x86da86a8 in PK11_CreateDigestContext () from /opt/sun/private/lib/libnss3.so
#14 0x86def6af in Java_org_mozilla_jss_pkcs11_PK11MessageDigest_initDigest ()
Attached patch Patch for NSPR (obsolete) — Splinter Review
Glen, could you try this NSPR patch and see if it
allows you to initialize NSPR on any thread?
Wan-Teh,

The choice of thread for the NSPR initialization wasn't the only reason we
wanted to expose PR_Init through JNI . The applications also couldn't ensure
there would be only a single thread doing the initialization . Allowing NSPR
init in any thread wouldn't help them unless NSPR is also made thread-safe,
which it is currently not. I'm not sure if it could even be done on all
platforms, but it would certainly have a negative performance impact on every
NSPR call .
NSS initialization isn't thread-safe, either.  Why don't
you also need to initialize NSS in the main method of the
Java app?
Good point. Perhaps the issues are unrelated, but I know they came up together
in the discussion with the application team. I'll let Glen handle it and test
your patch.
not that it's likely to work well, but should you expose PR_Cleanup? :)
Attached patch Patch for NSPR, v2 (obsolete) — Splinter Review
Glen, please try this patch.  Sorry I didn't test the
previous patch.
Attachment #184149 - Attachment is obsolete: true
Wan-Teh,

What about the platforms that don't use pthreads, such as Windows ? Do they have
any dependency on NSPR being initialized in the primordial thread ?
For the "WINNT" configuration of NSPR, this is such a
requirement in NSPR 4.1.x and earlier because the
primordial thread is used to process the I/O completion
port.  I fixed that in NSPR 4.2.  So you should be okay.
Of course we need to test it and fix any problems we
find.

The requirement is not that NSPR must be initialized on
the primordial thread, but that the thread that NSPR was
initialized on must not terminate before all uses of NSPR
are done.  We should try to eliminate this requirement.
I tested the 2nd patch and it fails every time nspr is not initialized on the
primordial thread. 
#
# An unexpected error has been detected by HotSpot Virtual Machine:
#
#  SIGSEGV (0xb) at pc=0x9af88957, pid=13909, tid=2601515952
#
# Java VM: Java HotSpot(TM) Server VM (1.5.0_01-b08 mixed mode)
# Problematic frame:
# C  [libnspr4.so+0x30957]
#
# An error report file with more information is saved as hs_err_pid13909.log
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
#
Glen, could you get a stack trace like the one in comment 4?
C  [libnspr4.so+0x30957]
C  [libnspr4.so+0x31162]  PR_GetCurrentThread+0x42
C  [libnspr4.so+0x1d192]  PR_SetError+0x18
C  [libnss3.so+0x50952]  PORT_SetError+0x1e
C  [libnss3.so+0x2bafc]  PK11_GetBestSlotMultiple+0x7c
C  [libnss3.so+0x2bc8a]  PK11_GetBestSlot+0x22
C  [libnss3.so+0x356a8]  PK11_CreateDigestContext+0x2c
C  [libjss3.so+0x112de] 
Java_org_mozilla_jss_pkcs11_PK11MessageDigest_initDigest+0x5a
Glen, can you use a debug libnspr4.so build so we have
more information in the stack trace?

I'm beginning to suspect that the crash is caused by
the NSPR initialization thread safety problem Julien
mentioned in comment 6.  I will attach an experimental
patch next that will help us determine if this is the
case.
This patch uses pthread_once to call the actual NSPR
initialization routine (_PR_InitStuff).  Glen, please
apply this patch to NSPR and see if it fixes the crash.
If it does, then undo patch v2 (attachment 184373 [details] [diff] [review]) and
try again.
Glen,

Isn't JSS supposed to provide a thread-safe initialization on top of NSS/NSPR
init , using Java locking semantics ? If so, then the NSPR non thread-safety
couldn't be the root cause .
I tried pthread_once to initialize NSPR and I get the same 
stack trace. I am using the debug bits. 

JSS will only init NSPR/NSS once. The init method is synchronized
so, as Julien states the NSPR thread initialization should not 
be an issue with JSS. Wan-Teh I will send you the hs_err_pid log
in a separate email. 
Attached patch Patch for NSPR, v3 (obsolete) — Splinter Review
Glen, please give this patch a try.
Attachment #184373 - Attachment is obsolete: true
NSPR, v3 patch works, I ran the test a 100 times with no error. 
Will you be able to check this into NSPR 4.6, if yes, when do 
you expect to RTM?
Component: Library → NSPR
Product: JSS → NSPR
Version: 4.0 → 4.6
Assignee: glen.beasley → wtchang
Status: ASSIGNED → NEW
QA Contact: glen.beasley
The only difference from NSPR patch v3 is that in PR_Cleanup,
we set the thread-specific data for the PRThread structure
to NULL after we destroy the PRThread structure (using
_pt_thread_death).
Attachment #184527 - Attachment is obsolete: true
Attachment #184458 - Attachment is obsolete: true
Comment on attachment 184543 [details] [diff] [review]
Patch for NSPR, v3.1

I haven't started the QA of NSPR 4.6, so it is possible
to put this in NSPR 4.6.

Could you and Julien review this patch?  Let me explain
what this patch does.

NSPR (pthreads-based version) maintains a doubly-linked
list of all the threads known to NSPR.	Each node on this
list has a prev pointer and a next pointer.  pt_book.first
points to the head of this list, and pt_book.last points
to the tail of this list.

This patch does two things.

1. It allows the "primordial" thread to terminate early,
while NSPR is still in use.  Here the primordial thread
is not the thread that runs the main() program, but the
thread that initialized NSPR.  This thread's PRThread
structure has the PT_THREAD_PRIMORD flag.

So in _pt_thread_death, we also remove the PRThread structure
for the "primordial" thread from the doubly-linked list.

2, Once we allow #1 to happen, one invariant no longer
hold.  Before, the doubly-linked list is never empty
because the "primordial" thread's PRThread structure is
always on the list.  Now the list may be empty, which means
pt_book.first and pt_book.last may be NULL.  So our code
needs to handle this new possibility.

PR_Cleanup also needs to call _pt_thread_death on the
current thread *before* it destroys the pt_book.ml lock,
because _pt_thread_death now removes the "primordial"
thread from the list, so the lock (pt_book.ml) that
protects the list cannot be destroyed before the
_pt_thread_death call.
Attachment #184543 - Flags: superreview?(julien.pierre.bugs)
Attachment #184543 - Flags: review?(glen.beasley)
Attachment #184543 - Flags: review?(glen.beasley) → review+
Comment on attachment 184543 [details] [diff] [review]
Patch for NSPR, v3.1

I checked in this patch on the NSPR trunk for
NSPR 4.6.

This patch needs to be carefully reviewed and tested.
This type of linked list code often has bugs even
though it looks simple.
Marked the bug fixed.  NSPR 4.6 is released today.
Status: NEW → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
Target Milestone: --- → 4.6
Version: 4.6 → 4.5.1
Updating the description to reflect the fix.
Summary: new method CryptoManager.loadNSPR() needed → NSPR should allow initialization in a thread other than the primordial one
Comment on attachment 184543 [details] [diff] [review]
Patch for NSPR, v3.1

Wan-Teh,

The patch v3.1 looks correct. I would add some assertions for increased
confidence.

In pt_root, when adding the thread to the list, if pt_book.last is NULL, assert
that pt_book.first is NULL as well before reassigning pt_book.first.

Similarly, in pt_thread_death, you may want to always assert that pt_book.first
is non-NULL.
Attachment #184543 - Flags: superreview?(julien.pierre.bugs) → superreview+
Attachment #184135 - Flags: superreview?(wtchang)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: