Note: There are a few cases of duplicates in user autocompletion which are being worked on.

Solaris native thread support: GetThreadPrivate fails if thread not created via NSPR

VERIFIED FIXED in 4.4.1

Status

NSPR
NSPR
VERIFIED FIXED
14 years ago
14 years ago

People

(Reporter: Gerard Roos, Assigned: Wan-Teh Chang)

Tracking

4.4.1
Sun
Solaris

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments, 4 obsolete attachments)

(Reporter)

Description

14 years ago
User-Agent:       Mozilla/4.8 [en] (X11; U; SunOS 5.8 sun4u)
Build Identifier: NSPR distribution v4.3

With solaris native thread support and with a thread created 'behind the scenes'
of NSPR:
the thread's NSPR data obtained by _MD_CURRENT_THREAD() (i.e.
_pr_current_thread_tls()) is NULL, producing a SEGV in
nspr-4.3/mozilla/nsprpub/pr/src/threads/prtpd.c:228 (dereferencing NULL pointer
'self').

Reproducible: Always

Steps to Reproduce:
1. Build NSPR (configure --with-native-threads)
2. create a thread yourself without using NSPR
3. in that thread, call PR_GetCurrentThread(), which returns NULL (API doc
(out-dated?) says "Never returns NULL").
4. PR_GetCurrentThread() is also used in PR_GetThreadPrivate() and is assumed to
never return NULL.
Actual Results:  
NULL pointer dereference.

Expected Results:  
Lazy initialization of NSPR-specific thread data on that thread upon access
(like it is done in the Windows (and probably other platforms as well)
implementation if I am not mistaken...)
(Reporter)

Updated

14 years ago
Version: other → 4.3
(Reporter)

Comment 1

14 years ago
Created attachment 128836 [details] [diff] [review]
Lazy init of NSPR-specific thread data

Until now this patch has been working for me.

However, I am not fully certain that _PR_InitThreads() is intended (and thus
legal) to be called at any moment later than NSPR initialization.

Furthermore, I suspect---I do not yet fully understand every line of code,
sorry to provide a patch anyhow---that by calling _PR_InitThreads() after NSPR
initialization, I am doing more work than is actually required to initialize
the current thread's missing NSPR-specific data structures (but the extra
work--if any--does not seem to hurt so far).
(Assignee)

Comment 2

14 years ago
Thanks for the bug report and the patch.  Your patch
is not the right way to solve the problem.  But you
don't need to solve this problem in the first place.
You should build NSPR without any configure option.
NSPR uses the pthread interface by default on Solaris.
The pthread interface is merely another interface to
the native thread engine.

The pthread-based NSPR works with threads not created
via NSPR.  Please let me know if you must use the
Solaris native threads (the thr_ interface) based NSPR.
That implementation is deprecated and I am not planning
to support it.  If you must use that implementation, I
can give you some pointers on the right way to solve the
problem.
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
(Reporter)

Comment 3

14 years ago
Thanks for your answer. I have been using NSPR with pthread support (default
build) in the first place for building my shared library.  However, 'If a POSIX
threads  program  calls  fork(2),  it  implicitly  calls  fork1(2)' (man
pthreads). This change in forking behavour broke the (existing) application into
which I was loading my library. I thus had to build the library without pthread
dependency. This problem led to the 'solution' proposed above.
I would be grateful could you give me hints for a better (i.e. correct)
solution.
(Assignee)

Comment 4

14 years ago
_pr_current_thread_tls is the right function to modify.
You want to make it look like the _MD_CURRENT_THREAD
function in mozilla/nsprpub/pr/src/md/windows/w95thred.c.

    thr_getspecific(threadid_key, (void **)&ret);

    if (NULL == thread) {
        thread = _PRI_AttachThread(
            PR_USER_THREAD, PR_PRIORITY_NORMAL, NULL, 0);
    }
    PR_ASSERT(thread != NULL);

    return thread;

Then, you need to set up a destructor for the thread
specific data key 'threadid_key', in which you can free
the PRThread structure for the attached threads.  The
destructor needs to be registered using the THR_KEYCREATE
call in mozilla/nsprpub/pr/src/md/unix/solaris.c (we
are passing NULL right now).  The equivalent destructor
for the pthreads-based implementation is _pt_thread_death
in mozilla/nsprpub/pr/src/pthreads/ptthread.c.  You need
to write the equivalent code for solaris.c (and perhaps
some files in mozilla/nsprpub/pr/src/threads/combined/).

Is it possible for you to modify the existing application
so that it works with fork1(2) too?  fork1(2) is the
portable fork semantics for POSIX Threads.
(Assignee)

Comment 5

14 years ago
Regarding the destructor for 'threadid_key':

It is possible that you just need to register
it using the THR_KEYCREATE call in solaris.c,
and it would look similar to the code under
the DLL_DETACH_THREAD case in
mozilla/nsprpub/pr/src/md/windows/w95dllmain.c.
Something like this:

static void
threadid_key_destructor(void *value)
{
    PRThread *me = (PRThread *)value;
    PR_ASSERT((me != NULL) && (me->flags & _PR_ATTACHED));
    _PRI_DetachThread();
}

Don't look at _pt_thread_death in ptthread.c.
(Reporter)

Comment 6

14 years ago
Created attachment 129589 [details] [diff] [review]
#define _MD_GET_ATTACHED_THREAD() _pr_attached_thread_tls()
Attachment #128836 - Attachment is obsolete: true
(Reporter)

Comment 7

14 years ago
Created attachment 129590 [details] [diff] [review]
leave _MD_GET_ATTACHED_THREAD undefined on Solaris

_MD_GET_ATTACHED_THREAD is defined in _solaris.h
(Reporter)

Comment 8

14 years ago
Created attachment 129591 [details] [diff] [review]
register threadid_key destructor and correct lazy init of thread data

Since _MD_GET_ATTACHED_THREAD (_pr_current_thread_tls on Solaris) is called
from within _PRI_Attach_Thread, a second function _pr_attached_thread_tls() is
required to prevent an endless loop. It simply returns the attached thread data
(as did _pr_current_thread_tls() before), whether valid or NULL.
(Reporter)

Comment 9

14 years ago
The last three patches together (one patchfile per file, sorry for not providing
a single patchfile) should solve the problem as requested/expected, i.e. tests
done so far have shown correct behaviour and have completed successfully.
Wan-Teh Chang, any comments on the patches by Gerard?
(Assignee)

Updated

14 years ago
Attachment #129589 - Flags: review+
(Assignee)

Comment 11

14 years ago
Comment on attachment 129590 [details] [diff] [review]
leave _MD_GET_ATTACHED_THREAD undefined on Solaris

This patch needs a minor tweak:

    !defined(SOLARIS)

should be replaced by

    !(defined(SOLARIS) && defined(_PR_GLOBAL_THREADS_ONLY))

because we can also build NSPR on Solaris with
_PR_LOCAL_THREADS_ONLY defined.
Attachment #129590 - Flags: review-
(Assignee)

Comment 12

14 years ago
Comment on attachment 129591 [details] [diff] [review]
register threadid_key destructor and correct lazy init of thread data

This patch has a subtle bug, which I discovered only
after running an NSPR test (foreign.c, modified to
create Solaris native threads) inside a debugger.

It turns out that when Solaris destroys thread
specific data, it first sets the thread specific
data to NULL and then passes the value to the
destructor.

Consider our destructor.

  static void
  threadid_key_destructor(void *value)
  {
      PRThread *me = (PRThread *)value;
      PR_ASSERT((me != NULL) && (me->flags & _PR_ATTACHED));
      _PRI_DetachThread();
  }

When it is invoked, Solaris sets the data for threadid_key
to NULL, and passes the value to it.  But we are not passing
this value to _PRI_DetachThread()!  _PRI_DetachThread is
supposed to get that value with a _PR_MD_CURRENT_THREAD() call,
but _PR_MD_CURRENT_THREAD() will see that the current thread
is NULL and attach the thread again.  So we end up detaching
the newly attached PRThread structure, and the old PRThread
structure is leaked.

The right solution is to modify _PRI_DetachThread() to take
a PRThread * parameter.  But this will require modify all
the callers of _PRI_DetachThread().

A simpler workaround is for our destructor to set the current
thread before calling _PRI_DetachThread():

  static void
  threadid_key_destructor(void *value)
  {
      PRThread *me = (PRThread *)value;
      PR_ASSERT((me != NULL) && (me->flags & _PR_ATTACHED));
+     /*
+      * The Solaris thread library sets the thread specific
+      * data (the current thread) to NULL before invoking
+      * the destructor.  We need to restore it to prevent the
+      * _PR_MD_CURRENT_THREAD() call in _PRI_DetachThread()
+      * from attaching the thread again.
+      */
+     _PR_MD_SET_CURRENT_THREAD(me);
      _PRI_DetachThread();
  }

Gerard, could you add this _PR_MD_SET_CURRENT_THREAD(me) call
to your threadid_key_destructor and redo your tests?
Attachment #129591 - Flags: review-
(Assignee)

Comment 13

14 years ago
Created attachment 131201 [details] [diff] [review]
Proposed patch

This patch subsumes Gerard's three patches.  It also includes
modifications to the foreign.c and provider.c tests to create
Solaris native threads.
(Assignee)

Updated

14 years ago
Attachment #129589 - Attachment is obsolete: true
Attachment #129590 - Attachment is obsolete: true
Attachment #129591 - Attachment is obsolete: true
(Assignee)

Comment 14

14 years ago
Patch checked into the NSPR trunk.
Status: ASSIGNED → RESOLVED
Last Resolved: 14 years ago
Resolution: --- → FIXED
Whiteboard: [4.5]
Target Milestone: --- → 4.4.1
(Reporter)

Comment 15

14 years ago
hi, sorry for my late reply. I have run into a bug in threadid_key_destructor() 
lately.

The assertion ((me != NULL) && (me->flags & _PR_ATTACHED)) as proposed in #5 is 
too restrictive: the PRThreads flag can be _PR_PRIMORDIAL. During implicit 
initialization (_PR_ImplicitInitialization -> _PR_InitStuff -> _PR_InitThreads), 
InitThreads sets the thread's flag to _PR_PRIMORDIAL, but never to _PR_ATTACHED.
So, the corrected destructor function is:

static void
threadid_key_destructor(void *value)
{
    PRThread *me = (PRThread *)value;
    PR_ASSERT(me != NULL);
    /* the thread could be PRIMORDIAL (thus not ATTACHED) */
    if (me->flags & _PR_ATTACHED) {
        /*
         * The Solaris thread library sets the thread specific
         * data (the current thread) to NULL before invoking
         * the destructor.  We need to restore it to prevent the
         * _PR_MD_CURRENT_THREAD() call in _PRI_DetachThread()
         * from attaching the thread again.
         */
        _PR_MD_SET_CURRENT_THREAD(me);
        _PRI_DetachThread();
    }
}

If the thread is not _PR_ATTACHED, nothing is done. This behaviour is similar to 
the one found in e.g. irix.c: irix_detach_sproc().

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Assignee)

Comment 16

14 years ago
Created attachment 133896 [details] [diff] [review]
Supplemental patch

You are right.	Here is your contribution in the form of
a patch.
(Assignee)

Comment 17

14 years ago
Supplemental patch checked into the NSPR trunk (NSPR 4.5)
and NSPR_4_4_BRANCH (NSPR 4.4.1.).

Please mark the bug VERIFIED if the code is now working to
your satisfaction.
Status: REOPENED → RESOLVED
Last Resolved: 14 years ago14 years ago
Resolution: --- → FIXED
(Reporter)

Comment 18

14 years ago
all tests passed, works so far.
Status: RESOLVED → VERIFIED
(Assignee)

Updated

14 years ago
Whiteboard: [4.5]
You need to log in before you can comment on or make changes to this bug.