Last Comment Bug 214411 - Solaris native thread support: GetThreadPrivate fails if thread not created via NSPR
: Solaris native thread support: GetThreadPrivate fails if thread not created v...
Status: VERIFIED FIXED
:
Product: NSPR
Classification: Components
Component: NSPR (show other bugs)
: 4.3
: Sun Solaris
: -- normal (vote)
: 4.4.1
Assigned To: Wan-Teh Chang
: Wan-Teh Chang
:
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2003-07-30 00:30 PDT by Gerard Roos
Modified: 2003-11-26 16:05 PST (History)
0 users
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
Lazy init of NSPR-specific thread data (726 bytes, patch)
2003-07-30 00:48 PDT, Gerard Roos
no flags Details | Diff | Splinter Review
#define _MD_GET_ATTACHED_THREAD() _pr_attached_thread_tls() (795 bytes, patch)
2003-08-11 06:33 PDT, Gerard Roos
wtc: review+
Details | Diff | Splinter Review
leave _MD_GET_ATTACHED_THREAD undefined on Solaris (700 bytes, patch)
2003-08-11 06:35 PDT, Gerard Roos
wtc: review-
Details | Diff | Splinter Review
register threadid_key destructor and correct lazy init of thread data (1.58 KB, patch)
2003-08-11 06:41 PDT, Gerard Roos
wtc: review-
Details | Diff | Splinter Review
Proposed patch (11.46 KB, patch)
2003-09-10 17:20 PDT, Wan-Teh Chang
no flags Details | Diff | Splinter Review
Supplemental patch (1.33 KB, patch)
2003-10-22 17:40 PDT, Wan-Teh Chang
no flags Details | Diff | Splinter Review

Description Gerard Roos 2003-07-30 00:30:54 PDT
User-Agent:       Mozilla/4.8 [en] (X11; U; SunOS 5.8 sun4u)
Build Identifier: NSPR distribution v4.3

With solaris native thread support and with a thread created 'behind the scenes'
of NSPR:
the thread's NSPR data obtained by _MD_CURRENT_THREAD() (i.e.
_pr_current_thread_tls()) is NULL, producing a SEGV in
nspr-4.3/mozilla/nsprpub/pr/src/threads/prtpd.c:228 (dereferencing NULL pointer
'self').

Reproducible: Always

Steps to Reproduce:
1. Build NSPR (configure --with-native-threads)
2. create a thread yourself without using NSPR
3. in that thread, call PR_GetCurrentThread(), which returns NULL (API doc
(out-dated?) says "Never returns NULL").
4. PR_GetCurrentThread() is also used in PR_GetThreadPrivate() and is assumed to
never return NULL.
Actual Results:  
NULL pointer dereference.

Expected Results:  
Lazy initialization of NSPR-specific thread data on that thread upon access
(like it is done in the Windows (and probably other platforms as well)
implementation if I am not mistaken...)
Comment 1 Gerard Roos 2003-07-30 00:48:29 PDT
Created attachment 128836 [details] [diff] [review]
Lazy init of NSPR-specific thread data

Until now this patch has been working for me.

However, I am not fully certain that _PR_InitThreads() is intended (and thus
legal) to be called at any moment later than NSPR initialization.

Furthermore, I suspect---I do not yet fully understand every line of code,
sorry to provide a patch anyhow---that by calling _PR_InitThreads() after NSPR
initialization, I am doing more work than is actually required to initialize
the current thread's missing NSPR-specific data structures (but the extra
work--if any--does not seem to hurt so far).
Comment 2 Wan-Teh Chang 2003-07-30 16:33:31 PDT
Thanks for the bug report and the patch.  Your patch
is not the right way to solve the problem.  But you
don't need to solve this problem in the first place.
You should build NSPR without any configure option.
NSPR uses the pthread interface by default on Solaris.
The pthread interface is merely another interface to
the native thread engine.

The pthread-based NSPR works with threads not created
via NSPR.  Please let me know if you must use the
Solaris native threads (the thr_ interface) based NSPR.
That implementation is deprecated and I am not planning
to support it.  If you must use that implementation, I
can give you some pointers on the right way to solve the
problem.
Comment 3 Gerard Roos 2003-07-31 00:43:46 PDT
Thanks for your answer. I have been using NSPR with pthread support (default
build) in the first place for building my shared library.  However, 'If a POSIX
threads  program  calls  fork(2),  it  implicitly  calls  fork1(2)' (man
pthreads). This change in forking behavour broke the (existing) application into
which I was loading my library. I thus had to build the library without pthread
dependency. This problem led to the 'solution' proposed above.
I would be grateful could you give me hints for a better (i.e. correct)
solution.
Comment 4 Wan-Teh Chang 2003-07-31 16:27:02 PDT
_pr_current_thread_tls is the right function to modify.
You want to make it look like the _MD_CURRENT_THREAD
function in mozilla/nsprpub/pr/src/md/windows/w95thred.c.

    thr_getspecific(threadid_key, (void **)&ret);

    if (NULL == thread) {
        thread = _PRI_AttachThread(
            PR_USER_THREAD, PR_PRIORITY_NORMAL, NULL, 0);
    }
    PR_ASSERT(thread != NULL);

    return thread;

Then, you need to set up a destructor for the thread
specific data key 'threadid_key', in which you can free
the PRThread structure for the attached threads.  The
destructor needs to be registered using the THR_KEYCREATE
call in mozilla/nsprpub/pr/src/md/unix/solaris.c (we
are passing NULL right now).  The equivalent destructor
for the pthreads-based implementation is _pt_thread_death
in mozilla/nsprpub/pr/src/pthreads/ptthread.c.  You need
to write the equivalent code for solaris.c (and perhaps
some files in mozilla/nsprpub/pr/src/threads/combined/).

Is it possible for you to modify the existing application
so that it works with fork1(2) too?  fork1(2) is the
portable fork semantics for POSIX Threads.
Comment 5 Wan-Teh Chang 2003-07-31 18:37:16 PDT
Regarding the destructor for 'threadid_key':

It is possible that you just need to register
it using the THR_KEYCREATE call in solaris.c,
and it would look similar to the code under
the DLL_DETACH_THREAD case in
mozilla/nsprpub/pr/src/md/windows/w95dllmain.c.
Something like this:

static void
threadid_key_destructor(void *value)
{
    PRThread *me = (PRThread *)value;
    PR_ASSERT((me != NULL) && (me->flags & _PR_ATTACHED));
    _PRI_DetachThread();
}

Don't look at _pt_thread_death in ptthread.c.
Comment 6 Gerard Roos 2003-08-11 06:33:30 PDT
Created attachment 129589 [details] [diff] [review]
#define _MD_GET_ATTACHED_THREAD() _pr_attached_thread_tls()
Comment 7 Gerard Roos 2003-08-11 06:35:06 PDT
Created attachment 129590 [details] [diff] [review]
leave _MD_GET_ATTACHED_THREAD undefined on Solaris

_MD_GET_ATTACHED_THREAD is defined in _solaris.h
Comment 8 Gerard Roos 2003-08-11 06:41:11 PDT
Created attachment 129591 [details] [diff] [review]
register threadid_key destructor and correct lazy init of thread data

Since _MD_GET_ATTACHED_THREAD (_pr_current_thread_tls on Solaris) is called
from within _PRI_Attach_Thread, a second function _pr_attached_thread_tls() is
required to prevent an endless loop. It simply returns the attached thread data
(as did _pr_current_thread_tls() before), whether valid or NULL.
Comment 9 Gerard Roos 2003-08-14 00:58:06 PDT
The last three patches together (one patchfile per file, sorry for not providing
a single patchfile) should solve the problem as requested/expected, i.e. tests
done so far have shown correct behaviour and have completed successfully.
Comment 10 thebeastwitheyesthatstared 2003-09-09 08:30:30 PDT
Wan-Teh Chang, any comments on the patches by Gerard?
Comment 11 Wan-Teh Chang 2003-09-10 10:52:33 PDT
Comment on attachment 129590 [details] [diff] [review]
leave _MD_GET_ATTACHED_THREAD undefined on Solaris

This patch needs a minor tweak:

    !defined(SOLARIS)

should be replaced by

    !(defined(SOLARIS) && defined(_PR_GLOBAL_THREADS_ONLY))

because we can also build NSPR on Solaris with
_PR_LOCAL_THREADS_ONLY defined.
Comment 12 Wan-Teh Chang 2003-09-10 11:13:55 PDT
Comment on attachment 129591 [details] [diff] [review]
register threadid_key destructor and correct lazy init of thread data

This patch has a subtle bug, which I discovered only
after running an NSPR test (foreign.c, modified to
create Solaris native threads) inside a debugger.

It turns out that when Solaris destroys thread
specific data, it first sets the thread specific
data to NULL and then passes the value to the
destructor.

Consider our destructor.

  static void
  threadid_key_destructor(void *value)
  {
      PRThread *me = (PRThread *)value;
      PR_ASSERT((me != NULL) && (me->flags & _PR_ATTACHED));
      _PRI_DetachThread();
  }

When it is invoked, Solaris sets the data for threadid_key
to NULL, and passes the value to it.  But we are not passing
this value to _PRI_DetachThread()!  _PRI_DetachThread is
supposed to get that value with a _PR_MD_CURRENT_THREAD() call,
but _PR_MD_CURRENT_THREAD() will see that the current thread
is NULL and attach the thread again.  So we end up detaching
the newly attached PRThread structure, and the old PRThread
structure is leaked.

The right solution is to modify _PRI_DetachThread() to take
a PRThread * parameter.  But this will require modify all
the callers of _PRI_DetachThread().

A simpler workaround is for our destructor to set the current
thread before calling _PRI_DetachThread():

  static void
  threadid_key_destructor(void *value)
  {
      PRThread *me = (PRThread *)value;
      PR_ASSERT((me != NULL) && (me->flags & _PR_ATTACHED));
+     /*
+      * The Solaris thread library sets the thread specific
+      * data (the current thread) to NULL before invoking
+      * the destructor.  We need to restore it to prevent the
+      * _PR_MD_CURRENT_THREAD() call in _PRI_DetachThread()
+      * from attaching the thread again.
+      */
+     _PR_MD_SET_CURRENT_THREAD(me);
      _PRI_DetachThread();
  }

Gerard, could you add this _PR_MD_SET_CURRENT_THREAD(me) call
to your threadid_key_destructor and redo your tests?
Comment 13 Wan-Teh Chang 2003-09-10 17:20:44 PDT
Created attachment 131201 [details] [diff] [review]
Proposed patch

This patch subsumes Gerard's three patches.  It also includes
modifications to the foreign.c and provider.c tests to create
Solaris native threads.
Comment 14 Wan-Teh Chang 2003-09-10 17:29:38 PDT
Patch checked into the NSPR trunk.
Comment 15 Gerard Roos 2003-10-22 07:07:12 PDT
hi, sorry for my late reply. I have run into a bug in threadid_key_destructor() 
lately.

The assertion ((me != NULL) && (me->flags & _PR_ATTACHED)) as proposed in #5 is 
too restrictive: the PRThreads flag can be _PR_PRIMORDIAL. During implicit 
initialization (_PR_ImplicitInitialization -> _PR_InitStuff -> _PR_InitThreads), 
InitThreads sets the thread's flag to _PR_PRIMORDIAL, but never to _PR_ATTACHED.
So, the corrected destructor function is:

static void
threadid_key_destructor(void *value)
{
    PRThread *me = (PRThread *)value;
    PR_ASSERT(me != NULL);
    /* the thread could be PRIMORDIAL (thus not ATTACHED) */
    if (me->flags & _PR_ATTACHED) {
        /*
         * The Solaris thread library sets the thread specific
         * data (the current thread) to NULL before invoking
         * the destructor.  We need to restore it to prevent the
         * _PR_MD_CURRENT_THREAD() call in _PRI_DetachThread()
         * from attaching the thread again.
         */
        _PR_MD_SET_CURRENT_THREAD(me);
        _PRI_DetachThread();
    }
}

If the thread is not _PR_ATTACHED, nothing is done. This behaviour is similar to 
the one found in e.g. irix.c: irix_detach_sproc().

Comment 16 Wan-Teh Chang 2003-10-22 17:40:55 PDT
Created attachment 133896 [details] [diff] [review]
Supplemental patch

You are right.	Here is your contribution in the form of
a patch.
Comment 17 Wan-Teh Chang 2003-10-22 17:49:31 PDT
Supplemental patch checked into the NSPR trunk (NSPR 4.5)
and NSPR_4_4_BRANCH (NSPR 4.4.1.).

Please mark the bug VERIFIED if the code is now working to
your satisfaction.
Comment 18 Gerard Roos 2003-11-04 05:46:08 PST
all tests passed, works so far.

Note You need to log in before you can comment on or make changes to this bug.