Last Comment Bug 379550 - Proposing threading safety patch for stable branch(es)
: Proposing threading safety patch for stable branch(es)
: crash, fixed1.8.0.13, fixed1.8.1.5
Product: Core
Classification: Components
Component: XPCOM (show other bugs)
: 1.8 Branch
: All All
: -- normal (vote)
: ---
Assigned To: Kai Engert (:kaie)
: Nathan Froyd [:froydnj]
Depends on: 362768
Blocks: 383977
  Show dependency treegraph
Reported: 2007-05-02 19:02 PDT by Kai Engert (:kaie)
Modified: 2007-07-09 10:56 PDT (History)
5 users (show)
dveditz: blocking1.8.1.5+
dveditz: blocking1.8.0.13+
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---

Patch v1 (1.16 KB, patch)
2007-05-02 19:04 PDT, Kai Engert (:kaie)
wtc: review-
Details | Diff | Splinter Review
Patch v2 (1.30 KB, patch)
2007-05-07 12:12 PDT, Kai Engert (:kaie)
wtc: review+
darin.moz: superreview+
dveditz: approval1.8.1.5+
dveditz: approval1.8.0.13+
Details | Diff | Splinter Review

Description Kai Engert (:kaie) 2007-05-02 19:02:06 PDT
From mozilla/xpcom/threads/nsThread.cpp
function nsThread::Shutdown()

  // XXX nspr doesn't seem to be calling the main thread's destructor
  // callback, so let's help it out:

In bug 362768 we added code to the most recent versions of NSPR that makes the above workaround unnecessary.

In fact, with the code from bug 362768 in place, the above workaround will cause a crash, because a cleanup function gets called twice.

As NSPR is considered a system library on some Linux distributions, which may incorporate the above bugfix, it is proposed to add this patch to the stable Mozilla branch(es).

This is also a good preparation, should Mozilla on the stable branch ever upgrade to an NSPR version that contains the patch from bug 362768.

The patch we are proposing is safe, see comment in attached patch.
Comment 1 Kai Engert (:kaie) 2007-05-02 19:04:36 PDT
Created attachment 263543 [details] [diff] [review]
Patch v1
Comment 2 Wan-Teh Chang 2007-05-02 20:20:40 PDT
Comment on attachment 263543 [details] [diff] [review]
Patch v1

Kai, you forgot to remove the nsThread::Exit call.
Comment 3 Kai Engert (:kaie) 2007-05-02 21:17:11 PDT
Wan-Teh, I'm confused.

I'm proposing to land a patch that will work with both old and newer NSPR.
Comment 4 Wan-Teh Chang 2007-05-03 07:13:12 PDT
Unlike pthreads' pthread_setspecific, NSPR's PR_SetThreadPrivate
calls the destructor on the old thread private data before setting
the new thread private data.  In this case, the old thread private
data is gMainThread and the destructor is nsThread::Exit, so

+        PR_SetThreadPrivate(kIThreadSelfIndex, NULL);

is equivalent to

+        nsThread::Exit(NS_STATIC_CAST(nsThread*, gMainThread));
+        set thread private data at kIThreadSelfIndex to NULL;

Therefore you must remove the original nsThread::Exit call, otherwise
you'd be calling it twice, which would be a double-free error.

So the net effect of my proposed change is that we set the thread
private data at kIThreadSelfIndex to NULL after we destroy the old
thread private data (gMainThread).
- The old NSPR doesn't destroy thread private data of the main thread,
  so this change won't affect it.
- With the new NSPR, the NULL thread private data will prevent NSPR
  from calling the destructor.
So my proposed change will work with both the old and new NSPR.
Comment 5 Wan-Teh Chang 2007-05-03 09:26:00 PDT
Comment on attachment 263543 [details] [diff] [review]
Patch v1

>+        // but on older versions of NSPR it will not get called.

Add "unless we call PR_Cleanup" at the end of this sentence.

>+        // - set the callback pointer to NULL to ensure it will not get
>+        //   called by NSPR
>+        // - call the function ourselves, in case we're using an older
>+        //   NSPR library

Change this to:

          // - call the function ourselves
          // - set the data pointer to NULL to ensure the function will
          //   not get called again by NSPR
          // The PR_SetThreadPrivate call does both of these.
Comment 6 Kai Engert (:kaie) 2007-05-07 12:12:26 PDT
Created attachment 264031 [details] [diff] [review]
Patch v2

Wan-Teh, thanks a lot for clarifying.
Here is a new patch.
Comment 7 Christopher Aillon (sabbatical, not receiving bugmail) 2007-05-07 13:50:37 PDT
Nominating as blocking.  This is causing a ton of crashes in 64bit geckos due to a change in NSS. has something like 20-30 dupes and it's been reported a few times in RH bugzilla as well.
Comment 8 Wan-Teh Chang 2007-05-07 13:58:13 PDT
Comment on attachment 264031 [details] [diff] [review]
Patch v2

r=wtc.  The comments below document my understanding of the code
in question.

This is a safe change.  It does assume that nsThread::Shutdown
is called by the main thread.

What this patch does is to remove a dangling pointer.  If we
call the destructor (nsThread::Exit) on gMainThread, we should
set the thread private data to NULL, otherwise the thread private
data will become a dangling pointer.

The nsThread object pointed to by gMainThread has two references:
- one reference is gMainThread itself,
- the other reference is the thread private data.

The thread private data destructor releases one reference.
nsThread::Shutdown releases the second reference (gMainThread).
Comment 9 Kai Engert (:kaie) 2007-05-07 15:09:08 PDT
Comment on attachment 264031 [details] [diff] [review]
Patch v2

Asking for approval to land this on stable branches.
Comment 10 Daniel Veditz [:dveditz] 2007-05-08 10:57:02 PDT
If I understand you correctly, Mozilla-released binaries don't need this fix, but distros who build with system NSPR might?

Changing threading code well into the release-candidate process (no more time for full testing, just quick breadth and spot checks) is a scary proposition to us. Can the distros just take this patch themselves for this release and we'll get it early next time?
Comment 11 Daniel Veditz [:dveditz] 2007-05-08 11:09:03 PDT
Comment on attachment 264031 [details] [diff] [review]
Patch v2

Threading changes scare us, want a second look from someone who knows the xpcom code side of things.
Comment 12 Wan-Teh Chang 2007-05-08 11:27:06 PDT
Dan, I agree this patch should wait until the next release.
The super reviewer should verify the patch's assumption that
nsThread::Shutdown is always called by the main thread.
Comment 13 Darin Fisher 2007-05-08 23:01:36 PDT
Comment on attachment 264031 [details] [diff] [review]
Patch v2

nsThread::Shutdown is called from NS_ShutdownXPCOM, which is only called on the main thread.  So, looks good to me!
Comment 14 Daniel Veditz [:dveditz] 2007-05-10 08:23:10 PDT
Not respinning for this, distros using a newer NSPR will have to manually include this patch until next time.
Comment 15 Daniel Veditz [:dveditz] 2007-06-22 10:45:54 PDT
Comment on attachment 264031 [details] [diff] [review]
Patch v2

approved for and, a=dveditz for release-drivers
Comment 16 Kai Engert (:kaie) 2007-07-09 10:56:04 PDT
checked in to both branches

Note You need to log in before you can comment on or make changes to this bug.