Crash on exit caused by thread local storage cleanup [@ PL_DHashTableOperate]

RESOLVED FIXED in 4.7

Status

defect
RESOLVED FIXED
12 years ago
8 years ago

People

(Reporter: kaie, Assigned: wtc)

Tracking

({crash})

Dependency tree / graph
Bug Flags:
wanted1.8.1.x -

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [sg:critical?] double free?, crash signature)

Attachments

(1 attachment, 1 obsolete attachment)

We recently changed NSPR to clean up its thread local storage on exit (bug 362768). This patch has not yet been picked up by Mozilla, but it will be soon.

We also fixed a bug in Mozilla XPCOM, in order to avoid a double free (bug 379550). This patch has not yet been picked up by Mozilla, but it will be soon.


We got reports that Mozilla software that uses both patches will crash on exit.

(Please see https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=242370 and potentially https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=238770 )


The call stack given in Red Hat bug 242370 suggests there might be a double free bug in XPConnect. See https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=242370#c11
Any relationship to bug 335018?  I'd guess not; it seems like this stuff isn't something that's allowed to run this late in shutdown, so we need to make sure it's cleaned up earlier so that the TLS destructors can't run later.
Summary: Crash on exit caused by thread local storage cleanup → Crash on exit caused by thread local storage cleanup [@ PL_DHashTableOperate]
I'm now convinced that it is unsafe for NSPR's shared library
finalization function to call back to the thread private data
destructors.  First, the libraries in which the destructors are
defined in may have been shut down or unloaded.  Second, this
breaks backward compatibility, because NSPR did not call the
TPD destructors when it is unloaded before.

This patch adds a new version of _pt_thread_death, called
_pt_thread_death_no_tpd_destructors, that is the same as
_pt_thread_death except that it doesn't call the TPD
destructors.  Note that this patch changes _pt_thread_death
to call _PR_DestroyThreadPrivate first, as opposed to after
removing the thread from the linked list in 'pt_book'.  I
can't think of a reason why the ordering would matter.

Kai, please review and test this patch.  Please also go
through the original test for which we added _PR_Fini in
the first place.  Thanks!
Attachment #268128 - Flags: review?(kengert)
Assignee: nobody → wtc
Component: XPConnect → NSPR
Product: Core → NSPR
QA Contact: xpconnect → nspr
Target Milestone: --- → 4.7
Version: 1.8 Branch → 4.7
If you can suggest a better name than _pt_thread_death_no_tpd_destructors,
I'd appreciate it.
Status: NEW → ASSIGNED
(In reply to comment #2)
> This patch adds a new version of _pt_thread_death, called
> _pt_thread_death_no_tpd_destructors, that is the same as
> _pt_thread_death except that it doesn't call the TPD
> destructors.  Note that this patch changes _pt_thread_death
> to call _PR_DestroyThreadPrivate first, as opposed to after
> removing the thread from the linked list in 'pt_book'.  I
> can't think of a reason why the ordering would matter.

This seems to be a very critical piece of code, and I propose we try our best to minimize risk. I propose we avoid the risk of reordering the cleanup steps.

I'm attaching a slightly different patch that will keep the order of things.
Posted patch Patch v1Splinter Review
Wan-Teh, would you be ok with this slightly changed version of your patch?

I convert the existing implementation into a function that takes a parameter, whether or not to call TPD.

The existing function _pt_thread_death will pass PR_TRUE and therefore continue to call the TPD, while keeping the current order of events.

The _PR_Fini function will pass PR_FALSE and the TPD will not get called.


I will now continue with your other comments.
Wan-Teh, I talked to Ray Strode.
He is still able to reproduce the original bug.
He tested attachment 270044 [details] [diff] [review] and and it works for him.
Comment on attachment 268128 [details] [diff] [review]
Add a version of _pt_thread_death that doesn't call TPD destructors

So, I'm fine with this patch in general, but I would like to avoid the risk of re-ordering calls. This is the only reason I'm giving this patch a r-

I would like to request that we keep the order of calls, and I went ahead and implemented that in the other patch (which was tested to work).
Attachment #268128 - Flags: review?(kengert) → review-
Comment on attachment 270044 [details] [diff] [review]
Patch v1

Wan-Teh, can you please review?
Attachment #270044 - Flags: review?(wtc)
Comment on attachment 270044 [details] [diff] [review]
Patch v1

r=wtc.
Attachment #270044 - Flags: review?(wtc) → review+
Checked in to nspr head:
Checking in pr/src/pthreads/ptthread.c;
/cvsroot/mozilla/nsprpub/pr/src/pthreads/ptthread.c,v  <--  ptthread.c
new revision: 3.75; previous revision: 3.74
done

Checked in to NSPRPUB_PRE_4_2_CLIENT_BRANCH:
Checking in pr/src/pthreads/ptthread.c;
/cvsroot/mozilla/nsprpub/pr/src/pthreads/ptthread.c,v  <--  ptthread.c
new revision: 3.42.2.21; previous revision: 3.42.2.20
done

Marking fixed
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
This patch will need to land on the 1.8 stable branches as well, right?
Flags: wanted1.8.1.x?
Flags: wanted1.8.0.x?
Whiteboard: [sg:critical?] double free?
Dan, no, you don't need to worry about this bug for the Mozilla
1.8 stable branches because the patch we decided to use in the
end is an NSPR patch.
Dan, this bug does not apply to 1.8.1.x or 1.8.0.x, because they still use NSPR 4.6.

The original (bad) patch from bug 362768 was only added to NSPR 4.7.
This newer fix is only required for NSPR 4.7, only.

Only distributions who explicitly picked up the earlier patch require the new patch.

Only Mozilla 1.9 uses NSPR 4.7 and has already picked up this patch.
Will we ever require the 1.8.1 branch releases to upgrade to a newer NSS (thus dragging in a newer NSPR) like we did with FF1.5 going from NSS 3.10.x to 3.11.x?
If we ever apply NSPR 4.7 to an older branch, we will use a NSPR release. There is no NSPR 4.7 release yet.

Probably at the time when a NSS 3.12 release is made it will be accompanied by a NSPR 4.7 release.

The first NSPR 4.7 release will therefore contain the fix from this bug, and we will be fine.
Attachment #268128 - Attachment is obsolete: true
Group: security
NSPR 4.7 has been released and includes this fix.
Flags: wanted1.8.1.x?
Flags: wanted1.8.1.x-
Flags: wanted1.8.0.x?
Crash Signature: [@ PL_DHashTableOperate]
You need to log in before you can comment on or make changes to this bug.