Closed Bug 557586 Opened 14 years ago Closed 14 years ago

XPCPerThreadData::~XPCPerThreadData race leading to possible crash

Categories

(Core :: XPConnect, defect)

x86
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla1.9.3a5

People

(Reporter: jseward, Assigned: jseward)

Details

Attachments

(1 file)

There's a race in XPCPerThreadData::~XPCPerThreadData that can cause
XPCPerThreadData::gLock to be PR_Destroy'd and nulled out whilst a
second thread is holding it.  In the worst case this could cause an
essentially unreproducible segfault at shutdown.  All assuming the
following analysis is correct, of course.

What follows is a bit of background and an interleaving of execution
which I believe can lead to the failure.

All line numbers pertain to js/src/xpconnect/src/xpcthreadcontext.cpp
(m-c as of today).


BACKGROUND

This code appears to maintain a linked list of XPCPerThreadDatas.
Each such element has a field mNextThread which points to the
next element.  The list is rooted by a global variable

325  XPCPerThreadData* XPCPerThreadData::gThreads        = nsnull;

which is protected by this lock

324  PRLock*           XPCPerThreadData::gLock           = nsnull;

XPCPerThreadData::~XPCPerThreadData() is intended to remove itself
from the linked list |gThreads|.  Then, if the list is empty, |gLock|
is destroyed and nulled out.

However, these actions are not carried out atomically, and it can be
that a second thread T2 entering ~XPCPerThreadData() just after a
previous thread T1 entered it, can cause T1 to destroy and null out
|gLock| while T2 still holds it.


FAILURE INTERLEAVING

Imagine there are two threads, T1 and T2, there are two entries in
|gThreads|, and |gLock| is non-NULL.  Here's the relevant code:

394    // Unlink 'this' from the list of threads.
395    if(gLock)
396    {
397        nsAutoLock lock(gLock);
398        if(gThreads == this)
399            gThreads = mNextThread;
400        else
401        {
402            XPCPerThreadData* cur = gThreads;
403            while(cur)
404            {
405                if(cur->mNextThread == this)
406                {
407                    cur->mNextThread = mNextThread;
408                    break;
409                }
410                cur = cur->mNextThread;
411            }
412        }
413    }
414
415    if(gLock && !gThreads)
416    {
417        PR_DestroyLock(gLock);
418        gLock = nsnull;
419    }


T1: 395 (taken)
    397 (T1 acquires lock)
    398 .. 412 (T1 removes itself from list)
    At this point |gThreads| is non-NULL (still
    one entry in it), and we still hold the lock.
    413 (T1 drops lock at end of scope)
    414 stall

                 T2: 395 (taken)
                     397 (T2 acquires lock)
                     398 .. 412 (T2 removes itself from list)
                     |gThreads| is now null
                     412 stall, still holding the lock

T1: 415 (taken - since |gLock| is non-NULL and gThreads is NULL)
    416 .. 419 destroy |gLock| and null it

                 T2: 413 (drop lock at end of scope).
                         Unfortunately it's now NULL.
                         Possible segfault.


This race was pointed out by Helgrind.  It reports that the write of
|gThreads| at 399 conflicts with the read of it at 415:

Possible data race during write of size 8 at 0x6b23220 by thread #16
   at 0x558138C: XPCPerThreadData::~XPCPerThreadData() 
                 (xpcthreadcontext.cpp:399)
   by 0x55813AD: xpc_ThreadDataDtorCB(void*) (xpcthreadcontext.cpp:427)
   by 0x76DFA4E: _PR_DestroyThreadPrivate (prtpd.c:265)
   by 0x76F5033: _pt_root (ptthread.c:270)
   by 0x4C2A5D8: mythread_wrapper (hg_intercepts.c:213)
   by 0x4E34A03: start_thread (pthread_create.c:300)
   by 0xAB2680C: clone (clone.S:112)
 This conflicts with a previous read of size 8 by thread #14
   at 0x5581357: XPCPerThreadData::~XPCPerThreadData() 
                 (xpcthreadcontext.cpp:415)
   by 0x55813AD: xpc_ThreadDataDtorCB(void*) (xpcthreadcontext.cpp:427)
   by 0x76DFA4E: _PR_DestroyThreadPrivate (prtpd.c:265)
   by 0x76F5033: _pt_root (ptthread.c:270)
   by 0x4C2A5D8: mythread_wrapper (hg_intercepts.c:213)
   by 0x4E34A03: start_thread (pthread_create.c:300)
   by 0xAB2680C: clone (clone.S:112)
This isn't a problem in practice, because we know the main thread outlives all other threads: we explicitly join all other threads (at least the ones which might run xpconnect) during shutdown well before we get around to destroying the XPCPerThreadData for the main thread.
Comment on attachment 437374 [details] [diff] [review]
patch to make update and delete-decision atomic

i can live with this.
Attachment #437374 - Flags: review+
(In reply to comment #1)
> This isn't a problem in practice, [...]

It would be better if the code was "obviously correct" without having
to invoke non-local invariants, given that in this case it's easy to
make it so.
Keywords: checkin-needed
Assignee: nobody → jseward
Status: NEW → ASSIGNED
http://hg.mozilla.org/mozilla-central/rev/3f75c967a6cc
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9.3a5
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: