If you think a bug might affect users in the 57 release, please set the correct tracking and status flags for Release Management.

possible race on nsThread::sGlobalObserver

RESOLVED WORKSFORME

Status

()

Core
XPCOM
RESOLVED WORKSFORME
8 years ago
5 months ago

People

(Reporter: jseward, Unassigned)

Tracking

Trunk
x86_64
Linux
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

8 years ago
NOTE, this bug is currently unconfirmed.  See
https://bugzilla.mozilla.org/show_bug.cgi?id=551155#c12
for background and rationale.


HOWTO REPRO: any Fx startup running on Helgrind


All source lines are in xpcom/threads/nsThread.cpp

There is a global variable:

56   nsIThreadObserver* nsThread::sGlobalObserver;


One thread reads it, whilst in nsThread::ProcessNextEvent(int, int*)

501  PRBool notifyGlobalObserver = (sGlobalObserver != nsnull);
502  if (notifyGlobalObserver) 
503    sGlobalObserver->OnProcessNextEvent(this, mayWait && !ShuttingDown(),
504                                        mRunningEvent);


The main thread (program's root thread) later writes it (line 683)

672 nsresult
673 NS_SetGlobalThreadObserver(nsIThreadObserver* aObserver)
674 {
675   if (aObserver && nsThread::sGlobalObserver) {
676     return NS_ERROR_NOT_AVAILABLE;
677   }
678
679   if (!NS_IsMainThread()) {
680     return NS_ERROR_UNEXPECTED;
681   }
682
683   nsThread::sGlobalObserver = aObserver;
684   return NS_OK;
685 }


I can't see an obvious mechanism (locks etc) which make this safe, and
there are no comments in the source indicating that the race is known
about or expected.
(Reporter)

Comment 1

8 years ago
Raw report is:

Possible data race during write of size 8 at 0x5ad02b8 by thread #1
   at 0x58799DD: NS_SetGlobalThreadObserver(nsIThreadObserver*)
                 (nsThread.cpp:683)
   by 0x14092D06: nsXPConnect::GetXPConnect() (nsXPConnect.cpp:196)
   by 0x14092E29: nsXPConnect::GetSingleton() (nsXPConnect.cpp:209)
   by 0x140AC831: nsIXPConnectConstructor(nsISupports*, nsID const&, void**)
                 (xpcmodule.cpp:68)
   by 0x58428A0: nsGenericFactory::CreateInstance(nsISupports*, nsID const&,
                 void**) (nsGenericFactory.cpp:80)
   by 0x5875A33: nsComponentManagerImpl::CreateInstanceByContractID(char
                 const*, nsISupports*, nsID const&, void**)
                 (nsComponentManager.cpp:1685)
   by 0x5874D2D: nsComponentManagerImpl::GetServiceByContractID(char const*,
                 nsID const&, void**) (nsComponentManager.cpp:2252)
   by 0x583CCE9: CallGetService(char const*, nsID const&, void**)
                 (nsComponentManagerUtils.cpp:94)
   by 0x583CD3D: nsGetServiceByContractID::operator()(nsID const&, void**)
                 const (nsComponentManagerUtils.cpp:278)
   by 0x583C158: nsCOMPtr_base::assign_from_gs_contractid(
                 nsGetServiceByContractID, nsID const&) (nsCOMPtr.cpp:132)
   by 0x12ABAB63: nsChromeRegistry::ProcessManifestBuffer(char*, int,
                  nsILocalFile*, int) (nsCOMPtr.h:604)
   by 0x12ABCF71: nsChromeRegistry::ProcessManifest(nsILocalFile*, int)
                  (nsChromeRegistry.cpp:1383)

 This conflicts with a previous read of size 8 by thread #2
   at 0x5879B59: nsThread::ProcessNextEvent(int, int*) (nsThread.cpp:501)
   by 0x5841B16: NS_ProcessNextEvent_P(nsIThread*, int) (nsThreadUtils.cpp:250)
   by 0x12D2954B: nsSocketTransportService::Run()
                 (nsSocketTransportService2.cpp:581)
   by 0x5879C2C: nsThread::ProcessNextEvent(int, int*) (nsThread.cpp:527)
   by 0x5841B16: NS_ProcessNextEvent_P(nsIThread*, int) (nsThreadUtils.cpp:250)
   by 0x587A0A1: nsThread::ThreadFunc(void*) (nsThread.cpp:254)
   by 0x61018EA: _pt_root (ptthread.c:230)
   by 0x4C2A687: mythread_wrapper (hg_intercepts.c:221)

Comment 2

7 years ago
Could this explain bug #508263?
(Reporter)

Comment 3

7 years ago
Why do you think this might be related to 508263?  What link are
you seeing?
(Reporter)

Comment 4

7 years ago
This race is an instance of unsafe (racey) publication (of memory),
which is described nicely, along with rebuttals of common arguments
why it's OK, at

http://code.google.com/p/data-race-test/wiki/AboutRaces#Racy_publication

In this case, nsXPConnect::GetXPConnect creates a new nsXPConnect
object, and, by calling NS_SetGlobalThreadObserver, sets the global
variable nsThread::sGlobalObserver to point at it.

Meanwhile, a second thread in nsThread::ProcessNextEvent watches for
sGlobalObserver making a null to non-null transition.  When that is
observed, it then uses the contents of the object:

  PRBool notifyGlobalObserver = (sGlobalObserver != nsnull);
  if (notifyGlobalObserver) 
    sGlobalObserver->OnProcessNextEvent(this, mayWait && !ShuttingDown(),
                                        mRunningEvent);

For this to be safe relies on the assumption that a sequence of writes
performed by one CPU appear in that same order when observed by all
other CPUs in the system. (*)

In this case, it assumes that the assignment of the new object's
address to sGlobalObserver is observed (by the reading CPU) after the 
writes that initialise its contents.  

If that isn't so, it could be that the reading thread concludes 
that the object is ready to go, and so calls ::OnProcessNextEvent,
before its contents have migrated to the reading thread's CPU.

However far fetched this sounds: only on Intel and AMD cpus are we
guaranteed the property (*) above.  In particular, multiprocessor ARMs
and Power processors are allowed to reorder stores in between
processors.

--------

That notwithstanding, there's a much more obvious race, which happens
if two threads call NS_SetGlobalThreadObserver without locking.  But
that's not what this bug is about (iow, I am not claiming I observed
that happening).
sGlobalObserver is no more.
Status: UNCONFIRMED → RESOLVED
Last Resolved: 5 months ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.