Closed Bug 833349 Opened 12 years ago Closed 12 years ago

data race on nsFactoryEntry::mFactory

Categories

(Core :: XPCOM, defect, P3)

x86_64
Linux
defect

Tracking

()

RESOLVED DUPLICATE of bug 684887

People

(Reporter: jseward, Unassigned)

Details

(Whiteboard: [data-race] [helgrind])

I see this on every startup of Firefox on desktop. Having stared at the relevant bits of code for a while, I don't know if it's (1) a bug in nsFactoryEntry, (2) a bug in InitializeSSLServerCertVerificationThreads' usage of nsFactoryEntry, or (3) a false positive caused by Helgrind not seeing some synchronisation event inside nsFactoryEntry. (1) and (3) strike me as relatively unlikely, because, assuming there are more uses of nsFactoryEntry than just the SSL machinery, we'd see similar race reports from them too. But I don't -- all the races also seem to involve InitializeSSLServerCertVerificationThreads in some way. So I'm guessing (2). But it's only a guess. It seems like nsFactoryEntry::GetFactory is called twice on the same object, from different threads, without synchronisation. The object initially has mFactory == null. This results in the first caller writing mFactory at line 1760 and the second caller reading it unsynchronised at line 1742. already_AddRefed<nsIFactory> nsFactoryEntry::GetFactory() { 1742 if (!mFactory) { // RegisterFactory then UnregisterFactory can leave an entry in mContractIDs // pointing to an unusable nsFactoryEntry. if (!mModule) return NULL; if (!mModule->Load()) return NULL; if (mModule->Module()->getFactoryProc) { mFactory = mModule->Module()->getFactoryProc(*mModule->Module(), *mCIDEntry); } else if (mCIDEntry->getFactoryProc) { mFactory = mCIDEntry->getFactoryProc(*mModule->Module(), *mCIDEntry); } else { NS_ASSERTION(mCIDEntry->constructorProc, "no getfactory or constructor"); 1760 mFactory = new mozilla::GenericFactory(mCIDEntry->constructorProc); } if (!mFactory) return NULL; } nsIFactory* factory = mFactory.get(); NS_ADDREF(factory); return factory; }
Race report pertaining to the above analysis: ---------------------------------------------------------------- Possible data race during read of size 8 at 0x4D1DFE0 by thread #1 Locks held: none at 0x7CB0D6A: nsFactoryEntry::GetFactory() (xpcom/components/nsComponentManager.cpp:1742) by 0x7CB0EC0: nsComponentManagerImpl::CreateInstanceByContractID(char const*, nsISupports*, nsID const&, void**) (xpcom/components/nsComponentManager.cp p:1030) by 0x7C7A31A: CallCreateInstance(char const*, nsISupports*, nsID const&, void**) (ff-opt/xpcom/build/nsComponentManagerUtils.cpp:138) by 0x7C7A34C: nsCreateInstanceByContractID::operator()(nsID const&, void**) const (ff-opt/xpcom/build/nsComponentManagerUtils.cpp:178) by 0x7C792F5: nsCOMPtr_base::assign_from_helper(nsCOMPtr_helper const&, nsID const&) (ff-opt/xpcom/build/nsCOMPtr.cpp:110) by 0x6BE14B8: nsStreamTransportService::Init() (ff-opt/netwerk/base/src/../../../dist/include/nsCOMPtr.h:691) by 0x6BD129B: nsStreamTransportServiceConstructor(nsISupports*, nsID const&, void**) (in /home/sewardj/MOZ/MC-17-01-2013-HG/ff-opt/toolkit/library/libxu l.so) by 0x7C8064A: mozilla::GenericFactory::CreateInstance(nsISupports*, nsID const&, void**) (ff-opt/xpcom/build/GenericFactory.cpp:16) by 0x7CB0F99: nsComponentManagerImpl::CreateInstance(nsID const&, nsISupports*, nsID const&, void**) (xpcom/components/nsComponentManager.cpp:949) by 0x7CB223A: nsComponentManagerImpl::GetService(nsID const&, nsID const&, void**) (xpcom/components/nsComponentManager.cpp:1237) by 0x7C7A240: CallGetService(nsID const&, nsID const&, void**) (ff-opt/xpcom/build/nsComponentManagerUtils.cpp:51) by 0x7C7A497: nsGetServiceByCIDWithError::operator()(nsID const&, void**) const (ff-opt/xpcom/build/nsComponentManagerUtils.cpp:234) This conflicts with a previous write of size 8 by thread #4 Locks held: none at 0x6BAD847: nsCOMPtr_base::assign_assuming_AddRef(nsISupports*) (ff-opt/layout/xul/base/src/tree/src/../../../../../../dist/include/nsCOMPtr.h:438) by 0x7C79166: nsCOMPtr_base::assign_with_AddRef(nsISupports*) (ff-opt/xpcom/build/nsCOMPtr.cpp:49) by 0x7CB074E: nsCOMPtr<nsIFactory>::operator=(nsIFactory*) (ff-opt/xpcom/components/../../dist/include/nsCOMPtr.h:624) by 0x7CB0E1D: nsFactoryEntry::GetFactory() (xpcom/components/nsComponentManager.cpp:1760) by 0x7CB0EC0: nsComponentManagerImpl::CreateInstanceByContractID(char const*, nsISupports*, nsID const&, void**) (xpcom/components/nsComponentManager.cpp:1030) by 0x7C7A31A: CallCreateInstance(char const*, nsISupports*, nsID const&, void**) (ff-opt/xpcom/build/nsComponentManagerUtils.cpp:138) by 0x77AE5FB: mozilla::psm::InitializeSSLServerCertVerificationThreads() (ff-opt/security/manager/ssl/src/../../../../dist/include/nsComponentManagerUtils.h:248) by 0x6BFBC4A: nsSocketTransportService::Run() (netwerk/base/src/nsSocketTransportService2.cpp:617) Address 0x4D1DFE0 is 16 bytes inside a block of size 32 alloc'd at 0x4A077FC: malloc (/home/sewardj/VgTRUNK/merge/coregrind/m_replacemalloc/vg_replace_malloc.c:270) by 0x62A7078: moz_xmalloc (memory/mozalloc/mozalloc.cpp:54) by 0x7CAFCDD: nsComponentManagerImpl::RegisterCIDEntry(mozilla::Module::CIDEntry const*, nsComponentManagerImpl::KnownModule*) (ff-opt/xpcom/components/../../dist/include/mozilla/mozalloc.h:200) by 0x7CB14B7: nsComponentManagerImpl::RegisterModule(mozilla::Module const*, mozilla::FileLocation*) (xpcom/components/nsComponentManager.cpp:405) by 0x7CB2F76: nsComponentManagerImpl::Init() (xpcom/components/nsComponentManager.cpp:346) by 0x7C81FC8: NS_InitXPCOM2_P (xpcom/build/nsXPComInit.cpp:446) by 0x6BADD72: ScopedXPCOMStartup::Initialize() (toolkit/xre/nsAppRunner.cpp:1185) by 0x6BB3FB5: XREMain::XRE_main(int, char**, nsXREAppData const*) (toolkit/xre/nsAppRunner.cpp:3886) by 0x6BB4249: XRE_main (toolkit/xre/nsAppRunner.cpp:4093) by 0x40208E: do_main(int, char**, nsIFile*) (browser/app/nsBrowserApp.cpp:195) by 0x402191: main (browser/app/nsBrowserApp.cpp:388) ----------------------------------------------------------------
Whiteboard: [data-race] [helgrind]
All access to the factory entries is supposed to be synchronized through nsComponentManagerImpl::mMon. I believe that this is a bug in nsFactoryEntry::GetFactory which should doing some sort of lock/test/unlock/get/lock/test/set magic, I guess. This is likely only showing up for NSS because everything else gets initialized on the main thread. I want to make the XPCOM component stuff mainthread-only, but that's difficult to untangle so I haven't. For this bug we should probably bandaid something on mMon.
Priority: -- → P3
Actually, see bug 684887 which I wrote to fix this. I was scared of landing that without being able to detect the deadlocks, but perhaps now with chromehang for telemetry it would be ok?
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.