Closed Bug 98755 Opened 23 years ago Closed 23 years ago

nsNativeComponentLoader not thread-safe error?

Categories

(Core :: XPCOM, defect)

x86
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla1.0

People

(Reporter: leif, Assigned: dougt)

References

Details

Attachments

(2 files, 2 obsolete files)

Hi, I'm not sure what happend, but I'm getting assertions now, like ###!!! ASSERTION: nsNativeComponentLoader not thread-safe: 'owningThread == NS_CurrentThread()', file /export/disk1/opensrc/mozilla/xpcom/base/nsDebug.cpp, line 528 ###!!! Break: at file /export/disk1/opensrc/mozilla/xpcom/base/nsDebug.cpp, line 528 ###!!! ASSERTION: nsNativeComponentLoader not thread-safe: 'owningThread == NS_CurrentThread()', file /export/disk1/opensrc/mozilla/xpcom/base/nsDebug.cpp, line 528 ###!!! Break: at file /export/disk1/opensrc/mozilla/xpcom/base/nsDebug.cpp, line 528 ###!!! ASSERTION: nsNativeComponentLoader not thread-safe: 'owningThread == NS_CurrentThread()', file /export/disk1/opensrc/mozilla/xpcom/base/nsDebug.cpp, line 528 ###!!! Break: at file /export/disk1/opensrc/mozilla/xpcom/base/nsDebug.cpp, line 528 I thought this was something I had done in my code. But, I backed out all my changes, and it's still happening. I had a bried discussion with Shaver last night on IRC, which I'm attaching here. Dmose also says he's seeing the same problem, and he'll post more information as well. Thanks, -- leif From IRC w. shaver: <leif> quick question, just need a hint where to start looking. I get this error, apparently I'm running code on the "wrong" thread: <leif> ###!!! ASSERTION: nsNativeComponentLoader not thread-safe: 'owningThread == NS_CurrentThread()', file /export/disk1/opensrc/mozilla/xpcom/base/nsDebug.cpp, line 528 <leif> ###!!! Break: at file /export/disk1/opensrc/mozilla/xpcom/base/nsDebug.cpp, line 528 <leif> ###!!! ASSERTION: nsNativeComponentLoader not thread-safe: 'owningThread == NS_CurrentThread()', file /export/disk1/opensrc/mozilla/xpcom/base/nsDebug.cpp, line 528 <leif> how can I debug this? <shaver> that sucks <leif> hehe <leif> I'm screwed? :) <shaver> it looks like you're creating an instance (the first one from a new component) on a non-main thread <shaver> that should be fine <shaver> can you file a bug on that, against me? <shaver> cc: dougt and dp <leif> ehhh, but, this must be something I'm doing wrong... <leif> I rearranged some code, and it suddenly started happening <shaver> well, it looks like you're calling into the component manager on a non-main thread <shaver> and it has to load a new component <shaver> so it's touching the native component loader on that Other thread, and the assertions are bitching <leif> ok, makes sense I think, not sure which component I'm loading. But, I do use the Proxy objects heavily here, to cause the callbacks to happen on a main thread
Summary: nsNativeComponentLoader not thread-safe → nsNativeComponentLoader not thread-safe error?
This actually is unrelated to leif's patch; I started seeing these assertions on the trunk about a week and a half ago after some component manager checkins by dp; I'll go try and figure out which ones in a bit. Note that this is actually being triggered because for some reason, the LDAP addressbook code (see http://abzilla.mozdev.org) is being called at autocomplete time, which it shouldn't be. I run under gdb with the XPCOM_DEBUG_BREAK set to stack, and here's what the first assertion's stack looks like: (gdb) where #0 0x40190a20 in nsDebug::Break ( aFile=0x401f20e0 "/home/dmose/s/main-browser/mozilla/xpcom/base/nsDebug.cpp", aLine=528) at /home/dmose/s/main-browser/mozilla/xpcom/base/nsDebug.cpp:344 #1 0x40190895 in nsDebug::Assertion ( aStr=0x401ed8a0 "nsNativeComponentLoader not thread-safe", aExpr=0x401f2120 "owningThread == NS_CurrentThread()", aFile=0x401f20e0 "/home/dmose/s/main-browser/mozilla/xpcom/base/nsDebug.cpp", aLine=528) at /home/dmose/s/main-browser/mozilla/xpcom/base/nsDebug.cpp:290 #2 0x40190ee9 in NS_CheckThreadSafe (owningThread=0x8061f58, msg=0x401ed8a0 "nsNativeComponentLoader not thread-safe") at /home/dmose/s/main-browser/mozilla/xpcom/base/nsDebug.cpp:528 #3 0x401725df in nsNativeComponentLoader::AddRef (this=0x806b1d8) at /home/dmose/s/main-browser/mozilla/xpcom/components/nsNativeComponentLoader.cpp:87 #4 0x4016e3ce in nsComponentManagerImpl::GetLoaderForType (this=0x806a758, aType=0, aLoader=0x432473ac) at /home/dmose/s/main-browser/mozilla/xpcom/components/nsComponentManager.cpp:2383 #5 0x4016b1b9 in nsComponentManagerImpl::FindFactory (this=0x806a758, aClass=@0x4324744c, aFactory=0x432474dc) at /home/dmose/s/main-browser/mozilla/xpcom/base/nsCOMPtr.h:1069 #6 0x4017085a in nsComponentManager::FindFactory (aClass=@0x4324744c, aFactory=0x432474dc) at /home/dmose/s/main-browser/mozilla/xpcom/components/nsComponentManager.cpp:3021 #7 0x40964fde in RDFServiceImpl::GetResource (this=0x80f4c58, aURI=0x89892c0 "moz-abldapcard://nsdirectory.netscape.com:-1/ou=People,dc=netscape,dc=com", aResource=0x4324756c) at ../../../dist/include/xpcom/nsCOMPtr.h:1069 #8 0x42bb743e in nsAbLDAPDirectory::CreateCard (this=0x8943a38, uri=0x8964d58, dn=0x43247694 "ou=People,dc=netscape,dc=com", card=0x4324766c) at ../../../dist/include/xpcom/nsCOMPtr.h:1069 #9 0x42bbbfaf in nsAbQueryLDAPMessageListener::OnLDAPMessageSearchEntry ( this=0x845ea20, aMessage=0x8432148, result=0x4324787c) at ../../../dist/include/xpcom/nsCOMPtr.h:642 #10 0x42bba1d1 in nsAbQueryLDAPMessageListener::OnLDAPMessage (this=0x845ea20, aMessage=0x8432148) at ../../../dist/include/xpcom/nsCOMPtr.h:1069 #11 0x42bfc399 in nsLDAPConnection::InvokeMessageCallback (this=0x893d658, aMsgHandle=0x89887f8, aMsg=0x8432148, aRemoveOpFromConnQ=0) at ../../../../dist/include/xpcom/nsCOMPtr.h:649 #12 0x42bfd4a5 in nsLDAPConnectionLoop::Run (this=0x8342c80) at ../../../../dist/include/xpcom/nsCOMPtr.h:642 #13 0x40182cb9 in nsThread::Main (arg=0x83426d8) at ../../dist/include/xpcom/nsCOMPtr.h:649 #14 0x40251339 in _pt_root (arg=0x89366e0) at /home/dmose/s/main-browser/mozilla/nsprpub/pr/src/pthreads/ptthread.c:214 #15 0x40280d8c in pthread_start_thread (arg=0x43247be0) at manager.c:274 #16 0x40280e6f in pthread_start_thread_event (arg=0x43247be0) at manager.c:298 (gdb)
None of my changes should cause thread control flow changes. They were purely reducing allocations of strings. Let me do a debug build and see the assertions too.
Also the attached stack trace looks wierd: - FindFactory() doesn't call GetLoaderForType() - GetLoaderForType() is addrefffing native component loader ?
nsComponentManager::FindFactory -> nsFactoryEntry::GetFactory (inline) -> nsComponentManager::GetLoaderForType GetLoaderForType addrefs the loader it hands back -- for a (most common) type of 0, that'll always be the native component loader. Stack is fine, I think. I also think it's fine, or should be, to call the component loader on a thread that is different from that which initialized XPCOM (and thereby created the native component loader). The problem is really that we haven't made the component manager and loaders be threadsafe, or at least haven't verified that they are and used NS_IMPL_QUERYINTERFACE_THREADSAFE to indicate as much.
I see. ComponentManager did go through one round of head scratching and bug fixing for thread safeness early on (2 years ago). Native loader wasn't around then. FYI
timeless@timeless-bsd:~/mozilla/obj-gtk-i386-unknown-freebsd4.4/dist/bin: ./run-mozilla.sh ./TestProtocols "jar:resource:/chrome/classic.jar\!/skin/classic/global/scrollbars.css" Type Manifest File: /home/timeless/mozilla/obj-gtk-i386-unknown-freebsd4.4/dist/bin/components/xpti .dat Trying to load: jar:resource:/chrome/classic.jar!/skin/classic/global/scrollbars.css ###!!! ASSERTION: nsNativeComponentLoader not thread-safe: 'owningThread == NS_CurrentThread()', file /home/timeless/mozilla/xpcom/base/nsDebug.cpp, line 528 [repeated 8 times] shaver: what are your plans for this bug, will we proxy to the main thread, mark thread safe, remove the warning, make the code work on other threads?
mike I have similar bugs for 1.01/1.1. If you want to reassign, feel free.
Don't have to ask me twice.
Assignee: shaver → dougt
Target Milestone: --- → mozilla1.1
I spend some time looking at the nsNativeComponentLoader and I do not think there is much to do to make this loader threadsafe. There are three objects which are of concern. First the nsISupport impl must be made threadsafe. The second object is the mDLLStore, which is a nsObjectHashtable class, is already created threadsafe. Lastly, there is mDeferredComponents, a nsVoidArray. This object is only accessed during registration, but should be protected anyways. bringing this baby in. lets see if I can get it pre-approved.
Keywords: mozilla1.0
Target Milestone: mozilla1.1alpha → mozilla1.0
Checking in nsIComponentRegistrar.idl; /cvsroot/mozilla/xpcom/components/nsIComponentRegistrar.idl,v <-- nsIComponentRegistrar.idl new revision: 1.4; previous revision: 1.3 done
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
wrong bug. sorry.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
plussing this.
Keywords: mozilla1.0mozilla1.0+
Attached patch patch v1 (obsolete) — Splinter Review
protects mRefcnt by using the threadsafe macros. protects mDeferredComponents with mMon a PRMonitor.
+ PR_DestroyMonitor(mMon); + mMon = PR_NewMonitor(); *Always* use nsAutoMonitor::NewMonitor nsAutoMonitor::DestroyMonitor if you are going to use nsAutoMonitor. Does this really need to be a monitor rather than a lock? Does it really ever reenter?
Why? Our examples in nsAutoLock.h show that this is correct. There are many other places where we create a monitor through nspr and pass it to nsAutoMonitor. Why is this bad? I will double check and see if I can use a lock instead. An previous draft of this patch used to also protect the dllstore hash table. In that case, I really needed a monitor.
Then the examples are broken. The deadlock detection code in nsAutoLock.cpp depends on the monitors and locks being created using that static member. At one point there was talk of adding a type so that the compiler could help enforce this rule.
Bug 58904 asks for strong types for locks and monitors in XPCOM that trivially wrap the NSPR ones, adding lifetime tracking value for the deadlock detection code in nsAutoLock.cpp (which has saved several peoples' @$$es in the last few years, lemmetellya). dougt, I don't see any example comment in nsAutoLock.h showing a PR_NewMonitor call, or saying you can use PR_NewMonitor directly. OTOH, the comment I wrote saying to use nsAutoMonitor::NewMonitor instead is buried in nsAutoLock.cpp, dagnabit. Some modified for public consumption version of that comment should go in nsAutoLock.h. But I don't see why a monitor is needed here, when a lock should suffice. What control flow must nest, reentering the monitor that's already entered? /be
after double checking, I can use a lock. brendan, take a look at nsAutoLock.h line 53. This example is using a lock. I will move your comment to the right place and get us a new patch. Coming soon.
oh geez, that sucks. it is okay to use PLLock* allocated directly from nspr, but not a PRMonitor? Not very consistent.
Yes, monitors and locks are different: there is no nsAutoLock::NewLock static method, but there is an nsAutoMonitor::NewMonitor -- confusing, as you imply, and comments will help -- thanks. The right thing are flyweight NSPR-wrapping XPCOM lock and monitor types with ctors and dtors, but that won't make 1.0. /be
Attachment #77352 - Attachment is obsolete: true
Attachment #77398 - Flags: superreview+
Comment on attachment 77398 [details] [diff] [review] some comment to stop others from making my same mistakes sr=jband comments are good
dougt: I'm not comfortable with the assertion that mDLLStore is protected by virtue of being created with the threadsafe flag set. That protection seems fairly useless to me given that this code also does an enumeration of the hashtable - a very unprotected activity. It would help it you layed out which entry points the component manager *might* call simultaneously. Which are only called on one thread at shutdown? It is guarenteed that all other calls will be finished when shutdown happens? I'm curious why this code allows for multiple calls to Init. Does that really happen? If so is it protected from simultaneous calls? A map of how the component manager uses the loaders would help - and might allow you to avoid locking down at this level if there are cases where it is really not needed. Is the goal to make this bulletproof in a free threading environment or just sort of safe?
Luckly unloading is never called :-/ We can get into trouble if two threads both call Unload(). This is a problem with any threadsafe enumerator which needs to unlock before calling out, right? Assuming that we do expose Unloading to clients, it would be a required thing to document that this call should only be made on the primordial thread, right? > It would help it you layed out which entry points the component manager *might* call simultaneously. I traced though the paths from the component manager to the loader(s) and try to bucket the calls into sets startup-only, running, shutdown. However, there is a great amount of overlap. One example is the loader's init method. Looking at it, you might thing that that would only be called at startup. however, createInstance could wind up calling it. Since CI is allowed to be called on multiple thread, it is plausible that init be called simultaneously. If the loader has not be created and both CI's need that loader, we must have a sync point. > I'm curious why this code allows for multiple calls to Init. Does that really happen? If so is it protected from simultaneous calls? I am not happy with this at all. The component manager never reinits the loader. It does it once, then caches the loader. It should be removed. I will attack this with these changes. > A map of how the component manager uses the loaders would help - and might allow you to avoid locking down at this level if there are cases where it is really not needed. Is there a tool that does this? I started on a pad and paper and it just started getting messy. > Is the goal to make this bulletproof in a free threading environment or just sort of safe? This is a loaded question. This patch certianly moves us in the direction of a more bulletproof implementation.
Comment on attachment 77398 [details] [diff] [review] some comment to stop others from making my same mistakes >+ /** >+ * Constructor >+ * The constructor locks the given lock. During destruction >+ * the lock will be unlocked. Just a thought: "acquires the given lock... the lock will be released" avoids the "lock the lock" and "unlock the lock" semi-stutter. Might avoid passive voice in the second sentence, as the first does: "The destructor releases the lock." >+ /** >+ * lock >+ * Client may call this to reenter the given lock. Take special >+ * note that locking a locked lock has undefined results. s/reenter/reacquire/ "Reenter" should be restricted to monitors and reentrancy. Locks are not "reentrant" as the second sentence notes (but "you'll hang or crash" might be a better warning than "undefined results" :-). >+ /** >+ * unlock >+ * Client may call this to unlock the given lock. Take special >+ * note unlocking an unlocked lock has undefined results. "release the given lock." Thanks for doing this, r=brendan@mozilla.org with a bit more love for the comments as proposed above. /be
Attachment #77398 - Flags: review+
Comment on attachment 77393 [details] [diff] [review] uses a lock to protect mDeferredComponents r=jband. Doug has convinced me that this is 'good enough'.
Attachment #77393 - Flags: review+
> One example is the loader's init method. Looking at > it, you might thing that that would only be called at startup. however, > createInstance could wind up calling it. When I broke loaders out, I fought with myself over this lazy-creation decision. On the one hand, I didn't want people to pay to spin up Python or Java just because they had a component registered that might someday be created. On the other hand, by not doing that initialization at startup when we're "guaranteed" to be single-theaded, we impose this locking overhead. Though we can still avoid the locking during the initial startup phase, if we need to.) Given that the dominant case is that there are only the JS and native component loaders, maybe we should eagerly create them during single-threaded spinup and avoid the locks. Back when I was first trying to wedge PLDHash into the component manager, I dreamed up a lock-free scheme that took advantage of the fact that changes to the table (component registrations) are very rare compared to reads from it, after the initial startup phase, by letting readers race freely, and requiring writers to update a copy of the table and only swap it in after they'd synchronized with all threads via a posted XPCOM event (to ensure that they were no longer in the critical section where they might be holding references to the original table store). Turns out it's similar to the read-copy-update stuff that they're starting to use for Linux, so I don't need to write too much about it. =) (http://lse.sourceforge.net/locking/rclock.html) We can't do this now, by any means, but I wanted to blurb the bug before I forgot again, in case we want to revisit this code to reduce locking costs. I'll look at the patch now: curious to see why we need to lock around mDeferredComponents, since I thought that was only accessed during the single-threaded initial registration process. (That's why I didn't lock it in the first place, maybe a mistake!)
Comment on attachment 77393 [details] [diff] [review] uses a lock to protect mDeferredComponents From above: > I'm curious why this code allows for multiple calls to Init. Does that really happen? If so is it protected from simultaneous calls? I am not happy with this at all. The component manager never reinits the loader. It does it once, then caches the loader. It should be removed. I will attack this with these changes. ----- I don't see that attack here: we allocate a lock in Init, but don't acquire it, so it would seem that we could end up racing pretty freely through there. What's the plan for this? As the patch stands, the only thing the lock protects is mDeferredComponents, which really seems like it should only be touched during single-threaded autoreg. Nay?
okay, i didn't remove the reinit code. I will post a new patch will which remove the check before creating the mLock and mDllStore. nsNativeComponentLoader::Init() code does not have a race condition as it is only called during PlatformInit() from the nsComponentManager. This is run only when XPCOM startups. WRT the mDeferredComponents lock. I would be happy to remove the lock if we only supported XPCOM registeration from a single thread. Is that where we want to be?
I think we absolutely want to only permit autoregistration on the main thread. Do we use the deferral mechanism for directed registration, such as XPInstall? Either way, I'm more than happy requiring people to proxy such registration to the main thread. Thrilled, even.
I have no problems with that. I will make a comment in the IDL and attach a new patch. Stay tuned.
Attached patch patch 3Splinter Review
thread restriction of AutoReg allows us not protect mDeferredComponents.
Attachment #77393 - Attachment is obsolete: true
Comment on attachment 78396 [details] [diff] [review] patch 3 sr=jband. I'm obviously becoming a push over :) I'd slap an NS_ASSERTION(!mDllStore,...) in there to catch unexpected cheaters.
Attachment #78396 - Flags: superreview+
right. done.
Comment on attachment 78396 [details] [diff] [review] patch 3 r=waterson, assuming jband's assertion is there.
Attachment #78396 - Flags: review+
Checking in nsIComponentRegistrar.idl; /cvsroot/mozilla/xpcom/components/nsIComponentRegistrar.idl,v <-- nsIComponentRegistrar.idl new revision: 1.5; previous revision: 1.4 done Checking in nsNativeComponentLoader.cpp; /cvsroot/mozilla/xpcom/components/nsNativeComponentLoader.cpp,v <-- nsNativeComponentLoader.cpp new revision: 1.79; previous revision: 1.78 done
Comment on attachment 77398 [details] [diff] [review] some comment to stop others from making my same mistakes a=asa (on behalf of drivers) for checkin to the 1.0 branch
Attachment #77398 - Flags: approval+
Comment on attachment 78396 [details] [diff] [review] patch 3 a=asa (on behalf of drivers) for checkin to the 1.0 branch
Attachment #78396 - Flags: approval+
landed on branch: Checking in nsIComponentRegistrar.idl; /cvsroot/mozilla/xpcom/components/nsIComponentRegistrar.idl,v <-- nsIComponentRegistrar.idl new revision: 1.4.2.2; previous revision: 1.4.2.1 done Checking in nsNativeComponentLoader.cpp; /cvsroot/mozilla/xpcom/components/nsNativeComponentLoader.cpp,v <-- nsNativeComponentLoader.cpp new revision: 1.78.10.2; previous revision: 1.78.10.1 done
Status: REOPENED → RESOLVED
Closed: 23 years ago23 years ago
Resolution: --- → FIXED
adding branch resolution keyword
Keywords: fixed1.0.0
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: