Closed Bug 98755 Opened 23 years ago Closed 22 years ago

nsNativeComponentLoader not thread-safe error?

Categories

(Core :: XPCOM, defect)

x86
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla1.0

People

(Reporter: leif, Assigned: dougt)

References

Details

Attachments

(2 files, 2 obsolete files)

Hi,

I'm not sure what happend, but I'm getting assertions now, like

###!!! ASSERTION: nsNativeComponentLoader not thread-safe: 'owningThread ==
NS_CurrentThread()', file /export/disk1/opensrc/mozilla/xpcom/base/nsDebug.cpp,
line 528
###!!! Break: at file /export/disk1/opensrc/mozilla/xpcom/base/nsDebug.cpp, line
528
###!!! ASSERTION: nsNativeComponentLoader not thread-safe: 'owningThread ==
NS_CurrentThread()', file /export/disk1/opensrc/mozilla/xpcom/base/nsDebug.cpp,
line 528
###!!! Break: at file /export/disk1/opensrc/mozilla/xpcom/base/nsDebug.cpp, line
528
###!!! ASSERTION: nsNativeComponentLoader not thread-safe: 'owningThread ==
NS_CurrentThread()', file /export/disk1/opensrc/mozilla/xpcom/base/nsDebug.cpp,
line 528
###!!! Break: at file /export/disk1/opensrc/mozilla/xpcom/base/nsDebug.cpp, line
528


I thought this was something I had done in my code. But, I backed out all my
changes, and it's still happening. I had a bried discussion with Shaver last
night on IRC, which I'm attaching here. Dmose also says he's seeing the same
problem, and he'll post more information as well.

Thanks,

-- leif

From IRC w. shaver:

<leif> quick question, just need a hint where to start looking. I get this
error, apparently I'm running code on the "wrong" thread:
<leif> ###!!! ASSERTION: nsNativeComponentLoader not thread-safe: 'owningThread
== NS_CurrentThread()', file
/export/disk1/opensrc/mozilla/xpcom/base/nsDebug.cpp, line 528
<leif> ###!!! Break: at file
/export/disk1/opensrc/mozilla/xpcom/base/nsDebug.cpp, line 528
<leif> ###!!! ASSERTION: nsNativeComponentLoader not thread-safe: 'owningThread
== NS_CurrentThread()', file
/export/disk1/opensrc/mozilla/xpcom/base/nsDebug.cpp, line 528
<leif> how can I debug this?
<shaver> that sucks
<leif> hehe
<leif> I'm screwed? :)
<shaver> it looks like you're creating an instance (the first one from a new
component) on a non-main thread
<shaver> that should be fine
<shaver> can you file a bug on that, against me?
<shaver> cc: dougt and dp
<leif> ehhh, but, this must be something I'm doing wrong...
<leif> I rearranged some code, and it suddenly started happening
<shaver> well, it looks like you're calling into the component manager on a
non-main thread
<shaver> and it has to load a new component
<shaver> so it's touching the native component loader on that Other thread, and
the assertions are bitching
<leif> ok, makes sense I think, not sure which component I'm loading. But, I do
use the Proxy objects heavily here, to cause the callbacks to happen on a main
thread
Summary: nsNativeComponentLoader not thread-safe → nsNativeComponentLoader not thread-safe error?
This actually is unrelated to leif's patch; I started seeing these assertions on
the trunk about a week and a half ago after some component manager checkins by
dp; I'll go try and figure out which ones in a bit.

Note that this is actually being triggered because for some reason, the LDAP
addressbook code (see http://abzilla.mozdev.org) is being called at autocomplete
time, which it shouldn't be.

I run under gdb with the XPCOM_DEBUG_BREAK set to stack, and here's what the
first  assertion's stack looks like:

(gdb) where
#0  0x40190a20 in nsDebug::Break (
    aFile=0x401f20e0
"/home/dmose/s/main-browser/mozilla/xpcom/base/nsDebug.cpp", aLine=528) at
/home/dmose/s/main-browser/mozilla/xpcom/base/nsDebug.cpp:344
#1  0x40190895 in nsDebug::Assertion (
    aStr=0x401ed8a0 "nsNativeComponentLoader not thread-safe", 
    aExpr=0x401f2120 "owningThread == NS_CurrentThread()", 
    aFile=0x401f20e0
"/home/dmose/s/main-browser/mozilla/xpcom/base/nsDebug.cpp", aLine=528) at
/home/dmose/s/main-browser/mozilla/xpcom/base/nsDebug.cpp:290
#2  0x40190ee9 in NS_CheckThreadSafe (owningThread=0x8061f58, 
    msg=0x401ed8a0 "nsNativeComponentLoader not thread-safe")
    at /home/dmose/s/main-browser/mozilla/xpcom/base/nsDebug.cpp:528
#3  0x401725df in nsNativeComponentLoader::AddRef (this=0x806b1d8)
    at
/home/dmose/s/main-browser/mozilla/xpcom/components/nsNativeComponentLoader.cpp:87
#4  0x4016e3ce in nsComponentManagerImpl::GetLoaderForType (this=0x806a758, 
    aType=0, aLoader=0x432473ac)
    at
/home/dmose/s/main-browser/mozilla/xpcom/components/nsComponentManager.cpp:2383
#5  0x4016b1b9 in nsComponentManagerImpl::FindFactory (this=0x806a758, 
    aClass=@0x4324744c, aFactory=0x432474dc)
    at /home/dmose/s/main-browser/mozilla/xpcom/base/nsCOMPtr.h:1069
#6  0x4017085a in nsComponentManager::FindFactory (aClass=@0x4324744c, 
    aFactory=0x432474dc)
    at
/home/dmose/s/main-browser/mozilla/xpcom/components/nsComponentManager.cpp:3021
#7  0x40964fde in RDFServiceImpl::GetResource (this=0x80f4c58, 
    aURI=0x89892c0
"moz-abldapcard://nsdirectory.netscape.com:-1/ou=People,dc=netscape,dc=com",
aResource=0x4324756c)
    at ../../../dist/include/xpcom/nsCOMPtr.h:1069
#8  0x42bb743e in nsAbLDAPDirectory::CreateCard (this=0x8943a38, 
    uri=0x8964d58, dn=0x43247694 "ou=People,dc=netscape,dc=com", 
    card=0x4324766c) at ../../../dist/include/xpcom/nsCOMPtr.h:1069
#9  0x42bbbfaf in nsAbQueryLDAPMessageListener::OnLDAPMessageSearchEntry (
    this=0x845ea20, aMessage=0x8432148, result=0x4324787c)
    at ../../../dist/include/xpcom/nsCOMPtr.h:642
#10 0x42bba1d1 in nsAbQueryLDAPMessageListener::OnLDAPMessage (this=0x845ea20, 
    aMessage=0x8432148) at ../../../dist/include/xpcom/nsCOMPtr.h:1069
#11 0x42bfc399 in nsLDAPConnection::InvokeMessageCallback (this=0x893d658, 
    aMsgHandle=0x89887f8, aMsg=0x8432148, aRemoveOpFromConnQ=0)
    at ../../../../dist/include/xpcom/nsCOMPtr.h:649
#12 0x42bfd4a5 in nsLDAPConnectionLoop::Run (this=0x8342c80)
    at ../../../../dist/include/xpcom/nsCOMPtr.h:642
#13 0x40182cb9 in nsThread::Main (arg=0x83426d8)
    at ../../dist/include/xpcom/nsCOMPtr.h:649
#14 0x40251339 in _pt_root (arg=0x89366e0)
    at /home/dmose/s/main-browser/mozilla/nsprpub/pr/src/pthreads/ptthread.c:214
#15 0x40280d8c in pthread_start_thread (arg=0x43247be0) at manager.c:274
#16 0x40280e6f in pthread_start_thread_event (arg=0x43247be0) at manager.c:298
(gdb) 
None of my changes should cause thread control flow changes. They were purely
reducing allocations of strings.

Let me do a debug build and see the assertions too.
Also the attached stack trace looks wierd:
- FindFactory() doesn't call GetLoaderForType()
- GetLoaderForType() is addrefffing native component loader ?
nsComponentManager::FindFactory 
  ->  nsFactoryEntry::GetFactory (inline)
    -> nsComponentManager::GetLoaderForType

GetLoaderForType addrefs the loader it hands back -- for a (most common) type of
0, that'll always be the native component loader.

Stack is fine, I think.  I also think it's fine, or should be, to call the
component loader on a thread that is different from that which initialized XPCOM
(and thereby created the native component loader).  The problem is really that
we haven't made the component manager and loaders be threadsafe, or at least
haven't verified that they are and used NS_IMPL_QUERYINTERFACE_THREADSAFE to
indicate as much.
I see. ComponentManager did go through one round of head scratching and bug
fixing for thread safeness early on (2 years ago). Native loader wasn't around
then. FYI
timeless@timeless-bsd:~/mozilla/obj-gtk-i386-unknown-freebsd4.4/dist/bin: 
./run-mozilla.sh ./TestProtocols 
"jar:resource:/chrome/classic.jar\!/skin/classic/global/scrollbars.css"
Type Manifest File: 
/home/timeless/mozilla/obj-gtk-i386-unknown-freebsd4.4/dist/bin/components/xpti
.dat

Trying to load:
        jar:resource:/chrome/classic.jar!/skin/classic/global/scrollbars.css
###!!! ASSERTION: nsNativeComponentLoader not thread-safe: 'owningThread == 
NS_CurrentThread()', file /home/timeless/mozilla/xpcom/base/nsDebug.cpp, line 
528
[repeated 8 times]

shaver: what are your plans for this bug, will we proxy to the main thread, 
mark thread safe, remove the warning, make the code work on other threads?
mike I have similar bugs for 1.01/1.1.  If you want to reassign, feel free.
Don't have to ask me twice.
Assignee: shaver → dougt
Target Milestone: --- → mozilla1.1
I spend some time looking at the nsNativeComponentLoader and I do not think
there is much to do to make this loader threadsafe.  

There are three objects which are of concern.  First the nsISupport impl must be
made threadsafe.  The second object is the mDLLStore, which is a
nsObjectHashtable class, is already created threadsafe.  Lastly, there is
mDeferredComponents, a nsVoidArray.  This object is only accessed during
registration, but should be protected anyways. 

bringing this baby in.  lets see if I can get it pre-approved.
Keywords: mozilla1.0
Target Milestone: mozilla1.1alpha → mozilla1.0
Checking in nsIComponentRegistrar.idl;
/cvsroot/mozilla/xpcom/components/nsIComponentRegistrar.idl,v  <-- 
nsIComponentRegistrar.idl
new revision: 1.4; previous revision: 1.3
done
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
wrong bug.  sorry.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
plussing this.
Keywords: mozilla1.0mozilla1.0+
Attached patch patch v1 (obsolete) — Splinter Review
protects mRefcnt by using the threadsafe macros.
protects mDeferredComponents with mMon a PRMonitor.
+        PR_DestroyMonitor(mMon);
+        mMon = PR_NewMonitor();

*Always* use nsAutoMonitor::NewMonitor nsAutoMonitor::DestroyMonitor if you are
going to use nsAutoMonitor.

Does this really need to be a monitor rather than a lock? Does it really ever
reenter?
Why?  Our examples in nsAutoLock.h show that this is correct.  There are many
other places where we create a monitor through nspr and pass it to
nsAutoMonitor.  Why is this bad?

I will double check and see if I can use a lock instead.  An previous draft of
this patch used to also protect the dllstore hash table.  In that case, I really
needed a monitor.  
Then the examples are broken. The deadlock detection code in nsAutoLock.cpp
depends on the monitors and locks being created using that static member. At one
point there was talk of adding a type so that the compiler could help enforce
this rule.
Bug 58904 asks for strong types for locks and monitors in XPCOM that trivially
wrap the NSPR ones, adding lifetime tracking value for the deadlock detection
code in nsAutoLock.cpp (which has saved several peoples' @$$es in the last few
years, lemmetellya).

dougt, I don't see any example comment in nsAutoLock.h showing a PR_NewMonitor
call, or saying you can use PR_NewMonitor directly.  OTOH, the comment I wrote
saying to use nsAutoMonitor::NewMonitor instead is buried in nsAutoLock.cpp,
dagnabit.  Some modified for public consumption version of that comment should
go in nsAutoLock.h.

But I don't see why a monitor is needed here, when a lock should suffice. What
control flow must nest, reentering the monitor that's already entered?

/be
after double checking, I can use a lock.  

brendan, take a look at nsAutoLock.h line 53.  This example is using a lock.  I
will move your comment to the right place and get us a new patch.  Coming soon.
oh geez, that sucks.  it is okay to use PLLock* allocated directly from nspr,
but not a PRMonitor?  Not very consistent.  
Yes, monitors and locks are different: there is no nsAutoLock::NewLock static
method, but there is an nsAutoMonitor::NewMonitor -- confusing, as you imply,
and comments will help -- thanks.  The right thing are flyweight NSPR-wrapping
XPCOM lock and monitor types with ctors and dtors, but that won't make 1.0.

/be
Attachment #77352 - Attachment is obsolete: true
Attachment #77398 - Flags: superreview+
Comment on attachment 77398 [details] [diff] [review]
some comment to stop others from making my same mistakes

sr=jband comments are good
dougt: I'm not comfortable with the assertion that mDLLStore is protected by
virtue of being created with the threadsafe flag set. That protection seems
fairly useless to me given that this code also does an enumeration of the
hashtable - a very unprotected activity.

It would help it you layed out which entry points the component manager *might*
call simultaneously. Which are only called on one thread at shutdown? It is
guarenteed that all other calls will be finished when shutdown happens?

I'm curious why this code allows for multiple calls to Init. Does that really
happen? If so is it protected from simultaneous calls?

A map of how the component manager uses the loaders would help - and might allow
you to avoid locking down at this level if there are cases where it is really
not needed.

Is the goal to make this bulletproof in a free threading environment or just
sort of safe?
Luckly unloading is never called :-/  We can get into trouble if two threads
both call Unload().  This is a problem with any threadsafe enumerator which
needs to unlock before calling out, right?  Assuming that we do expose Unloading
to clients, it would be a required thing to document that this call should only
be made on the primordial thread, right?

> It would help it you layed out which entry points the component manager
*might* call simultaneously. 

I traced though the paths from the component manager to the loader(s) and try to
bucket the calls into sets startup-only, running, shutdown.  However, there is a
great amount of overlap.  One example is the loader's init method.  Looking at
it, you might thing that that would only be called at startup.  however,
createInstance could wind up calling it.  Since CI is allowed to be called on
multiple thread, it is plausible that init be called simultaneously. If the
loader has not be created and both CI's need that loader, we must have a sync point.

> I'm curious why this code allows for multiple calls to Init. Does that really
happen? If so is it protected from simultaneous calls?

I am not happy with this at all.  The component manager never reinits the
loader.  It does it once, then caches the loader.  It should be removed.  I will
attack this with these changes.

> A map of how the component manager uses the loaders would help - and might
allow you to avoid locking down at this level if there are cases where it is
really not needed.

Is there a tool that does this?  I started on a pad and paper and it just
started getting messy. 

> Is the goal to make this bulletproof in a free threading environment or just
sort of safe?

This is a loaded question.  This patch certianly moves us in the direction of a
more bulletproof implementation.  

Comment on attachment 77398 [details] [diff] [review]
some comment to stop others from making my same mistakes

>+    /**
>+     * Constructor
>+     * The constructor locks the given lock.  During destruction
>+     * the lock will be unlocked.

Just a thought: "acquires the given lock... the lock will be released" avoids
the "lock the lock" and "unlock the lock" semi-stutter.

Might avoid passive voice in the second sentence, as the first does: "The
destructor releases the lock."

>+    /** 
>+     * lock
>+     * Client may call this to reenter the given lock. Take special
>+     * note that locking a locked lock has undefined results.

s/reenter/reacquire/

"Reenter" should be restricted to monitors and reentrancy.  Locks are not
"reentrant" as the second sentence notes (but "you'll hang or crash" might be a
better warning than "undefined results" :-).

>+    /** 
>+     * unlock
>+     * Client may call this to unlock the given lock. Take special
>+     * note unlocking an unlocked lock has undefined results.

"release the given lock."

Thanks for doing this, r=brendan@mozilla.org with a bit more love for the
comments as proposed above.

/be
Attachment #77398 - Flags: review+
Comment on attachment 77393 [details] [diff] [review]
uses a lock to protect mDeferredComponents

r=jband. Doug has convinced me that this is 'good enough'.
Attachment #77393 - Flags: review+
> One example is the loader's init method.  Looking at
> it, you might thing that that would only be called at startup.  however,
> createInstance could wind up calling it.

When I broke loaders out, I fought with myself over this lazy-creation decision.
On the one hand, I didn't want people to pay to spin up Python or Java just
because they had a component registered that might someday be created.

On the other hand, by not doing that initialization at startup when we're
"guaranteed" to be single-theaded, we impose this locking overhead.   Though
we can still avoid the locking during the initial startup phase, if we need to.)
 Given that the dominant case is that there are only the JS and native component
loaders, maybe we should eagerly create them during single-threaded spinup and
avoid the locks.

Back when I was first trying to wedge PLDHash into the component manager, I
dreamed up a lock-free scheme that took advantage of the fact that changes to
the table (component registrations) are very rare compared to reads from it,
after the initial startup phase, by letting readers race freely, and requiring
writers to update a copy of the table and only swap it in after they'd
synchronized with all threads via a posted XPCOM event (to ensure that they were
no longer in the critical section where they might be holding references to the
original table store).  Turns out it's similar to the read-copy-update stuff
that they're starting to use for Linux, so I don't need to write too much about
it. =) (http://lse.sourceforge.net/locking/rclock.html)  We can't do this now,
by any means, but I wanted to blurb the bug before I forgot again, in case we
want to revisit this code to reduce locking costs.

I'll look at the patch now: curious to see why we need to lock around
mDeferredComponents, since I thought that was only accessed during the
single-threaded initial registration process.  (That's why I didn't lock it in
the first place, maybe a mistake!)
Comment on attachment 77393 [details] [diff] [review]
uses a lock to protect mDeferredComponents

From above:

> I'm curious why this code allows for multiple calls to Init. Does that really
happen? If so is it protected from simultaneous calls?

I am not happy with this at all.  The component manager never reinits the
loader.  It does it once, then caches the loader.  It should be removed.  I
will
attack this with these changes.

-----

I don't see that attack here: we allocate a lock in Init, but don't acquire 
it, so it would seem that we could end up racing pretty freely through there.

What's the plan for this?  As the patch stands, the only thing the lock
protects is mDeferredComponents, which really seems like it should only be
touched during single-threaded autoreg.  Nay?
okay, i didn't remove the reinit code.  I will post a new patch will which
remove the check before creating the mLock and mDllStore.

nsNativeComponentLoader::Init() code does not have a race condition as it is
only called during PlatformInit() from the nsComponentManager.  This is run only
when XPCOM startups. 


WRT the mDeferredComponents lock.  I would be happy to remove the lock if we
only supported XPCOM registeration from a single thread.  Is that where we want
to be? 
I think we absolutely want to only permit autoregistration on the main thread. 
Do we use the deferral mechanism for directed registration, such as XPInstall? 
Either way, I'm more than happy requiring people to proxy such registration to
the main thread.  Thrilled, even.
I have no problems with that.  I will make a comment in the IDL and attach a new
patch.  Stay tuned.
Attached patch patch 3Splinter Review
thread restriction of AutoReg allows us not protect mDeferredComponents.
Attachment #77393 - Attachment is obsolete: true
Comment on attachment 78396 [details] [diff] [review]
patch 3

sr=jband. I'm obviously becoming a push over :)

I'd slap an NS_ASSERTION(!mDllStore,...) in there to catch 
unexpected cheaters.
Attachment #78396 - Flags: superreview+
right.  done.
Comment on attachment 78396 [details] [diff] [review]
patch 3

r=waterson, assuming jband's assertion is there.
Attachment #78396 - Flags: review+
Checking in nsIComponentRegistrar.idl;
/cvsroot/mozilla/xpcom/components/nsIComponentRegistrar.idl,v  <-- 
nsIComponentRegistrar.idl
new revision: 1.5; previous revision: 1.4
done
Checking in nsNativeComponentLoader.cpp;
/cvsroot/mozilla/xpcom/components/nsNativeComponentLoader.cpp,v  <-- 
nsNativeComponentLoader.cpp
new revision: 1.79; previous revision: 1.78
done
Comment on attachment 77398 [details] [diff] [review]
some comment to stop others from making my same mistakes

a=asa (on behalf of drivers) for checkin to the 1.0 branch
Attachment #77398 - Flags: approval+
Comment on attachment 78396 [details] [diff] [review]
patch 3

a=asa (on behalf of drivers) for checkin to the 1.0 branch
Attachment #78396 - Flags: approval+
landed on branch:

Checking in nsIComponentRegistrar.idl;
/cvsroot/mozilla/xpcom/components/nsIComponentRegistrar.idl,v  <-- 
nsIComponentRegistrar.idl
new revision: 1.4.2.2; previous revision: 1.4.2.1
done
Checking in nsNativeComponentLoader.cpp;
/cvsroot/mozilla/xpcom/components/nsNativeComponentLoader.cpp,v  <-- 
nsNativeComponentLoader.cpp
new revision: 1.78.10.2; previous revision: 1.78.10.1
done

Status: REOPENED → RESOLVED
Closed: 22 years ago22 years ago
Resolution: --- → FIXED
adding branch resolution keyword
Keywords: fixed1.0.0
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: