Closed
Bug 194568
Opened 22 years ago
Closed 22 years ago
Creating two instances of the XULPrototype Cache
Categories
(Core :: XPCOM, defect, P1)
Core
XPCOM
Tracking
()
RESOLVED
FIXED
mozilla1.4alpha
People
(Reporter: mscott, Assigned: dougt)
References
Details
Attachments
(2 files, 1 obsolete file)
|
6.46 KB,
patch
|
alecf
:
superreview+
|
Details | Diff | Splinter Review |
|
4.39 KB,
patch
|
dbaron
:
review+
|
Details | Diff | Splinter Review |
I'm able to produce a scenario where we end up creating two instances of the
XULPrototype Cache such that the XBLService has one instance and XBLListeners
(and other parts of the product) use another instance.
The symptom of this problem is an infinite loop where we end up trying to reload
the same set of xbl bindings over and over again.
Here's how this situation can happen.
1) In nsChromeProtocolHandler::NewChannel, we create the an instance of the XUL
prototype cache service:
nsCOMPtr<nsIXULPrototypeCache> cache =
do_GetService(kXULPrototypeCacheCID, &rv);
IF we have not previously created a prototype cache, we do the following:
2) Go to the service manager which detects that we have not created an instance yet.
3) The xul prototype cache is registered by the layout module. So the Component
manager loads layout and initializes the layout module
4) The act of initializing the layout module causes the static method:
CSSFrameConstructor::InitGlobals to get called which creates an instance of the
XBL Service:
static nsresult InitGlobals() { return CallGetService("@mozilla.org/xbl;1",
&gXBLService); }
5) Now the service manager has not created the xbl service yet, so we got to the
component manager which creates our first instance. In the constructor for the
XBL Service, we attempt to create a XUL Prototype Cache:
nsXBLService::nsXBLService()
{
....
CallGetService("@mozilla.org/xul/xul-prototype-cache;1", &gXULCache);
}
6) This causes us to go back into the service manager, we still have not created
an instance of the xul prototype cache yet since we are still on the stack from
our attempt to create one back in step (1). So we create an instance and give it
back to the XBL Service.
7) Now the stack unwinds back down to the component manager's attempt to create
the xul prototype cache for the chrome protocol handler. We've successfully
loaded the layout module, so we now ask it to create the 2nd instance of the cache.
We now have two instances of the cache running at the same time. In my
particular case, nsXBLListener's end up using a different instance from the XBL
service, and this causes us to never find any bindings in the cache.
Comment 1•22 years ago
|
||
> So the Component manager loads layout and initializes the layout module
So we seem to have a few options:
1) Move the XUL prototype cache into the chrome stuff so it does not require
layout to be loaded
2) Find whoever is trying to open a chrome channel before layout has been inited
and make them stop. Who's the caller of NewChannel?
3) Make some of those getService calls lazy....
And this has come up before; it would be good to have the service manager
asserting if it's reentered for the same contractid/cid.
Comment 2•22 years ago
|
||
bz's questions are all good, but I have one too (I hope it's not dumb):
Why doesn't nsComponentManagerImpl::GetService in the outer activation (step 1,
the one looking by CID) re-try the lookup (around line 2006 of
nsComponentManger.cpp) and find the instance created by contract-id by the inner
activation (the one looking up by contract-id).
/be
| Assignee | ||
Comment 3•22 years ago
|
||
scott - can you provide a test case. I would be happy to debug this problem.
Calling GetService(CID) followed by GetService(ContractID) is known to work. I
just tested it on the 1.3 branch and it worked fine. Although these object's
are suppose to be used as services, this is the simple test i used:
static NS_DEFINE_CID(kPRBoolCID, NS_SUPPORTS_PRBOOL_CID);
int
main(int argc, char* argv[])
{
NS_InitXPCOM2(NULL, nsnull, nsnull);
nsCOMPtr<nsISupportsPRBool> a = do_GetService(kPRBoolCID);
nsCOMPtr<nsISupportsPRBool> b = do_GetService(NS_SUPPORTS_PRBOOL_CONTRACTID);
NS_ASSERTION(a == b, "ERROR!");
NS_ShutdownXPCOM(nsnull);
return 0;
}
It is a known problem that CreateInstance isn't thread safe. This problem may
be do to the factory implementation not properly implementing CreateInstance (or
rather, the generic module code). See bug 5795. Looking at the "constructor"
function of the xul prototype cache (the function that is called by the generic
module code), there is nothing to protect two objects from being created:
http://lxr.mozilla.org/seamonkey/source/content/xul/document/src/nsXULPrototypeCache.cpp#213
Scott, it would be interesting to just cache the value of the new and return it
if this function is called again.
Comment 4•22 years ago
|
||
> Calling GetService(CID) followed by GetService(ContractID) is known to work.
In this case the second GetService call comes from _inside_ the first GetService
call....
Comment 5•22 years ago
|
||
dougt: this is a legitimate bug, one that I've encountered myself before. Its a
hard problem to solve though. The call stack would look like this (theoretical..
I'm writing this off the top of my head)
do_GetService("component1")
nsComponentManager::CreateInstanceByCID()
nsGenericFactory::CreateInstance()
nsSomeServiceConstructor()
do_GetService("component2")
nsComponentManager::CreateInstanceByCID()
nsGenericFactory::CreateInstance()
nsSomeServiceConstructor()
do_GetService("component1")
what needs to happen is that nsComponentManager::CreateInstanceByCID needs to
keep a stack of services (probably just a reference to the CID that got passed
in each time) that it is in the process of creating, and when someone calls
do_GetService the list needs to be walked.. if the service in question is found,
then an assertion should be thrown and an error returned.
The component manager can't cache the instance of the service in question
because the initial object hasn't been returned by the factory constructor method.
| Reporter | ||
Comment 6•22 years ago
|
||
Boris was curious about who was calling nsChromeProtocolHandler::NewChannel
before layout has been initialized. In my particular case, I had two prefs set
in my prefs file:
user_pref("capability.principal.codebase.p1.granted", "UniversalXPConnect");
user_pref("capability.principal.codebase.p0.id", "http://kezwald.mcom.com");
On startup, we first ensure that we have a profile, which causes prefs to load.
Our script security manager is a prefs observer, looking for changes to
capability.principal.codebase.*. It gets notified when prefs load. The security
manager attempts to create a new URI out of http://kezwald.mcom.com. The http
protocol handler is created and during it's initialization, it attempts to load
a navigator string bundle in order to determine our locale. Loading the chrome
resource for the string bundle causes us to enter the chrome protocol handler.
Then we enter the scenario I described in the initial bug description.
Alec, your theoretical stack trace looked pretty much like what I had on my
machine over the weekend when I was running into this.
| Assignee | ||
Comment 7•22 years ago
|
||
over to me.
alec, thank you for the clarification. I will look into it this week.
Assignee: hyatt → dougt
Severity: normal → critical
Component: XP Toolkit/Widgets: XUL → XPCOM
OS: Windows 2000 → All
Priority: -- → P1
Hardware: PC → All
Target Milestone: --- → mozilla1.4alpha
Comment 8•22 years ago
|
||
I agree we should have assertions, but they may not indicate how to fix the
underlying problem.
One thought: for
nsSomeServiceConstructor()
do_GetService("component2")
we could require nsSomeServiceConstructor to re-lookup the service instance it
is trying to create, to see whether it was created by the nested call into the
component manager. Is there a way to do that?
/be
| Assignee | ||
Comment 9•22 years ago
|
||
we do provide a |isServiceInstantiated| method.
| Assignee | ||
Comment 10•22 years ago
|
||
i think we have to protect both the gs(cid) path, as well as the gs(contractID)
path.
| Assignee | ||
Comment 11•22 years ago
|
||
(a) instead of calling FindFactory() which only returns a nsIFactory, I am
instead calling what FindFactory does so that I can do two things - get access
to the CID in the CI(contractID) case and assert if someone is calling CI when
there exist a service for this same CID (a assertion nothing more).
(b) I am blocking all reentrant CID creation. I think this is what we want.
Can you look over the patch and try it out?
Alec - should I be using a nsSmallVoidArray?
Comment 12•22 years ago
|
||
I wonder if it would be faster to use the nsDoubleHashSet or whatever that
jkeiser invented.. it might be faster given that we don't want to spend all our
time looking up and down an array? But maybe for small sets its irrelevant? hmm...
as for nsSmallVoidArray, probably not - if anything you probably want
nsAutoVoidArray.. nsSmallVoidArray is for structures that might never have any
values, so a zero-length array takes up only a word as long as you never add
data to it.
| Assignee | ||
Comment 13•22 years ago
|
||
the list is quite short.
Updated•22 years ago
|
Attachment #116435 -
Flags: superreview?(alecf)
Comment 14•22 years ago
|
||
Comment on attachment 116435 [details] [diff] [review]
patch v.1
I'm not sure I agree that we need an assertion when you do a CreateInstance()
for something that is already a service - I think this occurs naturally in the
code.. perhaps a warning so we don't bug the crap out of developers
The other thing I was thinking was that perhaps the CID-checking should be
debug-only...if not for performance (which I suppose we'll see on tinderbox if
its an issue) - but also because we are CPU-intense as it is and this only
makes us worse at a key point (creation of objects) in XPCOM - if the
architecture of some component is flawed enough to cause this assertion, then
hopefully we'd start catching it in a debug build.
sr=alecf with the assertion changed to a warning, and optionally with the CID
list thing made debug-only.
Attachment #116435 -
Flags: superreview?(alecf) → superreview+
| Assignee | ||
Comment 15•22 years ago
|
||
I can agree with making this all debug-only. However, the problem will still
exist where you can create two instances of the XUL prototype cache by the steps
mscott outlined. Someone is going to have to solve the problem in the xul
prototype cache.
It is also fair to note that some developers will not be using a debug version
of XPCOM so detection of this class of bug will not occur. Also, we have to
consider the case where all component configuration combinations are not be
tested. For example, a user can download a dozen different components from
mozdev.org and we have no real clue how they interact. Any cycle that is
produce may lead to this class of problems. Returning an error from XPCOM
allows these cases to unwind instead of a much worse alternative.
I am going to check this in as debug only for now. Enabling it is something we
can consider tomorrow.
Flags: blocking1.4a?
| Assignee | ||
Comment 16•22 years ago
|
||
Checking in nsComponentManager.cpp;
/cvsroot/mozilla/xpcom/components/nsComponentManager.cpp,v <--
nsComponentManager.cpp
new revision: 1.226; previous revision: 1.225
done
Checking in nsComponentManager.h;
/cvsroot/mozilla/xpcom/components/nsComponentManager.h,v <-- nsComponentManager.h
new revision: 1.85; previous revision: 1.84
done
Comment 17•22 years ago
|
||
Um... there should never be a CreateInstance call for _services_ unless it's
called from GetService. There should never be a GetService for an existing
service calling CreateInstance unless we are recursing (as in this bug). Any
instance of such a recursion is a proble, in my opinion and needs to be detected
and either asserted on or (preferably) handled gracefully.
| Assignee | ||
Comment 18•22 years ago
|
||
bz - that is what we are now doing in debug builds. the question now stands -
do we do this in opt builds?
Comment 19•22 years ago
|
||
Right. Sorry, I didn't make it clear I was replying to comment 14's suggestion
to make it a warning instead of an assert.
Comment 20•22 years ago
|
||
no, I totally disagree. what I'm saying is that it IS reasonable to have
multiple instances of a class as well as have a service version of the class.
There's no reason that XPCOM should prevent that..
For example, something that I'm working on right now for bug 176559 - we need a
global nsINodeInfoManager to keep track of (and cache) the most commonly
accessed nsINodeInfo objects. But we also need a per-document nsINodeInfoManager
for nodes which are not in the "commonly accessed" list. The simplest and most
straight forward way to maintain the global version is to just do a GetService()
and let XPCOM handle the singleton aspect of that particular instance. From
there, each document will create their own nsINodeInfoManagers and refer to both
it and to the service version.
While it is certainly incorrect to call CreateInstance() on a object that is
SUPPOSED to be a service, I don't think you can make the blanket statement that
all class that are instantiated somewhere via GetService() cannot be
CreateInstanced()'d elsewhere.
Comment 21•22 years ago
|
||
ugh, and just for this reason I asked that the NS_ASSERTION be a NS_WARNING as a
part of the review. What happened there?
| Assignee | ||
Comment 22•22 years ago
|
||
alec, you must compile with SHOW_CI_ON_EXISTING_SERVICE defined to have this
extra ci check.
Where should this bug go now? Who owns the xul prototype cache or who has a
good argument why the reentrant CID creation check should be enabled?
| Assignee | ||
Comment 23•22 years ago
|
||
brendan - do you have any comments?
This seems like it's just the problem that was discussed in bug 5795 and bug
128646, so it's good to see a fix coming. It looks like it's already landed in
ifdefs, although I'm wondering why the SHOW_CI_ON_EXISTING_SERVICE ifdef doesn't
cover more of the code.
However, I wonder (along the lines of alecf's comments) why these checks aren't
in the GetService functions instead of the CreateInstance functions. I don't
think you'd want multiple calls to CreateInstance that happen to occur on
different threads at the same time to fail.
| Assignee | ||
Comment 25•22 years ago
|
||
dbaron - see comment 11 (b). Maybe it isn't what we want. :-)
Updated•22 years ago
|
Flags: blocking1.4a? → blocking1.4a-
| Reporter | ||
Comment 26•22 years ago
|
||
I still run into this infinite loop. Just found another code path that causes it
to happen. I'm always getting bit by it because the act of loading the layout
module, causes nsCSSFrameConstructor to get created which calls
nsCSSFrameConstructor::InitGlobals. InitGlobals calls get service on the xbl
service which in turn creates a xul prototype cache in it's constructor.
Any time a xul prototype cache object is instantiated through get service before
layout has been loaded, causes this infinite loop.
Just found a new case with minotaur where this happens if we don't have a
minotaur profile.
dbaron, is there a way we can break this cycle between the CSS frame
constructor, the xbl service and the xul prototype cache?
| Reporter | ||
Comment 27•22 years ago
|
||
Here's a less complicated stack trace from the one I showed before when I filed
the bug.
nsChromeRegistry::FlushCaches gets an instance of the xul prototype cache which
kicks off the chain of events I described in my previous comment 25.
nsChromeRegistry::FlushCaches() line 1420 + 30 bytes
nsChromeRegistry::LoadProfileDataSource() line 3219 + 11 bytes
nsChromeRegistry::Observe(nsChromeRegistry * const 0x00b6fee4, nsISupports *
0x00aefae0, const char * 0x0132f7e8, const unsigned short * 0x0132f668) line
3556 + 11 bytes
nsObserverService::NotifyObservers(nsObserverService * const 0x00a92980,
nsISupports * 0x00aefae0, const char * 0x0132f7e8, const unsigned short *
0x0132f668) line 212
nsProfile::SetCurrentProfile(nsProfile * const 0x00aefae0, const unsigned short
* 0x00b2ba08) line 1260
nsProfile::LoadDefaultProfileDir(nsCString & {...}, int 0x00000001) line 548 +
28 bytes
nsProfile::StartupWithArgs(nsProfile * const 0x00aefae0, nsICmdLineService *
0x00af0230, int 0x00000001) line 356 + 16 bytes
nsAppShellService::DoProfileStartup(nsAppShellService * const 0x00b5c3c8,
nsICmdLineService * 0x00af0230, int 0x00000001) line 273 + 31 bytes
InitializeProfileService(nsICmdLineService * 0x00af0230) line 908 + 31 bytes
main1(int 0x00000001, char * * 0x003677b8, nsISupports * 0x00ad7d68, const
nsXREAppData & {...}) line 1189 + 14 bytes
xre_main(int 0x00000001, char * * 0x003677b8, const nsXREAppData & {...}) line
1692 + 41 bytes
The easy fix is to make gXBLService lazily initialized. We'd have to fix this
anyway once the XPCOM bug is fixed.
| Assignee | ||
Comment 29•22 years ago
|
||
dbaron - see comment #20. This is all disabled. Are you suggestion that you
would want to see these checks enabled by default?
Yes, definitely, for threadsafety. I still think the checks are wrong, though.
| Assignee | ||
Comment 31•22 years ago
|
||
i have no problem enabling as long as performance isn't hit hard. alec, agree?
Comment 32•22 years ago
|
||
ok, so I'm all for turning on the checks for when a reentrant service creation
happens (i.e. you're creating service "Foo" which has the side effect of
creating service "bar" which in turn requests service "Foo") - but ONLY for
services. Not for normal createinstance stuff.
| Reporter | ||
Comment 33•22 years ago
|
||
Doug, in a debug build, I still run into this problem. I just noticed that
XPCOM_CHECK_PENDING_CIDS is defined for debug builds. Should I still be seeing
this problem with two instances of a service getting created with this define
enabled?
| Assignee | ||
Comment 34•22 years ago
|
||
scott, yes. you should be seeing an assertion, but as dbaron mentioned, the
check are in the wrong place.
| Reporter | ||
Comment 35•22 years ago
|
||
With this new updated patch Doug and I came up with, I now run into the xpcom
warning. The add and remove pendind CIDs needed to be placed in GetService not
in CreateInstance.
Note: The define is still only turned on for debug builds.
| Assignee | ||
Updated•22 years ago
|
Attachment #120003 -
Flags: review?(dbaron)
Comment on attachment 120003 [details] [diff] [review]
move add/remove pending cds into getservice not create instance
Both of the AddPendingCID sections are mis-indented. With this patch, all
AddPendingCID and RemovePendingCID calls are within the monitor, so I think you
should document that the caller must be inside the monitor and remove the
nsAutoMonitor inside AddPendingCID and RemovePendingCID. The NS_WARNING in
AddPendingCID should also mention GetService rather than CI, and should also
mention the possibility that the assertion is triggered by near-simultaneous
calls to GetService on multiple threads. (I've seen that happen.) I also
think you should get rid of the ifdefs.
| Assignee | ||
Comment 37•22 years ago
|
||
I added your suggestions, but stilll haven't removed the #ifdef's.
Attachment #120003 -
Attachment is obsolete: true
Attachment #120012 -
Flags: review+
Attachment #120003 -
Flags: review?(dbaron) → review-
| Reporter | ||
Comment 38•22 years ago
|
||
I filed Bug #201891 to break the XULPrototypeCache scenario which causes us to
try to create two instances of the xul prototype cache. If we fix that bug then
we won't see the warning message being introduced by Doug's patch here. And
Minotaur won't have these infinite loop issues =).
| Assignee | ||
Comment 39•22 years ago
|
||
i check the fix mscott and I worked on:
Checking in nsComponentManager.cpp;
/cvsroot/mozilla/xpcom/components/nsComponentManager.cpp,v <--
nsComponentManager.cpp
new revision: 1.231; previous revision: 1.230
done
marking as fixed.
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Comment 40•21 years ago
|
||
so, what does it mean if i hit this warning:
NS_WARNING("Creation in progress (Reentrant GS - see bug 194568)");
i am basically calling GetServiceByContractID for the same service on multiple
threads. does it mean that this bug isn't quite fixed?
Comment 41•21 years ago
|
||
It seems that it is not possible to call GetService for the same service from
two threads at the same time. The problem goes away if I create it once from
the main thread before spawning the worker threads. Afterwards it is ok if the
worker threads call GetService to acquire a reference to the service. Perhaps
we should enforce this somehow?
Comment 42•21 years ago
|
||
Darin, can you file a bug? It sounds like we simply are not distinguishing
reentry on the same thread from races among threads. If AddPending associated a
thread-id with the pending CID ....
/be
You need to log in
before you can comment on or make changes to this bug.
Description
•