Closed
Bug 209804
Opened 22 years ago
Closed 21 years ago
Assertion in xpcom/threads/nsAutoLock.cpp
Categories
(Core :: XPCOM, defect)
Tracking
()
RESOLVED
FIXED
mozilla1.7final
People
(Reporter: tenthumbs, Assigned: darin.moz)
References
Details
(Keywords: assertion, fixed-aviary1.0, fixed1.7.5)
Attachments
(3 files, 1 obsolete file)
46.41 KB,
text/plain
|
Details | |
1.15 KB,
patch
|
dougt
:
review+
|
Details | Diff | Splinter Review |
5.82 KB,
patch
|
dbaron
:
review+
dbaron
:
superreview+
asa
:
approval1.7-
roc
:
approval1.7.5+
|
Details | Diff | Splinter Review |
I started seeing this in my Linux builds at shutdown.
###!!! ASSERTION: potential deadlock between Monitor@80c25a0 and Lock@87c7390:
'Error', file nsAutoLock.cpp, line 265
Break: at file nsAutoLock.cpp, line 265
I have no idea how serious this is.
Comment 1•22 years ago
|
||
I don't have any idea either. I do not see this -- timeless, brendan?
It's disappeared, returned, and disappeared again. Right now it's gone.
Comment 4•22 years ago
|
||
important to show stacks for such assertions. I will factor stack (callsite
tree) tracing out of nsTraceMalloc, make it a primitive configurable that any
build can add and use. That should help fix bug 200972.
Someone mark this dup or worksforme, ok?
/be
Assignee | ||
Comment 5•22 years ago
|
||
tenthumbs: if you can get a stack trace, then please compare it to that in bug
210037.
Depends on: 210037
Comment 6•21 years ago
|
||
I'm seeing this and here is the stack:
Function |
Part
----------------------------------------------------------------------------+-----------------------------
_ZN11nsDebugImpl5AbortEPKci |
NSDEBUGIMPL.CPP
_ZN11nsDebugImpl5BreakEPKci |
NSDEBUGIMPL.CPP
_ZN11nsDebugImpl9AssertionEPKcS1_S1_i |
NSDEBUGIMPL.CPP
_ZN7nsDebug9AssertionEPKcS1_S1_i |
NSDEBUG.CPP
_ZN14nsAutoLockBaseC2EPvNS_14nsAutoLockTypeE |
NSAUTOLOCK.CPP
_ZN10nsAutoLockC1EP6PRLock |
NSCACHEENTRYDESCRIPTOR.CPP
_ZN22nsCacheEntryDescriptor5CloseEv |
NSCACHEENTRYDESCRIPTOR.CPP
_ZN22nsCacheEntryDescriptorD0Ev |
NSCACHEENTRYDESCRIPTOR.CPP
_ZN22nsCacheEntryDescriptor7ReleaseEv |
NSCACHEENTRYDESCRIPTOR.CPP
_ZN8nsCOMPtrI23nsICacheEntryDescriptorE22assign_assuming_AddRefEPS0_ |
IMGREQUEST.CPP
_ZN8nsCOMPtrI23nsICacheEntryDescriptorE18assign_with_AddRefEP11nsISupports |
IMGREQUEST.CPP
_ZN8nsCOMPtrI23nsICacheEntryDescriptorEaSEPS0_ |
IMGREQUEST.CPP
_ZN10imgRequest11RemoveProxyEP15imgRequestProxyji |
IMGREQUEST.CPP
_ZN15imgRequestProxy6CancelEj |
IMGREQUESTPROXY.CPP
_ZN12nsImageFrame8IconLoad8ShutdownEv |
NSLAYOUTMODULE.O
_ZN12nsImageFrame14ReleaseGlobalsEv |
NSLAYOUTMODULE.O
_Z8ShutdownP9nsIModule |
NSLAYOUTMODULE.O
_ZN15nsGenericModule8ShutdownEv |
NSGENERICFACTORY.CPP
_ZN15nsGenericModuleD1Ev |
NSGENERICFACTORY.CPP
_ZN15nsGenericModule7ReleaseEv |
NSGENERICFACTORY.CPP
_ZN5nsDll8ShutdownEv |
XCDLL.CPP
_Z13nsFreeLibraryP5nsDllP17nsIServiceManageri |
NSNATIVECOMPONENTLOADER.CPP
_Z17nsFreeLibraryEnumP9nsHashKeyPvS1_ |
NSNATIVECOMPONENTLOADER.CPP
_Z13hashEnumerateP12PLDHashTableP15PLDHashEntryHdrjPv |
NSHASHTABLE.CPP
PL_DHashTableEnumerate |
PLDHASH.C
_ZN11nsHashtable9EnumerateEPFiP9nsHashKeyPvS2_ES2_ |
NSHASHTABLE.CPP
_ZN23nsNativeComponentLoader9UnloadAllEi |
NSNATIVECOMPONENTLOADER.CPP
_ZN22nsComponentManagerImpl15UnloadLibrariesEP17nsIServiceManageri |
NSCOMPONENTMANAGER.CPP
_ZN22nsComponentManagerImpl8ShutdownEv |
NSCOMPONENTMANAGER.CPP
NS_ShutdownXPCOM |
NSXPCOMINIT.O
NS_ShutdownXPCOM |
NSXPCOMGLUE.CPP
GRE_Shutdown |
NSXPCOMGLUE.CPP
main |
NSAPPRUNNER.O
__text |
CRT0.OBJ
0x182915A6 |
libc04.dll:1
Comment 7•21 years ago
|
||
See bug 210037 comment 3, conclusion 2.
Whether we should avoid holding a cache service lock when calling into the XPCOM
component manager (conclusion 1 from that bug, applied to the cache code instead
of to NSS) is something dougt should comment upon.
It's never a good idea to call out of module while holding a lock. But if
enough code does that with XPCOM as the callee, maybe we just need to declare
XPCOM "most nested" in such locking scenarios, and all * -> XPCOM arcs in the
deadlock detection graph, but not XPCOM -> *.
/be
Comment 8•21 years ago
|
||
> XPCOM "most nested" in such locking scenarios, and all * -> XPCOM arcs in the
er, "allow * -> XPCOM arcs ...", not "all ...."
This XPCOM bug would stand for fixing the component manager to set aside the
list during shutdown, and exit its monitor; and also documenting and
evangelizing the nesting rule.
/be
Assignee | ||
Comment 9•21 years ago
|
||
(In reply to comment #7)
> See bug 210037 comment 3, conclusion 2.
>
> Whether we should avoid holding a cache service lock when calling into the XPCOM
> component manager (conclusion 1 from that bug, applied to the cache code instead
> of to NSS) is something dougt should comment upon.
>
> It's never a good idea to call out of module while holding a lock. But if
> enough code does that with XPCOM as the callee, maybe we just need to declare
> XPCOM "most nested" in such locking scenarios, and all * -> XPCOM arcs in the
> deadlock detection graph, but not XPCOM -> *.
>
> /be
i'm all for possibly changing the cache to not hold its lock while calling into
XPCOM. it's almost always a good idea in my experience to not do that. that
said, i'm not sure how easy it would be to change the cache.
Comment 10•21 years ago
|
||
Darin: I was proposing relieving you and many others from having to do that in
the particular case of calling the XPCOM component or service manager. That way
the burden would be on XPCOM to use thread-local temporaries and the like to
avoid nesting the other day.
/be
Assignee | ||
Comment 11•21 years ago
|
||
(In reply to comment #10)
> Darin: I was proposing relieving you and many others from having to do that in
> the particular case of calling the XPCOM component or service manager. That way
> the burden would be on XPCOM to use thread-local temporaries and the like to
> avoid nesting the other day.
yeah, i agree that that is a good plan. the component manager shouldn't need to
hold its monitor while shutting down. it should race to set the mShuttingDown
flag, and then all losers of that race should just return early with some error.
Comment 12•21 years ago
|
||
what darin said. we can remove the monitor in UnloadLibary and "protect"
mLoaderData behind a check against mShuttingDown. I am not sure why we would
have to create anything on the stack.
Comment 13•21 years ago
|
||
In an SMP complex with weak store order, you can't be sure everyone has seen the
store to mShuttingDown. So you can't be sure other threads aren't racing
through the hash table.
You need a memory barrier, which locks and monitors provide. So enter the
monitor, set mShuttingDown, set the hash table aside as cheaply as possible
(that is why I suggested copying the PLDHashTable to a stack temporary and
zapping the original's memory with memset(0)), leave the monitor, etc.
/be
Reporter | ||
Comment 14•21 years ago
|
||
I'm now seeing 10 of these at every shutdown. Here's all the stack traces. I'm
including all of them because they're similar but not that similar.
Comment 15•21 years ago
|
||
I just noticed this in WinXP in a debug trunk build. It might have been there
for a while since I don't normally start the debug build from the command line.
To reproduce:
Start Mozilla, select profile, close browser.
Environment is a custom debug build, vc6, ms sdk, winxp, xeon, with all
extensions enabled except xmlterm or negotiateauth and svg gdi+.
-> OS All
Stack follows:
NTDLL! 77f75a58()
nsDebugImpl::Assertion(nsDebugImpl * const 0x002e6d10, const char * 0x0012fa18,
const char * 0x100fecc4, const char * 0x100fec80, int 299) line 272
nsDebug::Assertion(const char * 0x0012fa18, const char * 0x100fecc4, const char
* 0x100fec80, int 299) line 109
nsAutoLockBase::nsAutoLockBase(void * 0x026e0f78, nsAutoLockBase::nsAutoLockType
eAutoLock) line 299 + 27 bytes
nsAutoLock::nsAutoLock(PRLock * 0x026e0f78) line 178 + 21 bytes
nsCacheEntryDescriptor::Close(nsCacheEntryDescriptor * const 0x02c0ff78) line 411
nsCacheEntryDescriptor::~nsCacheEntryDescriptor() line 51
nsCacheEntryDescriptor::`scalar deleting destructor'(unsigned int 1) + 15 bytes
nsCacheEntryDescriptor::Release(nsCacheEntryDescriptor * const 0x02c0ff78) line
36 + 144 bytes
nsCOMPtr<nsICacheEntryDescriptor>::assign_assuming_AddRef(nsICacheEntryDescriptor
* 0x00000000) line 495
nsCOMPtr<nsICacheEntryDescriptor>::assign_with_AddRef(nsISupports * 0x00000000)
line 1023
nsCOMPtr<nsICacheEntryDescriptor>::operator=(nsICacheEntryDescriptor *
0x00000000) line 608
imgRequest::RemoveProxy(imgRequestProxy * 0x02f115f0, unsigned int 2147500037,
int 0) line 168
imgRequestProxy::Cancel(imgRequestProxy * const 0x02f115f0, unsigned int
2147500037) line 209
nsImageFrame::IconLoad::Shutdown() line 287
nsImageFrame::ReleaseGlobals() line 142
Shutdown(nsIModule * 0x00a44358) line 405
nsGenericModule::Shutdown() line 344 + 10 bytes
nsGenericModule::~nsGenericModule() line 247
nsGenericModule::`scalar deleting destructor'(unsigned int 1) + 15 bytes
nsGenericModule::Release(nsGenericModule * const 0x00a44358) line 249 + 142 bytes
nsDll::Shutdown() line 380 + 18 bytes
nsFreeLibrary(nsDll * 0x00a1b058, nsIServiceManager * 0x00000000, int 3) line 275
nsFreeLibraryEnum(nsHashKey * 0x00a1b0e8, void * 0x00a1b058, void * 0x0012fe84)
line 344 + 64 bytes
hashEnumerate(PLDHashTable * 0x002ef768, PLDHashEntryHdr * 0x0099642c, unsigned
int 24, void * 0x0012fe68) line 115 + 26 bytes
PL_DHashTableEnumerate(PLDHashTable * 0x002ef768, int (PLDHashTable *,
PLDHashEntryHdr *, unsigned int, void *)* 0x10013d00 hashEnumerate(PLDHashTable
*, PLDHashEntryHdr *, unsigned int, void *), void * 0x0012fe68) line 619 + 34 bytes
nsHashtable::Enumerate(int (nsHashKey *, void *, void *)* 0x10063c30
nsFreeLibraryEnum(nsHashKey *, void *, void *), void * 0x0012fe84) line 303 + 21
bytes
nsNativeComponentLoader::UnloadAll(nsNativeComponentLoader * const 0x002ef720,
int 3) line 961
nsComponentManagerImpl::UnloadLibraries(nsIServiceManager * 0x00000000, int 3)
line 3139 + 22 bytes
nsComponentManagerImpl::Shutdown() line 878
NS_ShutdownXPCOM(nsIServiceManager * 0x00000000) line 776 + 11 bytes
NS_ShutdownXPCOM(nsIServiceManager * 0x00000000) line 184 + 10 bytes
GRE_Shutdown() line 468 + 7 bytes
main(int 1, char * * 0x002e2638) line 1724 + 5 bytes
mainCRTStartup() line 338 + 17 bytes
KERNEL32! 77e814c
OS: Linux → All
Assignee | ||
Comment 17•21 years ago
|
||
taking
Assignee: dougt → darin
Target Milestone: --- → mozilla1.7final
Assignee | ||
Comment 18•21 years ago
|
||
Two solutions for this bug come to mind:
(1) Make nsComponentManagerImpl not hold mMon while calling UnloadAll on the
component loaders.
(2) Make nsImageFrame::ReleaseGlobals be called at xpcom-shutdown time instead
of from the layout module destructor.
CC'ing bz and dbaron to get their thoughts on option #2. (They seem to have CVS
blame for the IconLoad class.)
dougt, can you comment on option #1? I don't understand why that monitor needs
to be held while calling UnloadAll. We don't seem to hold the monitor while
calling other methods on the component loaders, so why is it important in
UnloadLibraries? Does it have something to do with
nsIComponentManagerObsolete::FreeLibraries (which calls UnloadLibraries also)?
Comment 19•21 years ago
|
||
I do not remember is there is any issue in removing this monitor. Seams like
since we are calling out and not protecting anything this monitor can be safely
removed.
Looking at very old versions of this file, it seams that we:
A) use to protect a hash during unload
B) then we added a callback to the thing that was being unloaded
C) then we seperated the hash removal and the callback notification
D) during this seperatation we left the monitor in both places.
Assignee | ||
Comment 20•21 years ago
|
||
dougt: thanks for gathering that history on the file. it does seem like the
monitor is only (mainly?) protecting the hash table today. i don't mind
removing the monitor on the trunk to see if we hit anything, but will we have
enough bake time on the trunk to know that it solves the problem well enough to
be able to take the patch on the 1.7 branch? maybe we'll just have to see ...
Comment 21•21 years ago
|
||
yes, if you remove this we would want a few weeks worth of bake time. I don't
think this monitor causes any user reported problems.
Assignee | ||
Comment 22•21 years ago
|
||
ok, here's the xpcom patch.
Assignee | ||
Updated•21 years ago
|
Attachment #146309 -
Flags: superreview?(brendan)
Attachment #146309 -
Flags: review?(dougt)
Comment 23•21 years ago
|
||
Comment on attachment 146309 [details] [diff] [review]
v1 patch
Doesn't Shutdown need to use the monitor, or some kind of thread barrier
(join), to avoid racing with other threads using XPCOM?
/be
Updated•21 years ago
|
Attachment #146309 -
Flags: review?(dougt) → review+
Assignee | ||
Comment 24•21 years ago
|
||
brendan: yes, and all of those other threads join with the main thread from the
xpcom-shutdown event handler. or, in the case of the DNS threads, they detach
and reference no more xpcom code or objects after xpcom-shutdown. this patch
shouldn't be a problem for mozilla since xpcom shutdown happens on the main
thread, and all of our threads go away cleanly essentially at the time when the
xpcom-shutdown notification gets sent out. the only real concern here is what
some module destructor might do. (the module destructor for nsImageFrame
apparently calls code libpr0n, which calls code in necko.)
Module destructors should never call into any other modules, period. (Consider
what would happen if we start unloading libraries, as we probably should have
from the start.) Anything that needs to call into other modules should be done
using an "xpcom-shutdown" nsIObserver.
Assignee | ||
Comment 26•21 years ago
|
||
(In reply to comment #25)
> Module destructors should never call into any other modules, period. (Consider
> what would happen if we start unloading libraries, as we probably should have
> from the start.) Anything that needs to call into other modules should be done
> using an "xpcom-shutdown" nsIObserver.
so, then you are advocating solution #2 from comment #18?
Comment 27•21 years ago
|
||
Attachment #146441 -
Flags: superreview?(dbaron)
Attachment #146441 -
Flags: review?(dbaron)
Assignee | ||
Comment 28•21 years ago
|
||
Comment on attachment 146441 [details] [diff] [review]
comment 18 - 2
>Index: nsImageFrame.h
> static void ReleaseGlobals() {
>+ NS_IF_RELEASE(gIconLoad);
> NS_IF_RELEASE(sIOService);
> }
you didn't mean to keep the release of sIOService here right? since
it isn't valid to call into another module from a module destructor,
it cannot be valid to call sIOService->Release() from here. therefore,
this code should be removed. perhaps it makes sense to assert that
sIOService is null instead.
Attachment #146441 -
Flags: review?(dbaron) → review-
I agree with comment 28. Also, rather than adding your own observer, I'd prefer
if you used ContentShutdownObserver in nsLayoutModule.cpp. Given that, this
could be a +1/-1 patch (just change where nsImageFrame::ReleaseGlobals is called).
Attachment #146441 -
Flags: superreview?(dbaron)
Assignee | ||
Comment 30•21 years ago
|
||
So, fixing nsImageFrame::ReleaseGlobals() to be called earlier (from
ContentShutdownObserver) is not enough. We have to move other things too, like
for instance nsLayoutStylesheetCache::Shutdown(). I'm going to try moving all
of layout shutdown into the xpcom-shutdown observer.
Assignee | ||
Comment 31•21 years ago
|
||
ok, this patch forces all layout shutdown to occur from the xpcom-shutdown
observer. it seems to work :-)
Attachment #146441 -
Attachment is obsolete: true
Assignee | ||
Updated•21 years ago
|
Attachment #146309 -
Flags: superreview?(brendan)
Assignee | ||
Updated•21 years ago
|
Attachment #147352 -
Flags: superreview?(dbaron)
Attachment #147352 -
Flags: review?(dbaron)
Comment on attachment 147352 [details] [diff] [review]
v2 patch
r+sr=dbaron. We probably don't need to move everything, but it shouldn't hurt
anything, I don't think...
Attachment #147352 -
Flags: superreview?(dbaron)
Attachment #147352 -
Flags: superreview+
Attachment #147352 -
Flags: review?(dbaron)
Attachment #147352 -
Flags: review+
Assignee | ||
Comment 33•21 years ago
|
||
fixed-on-trunk
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 34•21 years ago
|
||
Comment on attachment 147352 [details] [diff] [review]
v2 patch
Requesting approval to land this on the branch, but I'd like to give this some
reasonable bake time on the trunk first.
We also might want to just skip fixing this on the branch since I don't believe
that there is any real potential for a deadlock here.
At most it would be a good patch to accept on the branch for the purpose of
eliminating up the "scary looking" assertions.
Attachment #147352 -
Flags: approval1.7?
Without a report of deadlock actually occurring, I don't see the point of fixing
this on the branch.
Reporter | ||
Comment 36•21 years ago
|
||
I still got one in my trunk build from this morning.
Assignee | ||
Comment 37•21 years ago
|
||
(In reply to comment #36)
> I still got one in my trunk build from this morning.
tenthumbs: can you get a stack trace if possible? it is likely that this
assertion could happen elsewhere, and for that reason we should probably still
land the xpcom component manager change.
Updated•21 years ago
|
Attachment #147352 -
Flags: approval1.7? → approval1.7-
Updated•21 years ago
|
Flags: blocking1.7+ → blocking1.7-
Comment 38•21 years ago
|
||
Ben wants this on aviary, I'll do the honours today.
Comment 39•21 years ago
|
||
Done.
Attachment #147352 -
Flags: approval1.7.x?
Comment on attachment 147352 [details] [diff] [review]
v2 patch
Mike, please put this on 1.7,
Attachment #147352 -
Flags: approval1.7.x? → approval1.7.x+
Updated•20 years ago
|
Keywords: fixed-aviary1.0,
fixed1.7.5
You need to log in
before you can comment on or make changes to this bug.
Description
•