Closed Bug 209804 Opened 22 years ago Closed 21 years ago

Assertion in xpcom/threads/nsAutoLock.cpp

Categories

(Core :: XPCOM, defect)

x86
All
defect
Not set
major

Tracking

()

RESOLVED FIXED
mozilla1.7final

People

(Reporter: tenthumbs, Assigned: darin.moz)

References

Details

(Keywords: assertion, fixed-aviary1.0, fixed1.7.5)

Attachments

(3 files, 1 obsolete file)

I started seeing this in my Linux builds at shutdown. ###!!! ASSERTION: potential deadlock between Monitor@80c25a0 and Lock@87c7390: 'Error', file nsAutoLock.cpp, line 265 Break: at file nsAutoLock.cpp, line 265 I have no idea how serious this is.
I don't have any idea either. I do not see this -- timeless, brendan?
It's disappeared, returned, and disappeared again. Right now it's gone.
offhand this is probably the psm lock bug. darin filed it
important to show stacks for such assertions. I will factor stack (callsite tree) tracing out of nsTraceMalloc, make it a primitive configurable that any build can add and use. That should help fix bug 200972. Someone mark this dup or worksforme, ok? /be
tenthumbs: if you can get a stack trace, then please compare it to that in bug 210037.
Depends on: 210037
Blocks: 160540
I'm seeing this and here is the stack: Function | Part ----------------------------------------------------------------------------+----------------------------- _ZN11nsDebugImpl5AbortEPKci | NSDEBUGIMPL.CPP _ZN11nsDebugImpl5BreakEPKci | NSDEBUGIMPL.CPP _ZN11nsDebugImpl9AssertionEPKcS1_S1_i | NSDEBUGIMPL.CPP _ZN7nsDebug9AssertionEPKcS1_S1_i | NSDEBUG.CPP _ZN14nsAutoLockBaseC2EPvNS_14nsAutoLockTypeE | NSAUTOLOCK.CPP _ZN10nsAutoLockC1EP6PRLock | NSCACHEENTRYDESCRIPTOR.CPP _ZN22nsCacheEntryDescriptor5CloseEv | NSCACHEENTRYDESCRIPTOR.CPP _ZN22nsCacheEntryDescriptorD0Ev | NSCACHEENTRYDESCRIPTOR.CPP _ZN22nsCacheEntryDescriptor7ReleaseEv | NSCACHEENTRYDESCRIPTOR.CPP _ZN8nsCOMPtrI23nsICacheEntryDescriptorE22assign_assuming_AddRefEPS0_ | IMGREQUEST.CPP _ZN8nsCOMPtrI23nsICacheEntryDescriptorE18assign_with_AddRefEP11nsISupports | IMGREQUEST.CPP _ZN8nsCOMPtrI23nsICacheEntryDescriptorEaSEPS0_ | IMGREQUEST.CPP _ZN10imgRequest11RemoveProxyEP15imgRequestProxyji | IMGREQUEST.CPP _ZN15imgRequestProxy6CancelEj | IMGREQUESTPROXY.CPP _ZN12nsImageFrame8IconLoad8ShutdownEv | NSLAYOUTMODULE.O _ZN12nsImageFrame14ReleaseGlobalsEv | NSLAYOUTMODULE.O _Z8ShutdownP9nsIModule | NSLAYOUTMODULE.O _ZN15nsGenericModule8ShutdownEv | NSGENERICFACTORY.CPP _ZN15nsGenericModuleD1Ev | NSGENERICFACTORY.CPP _ZN15nsGenericModule7ReleaseEv | NSGENERICFACTORY.CPP _ZN5nsDll8ShutdownEv | XCDLL.CPP _Z13nsFreeLibraryP5nsDllP17nsIServiceManageri | NSNATIVECOMPONENTLOADER.CPP _Z17nsFreeLibraryEnumP9nsHashKeyPvS1_ | NSNATIVECOMPONENTLOADER.CPP _Z13hashEnumerateP12PLDHashTableP15PLDHashEntryHdrjPv | NSHASHTABLE.CPP PL_DHashTableEnumerate | PLDHASH.C _ZN11nsHashtable9EnumerateEPFiP9nsHashKeyPvS2_ES2_ | NSHASHTABLE.CPP _ZN23nsNativeComponentLoader9UnloadAllEi | NSNATIVECOMPONENTLOADER.CPP _ZN22nsComponentManagerImpl15UnloadLibrariesEP17nsIServiceManageri | NSCOMPONENTMANAGER.CPP _ZN22nsComponentManagerImpl8ShutdownEv | NSCOMPONENTMANAGER.CPP NS_ShutdownXPCOM | NSXPCOMINIT.O NS_ShutdownXPCOM | NSXPCOMGLUE.CPP GRE_Shutdown | NSXPCOMGLUE.CPP main | NSAPPRUNNER.O __text | CRT0.OBJ 0x182915A6 | libc04.dll:1
See bug 210037 comment 3, conclusion 2. Whether we should avoid holding a cache service lock when calling into the XPCOM component manager (conclusion 1 from that bug, applied to the cache code instead of to NSS) is something dougt should comment upon. It's never a good idea to call out of module while holding a lock. But if enough code does that with XPCOM as the callee, maybe we just need to declare XPCOM "most nested" in such locking scenarios, and all * -> XPCOM arcs in the deadlock detection graph, but not XPCOM -> *. /be
> XPCOM "most nested" in such locking scenarios, and all * -> XPCOM arcs in the er, "allow * -> XPCOM arcs ...", not "all ...." This XPCOM bug would stand for fixing the component manager to set aside the list during shutdown, and exit its monitor; and also documenting and evangelizing the nesting rule. /be
(In reply to comment #7) > See bug 210037 comment 3, conclusion 2. > > Whether we should avoid holding a cache service lock when calling into the XPCOM > component manager (conclusion 1 from that bug, applied to the cache code instead > of to NSS) is something dougt should comment upon. > > It's never a good idea to call out of module while holding a lock. But if > enough code does that with XPCOM as the callee, maybe we just need to declare > XPCOM "most nested" in such locking scenarios, and all * -> XPCOM arcs in the > deadlock detection graph, but not XPCOM -> *. > > /be i'm all for possibly changing the cache to not hold its lock while calling into XPCOM. it's almost always a good idea in my experience to not do that. that said, i'm not sure how easy it would be to change the cache.
Darin: I was proposing relieving you and many others from having to do that in the particular case of calling the XPCOM component or service manager. That way the burden would be on XPCOM to use thread-local temporaries and the like to avoid nesting the other day. /be
(In reply to comment #10) > Darin: I was proposing relieving you and many others from having to do that in > the particular case of calling the XPCOM component or service manager. That way > the burden would be on XPCOM to use thread-local temporaries and the like to > avoid nesting the other day. yeah, i agree that that is a good plan. the component manager shouldn't need to hold its monitor while shutting down. it should race to set the mShuttingDown flag, and then all losers of that race should just return early with some error.
what darin said. we can remove the monitor in UnloadLibary and "protect" mLoaderData behind a check against mShuttingDown. I am not sure why we would have to create anything on the stack.
In an SMP complex with weak store order, you can't be sure everyone has seen the store to mShuttingDown. So you can't be sure other threads aren't racing through the hash table. You need a memory barrier, which locks and monitors provide. So enter the monitor, set mShuttingDown, set the hash table aside as cheaply as possible (that is why I suggested copying the PLDHashTable to a stack temporary and zapping the original's memory with memset(0)), leave the monitor, etc. /be
Attached file stack traces
I'm now seeing 10 of these at every shutdown. Here's all the stack traces. I'm including all of them because they're similar but not that similar.
I just noticed this in WinXP in a debug trunk build. It might have been there for a while since I don't normally start the debug build from the command line. To reproduce: Start Mozilla, select profile, close browser. Environment is a custom debug build, vc6, ms sdk, winxp, xeon, with all extensions enabled except xmlterm or negotiateauth and svg gdi+. -> OS All Stack follows: NTDLL! 77f75a58() nsDebugImpl::Assertion(nsDebugImpl * const 0x002e6d10, const char * 0x0012fa18, const char * 0x100fecc4, const char * 0x100fec80, int 299) line 272 nsDebug::Assertion(const char * 0x0012fa18, const char * 0x100fecc4, const char * 0x100fec80, int 299) line 109 nsAutoLockBase::nsAutoLockBase(void * 0x026e0f78, nsAutoLockBase::nsAutoLockType eAutoLock) line 299 + 27 bytes nsAutoLock::nsAutoLock(PRLock * 0x026e0f78) line 178 + 21 bytes nsCacheEntryDescriptor::Close(nsCacheEntryDescriptor * const 0x02c0ff78) line 411 nsCacheEntryDescriptor::~nsCacheEntryDescriptor() line 51 nsCacheEntryDescriptor::`scalar deleting destructor'(unsigned int 1) + 15 bytes nsCacheEntryDescriptor::Release(nsCacheEntryDescriptor * const 0x02c0ff78) line 36 + 144 bytes nsCOMPtr<nsICacheEntryDescriptor>::assign_assuming_AddRef(nsICacheEntryDescriptor * 0x00000000) line 495 nsCOMPtr<nsICacheEntryDescriptor>::assign_with_AddRef(nsISupports * 0x00000000) line 1023 nsCOMPtr<nsICacheEntryDescriptor>::operator=(nsICacheEntryDescriptor * 0x00000000) line 608 imgRequest::RemoveProxy(imgRequestProxy * 0x02f115f0, unsigned int 2147500037, int 0) line 168 imgRequestProxy::Cancel(imgRequestProxy * const 0x02f115f0, unsigned int 2147500037) line 209 nsImageFrame::IconLoad::Shutdown() line 287 nsImageFrame::ReleaseGlobals() line 142 Shutdown(nsIModule * 0x00a44358) line 405 nsGenericModule::Shutdown() line 344 + 10 bytes nsGenericModule::~nsGenericModule() line 247 nsGenericModule::`scalar deleting destructor'(unsigned int 1) + 15 bytes nsGenericModule::Release(nsGenericModule * const 0x00a44358) line 249 + 142 bytes nsDll::Shutdown() line 380 + 18 bytes nsFreeLibrary(nsDll * 0x00a1b058, nsIServiceManager * 0x00000000, int 3) line 275 nsFreeLibraryEnum(nsHashKey * 0x00a1b0e8, void * 0x00a1b058, void * 0x0012fe84) line 344 + 64 bytes hashEnumerate(PLDHashTable * 0x002ef768, PLDHashEntryHdr * 0x0099642c, unsigned int 24, void * 0x0012fe68) line 115 + 26 bytes PL_DHashTableEnumerate(PLDHashTable * 0x002ef768, int (PLDHashTable *, PLDHashEntryHdr *, unsigned int, void *)* 0x10013d00 hashEnumerate(PLDHashTable *, PLDHashEntryHdr *, unsigned int, void *), void * 0x0012fe68) line 619 + 34 bytes nsHashtable::Enumerate(int (nsHashKey *, void *, void *)* 0x10063c30 nsFreeLibraryEnum(nsHashKey *, void *, void *), void * 0x0012fe84) line 303 + 21 bytes nsNativeComponentLoader::UnloadAll(nsNativeComponentLoader * const 0x002ef720, int 3) line 961 nsComponentManagerImpl::UnloadLibraries(nsIServiceManager * 0x00000000, int 3) line 3139 + 22 bytes nsComponentManagerImpl::Shutdown() line 878 NS_ShutdownXPCOM(nsIServiceManager * 0x00000000) line 776 + 11 bytes NS_ShutdownXPCOM(nsIServiceManager * 0x00000000) line 184 + 10 bytes GRE_Shutdown() line 468 + 7 bytes main(int 1, char * * 0x002e2638) line 1724 + 5 bytes mainCRTStartup() line 338 + 17 bytes KERNEL32! 77e814c
OS: Linux → All
I want this looked at for 1.7.
Severity: normal → major
Flags: blocking1.7+
taking
Assignee: dougt → darin
Target Milestone: --- → mozilla1.7final
Two solutions for this bug come to mind: (1) Make nsComponentManagerImpl not hold mMon while calling UnloadAll on the component loaders. (2) Make nsImageFrame::ReleaseGlobals be called at xpcom-shutdown time instead of from the layout module destructor. CC'ing bz and dbaron to get their thoughts on option #2. (They seem to have CVS blame for the IconLoad class.) dougt, can you comment on option #1? I don't understand why that monitor needs to be held while calling UnloadAll. We don't seem to hold the monitor while calling other methods on the component loaders, so why is it important in UnloadLibraries? Does it have something to do with nsIComponentManagerObsolete::FreeLibraries (which calls UnloadLibraries also)?
I do not remember is there is any issue in removing this monitor. Seams like since we are calling out and not protecting anything this monitor can be safely removed. Looking at very old versions of this file, it seams that we: A) use to protect a hash during unload B) then we added a callback to the thing that was being unloaded C) then we seperated the hash removal and the callback notification D) during this seperatation we left the monitor in both places.
dougt: thanks for gathering that history on the file. it does seem like the monitor is only (mainly?) protecting the hash table today. i don't mind removing the monitor on the trunk to see if we hit anything, but will we have enough bake time on the trunk to know that it solves the problem well enough to be able to take the patch on the 1.7 branch? maybe we'll just have to see ...
yes, if you remove this we would want a few weeks worth of bake time. I don't think this monitor causes any user reported problems.
Attached patch v1 patchSplinter Review
ok, here's the xpcom patch.
Attachment #146309 - Flags: superreview?(brendan)
Attachment #146309 - Flags: review?(dougt)
Comment on attachment 146309 [details] [diff] [review] v1 patch Doesn't Shutdown need to use the monitor, or some kind of thread barrier (join), to avoid racing with other threads using XPCOM? /be
Attachment #146309 - Flags: review?(dougt) → review+
brendan: yes, and all of those other threads join with the main thread from the xpcom-shutdown event handler. or, in the case of the DNS threads, they detach and reference no more xpcom code or objects after xpcom-shutdown. this patch shouldn't be a problem for mozilla since xpcom shutdown happens on the main thread, and all of our threads go away cleanly essentially at the time when the xpcom-shutdown notification gets sent out. the only real concern here is what some module destructor might do. (the module destructor for nsImageFrame apparently calls code libpr0n, which calls code in necko.)
Module destructors should never call into any other modules, period. (Consider what would happen if we start unloading libraries, as we probably should have from the start.) Anything that needs to call into other modules should be done using an "xpcom-shutdown" nsIObserver.
(In reply to comment #25) > Module destructors should never call into any other modules, period. (Consider > what would happen if we start unloading libraries, as we probably should have > from the start.) Anything that needs to call into other modules should be done > using an "xpcom-shutdown" nsIObserver. so, then you are advocating solution #2 from comment #18?
Attached patch comment 18 - 2 (obsolete) — Splinter Review
Attachment #146441 - Flags: superreview?(dbaron)
Attachment #146441 - Flags: review?(dbaron)
Comment on attachment 146441 [details] [diff] [review] comment 18 - 2 >Index: nsImageFrame.h > static void ReleaseGlobals() { >+ NS_IF_RELEASE(gIconLoad); > NS_IF_RELEASE(sIOService); > } you didn't mean to keep the release of sIOService here right? since it isn't valid to call into another module from a module destructor, it cannot be valid to call sIOService->Release() from here. therefore, this code should be removed. perhaps it makes sense to assert that sIOService is null instead.
Attachment #146441 - Flags: review?(dbaron) → review-
I agree with comment 28. Also, rather than adding your own observer, I'd prefer if you used ContentShutdownObserver in nsLayoutModule.cpp. Given that, this could be a +1/-1 patch (just change where nsImageFrame::ReleaseGlobals is called).
Attachment #146441 - Flags: superreview?(dbaron)
So, fixing nsImageFrame::ReleaseGlobals() to be called earlier (from ContentShutdownObserver) is not enough. We have to move other things too, like for instance nsLayoutStylesheetCache::Shutdown(). I'm going to try moving all of layout shutdown into the xpcom-shutdown observer.
Attached patch v2 patchSplinter Review
ok, this patch forces all layout shutdown to occur from the xpcom-shutdown observer. it seems to work :-)
Attachment #146441 - Attachment is obsolete: true
Attachment #146309 - Flags: superreview?(brendan)
Attachment #147352 - Flags: superreview?(dbaron)
Attachment #147352 - Flags: review?(dbaron)
Comment on attachment 147352 [details] [diff] [review] v2 patch r+sr=dbaron. We probably don't need to move everything, but it shouldn't hurt anything, I don't think...
Attachment #147352 - Flags: superreview?(dbaron)
Attachment #147352 - Flags: superreview+
Attachment #147352 - Flags: review?(dbaron)
Attachment #147352 - Flags: review+
fixed-on-trunk
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Comment on attachment 147352 [details] [diff] [review] v2 patch Requesting approval to land this on the branch, but I'd like to give this some reasonable bake time on the trunk first. We also might want to just skip fixing this on the branch since I don't believe that there is any real potential for a deadlock here. At most it would be a good patch to accept on the branch for the purpose of eliminating up the "scary looking" assertions.
Attachment #147352 - Flags: approval1.7?
Without a report of deadlock actually occurring, I don't see the point of fixing this on the branch.
I still got one in my trunk build from this morning.
(In reply to comment #36) > I still got one in my trunk build from this morning. tenthumbs: can you get a stack trace if possible? it is likely that this assertion could happen elsewhere, and for that reason we should probably still land the xpcom component manager change.
Attachment #147352 - Flags: approval1.7? → approval1.7-
Flags: blocking1.7+ → blocking1.7-
Ben wants this on aviary, I'll do the honours today.
No longer blocks: 160540
Attachment #147352 - Flags: approval1.7.x?
Comment on attachment 147352 [details] [diff] [review] v2 patch Mike, please put this on 1.7,
Attachment #147352 - Flags: approval1.7.x? → approval1.7.x+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: