209804 - Assertion in xpcom/threads/nsAutoLock.cpp

Reporter

Description

•

22 years ago

I started seeing this in my Linux builds at shutdown. ###!!! ASSERTION: potential deadlock between Monitor@80c25a0 and Lock@87c7390: 'Error', file nsAutoLock.cpp, line 265 Break: at file nsAutoLock.cpp, line 265 I have no idea how serious this is.

Doug Turner (:dougt)

Comment 1

•

22 years ago

I don't have any idea either. I do not see this -- timeless, brendan?

tenthumbs

Reporter

Comment 2

•

22 years ago

It's disappeared, returned, and disappeared again. Right now it's gone.

timeless

Comment 3

•

22 years ago

offhand this is probably the psm lock bug. darin filed it

Brendan Eich [:brendan]

Comment 4

•

22 years ago

important to show stacks for such assertions. I will factor stack (callsite tree) tracing out of nsTraceMalloc, make it a primitive configurable that any build can add and use. That should help fix bug 200972. Someone mark this dup or worksforme, ok? /be

Darin Fisher

Assignee

Comment 5

•

22 years ago

tenthumbs: if you can get a stack trace, then please compare it to that in bug 210037.

Depends on: 210037

Nicholas Allen

Updated

•

21 years ago

Blocks: 160540

Mike Kaply [:mkaply]

Comment 6

•

21 years ago

I'm seeing this and here is the stack: Function | Part ----------------------------------------------------------------------------+----------------------------- _ZN11nsDebugImpl5AbortEPKci | NSDEBUGIMPL.CPP _ZN11nsDebugImpl5BreakEPKci | NSDEBUGIMPL.CPP _ZN11nsDebugImpl9AssertionEPKcS1_S1_i | NSDEBUGIMPL.CPP _ZN7nsDebug9AssertionEPKcS1_S1_i | NSDEBUG.CPP _ZN14nsAutoLockBaseC2EPvNS_14nsAutoLockTypeE | NSAUTOLOCK.CPP _ZN10nsAutoLockC1EP6PRLock | NSCACHEENTRYDESCRIPTOR.CPP _ZN22nsCacheEntryDescriptor5CloseEv | NSCACHEENTRYDESCRIPTOR.CPP _ZN22nsCacheEntryDescriptorD0Ev | NSCACHEENTRYDESCRIPTOR.CPP _ZN22nsCacheEntryDescriptor7ReleaseEv | NSCACHEENTRYDESCRIPTOR.CPP _ZN8nsCOMPtrI23nsICacheEntryDescriptorE22assign_assuming_AddRefEPS0_ | IMGREQUEST.CPP _ZN8nsCOMPtrI23nsICacheEntryDescriptorE18assign_with_AddRefEP11nsISupports | IMGREQUEST.CPP _ZN8nsCOMPtrI23nsICacheEntryDescriptorEaSEPS0_ | IMGREQUEST.CPP _ZN10imgRequest11RemoveProxyEP15imgRequestProxyji | IMGREQUEST.CPP _ZN15imgRequestProxy6CancelEj | IMGREQUESTPROXY.CPP _ZN12nsImageFrame8IconLoad8ShutdownEv | NSLAYOUTMODULE.O _ZN12nsImageFrame14ReleaseGlobalsEv | NSLAYOUTMODULE.O _Z8ShutdownP9nsIModule | NSLAYOUTMODULE.O _ZN15nsGenericModule8ShutdownEv | NSGENERICFACTORY.CPP _ZN15nsGenericModuleD1Ev | NSGENERICFACTORY.CPP _ZN15nsGenericModule7ReleaseEv | NSGENERICFACTORY.CPP _ZN5nsDll8ShutdownEv | XCDLL.CPP _Z13nsFreeLibraryP5nsDllP17nsIServiceManageri | NSNATIVECOMPONENTLOADER.CPP _Z17nsFreeLibraryEnumP9nsHashKeyPvS1_ | NSNATIVECOMPONENTLOADER.CPP _Z13hashEnumerateP12PLDHashTableP15PLDHashEntryHdrjPv | NSHASHTABLE.CPP PL_DHashTableEnumerate | PLDHASH.C _ZN11nsHashtable9EnumerateEPFiP9nsHashKeyPvS2_ES2_ | NSHASHTABLE.CPP _ZN23nsNativeComponentLoader9UnloadAllEi | NSNATIVECOMPONENTLOADER.CPP _ZN22nsComponentManagerImpl15UnloadLibrariesEP17nsIServiceManageri | NSCOMPONENTMANAGER.CPP _ZN22nsComponentManagerImpl8ShutdownEv | NSCOMPONENTMANAGER.CPP NS_ShutdownXPCOM | NSXPCOMINIT.O NS_ShutdownXPCOM | NSXPCOMGLUE.CPP GRE_Shutdown | NSXPCOMGLUE.CPP main | NSAPPRUNNER.O __text | CRT0.OBJ 0x182915A6 | libc04.dll:1

Brendan Eich [:brendan]

Comment 7

•

21 years ago

See bug 210037 comment 3, conclusion 2. Whether we should avoid holding a cache service lock when calling into the XPCOM component manager (conclusion 1 from that bug, applied to the cache code instead of to NSS) is something dougt should comment upon. It's never a good idea to call out of module while holding a lock. But if enough code does that with XPCOM as the callee, maybe we just need to declare XPCOM "most nested" in such locking scenarios, and all * -> XPCOM arcs in the deadlock detection graph, but not XPCOM -> *. /be

Brendan Eich [:brendan]

Comment 8

•

21 years ago

> XPCOM "most nested" in such locking scenarios, and all * -> XPCOM arcs in the er, "allow * -> XPCOM arcs ...", not "all ...." This XPCOM bug would stand for fixing the component manager to set aside the list during shutdown, and exit its monitor; and also documenting and evangelizing the nesting rule. /be

Darin Fisher

Assignee

Comment 9

•

21 years ago

(In reply to comment #7) > See bug 210037 comment 3, conclusion 2. > > Whether we should avoid holding a cache service lock when calling into the XPCOM > component manager (conclusion 1 from that bug, applied to the cache code instead > of to NSS) is something dougt should comment upon. > > It's never a good idea to call out of module while holding a lock. But if > enough code does that with XPCOM as the callee, maybe we just need to declare > XPCOM "most nested" in such locking scenarios, and all * -> XPCOM arcs in the > deadlock detection graph, but not XPCOM -> *. > > /be i'm all for possibly changing the cache to not hold its lock while calling into XPCOM. it's almost always a good idea in my experience to not do that. that said, i'm not sure how easy it would be to change the cache.

Brendan Eich [:brendan]

Comment 10

•

21 years ago

Darin: I was proposing relieving you and many others from having to do that in the particular case of calling the XPCOM component or service manager. That way the burden would be on XPCOM to use thread-local temporaries and the like to avoid nesting the other day. /be

Darin Fisher

Assignee

Comment 11

•

21 years ago

(In reply to comment #10) > Darin: I was proposing relieving you and many others from having to do that in > the particular case of calling the XPCOM component or service manager. That way > the burden would be on XPCOM to use thread-local temporaries and the like to > avoid nesting the other day. yeah, i agree that that is a good plan. the component manager shouldn't need to hold its monitor while shutting down. it should race to set the mShuttingDown flag, and then all losers of that race should just return early with some error.

Doug Turner (:dougt)

Comment 12

•

21 years ago

what darin said. we can remove the monitor in UnloadLibary and "protect" mLoaderData behind a check against mShuttingDown. I am not sure why we would have to create anything on the stack.

Brendan Eich [:brendan]

Comment 13

•

21 years ago

In an SMP complex with weak store order, you can't be sure everyone has seen the store to mShuttingDown. So you can't be sure other threads aren't racing through the hash table. You need a memory barrier, which locks and monitors provide. So enter the monitor, set mShuttingDown, set the hash table aside as cheaply as possible (that is why I suggested copying the PLDHashTable to a stack temporary and zapping the original's memory with memset(0)), leave the monitor, etc. /be

tenthumbs

Reporter

Comment 14

•

21 years ago

Attached file stack traces — Details

I'm now seeing 10 of these at every shutdown. Here's all the stack traces. I'm including all of them because they're similar but not that similar.

Bob Clary [:bc] (inactive)

Comment 15

•

21 years ago

I just noticed this in WinXP in a debug trunk build. It might have been there for a while since I don't normally start the debug build from the command line. To reproduce: Start Mozilla, select profile, close browser. Environment is a custom debug build, vc6, ms sdk, winxp, xeon, with all extensions enabled except xmlterm or negotiateauth and svg gdi+. -> OS All Stack follows: NTDLL! 77f75a58() nsDebugImpl::Assertion(nsDebugImpl * const 0x002e6d10, const char * 0x0012fa18, const char * 0x100fecc4, const char * 0x100fec80, int 299) line 272 nsDebug::Assertion(const char * 0x0012fa18, const char * 0x100fecc4, const char * 0x100fec80, int 299) line 109 nsAutoLockBase::nsAutoLockBase(void * 0x026e0f78, nsAutoLockBase::nsAutoLockType eAutoLock) line 299 + 27 bytes nsAutoLock::nsAutoLock(PRLock * 0x026e0f78) line 178 + 21 bytes nsCacheEntryDescriptor::Close(nsCacheEntryDescriptor * const 0x02c0ff78) line 411 nsCacheEntryDescriptor::~nsCacheEntryDescriptor() line 51 nsCacheEntryDescriptor::`scalar deleting destructor'(unsigned int 1) + 15 bytes nsCacheEntryDescriptor::Release(nsCacheEntryDescriptor * const 0x02c0ff78) line 36 + 144 bytes nsCOMPtr<nsICacheEntryDescriptor>::assign_assuming_AddRef(nsICacheEntryDescriptor * 0x00000000) line 495 nsCOMPtr<nsICacheEntryDescriptor>::assign_with_AddRef(nsISupports * 0x00000000) line 1023 nsCOMPtr<nsICacheEntryDescriptor>::operator=(nsICacheEntryDescriptor * 0x00000000) line 608 imgRequest::RemoveProxy(imgRequestProxy * 0x02f115f0, unsigned int 2147500037, int 0) line 168 imgRequestProxy::Cancel(imgRequestProxy * const 0x02f115f0, unsigned int 2147500037) line 209 nsImageFrame::IconLoad::Shutdown() line 287 nsImageFrame::ReleaseGlobals() line 142 Shutdown(nsIModule * 0x00a44358) line 405 nsGenericModule::Shutdown() line 344 + 10 bytes nsGenericModule::~nsGenericModule() line 247 nsGenericModule::`scalar deleting destructor'(unsigned int 1) + 15 bytes nsGenericModule::Release(nsGenericModule * const 0x00a44358) line 249 + 142 bytes nsDll::Shutdown() line 380 + 18 bytes nsFreeLibrary(nsDll * 0x00a1b058, nsIServiceManager * 0x00000000, int 3) line 275 nsFreeLibraryEnum(nsHashKey * 0x00a1b0e8, void * 0x00a1b058, void * 0x0012fe84) line 344 + 64 bytes hashEnumerate(PLDHashTable * 0x002ef768, PLDHashEntryHdr * 0x0099642c, unsigned int 24, void * 0x0012fe68) line 115 + 26 bytes PL_DHashTableEnumerate(PLDHashTable * 0x002ef768, int (PLDHashTable *, PLDHashEntryHdr *, unsigned int, void *)* 0x10013d00 hashEnumerate(PLDHashTable *, PLDHashEntryHdr *, unsigned int, void *), void * 0x0012fe68) line 619 + 34 bytes nsHashtable::Enumerate(int (nsHashKey *, void *, void *)* 0x10063c30 nsFreeLibraryEnum(nsHashKey *, void *, void *), void * 0x0012fe84) line 303 + 21 bytes nsNativeComponentLoader::UnloadAll(nsNativeComponentLoader * const 0x002ef720, int 3) line 961 nsComponentManagerImpl::UnloadLibraries(nsIServiceManager * 0x00000000, int 3) line 3139 + 22 bytes nsComponentManagerImpl::Shutdown() line 878 NS_ShutdownXPCOM(nsIServiceManager * 0x00000000) line 776 + 11 bytes NS_ShutdownXPCOM(nsIServiceManager * 0x00000000) line 184 + 10 bytes GRE_Shutdown() line 468 + 7 bytes main(int 1, char * * 0x002e2638) line 1724 + 5 bytes mainCRTStartup() line 338 + 17 bytes KERNEL32! 77e814c

OS: Linux → All

Mike Kaply [:mkaply]

Comment 16

•

21 years ago

I want this looked at for 1.7.

Severity: normal → major

Flags: blocking1.7+

Darin Fisher

Assignee

Comment 17

•

21 years ago

taking

Assignee: dougt → darin

Target Milestone: --- → mozilla1.7final

Darin Fisher

Assignee

Comment 18

•

21 years ago

Two solutions for this bug come to mind: (1) Make nsComponentManagerImpl not hold mMon while calling UnloadAll on the component loaders. (2) Make nsImageFrame::ReleaseGlobals be called at xpcom-shutdown time instead of from the layout module destructor. CC'ing bz and dbaron to get their thoughts on option #2. (They seem to have CVS blame for the IconLoad class.) dougt, can you comment on option #1? I don't understand why that monitor needs to be held while calling UnloadAll. We don't seem to hold the monitor while calling other methods on the component loaders, so why is it important in UnloadLibraries? Does it have something to do with nsIComponentManagerObsolete::FreeLibraries (which calls UnloadLibraries also)?

Doug Turner (:dougt)

Comment 19

•

21 years ago

I do not remember is there is any issue in removing this monitor. Seams like since we are calling out and not protecting anything this monitor can be safely removed. Looking at very old versions of this file, it seams that we: A) use to protect a hash during unload B) then we added a callback to the thing that was being unloaded C) then we seperated the hash removal and the callback notification D) during this seperatation we left the monitor in both places.

Darin Fisher

Assignee

Comment 20

•

21 years ago

dougt: thanks for gathering that history on the file. it does seem like the monitor is only (mainly?) protecting the hash table today. i don't mind removing the monitor on the trunk to see if we hit anything, but will we have enough bake time on the trunk to know that it solves the problem well enough to be able to take the patch on the 1.7 branch? maybe we'll just have to see ...

Doug Turner (:dougt)

Comment 21

•

21 years ago

yes, if you remove this we would want a few weeks worth of bake time. I don't think this monitor causes any user reported problems.

Darin Fisher

Assignee

Comment 22

•

21 years ago

Attached patch v1 patch — Details — Splinter Review

ok, here's the xpcom patch.

Darin Fisher

Assignee

Updated

•

21 years ago

Attachment #146309 - Flags: superreview?(brendan)

Attachment #146309 - Flags: review?(dougt)

Brendan Eich [:brendan]

Comment 23

•

21 years ago

Comment on attachment 146309 [details] [diff] [review] v1 patch Doesn't Shutdown need to use the monitor, or some kind of thread barrier (join), to avoid racing with other threads using XPCOM? /be

Doug Turner (:dougt)

Updated

•

21 years ago

Attachment #146309 - Flags: review?(dougt) → review+

Darin Fisher

Assignee

Comment 24

•

21 years ago

brendan: yes, and all of those other threads join with the main thread from the xpcom-shutdown event handler. or, in the case of the DNS threads, they detach and reference no more xpcom code or objects after xpcom-shutdown. this patch shouldn't be a problem for mozilla since xpcom shutdown happens on the main thread, and all of our threads go away cleanly essentially at the time when the xpcom-shutdown notification gets sent out. the only real concern here is what some module destructor might do. (the module destructor for nsImageFrame apparently calls code libpr0n, which calls code in necko.)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 25

•

21 years ago

Module destructors should never call into any other modules, period. (Consider what would happen if we start unloading libraries, as we probably should have from the start.) Anything that needs to call into other modules should be done using an "xpcom-shutdown" nsIObserver.

Darin Fisher

Assignee

Comment 26

•

21 years ago

(In reply to comment #25) > Module destructors should never call into any other modules, period. (Consider > what would happen if we start unloading libraries, as we probably should have > from the start.) Anything that needs to call into other modules should be done > using an "xpcom-shutdown" nsIObserver. so, then you are advocating solution #2 from comment #18?

timeless

Comment 27

•

21 years ago

Attached patch comment 18 - 2 (obsolete) — Details — Splinter Review

timeless

Updated

•

21 years ago

Attachment #146441 - Flags: superreview?(dbaron)

Attachment #146441 - Flags: review?(dbaron)

Darin Fisher

Assignee

Comment 28

•

21 years ago

Comment on attachment 146441 [details] [diff] [review] comment 18 - 2 >Index: nsImageFrame.h > static void ReleaseGlobals() { >+ NS_IF_RELEASE(gIconLoad); > NS_IF_RELEASE(sIOService); > } you didn't mean to keep the release of sIOService here right? since it isn't valid to call into another module from a module destructor, it cannot be valid to call sIOService->Release() from here. therefore, this code should be removed. perhaps it makes sense to assert that sIOService is null instead.

Attachment #146441 - Flags: review?(dbaron) → review-

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 29

•

21 years ago

I agree with comment 28. Also, rather than adding your own observer, I'd prefer if you used ContentShutdownObserver in nsLayoutModule.cpp. Given that, this could be a +1/-1 patch (just change where nsImageFrame::ReleaseGlobals is called).

timeless

Updated

•

21 years ago

Attachment #146441 - Flags: superreview?(dbaron)

Darin Fisher

Assignee

Comment 30

•

21 years ago

So, fixing nsImageFrame::ReleaseGlobals() to be called earlier (from ContentShutdownObserver) is not enough. We have to move other things too, like for instance nsLayoutStylesheetCache::Shutdown(). I'm going to try moving all of layout shutdown into the xpcom-shutdown observer.

Darin Fisher

Assignee

Comment 31

•

21 years ago

Attached patch v2 patch — Details — Splinter Review

ok, this patch forces all layout shutdown to occur from the xpcom-shutdown observer. it seems to work :-)

Attachment #146441 - Attachment is obsolete: true

Darin Fisher

Assignee

Updated

•

21 years ago

Attachment #146309 - Flags: superreview?(brendan)

Darin Fisher

Assignee

Updated

•

21 years ago

Attachment #147352 - Flags: superreview?(dbaron)

Attachment #147352 - Flags: review?(dbaron)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 32

•

21 years ago

Comment on attachment 147352 [details] [diff] [review] v2 patch r+sr=dbaron. We probably don't need to move everything, but it shouldn't hurt anything, I don't think...

Attachment #147352 - Flags: superreview?(dbaron)

Attachment #147352 - Flags: superreview+

Attachment #147352 - Flags: review?(dbaron)

Attachment #147352 - Flags: review+

Darin Fisher

Assignee

Comment 33

•

21 years ago

fixed-on-trunk

Status: NEW → RESOLVED

Closed: 21 years ago

Resolution: --- → FIXED

Darin Fisher

Assignee

Comment 34

•

21 years ago

Comment on attachment 147352 [details] [diff] [review] v2 patch Requesting approval to land this on the branch, but I'd like to give this some reasonable bake time on the trunk first. We also might want to just skip fixing this on the branch since I don't believe that there is any real potential for a deadlock here. At most it would be a good patch to accept on the branch for the purpose of eliminating up the "scary looking" assertions.

Attachment #147352 - Flags: approval1.7?

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 35

•

21 years ago

Without a report of deadlock actually occurring, I don't see the point of fixing this on the branch.

tenthumbs

Reporter

Comment 36

•

21 years ago

I still got one in my trunk build from this morning.

Darin Fisher

Assignee

Comment 37

•

21 years ago

(In reply to comment #36) > I still got one in my trunk build from this morning. tenthumbs: can you get a stack trace if possible? it is likely that this assertion could happen elsewhere, and for that reason we should probably still land the xpcom component manager change.

Asa Dotzler [:asa]

Updated

•

21 years ago

Attachment #147352 - Flags: approval1.7? → approval1.7-

Asa Dotzler [:asa]

Updated

•

21 years ago

Flags: blocking1.7+ → blocking1.7-

Mike Shaver (:shaver emeritus)

Comment 38

•

21 years ago

Ben wants this on aviary, I'll do the honours today.

Mike Shaver (:shaver emeritus)

Comment 39

•

21 years ago

Done.

Nicholas Allen

Updated

•

21 years ago

No longer blocks: 160540

timeless

Updated

•

21 years ago

Attachment #147352 - Flags: approval1.7.x?

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 40

•

21 years ago

Comment on attachment 147352 [details] [diff] [review] v2 patch Mike, please put this on 1.7,

Attachment #147352 - Flags: approval1.7.x? → approval1.7.x+

Mike Kaply [:mkaply]

Updated

•

21 years ago

Keywords: fixed-aviary1.0, fixed1.7.5

stack traces 21 years ago tenthumbs 46.41 KB, text/plain		Details
v1 patch 21 years ago Darin Fisher 1.15 KB, patch	dougt : review+	Details \| Diff \| Splinter Review
comment 18 - 2 21 years ago timeless 3.62 KB, patch	darin.moz : review-	Details \| Diff \| Splinter Review
v2 patch 21 years ago Darin Fisher 5.82 KB, patch	dbaron : review+ dbaron : superreview+ asa : approval1.7- roc : approval1.7.5+	Details \| Diff \| Splinter Review