Closed Bug 209804 Opened 21 years ago Closed 20 years ago

Assertion in xpcom/threads/nsAutoLock.cpp

Categories

(Core :: XPCOM, defect)

x86
All
defect
Not set
major

Tracking

()

RESOLVED FIXED
mozilla1.7final

People

(Reporter: tenthumbs, Assigned: darin.moz)

References

Details

(Keywords: assertion, fixed-aviary1.0, fixed1.7.5)

Attachments

(3 files, 1 obsolete file)

I started seeing this in my Linux builds at shutdown.

###!!! ASSERTION: potential deadlock between Monitor@80c25a0 and Lock@87c7390:
'Error', file nsAutoLock.cpp, line 265
Break: at file nsAutoLock.cpp, line 265

I have no idea how serious this is.
I don't have any idea either.  I do not see this -- timeless, brendan?
It's disappeared, returned, and disappeared again. Right now it's gone.
offhand this is probably the psm lock bug. darin filed it
important to show stacks for such assertions.  I will factor stack (callsite
tree) tracing out of nsTraceMalloc, make it a primitive configurable that any
build can add and use.  That should help fix bug 200972.

Someone mark this dup or worksforme, ok?

/be
tenthumbs: if you can get a stack trace, then please compare it to that in bug
210037.
Depends on: 210037
Blocks: 160540
I'm seeing this and here is the stack:

 Function                                                                   |
Part                        
----------------------------------------------------------------------------+-----------------------------
 _ZN11nsDebugImpl5AbortEPKci                                                |
NSDEBUGIMPL.CPP             
 _ZN11nsDebugImpl5BreakEPKci                                                |
NSDEBUGIMPL.CPP             
 _ZN11nsDebugImpl9AssertionEPKcS1_S1_i                                      |
NSDEBUGIMPL.CPP             
 _ZN7nsDebug9AssertionEPKcS1_S1_i                                           |
NSDEBUG.CPP                 
 _ZN14nsAutoLockBaseC2EPvNS_14nsAutoLockTypeE                               |
NSAUTOLOCK.CPP              
 _ZN10nsAutoLockC1EP6PRLock                                                 |
NSCACHEENTRYDESCRIPTOR.CPP  
 _ZN22nsCacheEntryDescriptor5CloseEv                                        |
NSCACHEENTRYDESCRIPTOR.CPP  
 _ZN22nsCacheEntryDescriptorD0Ev                                            |
NSCACHEENTRYDESCRIPTOR.CPP  
 _ZN22nsCacheEntryDescriptor7ReleaseEv                                      |
NSCACHEENTRYDESCRIPTOR.CPP  
 _ZN8nsCOMPtrI23nsICacheEntryDescriptorE22assign_assuming_AddRefEPS0_       |
IMGREQUEST.CPP              
 _ZN8nsCOMPtrI23nsICacheEntryDescriptorE18assign_with_AddRefEP11nsISupports |
IMGREQUEST.CPP              
 _ZN8nsCOMPtrI23nsICacheEntryDescriptorEaSEPS0_                             |
IMGREQUEST.CPP              
 _ZN10imgRequest11RemoveProxyEP15imgRequestProxyji                          |
IMGREQUEST.CPP              
 _ZN15imgRequestProxy6CancelEj                                              |
IMGREQUESTPROXY.CPP         
 _ZN12nsImageFrame8IconLoad8ShutdownEv                                      |
NSLAYOUTMODULE.O            
 _ZN12nsImageFrame14ReleaseGlobalsEv                                        |
NSLAYOUTMODULE.O            
 _Z8ShutdownP9nsIModule                                                     |
NSLAYOUTMODULE.O            
 _ZN15nsGenericModule8ShutdownEv                                            |
NSGENERICFACTORY.CPP        
 _ZN15nsGenericModuleD1Ev                                                   |
NSGENERICFACTORY.CPP        
 _ZN15nsGenericModule7ReleaseEv                                             |
NSGENERICFACTORY.CPP        
 _ZN5nsDll8ShutdownEv                                                       |
XCDLL.CPP                   
 _Z13nsFreeLibraryP5nsDllP17nsIServiceManageri                              |
NSNATIVECOMPONENTLOADER.CPP 
 _Z17nsFreeLibraryEnumP9nsHashKeyPvS1_                                      |
NSNATIVECOMPONENTLOADER.CPP 
 _Z13hashEnumerateP12PLDHashTableP15PLDHashEntryHdrjPv                      |
NSHASHTABLE.CPP             
 PL_DHashTableEnumerate                                                     |
PLDHASH.C                   
 _ZN11nsHashtable9EnumerateEPFiP9nsHashKeyPvS2_ES2_                         |
NSHASHTABLE.CPP             
 _ZN23nsNativeComponentLoader9UnloadAllEi                                   |
NSNATIVECOMPONENTLOADER.CPP 
 _ZN22nsComponentManagerImpl15UnloadLibrariesEP17nsIServiceManageri         |
NSCOMPONENTMANAGER.CPP      
 _ZN22nsComponentManagerImpl8ShutdownEv                                     |
NSCOMPONENTMANAGER.CPP      
 NS_ShutdownXPCOM                                                           |
NSXPCOMINIT.O               
 NS_ShutdownXPCOM                                                           |
NSXPCOMGLUE.CPP             
 GRE_Shutdown                                                               |
NSXPCOMGLUE.CPP             
 main                                                                       |
NSAPPRUNNER.O               
 __text                                                                     |
CRT0.OBJ                    
 0x182915A6                                                                 |
libc04.dll:1                
See bug 210037 comment 3, conclusion 2.

Whether we should avoid holding a cache service lock when calling into the XPCOM
component manager (conclusion 1 from that bug, applied to the cache code instead
of to NSS) is something dougt should comment upon.

It's never a good idea to call out of module while holding a lock.  But if
enough code does that with XPCOM as the callee, maybe we just need to declare
XPCOM "most nested" in such locking scenarios, and all * -> XPCOM arcs in the
deadlock detection graph, but not XPCOM -> *.

/be
> XPCOM "most nested" in such locking scenarios, and all * -> XPCOM arcs in the

er, "allow * -> XPCOM arcs ...", not "all ...."

This XPCOM bug would stand for fixing the component manager to set aside the
list during shutdown, and exit its monitor; and also documenting and
evangelizing the nesting rule.

/be
(In reply to comment #7)
> See bug 210037 comment 3, conclusion 2.
> 
> Whether we should avoid holding a cache service lock when calling into the XPCOM
> component manager (conclusion 1 from that bug, applied to the cache code instead
> of to NSS) is something dougt should comment upon.
> 
> It's never a good idea to call out of module while holding a lock.  But if
> enough code does that with XPCOM as the callee, maybe we just need to declare
> XPCOM "most nested" in such locking scenarios, and all * -> XPCOM arcs in the
> deadlock detection graph, but not XPCOM -> *.
> 
> /be


i'm all for possibly changing the cache to not hold its lock while calling into
XPCOM.  it's almost always a good idea in my experience to not do that.  that
said, i'm not sure how easy it would be to change the cache.
Darin: I was proposing relieving you and many others from having to do that in
the particular case of calling the XPCOM component or service manager.  That way
the burden would be on XPCOM to use thread-local temporaries and the like to
avoid nesting the other day.

/be
(In reply to comment #10)
> Darin: I was proposing relieving you and many others from having to do that in
> the particular case of calling the XPCOM component or service manager.  That way
> the burden would be on XPCOM to use thread-local temporaries and the like to
> avoid nesting the other day.

yeah, i agree that that is a good plan.  the component manager shouldn't need to
hold its monitor while shutting down.  it should race to set the mShuttingDown
flag, and then all losers of that race should just return early with some error.
what darin said.  we can remove the monitor in UnloadLibary and "protect"
mLoaderData behind a check against mShuttingDown.  I am not sure why we would
have to create anything on the stack.
In an SMP complex with weak store order, you can't be sure everyone has seen the
store to mShuttingDown.  So you can't be sure other threads aren't racing
through the hash table.

You need a memory barrier, which locks and monitors provide.  So enter the
monitor, set mShuttingDown, set the hash table aside as cheaply as possible
(that is why I suggested copying the PLDHashTable to a stack temporary and
zapping the original's memory with memset(0)), leave the monitor, etc.

/be
Attached file stack traces
I'm now seeing 10 of these at every shutdown. Here's all the stack traces. I'm
including all of them because they're similar but not that similar.
I just noticed this in WinXP in a debug trunk build. It might have been there
for a while since I don't normally start the debug build from the command line.

To reproduce:

Start Mozilla, select profile, close browser.

Environment is a custom debug build, vc6, ms sdk, winxp, xeon, with all
extensions enabled except xmlterm or negotiateauth and svg gdi+.

-> OS All

Stack follows:

NTDLL! 77f75a58()
nsDebugImpl::Assertion(nsDebugImpl * const 0x002e6d10, const char * 0x0012fa18,
const char * 0x100fecc4, const char * 0x100fec80, int 299) line 272
nsDebug::Assertion(const char * 0x0012fa18, const char * 0x100fecc4, const char
* 0x100fec80, int 299) line 109
nsAutoLockBase::nsAutoLockBase(void * 0x026e0f78, nsAutoLockBase::nsAutoLockType
eAutoLock) line 299 + 27 bytes
nsAutoLock::nsAutoLock(PRLock * 0x026e0f78) line 178 + 21 bytes
nsCacheEntryDescriptor::Close(nsCacheEntryDescriptor * const 0x02c0ff78) line 411
nsCacheEntryDescriptor::~nsCacheEntryDescriptor() line 51
nsCacheEntryDescriptor::`scalar deleting destructor'(unsigned int 1) + 15 bytes
nsCacheEntryDescriptor::Release(nsCacheEntryDescriptor * const 0x02c0ff78) line
36 + 144 bytes
nsCOMPtr<nsICacheEntryDescriptor>::assign_assuming_AddRef(nsICacheEntryDescriptor
* 0x00000000) line 495
nsCOMPtr<nsICacheEntryDescriptor>::assign_with_AddRef(nsISupports * 0x00000000)
line 1023
nsCOMPtr<nsICacheEntryDescriptor>::operator=(nsICacheEntryDescriptor *
0x00000000) line 608
imgRequest::RemoveProxy(imgRequestProxy * 0x02f115f0, unsigned int 2147500037,
int 0) line 168
imgRequestProxy::Cancel(imgRequestProxy * const 0x02f115f0, unsigned int
2147500037) line 209
nsImageFrame::IconLoad::Shutdown() line 287
nsImageFrame::ReleaseGlobals() line 142
Shutdown(nsIModule * 0x00a44358) line 405
nsGenericModule::Shutdown() line 344 + 10 bytes
nsGenericModule::~nsGenericModule() line 247
nsGenericModule::`scalar deleting destructor'(unsigned int 1) + 15 bytes
nsGenericModule::Release(nsGenericModule * const 0x00a44358) line 249 + 142 bytes
nsDll::Shutdown() line 380 + 18 bytes
nsFreeLibrary(nsDll * 0x00a1b058, nsIServiceManager * 0x00000000, int 3) line 275
nsFreeLibraryEnum(nsHashKey * 0x00a1b0e8, void * 0x00a1b058, void * 0x0012fe84)
line 344 + 64 bytes
hashEnumerate(PLDHashTable * 0x002ef768, PLDHashEntryHdr * 0x0099642c, unsigned
int 24, void * 0x0012fe68) line 115 + 26 bytes
PL_DHashTableEnumerate(PLDHashTable * 0x002ef768, int (PLDHashTable *,
PLDHashEntryHdr *, unsigned int, void *)* 0x10013d00 hashEnumerate(PLDHashTable
*, PLDHashEntryHdr *, unsigned int, void *), void * 0x0012fe68) line 619 + 34 bytes
nsHashtable::Enumerate(int (nsHashKey *, void *, void *)* 0x10063c30
nsFreeLibraryEnum(nsHashKey *, void *, void *), void * 0x0012fe84) line 303 + 21
bytes
nsNativeComponentLoader::UnloadAll(nsNativeComponentLoader * const 0x002ef720,
int 3) line 961
nsComponentManagerImpl::UnloadLibraries(nsIServiceManager * 0x00000000, int 3)
line 3139 + 22 bytes
nsComponentManagerImpl::Shutdown() line 878
NS_ShutdownXPCOM(nsIServiceManager * 0x00000000) line 776 + 11 bytes
NS_ShutdownXPCOM(nsIServiceManager * 0x00000000) line 184 + 10 bytes
GRE_Shutdown() line 468 + 7 bytes
main(int 1, char * * 0x002e2638) line 1724 + 5 bytes
mainCRTStartup() line 338 + 17 bytes
KERNEL32! 77e814c
OS: Linux → All
I want this looked at for 1.7.
Severity: normal → major
Flags: blocking1.7+
taking
Assignee: dougt → darin
Target Milestone: --- → mozilla1.7final
Two solutions for this bug come to mind:

 (1) Make nsComponentManagerImpl not hold mMon while calling UnloadAll on the 
     component loaders.

 (2) Make nsImageFrame::ReleaseGlobals be called at xpcom-shutdown time instead
     of from the layout module destructor.

CC'ing bz and dbaron to get their thoughts on option #2.  (They seem to have CVS
blame for the IconLoad class.)

dougt, can you comment on option #1?  I don't understand why that monitor needs
to be held while calling UnloadAll.  We don't seem to hold the monitor while
calling other methods on the component loaders, so why is it important in
UnloadLibraries?  Does it have something to do with
nsIComponentManagerObsolete::FreeLibraries (which calls UnloadLibraries also)?
I do not remember is there is any issue in removing this monitor.  Seams like
since we are calling out and not protecting anything this monitor can be safely
removed.

Looking at very old versions of this file, it seams that we:
A) use to protect a hash during unload
B) then we added a callback to the thing that was being unloaded
C) then we seperated the hash removal and the callback notification
D) during this seperatation we left the monitor in both places.
dougt: thanks for gathering that history on the file.  it does seem like the
monitor is only (mainly?) protecting the hash table today.  i don't mind
removing the monitor on the trunk to see if we hit anything, but will we have
enough bake time on the trunk to know that it solves the problem well enough to
be able to take the patch on the 1.7 branch?  maybe we'll just have to see ...
yes, if you remove this we would want a few weeks worth of bake time.  I don't
think this monitor causes any user reported problems.
Attached patch v1 patchSplinter Review
ok, here's the xpcom patch.
Attachment #146309 - Flags: superreview?(brendan)
Attachment #146309 - Flags: review?(dougt)
Comment on attachment 146309 [details] [diff] [review]
v1 patch

Doesn't Shutdown need to use the monitor, or some kind of thread barrier
(join), to avoid racing with other threads using XPCOM?

/be
Attachment #146309 - Flags: review?(dougt) → review+
brendan: yes, and all of those other threads join with the main thread from the
xpcom-shutdown event handler.  or, in the case of the DNS threads, they detach
and reference no more xpcom code or objects after xpcom-shutdown.  this patch
shouldn't be a problem for mozilla since xpcom shutdown happens on the main
thread, and all of our threads go away cleanly essentially at the time when the
xpcom-shutdown notification gets sent out.  the only real concern here is what
some module destructor might do. (the module destructor for nsImageFrame
apparently calls code libpr0n, which calls code in necko.)
Module destructors should never call into any other modules, period.  (Consider
what would happen if we start unloading libraries, as we probably should have
from the start.)  Anything that needs to call into other modules should be done
using an "xpcom-shutdown" nsIObserver.
(In reply to comment #25)
> Module destructors should never call into any other modules, period.  (Consider
> what would happen if we start unloading libraries, as we probably should have
> from the start.)  Anything that needs to call into other modules should be done
> using an "xpcom-shutdown" nsIObserver.

so, then you are advocating solution #2 from comment #18?
Attached patch comment 18 - 2 (obsolete) — Splinter Review
Attachment #146441 - Flags: superreview?(dbaron)
Attachment #146441 - Flags: review?(dbaron)
Comment on attachment 146441 [details] [diff] [review]
comment 18 - 2

>Index: nsImageFrame.h

>   static void ReleaseGlobals() {
>+    NS_IF_RELEASE(gIconLoad);
>     NS_IF_RELEASE(sIOService);
>   }

you didn't mean to keep the release of sIOService here right?  since
it isn't valid to call into another module from a module destructor,
it cannot be valid to call sIOService->Release() from here.  therefore,
this code should be removed.  perhaps it makes sense to assert that
sIOService is null instead.
Attachment #146441 - Flags: review?(dbaron) → review-
I agree with comment 28.  Also, rather than adding your own observer, I'd prefer
if you used ContentShutdownObserver in nsLayoutModule.cpp.  Given that, this
could be a +1/-1 patch (just change where nsImageFrame::ReleaseGlobals is called).
Attachment #146441 - Flags: superreview?(dbaron)
So, fixing nsImageFrame::ReleaseGlobals() to be called earlier (from
ContentShutdownObserver) is not enough.  We have to move other things too, like
for instance nsLayoutStylesheetCache::Shutdown().  I'm going to try moving all
of layout shutdown into the xpcom-shutdown observer.
Attached patch v2 patchSplinter Review
ok, this patch forces all layout shutdown to occur from the xpcom-shutdown
observer.  it seems to work :-)
Attachment #146441 - Attachment is obsolete: true
Attachment #146309 - Flags: superreview?(brendan)
Attachment #147352 - Flags: superreview?(dbaron)
Attachment #147352 - Flags: review?(dbaron)
Comment on attachment 147352 [details] [diff] [review]
v2 patch

r+sr=dbaron.  We probably don't need to move everything, but it shouldn't hurt
anything, I don't think...
Attachment #147352 - Flags: superreview?(dbaron)
Attachment #147352 - Flags: superreview+
Attachment #147352 - Flags: review?(dbaron)
Attachment #147352 - Flags: review+
fixed-on-trunk
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Comment on attachment 147352 [details] [diff] [review]
v2 patch

Requesting approval to land this on the branch, but I'd like to give this some
reasonable bake time on the trunk first.

We also might want to just skip fixing this on the branch since I don't believe
that there is any real potential for a deadlock here.

At most it would be a good patch to accept on the branch for the purpose of
eliminating up the "scary looking" assertions.
Attachment #147352 - Flags: approval1.7?
Without a report of deadlock actually occurring, I don't see the point of fixing
this on the branch.
I still got one in my trunk build from this morning.
(In reply to comment #36)
> I still got one in my trunk build from this morning.

tenthumbs: can you get a stack trace if possible?  it is likely that this
assertion could happen elsewhere, and for that reason we should probably still
land the xpcom component manager change.
Attachment #147352 - Flags: approval1.7? → approval1.7-
Flags: blocking1.7+ → blocking1.7-
Ben wants this on aviary, I'll do the honours today.
No longer blocks: 160540
Attachment #147352 - Flags: approval1.7.x?
Comment on attachment 147352 [details] [diff] [review]
v2 patch

Mike, please put this on 1.7,
Attachment #147352 - Flags: approval1.7.x? → approval1.7.x+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: