Closed
Bug 332730
Opened 18 years ago
Closed 18 years ago
lock reentrancy through nsCacheEntryDescriptor::Close (deadlock)
Categories
(Core :: Networking: Cache, defect, P1)
Core
Networking: Cache
Tracking
()
RESOLVED
FIXED
mozilla1.9alpha1
People
(Reporter: benjamin, Assigned: darin.moz)
References
Details
Attachments
(2 files, 3 obsolete files)
5.54 KB,
text/plain
|
Details | |
61.83 KB,
patch
|
Details | Diff | Splinter Review |
On my xptcall branch I may have changed proxy event processing or proxied object releasing; but I'm hitting the following stack pretty consistently. Seems to me that nsCacheEntryDescriptor::Close ought to protect against reentrancy somehow: necko.dll!nsCacheEntryDescriptor::Close() Line 415 necko.dll!nsCacheEntryDescriptor::~nsCacheEntryDescriptor() Line 75 necko.dll!nsCacheEntryDescriptor::`scalar deleting destructor'() + 0xf bytes necko.dll!nsCacheEntryDescriptor::Release() Line 53 + 0x90 bytes necko.dll!nsCOMPtr<nsICacheEntryDescriptor>::~nsCOMPtr<nsICacheEntryDescriptor>() Line 584 necko.dll!nsCacheService::ProcessPendingRequests(nsCacheEntry * entry=0x04291798) Line 1572 necko.dll!nsCacheService::CloseDescriptor(nsCacheEntryDescriptor * descriptor=0x04291818) Line 1352 + 0xf bytes necko.dll!nsCacheEntryDescriptor::Close() Line 420 + 0x9 bytes necko.dll!nsCacheEntryDescriptor::~nsCacheEntryDescriptor() Line 75 necko.dll!nsCacheEntryDescriptor::`scalar deleting destructor'() + 0xf bytes necko.dll!nsCacheEntryDescriptor::Release() Line 53 + 0x90 bytes necko.dll!nsCOMPtr<nsICacheEntryDescriptor>::assign_assuming_AddRef(nsICacheEntryDescriptor * newPtr=0x00000000) Line 569 necko.dll!nsCOMPtr<nsICacheEntryDescriptor>::assign_with_AddRef(nsISupports * rawPtr=0x00000000) Line 1225 necko.dll!nsCOMPtr<nsICacheEntryDescriptor>::operator=(nsICacheEntryDescriptor * rhs=0x00000000) Line 714 necko.dll!nsHttpChannel::CloseCacheEntry(unsigned int status=0) Line 1747 necko.dll!nsHttpChannel::OnStopRequest(nsIRequest * request=0x04291ed8, nsISupports * ctxt=0x00000000, unsigned int status=0) Line 4162 necko.dll!nsInputStreamPump::OnStateStop() Line 567 necko.dll!nsInputStreamPump::OnInputStreamReady(nsIAsyncInputStream * stream=0x04291c80) Line 391 + 0xb bytes xpcom_core.dll!nsInputStreamReadyEvent::EventHandler(PLEvent * plevent=0x04291fc4) Line 121 xpcom_core.dll!PL_HandleEvent(PLEvent * self=0x04291fc4) Line 688 + 0xa bytes
Comment 1•18 years ago
|
||
er... is the destructor reentering itself? and why does stuff call ProcessPendingEvents while holding a lock?
Reporter | ||
Comment 2•18 years ago
|
||
The destructor is reentering, but not for the same object... it's a different object being released/destroyed.
Reporter | ||
Comment 3•18 years ago
|
||
Alfred, any thoughts?
Comment 4•18 years ago
|
||
In this case, nsCacheEntryDescriptor::SetDataSize has a lock - entries are evicted, which eventually causes a CacheEntryDescriptor to get Closed, which tries to get the same lock (I think)
Comment 5•18 years ago
|
||
*** Bug 339819 has been marked as a duplicate of this bug. ***
Comment 6•18 years ago
|
||
*** Bug 340916 has been marked as a duplicate of this bug. ***
Comment 7•18 years ago
|
||
The cache service has one global lock. See bug 340916 for a Linux stack.
Summary: lock reentrancy through nsCacheEntryDescriptor::Close → lock reentrancy through nsCacheEntryDescriptor::Close (deadlock)
Assignee | ||
Comment 8•18 years ago
|
||
This should do the trick. My solution is to "proxy the release" of the cache entry descriptor to the thread where OnCacheEntryAvailable is called. Strictly speaking we should be able to release cache entry descriptors on any thread, but in this case we have to avoid releasing it while holding the cache service lock. That means deferring the Release until later, which in this case can be when the cache listener's OnCacheEntryAvailable method is called. I also noticed that we should proxy the destruction of the cache listener to the thread on which its OnCacheEntryAvailable method is called. That way we avoid executing other code during the cache service lock.
Attachment #224974 -
Flags: review?(alfredkayser)
Assignee | ||
Updated•18 years ago
|
Severity: normal → critical
Status: NEW → ASSIGNED
Priority: -- → P1
Comment 10•18 years ago
|
||
That patch doesn't fix it. All it does is add a ProxyRelease on the stack, because it calls the Release immediately. #4 0x002a4214 in PR_Lock (lock=0x866ec20) at ptsynch.c:206 #5 0x01306095 in nsAutoLock::nsAutoLock (this=0xbfb4253c, aLock=0x866ec20) at ../../../../dist/include/xpcom/nsAutoLock.h:183 #6 0x01373730 in nsCacheEntryDescriptor::Close (this=0x100146d8) at nsCacheEntryDescriptor.cpp:414 #7 0x0137379f in nsCacheEntryDescriptor::~nsCacheEntryDescriptor$delete ( this=0x100146d8) at nsCacheEntryDescriptor.cpp:73 #8 0x0137320f in nsCacheEntryDescriptor::Release (this=0x100146d8) at nsCacheEntryDescriptor.cpp:51 #9 0x00db6462 in nsCOMPtr_base::~nsCOMPtr_base$base (this=0x1055a39c) at nsCOMPtr.cpp:81 #10 0x00596842 in nsCOMPtr<nsICacheEntryDescriptor>::~nsCOMPtr ( this=0x1055a39c) at ../../../dist/include/xpcom/nsCOMPtr.h:542 #11 0x0059616a in nsMsgMailNewsUrl::~nsMsgMailNewsUrl$base (this=0x1055a37c) at nsMsgMailNewsUrl.cpp:79 #12 0x02818527 in nsImapUrl::~nsImapUrl$delete (this=0x1055a378) at nsImapUrl.cpp:151 #13 0x00595e87 in nsMsgMailNewsUrl::Release (this=0x1055a37c) at nsMsgMailNewsUrl.cpp:82 ---Type <return> to continue, or q <return> to quit--- #14 0x028182e8 in nsImapUrl::Release (this=0x1055a378) at nsImapUrl.cpp:155 #15 0x00db6462 in nsCOMPtr_base::~nsCOMPtr_base$base (this=0xfdc61ec) at nsCOMPtr.cpp:81 #16 0x00ebd43c in nsCOMPtr<nsIURI>::~nsCOMPtr (this=0xfdc61ec) at ../../../dist/include/xpcom/nsCOMPtr.h:542 #17 0x00ebfb0b in imgRequest::~imgRequest$delete (this=0xfdc61d0) at imgRequest.cpp:91 #18 0x00ebdb7d in imgRequest::Release (this=0xfdc61d0) at imgRequest.cpp:74 #19 0x00dc227c in NS_ProxyRelease (target=0x8561930, doomed=0xfdc61d0, alwaysProxy=0) at nsProxyRelease.cpp:75 #20 0x01371e6a in nsCacheEntry::~nsCacheEntry (this=0x1049f960) at nsCacheEntry.cpp:93 #21 0x0137a231 in nsMemoryCacheDevice::EvictEntry (this=0x88f3110, entry=0x1049f960, deleteEntry=1) at nsMemoryCacheDevice.cpp:349 #22 0x0137a425 in nsMemoryCacheDevice::EvictEntriesIfNecessary (this=0x88f3110) at nsMemoryCacheDevice.cpp:370 #23 0x0137a66e in nsMemoryCacheDevice::OnDataSizeChange (this=0x88f3110, entry=0x10b3fca0, deltaSize=13719552) at nsMemoryCacheDevice.cpp:317 #24 0x0137798b in nsCacheService::OnDataSizeChange (entry=0x10b3fca0, deltaSize=13719552) at nsCacheService.cpp:1376 #25 0x01373eb2 in nsCacheEntryDescriptor::SetDataSize (this=0x10b40048, dataSize=13719552) at nsCacheEntryDescriptor.cpp:223
Assignee | ||
Comment 11•18 years ago
|
||
Whoops. I need to set the forceAsync flag to true! Thanks for testing the patch.
Assignee | ||
Comment 12•18 years ago
|
||
Actually that stack is a little different. The nsCacheEntry data item is being released while holding onto the cache lock, which is causing a similar problem to the bug reported.
Assignee | ||
Comment 13•18 years ago
|
||
OK, here's a better version of the patch. It adds support for queuing up objects to be released when we exit the cache lock, and it makes use of that to delay the release of nsCacheEntry::mData as well as nsCacheRequest::mListener.
Attachment #224974 -
Attachment is obsolete: true
Attachment #225400 -
Flags: superreview?(bzbarsky)
Attachment #225400 -
Flags: review?(cbiesinger)
Attachment #224974 -
Flags: review?(alfredkayser)
Comment 14•18 years ago
|
||
Comment on attachment 225400 [details] [diff] [review] v2 patch + // This method locks the cache service. Use nsCacheServiceAutoLock instead + // of calling this method directly. + static void Lock(); + + // This method unlocks the cache service. + static void Unlock(); why not make this private then, and nsCacheServiceAutoLock a friend?
Attachment #225400 -
Flags: review?(cbiesinger) → review+
Comment 15•18 years ago
|
||
Comment on attachment 225400 [details] [diff] [review] v2 patch This patch has been working great for me, no lockups in two days. Thank you!
Comment 16•18 years ago
|
||
I won't get to this until sometime next week...
Assignee | ||
Updated•18 years ago
|
Attachment #225400 -
Flags: superreview?(bzbarsky) → superreview?(bryner)
Comment 17•18 years ago
|
||
Comment on attachment 225400 [details] [diff] [review] v2 patch >Index: netwerk/cache/src/nsCacheEntry.h >+ nsISupports * mData; // Document that this is a strong ref and that we're not using an nsCOMPtr to make it easier to call nsCacheService::ReleaseObject_Locked? >Index: netwerk/cache/src/nsCacheRequest.h >+ nsICacheListener * mListener; Same here. sr=bzbarsky with those nits fixed
Attachment #225400 -
Flags: superreview?(bryner) → superreview+
Assignee | ||
Comment 18•18 years ago
|
||
> why not make this private then, and nsCacheServiceAutoLock a friend?
OK, sure... I'll do that.
Assignee | ||
Comment 19•18 years ago
|
||
fixed-on-trunk
Status: ASSIGNED → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 20•18 years ago
|
||
Here's the version of the patch that I commited to the trunk. I'd like to get this on the 1.8 branch once it has baked a bit. We might want to even consider it for a 1.8.0.x update.
Attachment #225400 -
Attachment is obsolete: true
Attachment #226262 -
Flags: approval1.8.0.6?
Attachment #226262 -
Flags: approval-branch-1.8.1?
This might have regressed Tp on btek a bit.
Yeah, sure looks like it did.
Comment 23•18 years ago
|
||
this also seems to have leaked 6K on nye bloat tbox. bug 341700 is causing a bunch of leakage, and this checkin might have just made that bug leak more. balsa has been orange since the checkin. :(
Updated•18 years ago
|
Attachment #226262 -
Flags: approval-branch-1.8.1? → approval1.8.1?
Comment 24•18 years ago
|
||
The leak log on balsa looks like this: --ALL-LEAKS-----------------------------------leaks------leaks%----------------------- TOTAL 8204 -10.32% nsDiskCacheStreamIO 2752 -14.00% nsDiskCacheBinding 2236 -14.00% XPCWrappedNative 1456 0.00% nsLocalFile 496 -20.00% nsXPCWrappedJS 240 0.00% XPCNativeScriptableShared 216 0.00% nsXPCWrappedJSClass 176 0.00% nsCacheService 124 0.00% xptiInterfaceInfo 80 0.00% nsThread 72 0.00% nsRect 64 0.00% nsXPCComponents 56 0.00% nsStringBuffer 48 -14.29% nsBaseAppShell 48 0.00% nsJSID 36 0.00% nsCacheEntryHashTable 36 0.00% nsSystemPrincipal 36 0.00% BackstagePass 20 0.00% nsRunnable 12 0.00% So it does look likely that this patch is causing leaks there...
Assignee | ||
Updated•18 years ago
|
Attachment #226262 -
Flags: approval1.8.1?
Attachment #226262 -
Flags: approval1.8.0.6?
Assignee | ||
Comment 26•18 years ago
|
||
*** Bug 340224 has been marked as a duplicate of this bug. ***
Assignee | ||
Comment 27•18 years ago
|
||
OK, I found the problem. nsCacheEntry's destructor had a bogus early return that was preventing mData from being released. For disk cache entries, mData is a nsDiskCacheBinding, and it has an owning reference to a nsDiskCacheStreamIO. Hence, the leak is fixed by simply removing the bogus early return. That was not causing a problem previously because the code used to use a nsCOMPtr for mData, which would have its destructor called irregardless. Revised patch checked into trunk.
Assignee | ||
Comment 28•18 years ago
|
||
Here's the patch that I checked in. (Ignore the unintended printf.)
Attachment #226262 -
Attachment is obsolete: true
Assignee | ||
Comment 29•18 years ago
|
||
fixed-on-trunk
Status: REOPENED → RESOLVED
Closed: 18 years ago → 18 years ago
Resolution: --- → FIXED
Comment 30•16 years ago
|
||
(In reply to comment #20) > Created an attachment (id=226262) [details] > v2.1 patch > > Here's the version of the patch that I commited to the trunk. I'd like to get > this on the 1.8 branch once it has baked a bit. We might want to even consider > it for a 1.8.0.x update. This never made it into 1.8. Should it land on 1.8 for Thunderbird 2, which is going to live on for many more months? bug 339819 cites TB. Other examples? * Bug 338232 cites Seamonkey (Frequent PR_Assert failures "lock->owner != me") * bug 400435 thunderbird appeared to deadlock (won't let me view attachment so can't tell) Is the patch too high risk?
Comment 31•16 years ago
|
||
I don't think we know if this is a large problem for 2.0, or how high a risk the patch is. The patch would also affect FF 2.0x while that's still around as well (but perhaps it's end-of-life is nigh or arrived, I'm not sure)
Comment 32•16 years ago
|
||
Years ago Scott did a series of m-builds and I suggest doing am M-Build with this patch put in for regression testing. It would enable some fact finding to make a landing derision. The FTP directory is branched out of the normal navigation flow so has a low risk of inadvertent access.
You need to log in
before you can comment on or make changes to this bug.
Description
•