Closed Bug 31367 Opened 24 years ago Closed 24 years ago

appearance of leaking nsCachedNetData

Categories

(Core :: Networking: Cache, defect, P2)

x86
Linux
defect

Tracking

()

VERIFIED FIXED
mozilla0.9

People

(Reporter: dbaron, Assigned: dbaron)

References

()

Details

(Keywords: memory-leak, Whiteboard: [nsbeta2-][tind-mlk])

Attachments

(3 files)

DESCRIPTION:  When loading http://www.netscape.com/ in viewer, a bunch of
nsCachedNetData are leaked.  I think these are responsible for leaking a bunch
of nsHTTPChannel and nsCacheEntryChannel, the first of which hold on to a bunch
of other things which amount to at least half of the leaks on the page.  The
problem seems to be that nsReplacementPolicy::AssociateCacheEntryWithRecord
addrefs entry for the cache manager, and this reference is never released.

See refcount balancer data (when my computer is on) at:
http://dbaron.student.harvard.edu/u/david/leaks/netscape/nsCachedNetData.balance-1
and many of the other files ending in "balance" in the directory
http://dbaron.student.harvard.edu/u/david/leaks/netscape/

STEPS TO REPRODUCE:
 * setenv XPCOM_MEM_LEAK_LOG 1
 * ./mozilla-viewer http://www.netscape.com/
 * exit viewer

DOES NOT WORK CORRECTLY ON:
 * build from 2000-03-09 tree closure, with some leaks fixed
*** Bug 32607 has been marked as a duplicate of this bug. ***
Target Milestone: --- → M17
Moving post beta bugs to M18 which is now the post-beta milestone.
Target Milestone: M17 → M18
I went through the first 27 of the top100 in the PageCycler, and this leak
(nsCachedNetData only, *not counting anything it owned*) accounted for 40K of
the 94K of leaks shown on the leak stats (which are admittedly not everything). 
This is the second largest leak I know of in normal use of the browser (the
largest being bug 38601, which doesn't happen in the PageCycler).  On that
basis, I'm nominating this for nsbeta2.  Over a few hours of browsing this could
easily add up.
Keywords: nsbeta2
[nsbeta2-], assuming the total leak, including contained pointers, is 97K.
Whiteboard: [nsbeta2-]
could you tell me if
	Index: nsCacheManager.cpp
===================================================================
RCS file: /cvsroot/mozilla/netwerk/cache/mgr/nsCacheManager.cpp,v
retrieving revision 1.11
diff -c -2 -r1.11 nsCacheManager.cpp
*** nsCacheManager.cpp	2000/04/19 04:43:04	1.11
--- nsCacheManager.cpp	2000/05/17 07:59:23
***************
*** 290,293 ****
--- 290,294 ----
      nsStringKey hashTableKey(nsCString(key, keyLength));
      deletedEntry = (nsCachedNetData*)gCacheManager->mActiveCacheRecords->
Remove(&hashTableKey);
+     NS_IF_RELEASE( deletedEntry );
  //  NS_ASSERTION(deletedEntry == aEntry, "Hash table inconsistency");
  	  nsAllocator::Free( key );

solves the leak? I am having problem doing linux/windows builds right now. I 
think the replacement policy also leaks when it evicts an entry ( calls 
DeleteCacheEntry() which I can't seem to cause to happen but I haven't tried 
very hard) and the destructor appears to free the memory but not the objects but 
maybe I am misunderstanding how the heap arena allocator works.
Actually I think I am mistaken. I don't think the cache manager is supposed to 
release it's reference. If you look at the Release function of nsNetCacheData it 
does special stuff when the reference count is 1 (  only the cache manager is 
holding on to a reference). I wish I new how much time/space we were saving with 
all of this trickey memory managment.
Well, the objects should be deleted eventually - right now they just pile up.
ACtually it looks to me that the piling up is by design ( which seems like a bad 
idea to me) as the area is only freed when the replacement policy is destroyed. 
I don't think this is technically a leak. The memory is freed when the Arena is 
deallocated. The refcount is messed up since we don't actually delete the 
objects but instead just free there memory. nsCachedNetData shoouldn't be 
holding on to channels ( it is in fact the reverse). 
Also wanted to add that these objects get recycled by 
nsReplacementPolicy::AssociateCacheEntryWithRecord. 
For what it's worth, the leak (or pile up of nsCachedNetData objects) goes away 
if add the release as the above patch shows (or just use a smart pointer.)
Actually... using a smart pointer crashes the system.  
I think the thing to do here is add an NS_LOG_RELEASE where things get recycled 
into the arena again.  I just want to double-check that the arena is working 
before proposing this -- I have a patch in my tree.  I don't think this is a 
real leak.
Summary: leaking nsCachedNetData → appearance of leaking nsCachedNetData
Whiteboard: [nsbeta2-] → [nsbeta2-][tind-mlk]
Keywords: nsbeta3
gordon: could you investigate the risk of fixing this? (given our understanding 
of the cache and the time line) This is a minus otherwise. (remove it if you 
think this needs to be renominated)
Whiteboard: [nsbeta2-][tind-mlk] → [nsbeta2-][tind-mlk][nsbeta3-]
This might be the right fix:

Index: nsCachedNetData.cpp
===================================================================
RCS file: /cvsroot/mozilla/netwerk/cache/mgr/nsCachedNetData.cpp,v
retrieving revision 1.24
diff -U8 -r1.24 nsCachedNetData.cpp
--- nsCachedNetData.cpp	2000/07/29 02:32:11	1.24
+++ nsCachedNetData.cpp	2000/08/08 02:22:07
@@ -253,16 +253,22 @@
 NS_IMETHODIMP_(nsrefcnt)
 nsCachedNetData::Release(void)
 {
     nsrefcnt count;
     NS_PRECONDITION(1 != mRefCnt, "dup release");
     count = PR_AtomicDecrement((PRInt32 *)&mRefCnt);
     NS_LOG_RELEASE(this, count, "nsCachedNetData");
     if (count == 1) {
+        // Say that we are destroyed, because this memory is going
+        // to be reallocated by the arena or freed when the arena
+        // is destroyed.
+
+        // XXX Sometimes makes negative (open 2 browser windows)
+        NS_LOG_RELEASE(this, 0, "nsCachedNetData");
 
         nsCacheManager::NoteDormant(this);
         
         if (!GetFlag(RECYCLED)) {
             // Clear flag, in case the protocol handler forgot to
             mFlags &= ~UPDATE_IN_PROGRESS;
 
             // First, flush any altered cache entry data to the database


Problem is, I've seen this cause negative numbers.  (But I see them for a few
other things too...)
Actually, I'm now seeing real leaks of nsCachedNetData where they are only
AddRef'd once, so the circular references are never broken by the ::Release
hack.  I think one of the pointers needs to be a weak reference.

The ones leaked are created within nsReplacementPolicy::AddAllRecordsInCache -
perhaps these don't show up on tinderbox because everything in the tinderbox
machine's caches is actually used in the tests.

The real leaks cause leaks of nsDiskCacheRecord and the nsNetDiskCache.
Target Milestone: M18 → Future
opening up for reconsideration. We need to nail this.
Whiteboard: [nsbeta2-][tind-mlk][nsbeta3-] → [nsbeta2-][tind-mlk]
adusting summary.  We are leaking at least one nsCachedNetData object per page
load in gtkEmbed.  Not sure yet if this is specific to embedding or if seamonkey
has a similar problem.  David, have you look into this anymore since the 11th?
Summary: appearance of leaking nsCachedNetData → leaking nsCachedNetData
How do you know you're leaking them?  The leak stats?
this should now be getting covered under one of the task items of today's cache
meeting. And so off to rpotts since this is mem cache only at this stage. I
would leave the nsbeta3 nomination for jud to that call. 
Assignee: gordon → rpotts
NO!!  The nsCachedNetData is at the cache manager level.  I agreed to work on 
the *memory cache* for embedding, NOT fix the cache manager eviction policy!!

I'm kicking this one over to dp, since I think that he's looking at eviction at 
the cache manager level...

-- rick
Assignee: rpotts → dp
For reference, here's what I know about this bug:

 * It seems that most of the problem here is statistical.  **IF** I understand
what's going on correctly, the refcounting of nsCachedNetData is weird -- they
are given an additional refcount when they are created (but not by their
constructor) and the release method is hacked to return them to the arena when
the refcount falls to one.  (I'm not sure why this was done rather than
overriding operator delete.)  This confuses the leak logging, and I think the
fix I pasted into the bug above fixes that problem.

* However, with that fix I occasionally see *negative* refcounts.  (Maybe that
was related to the crashes in nsCachedNetData::Release and is now fixed??)  I
tried adding refcount stabilization in the Release method, and it didn't help
(but it should be there anyway!).  Since negative counts confuse nsTraceRefcnt,
I was reluctant to check in the fix above.

* I have on occasion seen real leaks of nsCachedNetData (as leak roots).  When I
saw these leaks, the leaked object was |AddRef|ed once and never |Release|d.  I
haven't seen these for a few weeks.  Then again, I haven't looked much.

* dougt says he sees real leaks, but didn't answer my question above because he
wasn't (before :-) cc:ed on the bug.
Neeti is the owner of cache.
Assignee: dp → neeti
Retitling to the way it was before dougt changed it without enough explanation.

I think the above patch is the right thing to do as far as leak statistics go. 
Could some cache expert (Neeti?) review this and see if it makes sense? 
However, I'm a little uncomfortable with its effects on bloat statistics.  When
are DORMANT nsCachedNetData objects not RECYCLED?
Summary: leaking nsCachedNetData → appearance of leaking nsCachedNetData
Looks good to me. However, I am not sure when dormant objects are not recycled?
Rick, when could DORMANT objects be not recycled? 

Thanks,
Neeti
As I understand it, DORMANT entries are those that are not currently active.
I believe the story goes something like this:
- DORMANT entries are not active and are elgible for eviction (if too many 
entries) or deletion (if too much storage).
- if a DORMANT entry is deleted, it becomes RECYCLED (ie. freed).
- If a DORMANT entry is evicted, it becomes VESTIGIAL.
- If a VESTIGIAL entry is deleted, it becomes RECYCLED (ie. freed).

Since the nsCachedNetData objects are arena allocated, they cannot be 
individually freed.  They can only be recycled in 
nsReplacementPolicy::AssociateCacheEntryWithRecord(...) when 
mRankedEntries[mNumEntries] != null.

So, if the recycling is working correctly, there should *never* be more 
nsCachedNetData instances than the max # entries in the replacement policy.

-- rick
It seems to me that we want to manipulate the refcount logging when the 
CachedNetData entry becomes RECYCLED.

DORMANT entries are still valid, just not active.  RECYCLED entries have been 
freed and can be reused...

-- rick
But I don't think all entries become RECYCLED when we shut down, or do they?
They do all become DORMANT (this patch relies on that).  Maybe I'll try some
other things...

Also, if the number is bounded, then having inaccurate bloat statistics isn't
too much of an issue.
Oh, that's another issue :-)  You could argue that when the CacheManager is 
shutdown it should clean up it's NetCachedData records :-)  Right now it 
doesn't.

Perhaps we need a shutdown methods in the ReplacementPolicy that clears the 
mRecordID of each DORMANT entry and delete's it.  This should cause the entry to 
get RECYCLED (without removing the underlying cache data).

-- rick
Taking this bug since I seem to be the only one working on it, and I'd like to
keep it on my radar of bugs I'm working on.  If none of the solutions I find is
acceptable, I'll give it back later.
Assignee: neeti → dbaron
Status: NEW → ASSIGNED
Priority: P3 → P2
Target Milestone: Future → mozilla0.9
hey david,
I like this patch :-)  (r=rpotts)
sr=scc
Fix checked in 2000-10-29 13:24-0800.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
verified:
Linux 2000123106
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: