Closed Bug 1006181 Opened 7 years ago Closed 7 years ago

HTTP cache v2 memory only storage area not freed on limit

Categories

(Core :: Networking: Cache, defect)

defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla32
Tracking Status
firefox31 --- unaffected
firefox32 --- fixed
b2g-v1.4 --- unaffected

People

(Reporter: mayhemer, Assigned: mayhemer)

References

Details

(Keywords: crash, memory-leak, Whiteboard: [MemShrink:P1])

Crash Data

Attachments

(4 files)

...or limit is computed wrong.

Based on http://forums.mozillazine.org/viewtopic.php?p=13519625#p13519625 and http://imgur.com/QZ5pr1C where memory-storage has 10GB.
Whiteboard: [MemShrink]
Status: NEW → ASSIGNED
Attached patch v1Splinter Review
- this is actually a long standing bug!
- entries recreated as 'memory-only' (inhibit caching or ssl) were not updated the frecency (mFrecency was left at 0)
- such entries are never purged since those are considered in-progress of opening and purging them so soon is wrong
- fixed with call and return result of NewWriteHandle() on the recreated entry
- NewWriteHandle() sets the handle as mWriter (which is absolutely correct, see OnHandleClosed for the effect) and also updates the frecency automatically
Attachment #8417733 - Flags: review?(michal.novotny)
Wow. How often does this occur in practice -- what kind of usage patterns would cause it? I guess it can't be happening too often; if lots of users were seeing behaviour that bad we'd hear more about it...?
(In reply to Nicholas Nethercote [:njn] from comment #3)
> Wow. How often does this occur in practice -- what kind of usage patterns
> would cause it? I guess it can't be happening too often; if lots of users
> were seeing behaviour that bad we'd hear more about it...?

I can only speak for my experience (screenshot of my memory usage is included in the top comment)

It seems to be explicitly cache2 related, so I'd imagine that it's limited only to nightly users and anyone else who has flipped the pref for it, so your general fx users aren't affected (yet). It has been happening for ages and actually caused me to move from the 32bit to the 64bit build just to make it take longer to bug me. Let netflix run on a machine unfettered for a few hours and you should see something similar.
http://forums.mozillazine.org/viewtopic.php?f=23&t=2799341#p13349873

When this problem presents in the 32bit version, the application stutters and freezes then goes oom and crashes. When I first started running into it (above - though i may have mentioned it in an earlier post) cache2 was not yet recognized by the memory profiler, so it's been a bug since at least February
> - this is actually a long standing bug!

Ah, I interpreted this as meaning that it pre-dates cache2, but judging from the patch that's not true. So IIUC it's not in any Firefox releases. Still, good to fix! (And Honza gets a gold star for implementing those memory reporters for cache2 :)
Duplicate of this bug: 1006263
(In reply to Nicholas Nethercote [:njn] from comment #6)
> > - this is actually a long standing bug!
> 
> Ah, I interpreted this as meaning that it pre-dates cache2, but judging from
> the patch that's not true. So IIUC it's not in any Firefox releases. Still,
> good to fix! (And Honza gets a gold star for implementing those memory
> reporters for cache2 :)

:*)

Yep, just a cache2 bug and the reporters help here.  This is actually a pretty old bug in this new code - but surprisingly never happened to me or someone else...  w/o the reporters we wouldn't find out!!
Duplicate of this bug: 1005963
Comment on attachment 8417733 [details] [diff] [review]
v1

Review of attachment 8417733 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/cache2/CacheEntry.cpp
@@ +465,5 @@
> +  // fails and also doesn't update the entry frecency.  Not updating
> +  // frecency causes entries to not be purged from our memory pools.
> +  nsRefPtr<CacheEntryHandle> writeHandle =
> +    newEntry->NewWriteHandle();
> +  return writeHandle.forget();

Where we replace the normal handle with write handle in case CacheStorage::AsyncOpenURI() is called with OPEN_TRUNCATE flag? It seems to me that it would be better to create the write handle in CacheStorageService::AddStorageEntry().
Flags: needinfo?(honzab.moz)
(In reply to Michal Novotny (:michal) from comment #10)
> Comment on attachment 8417733 [details] [diff] [review]
> v1
> 
> Review of attachment 8417733 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: netwerk/cache2/CacheEntry.cpp
> @@ +465,5 @@
> > +  // fails and also doesn't update the entry frecency.  Not updating
> > +  // frecency causes entries to not be purged from our memory pools.
> > +  nsRefPtr<CacheEntryHandle> writeHandle =
> > +    newEntry->NewWriteHandle();
> > +  return writeHandle.forget();
> 
> Where we replace the normal handle with write handle in case
> CacheStorage::AsyncOpenURI() is called with OPEN_TRUNCATE flag? 

It doesn't work like that.  There is also no concept of "replacing a handle with write hanle" at all.  

We create a write handler when we give the entry to a callback that is going to write it the first time or when revalidating it (where reval is also a range request for partial entries).

> It seems to
> me that it would be better to create the write handle in
> CacheStorageService::AddStorageEntry().

Definitely not.  The handle that is returned must not act as a writer.  It's there just to prevent purging the entry from the memory pool.  Writer is created later when there is a callback that will act as a writer - as described above.  The state of the callback is examined much later (onCacheEntryCheck).  This already works pretty well, no need to change it.

Role of the write handler is to revert internal state of the entry back to the previous one, REVAL->READY and WRITING->EMPTY, +call on any pending callbacks, when the consumer of the entry didn't do it's job (no call to MetaDataReady(), SetValid() or OpenOutputStream()) and just threw the entry away.  

Hence I don't see a point why a write handler should be returned from CacheStorageService::AddStorageEntry() at all.
Flags: needinfo?(honzab.moz)
Attachment #8417733 - Flags: review?(michal.novotny) → review+
I'd like to understand how netflix manages to store 10GB worth of stuff when we allegedly enforce a 5MB max object size in the memory cache (so I'd assume large video loads wouldn't get stored).  Honza, we already talked about this but I wanted to make sure we don't forget to figure out why, so I'm needinfo-ing you :)
Flags: needinfo?(honzab.moz)
Netflix stronk! 

For real though, I couldn't tell you. If you check dup bug 1006263 I uploaded a memory report that might tell you something. I can re-up it here if it's relevant/useful at all.
(In reply to Jason Duell (:jduell) from comment #13)
> I'd like to understand how netflix manages to store 10GB worth of stuff when
> we allegedly enforce a 5MB max object size in the memory cache (so I'd
> assume large video loads wouldn't get stored).  Honza, we already talked
> about this but I wanted to make sure we don't forget to figure out why, so
> I'm needinfo-ing you :)

I think you were talking about it with me. We check whether the entry is too big only in CacheEntry::SetPredictedDataSize() which is called from nsHttpChannel::CallOnStartRequest(). When the content length is unknown (-1) we always cache the entry. I think we should check the entry's size also in CacheFileIOManager::WriteInternal().
(In reply to Michal Novotny (:michal) from comment #15)
> I think we should check the entry's size also in
> CacheFileIOManager::WriteInternal().

Filed bug 1007071 on it.
Flags: needinfo?(honzab.moz)
(In reply to Jason Duell (:jduell) from comment #13)
> I'd like to understand how netflix manages to store 10GB worth of stuff when
> we allegedly enforce a 5MB max object size in the memory cache (so I'd
> assume large video loads wouldn't get stored).  Honza, we already talked
> about this but I wanted to make sure we don't forget to figure out why, so
> I'm needinfo-ing you :)

I don't have access to Netflix (location).  As Michal has already answered in comment 15, when the transfer encoding is chunked we don't restrict the size per entry.  In conjunction with this bug entries are left there forever.

I think it's worth another try after the patch here gets to Nightly.
https://hg.mozilla.org/mozilla-central/rev/9ac7ac429216
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla32
Attached file memory-report.json.gz
Do patches land on the x64 build at the same time as the 32bit one? Because I'm still seeing this behavior.
QA Whiteboard: [@ NS_ABORT_OOM(unsigned int) | PL_DHashTableInit(PLDHashTable*, PLDHashTableOps const*, void*, unsigned int, unsigned int) | nsCycleCollector::BeginCollection(ccType, nsICycleCollectorListener*)]
Robert, can you please recheck with the latest Nightly and if it still leaks tell us what you get at about:buildconfig?  Thanks!
Flags: needinfo?(rjward0)
Appears to still be happening. Session hasn't been open too long yet, but currently seeing

2,361.05 MB (100.0%) -- explicit
├──1,420.64 MB (60.17%) -- network
│  ├──1,419.88 MB (60.14%) ++ cache2

build config:


Build Machine

B-2008-IX-0007
Source

Built from https://hg.mozilla.org/mozilla-central/rev/4b6d63b05a0a
Build platform
target
x86_64-pc-mingw32
Build tools
Compiler 	Version 	Compiler flags
cl 	16.00.30319.01 	-TC -nologo -W3 -Gy -wd4244 -wd4819 -we4553
cl 	16.00.30319.01 	-wd4099 -TP -nologo -W3 -Gy -wd4251 -wd4244 -wd4345 -wd4351 -wd4482 -wd4800 -wd4819 -we4553 -GR- -DNDEBUG -DTRIMMED -Zi -UDEBUG -DNDEBUG -O1 -Oy-
Configure arguments

--target=x86_64-pc-mingw32 --host=x86_64-pc-mingw32 --enable-crashreporter --enable-release --enable-update-channel=nightly --enable-update-packaging --enable-jemalloc --with-google-api-keyfile=/c/builds/gapi.data --enable-signmar --enable-profiling --enable-js-diagnostics
Flags: needinfo?(rjward0)
Thanks, the patch is definitely there.  Can you please look at about:cache and see how many entries and of what size are in the memory-only cache?

It might be we also need bug 1007071 here.  Thanks.
Flags: needinfo?(rjward0)
http://paste2.org/0pj9M2mU

memory
Number of entries: 	3210
Maximum storage size: 	32768 KiB
Storage in use: 	3400511 KiB
Storage disk location: 	none, only stored in memory 

average entry size: 1082499.17755231 bytes (1.03MB)
Flags: needinfo?(rjward0)
Reopening.  Easily reproducible locally with a large PHP generated file.  The problem is that the CacheFile::IsWriteInProgress condition is wrong.  mMetadata->IsDirty() is left at |true| forever on memory only entries.  

I will add a test this time.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Status: REOPENED → ASSIGNED
Attached patch ad1 v1Splinter Review
This time tested more carefully.  An automated test will be a bit more complicated.  In the mean time, please check the code (we can land it only manually tested this time).
Attachment #8422065 - Flags: review?(michal.novotny)
Whiteboard: [MemShrink] → [MemShrink:P1]
Attachment #8422065 - Flags: review?(michal.novotny) → review+
Crash Signature: [@ NS_ABORT_OOM(unsigned int) | PL_DHashTableInit(PLDHashTable*, PLDHashTableOps const*, void*, unsigned int, unsigned int) | nsCycleCollector::BeginCollection(ccType, nsICycleCollectorListener*)]
QA Whiteboard: [@ NS_ABORT_OOM(unsigned int) | PL_DHashTableInit(PLDHashTable*, PLDHashTableOps const*, void*, unsigned int, unsigned int) | nsCycleCollector::BeginCollection(ccType, nsICycleCollectorListener*)]
Keywords: crash, mlk
https://hg.mozilla.org/mozilla-central/rev/0f5952a67494
Status: ASSIGNED → RESOLVED
Closed: 7 years ago7 years ago
Resolution: --- → FIXED
This crash is currently #2 on the 30beta topcrash list. Is the fix one that should safely ride the trains or should it be uplifted?
(In reply to [:tracy] Tracy Walker - QA Mentor from comment #29)
> This crash is currently #2 on the 30beta topcrash list. Is the fix one that
> should safely ride the trains or should it be uplifted?

Let's get a new bug on file for this. Cache2 is not active on 30 beta, this must be something else. The signature itself is, from all I know, generic OOM (but it sounds bad that CC is falling over, running out of memory while trying to free memory is rather nasty).
sorry, typo, it's #12 not #2. But still topcrash worthy on beta.  I'll file a new bug against [@ NS_ABORT_OOM(unsigned int) | PL_DHashTableInit(PLDHashTable*, PLDHashTableOps const*, void*, unsigned int, unsigned int) | nsCycleCollector::BeginCollection(ccType, nsICycleCollectorListener*)] for 30/31
You need to log in before you can comment on or make changes to this bug.