Closed Bug 194296 Opened 22 years ago Closed 9 years ago

minimize memory cache size and growth

Categories

(Core :: Networking: Cache, defect)

defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: cathleennscp, Unassigned)

References

Details

(Keywords: memory-footprint, Whiteboard: post-m3?)

this bug can be broken down a bit: 1. allow cache sessions to have a maximum cache entry size (hard limit). e.g., imagelib should be able to say i don't want to store decoded images larger than N bytes and have the cache take care of all the dooming, etc. 2. reduce overhead resulting from cache data structures (bug 192353 and probably others). 3. anything else?
Depends on: 192353
Blocks: 215636
QA Contact: tever → cacheqa
might have done all that we can here. will address any problems that still exist after some space trace analysis in a different bug
Whiteboard: post-m3?
Just a note of some thinking on this. MetaData of cache entries is reasonable optimized internally. However, get/setMetaDateElement uses a key/value system, whereby both are strings. However, only a fairly restricted set of keys is used: nsFtpConnection uses "servertype", nsHttpChannel uses "response-head", "request-method", and "auth" nsHTMLDocument uses "charset" nsImapURL uses "ContentModified" mimemoz2.cpp uses "contentType" tabbrowser.xml uses "Icon" (only to set a flag, but not to use it) To maybe instead a key, a enum of one byte is sufficient! This saves for each entry the saving of these key names in the diskcache... The Metadata is now stored as "response-head"'/0'"...."'/0' for each set key. Using a keyByte, assuming about 3 MetaDataElements per cacheItem, and about 3000 items in the diskCache, saves about 72 K of the diskmap data. A special case are the 'request-*' headers saved as MetaDataElement in the cache by nsHttpChannel. Question again, do they really need to be saved/cached? Other than cookies (which are handled differently anyway) it doesn't seem to be really used anywere. Even further, question is whether the full 'response-head' really needs to be saved on disk. These could save about 50-100 bytes per entry (assuming 3000 entries) this again a 150K-300K savings... (Beside codesize savings...) ---------------------------------------------------- key: http://www.doom9.org/images/mp4ui-optimize.gif fetch count: 2 last fetched: 01/07/04 13:47:14 last modified: 01/07/04 11:54:48 expires: 01/16/04 15:37:30 Data size: 3492 file on disk: none Security: This document does not have any security info associated with it. Client: HTTP request-method: GET request-Accept-Encoding: gzip,deflate response-head: HTTP/1.1 200 OK Via: HTTP/1.1 ips-alm-a.alm.nl.ibm.com (IBM-PROXY-WTE) Vary: Accept-Encoding Date: Wed, 07 Jan 2004 11:52:10 GMT Server: Apache/1.3.27 (Unix) PHP/4.3.1 mod_gzip/1.3.26.1a Last-Modified: Tue, 07 Oct 2003 22:44:51 GMT Etag: "cca6a-da4-3f8341e3" Accept-Ranges: bytes Content-Length: 3492 Content-Type: image/gif ---------------------------------------------------- Also 'request-method' could changed from a key/value to a one byte flag in the cacheEntry itself. This saves bytes in the disk cache, as well as saving code like this (note that http is (beside images) the most occurring cache entry): // Get the method that was used to generate the cached response 1232 rv = mCacheEntry->GetMetaDataElement("request-method", getter_Copies(buf)); 1233 if (NS_FAILED(rv)) return rv; 1234 1235 nsHttpAtom method = nsHttp::ResolveAtom(buf); 1236 if (method == nsHttp::Head) { 1237 // The cached response does not contain an entity. We can only reuse 1238 // the response if the current request is also HEAD. 1239 if (mRequestHead.Method() != nsHttp::Head) 1240 return NS_OK; 1241 } 1242 buf.Adopt(0); 1243 Ok, looking further down: // Parse the cached HTTP response headers 1259 NS_ASSERTION(!mCachedResponseHead, "memory leak detected"); 1260 mCachedResponseHead = new nsHttpResponseHead(); 1261 if (!mCachedResponseHead) 1262 return NS_ERROR_OUT_OF_MEMORY; 1263 rv = mCachedResponseHead->Parse((char *) buf.get()); 1264 if (NS_FAILED(rv)) return rv; 1265 buf.Adopt(0); 1266 1267 // If we were only granted read access, then assume the entry is valid. 1268 if (mCacheAccess == nsICache::ACCESS_READ) { 1269 mCachedContentIsValid = PR_TRUE; 1270 return NS_OK; 1271 } >> If mCaccheAcces == ACCESS_READ, the early return makes the parsing of the cached header redundant. May be first this check, then parse? 1272 1273 // If the cached content-length is set and it does not match the data size 1274 // of the cached content, then the cached response is partial... 1275 // either we need to issue a byte range request or we need to refetch the 1276 // entire document. 1277 PRUint32 contentLength = (PRUint32) mCachedResponseHead->ContentLength(); 1278 if (contentLength != PRUint32(-1)) { 1279 PRUint32 size; 128 If we set the 'content-length' at creation time as a PRUint32 item in the cache entry (e.g. FullContentLength), we don't need to save and parse the response header just to get the length... Similary: why not store these: 1319 if (mCachedResponseHead->NoStore() || 1320 (mCachedResponseHead->NoCache() && mConnectionInfo->UsingSSL())) and 1330 else if (mCachedResponseHead->MustValidate()) { as bitflags in the cacheentry? Also, the existance of 'response-*' or 'response-cookie' set 'ResponseWouldVary' only based on the stored header. So instead of the header just store a 'vary' flag for this. One very specific case are other cases of 'response-...' fields, (also) stored as MetaData, which are compared between current header and cached header, when also the 'Vary' field is set in the header. At least the MetaData fields are redundant, because there are also in the parsed header (or other way round). Question is: Is the 'Vary' thing really used? Can we not just regard anything with a 'Vary' as to be validated anyway? And probably best not to cache anyway? So, concluding: * MetaDataElement fields can be further optimized (using a byte as key) * Storing the 'response-header' can be reduced in storing a couple of specific fields (fullContentLength, noStore, noCache, mustValidate: 4 bytes and 3 bits) instead of 50-100 bytes string. This also saves some processing (reparsing cached headers and such) The move of about 5 bytes of HTTP specific data from MetaData to the CacheEntry itself is worth the trouble, most (if not all) of the (disk) cache items are HttpChannel items.
Oops, further looking through the code, I can see that a cached 'response-header' truly useful. Even so, probably still worth of moving out from MetaData to CacheEntry itself, as 99% of disk cached elements are html.
> as 99% of disk cached elements are html. that seems a bit high, considering that most html pages contain images and other media elements.
christian: i think alfred meant "http" instead of "html"
Yes, http requests and in the disk cache, which are most heavy users of CacheMetaDataElements. The memory cache is nowadays only (internal) images. I am working on a hack to test out the impact of using 'byte' keys. In this, I can reduce the the CacheMetaDataElements coding a lot. Instead of a linked lists build and maintained, only a 'disk-save' ready buffer is used containing key/values as: keyByte, valueString, zeroBytes. Get/Set and Flatten/Unflatten directly operate from this buffer. This results in only one malloc'ed piece per cache entry (which is resized if more than one entry is set). Profile is currently either 0, 1 or 3 elements per cache. Walking this keyvalue should be as fast or faster than current code. It reduces the need for Flatten/unflatten... Furthermore the current 'VisitMetaDataElements' stuff is only used by the 'about:cache', and adds several layers of calls, and a callback IDL, class and function just to dump the metadata fields to some html. Changing that to a function call from 'about' to the cache with the buffer and the formatstring to use, instead of a callback will reduce complexity of the IDL for this bit, somewhat. So, first I will do some measurements if this is leading somewhere (in terms of codesize, cachespeed, memorysize).
Created bug 230675 to simplify nsICacheVisitor (20K codesize savings!) as a spin off of this one, as it is of interrest for all.
moving minimo bugs to the new bugzilla product.
Component: Networking: Cache → General
Product: Browser → Minimo
Version: Trunk → unspecified
Blocks: 236580
No longer blocks: 215636
Component: General → Networking: Cache
Product: Minimo → Core
Summary: [minimo] minimize memory cache size and growth → minimize memory cache size and growth
Version: unspecified → 1.0 Branch
Version: 1.0 Branch → Trunk
So, this bug is really about reducing the size of the memory cache which is only about 'caching' images... Currently, it seems that: a. images are not flushed from cache b. cache size limits are not checked c. images are cached in the fully decoded 24-bit+8-bit format (in some case both the frame format and OS dependent format) So the problem is not really in the cache code, but in the way that images are maintained in memory...
Blocks: 213391
Status: NEW → ASSIGNED
The discussion about the extravant memory usage of the 'caching' of images is raised again by the bfcache discussion. It seems that when using many tabs, with many images per page, and a lot of 'viewers' (bfcache) the memory usage of decoded images is a lot. This is not due to the memory cache itself, but to the way that decoded images (nsImage and friends) are kept around in the document object for that page. Two things that can be done to improve this: 1. Enable to prevent keep decoded images around for 'bfcached' pages. 2. Allow to disable the use of the memory cache for images. Ad 1. This would possibly slowdown bfcache, as images need to be reloaded (hopefully from the disk cache), decoded, and converted to the OS pixmap. But this would also prevent the very extravant memory usage caused by keeping all those images around, just for the 'fast backwards' feature. Created bug 327280 to investigate this further... Ad 2. The memory cache for images keeps dereferenced decoded images as long as there is space. The actual cache limits is only used for determining how many dereferenced images are kept. That's why so many people complain that the memory cache doesn't seem to keep within limits, as all referenced images are allways kept in the 'cache'. To allow to prevent keeping dereferenced decoded images, one could disable the memory cache completely, but better would be to either: a. Implement a preference to not administrate decoded images in the memory cache, and therefor not to keep dereferenced decoded images around (effectively disabling memory cache only for images, but enabling it for other things to be cached) b. Split the memory cache into an 'image cache' and 'memory cache for other items (http cache)', allowing to selectively enable/disable either of them. Propose to do option 2b: split the memory cache into (HTTP) memory cache and to image cache...
Assignee: darin → nobody
Status: ASSIGNED → NEW
Firefox 10 has built-in BarTab, so is this bug still valid?
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.