Open Bug 559729 (http_cache) Opened 11 years ago Updated 3 years ago
Meta-bug: improve HTTP cache
This is a bug to track our strategy for improving our HTTP cache's performance, following on from our 2010 Caching Summit. Proposal:
1) We should see how well bug 175600 (8192 file max count in cache), bug 290032 (better hashing algorithm), and bug 193911 (double default cache size) play together, and land them if they improve our hit rates.
2) Look into YouTube videos blowing away the cache (bug 81640). May be fixed by a larger cache, or we may want to reduce priority based on file size or content-type.
3) Once we've got a baseline improved cache, compare its hit rate with Chromium's. Lift Chromium's cache if it seems like a win and not too much work (bug 514213).
4) Explore using different cache retention policies. Some mentioned at the Mozilla Caching Summit 2010 include: prioritizing by content-type (CSS and JS before images, etc.); supplementing LRU with smartness about what sites a visitor goes to frequently/daily; and possibly allowing servers to specify cache priority headers (needs some smartness to prevent abuse).
Anyone who is interested in these, please open a bug with your ideas and link it to this bug.
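To make the "prioritize by content-type" idea in item 4 concrete, here is a minimal sketch in Python of an LRU cache whose eviction order is biased by content type. It is not how Necko's cache works; the class name `PriorityLRUCache`, the `TYPE_WEIGHT` table, and the weight values are all hypothetical and only encode the ordering from the proposal (CSS and JS kept before images).

```python
from collections import OrderedDict

# Hypothetical weights: a higher weight means the entry is evicted later.
# Only the ordering (CSS/JS above images) comes from the proposal; the
# numbers themselves are placeholders.
TYPE_WEIGHT = {"text/css": 2, "application/javascript": 2, "image/png": 1}
DEFAULT_WEIGHT = 0


class PriorityLRUCache:
    """LRU cache that evicts low-weight content types first."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        # key -> (content_type, size); iteration order is oldest-used first
        self.entries = OrderedDict()

    def touch(self, key):
        """Record a cache hit; returns False on a miss."""
        if key in self.entries:
            self.entries.move_to_end(key)
            return True
        return False

    def put(self, key, content_type, size):
        if key in self.entries:
            self.used_bytes -= self.entries.pop(key)[1]
        self.entries[key] = (content_type, size)
        self.used_bytes += size
        self._evict(protect=key)

    def _evict(self, protect):
        while self.used_bytes > self.capacity_bytes:
            candidates = [k for k in self.entries if k != protect]
            if not candidates:
                break
            # min() returns the first minimal element, so among entries of
            # equal weight the least recently used one is evicted.
            victim = min(
                candidates,
                key=lambda k: TYPE_WEIGHT.get(self.entries[k][0], DEFAULT_WEIGHT),
            )
            self.used_bytes -= self.entries.pop(victim)[1]
```

With a 100-byte cache holding a stylesheet and an image, adding a second image evicts the older image rather than the older stylesheet, which is where this differs from plain LRU.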
Some other cache bugs, complete with patches, awaiting review and commit, that reduce code size and improve performance. Let's first do some dusting before starting to revamp.
224441: disk cache does not include indices in size calculation; can run out of space on small partition
456547: Only create the offline cache when really needed
329674: Make the disk cache favor local system endian byte order
331032: Decom nsICacheEntryInfo, replace by ExpirationTime attribute to nsICachingChannel
417154: nsMemoryCacheDevice::EvictEntriesIfNecessary could be optimized
230675: 'decom' of nsICacheVisitor.idl: saves 10% / 150K from nkcache_s.lib
405407: Merge nsDiskCacheStreamIO and nsDiskCacheStreamOutput
> 2) Look into YouTube videos blowing away the cache (bug 81640). May be fixed by larger cache,
> or we may want to reduce priority based on file size or content-type.
Actually, YouTube itself doesn't blow the cache. That only happens when users download the video itself (using a YouTube download extension or such), and even then, as soon as the destination path is known and the size reaches half the cache, further writing goes to the destination path.
bug 55307, bug 55690 and bug 69938 are about downloads being saved first in the cache, and only flushed from the cache once the downloaded chunk is larger than half the cache.
When the cache is enlarged, we could consider reducing the limit for large files in the cache to less than half the total.
A better solution would be to doom entries as soon as they are 'saved' elsewhere, but when? When saving a software package, it is generally no longer needed in the cache, but when saving an image, the same image may be reused on the next page visit...
OS: Linux → All
Hardware: x86 → All
Version: Other Branch → Trunk
> When the cache is enlarged, we could consider reducing the
> limit for large files in the cache to less than half the total.
+1 on that idea. Perhaps as a starting point, we could limit large files to the current max size (25MB) when we double the cache size. If there are better ideas, I'd be interested to hear them. Alfred, are you willing to open a bug for this and/or close the other bugs, so we've got a more unified approach going forward?
We've landed the HTTP hash fix already (bug 290032 comment 94), so we should be ready to bump up the file limit and cache size and see how we fare. Except that doubling the cache size caused perf regressions the last time we tried to land it (bug 193911 comment 35). Is anyone willing to track that down?
Re: list of other suggested bugs. For now, let's limit this meta-bug to things that might significantly affect the cache hit rate, as opposed to minor bugs or things that optimize load performance when we do hit the cache. (Yes, those patches should be reviewed and hopefully landed. But I think improving our hit rate is low-hanging fruit that can hopefully be had just by tweaking a couple of knobs, a little debugging, and some work on large files.)
P.S. Alfred, please make sure to list bugs as "bug xxxx, bug yyyy", so Bugzilla creates a link to them.
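The large-file limit being discussed can be sketched as follows, in Python. This is an illustration only, not the actual disk cache code: `max_entry_size` and `should_bypass_cache` are hypothetical names, and the 50MB default capacity is an assumption inferred from "25MB" being described as half the cache.

```python
MB = 1024 * 1024


def max_entry_size(cache_capacity, fraction=0.5, absolute_cap=None):
    """Largest entry (in bytes) that may stay in the cache.

    fraction=0.5 models the existing "half the cache" rule; absolute_cap
    models the suggestion of pinning the limit when the cache grows.
    """
    limit = int(cache_capacity * fraction)
    if absolute_cap is not None:
        limit = min(limit, absolute_cap)
    return limit


def should_bypass_cache(bytes_so_far, cache_capacity, absolute_cap=None):
    """True once a download has outgrown the limit and should be written
    straight to its destination instead of the cache."""
    return bytes_so_far > max_entry_size(cache_capacity, absolute_cap=absolute_cap)


# Assumed current default: 50MB cache, so the limit is 25MB.
current_limit = max_entry_size(50 * MB)
# Doubled cache with the limit pinned at the current 25MB.
pinned_limit = max_entry_size(100 * MB, absolute_cap=25 * MB)
```

Under these assumptions, pinning the cap means a doubled cache still refuses entries above 25MB, so a single large download can occupy at most a quarter of the cache instead of half.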
No longer depends on: 290032
Added new bug to pick default cache size based on user's free space available, as other browsers are now doing.
The entire disk cache disappearing when the browser crashes can significantly impact cache hit rate: bug 105843.
In some cases, the cache appears not to be used, so effectively there's a cache miss: bug 102809.
Collisions can cause items to be evicted from the cache even when it's not full: bug 387545. This might be a minor problem now that there's a better hash function.
Adding bug 105843 and bug 387545, though I suspect that the code/perf tradeoff makes them less low-hanging fruit than the other bugs here.
Depends on: 105843
Sorry... the bug that causes the cache not to be used in some cases is bug 120809, not bug 102809 as I wrote in comment #6.
I do not know if this is the right place for saying this, but I was just looking at my about:cache page in ff4b7, which I have had open for a few days. What I mostly noticed was that most of the cache entries were expired or never supposed to be cached (expired 1970-01-01). Why are these entries even cached, or at least why are they not cleaned up while the browser is not in use? Cleaning old entries out of the cache while idle would at least save some perf, since the cache would not have to clean up while the browser is trying to load pages.
Also I was looking at the Net tab in Firebug earlier while browsing around and noticed that a certain css file was not caching. From what I know about caching, its headers look like it should have been caching for at least 2 days. Only thing I could see that may have caused it was the uri had a server variable on the end. (http://cdn.animea-server.net/lite.css?v=1)
Other than that, I look forward to a good cache in my favourite browser. In the past (and I may do it again) I have set up a local squid caching proxy simply because it did a much better job at caching and made many of the pages I look at load instantly compared to just relying on the browser's cache.
(In reply to comment #9)
> Also I was looking at the Net tab in firebug earlier while browsing around and
> noticed that a certain css file was not caching. From what I know about caching
> its headers look like it should have been caching for at least 2 days. Only
> thing I could see that may have caused it was the uri had a server variable on
> the end.
> (http://cdn.animea-server.net/lite.css?v=1)
This file is cached without flaw in my browser, which is a recent nightly. I can see the cached usage and 304 responses on refresh.
AFAIK, entries with a zero expiry (1970, etc.) are ones that either have expired or had no expiry set. If they're in your disk cache that way, they probably had no expiry set (which doesn't necessarily mean they are expired). But even expired items have to be worked with for a period of time to render the page, so they live in memory for a while.
-[Unknown]
(In reply to comment #10)
> This file is cached without flaw in my browser, which is a recent nightly. I
> can see the cached usage and 304 responses on refresh.
YAY! So it must have been fixed since b7 came out (I double checked the cache lists and for b7 it was definitely not caching).
> AFAIK, entries with a zero expiry (1970, etc.) are ones that either have
> expired or had no expiry set. If they're in your disk cache that way, they
> probably had no expiry set (which doesn't necessarily mean they are expired.)
Maybe this has also been dealt with since b7. After looking further into it, it seems to be mostly items in the memory cache, and most of those are Facebook like.php stuff with the following headers. To me these headers say they should not be there, especially not 3 days after the page was closed, as is the case for most of mine.
Cache-Control: private, no-cache, no-store, must-revalidate
Expires: Sat, 01 Jan 2000 00:00:00 GMT
Pragma: no-cache
For the disk cache, on a proper look (I had to save the page to an html file and view it with a text editor, since it was hanging Firefox while rendering), the bulk of the items are fine. Most of what I saw in there was the couple of thousand google-analytics.com 35-byte pages, which should probably have a no-cache header but don't.
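For readers checking headers like the ones quoted above, here is a minimal Python sketch of how Cache-Control directives can be inspected. It reflects the HTTP caching specification (a response carrying `no-store` must not be stored by a cache), not Firefox's actual implementation, and it says nothing about how long an entry may stay in memory while a page is being rendered. The function names are hypothetical, and the parser is deliberately naive: it splits on commas and would mishandle quoted values that contain commas.

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def may_store(headers):
    """False when the response forbids storage via no-store."""
    cc = parse_cache_control(headers.get("Cache-Control", ""))
    return "no-store" not in cc


# Headers taken from the comment above.
like_php = {
    "Cache-Control": "private, no-cache, no-store, must-revalidate",
    "Expires": "Sat, 01 Jan 2000 00:00:00 GMT",
    "Pragma": "no-cache",
}
```

Here `may_store(like_php)` is False, which matches the commenter's reading that such responses should not be kept. By contrast, a response with only `max-age=172800` (two days) may be stored.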
> I do not know if this is the right place for saying this
Hugh et al,
This is a very interesting discussion, but it belongs in its own bug, not in the tracker bug for all HTTP cache bugs. Could you please open a new bug (under Core::Networking::Cache) and move the discussion there? Thanks.
Whiteboard: [evang-wanted] → [evang-wanted][Snappy]
Whiteboard: [evang-wanted][Snappy] → [evang-wanted][Snappy:P2]
Adding dep on the cache2-on-by-default bug.
Depends on: cache2enable
I suggest adding the following: bug #233293 bug #804731 bug #864047 Being an end-user, I do not presume to actually update the list of "Depends on" for these.
Whiteboard: [evang-wanted][Snappy:P2] → [evang-wanted][Snappy:P2][necko-backlog]
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: -- → P1
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: P1 → P3