Open Bug 559729 (http_cache) Opened 14 years ago Updated 21 days ago

Meta-bug: improve HTTP cache

Categories

(Core :: Networking: Cache, defect, P3)

People

(Reporter: jduell.mcbugs, Unassigned)

References

(Depends on 5 open bugs)

Details

(Whiteboard: [evang-wanted][Snappy:P2][necko-backlog])

Attachments

(1 obsolete file)

This is a bug to track our strategy for improving our HTTP cache's performance, following on from our 2010 Caching Summit.

Proposal:  

1) We should see how well bug 175600 (8192 file max count in cache), bug 290032 (better hashing algorithm), and bug 193911 (double default cache size) play together, and land them if they improve our hit rates.

2) Look into Youtube videos blowing away the cache (bug 81640).  May be fixed by larger cache, or we may want to reduce priority based on file size or content-type.

3) Once we've got a baseline improved cache, compare its hit rate with Chromium's.  Lift Chromium's cache if it seems like a win and not too much work (bug 514213).

4) Explore using different cache retention policies.  Some mentioned at the Mozilla Caching Summit 2010 include:  prioritizing by content-type (CSS and JS before images, etc.); supplementing LRU with smartness about what sites a visitor goes to frequently/daily; and possibly allowing servers to specify cache priority headers (needs some smartness to prevent abuse).  Anyone who is interested in these, please open a bug with your ideas and link it to this bug.
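
To make item 4 concrete, here is a minimal sketch of how content-type priority and site visit frequency might supplement plain LRU. It is an illustration only: the `Entry` fields, the `TYPE_WEIGHT` values, and the scoring formula are all hypothetical and are not taken from the existing cache code.

```python
from dataclasses import dataclass

# Hypothetical weights: a higher weight means "keep longer".
TYPE_WEIGHT = {
    "text/css": 3.0,
    "application/javascript": 3.0,
    "image/png": 1.0,
}


@dataclass
class Entry:
    key: str
    content_type: str
    last_used: float  # timestamp of last access, in seconds
    visits: int       # how often the user visited the owning site


def eviction_score(entry: Entry, now: float) -> float:
    """Return a retention score; the lowest score is evicted first."""
    age = max(now - entry.last_used, 1.0)  # avoid division by zero
    weight = TYPE_WEIGHT.get(entry.content_type, 1.0)
    return weight * (1 + entry.visits) / age


def pick_victim(entries, now: float) -> Entry:
    """Choose the entry to evict under the weighted policy."""
    return min(entries, key=lambda e: eviction_score(e, now))
```

With pure LRU the older entry always goes first. Under this scoring, a stylesheet used 100 seconds ago can outlive an image used 50 seconds ago, which is the kind of behavior the summit suggestions were after.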
There are some other cache bugs, complete with patches and awaiting review and commit, that would reduce code size and improve performance.
Let's do some dusting first before starting to revamp.
224441: disk cache does not include indices in size calculation; can run out of space on small partition
456547: Only create the offline cache when really needed
329674: Make the disk cache favor local system endian byte order
331032: Decom nsICacheEntryInfo, replace by ExpirationTime attribute to nsICachingChannel
417154: nsMemoryCacheDevice::EvictEntriesIfNecessary could be optimized
230675: 'decom' of nsICacheVisitor.idl: saves 10% / 150K from nkcache_s.lib
405407: Merge nsDiskCacheStreamIO and nsDiskCacheStreamOutput
> 2) Look into Youtube videos blowing away the cache (bug 81640).  May be fixed
> by larger cache, or we may want to reduce priority based on file size or
> content-type.
Actually, YouTube playback itself doesn't blow away the cache. That only happens when users download the video itself (using a YouTube download extension or similar), and even then, as soon as the destination path is known and the size reaches half the cache, further writing goes to the destination path.

bug 55307, bug 55690 and bug 69938 are about downloads being saved first in the cache, and only flushed from the cache once the downloaded chunk is larger than half the cache.  When the cache is enlarged, we could consider reducing the limit for large files in the cache to less than half the total.

A better solution would be to doom entries as soon as they are 'saved' elsewhere, but when? When saving a software package, it is generally no longer needed in the cache, but when saving an image, the same image may be reused in a next page visit...
OS: Linux → All
Hardware: x86 → All
Version: Other Branch → Trunk
Whiteboard: [evang-wanted]
> When the cache is enlarged, we could consider to reduce the 
> limit for large file in the cache to less than half the total.

+1 on that idea. Perhaps as a starting point, we could limit large files to the current max size (25MB) when we double the cache size: if there are better ideas, I'd be interested to hear them.  Alfred, are you willing to open a bug for this and/or close the other bugs, so we've got a more unified approach going forward? 
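
As a rough sketch of that starting point: the per-entry limit becomes the smaller of "half the cache" and a fixed cap. The function name and the 50MB/100MB capacities below are assumptions for illustration (50MB is inferred from 25MB being half the current cache); only the 25MB figure and the half-cache rule come from this discussion.

```python
MB = 1024 * 1024


def max_entry_size(cache_capacity: int, absolute_cap: int = 25 * MB) -> int:
    """Largest entry (in bytes) the disk cache would accept.

    Keeps the existing "no more than half the cache" rule, but also
    applies a fixed cap so that enlarging the cache does not
    automatically enlarge the biggest entry allowed.
    """
    return min(cache_capacity // 2, absolute_cap)


if __name__ == "__main__":
    for capacity in (50 * MB, 100 * MB):
        limit = max_entry_size(capacity)
        print(f"{capacity // MB}MB cache -> {limit // MB}MB entry limit")
```

The limit stays at 25MB for both capacities, so doubling the cache adds room for more entries rather than for larger ones.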

We've landed the HTTP hash fix already (bug 290032 comment 94), so we should be ready to bump up the file limit and cache size and see how we fare.  Except that doubling the cache size caused perf regressions last time we tried to land (bug 193911 comment 35).  Is anyone willing to track that down?

Re: list of other suggested bugs. For now, let's limit this meta-bug to things that might significantly affect the cache hit rate, as opposed to minor bugs or things that optimize load performance when we do hit cache.  (Yes, those patches should be reviewed and hopefully landed.  But I think improving our hit rate is low-hanging fruit that can hopefully be had just by tweaking a couple of knobs, a little debugging, and some work on large files.)

P.S. Alfred, please make sure to list bugs as "bug xxxx, bug yyyy", so bugzilla creates a link to them.
No longer depends on: 290032
Blocks: 559942
Added new bug to pick default cache size based on user's free space available, as other browsers are now doing.
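
A minimal sketch of what sizing by free space could look like. The percentage, floor, and ceiling here are placeholder values made up for illustration, not numbers proposed in that bug, and the function names are hypothetical.

```python
import shutil

MB = 1024 * 1024


def smart_cache_size(free_bytes: int,
                     percent: int = 10,
                     floor: int = 50 * MB,
                     ceiling: int = 1024 * MB) -> int:
    """Pick a cache capacity from the free space on the volume.

    Takes a percentage of the free space, clamps it to [floor, ceiling],
    and never asks for more than is actually free.
    """
    size = free_bytes * percent // 100
    size = max(floor, min(size, ceiling))
    return min(size, free_bytes)


def default_cache_size(profile_dir: str = ".") -> int:
    """Apply the policy to the volume that holds the profile directory."""
    return smart_cache_size(shutil.disk_usage(profile_dir).free)
```

Integer arithmetic is used throughout so the result is exact for any byte count.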
No longer blocks: 559942
Depends on: 559942
The entire disk cache disappearing when the browser crashes can significantly impact cache hit rate: bug 105843

In some cases, the cache appears not to be used, so effectively there's a cache miss: bug 102809

Collisions can cause items to be evicted from the cache even when it's not full: bug 387545. This might be a minor problem now that there's a better hash function.
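
To illustrate the collision problem, here is a toy model (not the real disk cache map; the class, bucket counts, and hash functions are invented for the example) in which each hash bucket has a fixed number of slots. A poor hash fills one bucket and forces evictions while the rest of the cache sits empty.

```python
class BucketedCache:
    """Toy cache index: fixed number of buckets, fixed slots per bucket."""

    def __init__(self, n_buckets: int, slots_per_bucket: int):
        self.buckets = [[] for _ in range(n_buckets)]
        self.slots = slots_per_bucket

    def put(self, key, hash_fn):
        """Insert key; return the key evicted to make room, or None."""
        bucket = self.buckets[hash_fn(key) % len(self.buckets)]
        evicted = None
        if key in bucket:
            bucket.remove(key)          # refresh an existing entry
        elif len(bucket) >= self.slots:
            evicted = bucket.pop(0)     # bucket full: drop its oldest key
        bucket.append(key)
        return evicted

    def count(self) -> int:
        return sum(len(b) for b in self.buckets)


def colliding_hash(key) -> int:
    return 0                            # worst case: everything collides


def spreading_hash(key) -> int:
    return ord(key[0])                  # spreads these sample keys out


if __name__ == "__main__":
    bad = BucketedCache(n_buckets=4, slots_per_bucket=2)
    print([bad.put(k, colliding_hash) for k in "abc"], bad.count())
    good = BucketedCache(n_buckets=4, slots_per_bucket=2)
    print([good.put(k, spreading_hash) for k in "abc"], good.count())
```

With the colliding hash, the third insert evicts "a" even though only two of the eight slots are in use. With the spreading hash, all three keys fit and nothing is evicted.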
Depends on: 387545
Adding bug 105843 and bug 387545, though I suspect that the code/perf tradeoff makes them less low-hanging fruit than the other bugs here.
Depends on: 105843
Sorry... the bug that causes the cache not to be used in some cases is bug 120809, not bug 102809 as I wrote in comment #6.
Alias: http_cache
Depends on: 567360
Depends on: 569709
Depends on: 571091
Depends on: 578541
Depends on: 55307
Depends on: 588507
Depends on: 588509
Depends on: 588521
Depends on: 589120
Depends on: 584283
Depends on: 585777
Depends on: 592422
Depends on: 593198
Depends on: 592520
Depends on: 588804
Depends on: 585752
Depends on: 596714
Depends on: 596778
Depends on: 597224
Depends on: 595389
Depends on: 596476
Depends on: 597304
Depends on: 598739
Depends on: 599065
Depends on: 598243
Depends on: 602611
Depends on: 454001
Depends on: 614039
I do not know if this is the right place for saying this but I was just looking at my about:cache page in ff4b7 which I have had open a few days. What I mostly noticed was most of the cache entries were expired or never supposed to be cached (expired 1970-01-01).

Why are these entries cached at all, or at least not cleaned up while the browser is not in use? Cleaning old entries out of the cache while idle would save some perf, since the cache would not have to clean up while the browser is trying to load pages.

Also I was looking at the Net tab in firebug earlier while browsing around and noticed that a certain css file was not caching. From what I know about caching its headers look like it should have been caching for at least 2 days. Only thing I could see that may have caused it was the uri had a server variable on the end.
(http://cdn.animea-server.net/lite.css?v=1)

Other than that I look forward to a good cache in my favourite browser. In the past (and I may do it again) I have set up a local squid caching proxy simply because it did a much better job at caching and made many of the pages I look at load instantly compared to just relying on the browser's cache.
(In reply to comment #9)
> Also I was looking at the Net tab in firebug earlier while browsing around and
> noticed that a certain css file was not caching. From what I know about caching
> its headers look like it should have been caching for at least 2 days. Only
> thing I could see that may have caused it was the uri had a server variable on
> the end.
> (http://cdn.animea-server.net/lite.css?v=1)

This file is cached without flaw in my browser, which is a recent nightly.  I can see the cached usage and 304 responses on refresh.

AFAIK, entries with a zero expiry (1970, etc.) are ones that either have expired or had no expiry set.  If they're in your disk cache that way, they probably had no expiry set (which doesn't necessarily mean they are expired.)

But even expired items have to be worked with for a period of time to render the page, so they live in memory for a while.

-[Unknown]
(In reply to comment #10)
> This file is cached without flaw in my browser, which is a recently nightly.  I
> can see the cached usage and 304 responses on refresh.

YAY! So it must have been fixed since b7 came out (I double checked the cache lists and for b7 it was definitely not caching).

> AFAIK, entries with a zero expiry (1970, etc.) are ones that either have
> expired or had no expiry set.  If they're in your disk cache that way, they
> probably had no expiry set (which doesn't necessarily mean they are expired.)

Maybe this has also been dealt with since b7. After looking further into it, it seems to be mostly items in the memory cache and at that most of them are facebook like.php stuff with the following headers. To me these headers say they should not be there, especially for 3 days after the page was closed like most of mine have been.

Cache-Control: private, no-cache, no-store, must-revalidate
Expires: Sat, 01 Jan 2000 00:00:00 GMT
Pragma: no-cache
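
For reference, a small sketch of how I read those headers. This is a simplified reading of the `Cache-Control` response directives only (`no-store`: do not keep a copy; `no-cache`: may keep a copy but must revalidate before reuse), not a description of what the browser's cache actually does. The function name and return labels are made up.

```python
def classify(headers: dict) -> str:
    """Classify a response by its Cache-Control directives (simplified)."""
    lowered = {name.lower(): value for name, value in headers.items()}
    directives = set()
    for part in lowered.get("cache-control", "").split(","):
        name = part.strip().lower().split("=", 1)[0]
        if name:
            directives.add(name)
    if "no-store" in directives:
        return "do-not-store"
    if "no-cache" in directives:
        return "revalidate-before-use"
    return "storable"


if __name__ == "__main__":
    like_php = {
        "Cache-Control": "private, no-cache, no-store, must-revalidate",
        "Expires": "Sat, 01 Jan 2000 00:00:00 GMT",
        "Pragma": "no-cache",
    }
    print(classify(like_php))
```

The sample above falls in the "do-not-store" class, which is why seeing these entries days later looks wrong to me.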

For the disk cache, on a proper look (I had to save the page to an HTML file and view it with a text editor, since it was hanging Firefox while rendering), the bulk of the items are fine. Most of what I saw in there was a couple of thousand google-analytics.com 35-byte pages, which probably should have a no-cache header but don't.
> I do not know if this is the right place for saying this 

Hugh et al,

This is a very interesting discussion, but it belongs in its own bug, not in the tracker bug for all HTTP cache bugs.  Could you please open a new bug (under Core::Networking::Cache) and move the discussion there?  Thanks.
I went ahead and created bug 619376 for the issues in comment 9 through 12.
Depends on: 647391
Depends on: 648429
Depends on: 648605
Depends on: 645848
Depends on: 650995
Depends on: 651234
Depends on: 662436
Depends on: 651011
Depends on: 663341
Depends on: 663580
Depends on: 663979
Depends on: 665707
Depends on: 666059
Depends on: 671971
Depends on: 674869
Depends on: 683762
Depends on: 683817
Whiteboard: [evang-wanted] → [evang-wanted][Snappy]
Depends on: 701909
Whiteboard: [evang-wanted][Snappy] → [evang-wanted][Snappy:P2]
Depends on: 708436
Depends on: 512849
Blocks: 96032
Depends on: 752684
Depends on: 446876
Depends on: 831025
Depends on: 877301
Adding dep on the cache2-on-by-default bug.
Depends on: cache2enable
I suggest adding the following:
bug #233293
bug #804731
bug #864047

Being an end-user, I do not presume to actually update the list of "Depends on" for these.
Whiteboard: [evang-wanted][Snappy:P2] → [evang-wanted][Snappy:P2][necko-backlog]
Component: Tracking → Networking: Cache
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: -- → P1
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: P1 → P3
Severity: normal → S3

The severity field for this bug is relatively low, S3. However, the bug has 37 votes and 57 CCs.
:kershaw, could you consider increasing the bug severity?

For more information, please visit auto_nag documentation.

Flags: needinfo?(kershaw)

The last needinfo from me was triggered in error by recent activity on the bug. I'm clearing the needinfo since this is a very old bug and I don't know if it's still relevant.

Flags: needinfo?(kershaw)
Attachment #9386533 - Attachment is obsolete: true