Closed Bug 514213 Opened 15 years ago Closed 10 years ago

Replace necko cache with the Google Chrome cache

Categories

(Core :: Networking: Cache, defect)


Tracking


RESOLVED WONTFIX

People

(Reporter: michal, Assigned: michal)

References

(Blocks 1 open bug)

Details

There are several bugs regarding performance of the necko cache (#513074, #513008, #512849), and replacing it with the Google Chrome cache is a possible solution. Here are some topics to discuss before we can replace the cache.


1) The Chrome cache isn't thread-safe; it is designed to be used on a single thread only. This is a step back, and the question is whether we should follow this approach (i.e. use the cache on a single thread too) or change the cache to be thread-safe.
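The single-thread approach usually means funneling every cache call through one dedicated thread. A minimal sketch of that pattern, assuming a toy in-memory map as the backend (the `CacheThread` name and API are hypothetical, not Necko or Chromium interfaces):

```cpp
// Sketch: serializing all access to a non-thread-safe cache onto one
// dedicated worker thread, as an alternative to making the cache itself
// thread-safe. A plain std::map stands in for the real cache backend.
#include <condition_variable>
#include <functional>
#include <future>
#include <map>
#include <memory>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

class CacheThread {
 public:
  CacheThread() : worker_([this] { Loop(); }) {}
  ~CacheThread() {
    Post([this] { done_ = true; });  // pending tasks after this are dropped
    worker_.join();
  }

  // Callers on any thread get a future; entries_ is only touched on worker_.
  std::future<std::string> Get(const std::string& key) {
    auto p = std::make_shared<std::promise<std::string>>();
    Post([this, key, p] { p->set_value(entries_[key]); });
    return p->get_future();
  }
  void Put(const std::string& key, const std::string& value) {
    Post([this, key, value] { entries_[key] = value; });
  }

 private:
  void Post(std::function<void()> task) {
    std::lock_guard<std::mutex> lock(mutex_);
    tasks_.push(std::move(task));
    cv_.notify_one();
  }
  void Loop() {
    while (!done_) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !tasks_.empty(); });
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();  // the cache map is only ever mutated here, on one thread
    }
  }

  std::map<std::string, std::string> entries_;  // not thread-safe by itself
  std::queue<std::function<void()>> tasks_;
  std::mutex mutex_;
  std::condition_variable cv_;
  bool done_ = false;
  std::thread worker_;
};
```

The trade-off is that every cross-thread call becomes asynchronous (a future or callback), which is exactly the API question discussed below.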


2) As far as I can see there is no support for the offline cache. I've talked with HonzaB about it and he said that this shouldn't be a big problem if the cache API stays the same (or at least very similar). We would keep our offline cache implementation and replace only the memory and disk cache.


3) It is not possible to append data to a cache entry. This isn't actually a problem, just an annoyance (e.g. when resuming a download that was interrupted and is partially cached).


4) Simultaneous reading and writing (bug #446876) is supported according to the documentation. I wonder if there is some catch, since Chrome behaves the same way as Firefox: when downloading the same document in 2 tabs, the second tab is blocked until the first finishes.


5) The Chrome cache uses mmap() to memory-map the headers of the data files. I'm not sure if PR_MemMap() is supported on every platform; I can't see support on OS/2 and BeOS (please correct me if I'm wrong). Btw. a side effect of using mmap() is that the cache files are endian-dependent. I think (but I'm not sure) that with the current necko cache it is possible to share a profile between big-endian and little-endian machines.
1) which threads access the cache currently?
2) From what I understand of the chromium cache, I don't think the cache API can or should stay the same: the current API does various operations synchronously, which we shouldn't support (because we really don't want to make main-thread calls out to the disk)
3) do downloads typically end up in the cache? What are the cases this affects in normal operation?
5) we support mmap everywhere that matters (we're using it unconditionally for JAR stuff). I really don't think we should care about profile arch-independence as long as we invalidate correctly.
Note that there are certain environments (e.g. a Solaris+Linux setup on Sparc and x86 with the home directory in NFS or AFS) where that would mean losing the cache fairly often.  Maybe that's ok.
I'm personally ok with that... it seems like an edge case that we don't want to optimize for. I'd like to make fastload arch-dependent too, to avoid all the stupid word-flipping we do now and let us use mmap there too.
What word flipping? Fastload uses little-endian order. That covers x86 and ARM. What systems are big-endian that we support?

Do you have any profile data showing byte reordering is above noise on big-endian systems? It's a mistake to optimize prematurely -- Knuth said premature optimization is the root of all evil. Worse still if there's no gain at all, rather than too little gain compared to opportunity costs.
 
/be
(In reply to comment #1)
> 1) which threads access the cache currently?

Most calls are from the main thread and some are from nsStreamTransportService's thread (due to NS_AsyncCopy()). I'm not sure if there are any others...

> 3) do downloads typically end up in the cache? What are the cases this affects
> in normal operation?

By downloads I meant all transfers (e.g. images in a page). When you cancel a page load with the stop button or by navigating back in the history, these transfers are interrupted and the cache contains partial content. When visiting the page again, nsHttpChannel validates this partial content, downloads only the missing part, and completes the cache entry. With the Google Chrome cache it must first doom the old entry, create a new entry, copy the content from the old entry to the new one, and then append the missing part.
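The doom/copy/append dance described above can be sketched as follows; the `Backend` type here is a hypothetical stand-in using a string map, not the real Chromium disk_cache API:

```cpp
// Sketch of the extra work needed when a backend cannot append to an
// existing entry: doom the partial entry, create a fresh one, copy the
// cached prefix, then add the newly downloaded tail.
#include <map>
#include <string>

struct Backend {
  std::map<std::string, std::string> live;  // key -> cached body

  // With append support this would simply be: live[key] += tail;
  void CompleteEntry(const std::string& key, const std::string& tail) {
    std::string prefix = live[key];  // partially cached content
    live.erase(key);                 // "doom" the old entry
    live[key] = prefix + tail;       // new entry: copied prefix + missing part
  }
};
```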
That doesn't actually discard the old data; it's just an implementation detail. Given the size-sorting strategy of the chromium cache, that makes perfect sense.
(In reply to comment #1)
> 2) From what I understand of the chromium cache, I don't think the cache API
> can or should stay the same: the current APIs does various operations
> synchronously which we shouldn't support (because we really don't want to make
> mainthread calls out to the disk)

It could theoretically stay the same, since the chromium cache supports both synchronous and asynchronous read/write. But I agree that we should avoid using it. Does it make sense to keep the old API and just add new methods/interfaces for async operations?
Why would you want to keep an old API we don't want people using?
Compatibility with extensions? But you are probably right...
Blocks: http_cache
There are a number of patches for the current cache code that only need review and landing, and that would reduce cache code size and improve overall cache performance:
224441: disk cache does not include indices in size calculation; can run out of space on small partition
456547: Only create the offline cache when really needed
329674: Make the disk cache favor local system endian byte order
331032: Decom nsICacheEntryInfo, replace by ExpirationTime attribute to nsICachingChannel
417154: nsMemoryCacheDevice::EvictEntriesIfNecessary could be optimized
230675: 'decom' of nsICacheVisitor.idl: saves 10% / 150K from nkcache_s.lib
405407: Merge nsDiskCacheStreamIO and nsDiskCacheStreamOutput

Using mmap for reading cached files could be interesting (it helped JAR reading), but only if the readers are smart enough to see that the object is mmapped and can handle it as a memory object. Using Read() on an mmapped object doesn't bring much benefit.

One of the weak areas of the current cache is the hashing; maybe Chromium can help there?
Forgive my ignorance, but I do have a question...
Is there a mechanism for removing cached items based on a timestamp or some other trigger?  What I'm getting at here is understanding the potential of a garbage collector for caches (if there is not one already, which there probably is).

I see two obvious benefits of this:
1.  Eliminates keeping cached data for a site a user visits infrequently.
2.  Minimizes the necessary memory footprint required to hold said cached items.

I'm sure there are many other benefits as well.

If there is such a mechanism already in place, I would appreciate it if someone could summarize how it works just for my own knowledge.

Thank you for your time.
Paul: see http://mxr.mozilla.org/mozilla-central/source/netwerk/cache/nsMemoryCacheDevice.cpp#52 and check out the cited LRU-SP paper if you can find it non-paywalled on the web.

Caches are a simplified memory structure compared to a garbage-collected heap, so the challenge is "replacement policy" or algorithm. You don't have to throw out anything just because time has passed.

Rather, new demands from the user's evolving workload cause old cached resources to be evicted.
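The demand-driven eviction described above (as opposed to time-based expiry) is easiest to see in a plain LRU cache; a capacity-2 toy sketch, not the actual Necko LRU-SP policy (which additionally weights entries by size and fetch cost):

```cpp
// Sketch: eviction happens only under capacity pressure from new
// insertions, never because time has passed. Least-recently-used
// entries go first; a cache hit refreshes an entry's recency.
#include <list>
#include <string>
#include <unordered_map>

class LruCache {
 public:
  explicit LruCache(size_t capacity) : capacity_(capacity) {}

  void Put(const std::string& key, const std::string& value) {
    Touch(key);
    entries_[key] = value;
    if (order_.size() > capacity_) {  // eviction only under pressure
      entries_.erase(order_.back());  // least recently used goes first
      order_.pop_back();
    }
  }

  bool Get(const std::string& key, std::string* out) {
    auto it = entries_.find(key);
    if (it == entries_.end()) return false;
    Touch(key);  // a hit refreshes recency, delaying eviction
    *out = it->second;
    return true;
  }

 private:
  void Touch(const std::string& key) {
    order_.remove(key);  // O(n); fine for a sketch
    order_.push_front(key);
  }

  size_t capacity_;
  std::list<std::string> order_;  // front = most recently used
  std::unordered_map<std::string, std::string> entries_;
};
```

So an entry for an infrequently visited site does get dropped eventually, but only when newer entries displace it, not on a timer.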

This leads to web app developers needing more control over priority, and even the ability to pin resources in browser memory, but those needs are met by other caches such as the HTML5 App Cache (http://www.whatwg.org/specs/web-apps/current-work/multipage/offline.html#appcache).

/be
(In reply to comment #12)
> Paul: see
> http://mxr.mozilla.org/mozilla-central/source/netwerk/cache/nsMemoryCacheDevice.cpp#52
> and check out the cited LRU-SP paper if you can find it non-paywalled on the
> web.
> 
> Caches are a simplified memory structure compared to a garbage-collected heap,
> so the challenge is "replacement policy" or algorithm. You don't have to throw
> out anything just because time has passed.

Again, I am ignorant to a very large extent, so bear with me... but although we don't have to throw something out, should we not try to do so somehow, in the interest of being more lightweight and efficient?
> 
> Rather, new demands from the user's evolving workload cause old cached
> resources to be evicted.

Would this assume the user's cache reaches a certain % usage?  For users with extremely large caches (say, one sized dynamically as a % of available hard drive, as IE and Chromium implement), it seems a cached item could remain in the cache almost indefinitely, unnecessarily using both disk and memory resources.
> 
> This leads to web app developers needing more control over priority, and even
> the ability to pin resources in browser memory, but those are met by other
> caches such as the HTML5 App Cache
> (http://www.whatwg.org/specs/web-apps/current-work/multipage/offline.html#appcache).

So if I'm understanding this correctly (and again, I'm just starting to learn about this), it is almost as if HTML5 App Cache implements some database-type features to allow web app developers the control they need.  Could a similar methodology be used for standard caching, perhaps even sharing a common database-type setup with web app cache?

> 
> /be

Thank you for your time.
http://4.bp.blogspot.com/_TtjWg9_1wNM/TQKiUqYk1QI/AAAAAAAAANM/P0hKyEOGZWc/s400/howaboutnever1.gif
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WONTFIX