Closed Bug 197431 Opened 21 years ago Closed 8 years ago

implement 2-level disk cache for increased performance

Component: Core :: Networking: Cache (defect, P1)
Status: RESOLVED WONTFIX
Target Milestone: mozilla1.4alpha
Reporter: gordon (Unassigned)

By keeping some of the disk cache block buffers in memory after the descriptors
have closed, rather than flushing them to disk, we can implement a memory cache
layer on top of the disk cache, which could reduce both reads and writes to disk.

In some cases, such as entries that are updated again immediately, we may be
able to avoid writing to disk entirely.
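A minimal sketch of the idea (not the actual Necko code; class and method names here are invented for illustration): closed entries stay buffered in memory, a second write to a still-buffered entry simply replaces the buffer so the first version never reaches disk, and only an explicit flush performs the writes.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical write-back layer on top of the disk cache.  Buffers are kept
// after the descriptor closes; an updated entry overwrites its pending
// buffer, avoiding the disk write for the stale version entirely.
class WriteBackCache {
 public:
  // Record the entry's data on close, but do not write it out yet.
  void CloseEntry(const std::string& key, std::vector<uint8_t> data) {
    mDirty[key] = std::move(data);  // overwrites any pending version
  }

  // Flush all pending buffers to "disk"; returns how many writes occurred.
  size_t Flush() {
    size_t writes = mDirty.size();
    for (auto& kv : mDirty) mOnDisk[kv.first] = kv.second;
    mDirty.clear();
    return writes;
  }

  size_t PendingCount() const { return mDirty.size(); }

 private:
  std::unordered_map<std::string, std::vector<uint8_t>> mDirty;
  // Stand-in for real file I/O, so the sketch is self-contained.
  std::unordered_map<std::string, std::vector<uint8_t>> mOnDisk;
};
```

Note how an entry closed twice before the flush costs only one disk write, which is the "avoid writing to disk at all" case for immediately updated entries.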

I expect to see the biggest difference in back/forward performance (iBench)
rather than in jrgm's page-load cycle.
Priority: -- → P1
Target Milestone: --- → mozilla1.4alpha
I'm wondering how much performance it is actually possible to squeeze out by
doing this.  The OS disk cache should already be suppressing some of the
unnecessary writes and making the reads extremely cheap.  Are there any
measurements of how much time is currently being spent shuttling data
between the cache and disk?
Linux seems to do a much better job caching than other platforms.  Whenever we
reduce the number of file operations, we tend to see benefits mainly on Windows
and Mac.

My plan is to quickly implement something simple and measure the difference. 
I'll post numbers as I get them.
Status: NEW → ASSIGNED
even on linux there is a measurable cost to creating a new disk cache file. 
this bug is mainly about reducing "insertion" cost.  currently, we block layout
to write disk cache files.  if you simply disable the disk cache, then you'll
see about a 3-5% Tp improvement (not counting cached page loads).
Hmm, thanks for that insight Darin.  If there's a significant delay getting data
out to the OS then a cascaded cache might have a performance gain even though
the time spent in cache code would increase.  I take it that switching to async
cache writes would be a non-trivial change to make?
well, for documents under 16k, we currently flush to disk when the cache entry
is closed.  the plan for this bug is to delay this work until the entire page
has completely loaded, or until some threshold on the number or size of
unflushed cache entries is reached.  we have a solution in mind that shouldn't
be overly complex.  documents under 16k represent ~80% of the content stored in
the disk cache, so this seems like the right step to take.
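The deferral policy described above could be sketched like this. The 16k cutoff comes from the comment; the pending-count and pending-bytes thresholds are invented for illustration, since the comment only says "some threshold on the number or size of unflushed cache entries".

```cpp
#include <cstddef>

// Entries below this size have their flush deferred (from the comment above).
const size_t kSmallEntryLimit = 16 * 1024;

// Hypothetical thresholds; the real values would need measurement.
const size_t kMaxPendingCount = 32;
const size_t kMaxPendingBytes = 256 * 1024;

// Should this entry's flush be deferred at all?
bool ShouldDefer(size_t entrySize) {
  return entrySize < kSmallEntryLimit;
}

// Once deferred, should the accumulated entries be flushed now?
bool ShouldFlushPending(size_t pendingCount, size_t pendingBytes,
                        bool pageLoadComplete) {
  return pageLoadComplete ||
         pendingCount >= kMaxPendingCount ||
         pendingBytes >= kMaxPendingBytes;
}
```

Large entries bypass the deferral and are written as before; the thresholds bound how much dirty data can accumulate if the page-load signal never arrives.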
Actually, since _CACHE_MAP_ is only flushed at close (and later perhaps also at
a forced 'sync'/'flush' point), we can just as easily postpone any flushes of
the disk cache block files, since they are only valid (i.e. re-usable) if
_CACHE_MAP_ is also valid (i.e. flushed).
The exception would be other users wanting to read the cache entry between the
last close and the flush... To be explored...
Actually, it seems the DiskCacheBlockFile code already doesn't flush on
writes, even after the cache entry is closed. The block files are only flushed
when Mozilla closes. So the issue raised here is already covered by the current
code, but some remains are left (some commented-out code, marked //XXX Flush records...).

This bug could remove that old commented-out code and, going further, update
the _CACHE_MAP_ file only on close instead of marking it dirty.
That would save an update/create of _CACHE_MAP_ at startup: instead, the code
could simply rename/remove _CACHE_MAP_ at startup and write a new one at
close.

Flushing the block files and the _CACHE_MAP_ file while Mozilla is open doesn't
make sense, because as soon as something is cached (new or updated) the files
are really 'dirty' again. Flushing them only makes sense at close, to make sure
that the files and _CACHE_MAP_ are consistent at the next startup of Mozilla/Necko...
Addition to the last comment:
If we only write _CACHE_MAP_ when the disk cache closes (close of the app),
then we don't need to keep this file continuously open, saving an open file
descriptor during runtime.
Another trick would be to copy database/filesystem behaviour here:
* regularly synchronise the disk with memory (flushing everything)
* keep a 'transaction log' of all changes since the last sync.

In case of a crash, one could then update _CACHE_MAP_ by replaying the
transaction log from the last sync. This, however, means the transaction log
itself still needs to be flushed regularly (after each disk cache change?).
Summary:
1. Postpone flushing of buffers < 16K (until the page has loaded)
2. Asynchronous writing/flushing to block files (and normal files?)
3. Don't keep _CACHE_MAP_ open all the time; do [open, write, close] to update
it during shutdown

3. is easiest, results in fewer open file handles, and has minimal risk.
2. depends on support for asynchronous file writing, and may have more overhead
(see for example:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpguide/html/cpconASynchronousFileIO.asp
or http://www.die.net/doc/linux/man/man3/aio_write.3.html
or http://www.kegel.com/c10k.html).
1. is most difficult, as it requires some means to detect 'page has loaded'.
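Option 3 can be sketched as follows (a hypothetical simplification, not the real code): _CACHE_MAP_ exists on disk only between a clean shutdown and the next startup, its absence at startup signalling an unclean shutdown, so the block files cannot be trusted. The function names and serialization are invented for illustration.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// At startup: consume the map.  Returns true if it was present, meaning the
// previous shutdown was clean and the block files are re-usable.
bool ConsumeCacheMap(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  if (!in) return false;        // no map: unclean shutdown, cache is invalid
  in.close();
  std::remove(path.c_str());    // from now on the in-memory map is authoritative
  return true;
}

// At shutdown: open, write, close -- no file descriptor held during the
// session, per the "less open file handles" point above.
bool WriteCacheMap(const std::string& path, const std::string& serializedMap) {
  std::ofstream out(path, std::ios::binary | std::ios::trunc);
  if (!out) return false;
  out << serializedMap;
  return out.good();
}
```

Removing the file at startup rather than at shutdown is what makes a crash detectable: a crash skips WriteCacheMap, and the next startup finds no map.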
> 2. depends on support for asynchronous file writing, and may have more
> overhead

Necko has support for asynchronous file writing; see nsIStreamTransportService.
In summary, we have 4 background threads used for reading from and writing to
streams.  I don't think there is much advantage to using system-level APIs for
asynchronous file I/O.


> 1. is most difficult, as it requires some means to detect 'page has
> loaded'
I think we should add a nsIObserverService notification for
when a page load completes.  Then we could use this as a signal
to flush those cache entries.  We might want to delay this flush
on a short interval timer, so that we defer flushing if there
is another page being loaded right away.
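The "short interval timer" idea above could be sketched like this (a hypothetical illustration, not real XPCOM/observer code; time is a plain counter so the logic is testable without a timer thread): a page-load-complete notification arms a deadline, and a new page load starting before it fires pushes the flush back.

```cpp
// Debounced flush: flush a short interval after a page load completes,
// unless another load starts first, in which case the deadline is pushed
// back.  Names and the integer clock are invented for this sketch.
class DeferredFlusher {
 public:
  explicit DeferredFlusher(int delay) : mDelay(delay) {}

  // Observer notification: a page finished loading; arm the flush timer.
  void OnPageLoadComplete(int now) { mDeadline = now + mDelay; mArmed = true; }

  // Observer notification: a new load started; defer the pending flush.
  void OnPageLoadStart(int now) { if (mArmed) mDeadline = now + mDelay; }

  // Called on each timer tick; returns true exactly when the flush fires.
  bool Tick(int now) {
    if (mArmed && now >= mDeadline) { mArmed = false; return true; }
    return false;
  }

 private:
  int mDelay;
  int mDeadline = 0;
  bool mArmed = false;
};
```

The debounce keeps the disk work out of the way of back-to-back page loads, which is exactly the "blocking layout to write disk cache files" cost comment 4 measured.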
(In reply to comment #11)
> 1. is most difficult, as it requires some means to detect 'page has loaded'

If that's such a problem, why don't we do it with a timer? See comment 10,
although I wouldn't go so far as using complete transaction logs.
Assignee: gordon → nobody
Status: ASSIGNED → NEW
QA Contact: tever → networking.cache
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX