Closed Bug 224441 Opened 21 years ago Closed 8 years ago

disk cache does not include indices in size calculation; can run out of space on small partition

Categories

(Core :: Networking: Cache, defect)

x86
Windows XP
defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: bugzilla, Assigned: alfredkayser)

Details

Attachments

(2 files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.5) Gecko/20031007
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.5) Gecko/20031007

The disk cache seems to only use the sizes of the files stored in the cache in
calculating how much space it takes up. It ignores the size of the indices
(_CACHE_MAP_ , _CACHE_00[1-3]_) when working out how much space the cache is
taking up. This has the effect that a cache limited to a certain size actually
takes up more space than the setting browser.cache.disk.capacity is supposed to
be limiting it to. (If this behaviour is by design, maybe some UI work would
help stop the preferences-->advanced-->cache page implying that the user is
being given the opportunity to set the size of the cache on disk.)

If the disk cache is on a small partition, it can happen that as well as
exceeding the soft limit of browser.cache.disk.capacity, it can fill the
partition to capacity, despite browser.cache.disk.capacity being less than the
size of the drive. 

This can cause the partition to run out of space, resulting in pages loaded from
the cache being empty or truncated.

Reproducible: Always

Steps to Reproduce:
1. Set up an 8 MB partition (or ramdisk) 
2. Set the cache to 6 MB, and the cache location to the partition, using the
cache property panel.
3. Use mozilla for a while, until the cache fille

Actual Results:  
Partition fills past capacity; pages display corrupted, disappear when back
button pressed. Shift-reload necessary to make corrupted page display.

Expected Results:  
Partition fills to 6 MB; as indices grow larger, fewer files are stored in cache. 

Workaround: disable disk cache (browser.cache.disk.enable=false)
note that _CACHE_00[1-3]_ files contain also all cached objects <= 16 KB in size
This is an automated message, with ID "auto-resolve01".

This bug has had no comments for a long time. Statistically, we have found that
bug reports that have not been confirmed by a second user after three months are
highly unlikely to be the source of a fix to the code.

While your input is very important to us, our resources are limited and so we
are asking for your help in focussing our efforts. If you can still reproduce
this problem in the latest version of the product (see below for how to obtain a
copy) or, for feature requests, if it's not present in the latest version and
you still believe we should implement it, please visit the URL of this bug
(given at the top of this mail) and add a comment to that effect, giving more
reproduction information if you have it.

If it is not a problem any longer, you need take no action. If this bug is not
changed in any way in the next two weeks, it will be automatically resolved.
Thank you for your help in this matter.

The latest beta releases can be obtained from:
Firefox:     http://www.mozilla.org/projects/firefox/
Thunderbird: http://www.mozilla.org/products/thunderbird/releases/1.5beta1.html
Seamonkey:   http://www.mozilla.org/projects/seamonkey/
Setting disk cache size to 2000 KB and browsing until it's full in Firefox
1.5b2, about cache reports as follows:

Disk cache device

Number of entries: 	186
Maximum storage size: 	2000 KiB
Storage in use: 	1999 KiB
Cache Directory: 	J:\Documents and Settings\lewis\Local Settings\Application
Data\Mozilla\Firefox\Profiles\7c0c5mvh.default\Cache

DIR, on the other hand, reports 

              45 File(s)      2,703,816 bytes

Which is 2640 KiB .

If we exclude the _CACHE files from the calculation, the space taken is under
2000 KiB.

Thus, I believe all the original concerns are still valid: 

'use up to n MB for cache' still implies that this limits the size of the cache,
when it actually doesn't; if browser.cache.disk.capacity were used to notify the
browser of the maximum free space available for the the cache to grow to into,
the cache would still overflow it. If the user's space usage is limited by
capacity or quota, the cache will still experience write errors and corrupt the
displayed pages.
(In reply to comment #3)
> If the user's space usage is limited by
> capacity or quota, the cache will still experience write errors and corrupt the
> displayed pages.

Write errors ought to not corrupt displayed pages. please file a bug about that.
Two issues here:
* size of _CACHE_MAP is not accounted for (which is between 9k and 132k)
* size of the metadata for each cache entry is not accounted for (see also bug
277473)

The first one is easy (patch underway...)
The second one is less easy, allthough:
If we make SetMetaDataElement of nsCacheEntry to also call Inc/DecTotalSize of
is mCacheDevice then also the MetaData size can be tracked in mTotalSize of both
Memory and Disk caches. 
Is is possible however that MetaData is set in the entry before it is
connected/stored in a device? If so, then the SetCacheDevice call should check
for MetaData allready set, and then update mTotalSize if needed.
To correctly account for the overhead of the map and block files (total size of
_CACHE_MAP, and the 4K header for each _CACHE_001 file).
Changes:
* renamed mDataSize to mTotalSize to better reflect it usage
* inline 3 getters for totalsize, entrycount and capacity of DiskCacheDevice
* replaced the 3 with kBlockFiles for style

Testscript:
* Started with empty cache.
* Visitied www.ibm.com and www.mozilla.org
* Restarted seamonkey
* Visitied www.ibm.com and www.mozilla.org and some others (netscape.com,
tweakers.net, ...)
* about:cache
* Close seamonkey

Testresult:
Number of entries:	284
Bytes according to Windows: 1.214.242 bytes
Size on Disk:		    1.253.376 bytes
Storage in use: 1225 KiB (= 1.254.400 bytes)

Problably because of accounting of datasize and metasize in chunks of 1024
bytes, the actual result is very close to the 'about:cache' result

At least, this will make the accounting of total size in the diskcache more
precise.
Attachment #199409 - Flags: review?(darin)
Actually, for the diskcache, it may be even better to track the total file size
of all the files in the cache as the 'mTotalSize'. Because, within the
blockfiles there will be other overhead, as soon as the blockspace is
fragmented, so that the blockfile also contains empty blocks.

If we really want to maximise the number of bytes stored on disk, the best way
is thus to account for the actual file sizes (this will, BTW, still not account
for filesystem overhead... ;-)). If we want to do this, I propose a separate bug
to do so.
I'm happy to ignore blockfile fragmentation.

I believe that the cache should primarily account for the data it stores on
behalf of its consumer.  If its consumer says that it wants to store 50
megabytes of data, then the cache should allow the consumer to store 50
megabytes of data.  If there is additional overhead, then that is an
implementation detail that should not be visible to the consumer of the API.  In
other words, I think we should only count data + meta-data (including cache key,
session name, etc.) in the size totals.

I think we should also make the cache more fault tolerant to deal with the page
load errors described in comment #0.
Status: UNCONFIRMED → NEW
Ever confirmed: true
At the moment, the GUI does not let the user say that they want to store 50
megabytes of data, but that they want the cache to take up 50 megabytes. If this
is to change, please don't forget to fix the GUI: at the moment it reads 'use up
to'[textbox]' MB of disk space for the cache', which is stating that it limits
the space the cache takes up to [textbox] megabytes.

It should instead read something like 'store up to'[textbox]' MB of data in the
cache'.
Lewis: Sure, but that is a different problem.  One could argue that the text in
the preferences UI is vague enough to cover either implementation.
I vote for making the hard limit the limit for diskusage (at least as measured
by total file size). For anything stored as files in cache, this is allready the
case. For material stored in BlockFiles it is just a question of measuring total
size of the BlockFiles and the _CACHE_MAP itself.

As seen from some of the comments above people want to be able really control
the maximum storage usage of the cache. Caching is always about providing a
limited space and using that as efficient as possible (compare the L1/L2 caches
of a CPU, or the 8MB cache of the disk controller, etc). 
Fair enough, but what happens in the extreme edge case where the user sets the
size less than 4k?  Do we still create _CACHE_MAP_?
Alfred: ping?  any response to comment #13?  also, are we sure that our block files do not become so fragmented that the fragmentation becomes a great limiting factor in our ability to store data in the cache?
As for the extreme case: any value below 50K or so should be considered as a useless value for diskcaching. IE for example only the unit is MB, so the minimum value is then automatically 1MB. (0 is no cache).

As for the fragmentation, with my patch for the bitmap handling of the blockfiles, this is much less. Note that any entry uses at most 3 bits (1, 2, or 3 bits), and free'd entries are always filled up from the begin of the map. In worst case, the whole bitmap may have only 1bit holes left, which will be used soon enough as at least 1/3trd of the new entries will take up the 1bit holes, and there will always be 2 or 3 bit entries that will get freed.

Note, the main reason (IMHO) that users want to set a limit is because of available diskspace, not precisely to want she/he wants to store in the cache (as you don't have any control actual storage of objects). One wants generally given a provided size (like the 256MB CPU cache) make the most use of it...
-> default owner
Assignee: darin → nobody
Assignee: nobody → alfredkayser
What should we do with this bug?
Find an actual reviewer for the patch?
Michal or bjarne are the most obvious people to look into this.
Attachment #199409 - Flags: review?(darin.moz) → review?(michal.novotny)
Comment on attachment 199409 [details] [diff] [review]
Correctly account for overhead of map/blockfiles in totalsize

There is one fundamental problem with this patch. There is no information about the free space in the block files. This means:

1) When the cache becomes full we start evicting some entries, but evicting entries stored in block files won't free any space.

2) We preallocate block files in 4MB chunks (although this code is temporarily disabled right now), so even an empty cache occupies 12MB. If user sets the cache capacity below this limit, then nothing will be stored in the cache.

3) Even if we would count the free space in the block files, there would need to be some advanced logic when checking the cache fullness. E.g. let's say that the cache is full according to total size but there is a free space in some block file, then we don't need to evict any entry but we need to ensure that the entry will fit into the certain block file...
Attachment #199409 - Flags: review?(michal.novotny) → review-
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: