Closed Bug 663341 Opened 13 years ago Closed 13 years ago

Avoid creating large numbers of files in HTTP cache

Categories: Core :: Networking: Cache
Type: enhancement
Priority: Not set
Severity: normal

Status: RESOLVED DUPLICATE of bug 593198

People: (Reporter: danialhorton, Unassigned)

References: Blocks 1 open bug

Details

User-Agent:       Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20100101 Firefox/5.0
Build Identifier: 

The new smart cache design defaults to a 1 GB cache on large HDDs, resulting in up to 100,000 tiny files being created across a massive subdirectory tree. Mechanical HDDs cannot write (or delete) many small files at once without flooding the drive's I/O and causing thrashing, which is exactly what happens when you clear a full cache (60,000+ files).

The new design would be suitable for an SSD, but it is fragmentation and thrashing hell on a mechanical hard drive.

Windows itself is limited to deleting only a few hundred small (byte-sized) files per second, usually maxing out a CPU core while attempting to delete so many files, and clearing the cache from within Firefox locks up the entire interface until the purge completes.

Further, much of the content within the cache is under 1 KB. This is a problem in itself, because a full 4 KB cluster (the NTFS default on Windows, at least) is allocated for every file under 4 KB, which adds to fragmentation and massively inflates the on-disk size of the files. Firefox caps the cache at its configured size limit, but the on-disk size shows that up to 1.7x as much HDD space is actually in use.
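
A minimal illustration of that cluster overhead, assuming the 4 KB NTFS cluster size mentioned above (C++, not Firefox code; the example file size is made up):

#include <cstdint>
#include <cstdio>

// On-disk (allocated) size of a file on a filesystem with fixed-size
// clusters: every non-empty file occupies a whole number of clusters.
uint64_t allocatedSize(uint64_t fileSize, uint64_t clusterSize = 4096) {
  if (fileSize == 0) return 0;
  return ((fileSize + clusterSize - 1) / clusterSize) * clusterSize;
}

int main() {
  // A 700-byte cache entry still consumes a full 4096-byte cluster,
  // i.e. roughly 5.8x its logical size.
  std::printf("%llu bytes on disk for a 700-byte file\n",
              static_cast<unsigned long long>(allocatedSize(700)));
  return 0;
}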

Firefox should adopt an archive- or image-based cache that works like a virtual drive image: the thrashing would only occur when the cache file is first created, and cache fragmentation would be eradicated, similar to how a page file with identical min/max sizes stays unfragmented.
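
Not a patch, just a rough sketch of that single-file idea (C++; the class name, slot layout, and sizes are all invented): preallocate one big file once and reuse fixed-size slots inside it, so clearing the cache never has to unlink tens of thousands of files.

#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <vector>

// Sketch of a single preallocated cache file with fixed-size slots.
// Clearing the cache only resets the in-memory slot map; there is no
// mass file deletion, and the file's on-disk extent never changes.
class SlotCache {
 public:
  SlotCache(const char* path, std::size_t slots, std::size_t slotSize)
      : slotSize_(slotSize), used_(slots, false) {
    file_ = std::fopen(path, "wb+");
    if (file_) {
      // Preallocate the whole file once, up front.
      std::fseek(file_, static_cast<long>(slots * slotSize) - 1, SEEK_SET);
      std::fputc(0, file_);
    }
  }
  ~SlotCache() {
    if (file_) std::fclose(file_);
  }
  bool store(std::size_t slot, const void* data, std::size_t len) {
    if (!file_ || slot >= used_.size() || len > slotSize_) return false;
    std::fseek(file_, static_cast<long>(slot * slotSize_), SEEK_SET);
    used_[slot] = std::fwrite(data, 1, len, file_) == len;
    return used_[slot];
  }
  void clear() {
    // "Clearing the cache" becomes an in-memory reset, not 60,000 unlinks.
    std::fill(used_.begin(), used_.end(), false);
  }

 private:
  std::FILE* file_ = nullptr;
  std::size_t slotSize_;
  std::vector<bool> used_;
};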

Reproducible: Always
This has been proposed several times before (please search before you file a bug!), but it isn't necessarily the best option, because you're basically implementing your own filesystem as an extra layer. Even Chrome doesn't use it.

Small files are already cached in the CACHE_00x files - if you see any outside of those files, that's because the CACHE_00x file was locked by another thread. Your single archive file would have the same problem.
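
(Hypothetical illustration of the fallback described above, in C++; the 16 KB threshold and the names are invented, this is not the actual cache code.)

#include <cstddef>

// Small entries go into a shared block file (CACHE_00x) unless that
// block file is currently locked, in which case the entry falls back
// to its own separate file on disk.
enum class Storage { BlockFile, SeparateFile };

Storage chooseStorage(std::size_t entrySize, bool blockFileLocked) {
  const std::size_t kMaxBlockEntrySize = 16 * 1024;  // assumed threshold
  if (entrySize <= kMaxBlockEntrySize && !blockFileLocked)
    return Storage::BlockFile;     // packed into CACHE_00x
  return Storage::SeparateFile;    // written as an individual cache file
}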

See bug 593198 for the more immediate changes that need to be done, and also bug 512849.

And indeed, clearing the cache is currently a very long operation.
"but it isn't necessarily the best option, because you're basically implementing your own filesystem as an extra layer."

All of the developers I've discussed the idea with outside the Moz project thought it the most logical option and, given the open-source virtual filesystems available, easy as pie to implement.


"Even Chrome doesn't use it."

Chrome is also ****, so what's your point?
Please do find the original for this and mark this duplicate...
Whiteboard: DUPEME
I looked through about 600 results and none of them seemed similar.

Maybe the idea was raised on a message board but not posted here?
I don't recall seeing a bug that suggests this any time recently (but I only go back 2 years :).

The idea here may be a good and/or necessary one.  I'm going to leave it open and file it under our cache meta-bug.  Yet another thing to consider when we redesign the entire cache...

Meanwhile we'll hack inelegantly around the issue by amortizing the IO: see bug 651011.
Blocks: http_cache
Actually I guess bug 593198 covers this closely enough to merit a dupe.
Status: UNCONFIRMED → RESOLVED
Closed: 13 years ago
Resolution: --- → DUPLICATE
Summary: Dump the new 10's of thousands of cache files design for a Solid cache. → Avoid creating large numbers of files in HTTP cache
The original one was bug 21578. Similar ones are bug 425730 and bug 484569 (although, using a database would be weird in my opinion). There have been other discussions before (on the newsgroup I think).
Bug 21578 is what I was thinking of :). No wonder I never found it; I was searching for terms around solid or preallocated caching.
The SQLite idea is something I discarded because of the problems with maintaining large SQLite files, particularly where vacuuming and reindexing are concerned: a 50 MB SQLite file can take 2-5 minutes to optimise when it's a bit of a mess, so it would be even less useful at more than 500 MB.
Whiteboard: DUPEME