99817 - URL-based cache naming scheme

Reporter

Description

•

23 years ago

Since I have more disk space than I can actually use, I decided a couple of weeks
ago to set my disk cache size to ... 650MB.  The idea is to save every page that
I load and burn a CD when the cache quota is exceeded.  In past years
I've often had this problem: I remember seeing some information on a web page, but
I don't know where, I don't know how to find it again, and more importantly,
when I find it in Google, the page is not there anymore!  This is the motivation
for saving all the pages I see and archiving them someday.  Two problems appear when I
try to do that:  first, my cache size always seems to stay under 25MB, and I'm not sure
why or whether it's a local problem; second, it is really hard to browse through the
archived data, and when the cache reaches 650MB, I'll have to recursively grep
all the files, which is awful.  My idea was that mozilla could (if I'm not the
only one who needs this feature) let the user choose between the current naming
scheme and a more human-readable one.  For instance, I have a couple of GB of
mirrors on my machine (laptop :-) ) and it is so easy to find information when I
do a `wget -m <some url>'!  So I thought I should suggest this scheme for the
mozilla cache management:
~user/.mozilla/default/********.slt/Cache/www.mozilla.org/start/, etc... I'm not
that much into development, so I have no idea whether this would really be much
slower than the current naming method.  But for sure I'd rather have a slightly
slower caching algorithm and have the archiving capability!  Any ideas?
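
To make the proposed layout concrete, here is a minimal sketch of how a URL might be mapped to a host/path location under the cache directory, in the spirit of `wget -m`. The function name `mirror_path`, the `index.html` fallback for directory-like URLs, and the way query strings are appended are all assumptions for illustration, not anything Mozilla's cache does.

```python
from pathlib import PurePosixPath
from urllib.parse import quote, urlsplit


def mirror_path(url: str, root: str = "Cache") -> PurePosixPath:
    """Map a URL to a human-readable path below `root` (hypothetical scheme)."""
    parts = urlsplit(url)
    host = parts.hostname or "_nohost"
    # Drop empty, "." and ".." segments so the result can never escape `root`,
    # and percent-encode anything that is not safe in a file name.
    segments = [
        quote(segment, safe="")
        for segment in parts.path.split("/")
        if segment not in ("", ".", "..")
    ]
    # A URL with no path or a trailing slash names a directory; store its
    # body under an assumed default file name.
    if not segments or parts.path.endswith("/"):
        segments.append("index.html")
    # Keep the query string in the file name so distinct queries do not collide.
    if parts.query:
        segments[-1] += "?" + quote(parts.query, safe="=&")
    return PurePosixPath(root, host, *segments)


print(mirror_path("http://www.mozilla.org/start/"))
```

With such a layout, finding a page again becomes an ordinary `grep -r` or directory listing under the host name, at the cost of one directory lookup per path segment.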

Damian Yerrick

Comment 1

•

23 years ago

Well-written. I couldn't find any dupes.

This could be a big plus for evangelism.  I have several friends who would switch
from IE to Mozilla if it gained the ability to save everything you've ever seen on
the web.

Ariel Gonzalez

Comment 2

•

23 years ago

I think this would be a very good feature for Mozilla. It's just plain
convenient if you want to mirror sites somewhere. Maybe it could be even easier
code-wise, since if you save the filenames and change the modified date to the
one reported by the server, you could do away with having an "index" file
altogether. Offline browsing would also benefit.
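
As a rough illustration of the timestamp idea in this comment, the sketch below writes a response body to disk and sets the file's modification time from the server's `Last-Modified` header. The function name `store_with_server_mtime` is hypothetical, and the sketch covers only the modification date; any other per-entry metadata is outside its scope.

```python
import os
from email.utils import parsedate_to_datetime


def store_with_server_mtime(path: str, body: bytes, last_modified: str) -> None:
    """Write `body` to `path` and stamp it with the server-reported date."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "wb") as handle:
        handle.write(body)
    # Parse an HTTP date such as "Sun, 06 Nov 1994 08:49:37 GMT".
    mtime = parsedate_to_datetime(last_modified).timestamp()
    # Set both access and modification time to the server's value.
    os.utime(path, (mtime, mtime))
```

A later revalidation could then read the date back with `os.path.getmtime(path)` instead of consulting a separate index.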

Status: UNCONFIRMED → NEW

Ever confirmed: true

R.K.Aa.

Comment 3

•

23 years ago

This sounds like the potential for some horrible bloat, IMHO.
The cache code was already rewritten once.
For browsing purposes, cache speed is the ONE main concern.
For saving purposes, there are already suitable RFEs, for instance bug
11632 and bug 40873.

Recommend wontfix.

Damian Yerrick

Comment 4

•

23 years ago

However, bug 11632 (closest to what was suggested here) does not address the
issue of automating the process for every page viewed.

I would suggest keeping the primary browser cache how it currently is, but adding
a preference to do this:  Every time a file is removed from the cache, move the
file to the user's web mirror folder instead of just deleting it.
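
The eviction hook suggested here could look roughly like the sketch below: the primary cache is left alone, and only the removal step changes depending on a preference. The names `evict_entry`, `archive_on_evict`, and `mirror_name` are hypothetical, and the caller is assumed to supply the mirror-relative file name.

```python
import shutil
from pathlib import Path
from typing import Optional


def evict_entry(cache_file: Path, mirror_root: Path, mirror_name: str,
                archive_on_evict: bool) -> Optional[Path]:
    """Remove a cache file, or move it into the mirror folder if the pref is set."""
    if not archive_on_evict:
        # Default behaviour: the evicted entry is simply deleted.
        cache_file.unlink()
        return None
    destination = mirror_root / mirror_name
    destination.parent.mkdir(parents=True, exist_ok=True)
    # Move instead of delete, so the browsing cache keeps its size limit
    # while the mirror folder accumulates everything that was evicted.
    shutil.move(str(cache_file), str(destination))
    return destination
```

Because the work happens only at eviction time, lookups in the normal cache would be unaffected in this design.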

Ariel Gonzalez

Comment 5

•

23 years ago

Or, if changing the cache to use this type of naming scheme, a built-in function
to "decode" all the cache files into a "domain"-based system. I know that there
are shell scripts to do this already, but having one built in would be great for
the overall userbase.

Bradley Baetz (:bbaetz)

Comment 6

•

23 years ago

Some disk files may have more than one web file in them... (If not now, then ISTR
that that will be the case soonish)

As well, the cache has no concept of URLs - it uses a "cache key", which is
usually the URL, but not always (e.g. forms POSTed to the same URL). You could
iterate through those, and you'd be right most of the time, I guess.

Not everything is stored in the cache - nocache data, and so on.
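
To illustrate the "right most of the time" caveat, here is a small sketch that iterates over cache keys and keeps only those that look like plain URLs. The key format is an assumption made for the example: keys for POST results are imagined to carry an `id=...&uri=` prefix, and everything else is treated as a URL candidate. The function name `url_from_cache_key` is hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit


def url_from_cache_key(key: str) -> Optional[str]:
    """Return the URL a cache key stands for, or None if it is not a plain URL."""
    # Assumed format: POST results are keyed with an "id=<token>&uri=" prefix,
    # so several entries can share one URL. Skip them rather than guess.
    if key.startswith("id="):
        return None
    parts = urlsplit(key)
    if parts.scheme in ("http", "https", "ftp") and parts.netloc:
        return key
    return None


keys = [
    "http://www.mozilla.org/start/",
    "id=3f2a&uri=http://example.com/form",
    "about:blank",
]
# Only the first key survives the filter.
print([url for url in map(url_from_cache_key, keys) if url is not None])
```

Entries that were never cached (no-cache responses and the like) would of course be missing from any archive built this way.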

gordon

Assignee

Comment 7

•

23 years ago

Moving target milestone to FUTURE until we have a chance to triage this bug further.

Target Milestone: --- → Future

David A. Cobb

Comment 8

•

23 years ago

Excellent idea -- in fact I also thought of the same thing, and I manually [UGH]
implement it for ftp'd files.  I think <mailto:d_yerrick@hotmail.com> has
identified how to make it palatable to folks who imagine the cache is somehow
"secure."  Let the user elect to save a set of pages using this
name-replication, the way certain other products have a "make available offline"
function.

gordon

Assignee

Updated

•

22 years ago

Priority: -- → P5

timeless

Updated

•

22 years ago

Summary: [RFE] URL-based cache naming scheme → URL-based cache naming scheme

gordon

Assignee

Updated

•

22 years ago

Status: NEW → ASSIGNED

gordon

Assignee

Comment 9

•

21 years ago

This feature is not useful for most users and could incur a severe performance
penalty.  If such a feature WAS attempted, it would be better implemented at a
semantic layer above the cache, perhaps http.

Marking WONTFIX.

Status: ASSIGNED → RESOLVED

Closed: 21 years ago

Resolution: --- → WONTFIX

Categories

(Core :: Networking: Cache, enhancement, P5)

People

(Reporter: johnny.accot, Assigned: gordon)

Security

(public)
