Closed Bug 252179 Opened 20 years ago Closed 8 years ago

include tag file in cache directory so backup/archival software can easily detect & exclude it - trivial feature, patch included

Categories

(Core :: Networking: Cache, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: baford, Unassigned)

Details

(Keywords: helpwanted)

Attachments

(1 file, 2 obsolete files)

User-Agent:       Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.6) Gecko/20040114
Build Identifier: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.6) Gecko/20040114

I'm including in this RFE a quite trivial but IMHO quite useful patch that
simply ensures that Mozilla's cache directory always contains a certain small,
fixed-content file with a well-known name (".IsCacheDirectory"), and which
contains a well-known header.  This file serves as an application-independent
"tag" to indicate to other software that the directory contains regenerable
cache data and not anything "precious."  The immediate purpose is to allow
backup and archival software to easily recognize such cache directories and, if
the user chooses to, automatically exclude them from backups, rsyncs, or
whatever.  Many apps maintain caches of some kind, and adopting such a
convention, though trivial to implement, could greatly alleviate unnecessary
backup storage and network bandwidth wastage caused simply by the constant churn
of application caches.  This feature is especially beneficial for Mozilla to
have, since Mozilla's caches in particular tend to be substantial in size,
frequently changing, and located several levels deep under directories with
varying and somewhat obscure names. :)  For more details on my proposal please see:

http://www.brynosaurus.com/cachedir/

Thank you for your consideration!
Bryan

I have no objection to implementing some sort of standard specification for this
sort of thing.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: helpwanted
Target Milestone: --- → Future
Have any other applications implemented this specification?  Has it been peer
reviewed?
(In reply to comment #3)
> Have any other applications implemented this specification?  Has it been peer
> reviewed?

No, so far it's just a preliminary proposal, and part of my purpose in
submitting this patch is to solicit discussion and peer review.  I apologize if
I didn't make that sufficiently clear in my initial posting.  I'm currently in
the process of contacting the maintainers of other relevant applications (both
apps that maintain cache directories and apps that could benefit from cache
directory tags) for additional feedback and discussion.  Certainly there is a
lot of room for enhancing and developing such a standard beyond the trivial
tagging convention that my current proposal describes.  Do you know of any
existing standards-oriented mailing list on which discussion of this idea would
be appropriate, or should I just set up a new list?

Thanks,
Bryan
".IsCacheDirectory" - wouldn't a file name respecting the "8.3" convention be better
for Windows systems? Something like "IsCache.dir"?
(In reply to comment #5)
> ".IsCacheDirectory" - wouldn't a file name respecting the "8.3" convention be better
> for Windows systems? Something like "IsCache.dir"?

I considered that issue (and it's now mentioned in my latest draft proposal) -
but (a) only very ancient DOS/Windows systems are restricted to the 8.3
limitation (i.e., pre-Win95), and (b) it seems that shortening the name and
removing the leading period would make the name much more likely to conflict
accidentally with random existing files on random systems.  Although the
required signature header is there to make such accidental collisions less
likely to result in precious data not being backed up (and thus eventual data
loss), I'm not entirely confident that the signature will always be checked.

So from my point of view it's basically paranoia about general robustness versus
support for ancient systems that are unlikely to have many applications that
could benefit from the convention anyway.  (Does Mozilla run on Win 3.1,
for example?  Even if it does, the supplied patch shouldn't make it stop working
- it just won't successfully write the tag file.)
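To make the signature check concrete, here is a minimal sketch of how a backup tool might decide whether a directory is tagged. It is an illustration only, not code from the patch or from any existing tool: the function name isTaggedCacheDir is hypothetical, the tag file name is passed in by the caller, and the signature value is the one quoted in the patch review in this bug.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Signature line as quoted in the patch review in this bug.
static const std::string kTagSignature =
    "Signature: 8a477f597d28d172789f06886806bc55";

// Hypothetical helper: returns true only if `dir` holds a regular file named
// `tagName` whose first bytes match the signature exactly. The file name
// alone is never trusted.
bool isTaggedCacheDir(const fs::path& dir, const std::string& tagName) {
    const fs::path tag = dir / tagName;
    std::error_code ec;
    if (!fs::is_regular_file(tag, ec)) {
        return false;
    }
    std::ifstream in(tag, std::ios::binary);
    std::string head(kTagSignature.size(), '\0');
    in.read(&head[0], static_cast<std::streamsize>(head.size()));
    // A short or unreadable file yields a smaller gcount and is rejected.
    return in.gcount() == static_cast<std::streamsize>(head.size()) &&
           head == kTagSignature;
}
```

A file that merely happens to carry the tag name but lacks the header is treated as ordinary data, which is the failure mode the signature is meant to guard against.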

I have gotten this concern from one other person, though, so let me know if you
or other Mozilla folks feel strongly about it one way or the other.  Also, I'll
mention it on the xdg-list and see what they think.

Thanks,
Bryan
Hi Adam, 
(In reply to comment #5)
> ".IsCacheDirectory" - wouldn't a file name respecting the "8.3" convention be better
> for Windows systems? Something like "IsCache.dir"?
 
Since I've gotten this concern from two other people as well in the past 
couple weeks (and no one came to the defense of the old, long filename :)), 
I've renamed the cache directory tag file to "CACHEDIR.TAG" in the latest spec 
(version 0.4).   I left the header signature as-is, though (it's still the MD5 
of ".IsCacheDirectory").  Will this satisfactorily address your concern? 
 
Since I haven't gotten any other serious technical concerns in the past few 
weeks here or elsewhere that haven't been addressed, I believe the proposal is 
reasonably stable now and is probably ready to be used.  Would you like me to 
upload a new patch?  (It would just fix the tag filename and add a comment 
within the tag pointing back to the specification, as recommended in recent 
versions.)  Of course, I'll keep a link to the Mozilla project on the spec's 
home page and keep you updated about any future changes the spec may undergo. 
 
Thanks! 
Bryan 
I went ahead and updated the patch.  It uses the new tag filename CACHEDIR.TAG,
and includes a comment in the tag file pointing back to the spec.  The patch
applies to the latest Mozilla CVS source, as of today.

Cheers,
Bryan
Attachment #153700 - Attachment is obsolete: true
(In reply to comment #7)

> I've renamed the cache directory tag file to "CACHEDIR.TAG" in the latest
> ... 
> of ".IsCacheDirectory").  Will this satisfactorily address your concern? 

Fully, thanx!
Comment on attachment 155677
Updated patch for cache dir tag spec v0.4 - shorter 8.3-compatible tag filename

>Index: netwerk/cache/src/nsDiskCacheDevice.cpp

>+    (void) TagCacheDirectory();

why not just give the function a |void| return type?


>+    nsCOMPtr<nsIFile> file;
>+    rv = mCacheDirectory->Clone(getter_AddRefs(file));
>+    nsCOMPtr<nsILocalFile> localFile(do_QueryInterface(file, &rv));

this tramples over the |rv| returned from |Clone|, so either check
|rv| before calling do_QueryInterface or don't bother storing it ;-)
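The point about the overwritten |rv| can be shown with a small stand-alone sketch. The types and function names below are hypothetical stand-ins chosen for illustration; they are not the XPCOM types or the code in the patch.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; not the XPCOM types.
enum Status { kOk = 0, kCloneFailed = 1, kQueryFailed = 2 };

struct Steps {
    Status clone;  // result the first call would report
    Status query;  // result the second call would report
};

// Pattern the review flags: the second assignment overwrites the first
// result before anyone looks at it.
Status runUnchecked(const Steps& s) {
    Status rv = s.clone;
    rv = s.query;  // a clone failure is lost here
    return rv;
}

// Pattern the review asks for: check the first result before reusing rv.
Status runChecked(const Steps& s) {
    Status rv = s.clone;
    if (rv != kOk) {
        return rv;
    }
    rv = s.query;
    return rv;
}
```

With a failing first step and a succeeding second step, runUnchecked reports success while runChecked reports the clone failure.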


>+    rv = localFile->OpenNSPRFileDesc(PR_RDWR | PR_CREATE_FILE, 00666, &tagFD);

wouldn't 00664 make more sense?  do you really mean to allow anyone
to write to this file?
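For context on the two modes, a rough sketch of what they mean in practice on a POSIX system, where the mode requested at creation time is filtered through the process umask. This assumes the mode argument reaches the underlying open() call unchanged; the helper name effectiveMode is made up for the example.

```cpp
#include <cassert>

// Hypothetical helper: effective permission bits for a newly created file
// on a POSIX system, i.e. the requested mode with the umask bits cleared.
constexpr unsigned effectiveMode(unsigned requested, unsigned umaskBits) {
    return requested & ~umaskBits & 0777u;
}
```

Under the common umask of 022 both 0666 and 0664 come out as 0644, so the two requests differ only for users whose umask leaves the world-writable bit alone.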


>+    static const char *tagHdr =
>+	"Signature: 8a477f597d28d172789f06886806bc55\n"
>+	"# This file is a cache directory tag created by Mozilla.\n"
>+	"# For information about cache directory tags, see:\n"
>+	"#\thttp://www.brynosaurus.com/cachedir/\n";
>+    PRInt32 tagHdrLen = strlen(tagHdr);

nit:  PRInt32 tagHdrLen = sizeof(tagHdr)-1;
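One caveat on this nit, offered as a sketch rather than as part of the review: sizeof gives the string length only when tagHdr is declared as an array. With the pointer declaration in the quoted hunk, sizeof(tagHdr) is the size of the pointer. The variable and function names below are illustrative, and the string is shortened to the signature line.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Pointer form, as in the quoted hunk: sizeof yields the size of the
// pointer itself, so the length has to come from strlen().
static const char* kTagHdrPtr =
    "Signature: 8a477f597d28d172789f06886806bc55\n";

// Array form: sizeof yields the literal's byte count, including the NUL.
static const char kTagHdrArr[] =
    "Signature: 8a477f597d28d172789f06886806bc55\n";

std::size_t lengthViaStrlen() { return std::strlen(kTagHdrPtr); }
std::size_t lengthViaSizeofArray() { return sizeof(kTagHdrArr) - 1; }
std::size_t lengthViaSizeofPointer() { return sizeof(kTagHdrPtr) - 1; }
```

So the compile-time length is available if the declaration becomes `static const char tagHdr[] = ...`; otherwise strlen() is the correct call.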


this function would be significantly simplified if you didn't have to return
an error code when failures are encountered.


fwiw, i too prefer the 8.3 compatible name.  or maybe i'm just not
a big fan of mixed case filenames ;-)
Attachment #155677 - Flags: review-
Hi!  Here's a new patch that addresses the code review comments, again compiled
and tested against the latest Mozilla CVS as of today.  Thanks!

Bryan
Attachment #155677 - Attachment is obsolete: true
Comment on attachment 156270
Updated patch wrt review comments

much better, thanks

r=darin
Attachment #156270 - Flags: review+
-> default owner
Assignee: darin → nobody
Target Milestone: Future → ---
This bug has not seen any activity in 5 years, despite having a trivial patch (which by this point probably doesn't even apply anymore).  "ping" doesn't quite cover it. :)
Ah, another worthy necko bug emerging from the vaults :)

This is still a good idea (and more relevant now that the Cache can be up to 1 GB not 50 MB).  I suspect that a lot of backup software has simply learned where we put our Cache files by convention, but I agree it'd be nice to have a standard, and I'm willing to have Mozilla help push it forward.  I don't see much activity on Bryan's web page for the project, though.   Bryan, still there?  Still interested?  The most obvious pieces of software that would be good to get on board are the other browsers, and projects like rsync (and/or Apple Time Machine, and other vendors of backup software)
I would think we would get better results by putting the cache files somewhere where all programs already exclude from backups (e.g. %USERPROFILE%\appdata\Local\Temp\Mozilla\Firefox\Profiles instead of %USERPROFILE%\appdata\Local\Mozilla\Firefox\Profiles on Windows, Context.getCacheDir() on Android). That may also allow Android's temp file garbage collection to work.
(In reply to Brian Smith (:bsmith) from comment #16)
> I would think we would get better results by putting the cache files
> somewhere where all programs already exclude from backups (e.g.
> %USERPROFILE%\appdata\Local\Temp\Mozilla\Firefox\Profiles instead of
> %USERPROFILE%\appdata\Local\Mozilla\Firefox\Profiles on Windows,
> Context.getCacheDir() on Android). That may also allow Android's temp file
> garbage collection to work.

I'd certainly love to see the profile move to $XDG_CACHE_HOME (~/.cache by default), but that seems orthogonal to the use of CACHEDIR.TAG, which various backup software handles by default.  tar has an --exclude-caches option, as does other software.
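As a sketch of the $XDG_CACHE_HOME lookup mentioned here: the variable is used when it is set and non-empty, and ~/.cache is the fallback. The function name resolveCacheHome is hypothetical, and the environment values are passed in as arguments so the logic can be exercised without touching the real environment.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: resolves the per-user cache base directory from the
// values of XDG_CACHE_HOME and HOME. Either pointer may be null (unset).
std::string resolveCacheHome(const char* xdgCacheHome, const char* home) {
    if (xdgCacheHome != nullptr && xdgCacheHome[0] != '\0') {
        return xdgCacheHome;
    }
    if (home != nullptr && home[0] != '\0') {
        return std::string(home) + "/.cache";  // default location
    }
    return std::string();  // no usable location known
}
```

A caller would typically pass std::getenv("XDG_CACHE_HOME") and std::getenv("HOME") from <cstdlib>.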
(In reply to Jason Duell (:jduell) from comment #15)
> Ah, another worthy necko bug emerging from the vaults :)
> 
> This is still a good idea (and more relevant now that the Cache can be up to
> 1 GB not 50 MB).  I suspect that a lot of backup software has simply learned
> where we put our Cache files by convention, but I agree it'd be nice to have
> a standard, and I'm willing to have Mozilla help push it forward.  I don't
> see much activity on Bryan's web page for the project, though.   Bryan,
> still there?  Still interested?  The most obvious pieces of software that
> would be good to get on board are the other browsers, and projects like
> rsync (and/or Apple Time Machine, and other vendors of backup software)

I don't know about activity on the standard (which as far as I can tell seems inactive mostly due to not needing any new changes).  However, I've seen new pieces of backup software adding support for the standard, and new programs using the standard for their cache directories.  The last major release of ccache included CACHEDIR.TAG in ~/.ccache, for instance.  I'd certainly love to see this change made for the benefit of my own backups.
(In reply to Josh Triplett from comment #17)
> (In reply to Brian Smith (:bsmith) from comment #16)
> > I would think we would get better results by putting the cache files
> > somewhere where all programs already exclude from backups (e.g.
> > %USERPROFILE%\appdata\Local\Temp\Mozilla\Firefox\Profiles instead of
> > %USERPROFILE%\appdata\Local\Mozilla\Firefox\Profiles on Windows,
> > Context.getCacheDir() on Android). That may also allow Android's temp file
> > garbage collection to work.
> 
> I'd certainly love to see the profile move to $XDG_CACHE_HOME (~/.cache by
> default)

Er, s/profile/cache/
I can only find references to a few pieces of Linux/Unix software that do this. If we moved the cache to $XDG_CACHE_HOME (bug 239254 and/or bug 259356) and/or /var/cache and analogous locations on other operating systems, I think we would get the desired results in a more widespread fashion.

Regarding the patch, we definitely shouldn't be re-writing the CACHEDIR.TAG file every time we open the cache because that delays opening the cache and causes unnecessary writes to the disk (flash memory). It seems like it would make more sense in InitializeCacheDirectory(), if we are going to support this at all.
(In reply to Brian Smith (:bsmith) from comment #20)
> I can only find references to a few pieces of Linux/Unix software that do
> this.

tar supports it, and several backup tools support it.  And quite a few tools tag their cache directories with this file.  Seems sufficiently widespread to make it worth including, considering that it requires very little work to support.

> If we moved the cache to $XDG_CACHE_HOME (bug 239254 and/or bug
> 259356) and/or /var/cache and analogous locations on other operating systems,
> I think we would get the desired results in a more widespread fashion.

Certainly not disputing that, at least for Linux and UNIX platforms.  (/var/cache doesn't make sense, but $XDG_CACHE_HOME does, and I'd love to see the patch in bug 239254 get applied.)

> Regarding the patch, we definitely shouldn't be re-writing the CACHEDIR.TAG
> file every time we open the cache because that delays opening the cache and
> causes unnecessary writes to the disk (flash memory). It seems like it would
> make more sense in InitializeCacheDirectory(), if we are going to support
> this at all.

I do agree that it doesn't make sense to rewrite it every time.  However, the browser should always create it if it doesn't exist, to handle the common case of existing caches without the tag file.  When does InitializeCacheDirectory run?
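A minimal sketch of the "create only if missing" behaviour, written against the C++ standard library rather than the Mozilla file APIs. The function name ensureCacheTag is hypothetical; the tag contents are the ones quoted from the patch earlier in this bug.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>

namespace fs = std::filesystem;

// Hypothetical helper: writes CACHEDIR.TAG into `cacheDir` only when it is
// missing. Returns true if a new file was written, false if one already
// existed or the write failed. Errors are swallowed on purpose, since a
// missing tag should never stop the cache from opening.
bool ensureCacheTag(const fs::path& cacheDir) {
    const fs::path tag = cacheDir / "CACHEDIR.TAG";
    std::error_code ec;
    if (fs::exists(tag, ec)) {
        return false;  // common path: no disk write at all
    }
    std::ofstream out(tag, std::ios::binary);
    if (!out) {
        return false;
    }
    out << "Signature: 8a477f597d28d172789f06886806bc55\n"
           "# This file is a cache directory tag created by Mozilla.\n"
           "# For information about cache directory tags, see:\n"
           "#\thttp://www.brynosaurus.com/cachedir/\n";
    return static_cast<bool>(out.flush());
}
```

After the first run, each later open costs one existence check and no write, and an existing cache without a tag still gets one.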
FWIW, it seems that Obnam supports this type of cache directory flagging: http://code.liw.fi/obnam/yarns.html#exclude-cache-directories
I think we're better off trying to cooperate with the OS (which we largely don't do afaik) than doing the cache flagging.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX
As far as I can tell, these days Firefox does actually use XDG_CACHE_HOME (default ~/.cache), which does make this bug obsolete.