Closed Bug 138117 Opened 23 years ago Closed 5 months ago

Completed downloads are not removed from Cache folder

Categories: Toolkit :: Downloads API, defect, P3

Status: RESOLVED WORKSFORME

People: Reporter: pgauriar, Unassigned

Keywords: relnote

Attachments: 1 file

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-US; rv:0.9.9+)
Gecko/20020417
BuildID:    2002041703

After completing a download, the cached file is not removed from the cache
folder. The cache folder will continue to grow no matter what the size limit is
set to in the prefs.

Reproducible: Always
Steps to Reproduce:
1. Clean out the cache folder (so that you can observe the difference in step 3)
2. Go to the provided link and choose Save As... or Open With...
3. When the download is finished, check the cache folder.

Actual Results:  The cached file (of approximately 16 megs) is still in the
cache folder, even though the download was completed.

Expected Results:  The cached file should be removed upon the completion of the
download.

This is a major bug since you could conceivably lose a lot of space very quickly
(like me downloading nightlies every day for a week, plus Chimera builds).

I think that this would best be fixed by removing the cached file if the
download is interrupted (via loss of connection, crash of Mozilla, etc.) or
completed. If the download is simply paused, you're obviously going to want to
keep the file.

This was seen on Mac OS X 10.1.3 and 10.1.4 on a PowerBook G3 500 MHz with a
12 GB hard drive and 512 MB RAM.
I have been experiencing the same problem on MacOS 9, so it's not just specific
to Fizilla.
The archives and encoded files are removed by StuffIt Expander after expanding,
I suppose. Check the preferences of StuffIt Expander.
This isn't an issue with StuffIt. This is definitely a Mozilla issue. The
completed download should be removed from the cache folder. StuffIt isn't
responsible for that, Mozilla is.

This really needs to be fixed by Moz 1.0. It is a serious efficiency problem:
Mozilla goes over my cache limit daily because I download a lot of files and
none of them are removed from the cache folder.
Just to clarify the problem a bit further:

When you download a file, two copies are created: one goes to your designated
download folder (say, the desktop), and has a normal filename like
"archive.sit".  A second copy is saved to the Mozilla cache folder under a
hashed filename such as "0183A748d01".  The problem is that this second copy is
completely ignored by the cache manager and never gets deleted, even with the
"Clear Disk Cache" button.  Obviously, this leads to a rapid and persistent
bloating of the cache folder.

I wonder if this bug should be filed under "Networking: Cache"?
I totally missed that category. I'm moving it. Maybe work will be done on it.
Component: File Handling → Networking: Cache
Assignee: law → gordon
QA Contact: sairuh → tever
*** This bug has been confirmed by popular vote. ***
Status: UNCONFIRMED → NEW
Ever confirmed: true
I am also having this problem on Windows XP. Is there a separate bug for
Windows on this that I missed, or should this be set to OS: All? It was a real
problem for me when downloading Linux ISO files; my boot disk filled up and
Windows crashed...
I'm going to go ahead and change this to OS->All, Platform->All based on comment
8.  
OS: MacOS X → All
Hardware: Macintosh → All
*** Bug 142791 has been marked as a duplicate of this bug. ***
Proposed relnote: Downloaded files are never removed from the disk cache. This
can be problematic if downloaded files are very large. Workaround: go to the
Cache folder in the profile directory and delete them manually.
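For reference, a minimal sketch of that manual workaround (illustrative Python,
not anything shipped with Mozilla; the cache path is a placeholder, and the real
location depends on the platform and profile):

  # List the largest files in the profile's Cache folder so the leftover
  # download copies can be found and deleted by hand.
  import os

  cache_dir = "/path/to/profile/Cache"  # placeholder; varies per platform/profile

  entries = []
  for name in os.listdir(cache_dir):
      path = os.path.join(cache_dir, name)
      if os.path.isfile(path):
          entries.append((os.path.getsize(path), name))

  # Leftover download copies show up as large files with hashed names
  # such as 0183A748d01.
  for size, name in sorted(entries, reverse=True)[:10]:
      print("%10.1f MB  %s" % (size / 1e6, name))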
With pre-downloading, a download may momentarily require storage for
as many as three copies, depending upon the placement of the user's
Cache directory, the temporary directory in use for the downloads,
and the user-specified target.

Despite its disclaimer of authoritativeness, the glossary accessible
to the end user via the Help menu defines cache as "A collection of web
page copies...". This would seem to argue against putting download data into
the user's Cache directory.

This patch suppresses storage of download data within the user's
Cache directory. The metadata is not scrubbed from the Cache; that doesn't seem
to affect things under Linux, and it keeps the change to the codebase minimal.
Keywords: patch
*** Bug 155298 has been marked as a duplicate of this bug. ***
Darin, any comments on the attachment?  If JavaScript is involved in the
download process, its lazy garbage collection may allow cache entry descriptors
to stay in use much longer than necessary.
Priority: -- → P1
Target Milestone: --- → mozilla1.2beta
As I understand it, this bug is now only relevant to the 1.0 branch. Trunk
builds since 1.2 no longer have this problem (files not being removed).
Keywords: mozilla1.0.1
Version: Trunk → 1.0 Branch
Let's mark this fixed, then.
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
If this happens on the 1.0 branch, it should stay open, with the version set to
"1.0 Branch".

I've cleared the milestone and sent this to Download Manager.
Status: RESOLVED → REOPENED
Component: Networking: Cache → Download Manager
QA Contact: tever → petersen
Resolution: FIXED → ---
Target Milestone: mozilla1.2beta → ---
Reassigning to the owner of the Download component.
Assignee: gordon → blaker
Status: REOPENED → NEW
Nav triage team: removed nomination.  Not relevant to trunk.
Keywords: nsbeta1
(In reply to comment #15)
> As I understand it, this bug is now only relevant to the 1.0 branch. Trunk
> builds since 1.2 no longer have this problem (files not being removed).

To make it clear: with trunk builds, downloads are removed once the cache limit
is reached, but not directly after the download is completed. So the cache
doesn't grow indefinitely, but the files are not deleted as long as the disk
cache limit hasn't been reached yet.

I tested it now with Mozilla 1.0.2 and this release shows the same behavior as
the trunk builds, so IMO this bug could be resolved. The deletion does not happen
directly after the download, as the reporter wanted in comment 0, but this is
IMHO a minor issue (the real problem was exceeding the size limit, see the dupe).
A fix for bug 55307 would solve that. If not, a new bug should be filed (for
trunk) to do that.
Product: Browser → Seamonkey
*** Bug 289890 has been marked as a duplicate of this bug. ***
*** Bug 270519 has been marked as a duplicate of this bug. ***
I can reproduce this on FF1.5b1 and SeaMonkey 1.0a.

Re: bug 270247 comment 3, using LiveHTTPHeaders confirms that the cache content
is not being validated on subsequent requests to the same URL; no HEAD request
is made, so AFAICS (please correct me if I am wrong) this problem does in fact
cause all current releases of Mozilla products to ignore the HTTP standard,
section 9.3 GET:

   The response to a GET request is cacheable if and only if it meets
   the requirements for HTTP caching described in section 13.

13.1.4 Explicit User Agent Warnings

   ... The user agent SHOULD NOT
   default to either non-transparent behavior, or behavior that results
   in abnormally ineffective caching, but MAY be explicitly configured
   to do so by an explicit action of the user.
*** Bug 270247 has been marked as a duplicate of this bug. ***
Aside from the cache limit issue, this bug means that if a file is downloaded,
an updated version of it cannot be downloaded without clearing the entire cache.
This makes Mozilla quite a hassle to use when working with files that are
frequently modified.  Having to regularly clear the cache to ensure that only
current versions of files are downloaded defeats the usefulness of having a
cache in the first place.
Can someone please provide specific steps to reproduce this problem?  A link to
a file to download, with details about what goes wrong would be appreciated.  I
personally have never experienced this problem.  Firefox appears to follow the
HTTP/1.1 specification's cache rules to the letter (with only some minor 
exceptions).
The problem occurs when an old file is updated on the remote server but is still
in the browser's disk cache.

Steps to reproduce:
1. set browser.cache.check_doc_frequency to 3
2. upload a file to a web server, and set the modification time to 1/1/2005
3. fetch the file
4. upload an updated version of the file
5. fetch the file again
6. restart the browser
7. fetch the file again

Steps 2 & 4 illustrate a use case, rather than the cause; the problem can also
be seen with a static file and LiveHTTPHeaders or a network analyser.
http://ftp.mozilla.org/pub/mozilla.org/mozilla/nightly/latest/mozilla-win32-stub-installer.exe

Expected results:
ftp.mozilla.org doesn't appear to provide an expiry time or other
age-controlling header, so I would expect the cache to validate its copy.

Actual results:
the browser cache returns the stale version.
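For reference, a minimal sketch of the revalidation the cache would be expected
to perform here: a conditional GET that lets the server answer 304 Not Modified
if the cached copy is still current. This is illustrative Python, not browser
code; the If-Modified-Since value is hypothetical, and the URL is the one from
the steps above.

  import urllib.error
  import urllib.request

  url = ("http://ftp.mozilla.org/pub/mozilla.org/mozilla/nightly/latest/"
         "mozilla-win32-stub-installer.exe")
  req = urllib.request.Request(url, headers={
      # Validator the cache stored from the previous response (hypothetical).
      "If-Modified-Since": "Sat, 01 Jan 2005 00:00:00 GMT",
  })
  try:
      with urllib.request.urlopen(req) as resp:
          print(resp.getcode())   # 200: the server sent a newer copy
  except urllib.error.HTTPError as err:
      print(err.code)             # 304: the cached copy can safely be reused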
For my download of mozilla-win32-stub-installer.exe in FF1.5b, the cache entry
has an expiry date nine days after when it was last fetched.

There are three workarounds:
1. set browser.cache.check_doc_frequency to 1, or
2. hold shift and right click on the link, and select Save Link As...
3. the webserver can explicitly require regular revalidation, but this needs to
have been set up before the first download.

The problems in the last three duplicates are identical, and would appear to be
resolved if downloads are removed from the cache.  If this is not the right
home for the problem, should I re-open bug 270247 as an enhancement to deal with
this scenario?
The use case is working as designed.  HTTP/1.1 says that a document served with
a Last-modified header may be cached for a period of time determined
heuristically by the browser.  A value of 1/10th the period between now and when
the file was last modified is recommended.  So, when the server hosts a file
like this, it is saying to all browsers that the file will not change again for
a long time.  Are you suggesting that downloading a file should always bypass
the browser cache?

I recommend marking this bug as invalid or wontfix.
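For reference, a minimal sketch of that heuristic (illustrative Python, not
Mozilla's cache code; the header values are hypothetical): a response carrying
only a Last-Modified validator may be treated as fresh for a fraction,
conventionally 10%, of the time since it was last modified.

  from email.utils import parsedate_to_datetime

  def heuristic_freshness_lifetime(date_header, last_modified_header, fraction=0.1):
      """Seconds a response without explicit expiration may be considered fresh."""
      date = parsedate_to_datetime(date_header)
      last_modified = parsedate_to_datetime(last_modified_header)
      return max(0.0, (date - last_modified).total_seconds() * fraction)

  # Hypothetical header values: a file last modified ~90 days before it was
  # fetched is treated as fresh for roughly 9 days, matching the nine-day
  # expiry reported earlier in this bug.
  lifetime = heuristic_freshness_lifetime(
      "Mon, 05 Sep 2005 12:00:00 GMT",   # Date
      "Tue, 07 Jun 2005 12:00:00 GMT")   # Last-Modified
  print(lifetime / 86400)  # -> 9.0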
> Are you suggesting that downloading a file should always 
> bypass the browser cache?

No.  I am suggesting that the cache should verify its contents in certain
circumstances where the original headers did not provide explicit expiration
times.  In the scenario where the user has specifically requested a file, and
the browser has classified it as a "download", the browser should not break
semantic transparency; it should issue a HEAD request to be sure that what it is
giving the user is actually a download of the file they requested.  The file
could have been removed, the server could be down, etc.
Once the cache contents are validated, the download should proceed using the cache.

> I recommend marking this bug as invalid or wontfix.

The problem in the original description appears to have been resolved.
IMO, a toggle to disable the cache for files that are also saved to disk
elsewhere would be a useful improvement, for the same reason that bug 81640 is a P1.
Are we forgetting about corrupted downloads here? It does happen, and saving a
corrupted download out of the cache is a very dumb thing.
I can see why it might be nice to force an end-to-end cache validation when
downloading an item, but I'm not sure that I would implement that for every link
click.  Most downloads start from a link click that results in a file that the
browser cannot render.  In those cases, it would be bad to restart the download
because it is hard to know if a download can be restarted without side-effects.
 So, if we only validate explicit downloads (file->save as), then we are not
being consistent.

> In the scenario when the user has specifically requested a file, and the
> browser has classified it as a `download', the browser should not break 
> semantic transparency...

We're not breaking semantic transparency -- at least not according to RFC 2616.
> We're not breaking semantic transparency

The user agent is not fetching the file that is on the server, and it's not
informing the user that it is not doing what they requested.  Whenever a browser
doesn't perform exactly like wget (barring bugs, of course), it's breaking
semantic transparency; which is OK, but it should only do this with good cause
(i.e. the user has specifically requested it, e.g. via a user pref), and it
should keep the user in the loop.

Irrespective of HTTP caching issues, it is reasonable that files that are listed
in the Download Manager don't also need to be retained in the disk cache.  This
bug only relates to items in the cache that are also in the Download Manager.

A use case that is more pertinent to this bug:
1. Set a cache size of 1000 KiB, and clear the cache
2. Load this page in one tab: http://outreach.jach.hawaii.edu/pressroom/2003-estar/
3. Do a little browsing of images.google.com (small images only) in another tab
until the disk cache is full
4. View the contents of the disk cache: about:cache?device=disk
5. Go back to the first tab, right-click on estardiagram-large.png (full-size
PNG, 230 kB), and save it to disk
6. Refresh about:cache?device=disk.
7. Do a little browsing of images.google.com (small images only), and keep an
eye on the cache contents.

Expected results:
No change to the existing items in the cache at step 5.

Actual results:
25% of the disk cache has been replaced with an image that has been saved
outside of the cache.  In step 7, the 230 kB image stays in my cache for quite a
while, pushing out lots of useful little files.
(In reply to comment #29)
> it should request a HEAD to be sure what it is giving the user is
> actually a download of the file they requested. 

(surely you mean GET with If-Modified-Since, or with If-None-Match)

>  it is reasonable that files that are listed
> in the Download Manager dont need to be also retained in the disk cache. 

What if I use File|Save Page As? Surely that shouldn't remove files from the
cache. So that statement in its generality is not useful.
> (surely you mean GET with If-Modified-Since, or with If-None-Match)

Yes; HEAD was illustrative.

> What if I use File|Save Page As?

I did not realise that operation populated the Download Manager.  This behaviour
seems broken to me (btw bug 143949 intends to fix this for images), and feels
like it is an artifact from the days when View Source and Save Page As really
did download fresh copies.  Are there scenarios when a Save Page As won't be
coming out of the cache?
Assignee: bross2 → download-manager
QA Contact: chrispetersen
GOOD WORK
Assignee: download → nobody
Component: Download & File Handling → Download Manager
Product: SeaMonkey → Toolkit
QA Contact: download.manager
Version: 1.0 Branch → unspecified
Moving to P3 because there has been no activity for at least 24 weeks.
Priority: P1 → P3
See Also: → 1472482

In the process of migrating remaining bugs to the new severity system, the severity for this bug cannot be automatically determined. Please retriage this bug using the new severity system.

Severity: critical → --

The severity field is not set for this bug.
:mak, could you have a look please?

For more information, please visit the BugBot documentation.

Flags: needinfo?(mak)

I tried downloading a 100MiB test file, checking the cache from about:cache before and after the fact, and I don't see the download file being duplicated. Thus I think this was fixed some time ago, maybe when the background file saver was implemented.
If anyone has concerns or better steps to reproduce the problem, I'd suggest filing a new bug, as this one contains quite outdated information.

Status: NEW → RESOLVED
Closed: 22 years ago → 5 months ago
Flags: needinfo?(mak)
Resolution: --- → WORKSFORME