Closed Bug 55307 Opened 24 years ago Closed 14 years ago

Downloads are stored in the cache first

Categories

(Core Graveyard :: File Handling, defect, P3)

Tracking

(blocking2.0 final+)

RESOLVED FIXED
mozilla2.0b7

People

(Reporter: david, Assigned: byronm, NeedInfo)

References

(Blocks 2 open bugs)

Details

(Keywords: topembed-)

Attachments

(1 file, 3 obsolete files)

On Windows NT, the cache directory gets stored inside the user's roaming
profile.  The problem with this is that some sites (such as here at Purdue) put
a very small quota on the roaming profile.  (I think my roaming profile quota is
2 megs.) I know that you can specify a quota for Netscape's cache, but if your
max cache size is set at 1 meg (what mine is set at) and you download an 8 meg
file (like the latest Mozilla nightly), you automatically exceed your quota.
Not only that, but if you want to download something fairly large like, say, the
latest Mandrake .ISO image file (on any platform), you might not have enough
space on the file system containing the cache directory to download it there
first and then move it to its final location.  Not to mention that on Unixes, a
'mv' across file systems involves a copy, which is very inefficient (especially
for large files).

This is related to bug #42606, but not exactly the same thing.
law or mscott, is this yours?
this should be mine....
Assignee: asa → mscott
setting bug to New
Status: UNCONFIRMED → NEW
Ever confirmed: true
The file being downloaded has to be stored *somewhere*.  It seems like every
time someone thinks they've found a good place for partially downloaded files,
someone says it doesn't work on their machine or with their way of handling files.

* Storing it in RAM causes you to run out of memory on large downloads (bug
91795).  
* Storing it in the unix temporary directory uses up limited temp space (bug
69938).  
* Storing it in the destination folder frustrates people who try to start using
the incomplete file before it finishes downloading (bug 103877).
* Storing it in the cache annoys people who keep their cache on a remote machine 
(bug 89903, bug 74085, this bug).

(And then there's the side problem of "where do you put it before the user
chooses a destination?"  That's bug 55690, which contains one of bugzilla's
longer flamewars.)

Of the problems with various locations, it seems to me that the problem with the
cache is the most tractable: few users have remote profiles, and those who
choose to use a remote cache already take performance hits during normal
browsing.  Users with remote profiles should be asked to specify a local cache
directory, but not at gunpoint (using a remote cache should be a visible option).

By the way, can someone explain how it is that all of the bugs I listed above
are open at the same time?  Does Mozilla really sometimes store partial
downloads in cache, sometimes in RAM, sometimes in /tmp, and sometimes in the
destination folder?  If so, when does it use each location?

Can someone create a metabug called "location of partially downloaded files" to
track these issues?  (I don't think I can, but I haven't tried.)
> Users with remote profiles should be asked to specify a local cache
> directory, but not at gunpoint (using a remote cache should be a visible 
> option).

You could generalize this idea to:

1) The default behavior is the current behavior (except on the Mac which has a
default download directory as a system-wide preference--see #2).

2) There is a user-specifiable default download directory that overrides this
behavior.

This would make the Mac users happy because Moz would respect their systemwide
preference.  It would make NT admins happy because they could specify a
different default location for partial downloads.  It would make me happy
because (although I don't work for Purdue any more) it handles the NT case, plus
on my Linux boxen I can (once and for all) point Moz at my download directory
(which is on /home, not on /tmp because I don't want to overwrite it if I
reinstall the O/S).

Last but not least, it would make the people who like the current behavior happy.

I would also like an option to skip the filename chooser altogether and just
use the remote filename unless it would overwrite an existing local file.  Call
the feature one-click downloading and patent it. :-)  Maybe this should be a
separate "bug" though...
->file handling, but retriage if needed.
Component: Browser-General → File Handling
QA Contact: doronr → sairuh
The file is not only saved to the user's cache but also to the /tmp directory on
Linux. It's now impossible for me to download a 512MB file because the copy
routine simply fails.
How about doing what rsync does? Save the file as a dot-file version of itself,
for instance, if you are downloading:

blahblahblah.iso

save the file as

.blahblahblah.iso

in the location the user chooses. If dotfiles aren't obscure enough you can name
the file something weirder like:

z89582.blahblahblah.iso

Then when it's done, rename to the correct name and *kaching!* you're all set!

In addition to solving the above problems, this allows for a mechanism of easily
resuming failed downloads. There could also be a pref for "Delete partial
downloads" that would default to on so that inexperienced users won't have
incomplete files littering their disk.

Just a thought...

-J
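The rename-on-completion pattern suggested above can be sketched in a few lines. This is a hedged Python illustration only; the function name and the ".part" naming are made up for the example and are not Mozilla's actual code:

```python
import os

def save_download(dest_path, chunks):
    """Write incoming chunks to a hidden partial file in the destination
    directory, then rename it into place once the download completes.
    Illustrative sketch; names are invented for this example."""
    directory, name = os.path.split(dest_path)
    partial_path = os.path.join(directory, "." + name + ".part")
    with open(partial_path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
    # The rename is atomic on POSIX when source and destination are on the
    # same filesystem, so no second copy of the data is ever made -- which
    # is exactly the property the comment above is after.
    os.rename(partial_path, dest_path)
    return dest_path
```

Because the partial file lives in the destination directory, the final step is a cheap rename rather than a cross-filesystem copy, and a leftover ".part" file doubles as a natural resume point.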
Blocks: 129923
If the limit is hit in the user's cache, the error message still refers to
/tmp/xxxx.yy, and the file (even though incomplete) is also copied to the
user-chosen destination (re: bug 145661), leaving the user with a 'corrupted' file.
*** Bug 172301 has been marked as a duplicate of this bug. ***
QA Contact: sairuh → petersen
*** Bug 185738 has been marked as a duplicate of this bug. ***
*** Bug 186019 has been marked as a duplicate of this bug. ***
I agree with John Flynn (above):
save the download file as ~moz-inc~<filename> or something like that.  This
would allow the file to be downloaded to *one* location, not two.  And this would
allow resume/restart of an incomplete download.
*** Bug 198672 has been marked as a duplicate of this bug. ***
*** Bug 195179 has been marked as a duplicate of this bug. ***
*** Bug 132280 has been marked as a duplicate of this bug. ***
darin, is there a way to tell the cache not to store the file if we're
downloading it to disk anyway?
+topembed, nsbeta1

Several versions of this bug have been running around for a while; Niels in #4
really summarizes the problem well.

We really need to strictly define the download behavior as not using extra disk
or memory cache resources. There is no point in trying to reduce our footprint
if we grow every time we download something.

Cache is disk space that prevents recurring use of the network. I don't think
that downloads, even partial downloads, fit this definition. Instead, a download
directory should be defined (or an OS default should be accepted).
Keywords: nsbeta1, topembed
topembed-
Keywords: topembed → topembed-
Blocks: 195179
based on recent comments, this might only happen w/ http and not ftp?
I've tried to download
ftp.mozilla.org/pub/mozilla/nightly/latest/mozilla-source.tar.gz (see the
duplicate bug 198672):

1) http://ftp.mozilla.org/pub/mozilla/nightly/latest/mozilla-source.tar.gz
creates a temporary copy in the system's temp dir *and* in the Mozilla cache.
2) ftp://ftp.mozilla.org/pub/mozilla/nightly/latest/mozilla-source.tar.gz only
creates a temporary copy in the system's temp dir.
If it just created a temporary file in the same directory where the file is
being stored, then only a simple rename would be needed at the end; no copy or
move of the final file would be required, which would also speed up finishing a
download.

I have my harddrive split up for C and D. D is extremely large while C is for
system files only. As a result, downloading anything larger than about 300MB
(ISO for instance or large projects that I work on) results in the system temp
directory being used, which is on C. Anytime I want to do a large download, I
end up modifying my temporary directories to be on my D drive, although there is
no need for this except to make mozilla put the temporary files in a place that
has room. 

If the file I was downloading was saved to a temp file in the same directory
that I'm downloading to, then upon completion, it would finish, and I'd know it
would download without failing. Of course the ability to say if incomplete
download files should stick around would be useful. Maybe even add a checkbox to
the download window so it can be off by default and large downloads can have it
enabled as it downloads.
Flags: blocking1.4?
Flags: blocking1.4? → blocking1.4-
adt: nsbeta1-
Keywords: nsbeta1 → nsbeta1-
Blocks: 140818
*** Bug 217834 has been marked as a duplicate of this bug. ***
Darin, is it possible to tell an in-process HTTP channel (one you are getting
data from) to stop writing to cache?  The basic issue here is that the cache
access is set up in the guts of necko and the download code has no idea that
it's even happening....

I think we do want to save the file to cache if it's being opened in a helper
app, btw; just not if it's being saved to disk.
*** Bug 216856 has been marked as a duplicate of this bug. ***
Actually it is quite easy to fix this bug and the related bugs. Just stop using
the predownload and save the file directly to the target. But do not set the
filetype.
If it is a .zip file, just call it .tmp until it is finished.
On Mac OS X, just give it a custom icon that looks unfinished.
>Just stop using the predownload and save the file directly to the target.

that helps this bug how exactly?
I just assumed that since it has worked this way before, at least in the
Mac version, it would be easy to revert to that code and add support for
changing the filetype.
That would not be too difficult, right? Correct me if I'm wrong.

I really think this would solve a few other bugs, like those stated in
comment 4.

I thought it would be a great compromise between what everybody wants.
true, but this bug is about downloads stored in the _cache_, not about downloads
stored in the temp folder
> Correct me if im wrong.

You are wrong.  On two counts:

1)  It would be very hard
2)  It has nothing to do with this bug.
>You are wrong.....

This is a reason to make a metabug, as someone asked before. It certainly would
help me (my bug is set as a duplicate). My question: why does Mozilla save a large
file in two places, the saved file AND the tempdir? The solution would be obvious:

>Just stop using the predownload and save the file directly to the target.

So what is wrong with that solution? It isn't only the solution to my problem,
but also to some other problems stated in the 4th post.
So maybe you are wrong? ;-)
> file on two places: the saved file AND the tempdir? 

that is NOT THE TOPIC OF THIS BUG, as bz and I have been trying to explain
more SPAM: You mean bug 69938
Right.  Comment 27, comment 32, etc are about bug 69938.  Which is NOT THE SAME
as this bug.
comment 8 from John Flynn is the way mozilla should go imho.

comment 4 from Niels Aufbau - downloads can be stored in memory, temporarily,
until reached some limit; then the download should be suspended, or setting
could be done about that (tmp? cache? current directory?)

download preferences even have just one setting for now ;)

just MHO.
Would people stop making irrelevant comments, please?  Comment 8 has nothing to
do with this bug, really.  Nor does any comment, except comment 0 (the problem
description) and comment 25 (the solution description).  All the other comments
are about bug 69938, not this bug.

I would like to commend you all on making sure that all the people who would
consider fixing this bug are now carefully blocking out all bugmail from it.
OS: Windows NT → All
Hardware: PC → All
I have verified this behavior on both Red Hat 9.0 with Mozilla 1.2.1 and Windows
2003 with Mozilla 1.6b.  I simply cannot complete a specific large download
because of this design flaw.

All I have to say is, "I would have expected this sort of design with IE, not
Mozilla."
*** Bug 214738 has been marked as a duplicate of this bug. ***
I have a few ideas:

Perhaps we could have a "download cache" folder which we could specify in the
settings. This would get around people's quota problem, keep the incomplete
files out of the way until the download is finished, and (once cross session
resuming is implemented: http://bugzilla.mozilla.org/show_bug.cgi?id=230870 )
prevent your downloads from being deleted when the browser crashes. It's worth
noting that this is how most download managers and P2P programs handle this
situation.

Or, as someone suggested, save the temporary file to the destination, only with
a modified name. I'm not sure how you identify files of different types in
Linux, but under Windows, the temporary download could have a Mozilla
extension appended representing an incomplete download, as well as an icon
representing that.

(filename).(extension).(Mozilla temporary download extension)

setup.exe.moztemp

when complete, could simply be renamed correctly... in this case, setup.exe
I'd vote for (configurable) storing the download in the memory cache until the user
chooses the destination directory. A configurable temp directory (or just using the
cache) for downloads is OK, but the user should be able to turn it off.
Simple question: does the download need to be in the cache at all? No.

FTP doesn't use the cache when downloading, but HTTP does. Ergo, the
HTTP downloader must be fixed.
it's nice that that's what you vote for, but does that answer comment 25?

this bug needs an answer to:
"is it possible to tell an in-process HTTP channel (one you are getting
data from) to stop writing to cache?"

to be fixed.

>On Windows NT, the cache directory gets stored inside the user's roaming
>profile.  The problem with this is that some sites (such as here at Purdue) put
>a very small quota on the roaming profile.  (I think my roaming profile quota
>is 2 megs.) I know that you can specify a quota for Netscape's cache, but if 
>your max cache size is set at 1 meg (what mine is set at) and you download an 8 
>meg file (like the latest Mozilla nightly), you automatically exceed your 
>quota.  Not only that, but if you want to download something fairly large like,
>say, the latest Mandrake .ISO image file (on any platform), you might not have
>enough space on the file system containing the cache directory to download it
>there first and then move it to its final location.  

these problems can be solved by changing the location of mozilla's
cache.  there is a preference that you can set to alter the location.  it can be
modified in the Advanced->Cache preferences.


>Not to mention that on Unixes, a 'mv' across file systems involves a copy,
>which is very inefficient (especially for large files).

this is true of all operating systems.

the cache is used whenever we download content over HTTP.  that is just the
default behavior.  it makes perfect sense for small files.  when files are large
however you want to stop using the cache.  this is what we do today.  if the
file exceeds a limit set by the cache itself, then the cache refuses to accept
any more data for that entry.  the channel writing data to the cache will
silently ignore this error condition, and continue streaming data to its
listener.  this is not a perfect solution though...

>this bug needs an answer to:
>"is it possible to tell an in-process HTTP channel (one you are getting
>data from) to stop writing to cache?"

that's not possible today.

i think the original bug report has been mostly resolved.  however, there
remains a gotcha.  nsDownloader, for example, will fail if the cache entry being
created gets too large.  i'm not sure if that affects normal downloads (i think
maybe it doesn't).
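The cache-limit behavior described in the comment above (the cache refuses further data for an oversized entry, and the channel silently swallows the error while continuing to stream to its listener) can be sketched as follows. The class and function names are invented for this illustration; they are not Gecko's actual types:

```python
class CacheEntry:
    """Toy cache entry with a hard size limit, mimicking the behavior
    described above: once the limit is exceeded, writes are refused."""
    def __init__(self, limit):
        self.limit = limit
        self.data = b""
        self.refused = False

    def write(self, chunk):
        if self.refused or len(self.data) + len(chunk) > self.limit:
            self.refused = True
            raise IOError("cache entry too large")
        self.data += chunk

def stream_to_listener(chunks, cache_entry, listener):
    """The channel tees each chunk to the cache, silently ignoring the
    'entry too large' error, and always forwards the chunk downstream."""
    for chunk in chunks:
        try:
            cache_entry.write(chunk)
        except IOError:
            pass  # cache refused the data; keep streaming anyway
        listener.append(chunk)
```

The listener (the download code) always receives the full stream; only the cache copy is truncated, which is why this is "not a perfect solution": disk and I/O are still spent on the partial cache entry.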
In bug 229984 I detected the following:
Downloading a very large file (about 700MB) saves the file BOTH in the cache
directory, and in the specified location: Even if 'save link target as' is used.

This is nice for nobody.

To me, the solution would be as follows:
Start downloading into the cache until the user has specified the download
location. Currently Mozilla then starts to write to both locations. At least for
items growing larger than the cache limit (or close to it), proactively
remove the item from the cache (because it is being saved elsewhere).
The key problem now is to tell the HTTP channel to stop writing to the cache, as
soon as the file is being saved elsewhere and the size reaches the cache limits.

The current effect during the download was that near the end I had an almost
700MB Knoppix image stored twice somewhere on my simple laptop. Also, Windows
went crazy writing the data to disk. The actual download speed was about
1.4MB/s; it could have been faster if not for the double writes...
>Downloading a very large file (about 700MB) saves the file BOTH in the cache
>directory, and in the specified location: Even if 'save link target as' is used.

yes, of course.

>Start downloading it, into the cache, until the user has specified the download
>location.

ew, that requires that the cache is enabled... and that the item is not removed
from the cache before the user chooses a target... sounds like a fragile solution

>The key problem now is to tell the HTTP channel to stop writing to the cache, as
>soon as the file is being saved elsewhere and the size reaches the cache limits.

I'd rather tell the channel to stop writing to the cache once it's determined
that mozilla can't handle the content. that said, what about comment 44? doesn't
it mean the claim that this is mostly resolved is not true?
The idea of caching only Mozilla-handled content looks smart to me. The question
is: does this mean HTML and images only? This should not include any other file
types, even ones handled by Mozilla plugins. If the user didn't open the file to
be shown by Mozilla and asks to save it (_not_to_open_), the file shouldn't be
cached by Mozilla.
What is this yadayada anyway? Can't you guys just make the downloads go directly
to the directory where they are supposed to end up? My downloads have been
canceled many times because there isn't enough space on my C: drive, even though
there is enough space on the drive they are supposed to go to. As long as this
bug exists, I use wget :(
1.7 beta (just like the alpha) saves partial files in the chosen directory with
a "part" extension if the link is saved via Save Link Target As...

This is correct behaviour for me...
Comment 49 has nothing to do with this, that's bug 69938: 
http://bugzilla.mozilla.org/show_bug.cgi?id=69938

Is someone working on this bug (it still has status NEW)?

Firefox uses the same code, right?

+cc caillon - related to other cache bugs we've discussed.
This works for me. The file is saved in a temp name and then renamed as far as I
can see. It can be closed for all I care :-)))
no, it can't, because this bug is about the download being also stored
_in_the_cache_, which was not addressed with the mentioned change.
Sorry, my cache was set to 1MB and the kernel is a tad larger than that. Hence it
was not stored there.
I was going to add something along this bug's lines as a request for enhancement.
 It would be nice to be able to specify a different download cache location
from the "standard" browser cache.  I use a ramdisk and like to point my Mozilla
cache to it, but if I download a file larger than the ramdisk size, even though
I gave it a non-ramdisk location (e.g. c:\downloads), the ramdisk fills up with
the temporary download of the file and then the download fails.  Perhaps a
"browser cache" and "download cache" option would help solve my problem, or
perhaps the download just needs to be handled in a new way.
Assignee: mscott → nobody
Still seems to be a problem in SeaMonkey 1.1.16
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.21) Gecko/20090511 SeaMonkey/1.1.16

Eg PDF files served by IEEE frequently exceed 1M (my initial cache size)

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4336121&isnumber=4336114

With a cache size of 1MB this fails with a number of unhelpful error messages.
No indication is given that the problem lies in the cache size.
ps: Apart from using the Edit|Preferences|Advanced|Cache SeaMonkey menu,
wget can download huge files without caching, to work around this problem.
QA Contact: chrispetersen → file-handling
When downloading files, the file is first stored in %TEMP% and then moved to the destination path, as soon as that is known.
It is also stored in the cache, as long as the file is smaller than half the cache (so, with the default cache of 50MB, files < 25MB are also stored in the cache). This can be seen as OK, since on another download (or use in a page, e.g. with images) the data is still available in the cache, or as not OK, when the file is only ever saved to disk (as usually happens when downloading large software files, such as a Linux distro).
So, it is not stored in the cache first, but *also* in the cache (as long as the file is small enough).
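The size rule just described reduces to a one-line check. The 50MB default and the one-half factor come from the comment above; the function name is invented for this sketch:

```python
def should_store_in_cache(file_size, cache_capacity=50 * 1024 * 1024):
    # Per the behavior described above: an entry is kept in the cache
    # only while it stays smaller than half the total cache capacity.
    return file_size < cache_capacity / 2
```

With the defaults, a 20MB file is cached alongside the saved copy, while a 700MB ISO is not.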
Blocks: http_cache
We should simply skip storing downloads in the cache.

Regarding which flag to set for this:  see bug 588507.  For downloads we might want INHIBIT_CACHING, since that allows getting the content from cache, which will happen for instance if a user wants to download an HTML file that they've already browsed.
Assignee: nobody → byronm
(In reply to comment #25)
> is it possible to tell an in-process HTTP channel (one you are getting
> data from) to stop writing to cache?  The basic issue here is that the cache
> access is set up in the guts of necko and the download code has no idea that
> it's even happening....
 
I have implemented this functionality as part of my patch for bug 588507. I should be able to leverage this to prevent downloads from being stored to the cache. I just need to know how to identify whether the channel has been opened for a download... it's been suggested that Content-Disposition might be what I should check. Is that the best indicator, or should I check something else?
> I just need to know how to identify if the channel has been opened 
> for a download... its been suggested that Content-Disposition might 
> be what I should check. Is that the best indicator?

bz/biesi: either of you know?

I believe that if Content-Dispo = attachment, that's a pretty sure bet that we're doing a download.  But this won't capture things like a user right-clicking on an image, etc., and asking to download it.  

If there's only a way to detect a subset of downloads and avoid storing them in the cache, I'm fine with that--the change we're making to avoid the cache if content-length is larger than 5 MB will avoid the worst problems with downloads anyway.
In general, you don't know whether the thing is being downloaded until after you fire OnStartRequest.

If content-disposition:attachment and we're in a toplevel browsing context, that will be a download.  Otherwise it won't be.

Also, there have been requests to implement "view in browser" for content-disposition:attachment.

Not caching content-disposition:attachment seems like an OK heuristic to me, though.
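The heuristic discussed above could be sketched like this. This is an illustration only, not the patch that eventually landed (which, per a later comment, instead has the external handler mark the channel), and the function name and header handling are invented:

```python
def looks_like_download(headers, top_level):
    """Heuristic from the discussion above: treat the response as a
    download (and skip caching it) only when Content-Disposition is
    'attachment' AND the load is in a top-level browsing context.
    Sketch only; not Gecko's actual detection code."""
    disposition = headers.get("Content-Disposition", "")
    # Keep only the disposition type, dropping parameters like filename=...
    disp_type = disposition.split(";", 1)[0].strip().lower()
    return top_level and disp_type == "attachment"
```

As the comments note, this misses downloads the user initiates (e.g. right-click, Save Image As...), since those responses usually carry no Content-Disposition header at all.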
OK, then let's just not cache "attachment" and then consider this bug done.
Attached patch no_downloads_in_cache (obsolete) — Splinter Review
This patch prevents downloads from being stored in the cache, as best we can detect them. It turned out that many, many sites' download links do not set Content-Disposition in their response headers. After chatting with bz on IRC, we decided that having the external handler inform the channel that it is open to service a download was the better bet.

I have tested that this does indeed ban downloads from the cache. As soon as I get green back from try, I will flag it for review.

Note: I do not have much experience modifying interfaces yet. I tried to follow convention/the docs carefully, but if anything is wrong, please let me know.
Attached patch no_downloads_in_cache (obsolete) — Splinter Review
Updated version prevents potential segfault when dereferencing mCacheEntry.
Attachment #472979 - Attachment is obsolete: true
You need to rev the interface id, right?
Attached patch no_downloads_in_cache (obsolete) — Splinter Review
Updates UUID of modified nsIHttpChannelInternal interface.
Attachment #472987 - Attachment is obsolete: true
Comment on attachment 473127 [details] [diff] [review]
no_downloads_in_cache

This is green on try, so I think it is ready for review.
Attachment #473127 - Flags: review?(jduell.mcbugs)
Modified comment to fit on one line.
Attachment #473127 - Attachment is obsolete: true
Attachment #473416 - Flags: review?(jduell.mcbugs)
Attachment #473127 - Flags: review?(jduell.mcbugs)
Attachment #473416 - Flags: review?(jduell.mcbugs) → review+
Not as huge a win as our other caching improvements, but simple patch and ready to land.  FF 4?
blocking2.0: --- → ?
Attachment #473416 - Flags: approval2.0+
The patch has been pushed:
http://hg.mozilla.org/mozilla-central/rev/6ccd956d1df9

This is fixing a 10 year old bug! :)
Status: NEW → RESOLVED
Closed: 14 years ago
Flags: in-testsuite?
Resolution: --- → FIXED
Target Milestone: --- → mozilla2.0b6
blocking2.0: ? → final+
Depends on: 935595
Product: Core → Core Graveyard
OP: Does this "WFM" for you?
Flags: needinfo?(david)