Closed Bug 55307 Opened 24 years ago Closed 14 years ago

Downloads are stored in the cache first

Categories

(Core Graveyard :: File Handling, defect, P3)

Tracking

(blocking2.0 final+)

RESOLVED FIXED
mozilla2.0b7
Tracking Status
blocking2.0 --- final+

People

(Reporter: david, Assigned: byronm, NeedInfo)

References

(Blocks 1 open bug)

Details

(Keywords: topembed-)

Attachments

(1 file, 3 obsolete files)

On Windows NT, the cache directory gets stored inside the user's roaming profile. The problem with this is that some sites (such as here at Purdue) put a very small quota on the roaming profile. (I think my roaming profile quota is 2 megs.) I know that you can specify a quota for Netscape's cache, but if your max cache size is set at 1 meg (what mine is set at) and you download an 8 meg file (like the latest Mozilla nightly), you automatically exceed your quota. Not only that, but if you want to download something fairly large like, say, the latest Mandrake .ISO image file (on any platform), you might not have enough space on the file system containing the cache directory to download it there first and then move it to its final location. Not to mention that on Unixes, a 'mv' across file systems involves a copy, which is very inefficient (especially for large files). This is related to bug #42606, but not exactly the same thing.
law or mscott, is this yours?
this should be mine....
Assignee: asa → mscott
setting bug to New
Status: UNCONFIRMED → NEW
Ever confirmed: true
The file being downloaded has to be stored *somewhere*. It seems like every time someone thinks they've found a good place for partially downloaded files, someone says it doesn't work on their machine or with their way of handling files.
* Storing it in RAM causes you to run out of memory on large downloads (bug 91795).
* Storing it in the unix temporary directory uses up limited temp space (bug 69938).
* Storing it in the destination folder frustrates people who try to start using the incomplete file before it finishes downloading (bug 103877).
* Storing it in the cache annoys people who keep their cache on a remote machine (bug 89903, bug 74085, this bug).
(And then there's the side problem of "where do you put it before the user chooses a destination?" That's bug 55690, which contains one of bugzilla's longer flamewars.)
Of the problems with various locations, it seems to me that the problem with the cache is the most tractable: few users have remote profiles, and those who choose to use a remote cache already take performance hits during normal browsing. Users with remote profiles should be asked to specify a local cache directory, but not at gunpoint (using a remote cache should be a visible option).
By the way, can someone explain how it is that all of the bugs I listed above are open at the same time? Does Mozilla really sometimes store partial downloads in cache, sometimes in RAM, sometimes in /tmp, and sometimes in the destination folder? If so, when does it use each location? Can someone create a metabug called "location of partially downloaded files" to track these issues? (I don't think I can, but I haven't tried.)
> Users with remote profiles should be asked to specify a local cache
> directory, but not at gunpoint (using a remote cache should be a visible
> option).
You could generalize this idea to:
1) The default behavior is the current behavior (except on the Mac which has a default download directory as a system-wide preference--see #2).
2) There is a user-specifiable default download directory that overrides this behavior.
This would make the Mac users happy because Moz would respect their systemwide preference. It would make NT admins happy because they could specify a different default location for partial downloads. It would make me happy because (although I don't work for Purdue any more) it handles the NT case, plus on my Linux boxen I can (once and for all) point Moz at my download directory (which is on /home, not on /tmp because I don't want to overwrite it if I reinstall the O/S). Last but not least, it would make the people who like the current behavior happy.
I would also like an option to skip the filename chooser altogether and just use the remote filename unless it would overwrite an existing local file. Call the feature one-click downloading and patent it. :-) Maybe this should be a separate "bug" though...
->file handling, but retriage if needed.
Component: Browser-General → File Handling
QA Contact: doronr → sairuh
The file is not only saved to the user's cache but also to the /tmp directory on Linux. It's now impossible for me to download a file of 512MB because the copy routine simply fails.
How about doing what rsync does? Save the file as a dot-file version of itself, for instance, if you are downloading: blahblahblah.iso save the file as .blahblahblah.iso in the location the user chooses. If dotfiles aren't obscure enough you can name the file something weirder like: z89582.blahblahblah.iso Then when it's done, rename to the correct name and *kaching!* you're all set! In addition to solving the above problems, this allows for a mechanism of easily resuming failed downloads. There could also be a pref for "Delete partial downloads" that would default to on so that inexperienced users won't have incomplete files littering their disk. Just a thought... -J
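The rename approach suggested above can be sketched roughly as follows. This is only an illustration, not Mozilla code: the function name `save_via_partial` and the `prefix` parameter are made up for the example, and the network side is replaced by an iterable of byte chunks.

```python
import os


def save_via_partial(chunks, dest_path, prefix="."):
    """Write chunks to a hidden partial file next to dest_path, then rename it.

    `chunks` is any iterable of bytes objects (a stand-in for network reads).
    The partial file name (prefix + final name) is a hypothetical convention.
    """
    dest_dir, name = os.path.split(os.path.abspath(dest_path))
    partial_path = os.path.join(dest_dir, prefix + name)

    with open(partial_path, "wb") as out:
        for chunk in chunks:
            out.write(chunk)

    # Both paths are in the same directory, hence on the same file system,
    # so this is a rename rather than a copy.
    os.replace(partial_path, dest_path)
    return dest_path
```

Because the partial file lives in the destination directory, no second copy is needed in the cache or in /tmp, and a leftover partial file could in principle be the starting point for resuming.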
Blocks: 129923
If the limit is on the users cache, the error message still goes for /tmp/xxxx.yy and the file is also copied (even being incomplete) to the user chosen destination (Re: Bug 145661.) leaving the user with a 'corrupted' file.
*** Bug 172301 has been marked as a duplicate of this bug. ***
QA Contact: sairuh → petersen
*** Bug 185738 has been marked as a duplicate of this bug. ***
*** Bug 186019 has been marked as a duplicate of this bug. ***
I agree with John Flynn (above) save the download file as ~moz-inc~<filename> or something like that. This would allow the file to be downloaded to *one* location not two. And this would allow resume/restart of an incomplete download.
*** Bug 198672 has been marked as a duplicate of this bug. ***
*** Bug 195179 has been marked as a duplicate of this bug. ***
*** Bug 132280 has been marked as a duplicate of this bug. ***
darin, is there a way to tell the cache not to store the file if we're downloading it to disk anyway?
+topembed, nsbeta1 Several versions of this bug have been running around for a while; Niels in #4 really summarizes the problem well. We really need to strictly define the download behavior as not using extra disk or memory cache resources. There is no point in trying to reduce our footprint if we grow every time we download something. Cache is disk space that prevents recurring use of the network. I don't think that downloads, even partial downloads, fit this definition. Instead, a download directory should be defined (or an OS default should be accepted).
Keywords: nsbeta1, topembed
topembed-
Keywords: topembed → topembed-
Blocks: 195179
based on recent comments, this might only happen w/ http and not ftp?
I've tried to download ftp.mozilla.org/pub/mozilla/nightly/latest/mozilla-source.tar.gz (see the duplicate bug 198672): 1) http://ftp.mozilla.org/pub/mozilla/nightly/latest/mozilla-source.tar.gz creates a temporary copy in the system's temp dir *and* in the Mozilla cache. 2) ftp://ftp.mozilla.org/pub/mozilla/nightly/latest/mozilla-source.tar.gz only creates a temporary copy in the system's temp dir.
If it just created a temporary file in the same directory where the file is being stored, then only a simple rename would be needed; copying or moving the final file would not be needed, which would speed up finishing a download too. I have my hard drive split up into C and D. D is extremely large while C is for system files only. As a result, downloading anything larger than about 300MB (an ISO, for instance, or large projects that I work on) results in the system temp directory being used, which is on C. Anytime I want to do a large download, I end up modifying my temporary directories to be on my D drive, although there is no need for this except to make mozilla put the temporary files in a place that has room. If the file I was downloading was saved to a temp file in the same directory that I'm downloading to, then upon completion, it would finish, and I'd know it would download without failing. Of course the ability to say if incomplete download files should stick around would be useful. Maybe even add a checkbox to the download window so it can be off by default and large downloads can have it enabled as it downloads.
Flags: blocking1.4?
Flags: blocking1.4? → blocking1.4-
adt: nsbeta1-
Keywords: nsbeta1 → nsbeta1-
Blocks: 140818
*** Bug 217834 has been marked as a duplicate of this bug. ***
Darin, is it possible to tell an in-process HTTP channel (one you are getting data from) to stop writing to cache? The basic issue here is that the cache access is set up in the guts of necko and the download code has no idea that it's even happening.... I think we do want to save the file to cache if it's being opened in a helper app, btw; just not if it's being saved to disk.
*** Bug 216856 has been marked as a duplicate of this bug. ***
Actually it is quite easy to fix this bug and the related bugs. Just stop using the predownload and save the file directly to the target. But do not set the filetype. If it is a .zip-file just call it .tmp until it is finished. On macosx just give it a custom icon that looks unfinished.
>Just stop using the predownload and save the file directly to the target. that helps this bug how exactly?
I just assumed that since it has worked this way before, at least in the mac version, it would be easy to revert to that code and add support for changing the filetype. That would not be too difficult, right? Correct me if I'm wrong. I really think this would solve a few other bugs, like those stated in comment 4. I thought it would be a great compromise between what everybody wants.
true, but this bug is about downloads stored in the _cache_, not about downloads stored in the temp folder
> Correct me if im wrong. You are wrong. On two counts: 1) It would be very hard 2) It has nothing to do with this bug.
>You are wrong.....
This is a reason to make a metabug, as someone before asked. It certainly would help me (my bug is set duplicate). My question: Why does mozilla save a large file in two places: the saved file AND the tempdir? The solution would be obvious:
>Just stop using the predownload and save the file directly to the target.
So what is wrong about the solution? It isn't only the solution to my problem, but also to some other problems stated in the 4th post. So maybe you are wrong? ;-)
> file on two places: the saved file AND the tempdir? that is NOT THE TOPIC OF THIS BUG, as bz and I have been trying to explain
more SPAM: You mean bug 69938
Right. Comment 27, comment 32, etc are about bug 69938. Which is NOT THE SAME as this bug.
comment 8 from John Flynn is the way mozilla should go imho. comment 4 from Niels Aufbau - downloads can be stored in memory, temporarily, until reached some limit; then the download should be suspended, or setting could be done about that (tmp? cache? current directory?) download preferences even have just one setting for now ;) just MHO.
Would people stop making irrelevant comments, please? Comment 8 has nothing to do with this bug, really. Nor does any comment, except comment 0 (the problem description) and comment 25 (the solution description). All the other comments are about bug 69938, not this bug. I would like to commend you all on making sure that all the people who would consider fixing this bug are now carefully blocking out all bugmail from it.
OS: Windows NT → All
Hardware: PC → All
I have verified this behavior on both Redhat 9.0 with Mozilla 1.2.1 and Windows 2003 with Mozilla 1.6b. I simply cannot complete a specific large download because of this design flaw. All I have to say is, "I would have expected this sort of design with IE, not Mozilla."
*** Bug 214738 has been marked as a duplicate of this bug. ***
I have a few ideas: Perhaps we could have a "download cache" folder which we could specify in the settings. This would get around people's quota problem, keep the incomplete files out of the way until the download is finished, and (once cross session resuming is implemented: http://bugzilla.mozilla.org/show_bug.cgi?id=230870 ) prevent your downloads from being deleted when the browser crashes. It's worth noting that this is how most download managers and P2P programs handle this situation. Or, as someone suggested, save the temporary file to the destination, only with a modified name. I'm not sure how you identify files of different types in Linux, but under Windows, the temporary download could have a mozilla extension appended that represents an incomplete download, as well as an icon representing that: (filename).(extension).(Mozilla temporary download extension). For example, setup.exe.moztemp, when complete, could simply be renamed correctly; in this case, to setup.exe
I'd vote for (configurable) storing the download to memory cache until user chooses the destination directory. configurable temp directory (or just using cache) for downloads is OK, but user should be able to turn it off.
Simple question: does the download need to be in the cache at all? No. FTP doesn't use the cache when downloading, but HTTP does. Ergo, the HTTP downloader must be fixed.
it's nice that that's what you vote for, but does that answer comment 25? this bug needs an answer to: "is it possible to tell an in-process HTTP channel (one you are getting data from) to stop writing to cache?" to be fixed.
>On Windows NT, the cache directory gets stored inside the user's roaming
>profile. The problem with this is that some sites (such as here at Purdue) put
>a very small quota on the roaming profile. (I think my roaming profile quota
>is 2 megs.) I know that you can specify a quota for Netscape's cache, but if
>your max cache size is set at 1 meg (what mine is set at) and you download an 8
>meg file (like the latest Mozilla nightly), you automatically exceed your
>quota. Not only that, but if you want to download something fairly large like,
>say, the latest Mandrake .ISO image file (on any platform), you might not have
>enough space on the file system containing the cache directory to download it
>there first and then move it to its final location.
these problems can be solved by changing the location of mozilla's cache. there is a preference that you can set to alter the location. it can be modified in the Advanced->Cache preferences.
>Not to mention that on Unixes, a 'mv' across file systems involves a copy,
>which is very inefficient (especially for large files).
this is true of all operating systems. the cache is used whenever we download content over HTTP. that is just the default behavior. it makes perfect sense for small files. when files are large, however, you want to stop using the cache. this is what we do today. if the file exceeds a limit set by the cache itself, then the cache refuses to accept any more data for that entry. the channel writing data to the cache will silently ignore this error condition, and continue streaming data to its listener. this is not a perfect solution though...
>this bug needs an answer to:
>"is it possible to tell an in-process HTTP channel (one you are getting
>data from) to stop writing to cache?"
that's not possible today. i think the original bug report has been mostly resolved. however, there remains a gotcha. nsDownloader, for example, will fail if the cache entry being created gets too large. i'm not sure if that affects normal downloads (i think maybe it doesn't).
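The behavior described in the comment above (the cache entry refuses data past its limit, the channel ignores that error and keeps feeding its listener) can be modeled with a toy sketch. None of these names come from necko; `CappedCacheEntry` and `stream_to_listener` are hypothetical, and keeping the already-written bytes after a refusal is an assumption made for the example.

```python
class CappedCacheEntry:
    """Toy cache entry that refuses writes once a size limit would be exceeded."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.data = bytearray()
        self.full = False

    def write(self, chunk):
        if self.full or len(self.data) + len(chunk) > self.max_bytes:
            self.full = True  # from now on, every write is refused
            raise OSError("cache entry too large")
        self.data.extend(chunk)


def stream_to_listener(chunks, cache_entry, listener):
    """Tee each chunk to the cache and the listener; cache errors are ignored."""
    for chunk in chunks:
        try:
            cache_entry.write(chunk)
        except OSError:
            pass  # silently ignore the cache refusing data
        listener(chunk)  # the listener always receives the full stream
```

In this model the download itself completes even though the cache gave up partway, which matches the "not a perfect solution" remark: space is still consumed up to the limit before the cache stops accepting data.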
In bug 229984 I detected the following: Downloading a very large file (about 700MB) saves the file BOTH in the cache directory, and in the specified location, even if 'save link target as' is used. This is nice for nobody. To me, the solution would be as follows: Start downloading it into the cache until the user has specified the download location. Currently Mozilla then starts to write to both locations. At least for items growing larger than the cache limit (or near to it), pro-actively remove the item from the cache (because it is being saved elsewhere). The key problem now is to tell the HTTP channel to stop writing to the cache as soon as the file is being saved elsewhere and the size reaches the cache limits. The effect during the download was that near the end I had an almost 700MB knoppix image stored twice somewhere on my simple laptop. Also Windows went crazy writing the data to the disk. Actual download speed was about 1.4MB/s; it could have been faster if not for the double writes...
>Downloading a very large file (about 700MB) saves the file BOTH in the cache
>directory, and in the specified location: Even if 'save link target as' is used.
yes, of course.
>Start downloading it, into the cache, until the user has specified the download
>location.
ew, that requires that the cache is enabled... and that the item is not removed from the cache before the user chooses a target... sounds like a fragile solution
>The key problem now is to tell the HTTP channel to stop writing to the cache, as
>soon as the file is being saved elsewhere and the size reaches the cache limits.
I'd rather tell the channel to stop writing to the cache once it's determined that mozilla can't handle the content. that said, what about comment 44, so the claim that this is mostly resolved is not true?
The idea with Mozilla content looks smart to me. The question is: does this mean html and images only? This should not include any other file types, even handled with Mozilla plugins. If user didn't open the file to be shown by Mozilla and asks to save it (_not_to_open_) the file shouldn't be cached by Mozilla.
What is this yadayada anyway? Can't you guys just make the downloads go directly to the directory they're supposed to be downloaded to? My downloads have been canceled many times because there is not enough space on my C: drive, even though the drive they are supposed to be downloaded to has enough. As long as this bug exists I use wget :(
1.7 beta (just as alpha too) saves partial files in the asked directory with "part" extension if the link is saved as : Save Link Target As... This is correct behaviour for me...
Comment 49 has nothing to do with this; that's bug 69938: http://bugzilla.mozilla.org/show_bug.cgi?id=69938 Is someone working on this bug (it still has status NEW)? Does FireFox use the same code?
+cc caillon - related to other cache bugs we've discussed.
This works for me. The file is saved in a temp name and then renamed as far as I can see. It can be closed for all I care :-)))
no, it can't, because this bug is about the download being also stored _in_the_cache_, which was not addressed with the mentioned change.
Sorry my cache was set to 1MB and the kernel is a tad larger than that. Hence it was not stored there
I was going to add something along the lines of this bug as a request for enhancement. It would be nice to be able to specify a different download cache location from the "standard" browser cache. I use a ramdisk and like to point my Mozilla cache to it, but if I download a file larger than the ramdisk size, even though I gave it a non-ramdisk location (e.g. c:\downloads), the ramdisk fills up with the temporary download of the file and then the download fails. Perhaps a "browser cache" and "download cache" option would help solve my problem or perhaps the download just needs to be handled in a new way.
Assignee: mscott → nobody
Still seems to be a problem in SeaMonkey 1.1.16 Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.21) Gecko/20090511 SeaMonkey/1.1.16 Eg PDF files served by IEEE frequently exceed 1M (my initial cache size) http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4336121&isnumber=4336114 With a cache size of 1MB this fails with a number of unhelpful error messages. No indication is given that the problem lies in the cache size.
ps: Apart from using the Edit|Preferences|Advanced|cache SeaMonkey menu, wget can download huge files without caching to work around this problem
QA Contact: chrispetersen → file-handling
When downloading files, the file is first stored in %TEMP% and then moved to the destination path, as soon as that is known. It is also stored in the cache, as long as the file is smaller than half the cache (so, for the default cache of 50MB, files < 25MB are also stored in the cache). This can be seen as OK, since for another download (or use in a page, e.g. with images) the data is still available in the cache, or as not OK when the file is only saved to disk (which usually happens when downloading large software files, such as a Linux distro). So, it is not stored in cache first, but *also* in cache (as long as the file is small enough).
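The size rule described in the comment above comes down to a one-line check. The sketch below only restates that comment; the function name is invented, and the real cache code may apply further conditions not mentioned in this bug.

```python
MB = 1024 * 1024


def also_stored_in_cache(file_size, cache_capacity=50 * MB):
    """Return True if a download of file_size bytes would also land in the cache.

    Rule as stated in the comment: the file must be smaller than half the cache.
    The 50 MB default mirrors the default cache size mentioned there.
    """
    return file_size * 2 < cache_capacity
```

For the original report's setup (1 MB cache, 8 MB nightly) this rule says the download would not be cached, while a 700 MB image never fits under the 25 MB threshold of a default 50 MB cache.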
Blocks: http_cache
We should simply skip storing downloads in the cache. Regarding which flag to set for this: see bug 588507. For downloads we might want INHIBIT_CACHING, since that allows getting the content from cache, which will happen for instance if a user wants to download an HTML file that they've already browsed.
Assignee: nobody → byronm
(In reply to comment #25)
> is it possible to tell an in-process HTTP channel (one you are getting
> data from) to stop writing to cache? The basic issue here is that the cache
> access is set up in the guts of necko and the download code has no idea that
> it's even happening....
I have implemented this functionality as part of my patch for bug 588507. I should be able to leverage this to prevent downloads from being stored to the cache. I just need to know how to identify if the channel has been opened for a download... it's been suggested that Content-Disposition might be what I should check. Is that the best indicator, or should I check something else?
> I just need to know how to identify if the channel has been opened
> for a download... its been suggested that Content-Disposition might
> be what I should check. Is that the best indicator?
bz/biesi: either of you know? I believe that if Content-Dispo = attachment, that's a pretty sure bet that we're doing a download. But this won't capture things like a user right-clicking on an image, etc., and asking to download it. If there's only a way to detect a subset of downloads and avoid storing them in the cache, I'm fine with that--the change we're making to avoid the cache if content-length is larger than 5 MB will avoid the worst problems with downloads anyway.
In general, you don't know whether the thing is being downloaded until after you fire OnStartRequest. If content-disposition:attachment and we're in a toplevel browsing context, that will be a download. Otherwise it won't be. Also, there have been requests to implement "view in browser" for content-disposition:attachment. Not caching content-disposition:attachment seems like an OK heuristic to me, though.
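The heuristic discussed above (Content-Disposition: attachment in a toplevel browsing context means a download) could look something like this. It is a sketch of the idea only: `is_probable_download` is a made-up name, `headers` is assumed to be a plain dict of response headers, and the parsing is deliberately simplistic.

```python
def is_probable_download(headers, is_toplevel):
    """Guess whether a response will be handled as a download.

    headers: dict mapping response header names to values (assumed shape).
    is_toplevel: True if the load is in a toplevel browsing context.
    """
    disposition = ""
    for name, value in headers.items():
        if name.lower() == "content-disposition":
            # Keep only the disposition type, dropping parameters like filename=.
            disposition = value.split(";", 1)[0].strip().lower()
            break
    return bool(is_toplevel) and disposition == "attachment"
```

As noted in the thread, this only catches a subset of downloads: a user choosing to save an image, or a server that sends no Content-Disposition header, would not be detected this way.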
OK, then let's just not cache "attachment" and then consider this bug done.
Attached patch no_downloads_in_cache (obsolete) — Splinter Review
This patch prevents downloads from being stored in the cache, to the best that we can detect them. It turned out that many, many sites' download links do not set Content-Disposition in their response headers. After chatting with bz on irc, we decided that having the external handler inform the channel that it is open to service a download was the better bet. I have tested that this does indeed ban downloads from the cache. As soon as I get green back from try, I will flag it for review. Note: I do not have much experience modifying interfaces yet. I tried to follow convention/the docs carefully, but if anything is wrong, please let me know.
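The approach described above, where the external handler tells the channel that it is servicing a download, can be illustrated with a toy model. This is not the patch: `SketchChannel` and `mark_as_download` are hypothetical names, and dropping the partially written cache entry at that point is an assumption for the example.

```python
class SketchChannel:
    """Toy HTTP channel that tees data to a cache entry until told otherwise."""

    def __init__(self):
        self.cache_entry = bytearray()  # stand-in for a real cache entry

    def mark_as_download(self):
        # Hypothetical hook called by the download handler once it knows the
        # channel is servicing a download: stop caching and drop the entry.
        self.cache_entry = None

    def on_data(self, chunk, listener):
        # Guard against the entry being gone before touching it.
        if self.cache_entry is not None:
            self.cache_entry.extend(chunk)
        listener(chunk)
```

The point of the design is that the channel does not have to guess from response headers; the code that actually decides to save the data is the one that turns caching off.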
Attached patch no_downloads_in_cache (obsolete) — Splinter Review
Updated version prevents potential segfault when dereferencing mCacheEntry.
Attachment #472979 - Attachment is obsolete: true
You need to rev the interface id, right?
Attached patch no_downloads_in_cache (obsolete) — Splinter Review
Updates UUID of modified nsIHttpChannelInternal interface.
Attachment #472987 - Attachment is obsolete: true
Comment on attachment 473127 [details] [diff] [review] no_downloads_in_cache This is green on try, so I think it is ready for review.
Attachment #473127 - Flags: review?(jduell.mcbugs)
Modified comment to fit on one line.
Attachment #473127 - Attachment is obsolete: true
Attachment #473416 - Flags: review?(jduell.mcbugs)
Attachment #473127 - Flags: review?(jduell.mcbugs)
Attachment #473416 - Flags: review?(jduell.mcbugs) → review+
Not as huge a win as our other caching improvements, but simple patch and ready to land. FF 4?
blocking2.0: --- → ?
Attachment #473416 - Flags: approval2.0+
The patch has been pushed: http://hg.mozilla.org/mozilla-central/rev/6ccd956d1df9 This is fixing a 10 year old bug! :)
Status: NEW → RESOLVED
Closed: 14 years ago
Flags: in-testsuite?
Resolution: --- → FIXED
Target Milestone: --- → mozilla2.0b6
blocking2.0: ? → final+
Depends on: 935595
Product: Core → Core Graveyard
OP: Does this "WFM" for you?
Flags: needinfo?(david)