Open Bug 40253 Opened 25 years ago Updated 2 years ago

Execute the download of all links (like wget --recursive)

Categories

(Firefox :: File Handling, enhancement)

People

(Reporter: netdragon, Unassigned)

References

Details

(Keywords: helpwanted)

Attachments

(1 obsolete file)

I think the browser, on request, should be able to download all the pages in a certain site while the user goes and eats something. The question is: how will the browser know where one site ends and others begin? If the browser weren't given limits, it could download the WHOLE WEB! It might also download the same page more than once. Obviously, the user would have to limit (A) the number of pages downloaded, (B) how many levels of links to follow, (C) the domains that are allowed, or some combination of the three. You would also be able to stop the download at any time, and it would recover from broken pages.

Predownloaded pages would be stored in a special cache directory and could be copied to another part of the disk to keep. All images, etc. would be downloaded with the pages, so you could copy a whole site to the hard disk, with certain restrictions of course (e.g., you couldn't copy CGIs).

Another idea: someone could post a sitemap file on the site. The browser could open this file and download the pages in the order the sitemap lists them. The user could even view the sitemap and select which parts to download. The sitemap file would contain the data to construct a tree, and each node could have a name, description, size info, and URL.
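Purely to illustrate limits (A) through (C), here is a minimal JavaScript sketch; crawl, maxPages, maxDepth and allowedHosts are invented names for this example, not anything that exists in the tree, and a real implementation would write into the cache rather than keep pages in memory:

// Sketch only: breadth-first crawl from a start URL, honoring a page-count
// limit, a link-depth limit, and a list of allowed hosts.
async function crawl(startUrl, { maxPages = 50, maxDepth = 2, allowedHosts = [] } = {}) {
  const seen = new Set();                      // never fetch the same page twice
  const queue = [{ url: startUrl, depth: 0 }];
  const saved = [];

  while (queue.length > 0 && saved.length < maxPages) {        // limit (A)
    const { url, depth } = queue.shift();
    if (seen.has(url) || depth > maxDepth) continue;           // limit (B)
    if (allowedHosts.length > 0 &&
        !allowedHosts.includes(new URL(url).host)) continue;   // limit (C)
    seen.add(url);

    const html = await (await fetch(url)).text();
    saved.push({ url, html });

    // Queue every <a href> on the page one level deeper.
    const doc = new DOMParser().parseFromString(html, "text/html");
    for (const a of doc.querySelectorAll("a[href]")) {
      queue.push({ url: new URL(a.getAttribute("href"), url).href, depth: depth + 1 });
    }
  }
  return saved;
}

The seen set also takes care of the "download a site more than once" concern, and stopping is just a matter of abandoning the queue.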
*** Bug 40258 has been marked as a duplicate of this bug. ***
*** Bug 40254 has been marked as a duplicate of this bug. ***
*** Bug 40255 has been marked as a duplicate of this bug. ***
Sorry about the duplicates; each time I thought the report hadn't been sent.
Yes, I agree; this would be a very useful function to have. It would actually make offline browsing a useful tool.
This might become even more useful, though a UI puzzle, if one could select some specific set of links to be followed. For instance, if I'm reading Freshmeat (http://www.freshmeat.net/), I might want to get the appindex record, homepage, and change-log for a given item, but not want to spider the whole fifteen-to-thirty-item page. This might also be useful in situations where one has to download a series of files by clicking their links individually, assuming that one could specify the save location once and have it apply to all of them. I'm not sure how you'd implement this UI-wise, though. Maybe a separate "link-tagging" mode, though that would be sure to confuse a user who got into it by accident.
I had an idea - basically, the browser would build a flowchart, or tree, showing all the pages linked to by this one up to a certain point and display it. Then the user would mark the ones he/she wants downloaded. It would make this flowchart by downloading pages without the graphics.
Sorry for the spam. New QA Contact for Browser General. Thanks for your help, Joseph (good luck with the new job), and welcome aboard, Doron Rosenberg.
QA Contact: jelwell → doronr
Assignee: asa → gagan
Status: UNCONFIRMED → NEW
Component: Browser-General → Networking
Ever confirmed: true
QA Contact: doronr → tever
RFE, confirming. Sending to Networking; I guess they'd be the ones to implement this, if anyone does.
->helpwanted
Assignee: gagan → nobody
Keywords: helpwanted
I would recommend that whoever implements this take a look at how the unix wget program works. It has some really nice options for how links are followed, what is downloaded, etc.
There are plenty of offline download programs to look at for ideas. My question is what benefit there is to doing this within the browser. Most separate offline downloading programs already provide all the functionality you describe, and they work well alongside a browser. There seems to be a difference here between people who want a full offline downloading tool (many already exist) and people who just want a cache-ahead feature in the browser. The latter would make more sense: it would be part of the normal cache, so the pre-fetched pages would eventually expire like everything else in the cache.
I was not aware of the offline downloading programs. I think cache-ahead would be a good feature, and it is covered in other bugs. Maybe making the browser capable of offline browsing would be more trouble than it's worth if such tools already exist.

I still think there should be a way to download all the files linked from a specific page. For instance, anyone who downloaded DJGPP back in the DOS days will remember that the page had a million links to FTP files, and you had to click on each one individually. It would be nice if a window came up listing all the files linked to on that page (i.e., binary files) so you could download them all at once without clicking on each link. I believe that has nothing to do with offline browsing; this is what Eric S. Smith was talking about. It would be especially useful if you were at the index of some directory and wanted to download all the files in that directory.
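To make the "list every file linked from this page" idea concrete, here is a rough sketch; the extension list and the collectFileLinks name are invented for illustration:

// Sketch only: collect every link on the current page that points at what
// looks like a downloadable file, so they could be fetched in one batch
// instead of being clicked one by one.
const FILE_EXTENSIONS = /\.(zip|exe|tar|gz|bz2|pdf|iso)$/i;    // illustrative list

function collectFileLinks(doc = document) {
  const urls = new Set();                      // a Set drops duplicate targets
  for (const a of doc.querySelectorAll("a[href]")) {
    const href = new URL(a.getAttribute("href"), doc.baseURI).href;
    if (FILE_EXTENSIONS.test(new URL(href).pathname)) {
      urls.add(href);
    }
  }
  return [...urls];                            // the list the proposed window would show
}

Given such a list, the "download them all at once" window is just UI plus a loop over the URLs.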
In the old days (nc4, nav3gold) I abused the editor to do this for me. My technique:

1. Save the page.
2. Edit the page: s/a href/img src/, and add a rel tag so that links start in the right place.
3. Save the page.
4. Load the hacked page in Composer/editor.
5. Save the page.
6. Watch as the payload is retrieved.

I think we might be able to implement this with just a chrome JavaScript, in which case this bug is EASILY fixed. Brian: would you like to take a stab at it? Poor nobody is even more doomed than I am.

[timeless] techbot1 bug-total &bug_status=new&bug_status=assigned&assigned_to=timeless@bemail.org
<techbot1> 118 bugs found.
[timeless] techbot1 bug-total &bug_status=new&bug_status=assigned&assigned_to=nobody@mozilla.org
<techbot1> 178 bugs found.
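Purely as a hedged sketch of that chrome-JavaScript idea, here is a script-level equivalent of the editor hack, assuming it runs with access to the document whose links should be pulled into the cache; buildPrefetchPage is an invented name, not existing chrome code:

// Sketch only: turn every <a href> on the page into an <img src> in a new
// HTML string, so loading the generated page makes the browser fetch every
// link target, just like loading the hacked page in Composer did.
function buildPrefetchPage(doc = document) {
  const imgs = [...doc.querySelectorAll("a[href]")]
    .map(a => new URL(a.getAttribute("href"), doc.baseURI).href)
    .map(url => `<img src="${url}" alt="">`)
    .join("\n");
  // The <base href> plays the role of the "rel tag so that links start in
  // the right place" from the recipe above.
  return `<html><head><base href="${doc.baseURI}"></head><body>\n${imgs}\n</body></html>`;
}

The returned string could be written to a temporary file or a data: URL and loaded in a background tab to warm the cache.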
Unfortunately, I ran out of hard drive space on my laptop and had to delete the Mozilla source. In a couple of weeks I will be building a computer with a 120 GB RAID drive, so that won't happen again. Until then, I can't do anything. I am also inexperienced in editing the Mozilla code, so it might take me a while to figure it out. I was going to start learning how if I hadn't run out of HD space. :(
Ok, I'm back in business. Ummm. Sure you can assign it to me if you want. I am starting to get doomed though :-( or possibly a :-) depending on how you look at it.
When assigning to me, realize that I have no plans of implementing this in the near future and that I can only find others to implement it for me.
mass move, v2. qa to me.
QA Contact: tever → benc
Summary: Execute the download of all links → Execute the download of all links (like wget --recursive)
Whiteboard: [Aufbau-P4]
Whiteboard: [Aufbau-P4]
Component: Networking → File Handling
Isn't this the functionality that Leech on mozdev provides? It has its own leeching tech as well as the option to use wget. http://leech.mozdev.org
-> defaults
Assignee: nobody → law
QA Contact: benc → petersen
This should be an extension, imo, but if someone does this I'm willing to review the patch.
Assignee: law → nobody
Priority: P3 → --
Downloading all links, as ReGet does, is a very useful function. What about adding a Download button to the Links tab in Page Info (Ctrl+I)? And adding a "Save all links in page..." item to the right-click (context) menu that opens that same dialog would be great!
*** Bug 221366 has been marked as a duplicate of this bug. ***
*** Bug 226219 has been marked as a duplicate of this bug. ***
QA Contact: chrispetersen → file-handling
There are a number of add-ons that provide this functionality. I propose this be closed as invalid.
Attached file Hacked page (obsolete) (deleted) —
Attachment #684830 - Flags: ui-review-
Attachment #684830 - Flags: sec-approval?
Attachment #684830 - Flags: review-
Attachment #684830 - Flags: checkin-
Attachment #684830 - Flags: approval-mozilla-release?
Attachment #684830 - Flags: approval-mozilla-esr10?
Attachment #684830 - Flags: approval-mozilla-beta?
Attachment #684830 - Flags: approval-mozilla-aurora?
Attachment #684830 - Attachment is obsolete: true
Attachment #684830 - Attachment is patch: false
The content of attachment 684830 [details] has been deleted for the following reason: A copy of the facebook 'TLNEBF' page.
Product: Core → Firefox
Version: Trunk → unspecified
Severity: normal → S3