Open Bug 331979 Opened 18 years ago Updated 2 years ago

Support metalink file download format

Categories

(Toolkit :: Downloads API, enhancement)

People

(Reporter: ian, Unassigned)

Details

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.0; en-UK; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1
Build Identifier: 

We should consider supporting the Metalink download format. It's a file format that lists HTTP/FTP mirrors for a file and optionally P2P sources (e.g. BitTorrent) as well. It makes segmented downloading from mirrors easier for the average person, and it lets the client automatically verify MD5/SHA-1 checksums of finished files (see http://www.metalinker.org/ for more info).

It's already been implemented in GetRight and is coming in the FlashGot Firefox extension.

Reproducible: Always

Steps to Reproduce:
Depends on: 230870
.metalink files are simple XML text files that list the multiple locations (mirrors) for files, along with checksums & other useful info. There will usually be more than one FTP or HTTP source, so a download manager can fetch different segments of a file from different sources at the same time. If the download manager doesn't support some or all of the P2P networks listed, it just uses the mirrors.

<metalink version="3.0" xmlns="http://www.metalinker.org/">
<files>
  <file name="example.ext">
  <verification>
    <hash type="md5">example-md5-hash</hash>
    <hash type="sha1">example-sha1-hash</hash>
  </verification>
  <resources>
    <url type="ftp">ftp://ftp.example.com/example.ext</url>
    <url type="http">http://www.example2.com/example.ext</url>
    <url type="bittorrent">http://www.ex.com/file.torrent</url>
    <url type="magnet"/>
    <url type="ed2k"/>
  </resources>
  </file>
</files>
</metalink>
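
To give an idea of how little a client needs to do to read such a file, here is a minimal Python sketch. It assumes the version 3.0 layout from the example; the sample embedded in the code uses straight quotes and a proper closing tag so that it parses. The names `parse_metalink3` and `plain_mirrors` and the returned dictionary layout are made up for illustration and do not come from any Metalink library.

```python
import xml.etree.ElementTree as ET

# Namespace used by the Metalink 3.0 example.
NS = {"ml": "http://www.metalinker.org/"}

SAMPLE = """<metalink version="3.0" xmlns="http://www.metalinker.org/">
<files>
  <file name="example.ext">
  <verification>
    <hash type="md5">example-md5-hash</hash>
    <hash type="sha1">example-sha1-hash</hash>
  </verification>
  <resources>
    <url type="ftp">ftp://ftp.example.com/example.ext</url>
    <url type="http">http://www.example2.com/example.ext</url>
    <url type="bittorrent">http://www.ex.com/file.torrent</url>
    <url type="magnet"/>
    <url type="ed2k"/>
  </resources>
  </file>
</files>
</metalink>"""


def parse_metalink3(xml_text):
    """Return {file name: {"hashes": {type: value}, "urls": [(type, url), ...]}}."""
    root = ET.fromstring(xml_text)
    result = {}
    for file_el in root.findall("ml:files/ml:file", NS):
        hashes = {
            h.get("type"): (h.text or "").strip()
            for h in file_el.findall("ml:verification/ml:hash", NS)
        }
        urls = [
            (u.get("type"), u.text.strip())
            for u in file_el.findall("ml:resources/ml:url", NS)
            if u.text and u.text.strip()  # skip empty placeholders like <url type="magnet"/>
        ]
        result[file_el.get("name")] = {"hashes": hashes, "urls": urls}
    return result


def plain_mirrors(entry):
    """Keep only the sources a client without P2P support can use."""
    return [url for kind, url in entry["urls"] if kind in ("http", "ftp")]


if __name__ == "__main__":
    info = parse_metalink3(SAMPLE)["example.ext"]
    print(info["hashes"])
    print(plain_mirrors(info))
```

A client that knows nothing about BitTorrent, magnet or ed2k simply ignores those entries and falls back to the HTTP/FTP list, as described above.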

This bug probably depends on Bug 40106 (Accelerated Download for files with mirrors (swarming))
Depends on: 40106
aria2 (Unix) supports Metalink & BitTorrent from the command line - http://aria2.sourceforge.net/

Speed Download (Mac) supports Metalink in a beta version - http://www.yazsoft.com/

OpenOffice.org uses Metalinks for downloads at http://distribution.openoffice.org/p2p/magnet.html

A few more clients & sites are preparing to use Metalink.

Metalink @ Packages Resources (http://metalink.packages.ro/) provides automatically generated Metalinks for various projects along w/ source in Perl.

Speed Download (Mac) has been released with metalink support -
http://www.yazsoft.com/

wxDownload Fast (Cross platform, open source) too - http://dfast.sourceforge.net/

7 Linux/BSD distributions are now using it for ISO downloads.

Here's an easy explanation of metalink:

http://www.downloadsquad.com/2006/08/28/metalinks-integrated-bittorrent-http-and-ftp-downloads/

'Metalinks makes complex download pages obsolete by replacing long lists of download mirrors and BitTorrent trackers with a single .metalink file. As you might have already guessed, a .metalink file is a file that tells a download manager all the different ways it can download a file. The file itself takes the form of an open XML standard that can list an unlimited number of HTTP and FTP sources as well as BitTorrent trackers and ed2k and magnet links.'
 
Simba (http://simba.packages.ro/), RoPkg::Metalink (http://metalink.packages.ro/) and metalink tools (http://metalinks.sourceforge.net/) can all be used to generate metalinks. RoPkg::Metalink is for creating many metalinks for files synced to mirrors. metalink tools are command line apps for creating a few metalinks.

Around 10 Linux/BSD distributions use it for ISO downloads.
Orbit Downloader and Free Download Manager (beta) support Metalink. That makes 7 download programs (aria2, FlashGot, FDM, GetRight, Orbit, Speed Download, wxDownload Fast) that support it with about 4 more on the way.

I think DownThemAll! (a FF extension) would be the easiest way to see FF working with Metalink because it's easy to install and use, and cross platform. An extension could be nicer than installing a separate app. This could work nicely with Bouncer, if it made Metalinks.

About 10 Linux/BSD distributions use it now, along with OpenOffice.org and openSUSE.

With Metalink, (depending on how much of it you supported) you could have:

Automatic
- enhanced reliability from multiple links
- error correction, fix downloads, no need to re-download whole file, only the small percentage that had the error
- no more failed downloads, only canceled downloads
- use of local mirrors
- OS, language selection
- mirror/p2p integration
11 download programs now support Metalink, including Phex (P2P app), SmartFTP, KGet (part of the as yet unreleased KDE 4.0), and DownThemAll FF extension (also unreleased). There is a patch for Bouncer, but it hasn't been integrated upstream yet. There's also a GUI Metalink Editor, and cURL uses Perl scripts for its download pages.

Not all of these programs support segmented downloading, so Firefox wouldn't need to support that before using Metalink. Some clients just use Metalink to add multiple files to a download queue, or for getting checksum information so files can be automatically verified. Having multiple URLs to fall back on if some don't work can still be very useful.

Here are two recent articles on it:

"Metalink solves the first problem — how to find the most speedy way to download a file — by grouping different download protocols into one protocol. This enables Metalink clients to automatically switch between different mirror servers without explicit user instructions.

In the above example, not only multiple server information is described but also meta-data (e.g., the location of the server, document tags and license)...

[Metalink's] use of markup languages to describe information on the
Web and its attempt to shield users from heterogeneous download
protocols are signs of the emerging Semantic Web." [1]

"...A wider adoption of Semantic Web technology will depend on our
ability to bridge gaps between research and real-world applications.
Proof-of-concept examples (e.g., the one in the above) that build on
real-world applications (e.g., Metalink) can inspire people who are
not already in the Semantic Web circle to pick up the idea of the
Semantic Web, and go wild with it." [2]

[1]
http://www.geospatialsemanticweb.com/2007/02/25/metalink-unifies-internet-downloads

[2]
http://www.geospatialsemanticweb.com/2007/03/11/metalink-meets-rdf-and-sparql
Metalink support (ftp/http, file selection, whole file verification) has been added to DownThemAll! Firefox Add-on, available in nightly builds at http://bugs.code.downthemall.net/trac/wiki/NightlyBuilds

Screenshot: http://code.downthemall.net/maierman/metaselect4.png

KGet for KDE 4.0 has also added support to SVN at http://websvn.kde.org/trunk/KDE/kdenetwork/kget/

Free Download Manager 2.3 public beta also includes Metalink support.
DownThemAll! 1.0 beta (Firefox extension) has metalink support and is the easiest way to try it out in Firefox: http://www.downthemall.net/latest/downthemall-10-beta/

It's also on the mozilla wiki at http://wiki.mozilla.org/Metalink
Changing status to NEW.
Status: UNCONFIRMED → NEW
Ever confirmed: true
KGet 2 (part of KDE 4.0), Free Download Manager (GPL, Windows), TheWorld Browser (Windows web browser), Net Transport (Windows), Metalink Checker (GPL, python), & Retriever (Java, cross platform) now support Metalink. (over 30 apps total).

Metamirrors, Mirror Search, Origo and other sites are using it, including many (about 30) Linux/BSD distributions.
DownThemAll! 1.0 has been released and is a good way to test out Metalink.

http://www.downthemall.net/

http://www.metalinker.org/images/dta_ubuntu.png
The openSUSE download redirector (aka the MirrorBrain) and Ubuntu now provide Metalinks for downloads.

http://download.opensuse.org/distribution/11.0-Beta1/iso/cd/

http://cdimage.ubuntu.com/daily-live/current/
An initial Internet Draft describing the file format is available at http://tools.ietf.org/html/draft-bryan-metalink-00
Fedora's MirrorManager now creates Metalinks for Fedora ISOs. yum uses Metalink too.

https://hosted.fedoraproject.org/mirrormanager/
Assignee: file-handling → nobody
QA Contact: ian → file-handling
I do not think that implementing this in Firefox is all that useful currently. With bug 40106 I can see the use, but otherwise it seems like the only thing it would give us is the ability to autoselect a mirror for a download, and the webpage could do that just as well, perhaps better since it can use IP geolocation data.
(In reply to comment #15)
> I do not think that implementing this in Firefox is all that useful currently.
> With bug 40106 I can see the use, but otherwise it seems like the only thing it
> would give us is the ability to autoselect a mirror for a download, and the
> webpage could do that just as well, perhaps better since it can use IP
> geolocation data.

There are quite a few more things where metalinks come in handy:
 * You can do failover in case a server becomes unresponsive during transfer or when resuming (provided there are multiple locations given in the metalink), or simply doesn't support resuming (and a lot of servers do not support resuming correctly).
 * mozilla apps can also use geolocation to choose the best mirror now that there is geolocation built-in with 1.9.1. And they can do this even if the site in question fails to select the best mirror automatically.
 * One may furthermore verify downloads; that is not strong authentication, but it at least shows that the data was transferred without corruption. On top of that, metalink supports chunk checksums, which the Download Manager can use so that only corrupted chunks have to be re-downloaded instead of the whole file.
 * Furthermore it is an added value for website/mirror operators. They don't have to implement a full featured "download location selection manager" thingie doing their own geolocation and such. They may simply drop some xml textfile and be done.
 * You can effectively "bundle" downloads. You may use multiple <file>s within your metalink.
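
The chunk-checksum point can be sketched roughly like this. It is only an illustration, assuming fixed-size pieces hashed with SHA-1; the helper names `piece_hashes` and `corrupt_pieces` are made up for the example, and a real client would read the expected hashes from the metalink instead of computing them itself.

```python
import hashlib


def piece_hashes(data, piece_length):
    """SHA-1 hex digest of each fixed-size piece of data (the last piece may be shorter)."""
    return [
        hashlib.sha1(data[i:i + piece_length]).hexdigest()
        for i in range(0, len(data), piece_length)
    ]


def corrupt_pieces(data, piece_length, expected):
    """Return (offset, length) byte ranges whose hash does not match the expected list."""
    bad = []
    for index, digest in enumerate(piece_hashes(data, piece_length)):
        if index >= len(expected) or digest != expected[index]:
            start = index * piece_length
            bad.append((start, min(piece_length, len(data) - start)))
    return bad


if __name__ == "__main__":
    original = bytes(range(256)) * 4        # 1024 bytes standing in for a download
    expected = piece_hashes(original, 256)  # what the publisher would list
    damaged = bytearray(original)
    damaged[300] ^= 0xFF                    # flip one byte inside the second piece
    print(corrupt_pieces(bytes(damaged), 256, expected))  # [(256, 256)]
```

Only the reported byte ranges would need to be requested again (for instance with an HTTP Range request, where the server supports it), rather than the whole file.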
(In reply to comment #15)
> I do not think that implementing this in Firefox is all that useful currently.
> With bug 40106 I can see the use, but otherwise it seems like the only thing it
> would give us is the ability to autoselect a mirror for a download, and the
> webpage could do that just as well, perhaps better since it can use IP
> geolocation data.

Many metalink download clients support multi-source downloads, but some do not. Bug 40106 is not a hard dependency for metalink (my fault).

Metalinks are about making downloads easier, so they finish error-free if there is a way for them to. Let's look at a case where metalink is in use: large downloads. OpenOffice.org, openSUSE, Fedora, Ubuntu & other distribution ISOs.

Most browsers' download managers are not dependable enough (or even capable enough) to download these large files. As Nils mentioned in comment 16, you can fail over if one (or many) servers have problems, or repair just a chunk of the file if there is an error.

I think if this feature was in Firefox it could help many people by making downloads easier.
(In reply to comment #15)
> I do not think that implementing this in Firefox is all that useful currently.
> With bug 40106 I can see the use, but otherwise it seems like the only thing it
> would give us is the ability to autoselect a mirror for a download, and the
> webpage could do that just as well, perhaps better since it can use IP
> geolocation data.

I guess I wouldn't be a liar by saying that supporting gopher protocol is even less useful, and yet Firefox does that (and it even has a gopher-related preference!). I also guess that lack of support from browsers could be one of the reasons why Metalink is not gaining more popularity. If implementing this isn't hard, why not?..
(In reply to comment #18)
> I also guess that lack of support from browsers could be one of
> the reasons why Metalink is not gaining more popularity. If implementing this
> isn't hard, why not?..

Of the 40+ programs that support metalink, most of them are download managers/helpers, but only one is a browser (TheWorld). The majority of people don't install these extra programs. Browser support could bring metalink to the mainstream without users even needing to know about it.

I don't know that adding metalink support is simple. Shawn Wilsher, who works on the Firefox download manager, said that some things are not in place for metalink support, I believe failover to alternate download sources. He plans a Firefox extension with metalink support (the excellent DownThemAll! is also available). XML & checksums are possible in Firefox.

A metalink download from one server would still be useful if it only verified the checksum of a whole file. Projects with large downloads, like OpenOffice.org or ISOs, spend time asking people to run "md5sum filename". On Windows it's even more of a hassle: people have to find and install a separate checksum app, then use it to figure out whether a download has been corrupted.
(In reply to comment #15)
> I do not think that implementing this in Firefox is all that useful currently.
> With bug 40106 I can see the use, but otherwise it seems like the only thing it
> would give us is the ability to autoselect a mirror for a download, and the
> webpage could do that just as well, perhaps better since it can use IP
> geolocation data.

It may seem so, but that's not the same. Mirror selection on the server side can only work to a certain extent. It is limited to a guess and can't assess how useful the mirror will be in real life for a particular client. Mirrors can go offline at any time, be overloaded, or serve broken or infected content, and the network between the client and the mirror can have a host of similar problems too.

Several concrete failure scenarios that can only be handled on the client-side are documented here:
http://en.opensuse.org/Libzypp/Failover#Example_scenarios
These are only some cases - in my job of maintaining a mirror infrastructure I have seen more.

HTTP as a protocol inherently has no provision for recontacting the server to ask for another mirror or such a thing. Any failure inevitably forces user interaction and a tiresome process of manual attempts to download again, from a better source. Not even content verification is provided by this "classical", manual way of downloading.

Only a "knowledge transfer" to the client can enable it to reliably perform a download. Metalinks accomplish this knowledge transfer and give the client all the information that it needs to handle failures autonomously, without user interaction. 

May I encourage reading http://lizards.opensuse.org/2008/12/16/best-way-to-download-opensuse/ - that post clarifies the user impact.

Furthermore, let's not forget users who have worse Internet connectivity than many of us. You may be able to download an OpenOffice.org (or Firefox) package quite easily, but that's not the case for everyone. It can be nearly impossible for a user in e.g. an African country to install OpenOffice, a download of over 100 MB, with conventional means. Metalinks are a mature technology that solves this.

But don't get me wrong: mirror selection (and monitoring) on the server side is useful as well, and can work together with the client side!
Perhaps Firefox could add metalink support in stages, like most of the other download apps that support it already. It usually starts very simple & increases in complexity:

Stage 1 could include extracting a single FTP or HTTP URL from the metalink XML, then
download from that URL (no multipart/multisource download, just single source).

Stage 2, checksum the whole file to see if the file has been corrupted.

Stage 3, failover to alternate URLs if a server becomes unreachable.

Stage 4, use the chunk checksums in the metalink to tell if there are errors in a download, and only re-get the chunks with errors so the download can be repaired.

Without getting too insanely complicated, or supporting multipart/multisource downloads, by stage 2 you've helped many people, especially those on Windows who don't have native checksumming tools (md5sum, etc).
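
To show how small stages 1 to 3 are, here is a rough Python sketch. It is an illustration, not a proposal for the Firefox implementation: `download_with_failover` and its `fetch` parameter are hypothetical names, and the transfer itself is passed in as a function so the logic can be tried without a real server.

```python
import hashlib


def download_with_failover(urls, expected_sha1, fetch):
    """Try each URL in order; return (url, data) for the first copy whose SHA-1 matches.

    fetch is any callable that takes a URL and returns bytes, raising OSError on failure.
    """
    errors = []
    for url in urls:                    # Stage 1: one source at a time
        try:
            data = fetch(url)
        except OSError as exc:          # Stage 3: server unreachable, try the next one
            errors.append((url, str(exc)))
            continue
        if hashlib.sha1(data).hexdigest() == expected_sha1:  # Stage 2: whole-file check
            return url, data
        errors.append((url, "checksum mismatch"))
    raise RuntimeError("all sources failed: %r" % errors)


if __name__ == "__main__":
    good = b"example payload"
    mirrors = {
        "http://mirror-b.example/file": b"corrupted payload",
        "http://mirror-c.example/file": good,
    }

    def fake_fetch(url):
        # Stand-in for a real transfer; mirror-a is simulated as being down.
        if url not in mirrors:
            raise OSError("connection refused")
        return mirrors[url]

    order = [
        "http://mirror-a.example/file",
        "http://mirror-b.example/file",
        "http://mirror-c.example/file",
    ]
    url, data = download_with_failover(order, hashlib.sha1(good).hexdigest(), fake_fetch)
    print(url)
```

In this simulated run the first mirror is down and the second serves a corrupted copy, so the download completes from the third one without the user having to do anything.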

Many of the people who download OpenOffice.org, openSUSE, Ubuntu, Fedora, Sabayon, and other projects that use metalink are on Windows. The first step in dealing with problems for downloads is usually a manual checksum verification which is a support nightmare when dealing with inexperienced people. 

Metalink aims to make downloads much easier and to be able to recover from
transmission errors, servers going down, etc, and complete without the user needing to know anything went wrong. As mentioned above by Peter, this is very helpful in countries with unreliable internet connections.

BTW, people from Mozilla, other browser makers, and everyone else are welcome to review and comment on our Metalink Internet Draft: http://tools.ietf.org/html/draft-bryan-metalink
For those interested, a rough proposal for Metalink in HTTP headers: 

http://tools.ietf.org/html/draft-bryan-metalinkhttp

At the moment, it looks like this:

   Link: <http://www2.example.com/example.ext>; rel="duplicate"
   Link: <ftp://ftp.example.com/example.ext>; rel="duplicate"
   Link: <http://example.com/example.ext.torrent>; rel="describedby";
   type="application/x-bittorrent"
   Link: <http://example.com/example.ext.metalink>; rel="describedby";
   type="application/metalink4+xml"
   Link: <http://example.com/example.ext.asc>; rel="describedby";
   type="application/pgp-signature"
   Digest: SHA=thvDyvhfIqlvFe+A9MYgxAfm1q5=
(In reply to comment #19)
> I don't know that adding metalink support is simple. Shawn Wilsher, who works
> on the Firefox download manager, said that some things are not in place for
> metalink support, I believe failover to alternate download sources. He plans a
> Firefox extension with metalink support (the excellent DownThemAll! is also
> available). XML & checksums are possible in Firefox.

Three options here:

1. NATIVE: Implementing this directly in Firefox.
	- Difficulty level
	- Most of the developers (of Google Chrome as well) do not support this.
2. NPAPI Plugin
	+ I think writing an NPAPI plugin would solve the problem in most non-IE browsers.
	- The downside is that it's again not built into the browser, so the user must install it.
3. EXTENSION
	- DownThemAll! already supports it, so why is Shawn Wilsher planning another extension?
Sorry, forgot to make a point. :P

So, which option to go with? I feel an NPAPI plugin would be best.
The Internet Draft version of Metalink is in IETF Last Call.

It would be great to have reviews from anyone, but reviews from browser people would be especially welcome.

http://tools.ietf.org/html/draft-bryan-metalink
RFC 5854 'The Metalink Download Description Format' is out.

http://tools.ietf.org/html/rfc5854
The Wikimedia Foundation seems to be looking for a P2P-based solution to provide big video files to browsers. It seems to me that this is a chance to push this feature request.

http://techblog.wikimedia.org/2010/09/video-labs-p2p-next-community-cdn-for-video-distribution/
This is a major user experience feature; it is criminal that this isn't in place.  It's just the sort of killer feature that FF needs to compete with Chrome.

I am unsure as to why this is dependent upon swarming: downloading from multiple hosts creates more problems for the host.  

In terms of a better user and admin experience this feature should prioritize:
1)Fail-over
2)Speed boost from "local" server (university/ISP/business mirrors).
3)Speed boost from P4P (optimizing for a "local" host within the same ISP to reduce ISP peering http://tinyurl.com/4u8yxu7).
FYI, Metalink/HTTP: Mirrors and Cryptographic Hashes in HTTP Header Fields is in IETF Last Call. it would be great to have comments from browser vendors.

http://tools.ietf.org/html/draft-bryan-metalinkhttp

(In reply to comment #28)
> This is a major user experience feature; it is criminal that this isn't in
> place.  It's just the sort of killer feature that FF needs to compete with
> Chrome.

I agree :)

Metalink seems like a feature for advanced users, but we've designed it in a way that it benefits everyone & the least technically inclined people will get a better download experience because of it. Nobody ever needs to have a large download get corrupted or not complete. That would help a lot of open source projects.
 
> I am unsure as to why this is dependent upon swarming: downloading from
> multiple hosts creates more problems for the host.  

That was my error, Metalink doesn't depend on swarming at all.

(In reply to comment #27)
> The Wikimedia Foundation seems to search a solution to provide big video files
> to browsers based on a P2P solution. It seems to me that we have here a chance
> to push this feature request.

That would be great. I think if this was included in Firefox then many more people would make use of it.

(In reply to comment #23)
> (In reply to comment #19)
> 
> Three options here:
> 
> 1. NATIVE : Implementing this directly into Firefox. 
>     - Difficulty level
>     - Most of the developers (of Google Chrome aswell) are not supporting this.
> 2. NPAPI Plugin
>     + I think if one writes an NPAPI plugin that would solve the problem in
> most non-IE browsers.
>     - The downside is that its again not inbuilt into the browser. So, user
> must install it. 
> 3. EXTENSION 
>     - If DownThemAll! already support it. Why does (Shawn Wilsher is planning
> for) another extension?

NPAPI plugin would be great.

DownThemAll! 2.0 is finally out & it has excellent Metalink support. it's obviously very popular and lets a lot of people try out Metalink.

we are still hoping for native support in Firefox. I think the effort that would be required would be worth it.
RFC 6249 'Metalink/HTTP: Mirrors and Hashes' is out.

It specifies how mirrors, hashes, Metalink/XML (for repair of downloads with partial file hashes), p2p information, and digital signatures are associated with downloads using HTTP header fields.

   Link: <http://www2.example.com/example.ext>; rel=duplicate
   Link: <ftp://ftp.example.com/example.ext>; rel=duplicate
   Link: <http://example.com/example.ext.torrent>; rel=describedby;
   type="application/x-bittorrent"
   Link: <http://example.com/example.ext.meta4>; rel=describedby;
   type="application/metalink4+xml"
   Link: <http://example.com/example.ext.asc>; rel=describedby;
   type="application/pgp-signature"
   Digest: SHA-256=MWVkMWQxYTRiMzk5MDQ0MzI3NGU5NDEyZTk5OWY1ZGFmNzgyZTJlO
   DYzYjRjYzFhOTlmNTQwYzI2M2QwM2U2MQ==

http://tools.ietf.org/html/rfc6249
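
For anyone wondering what a client has to do with these header fields, here is a small Python sketch. It assumes each Link header value carries exactly one link (a real Link header may list several, separated by commas) and that folded continuation lines have already been joined. `parse_link` and `classify` are illustrative names, not an existing API.

```python
import re

# Header fields from the example, with the folded continuation lines joined.
HEADERS = [
    ("Link", '<http://www2.example.com/example.ext>; rel=duplicate'),
    ("Link", '<ftp://ftp.example.com/example.ext>; rel=duplicate'),
    ("Link", '<http://example.com/example.ext.torrent>; rel=describedby; '
             'type="application/x-bittorrent"'),
    ("Link", '<http://example.com/example.ext.meta4>; rel=describedby; '
             'type="application/metalink4+xml"'),
    ("Link", '<http://example.com/example.ext.asc>; rel=describedby; '
             'type="application/pgp-signature"'),
]

LINK_RE = re.compile(r'^\s*<([^>]*)>(.*)$')
PARAM_RE = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^;\s]+))')


def parse_link(value):
    """Split one Link header value into (target URL, {parameter: value})."""
    match = LINK_RE.match(value)
    if match is None:
        raise ValueError("not a Link header value: %r" % value)
    target, rest = match.groups()
    params = {}
    for name, quoted, bare in PARAM_RE.findall(rest):
        # Parameters may be written quoted (rel="duplicate") or bare (rel=duplicate).
        params[name.lower()] = quoted if quoted else bare
    return target, params


def classify(headers):
    """Group Link targets into mirrors (rel=duplicate) and descriptions keyed by media type."""
    mirrors, described_by = [], {}
    for name, value in headers:
        if name.lower() != "link":
            continue
        target, params = parse_link(value)
        if params.get("rel") == "duplicate":
            mirrors.append(target)
        elif params.get("rel") == "describedby":
            described_by[params.get("type")] = target
    return mirrors, described_by


if __name__ == "__main__":
    mirrors, described_by = classify(HEADERS)
    print(mirrors)
    print(described_by.get("application/metalink4+xml"))
```

The mirror list gives the client its fallback sources, and the Metalink/XML link tells it where to fetch partial file hashes if a download needs repairing.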
Metalink has students interested in adding support to Firefox for Google Summer of Code.

Is this something Mozilla would accept, and provide a mentor for?

There are a decent number of organizations using metalink, but without native browser support, getting regular users to take advantage of the features is extra work for support (installing an extension or download manager).

Use cases:
- downloading large files (error repair) like Linux distributions, software, and games
- downloading files available on a CDN or mirror network
- webmail could use metalinks for "Download All Attachments" instead of putting all files in a .ZIP archive.
- downloading a whole album & creating a directory structure instead of putting already compressed files in an archive
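
As a rough idea of what the "bundle" use cases could look like on the publishing side, here is a Python sketch that writes several files into one document using the RFC 5854 namespace mentioned earlier in this bug. The function name `build_metalink`, the input layout and the example paths are assumptions for illustration; only a hash and plain URLs are written, and a real generator would likely add sizes and other metadata.

```python
import hashlib
import xml.etree.ElementTree as ET

METALINK4_NS = "urn:ietf:params:xml:ns:metalink"  # namespace defined by RFC 5854


def build_metalink(files):
    """Build a Metalink document from {relative path: (sha256 hex, [urls])}."""
    ET.register_namespace("", METALINK4_NS)  # serialize with a default namespace
    root = ET.Element("{%s}metalink" % METALINK4_NS)
    for path, (sha256_hex, urls) in files.items():
        # The name may carry a relative path, which lets a client recreate directories.
        file_el = ET.SubElement(root, "{%s}file" % METALINK4_NS, {"name": path})
        hash_el = ET.SubElement(file_el, "{%s}hash" % METALINK4_NS, {"type": "sha-256"})
        hash_el.text = sha256_hex
        for url in urls:
            ET.SubElement(file_el, "{%s}url" % METALINK4_NS).text = url
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    album = {
        "album/01-intro.ogg": (
            hashlib.sha256(b"intro").hexdigest(),
            ["http://example.com/album/01-intro.ogg"],
        ),
        "album/02-song.ogg": (
            hashlib.sha256(b"song").hexdigest(),
            ["http://example.com/album/02-song.ogg",
             "http://mirror.example.org/album/02-song.ogg"],
        ),
    }
    print(build_metalink(album))
```

A client receiving this single small document could queue both files, verify each one, and place them under "album/" without any archive being created on the server.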
You should remove the swarming dependency.

I'll see if kicking the Firefox UI list will get any attention. : )
(In reply to Zach Lym from comment #28)
> This is a major user experience feature; it is criminal that this isn't in
> place.  It's just the sort of killer feature that FF needs to compete with
> Chrome.
> 
> I am unsure as to why this is dependent upon swarming: downloading from
> multiple hosts creates more problems for the host.  
> 
> In terms of a better user and admin experience this feature should
> prioritize:
> 1)Fail-over
> 2)Speed boost from "local" server (university/ISP/business mirrors).
> 3)Speed boost from P4P (optimizing for a "local" host within the same ISP to
> reduce ISP peering http://tinyurl.com/4u8yxu7).

thanks, Zach.

metalink does NOT depend on swarming (bug #40106) - I'm just unsure how to remove that dependency in bugzilla.
No longer depends on: 40106
Ant: we don't "provide mentors" - people who are interested in particular projects offer to mentor them. Therefore, for the best chance of success, you would need to find a Mozilla hacker with the appropriate skills and experience who was willing to mentor the project, and include that person's name in the application.

The student can certainly apply without such a name, but it does reduce their chances of success.

It is likely also that the relevant team (networking? not sure) would need to say that this was a feature they'd accept a patch for before we used a GSoC slot on it.

Gerv
(In reply to Gervase Markham [:gerv] from comment #34)
> Ant: we don't "provide mentors" - people who are interested in particular
> projects offer to mentor them. Therefore, for the best chance of success,
> you would need to find a Mozilla hacker with the appropriate skills and
> experience who was willing to mentor the project, and include that person's
> name in the application.

thanks, Gerv. let me rephrase that:

Metalink is in Google Summer of Code this year and is looking for a mentor from Mozilla to aid one of our students in adding native metalink support. we want to use our own GSoC slot on this, so Mozilla would not need to give up one.

if anyone is interested, please contact me! :)
(In reply to Ant Bryan from comment #35)
> (In reply to Gervase Markham [:gerv] from comment #34)
> > Ant: we don't "provide mentors" - people who are interested in particular
> > projects offer to mentor them. Therefore, for the best chance of success,
> > you would need to find a Mozilla hacker with the appropriate skills and
> > experience who was willing to mentor the project, and include that person's
> > name in the application.
> 
> thanks, Gerv. let me rephrase that:
> 
> Metalink is in Google Summer of Code this year and is looking for a mentor
> from Mozilla to aid one of our students in adding native metalink support.
> we want to use our own GSoC slot on this, so Mozilla would not need to give
> up one.
> 
> if anyone is interested, please contact me! :)

Hi  Ant Bryan,

The Metalinker ideas page for GSoC 2013 says that developing native support for Firefox is still open for students, and I'm interested in that. Can you please tell me whom to contact about it? There are no mentors assigned to this project there. Thanks.

-sudheera
awesome, Sudheera, thanks for the interest! we are in the same situation as last year where we would need an interested mentor from Mozilla.

some things we accomplished in the last year:

curl & wget added metalink support
DownThemAll, KGet & others added more metalink features
new programs added support like a Chrome extension
I am looking forward to this feature as well.

Please also include OpenPGP support in metalink.
Component: File Handling → Download Manager
Product: Core → Toolkit
+1. Would love to see this feature added.
+1 Users need a browser able to download files securely and check their integrity. Metalink helps with that and is used in the open source community. It's XML based and simple, and either a file or HTTP headers can be used. This could be a great opportunity for Mozilla to announce a useful feature (as DownThemAll! is not available anymore for that). Mozilla could use it for its own downloads (and why not Send?)

13 years is enough to take a good decision. 

http://releases.ubuntu.com/18.10/ubuntu-18.10-desktop-amd64.metalink
https://download.documentfoundation.org/libreoffice/stable/6.1.3/win/x86_64/LibreOffice_6.1.3_Win_x64.msi.mirrorlist
https://mirrors.slackware.com/slackware/slackware-iso/slackware64-14.2-iso/slackware64-14.2-install-dvd.iso.mirrorlist

+1 We should have this in Firefox, before Chrome does.

Severity: normal → S3