from http://www.gerv.net/security/link-fingerprints/ : > This is a method of making an HTTP URL reference not only a > particular resource, but a particular version of that resource, > in a way that the fetching client can validate. I think this should be implemented by Firefox. It's a nice, unobtrusive feature that could be of real benefit for downloading files. This bug is intended for discussing which hash algorithms to support, how the user interface should indicate a "checked" download, and eventually for tracking the implementation. As MD5 is most likely one of the algorithms to be provided, adding dependency on bug 92054.
Removing dependency on bug 92054 - link fingerprints need multiple algorithms (not only MD5), so the generic solution of bug 292368 is more appropriate.
Related to bug 292481 (Content-MD5 header). Adding the checksum to the link allows a website to store its data on remote mirrors (like the mozilla.org download servers), and still be sure that the content wasn't altered. A Content-MD5 header would protect against a MitM attack, where the content was altered on the fly. Of course, a Content-MD5 header can also be faked or modified by a MitM attacker; I'm not claiming it's the solution for all phishing problems. It's just an extra check that we need to implement.
This bug does not block bug 101743; that bug could be implemented without this one. This bug also has nothing to do with Content-MD5, which protects against corruption in transmission (something entirely different) and again could be implemented independently of this one. Gerv
The problem is that the ASCII HTML file might be altered as it is uploaded from the author's computer to the Web server. The most common alteration is changing the PC end-of-line CR-LF (x0D0A) to the UNIX end-of-line LF (x0A). While this might be reversed when downloading the file via FTP, Mozilla and Firefox do not reverse the alteration when downloading into a browser window. I reported this as bug 211130, which was closed as a duplicate of bug 38121 (which does not clearly indicate the impact on verifying Web pages against fingerprints). Other alterations are sometimes added by Web servers. HTML files at Geocities had scripts added to bring up advertisements or at least a Geocities watermark, neither of which were intended by the page authors. Such alterations invalidate any checksum used as fingerprints. I did some extensive testing about two years ago on the concept of using PGP for digitally signing Web pages. While a binary file -- transferred from one host to another without any alterations -- can be signed and then later verified, an ASCII file cannot unless the alterations are exactly reversed. Note that digitally signing an ASCII E-mail message works because (1) the signature applies only to a section (the message block) within the file and not to the entire file and (2) the verification process then takes into account the alterations (per RFC 2440). In the two years since I did that testing, I have come to the conclusion that verifying Web pages against checksums or digital signatures will not work in a global context of varying hardware hosts, operating systems, and browsers.
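To illustrate the line-ending problem described above, here is a minimal Python sketch. The sample page content and the helper name `md5_hex` are made up for illustration; the point is only that rewriting CR-LF to LF changes the digest even though the visible text is the same.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a hex string."""
    return hashlib.md5(data).hexdigest()

# Hypothetical page content as authored on a PC (CR-LF line endings).
original = b"<html>\r\n<body>Hello</body>\r\n</html>\r\n"

# The same content after a server or FTP client rewrites CR-LF to LF.
converted = original.replace(b"\r\n", b"\n")

# The digests differ although the text looks identical in a browser.
print(md5_hex(original) == md5_hex(converted))

# Only an exact reversal of the alteration restores the original digest.
restored = converted.replace(b"\n", b"\r\n")
print(md5_hex(original) == md5_hex(restored))
```

The first comparison prints False and the second True, which matches the observation that an ASCII file can only be verified if the alterations are exactly reversed.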
(In reply to comment #5) > In the two years since I did that testing, I have come to the conclusion that > verifying Web pages against checksums or digital signatures will not work in a > global context of varying hardware hosts, operating systems, and browsers. The concept of link fingerprints is intended for binary downloads. If it would be extended to web pages, we would have to alter/override Gecko's behaviour for named anchors - for binary downloads, though, the fragment identifier currently has no defined meaning, and can be used without causing any conflicts or "re-definitions".
How would this be restricted to binary downloads? Note that Web pages are not the only ASCII files that might be targeted by a URL in the HREF attribute of an anchor. The problem I describe in comment #5 applies to all ASCII files, not just HTML files. As an example of that problem, I took the MD5 hash of five versions of one of my own Web pages that I know to be non-dynamic: selecting the link and downloading it via "Save link target as", loading the page and saving it via "Save page as", the working copy on my local hard drive, downloading via WS_FTP95 LE, and downloading via DOS FTP. The last three all gave the same hash. The first two each gave a different hash. I got similar results for a TXT file that is on my Web server, except the first two versions gave the same hash (different from the last three). I am confident similar results would be obtained for a CSS file. I do use MD5 hashes to verify the integrity of downloaded binary files. This includes downloads of new virus definition data packages and downloads of new versions of Mozilla Suite. However, I prefer getting the file's original hash from an ASCII file or visually on the Web page. Currently, I compute the hash for the downloaded file via an MD5 application; I would prefer to see bug #101743 implemented to provide this automatically for the downloaded file.
(In reply to comment #7) > How would this be restricted to binary downloads? It would not be restricted by technical measures. Basically, you are saying that hashing doesn't work for plain text files - the reasons for this may be browsers adding information to saved files (in case of "Save page as..."), charset issues or using the FTP client's ASCII mode when uploading the file. IMO, this is no argument against implementing link fingerprints - after all, the primary goal of the spec is to provide a way to automate hash verification for downloaded *software*. Perhaps this should just be made clearer in the spec, so no one tries to use it for plain text files.
Have redirects been considered? IMHO hashes should only be considered "trusted" if they came from a link that was explicitly clicked on by a user or triggered by trusted code, e.g. chrome. Either redirects should be disallowed or the original hash must be compared against the final file, ignoring any new hashes which may have been presented during the redirect chain. A small fear I have is that people will believe that this is a replacement for signed binaries. It is not. It doesn't do anything to stop MITM attacks (unless used along with SSL), or hacked servers which supply the original link.
There's currently a link fingerprint implementation in MDHashTool: http://mdhashtool.mozdev.org/ . I guess we can use this extension to work out some of the practical issues involving link fingerprints. For example, as Chris says, we need to keep the original fingerprint during any redirects. Gerv
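One possible way to keep the original fingerprint across redirects, sketched in Python. Everything here is an assumption for illustration: the `redirects` dict stands in for real Location headers, the example URLs are invented, and the function name is hypothetical. The idea is simply that only the fragment of the first URL is remembered, and any fragment supplied by a redirect target is dropped.

```python
from urllib.parse import urldefrag

def resolve_with_original_fingerprint(start_url, redirects, max_hops=10):
    """Follow a redirect map but keep only the fragment of the first URL.

    `redirects` is a hypothetical dict mapping a URL (without fragment)
    to its redirect target; a real client would read Location headers.
    Returns (final_url_without_fragment, original_fragment).
    """
    url, original_fragment = urldefrag(start_url)
    for _ in range(max_hops):
        target = redirects.get(url)
        if target is None:
            return url, original_fragment
        # Drop any fingerprint the redirect tries to introduce.
        url, _ignored = urldefrag(target)
    raise ValueError("too many redirects")

# Example: the mirror tries to substitute its own fingerprint.
redirects = {
    "http://example.org/app.exe": "http://mirror.example.net/app.exe#!md5!ffff",
}
final_url, fragment = resolve_with_original_fingerprint(
    "http://example.org/app.exe#!sha256!abcd", redirects
)
print(final_url, fragment)
```

The downloaded file would then be checked against `fragment` from the link the user clicked, never against anything seen later in the chain.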
*** Bug 330315 has been marked as a duplicate of this bug. ***
GetRight 6 ( http://www.getright.com/ ), a popular Windows download manager, supports link fingerprints. So does FlashGot ( http://www.flashgot.net/ ), a Firefox extension that works with around 30 download managers. If the download manager supports checksums (GetRight, Free Download Manager, iGetter, probably more) then FlashGot could probably be modified to pass the checksum info to them.
How would the hash be added to the link within the constraints of maintaining compliance with RFCs and W3C specifications? Would this require an anchor attribute that is non-standard? Or would it require a non-standard suffix to the URI? Remember, a significant claim for Mozilla products is that they are standards-compliant. Further, would a Web page with such a link validate as compliant with HTML or XHTML specifications? How would this protect against an attack that alters both the file to be downloaded and the hash in the link? If a hacker can change the file, changing the link should be trivial. See also my comment (28) about digestIT in bug #101743.
(In reply to comment #13) > How would the hash be added to the link within the constraints of maintaining > compliance with RFCs and W3C specifications? Would this require an anchor > attribute that is non-standard? Or would it require a non-standard suffix to > the URI? How the user agent interprets the fragment identifier (the thing after the # sign) is not standardized, except when the link target is an HTML/XHTML document. I do not see any spec violation in using it for linking to binary files. > Further, would a Web page with such a link validate as > compliant with HTML or XHTML specifications? Why should it not? It is a completely valid URI to a resource on some server, placed in the 'href' attribute of the 'a' tag in HTML/XHTML. What it links to (or whether it exists) is not relevant for the validity of the web page. > How would this protect against an attack that alters both the file to be > downloaded and the hash in the link? It does not, and does not need to. Remember, if a hacker can alter the linking web page, he could also change the MD5 checksums that are placed beside many download links nowadays. IMO link fingerprints are not primarily about enhanced security (that's what SSL, among others, is for) - they are about improved convenience, because the browser automatically validates files with checksums published on web pages, instead of a power-user doing it manually. > If a hacker can change the file, changing the link should be trivial. Of course, but most downloads are served from different hosts than the web page - think of mirror networks.
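A rough Python sketch of how a client might read a fingerprint out of the fragment identifier. The exact fragment layout is not spelled out in this thread, so the `!<algorithm>!<hex digest>` form used here is an assumption for illustration, as are the function names. An ordinary named anchor such as `#top` is left alone.

```python
import hashlib
from urllib.parse import urldefrag

def parse_fingerprint(url):
    """Split a URL into (url_without_fragment, algorithm, hex_digest).

    Returns (url, None, None) when the fragment is not a fingerprint.
    Assumed fragment layout: "!<algorithm>!<hex digest>".
    """
    base, fragment = urldefrag(url)
    parts = fragment.split("!")
    if len(parts) == 3 and parts[0] == "" and parts[1] and parts[2]:
        return base, parts[1].lower(), parts[2].lower()
    return base, None, None

def matches(url, data):
    """True if data matches the fingerprint in url, or if there is none."""
    _, algorithm, expected = parse_fingerprint(url)
    if algorithm is None:
        return True
    # hashlib.new raises ValueError for an algorithm name it does not know.
    actual = hashlib.new(algorithm, data).hexdigest()
    return actual == expected
```

Because the fingerprint lives entirely in the fragment, the request sent to the server is unchanged, which is why clients that do not know about fingerprints keep working.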
They are partly about improving security - it's a bar-raising exercise. Currently, to get people to download a trojaned binary, an attacker has to hack a single server in your mirror network. Your mirror network may not be under your control - this is often the case for free software projects - and its security is unknown. With link fingerprints, they also have to hack your main webserver and change the checksum to match (in which case, it won't match any copies of the original elsewhere on the network, if any). This is quite a lot harder - it usually means hacking two machines rather than one. In addition, often URLs to security updates are distributed through email. You can't break into everyone's mailboxes and change the email to match your trojan. So it is partly about security. I don't claim it solves every problem ever, but it makes things more difficult for an attacker, with the great bonus that it's completely transparent and backwardly-compatible. Worse is better. Gerv
(In reply to comment #15) > In addition, often URLs to security updates are distributed through email. You > can't break into everyone's mailboxes and change the email to match your > trojan. Clicking a URL to something called a security update in an email doesn't sound like a particularly good idea...
I guess unless you are pretty sure that the domain is right and there is no second IDN-like attack.
Companies like Red Hat send out security notification emails all the time. Example: https://www.redhat.com/archives/enterprise-watch-list/2006-June/msg00007.html If link fingerprints were implemented, the checksums present in that email could be integrated into the links and would be automatically checked for you by your browser. Gerv
Given repeated warnings about phishing attacks and viruses, I am very leery about downloading software from an E-mail link.
Red Hat sign their security emails, and your mail client can verify the signature automatically. But we're getting off the point here - the idea is not necessarily to establish a watertight chain of trust all the way to the file, the idea is to make crackers' lives harder in a backwardly-compatible way. One way of measuring this would be to look at the various trojaning incidents over the past few years and see if any would have been averted, or more easily spotted, if link fingerprints were in use. Gerv
I use MD5 and SHA1 hashes to verify the integrity of the transfer of large files across a network. In this case, I am not concerned about a hostile attack. I'm merely concerned that the transfer did not corrupt the file. Yes, I do occasionally find that a file was indeed corrupted and must be transferred again. However, both MD5 and SHA1 are now known to contain vulnerabilities. While both are still valid for the use I indicate above, their continued use to verify a lack of hostile attacks on files is now questionable. For example, see <http://www.mccune.cc/PGPpage2.htm> and search for the terms "MD5" and "SHA1". Should Firefox be tied to a particular hash function by this RFE? Will security fixes be released if that function proves ineffective for protecting against hostile attacks? In that case, what happens to the Web pages that already implement the ineffective function? Somehow, I think the use of SSL and X.509 certificates -- existing capabilities -- is the proper way to handle downloads where there is concern for hostile attacks. Anything beyond that should be handled outside of browsers and E-mail clients.
> However, both MD5 and SHA1 are now known to contain vulnerabilities. I think SHA-256 is still believed to be safe. > Should Firefox be tied to a particular hash function by this RFE? Will > security fixes be released if that function proves ineffective for protecting > against hostile attacks? With MD5, I believe it is currently fairly easy to create two colliding chunks but not yet easy to make a file that hashes to a given value. This isn't a big deal for verifying official software downloads, but history suggests that it will be more broken soon. So when we add this feature, it shouldn't support MD5. If we ship with SHA-256 support and someone breaks SHA-256 in 2016, then depending on how broken it is, we can refuse to follow the link, make the UI look "slightly less secure", or make no changes.
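The three reactions mentioned above (refuse the link, show a weaker indicator, or change nothing) could be expressed as a small policy table. This Python sketch is purely illustrative: the table entries, the names `Action`, `POLICY` and `action_for`, and the choice to refuse unknown algorithms are all assumptions, not decisions made in this bug.

```python
from enum import Enum

class Action(Enum):
    ACCEPT = "verify and trust the result"
    WARN = "verify but show a weaker indicator"
    REFUSE = "do not follow the link"

# Hypothetical policy table; the entries are illustrative, not a decision.
POLICY = {
    "sha256": Action.ACCEPT,
    "md5": Action.REFUSE,
}

def action_for(algorithm):
    """Look up how to treat a fingerprint algorithm.

    Unknown algorithms are refused here, which is an assumption.
    """
    return POLICY.get(algorithm.lower(), Action.REFUSE)
```

If an accepted algorithm were later weakened, its entry could be moved to `Action.WARN` or `Action.REFUSE` in an update without changing how existing links are written.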
> Somehow, I think the use of SSL and X.509 certificates -- existing capabilities > -- is the proper way to handle downloads where there is concern for hostile > attacks. When I filed bug 358384, I was initially inclined to agree with you. But justdave made it clear in that bug that upgrading our (volunteer!) download mirror network to support https would be a major headache compared to making mozilla.com use https and provide a hash attribute with each download link. One-to-one encryption is a lot of overhead if you use it for an entire 5MB download: each mirror has to have a certificate and the processing power to support SSL for a bunch of concurrent downloads. If a single mirror is hacked, SSL doesn't protect users who hit that mirror. I imagine that other software providers (especially OSS and freeware providers relying on volunteer mirrors but also large software providers such as Skype and Microsoft) face similar issues and would prefer the "https site with download link that has a hash" approach to the "https for entire download" approach.
If I understand http://en.wikipedia.org/wiki/Merkle-Damgard_hash_function correctly, it should be possible to compute the SHA-256 of a file incrementally while downloading it. So there need not be a separate "verification" step that makes these downloads significantly slower :)
(In reply to comment #24) > If I understand http://en.wikipedia.org/wiki/Merkle-Damgard_hash_function > correctly, it should be possible to compute the SHA-256 of a file incrementally > while downloading it. So there need not be a separate "verification" step > that makes these downloads significantly slower :) Yes, all hash functions work incrementally, and the hashing can be done while still downloading. Otherwise, verifying a 300 MB download would take a really long time!
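A minimal Python sketch of hashing during the download rather than afterwards. The function name and chunk size are arbitrary choices for illustration, and `io.BytesIO` stands in for a network response; any object with a `read(n)` method would do.

```python
import hashlib
import io

def download_and_hash(stream, sink, chunk_size=64 * 1024):
    """Copy stream to sink in chunks, hashing each chunk as it arrives."""
    digest = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)  # incremental update, no second pass needed
        sink.write(chunk)
    return digest.hexdigest()

# Usage with in-memory stand-ins for the network stream and the target file.
payload = b"x" * 200_000
target = io.BytesIO()
streamed = download_and_hash(io.BytesIO(payload), target, chunk_size=1000)
print(streamed == hashlib.sha256(payload).hexdigest())
```

The digest computed chunk by chunk equals the digest of the whole payload, so the check is ready as soon as the last byte arrives.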
> Should Firefox be tied to a particular hash function by this RFE? Absolutely. If we don't define a limited set of hash functions, then interoperability is greatly damaged. The logic is: we have MD5 because it's short, and convenient for non-security applications, and SHA-256 because it's the best widely-used algorithm currently available, for use for security-sensitive applications. Perhaps that logic is bad because people might use MD5 for security-sensitive applications anyway. But the hashes from SHA256 are so long as to be unwieldy, if for example you are sending a number of the URLs in an email. It's a hard trade-off. Computing the hash incrementally during the download is a very sensible idea. Gerv
(In reply to comment #15) > They are partly about improving security - it's a bar-raising exercise. > > Currently, to get people to download a trojaned binary, an attacker has to hack > a single server in your mirror network. Your mirror network may not be under > your control - this is often the case for free software projects - and its > security is unknown. How should errors such as 404 be handled? Should the returned page be checked against the checksum? If not, a hacker could make the hacked server return a 404 error to trick the browser into aborting the checksum calculation. Then in the 404's html page they make it look like a redirection page with another link to a trojan without the original checksum.
Hmm. So you are saying that the 404 would be returned, but would have a <meta refresh=0> or something to direct to a trojaned copy of the download. We could deal with this in one of several ways: - Have an explicit "failure" notification, which would pop up when the 404 failed to match the checksum - Have an explicit "success" notification, which would _not_ pop up when the trojan downloaded, because the checksum is no longer active. I prefer the former. You are right: the checksum should be applied to whatever content gets returned, and an error explicitly shown if it fails. Gerv
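The "explicit failure" option could look roughly like this in Python. The exception and function names are hypothetical; the sketch only shows that the HTTP status plays no part in the decision, so a 404 page is checked like any other body.

```python
import hashlib

class FingerprintMismatch(Exception):
    """Raised when the returned content does not match the link fingerprint."""

def check_response(status, body, expected_sha256):
    """Verify body against the fingerprint regardless of the HTTP status."""
    actual = hashlib.sha256(body).hexdigest()
    if actual != expected_sha256.lower():
        raise FingerprintMismatch(
            f"HTTP {status}: returned content does not match the link fingerprint"
        )
    return body
```

With this shape, a hacked mirror that answers with an error page carrying a refresh to a trojan triggers the failure notification instead of silently ending the check.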
Created attachment 268180: v1
patch comments: Look for the Link Fingerprint error status and delete the file from the download manager. This works for downloads explicitly started from alt-click or save link as. Exthandler will check for the error status for pages that start loading and then afterwards decides to download the file. After bug 384247 is fixed, this will work for HTTP downloads.
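The behaviour described in the patch comments (remove the file when the fingerprint check fails) can be approximated outside the browser with the Python sketch below. It is not the patch itself: the function name, the return convention and the use of SHA-256 are assumptions for illustration.

```python
import hashlib
import os

def save_verified(chunks, dest_path, expected_sha256):
    """Write chunks to dest_path; delete the file if the digest mismatches.

    Returns True when the file was kept, False when it was removed.
    """
    digest = hashlib.sha256()
    with open(dest_path, "wb") as out:
        for chunk in chunks:
            digest.update(chunk)
            out.write(chunk)
    if digest.hexdigest() != expected_sha256.lower():
        os.remove(dest_path)  # do not leave a corrupted file behind
        return False
    return True
```

A caller would show the error message to the user when this returns False.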
Does the patch allow the user to override the hash failure? Alternatively, does the patch provide a way to turn off this feature?
(In reply to comment #31) > Does the patch allow the user to override the hash failure? Alternatively, > does the patch a way to turn off this feature? Nope to both. Long answer/question: What do you mean by override? Something in the download manager UI to recover a download that was deleted because the link fingerprint mismatched? We don't want to endanger/confuse the average user, so no. If you really wanted to, you can take the download link and strip off the link fingerprint and try again. What part of "this feature" to turn off? The file removal or the alerting the user that the file was corrupted?
Comment on attachment 268180: v1 >+linkFingerprintError=%S could not be saved because the file has been corrupted.\n\nPlease contact the site administrator. That's pretty vague. How would the site admin know what's wrong if you don't mention that the downloaded file doesn't match the link fingerprint?
(In reply to comment #33) > (From update of attachment 268180) > >+linkFingerprintError=%S could not be saved because the file has been corrupted.\n\nPlease contact the site administrator. > That's pretty vague. How would the site admin know what's wrong if you don't > mention that the downloaded file doesn't match the link fingerprint? As the administrator, I'd even assume that the user has a local problem, given that it's usually possible to save corrupted files. I suggest this: %S was not saved because it appears to be corrupted.\n\nPlease contact the site owner.
Steffen has a good point, although of course we can improve the error messages at any time. I also think that the admin will want to know where the source of the link is, so that he can go and fix it (if it's actually the link rather than the file that's broken, and it's in his control). So we should print the Referer. My attempt: %S was not saved because the file appears to be corrupted.\n\nPlease contact the owner of the <a href="">page you came from</a>, telling them: Target: %S Referer: %S Of course, it depends how this info is presented. If it's in a modal dialog box, links aren't going to be possible. And we should make the info copy-and-pasteable. Gerv
Created attachment 269538: v2 - Fix nits (comments, void) - Error message: %S was not saved because the file appears to be corrupted.
Edward, any progress here, or did your patch get lost?
This will not be making Firefox 3.
Bug 377245 got WONTFIXED, so reassigning to defaults.
Bump. What if we extend this conversation to include handling of file.ext.hash download links in comparison to recently downloaded files? For instance, FileZilla Server offers two downloads next to each other. http://filezilla-project.org/download.php?type=server E.g.: FileZilla_Server-0_9_39.exe FileZilla_Server-0_9_39.exe.sha512 Clicking the second link could automatically validate the first download, or bring up the Download Options dialog. I understand this task can be performed by an external process, but there's no reason it couldn't/shouldn't be integrated in Firefox, which already knows these cryptographic hashes.
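For illustration, validating a download against a file.ext.hash companion file might look like the Python sketch below. The layout of the companion file is an assumption (a hex digest, optionally followed by whitespace and a file name, as many checksum tools write it), and both function names are hypothetical.

```python
import hashlib

def parse_sidecar(text):
    """Return the hex digest from a '<hex digest>  <filename>' style file."""
    stripped = text.strip()
    if not stripped:
        raise ValueError("empty checksum file")
    first_line = stripped.splitlines()[0]
    return first_line.split()[0].lower()

def validate_download(data, sidecar_text, algorithm="sha512"):
    """Compare the digest of the downloaded data with the sidecar digest."""
    expected = parse_sidecar(sidecar_text)
    return hashlib.new(algorithm, data).hexdigest() == expected
```

The algorithm would presumably be inferred from the extension of the second link (for example `.sha512`), which is why it is a parameter here.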
MirrorBrain, which serves the FSF, OpenOffice.org, LibreOffice, openSUSE, etc. mirrors, supports file.ext.hash as well. Another alternative would be to support Instance Digests (RFC 3230), which give the hash in an HTTP header field. This also integrates with Metalink (RFC 6249), for providing mirrors in HTTP headers.
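A hedged Python sketch of checking a body against an Instance Digests style header value. It assumes the header value is a comma-separated list of `algorithm=base64digest` pairs and that a `SHA-256` token is available (it was registered after RFC 3230 itself); the function names are hypothetical and no HTTP library is involved.

```python
import base64
import hashlib

def parse_digest_header(value):
    """Parse 'SHA-256=<base64>,MD5=<base64>' into {algorithm: digest bytes}."""
    digests = {}
    for item in value.split(","):
        # Split on the first '=' only, so base64 padding stays intact.
        name, sep, encoded = item.strip().partition("=")
        if sep:
            digests[name.strip().lower()] = base64.b64decode(encoded)
    return digests

def body_matches_digest(body, header_value):
    """True only if a SHA-256 digest is present and matches the body."""
    expected = parse_digest_header(header_value).get("sha-256")
    return expected is not None and hashlib.sha256(body).digest() == expected
```

Unlike a link fingerprint, this value comes from the server that delivers the file, so it guards against corruption more than against a compromised mirror.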
One slight question though; shouldn't hash data be transmitted via HTTPS if the purpose is to avoid MITM attacks? I can understand the binary downloads themselves transmitted via insecure HTTP due to size/cpu resources, but the hash needs to be signed and protected in some manner otherwise it can be easily spoofed by an attacker.