Open Bug 292481 (link-fingerprints) Opened 19 years ago Updated 2 years ago

Support link fingerprints for downloads (file checksum/hash in href attribute)

Categories

(Toolkit :: Downloads API, enhancement)

x86
All
enhancement

People

(Reporter: jens.b, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: sec-want, Whiteboard: [sg:want])

Attachments

(1 file, 1 obsolete file)

from http://www.gerv.net/security/link-fingerprints/ :
> This is a method of making an HTTP URL reference not only a
> particular resource, but a particular version of that resource,
> in a way that the fetching client can validate.

I think this should be implemented by Firefox. It's a nice, unobtrusive feature
that could be of real benefit for downloading files.

This bug is intended for discussing which hash algorithms to support, how the
user interface should indicate a "checked" download, and eventually for tracking
the implementation.

As MD5 is most likely one of the algorithms to be provided, adding dependency on
bug 92054.
Alias: link-fingerprints
Blocks: 101743
see also bug 292368
Depends on: 292368
Removing dependency on bug 92054 - link fingerprints need multiple algorithms
(not only MD5), so the generic solution of bug 292368 is more appropriate.
No longer depends on: 92054
Related to bug 292481 (Content-MD5 header).

Adding the checksum to the link allows a website to store its data on remote
mirrors (like the mozilla.org download servers) and still be sure that the
content wasn't altered. A Content-MD5 check would protect against a MitM
attack, where the content is altered on the fly.

Of course, a Content-MD5 header can also be faked or modified by a MitM attacker;
I'm not claiming it's the solution for all phishing problems. It's just an extra
check that we need to implement.
This bug does not block bug 101743; that bug could be implemented without this
one. This bug also has nothing to do with Content-MD5, which protects against
corruption in transmission (something entirely different) and again could be
implemented independently of this one.

Gerv
No longer blocks: 101743
The problem is that the ASCII HTML file might be altered as it is uploaded from
the author's computer to the Web server.  The most common alteration is changing
the PC end-of-line CR-LF (x0D0A) to the UNIX end-of-line LF (x0A).  While this
might be reversed when downloading the file via FTP, Mozilla and Firefox do not
reverse the alteration when downloading into a browser window.  I reported this
as bug 211130, which was closed as a duplicate of bug 38121 (which does not
clearly indicate the impact on verifying Web pages against fingerprints).    

Other alterations are sometimes added by Web servers.  HTML files at Geocities
had scripts added to bring up advertisements or at least a Geocities watermark,
neither of which was intended by the page authors.

Such alterations invalidate any checksum used as fingerprints.  

I did some extensive testing about two years ago on the concept of using PGP for
digitally signing Web pages.  While a binary file -- transferred from one host
to another without any alterations -- can be signed and then later verified, an
ASCII file cannot unless the alterations are exactly reversed.  Note that
digitally signing an ASCII E-mail message works because (1) the signature
applies only to a section (the message block) within the file and not to the
entire file and (2) the verification process then takes into account the
alterations (per RFC 2440).  

In the two years since I did that testing, I have come to the conclusion that
verifying Web pages against checksums or digital signatures will not work in a
global context of varying hardware hosts, operating systems, and browsers.  
(In reply to comment #5)
> In the two years since I did that testing, I have come to the conclusion that
> verifying Web pages against checksums or digital signatures will not work in a
> global context of varying hardware hosts, operating systems, and browsers.  

The concept of link fingerprints is intended for binary downloads. If it were
extended to web pages, we would have to alter/override Gecko's behaviour for
named anchors - for binary downloads, though, the fragment identifier currently
has no defined meaning, and can be used without causing any conflicts or
"re-definitions".
How would this be restricted to binary downloads?  

Note that Web pages are not the only ASCII files that might be targeted by a URL
in the HREF attribute of an anchor.  The problem I describe in comment #5
applies to all ASCII files, not just HTML files.  

As an example of that problem, I took the MD5 hash of five versions of one of my
own Web pages that I know to be non-dynamic: selecting the link and downloading
it via "Save link target as", loading the page and saving it via "Save page as",
the working copy on my local hard drive, downloading via WS_FTP95 LE, and
downloading via DOS FTP.  The last three all gave the same hash.  The first two
each gave a different hash.  I got similar results for a TXT file that is on my
Web server, except the first two versions gave the same hash (different from the
last three).  I am confident similar results would be obtained for a CSS file.  

I do use MD5 hashes to verify the integrity of downloaded binary files.  This
includes downloads of new virus definition data packages and downloads of new
versions of Mozilla Suite.  However, I prefer getting the file's original hash
from an ASCII file or visually on the Web page.  Currently, I compute the hash
for the downloaded file via an MD5 application; I would prefer to see bug
#101743 implemented to provide this automatically for the downloaded file.  
(In reply to comment #7)
> How would this be restricted to binary downloads?

It would not be restricted by technical measures.

Basically, you are saying that hashing doesn't work for plain text files - the
reasons for this may be browsers adding information to saved files (in case of
"Save page as..."), charset issues or using the FTP client's ASCII mode when
uploading the file.

IMO, this is no argument against implementing link fingerprints - after all, the
primary goal of the spec is to provide a way to automate hash verification for
downloaded *software*. Perhaps this should just be made clearer in the spec, so
no one tries to use it for plain text files.
Summary: Support link fingerprints (file checksum in href attribute) → Support link fingerprints for downloads (file checksum in href attribute)
Have redirects been considered? IMHO hashes should only be considered "trusted"
if they came from a link that was explicitly clicked on by a user or triggered by
trusted code, e.g. chrome.

Either redirects should be disallowed, or the original hash must be compared
against the final file, ignoring any new hashes which may have been presented
during the redirect chain.

A small fear I have is that people will believe this is a replacement for
signed binaries. It is not. It doesn't do anything to stop MITM attacks (unless
used along with SSL), or hacked servers which supply the original link.
There's currently a link fingerprint implementation in MDHashTool:
http://mdhashtool.mozdev.org/ . I guess we can use this extension to work out
some of the practical issues involving link fingerprints. For example, as Chris
says, we need to keep the original fingerprint during any redirects.

Gerv
*** Bug 330315 has been marked as a duplicate of this bug. ***
GetRight 6 ( http://www.getright.com/ ), a popular Windows download manager, supports link fingerprints.

So does FlashGot ( http://www.flashgot.net/ ), a Firefox extension that works with around 30 download managers. If the download manager supports checksums (GetRight, Free Download Manager, iGetter, probably more), then FlashGot could probably be modified to pass the checksum info to them.
How would the hash be added to the link within the constraints of maintaining compliance with RFCs and W3C specifications?  Would this require an anchor attribute that is non-standard?  Or would it require a non-standard suffix to the URI?  Remember, a significant claim for Mozilla products is that they are standards-compliant.  Further, would a Web page with such a link validate as compliant with HTML or XHTML specifications?  

How would this protect against an attack that alters both the file to be downloaded and the hash in the link?  If a hacker can change the file, changing the link should be trivial.  

See also my comment (28) about digestIT in bug #101743.  
(In reply to comment #13)
> How would the hash be added to the link within the constraints of maintaining
> compliance with RFCs and W3C specifications? Would this require an anchor
> attribute that is non-standard?  Or would it require a non-standard suffix to
> the URI?

How the user agent interprets the fragment identifier (the thing after the # sign) is not standardized, except when the link target is an HTML/XHTML document. I do not see any spec violation in using it for linking to binary files.

> Further, would a Web page with such a link validate as
> compliant with HTML or XHTML specifications?  

Why should it not? It is a completely valid URI to a resource on some server, placed in the 'href' attribute of the 'a' tag in HTML/XHTML. What it links to (or whether it exists) is not relevant for the validity of the web page.

> How would this protect against an attack that alters both the file to be
> downloaded and the hash in the link?

It does not, and does not need to. Remember, if a hacker can alter the linking web page, he could also change the MD5 checksums that are placed beside many download links nowadays.

IMO link fingerprints are not primarily about enhanced security (that's what SSL, among others, is for) - they are about improved convenience, because the browser automatically validates files with checksums published on web pages, instead of a power-user doing it manually.

> If a hacker can change the file, changing the link should be trivial.

Of course, but most downloads are served from different hosts than the web page - think of mirror networks.
They are partly about improving security - it's a bar-raising exercise.

Currently, to get people to download a trojaned binary, an attacker has to hack a single server in your mirror network. Your mirror network may not be under your control - this is often the case for free software projects - and its security is unknown.

With link fingerprints, they also have to hack your main webserver and change the checksum to match (in which case, it won't match any copies of the original elsewhere on the network, if any). This is quite a lot harder - it usually means hacking two machines rather than one.

In addition, often URLs to security updates are distributed through email. You can't break into everyone's mailboxes and change the email to match your trojan.

So it is partly about security. I don't claim it solves every problem ever, but it makes things more difficult for an attacker, with the great bonus that it's completely transparent and backwardly-compatible. Worse is better.

Gerv
(In reply to comment #15)
> In addition, often URLs to security updates are distributed through email. You
> can't break into everyone's mailboxes and change the email to match your
> trojan.

Clicking a URL to something called a security update in an email doesn't sound like a particularly good idea...
I guess unless you are pretty sure that the domain is right and there is no second IDN-like attack.
Companies like Red Hat send out security notification emails all the time. Example:
https://www.redhat.com/archives/enterprise-watch-list/2006-June/msg00007.html

If link fingerprints were implemented, the checksums present in that email could be integrated into the links and would be automatically checked for you by your browser.

Gerv
Given repeated warnings about phishing attacks and viruses, I am very leery about downloading software from an E-mail link.  
Red Hat sign their security emails, and your mail client can verify the signature automatically.

But we're getting off the point here - the idea is not necessarily to establish a watertight chain of trust all the way to the file, the idea is to make crackers lives harder in a backwardly-compatible way.

One way of measuring this would be to look at the various trojaning incidents over the past few years and see if any would have been averted, or more easily spotted, if link fingerprints were in use.

Gerv
Summary: Support link fingerprints for downloads (file checksum in href attribute) → Support link fingerprints for downloads (file checksum/hash in href attribute)
I use MD5 and SHA1 hashes to verify the integrity of the transfer of large files across a network.  In this case, I am not concerned about a hostile attack.  I'm merely concerned that the transfer did not corrupt the file.  Yes, I do occasionally find that a file was indeed corrupted and must be transferred again.  

However, both MD5 and SHA1 are now known to contain vulnerabilities.  While both are still valid for the use I indicate above, their continued use to verify a lack of hostile attacks on files is now questionable.  For example, see <http://www.mccune.cc/PGPpage2.htm> and search for the terms "MD5" and "SHA1".  

Should Firefox be tied to a particular hash function by this RFE?  Will security fixes be released if that function proves ineffective for protecting against hostile attacks?  In that case, what happens to the Web pages that already implement the ineffective function?  

Somehow, I think the use of SSL and X.509 certificates -- existing capabilities -- is the proper way to handle downloads where there is concern for hostile attacks.  Anything beyond that should be handled outside of browsers and E-mail clients.  
> However, both MD5 and SHA1 are now known to contain vulnerabilities.

I think SHA-256 is still believed to be safe.

> Should Firefox be tied to a particular hash function by this RFE?  Will
> security fixes be released if that function proves ineffective for protecting
> against hostile attacks?

With MD5, I believe it is currently fairly easy to create two colliding chunks but not yet easy to make a file that hashes to a given value.  This isn't a big deal for verifying official software downloads, but history suggests that it will be more broken soon.  So when we add this feature, it shouldn't support MD5.

If we ship with SHA-256 support and someone breaks SHA-256 in 2016, then depending on how broken it is, we can refuse to follow the link, make the UI look "slightly less secure", or make no changes.
> Somehow, I think the use of SSL and X.509 certificates -- existing capabilities
> -- is the proper way to handle downloads where there is concern for hostile
> attacks.

When I filed bug 358384, I was initially inclined to agree with you.  But justdave made it clear in that bug that upgrading our (volunteer!) download mirror network to support https would be a major headache compared to making mozilla.com use https and provide a hash attribute with each download link.

One-to-one encryption is a lot of overhead if you use it for an entire 5MB download: each mirror has to have a certificate and the processing power to support SSL for a bunch of concurrent downloads.  If a single mirror is hacked, SSL doesn't protect users who hit that mirror.

I imagine that other software providers (especially OSS and freeware providers relying on volunteer mirrors but also large software providers such as Skype and Microsoft) face similar issues and would prefer the "https site with download link that has a hash" approach to the "https for entire download" approach.
If I understand http://en.wikipedia.org/wiki/Merkle-Damgard_hash_function
correctly, it should be possible to compute the SHA-256 of a file incrementally
while downloading it.   So there need not be a separate "verification" step
that makes these downloads significantly slower :)
(In reply to comment #24)
> If I understand http://en.wikipedia.org/wiki/Merkle-Damgard_hash_function
> correctly, it should be possible to compute the SHA-256 of a file incrementally
> while downloading it.   So there need not be a separate "verification" step
> that makes these downloads significantly slower :)
> 

Yes, all hash functions work incrementally, and the computation can be done while still downloading. Otherwise, verifying a 300 MB download would take a really long time!
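A minimal sketch of that incremental approach, using Python's hashlib purely for illustration (the function and parameter names are placeholders, not Firefox code). The expected digest is taken from the original link, so any redirects the HTTP library follows are irrelevant: only the final payload is hashed and compared, as the redirect discussion above suggests.

```python
import hashlib
import urllib.request

def download_and_verify(url, algorithm, expected_hexdigest, dest_path, chunk_size=64 * 1024):
    """Hash the payload as it arrives; return True if it matches the fingerprint."""
    hasher = hashlib.new(algorithm)                      # e.g. "sha256"
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            hasher.update(chunk)                         # no second pass over the file
            out.write(chunk)
    return hasher.hexdigest() == expected_hexdigest.lower()
```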
> Should Firefox be tied to a particular hash function by this RFE?

Absolutely. If we don't define a limited set of hash functions, then interoperability is greatly damaged. 

The logic is: we have MD5 because it's short and convenient for non-security applications, and SHA-256 because it's the best widely-used algorithm currently available for security-sensitive applications. Perhaps that logic is bad because people might use MD5 for security-sensitive applications anyway. But SHA-256 hashes are so long as to be unwieldy if, for example, you are sending a number of these URLs in an email. It's a hard trade-off.

Computing the hash incrementally during the download is a very sensible idea.

Gerv
Whiteboard: [sg:want]
(In reply to comment #15)
> They are partly about improving security - it's a bar-raising exercise.
> 
> Currently, to get people to download a trojaned binary, an attacker has to hack
> a single server in your mirror network. Your mirror network may not be under
> your control - this is often the case for free software projects - and its
> security is unknown.

How should errors such as 404 be handled? Should the returned page be checked against the checksum? If not, a hacker could make the hacked server return a 404 error to trick the browser into aborting the checksum calculation. Then, in the 404's HTML page, they could make it look like a redirection page with another link to a trojan, without the original checksum.
Hmm. So you are saying that the 404 would be returned, but would have a <meta refresh=0> or something to direct to a trojaned copy of the download. 

We could deal with this in one of several ways:

- Have an explicit "failure" notification, which would pop up when the 404 failed to match the checksum
- Have an explicit "success" notification, which would _not_ pop up when the trojan downloaded, because the checksum is no longer active.

I prefer the former. You are right: the checksum should be applied to whatever content gets returned, and an error explicitly shown if it fails.

Gerv
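A sketch of the "explicit failure" behaviour described above: whatever body the server returns, including a 404 error document, is hashed against the fingerprint, and a mismatch is reported loudly rather than the check being silently skipped. The names here (strict_fetch, FingerprintMismatch) are illustrative placeholders, not the actual patch.

```python
import hashlib
import urllib.error
import urllib.request

class FingerprintMismatch(Exception):
    """Raised when the downloaded bytes do not match the link fingerprint."""

def strict_fetch(url, algorithm, expected_hexdigest):
    try:
        body = urllib.request.urlopen(url).read()
    except urllib.error.HTTPError as err:
        body = err.read()   # a 404 (or any error document) is hashed too, never trusted
    if hashlib.new(algorithm, body).hexdigest() != expected_hexdigest.lower():
        raise FingerprintMismatch("content does not match the link fingerprint: " + url)
    return body
```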
Depends on: 377245
Attached patch v1 (obsolete) — Splinter Review
Assignee: nobody → edilee
Status: NEW → ASSIGNED
Attachment #268180 - Flags: review?(cbiesinger)
patch comments:

Look for the Link Fingerprint error status and delete the file from the download manager. This works for downloads explicitly started from alt-click or save link as.

Exthandler will check for the error status for pages that start loading and are only afterwards turned into downloads.

After bug 384247 is fixed, this will work for HTTP downloads.
Blocks: 377245
Depends on: 384246
No longer depends on: 377245
Depends on: 383716
Does the patch allow the user to override the hash failure?  Alternatively, does the patch provide a way to turn off this feature?
(In reply to comment #31)
> Does the patch allow the user to override the hash failure?  Alternatively,
> does the patch provide a way to turn off this feature?

Nope to both.

Long answer/question: What do you mean by override? Something in the download manager UI to recover a download that was deleted because the link fingerprint mismatched? We don't want to endanger/confuse the average user, so no. If you really wanted to, you could take the download link, strip off the link fingerprint, and try again.

What part of "this feature" to turn off? The file removal or the alerting the user that the file was corrupted?

Comment on attachment 268180 [details] [diff] [review]
v1

>+linkFingerprintError=%S could not be saved because the file has been corrupted.\n\nPlease contact the site administrator.
That's pretty vague. How would the site admin know what's wrong if you don't mention that the downloaded file doesn't match the link fingerprint?
(In reply to comment #33)
> (From update of attachment 268180 [details] [diff] [review])
> >+linkFingerprintError=%S could not be saved because the file has been corrupted.\n\nPlease contact the site administrator.
> That's pretty vague. How would the site admin know what's wrong if you don't
> mention that the downloaded file doesn't match the link fingerprint?

As the administrator, I'd even assume that the user has a local problem, given that it's usually possible to save corrupted files.

I suggest this:
%S was not saved because it appears to be corrupted.\n\nPlease contact the site owner.
Steffen has a good point, although of course we can improve the error messages at any time. I also think that the admin will want to know where the source of the link is, so that he can go and fix it (if it's actually the link rather than the file that's broken, and it's in his control). So we should print the Referer. My attempt:

%S was not saved because the file appears to be corrupted.\n\nPlease contact the owner of the <a href="">page you came from</a>, telling them:
Target: %S
Referer: %S

Of course, it depends how this info is presented. If it's in a modal dialog box, links aren't going to be possible. And we should make the info copy-and-pasteable.

Gerv
Attached patch v2 — Splinter Review
- Fix nits (comments, void)
- Error message: %S was not saved because the file appears to be corrupted.
Attachment #268180 - Attachment is obsolete: true
Attachment #268180 - Flags: review?(cbiesinger)
Edward, any progress here, or has your patch gotten lost?
This will not be making Firefox 3
Bug 377245 got WONTFIXED, so reassigning to defaults.
Assignee: edilee → nobody
Status: ASSIGNED → NEW
Product: Firefox → Toolkit
Bump.

What if we extend this conversation to include handling of file.ext.hash download links by comparing them against recently downloaded files? For instance, FileZilla Server offers two downloads next to each other.

http://filezilla-project.org/download.php?type=server

Eg:

FileZilla_Server-0_9_39.exe
FileZilla_Server-0_9_39.exe.sha512

Clicking the second link could automatically validate the first download, or bring up the Download Options dialog.

I understand this task can be performed by an external process, but there's no reason it couldn't/shouldn't be integrated into Firefox, which already implements these cryptographic hash functions.
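For illustration, a sketch of that companion-file check, assuming the .sha512 file uses the usual sha512sum output format (hex digest followed by the file name); the file names shown are placeholders, not anything Firefox ships.

```python
import hashlib

def verify_against_companion(file_path, companion_path, chunk_size=64 * 1024):
    """Compare a download against the digest listed in its companion hash file."""
    hasher = hashlib.sha512()
    with open(file_path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    with open(companion_path, "r", encoding="ascii") as f:
        expected = f.read().split()[0].lower()   # first token is the hex digest
    return hasher.hexdigest() == expected

# e.g. verify_against_companion("FileZilla_Server-0_9_39.exe",
#                               "FileZilla_Server-0_9_39.exe.sha512")
```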
MirrorBrain, which serves the FSF, OpenOffice.org, LibreOffice, openSUSE, etc. mirrors, supports file.ext.hash as well.

Another alternative would be to support Instance Digests (RFC 3230), which give the hash in an HTTP header field. This also integrates with Metalink (RFC 6249), for providing mirrors in HTTP headers.
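For reference, a hedged sketch of what computing an RFC 3230 instance digest for a file might look like; the SHA-256 token and base64 encoding follow my reading of RFC 3230/RFC 5843, and the function name and path are placeholders for this example only.

```python
import base64
import hashlib

def instance_digest_header(path, chunk_size=64 * 1024):
    """Return an RFC 3230-style Digest header line for the given file."""
    hasher = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    # RFC 3230 digests are base64 of the raw digest bytes, not hex.
    return "Digest: SHA-256=" + base64.b64encode(hasher.digest()).decode("ascii")
```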
One slight question, though: shouldn't hash data be transmitted via HTTPS if the purpose is to avoid MITM attacks? I can understand the binary downloads themselves being transmitted via insecure HTTP due to size/CPU costs, but the hash needs to be signed and protected in some manner; otherwise it can easily be spoofed by an attacker.
See Also: → 992096
For distribution of open-source software using mirrors, this would be very helpful, since we can't completely trust the mirrors (even if they are serving files over HTTPS). Obviously this would require HTTPS on the origin website to be effective, but not necessarily on the target.

I know it's a 12-year-old issue, but it would solve many problems related to trust between a publisher's website and a file served by a different host that may or may not be trusted.
+1. A good additional layer to other controls.
Just out of curiosity, what is the holdup in committing the patch and closing this bug?
Blocks: 1565128
Severity: normal → S3