Bug 671093 (open) - Opened 13 years ago - Updated 2 years ago

Allow sites to specify a resource's hash in addition to its source, so the browser can skip downloading the resource if it already has a copy

Categories: (Core :: DOM: Core & HTML, defect, P5)

People: (Reporter: justin.lebar+bug; Assignee: Unassigned)

jst mentioned this to me earlier today.

The main use-case is for javascript libraries.  A site could say "I want the script with this SHA-256 hash, and if you can't find it in your cache, here's where you can download it from."
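
For concreteness, a strawman of what the markup could look like (the attribute name and value format here are invented purely to illustrate; the digest is a placeholder):

  <script src="https://ajax.example.com/libs/jquery/1.6.2/jquery.min.js"
          hash="sha-256;<64-hex-digit digest of the script>"></script>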

Here are my initial thoughts about this:

* I think we'd probably want sites to opt in to sharing a script.  Otherwise, evil.com might be able to tell if I've visited bank.com recently by asking for a script whose hash matches one on bank.com.

This does limit the immediate usefulness of this feature, since I can't specify a hash on my own site and expect to go faster due to scripts downloaded from other sites, but I don't see a way around this.

* You'd want the opt-in mechanism to be something more flexible than "add a same-origin script and specify its hash", because some scripts (e.g. Google Analytics) are rarely or never retrieved from a same-origin page.  I'm not sure what the right way to do this is.

* It's not clear if this would be useful for resources other than scripts.

* I don't think we should hard-code in a specific hash algorithm.  Some years ago, we would have chosen MD5 or SHA-1, but neither of those is a secure choice now.  Sites should have to explicitly state what algorithm they used, and we should reserve the right to deprecate (and subsequently ignore) algorithms in the future.

* We should also keep the list of accepted algorithms short (length 1?), since the more algorithms we accept, the more likely it is that two sites will specify the same resource but use different hash algorithms, causing an unnecessary miss.

Jonas, Chris, and Brendan, jst mentioned you'd already thought about this some.  What are your ideas?
In addition to the concerns you've raised, I'm concerned about cache poisoning, i.e. evil.com crafting a script with the same hash as one used by a specific version of jQuery.

Hash collisions have happened in the past, though I think only when the attacker has been able to control both pieces of content, and possibly only with weaker algorithms.
(In reply to comment #1)
> In addition to the concerns you've raised, I'm concerned about cache
> poisoning, i.e. evil.com crafting a script with the same hash as one used by
> a specific version of jQuery.
> 
> Hash collisions have happened in the past, though I think only when the
> attacker has been able to control both pieces of content, and possibly only
> with weaker algorithms.

I think we'll just have to pick a strong hash algorithm.  If you can generate collisions on secure hash functions, I think you can forge signatures on SSL certs, so we're kind of screwed anyway.

In the past, there's been plenty of warning before a collision is found in a hash algorithm.  If we were to pick SHA-256, I imagine we'd be safe for at least a few years after SHA-3 is done.
cc'ing security folks to get their thoughts on this.
> * I think we'd probably want sites to opt in to sharing a script. 
> Otherwise, evil.com might be able to tell if I've visited bank.com recently
> by asking for a script whose hash matches one on bank.com.

There are plenty of ways an attacker can use the cache to find out whether you've visited bank.com recently. (Maybe we'll solve those eventually, I don't know.)

What's different here is that the attacker can query based on the *contents* of the resource, not just whether it is cached.  I guess that's a problem with link fingerprints (bug 292481) too!

> * You'd want the opt-in mechanism to be something more flexible than "add a
> same-origin script and specify its hash", because some scripts (e.g. Google
> Analytics) are rarely or never retrieved from a same-origin page.  I'm not
> sure what's the right way to do this.

Perhaps "Cache-control: public" or "Cache-control: public hash".

Another possibility is to use a special URL scheme. Then a site "opts in" by responding with 200 OK and content that matches the hash.  And we don't need a new HTML attribute to specify the hash, because it's part of the URL.

http://example.com/.well-known/sha-256/da39a3ee5e6b4b0d3255bfef95601890afd80709

When we request such a URL, we could omit our Accept and Accept-Language headers, saving a few bytes.

Certain headers, such as Content-Type and Content-Disposition, would be considered part of the "content" that gets hashed.
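
Under that assumption, the digest could be computed with standard tools over the pinned headers plus the body, e.g. for a local copy of the script (the exact header set and canonicalization would need to be specified; filename is a placeholder):

  $ { printf 'Content-Type: application/javascript\r\n\r\n'; \
      cat jquery.min.js; } | sha256sum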

> * It's not clear if this would be useful for resources other than scripts.

I could imagine it being used for fonts.  Fonts are referenced from CSS, rather than HTML elements you can stick an attribute on, so they would be happy with the hash-in-URL scheme.
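
For example (same URL shape as above; digest is a placeholder):

  @font-face {
    font-family: "Example Sans";
    src: url("http://fonts.example.com/.well-known/sha-256/<64-hex-digit digest>");
  }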
(In reply to comment #4)
> What's different here is that the attacker can query based on the *contents*
> of the resource, not just whether it is cached.  I guess that's a problem
> with link fingerprints (bug 292481) too!

Yes, this is the main reason that it must be opted into by the party hosting the file.

> http://example.com/.well-known/sha-256/da39a3ee5e6b4b0d3255bfef95601890afd80709

I think security people would like this more than real web content authors. I imagine content authors would like to be able to tell that this is jQuery from the URL.

> I could imagine it being used for fonts.  Fonts are referenced from CSS,
> rather than HTML elements you can stick an attribute on, so they would be
> happy with the hash-in-URL scheme.

Also images like the RSS icon.
> http://example.com/.well-known/sha-256/da39a3ee5e6b4b0d3255bfef95601890afd80709

It also seems easier to modify Apache to send a new header than to get it to understand this scheme.
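
For example, with mod_headers it would presumably be a one-liner (assuming the "Cache-Control: public hash" form of the opt-in suggested above):

  <FilesMatch "\.js$">
    Header set Cache-Control "public hash"
  </FilesMatch>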
Together with Sid Stamm and Ben Adida, we have already started to work in this direction and have a short description of the feature here: https://wiki.mozilla.org/Security/Features/Content_Hashing

If the project can get more traction, that would be great :)
The feature page currently has two use-cases.  The first is the one we all agree on, for speeding up downloads of your favorite JS library.  The second is: 

> Untrusted CDN - a site may use a content distribution network (CDN) to serve 
> the majority of their site (images, stylesheets, scripts) but may not want to 
> rely on them to serve the right files. The site can serve the document and 
> reference CDN-hosted content with hashed references to provide an integrity 
> check.

I'm curious how important we think this is, in comparison to the first use case.  There's likely to be at least some complexity involved in supporting it, since we'd have to leave the downloaded content unused until we finish downloading it and can compute its hash.  Normally we try to use bits we've downloaded as soon as they arrive (for instance, we'll incrementally display an image).
For caching, I think a key question is: when a hash is specified, is it okay to re-use data that was cached by another website and has the same hash, disregarding the same-origin policy?  This can potentially lead to a privacy leak.

The untrusted CDN case is very important: it will allow websites such as eBay to defeat image-swapping attacks (where the image is replaced by another after approval).  Gravatar already uses some sort of hashing to fight this kind of problem.

This is also important for fighting malware distribution via ad networks, because it will allow ad companies to "freeze" the content of the ads they are serving while leaving their clients in control.
The "second use case" is similar to bug 292481.  I think it's more important, since it would allow sites like mozilla.com to make software downloads secure.  The other one just lets sites make things faster ;)
After digging around, it seems we have a duplicate: bug 292481 (link-fingerprints) - Support link fingerprints for downloads (file checksum/hash in href attribute).

Is someone able to merge the two?

Good news on our side: we have a working prototype of Firefox which supports the hash as an attribute :)  We will run performance tests in the upcoming weeks.
This may be the key to finally getting Amazon.com, etc. to
provide trustworthy security against Firesheep, etc. -

IF this can be used to validate mixed content as secure - 

If ALL images, js, etc loaded from http://images.amazon.com 
have been validated by hashes on the main page from https://amazon.com,

_and_ ALL cookies from https://amazon.com have the secure bit set,

suppress the mixed-content warning!

We will need a new lock or ~ to indicate "secure but NOT totally private".
Any ideas?
(maybe also add a pref for a notice-bar that your privacy is not guaranteed)

1. No needless encrypting of content 10% of the planet already has in their cache
     (like all static, non-stock images on etrade.com)
2. Easy deduplication of content from images[0-9].site.com
3. User can choose privacy or speed - 
     If you value your privacy, use https-everywhere or ~ to get secure images.
     If you cache secure content, 
       you still get secure CDN deduplication by the hash - AND faster cache - 
       no secure-conn setup, etc. to the CDN to check for changes!
-or - 
     Choose speed - 
        We never need to burden a server with setting up more than 1 
        secure connection per session - Secure pages can get static content 
        (90%+?) from server-side or CDN caches.
        (Image/CDN Servers could even be set to prioritize http over https)

We definitely need to save the origin with the hash - 
perhaps we need something like CSP or Origin-headers that list servers that
can serve content for this page, cache the list, and save a link to that list
with the hashes - so evil.com's link to images.amazon.com never sees anything
fetched by any amazon.com page as cached, much less hashed.

But we already need that, ever since the add-on safecache was abandoned.
( https://addons.mozilla.org/en-US/firefox/addon/safecache/?src=search )

Hashes: 
Unless you are Amazon or ~ fortune 500 .com, 
sha1 is usually secure enough - for 5 years or so (set the expire).
If one could steal $10M from your customers w/o them noticing, use sha256.
If you have secret gov't contracts, or serve your IP crown jewels, 
don't use hashes.
https://bugzilla.mozilla.org/show_bug.cgi?id=1472046

Move all DOM bugs that haven't been updated in more than 3 years and have no one currently assigned to P5.

If you have questions, please contact :mdaly.
Priority: -- → P5
Component: DOM → DOM: Core & HTML
Severity: normal → S3