Tracking bug for SoC project
Bug 292481 is for link fingerprints in the context of just downloads.
Link fingerprints could possibly be supported at a lower (network) level to inherently verify embedded images, scripts, objects, and css (and backgrounds referenced from those css files).
Basically, the network layer would need to pretend that it has no data to the higher layers until it finishes receiving the whole file and verifies the link fingerprint. If the file isn't as expected, the higher level would see the file as "not found".
However, that behavior isn't entirely desirable for file downloads. Users will want to see download progress and should probably be informed when the file isn't what the provider intended.
(The network /could/ make the data available even after failing a check, because we've already spent all that bandwidth transferring the file, and an advanced user might just end up requesting the file again after manually removing the link fingerprint.)
(In reply to comment #1)
> Basically, the network layer would need to pretend that it has no data to the
> higher layers until it finishes receiving the whole file and verifies the link
> fingerprint. If the file isn't as expected, the higher level would see the file
> as "not found".
In that approach, how would you deal with things like progressive JPEG encoding, where you already see something before the whole image is loaded?
(btw, tracking bugs should depend on other bugs, not block them: this bug blocks bug 377241, but should depend on bug 292481)
(In reply to comment #2)
> how would you deal with things like progressive JPEG
That was part of the reasoning for implementing this at a lower level: make the request "all or nothing". This prevents partial viewing of an image that might have been swapped for an obscene one. The browser requests the image as normal, and the only difference is that the network seems to have high latency but also super high bandwidth. (The file is only available after the whole thing transfers.) But again, that might not be entirely desirable from a UI perspective..
One of the first steps is to figure out whether this approach of modifying the network is even appropriate. cbiesinger, darin.moz, bzbarsky (a few names I found on patch reviews while researching for the SoC application) should be familiar with necko and could comment on the feasibility of doing this. Otherwise, we'll go back to the original plan of supporting just downloads as the first step.
> should depend on bug 292481
I figured if this is implemented as superset, the depends is the other way. But I suppose that this /is/ a tracking bug.
Edward: I would suggest that a newsgroup is a more appropriate forum for design discussions than this bug, which only has a few people reading it. Where would you like to start? mozilla.dev.tech.network? Or perhaps mozilla.dev.apps.firefox?
It seems to me that the key benefit of Link Fingerprints is avoiding (or noticing more quickly) trojaned downloads. So if implementing at the network layer actually made it _less_ useful for this, I'd be concerned. I'm not all that worried about changed images. Link Fingerprints protects you best when the link and the target file are on separate servers or in different locations (otherwise a hack can compromise them both). So validating the component parts of a website isn't a priority use-case.
(In reply to comment #4)
> So if implementing at the network layer actually made it
> _less_ useful for this, I'd be concerned.
Right, but the network could also provide progress status to the higher layer, which would require modifications to both layers. The alternative is hand-picking certain URI requesters (e.g., downloads) and having each keep its own local buffer before writing to the download destination. I'll ask about these networking things on .tech.network.
> validating the component parts of a website isn't a priority use-case.
Not all components need to be from the same location. While hot-linking images is generally bad, some places provide it as a service (e.g., flickr, photobucket, imageshack). Displaying the wrong image typically is safe from the computer's point of view, but there /have/ been exploits of buffer overrun type bugs in the past.
I believe extension downloading doesn't use the same code as file downloading at the higher end, but link fingerprints could help allow mirrors for a.m.o. Along similar lines, Greasemonkey scripts are downloaded/installed just by viewing a script file in-page. Both of these allow somewhat privileged code to run in Firefox - could be bad if hijacked.
Just a couple more examples.. object files can potentially be anything, so verifying Flash (used by so many video hosting sites that want people to embed the content) could prevent trojan-like behavior. And if someone hacks googlesyndication and changes the page ads script to have some exploit.... brendan is there to make sure there isn't one to begin with! :)
[Code] Last couple weeks I started playing with Mozilla C++ XPCOM and did a quick implementation of Link Fingerprints for the Download Manager, which will help me implement Link Fingerprints at the network level (i.e., channels). I'll finish up the initial channel implementation by conditionally deciding to do Link Fingerprints and comparing hashes to conditionally fail the transfer.
[RFCs] Looked into existing RFCs and internet drafts to see possible scopes for our RFCs (a general extension to #fragment-ids (e.g., #!type!data) and Link Fingerprints). The URI general syntax says the semantics of #fragment-ids are per MIME type (e.g., application/gzip or text/plain (which has an internet draft requesting new #fragment-id functionality)), so right now, requesting an all-encompassing RFC seems tricky.
Last 2 weeks (2007/05/13 - 2007/05/26):
- Graduate 5-yr MS/BS in CS from UIUC and fly home to California bay area
- Settle in at Mozilla Mountain View, discuss implementation with Dan Veditz
- Investigate RFCs and discuss on m.d.a.firefox
- Research the Mozilla C++ codebase to learn XPCOM, pointers, interfaces, fun
- Practice what I learned by partially implementing Link Fingerprints in Download Manager
- Begin implementing Link Fingerprints (as a stream converter) in the network to handle all requests and not just downloads (currently it prints the md5 hash of all HTTP requests)
This week (2007/05/27 - 2007/06/02):
- Start (officially) Summer of Code (2007/05/28)
- Communicate with DownThemAll developer (Nils Maier) to see what we can share
- Add checks to nsHttpChannel to only do Link Fingerprint stuff if the URI contains a Link Fingerprint-like reference and if it's not a partial Range request
- Actually compare hashes and not just print out the computed hash ;)
- Find Erik Wilde (#fragment-id for text/plain author) to ask about RFC stuff
Created attachment 266400 [details] [diff] [review]
download manager reference C++ patch - prints hashes of downloads
Quick 'n' dirty partial implementation of Link Fingerprints for the Download Manager. This computes the md5 hash of the file after it finishes downloading and prints the text version of the hash.
[Code] Got automatic hash checking working in the channel so it gives an error status on Link Fingerprint failure: loading bad-hash pages results in an error page, and some images (gif, bmp) show up as a broken image; others like jpg load progressively and show whatever they have on failure (i.e., the whole thing).
[RFCs] Looking into other #fragment-id uses (XML XPointers and PDF) and references from Wilde and Baschnagel's text/plain paper in HT 2005 (Sixteenth ACM Conference on Hypertext and Hypermedia).
Last week (2007/05/27 - 2007/06/02):
- Implement hash comparisons and pass on an error code from OnStopRequest to listeners downstream
- Add a new network error page for Link Fingerprint failures on page view (need a better string..)
- Discuss implementation details on m.d.t.network, clarifying that my current implementation just provides a new error code after the transfer finishes
- Attend Google Dev Day (Thursday)
- Invite Erik Wilde to discuss RFC stuff and Link Fingerprint issues
- Begin handling of Link Fingerprint failures in the consumers like download manager (vs webbrowserpersist/exthandler)
This week (2007/06/03 - 2007/06/09):
- Remove/delete failed downloads cleanly from the download manager
- Refactor added code in HttpChannel so adding Link Fingerprints to other channels is clean/simple
- Look into the image library to handle Link Fingerprint transfer failures
- Figure out what interfaces to provide (for extensions) e.g., exist?/get fingerprint from URI, check fingerprint against file/stream
- Note: Next week I'll be out - attending FCRC 2007 in San Diego 9th to 13th
[Code] Things are working fine for me. Waiting for reviews.
[RFCs] Contacting people on how to draft Link Fingerprints for IETF's July 2nd meeting.
Last 2 weeks (2007/06/03 - 2007/06/16):
- Done initial coding to support Link Fingerprints for HTTP downloads/pages
- Open various bugs to break the patch into pieces - waiting for reviews.
- (Submit to MICRO; Attend FCRC for ISCA, PLDI, HOPL - majority of last 2 weeks)
- Fix related download manager bugs (and other random bugs..)
This week (2007/06/17 - 2007/06/23):
- Draft Link Fingerprints.
- Write. Write. Write.
- Contact Borden and St.Laurent about their type-independent #fragment-id
Created attachment 269462 [details]
generate sha256 hashes
Created attachment 269463 [details]
testing link fingerprints
You know.. it's kinda tricky getting sha256 hashes. ;)
Test page has 2 links to text pages, an embedded image, and a pdf that should open with an external app (same pdf link can be used for save as).
FYI, some alternatives for sha256; from our Link Fingerprints page:
* On *nix there is md5sum, sha1sum and (newer distros) sha256sum.
* OpenSSL (available for almost all platforms incl. Windows) is capable of producing hashes using "openssl dgst -sha256 < file" (-md5, -sha1)
* GnuPG (available for almost all platforms incl. Windows) will produce hashes using "gpg --print-md sha256 < file" (md5, sha1). However, don't forget to remove the spaces from the output ;)
Created attachment 269543 [details] [diff] [review]
combo patch for easier testing/reviewing
This attachment is a combination of several updated patches providing what's needed for Link Fingerprints to work for HTTP/FTP downloads, pages, and (some) images, with the ability to turn it on with --enable-link-fingerprints in .mozconfig.
The patch is organized by stream listener bug 384246, configure bug 385599, http bug 384247, ftp bug 385090, downloads bug 383716 and bug 292481, and pages bug 384249.
Stream listener bug 384246:
- General cleanup for nits and coding guidelines.
- Only recognize sha256
- Strict parsing of #fragment-id in getEntryAndHash (so no early failure)
Configure bug 385599:
- Add --enable-link-fingerprints
HTTP bug 384247:
- Add (void) ignoring return of Push
- Only Push #ifdef NECKO_LINK_FINGERPRINTS
FTP bug 385090:
- Same as HTTP
Downloads bug 383716 and bug 292481:
- Fix nits (comments, void)
- Error message: %S was not saved because the file appears to be corrupted.
Pages bug 384249:
- Error message: The page you are trying to view is not shown because it appears to be corrupted.
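The strict parsing mentioned above (only sha256 recognized, exact digest length, hex characters only) might look roughly like this; the real getEntryAndHash is C++ and may differ in details, so treat this as an illustrative guess:

```javascript
// Sketch of strict #hash(type:digest) parsing per the rules described above.
// Function name and exact behavior are assumptions, not the shipped code.
function parseLinkFingerprint(fragment) {
  const m = /^hash\(([a-z0-9]+):([0-9a-fA-F]+)\)$/.exec(fragment);
  if (!m) return null;                   // not a Link Fingerprint fragment
  const [, type, digest] = m;
  if (type !== 'sha256') return null;    // only sha256 is recognized
  if (digest.length !== 64) return null; // sha256 hex digests are 64 chars
  return { type, digest: digest.toLowerCase() };
}
```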
[RFC] Writing draft for #hash(type:data) syntax with a specific type of sha256 for all mime types.
[Code] Updated code from reviews + strictify syntax and made big patch available in bug 377245. (grab v2 bug 385599 with the #undef if you want to try things out)
Last week (2007/06/17 - 2007/06/23):
- Get code reviewed and update from comments
- Add flag to turn on/off (bug 385599)
- (Interview for platform internship)
- Not get responses from various people about drafts, RFC, IETF
- Outline and draft the internet draft
This week (2007/06/24 - 2007/06/30):
- Continue writing draft and review/submit before July 2nd
- (Start internship?)
[RFC] Submitted draft-lee-uri-linkfingerprints-00.txt. Drumming up support/awareness and getting comments on the IETF HTTP-WG and IETF Apps Area lists.
[Code] Implement fail-on-syntax-error parsing, add missing #undef, update style nits.
Last week (2007/06/24 - 2007/06/30):
- Write and submit Link Fingerprints draft
- Update code for minor changes and syntax error failing
- (Look into ActionMonkey, MMgc)
This week (2007/07/01 - 2007/07/07):
- Monitor/respond to comments on IETF mailing lists
Created attachment 270704 [details] [diff] [review]
combo patch v2
Implement fail-on-syntax-error parsing, add missing #undef, update style
Created attachment 270706 [details]
testing link fingerprints (fail on syntax error)
Similar to attachment 269463 [details] except it has #hash() with wrong lengths, bad characters.
Created attachment 270707 [details]
screenshot of testcase (fail on syntax error)
Screenshot of attachment 270706 [details].
Purple links were successfully visited while blue links resulted in "Problem loading page". Notice all the blue links are those that have syntax errors (wrong length, invalid characters, or unsupported hash) or have the wrong hash.
It's interesting to note that, of the gif images, the middle 2 with syntax errors fail early, so Firefox doesn't know anything about the image, while the last one with the incorrect hash displays at first and is removed later, so it retains its size.
[RFC] 69th IETF meeting is happening now (July 22-27), so I might be getting comments about the internet draft I submitted in a bit. The draft has been on their website:
[Code] Potentially switching gears to a Firefox 3 Download Manager implementation instead of a Gecko 1.9 necko implementation. This would mean Firefox 3 checks Link Fingerprints for only file downloads and not web pages/embedded content.
Added a feature plan to the wiki for mconnor to decide if this is okay for Firefox 3:
Comment on attachment 270704 [details] [diff] [review]
combo patch v2
[RFC] Rob Sayre said that responses have been fairly negative or at best non-supportive. Major complaints/concerns stem from misusing the fragment identifier for something other than specifying a sub-view and trying to standardize metadata in a URI, which is supposed to be very general/flexible. Additionally, compatibility would be hindered by multiple Link-Fingerprint-#hash()-like syntaxes; e.g., #hash() + #metalink. There were suggestions for other approaches, like adding a hash attribute similar to that proposed by WHATWG. Implementing a non-standard would be tricky without support from other major browsers.
[Code] Mike Connor said that Link Fingerprints shouldn't be part of the Firefox 3 download manager. This avoids the problem of people starting to rely on it as a security feature but not being able to continue doing so if it's removed from later versions once it's clearly not going to be standardized or something better comes along. This is in addition to Brendan Eich's and Christian Biesinger's concerns about the necko implementation being too closely integrated for a non-standard feature that affects everything built on top (probably why the build flag was requested).
Writing an extension for Link Fingerprints doesn't seem too useful when there's an existing implementation in DownThemAll! 1 beta (which Nils has kindly updated to match the new #hash() spec [not sure if the actual changes were from the patch I submitted to them..]). The code that would have been needed would be along the lines of uri.match(/#hash\((\w+):([a-f\d]+)\)/) to get the hash type and checksum, then using nsICryptoHash on the downloaded file and removing it if it fails. dTa! does that and so much more (like metalinks too! ;))
On the up side now that I've gone through download manager + related code plenty of times, I'll be able to help sdwilsh with the new download manager. Additionally I can continue to assist brahmana with download resume (another SoC project, bug 377243), so hopefully we can get this in before the end of the summer.
resolve => INVALID/WONTFIX ? (as well as for all dependent Link-Fingerprint-specific bugs, including bug 292481)
Ed: I am working on your final SoC evaluation. Can you please attach your latest patch or patches to this bug, along with a copy of the draft RFC?
Which of the alternative Link Hash proposals would you choose, if you were king for a day?
(In reply to comment #21)
> Which of the alternative Link Hash proposals would you choose, if you were king
> for a day?
Let me answer, although that question was not directed at me.
I see Link-Fingerprints as a part of the URI as the only viable solution.
It is media independent, while the hash attribute/microformat proposals rely on the client understanding at least basic HTML/DOM.
Stuff like Content-MD5 headers doesn't work, because the server will likely build the checksum from a corrupt copy of the payload.
How would I transmit such a link in a text/plain mail/irc/IM or in a PDF/ODF? Of course, how would I copy/paste such a link incl. the hash?
(How would I send header information from an FTP server?)
Not a problem if the hash is contained with the URI, but achieving this with the other proposals is almost impossible.
Seeing all those "too inflexible" complaints makes me wonder how restricting URIs to be less flexible in use is supposed to create, or even preserve, flexibility/usability. I see this as pure and plain destructive criticism.
Those special fragments are inflexible as they don't allow multiple items, but that is a general problem and not limited to LF/metalink.
This is simply an issue of sub-format/encoding. Something that should be easily solvable by something like #meta(hash=sha1:xy;metalink3=somelink) or #hash(sha1:xy)&metalink3(somelink) or whatever.
Wasn't there already discussion about enhancing the URI specs to support multiple fragments?
So either find some solution to embed the hash within the fragment or screw the idea of LF altogether.
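The multi-item fragment idea above, in the hypothetical #hash(...)&metalink3(...) form, could be parsed like this (neither syntax is standardized; this is only a sketch):

```javascript
// Parse a hypothetical multi-item fragment such as
// "hash(sha1:xy)&metalink3(somelink)" into a name -> value map.
// The syntax itself is one of the unstandardized proposals discussed above.
function parseFragmentItems(fragment) {
  const items = {};
  for (const part of fragment.split('&')) {
    const m = /^(\w+)\(([^)]*)\)$/.exec(part);
    if (m) items[m[1]] = m[2];
  }
  return items;
}
```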
(In reply to comment #21)
> if you were king for a day?
Well.. it would probably have to be king for multiple days. ;) Link Fingerprints is nice because it's a surgical modification that helps solve some issues easily. The problem is its potential implication to further uses of the URI fragment identifier.
(E.g., there were some complaints about misusing the "fragment identifier," but the "fragment identifier" is really just "data for the client" -- it just happens that it's only really been used for identifying fragments. As for the complaints about unreadable URIs.. I can't find the quote or where I originally stumbled upon it, but I recall Tim Berners-Lee stating that URLs were supposed to be an implementation detail of the web and end users shouldn't even have to know about them.)
Perhaps I'm naive or just not as wise as the IETF community who say "this will be a bad thing down the road," but Link Fingerprints in the URI doesn't seem too bad. There just needs to be a standardization of "data for the client." So if I were king for multiple days, URIs would be made more suitable for Link Fingerprint-like additions while fixing up existing fragment identifiers, and then Link Fingerprints would be added in. And if things do go bad with the URI, or Link Fingerprints basically needs to be gone.. being king again, Link Fingerprints can just disappear without having to worry about compatibility (e.g., for clients supporting it and links using Link Fingerprints).
Created attachment 278794 [details] [diff] [review]
Necko Combo Patch
This is an unbitrotted necko patch created with mercurial's gitstyle diffs.
Created attachment 278795 [details]
Link Fingerprints Internet Draft
Seems if we can't use #, we either need a pseudo-scheme (akin to jar: or some hacks I've been associated with such as wysiwyg: and wyciwyg:, and of course good old view-source:) to prepend, or another hash-like suffix delimiter. Has anyone ever proposed such a thing?
(In reply to comment #26)
> Seems if we can't use #, we either need a pseudo-scheme (akin to jar:... Has
> anyone ever proposed such a thing?
There was something similar to that in a response to Link Fingerprints on the IETF HTTP WG mailing list..
"You could, instead, define a new URI scheme, e.g.,
However that defeats one of the main attractions of Link Fingerprints: backwards compatibility -- clients that don't even know about Link Fingerprints will still be able to download the file (using the exact same link) just by discarding the fragment identifier.
The only other way I can see this embedded in URLs is if we treated path components like "/hash-md5-23FDE34EAC.../file.bin" specially. But that would prevent people from creating link fingerprints for files whose location they don't control. And people might object to making some directory names special.
I guess we could also use the username and password field. But that's even more of a hack. And, in fact, because that part _does_ get sent to the remote server, a good attacker could send the genuine file when they knew a check was coming, and the trojaned file otherwise.
The fragment identifier is the perfect location for all sorts of reasons. <sigh>
Put it into the element's class attribute (though I don't know how that might affect performance)?
There are several ways of putting it into HTML. But then you lose the portability - you can't send the fingerprint everywhere the URL goes (e.g. email messages, newsgroup postings, other plain text).
(In reply to comment #28)
> The only other way I can see this embedded in URLs is if we treated path
> components like: "/hash-md5-23FDE34EAC.../file.bin" specially.
What about using a special search key/value pair instead of the fragment identifier? Should be backwards compatible and extensible:
This bug is NOT the place to discuss redesigning this, please take those discussions back to the newsgroups. The problems with the query parameter approach have already been discussed -- it has all the problems of the fragment identifier (which I still like) plus the additional headache that it's sent to the server.
Created attachment 297219 [details]
Link Fingerprints Internet Draft (xml)
This isn't going to happen this way. Thanks to ed for all his hard work.