Closed Bug 243040 Opened 21 years ago Closed 21 years ago

HTML sanitizer misinterprets attribute values

Categories

(MailNews Core :: MIME, defect, P1)

x86
All
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bugzilla.mozilla.org-3, Assigned: BenB)

Details

(Keywords: testcase, Whiteboard: needed-aviary1.0)

Attachments

(3 files, 2 obsolete files)

The HTML sanitizer does not properly handle the contents of attribute values. This makes it possible to trick it to show remote images despite that I configured the HTML sanitizer to ignore the src attribute of img elements. Steps to reproduce: 1. Send the testcase to yourself, e.g. by making a manual connection to an SMTP server with telnet (the HTML sanitizer appears not to work with messages opened with Thunderbird's File -> Open Saved Message) 2. Choose View -> Message body as -> Original HTML 3. Choose View -> Message body as -> Simple HTML Actual results: Some text and an image that was not visible in the Original HTML view is displayed. The image is loaded from a remote server. Expected results: The Simple HTML view should be the same as the original HTML view, except that the alt text of the <img> should be displayed instead of the image itself. The alt text should be interpreted as plain text, not HTML. This happens both when the mailnews.display.html_sanitizer.allowed_tags pref is not set, and when the value contains "img(alt,title,longdesc,width,height)" (i.e. src is omitted). I see this using Thunderbird 0.6 on both WinXP and Linux.
Attached file test case
Keywords: testcase
Attachment #147997 - Attachment mime type: text/plain → message/rfc822
Oh. And I specifically stripped quotes to avoid stuff like this. Thanks for finding this! It's not a real hole, but it bypasses the Simple HTML mode, making it equal in security to the Original HTML mode. It seems anything is unescaping the attribute values. nsAutoString value(mParserNode->GetValueAt(i)); // SanitizeAttrValue() modifies |value| if (NS_SUCCEEDED(SanitizeAttrValue(type, key, value))) { // Write out Write(" " + key + "=\"") + value + "\""); // cutted down SanitizeAttrValue() doesn't seem to cause this either. So, it seems the generic HTML parser unescapes the attributes for us, and we'll have to re-escape them.
Severity: normal → major
Priority: -- → P1
BTW, Christian, did you try the same tricks at other places? I guess we should escape <>& in text nodes, too.
I only tried it in the three contexts mentioned in the test case, i.e. <img src=>, <a href=> and <foo bar=>. The first two showed the issue, though <a href=> was allowed in the pref, and <img src=> was not.
Summary: HTML sanitizer misinterprets atrribute values → HTML sanitizer misinterprets attribute values
Assignee: sspitzer → ben.bucksch
Group: security
Severity: major → critical
Group: security
Flags: blocking1.7?
It turns out, I was paranoid enough to protect against this, but I failed because I didn't set the inout parameter :(. I have a small fix which just does that and fixes another small bug. I also checked, if there may be other places where the same problem may appear, e.g. title, charset etc.. I tried to fix all these, and changed a few debugging statements etc.. I think this is a better and more comprehensive fix, but larger. I think this larger fix should be used for both trunk and branches, because the smaller one may leave loopholes. akk, can you review, please? asa, this is a security bug, it makes it possible to circumvent the additional protection that Simple HTML is supposed to offer. It is not a severe bug, it's still at least as secure as the Original HTML mode. Still, requesting that this goes into 1.7. I don't know how the AVIARY branch (Thunderbird) is maintained, but Thunderbird releases should take this, too, IMO.
Attached patch Small fix, v1Splinter Review
This gives the following HTML output (after sanitizing): <html><body> <html> <title>Test</title> <body> Foo <img alt="1 1 1 <b>2 2 2</b> 3 3 3 <img src=http://www.mozilla.org/images/ico-tb.gif"> <a href="1 1 1 <b>2 2 2</b> 3 3 3 <img src=http://www.mozilla.org/images/ico-tb.gif">foo</a> bar </body></html> which displays as ("foo" is a link): Foo 1 1 1 <b>2 2 2</b> 3 3 3 <img src=http://www.mozilla.org/images/ico-tb.gif foo bar
Attached patch Comprehensive fix, v2 (obsolete) — Splinter Review
This gives the following HTML output: <html><body> <html> <title>Test</title> <body> Foo <img alt="&quot;1 1 1 &lt;b&gt;2 2 2&lt;/b&gt; 3 3 3 &lt;img src=&quot;http://www.mozilla.org/images/ico-tb.gif"> <a href="&quot;1 1 1 &lt;b&gt;2 2 2&lt;/b&gt; 3 3 3 &lt;img src=&quot;http://www.mozilla.org/images/ico-tb.gif">foo</a> bar </body></html> which displays as ("foo" is a link): Foo "1 1 1 <b>2 2 2</b> 3 3 3 <img src="http://www.mozilla.org/images/ico-tb.gif foo bar
Attachment #149643 - Flags: review?(akkzilla)
Asa's out in Hawaii for 10 days or so (according to his blog), so perhaps some other driver can block 1.7. I second Ben's reasoning. This is a good patch to include in 1.7. It's enough of a security problem to include. Especially if 1.7 is going to have a long life, and be used by Distro's (Netscape et. al.) Also should make AVIARY.
Cc'ing bzbarsky to see if he has any input. I'm confused why you need to escape single quotes, but nsEscapeHTML2 doesn't have to ...
> I'm confused why you need to escape single quotes, but nsEscapeHTML2 > doesn't have to ... me too. In my confusion, I am tending to suspect that nsEscapeHTML2 is faulty. Although I havn't tested it, reading the spec <http://www.w3.org/TR/REC-html40/intro/sgmltut.html#h-3.2.2>, I'd think that the same exploit as the attaches testcase would work, if you used single quotes (both directly and as entities). Even if it doesn't happen to work, I am still trying to be on the paranoid side with this sanitizer.
> me too. In my confusion, I am tending to suspect that nsEscapeHTML2 is faulty. that's what I was wondering - if so, we should fix that routine and use it.
Yeah, nsEscapeHTML2 is broken here, imo.
OK, Boris, is it risky to fix it for 1.7? If it's not too bad, then, Ben, can you attach a patch that fixes it, and uses it, instead of creating your own escape method?
I don't really know. I'm not overly familiar with any of the callers of nsEscapeHTML2.... If it breaks something, the something was making some bogus assumptions... but that doesn't really say much. ;)
> Yeah, nsEscapeHTML2 is broken here, imo. OK. Filed bug 245399. > is it risky to fix it for 1.7? Is it certainly *far* more risky (and possibly far more dangerous hole) than this bug.
Status: NEW → ASSIGNED
Comment on attachment 149643 [details] [diff] [review] Comprehensive fix, v2 @@ -46,6 +48,8 @@ parser can decode during execution. E.g. there are these dreaded data: and javascript URLs and base64 encoding (which I don't really understand how it alloies + + Development note: You need to recompile content/ *and* layout/ */ I don't see how this comment is needed in this file, the same applies for every file in content/ and dom/ (and gfx?). I'd vote for removing it. +// can't use xpcom/io/nsEscape.*, need to escape ' (single quote) too Add a reference to the bug about nsEscape. +static void escape(nsAString& source) +{ + nsAutoString result; + + nsAString::const_iterator i; + source.BeginReading(i); + nsAString::const_iterator end; + source.EndReading(end); + for (; i != end; ++i) + { + PRUnichar ch = *i; + if (ch == '<') + result += NS_LITERAL_STRING("&lt;"); + else if (ch == '>') + result += NS_LITERAL_STRING("&gt;"); + else if (ch == '&') + result += NS_LITERAL_STRING("&amp;"); + else if (ch == '"') + result += NS_LITERAL_STRING("&quot;"); + else if (ch == '\'') + result += NS_LITERAL_STRING("&lsquo;"); + else + result += ch; + } + + source = result; +} This is not all that efficient (i.e. adding character by character, etc), do we care? @@ -360,7 +394,9 @@ // missing </title> tag won't result in everything // being eaten up as the title. Write(NS_LITERAL_STRING("<title>")); - Write(aValue); + nsAutoString value(aValue); + escape(value); + Write(value); Again, inefficient. Copy aValue into value, copy value into the local result in escape(), and then copy it back into value. Do we care? This would be a bit more efficient if you'd separate intput from output in the escape() signature (i.e. have a const nsAString& input, and a nsAString& output) to avoid copying before making the escape() call. - In mozSanitizingHTMLSerializer::SetDocumentCharset(): + nsAutoString charset = NS_ConvertASCIItoUCS2(aCharset); + escape(charset); Here too you could save a copy if escape() took a separate input. - nsAutoString text(aText); + nsString text(aText); // may be long -> heap Can't see the benefit of this change. If it's long, nsAutoString will malloc, just as nsString would, w/o any real difference in performance. But if the string does fit in an nsAutoString, we'll save a malloc. sr=jst with those changes.
Attachment #149643 - Flags: superreview+
My main issues have to do with entity conversion and wheel reinventing. How does escape() differ from the intl entity encoder, i.e. NS_ENTITYCONVERTER_CONTRACTID as used in nsHTMLContentSink.cpp? I didn't know about nsEscape. How does that relate to the intl converters? It looks like maybe nsEscape is intended specifically for urls, though, not body text. So the intl one is probably the right one for this use. If single quotes are the issue with using one or the other of those methods, how about calling them then having a special case just for single quote? You could still wrap that in a method called escape() if you want, I just would rather not see the rest of the code duplicated. Do we need to escape the characters that you list, but not all the extra unicode and smartquote characters that display just the same? (I commonly see many different single and double quote characters, and a couple different angle bracket pairs, in mail and web pages of various charsets.) I may not be understanding your entity-handling code properly anyway, though. For example: @@ -529,11 +567,13 @@ if (type == eHTMLTag_whitespace || type == eHTMLTag_newline) { - Write(aText); + nsAutoString text(aText); + escape(text); // necessary? not? sure? + Write(text); Why does whitespace need to be escaped? else if (type == eHTMLTag_entity) { Write(NS_LITERAL_STRING("&")); - Write(aText); + nsAutoString text(aText); + escape(text); // see above + Write(text); + Write(NS_LITERAL_STRING(";")); If the DOM already knows it's an entity, then it won't test as being equal to any of those plain ascii chars you're testing for, will it? I agree with jst about the comment on what to recompile, although I also agree with Ben that this is very confusing and I wish it were noted somewhere. But this probably isn't the best place to do it. Perhaps a README in both top-level directories? Or Makefile rules so that building at the top level of one calls the other and the developer doesn't need to keep track? - nsString& aValue /*inout*/) + nsString& value /*inout*/) Aren't arguments supposed to start with a even if they're out params? But you're really the owner of this file, so I won't argue if you actively don't want to use that convention.
Just a few comments offhand: > I don't see how this comment is needed in this file, the same applies > for every file in content/ and dom/ (and gfx?). I'd vote for removing it. I realize how this is totally pointless from your standpoint, but I don't usually hack content/ and found myself wondering for the second time why my changes don't get used. It's just a reminder to self, I should call it that, not "Development note". > If it's long, nsAutoString will malloc Ah, nsAutoString's ctor already knows the length. Will change back. > How does escape() differ from the intl entity encoder I don't know the latter, but I don't care about intl here (using wide strings anyways, so shouldn't be a problem), only HTML/SGML syntax (<>&"'). > It looks like maybe nsEscape is intended specifically for urls, though, > not body text. Slight incorrectness on my part in the summary of the other bug (biesi corrected it). We mean nsEscapeHTML* (which does HTML escaping), not NS_EscapeURL(which does URL escaping, which is completely different). Not sure what exactly nsEscape() does. > Do we need to escape the characters that you list, but not all the > extra unicode and smartquote characters that display just the same? Concern is not the display, but the HTML parser (quotes that delimit attribute values). I'd *hope* that the HTML parser doesn't and never will treat smart quotes as attribute delimiters. If there's any risk of such "smartness" in Gecko in the future, we should escape them, too, but I hope there isn't such risk. > I may not be understanding your entity-handling code properly anyway, though. > Why does whitespace need to be escaped? > If the DOM already knows it's an entity... :-) No, you understood that correctly. It's indeed complete nonsense to escape whitespace. However, can we be 100% sure that Gecko will only *ever* feed us whitespace there? > Aren't arguments supposed to start with a even if they're out params? I just didn't want to change all (3-4) the references to "value" in the function.
- Fixed review comments, incl. using nsEscapeHTML2 now, depends on bug 245399. - Removed comment about JS in attributes, because I think that quotes will now pass (be decoded before execution). Of course, I still remove javascript: URLs and the prefs removes eventhandler attributes etc.. - removed base64 blocker which is probably pointless - little code cleanup
Attachment #149643 - Attachment is obsolete: true
Attachment #149955 - Flags: review?(akkzilla)
Comment on attachment 149955 [details] [diff] [review] Comprehensive fix, v3. (Needs fix for 245399.) I'm not familiar with nsAdoptingString, so I'll trust you on that usage. Line 554: > Write(aText); // sure to safe? Typo, "to be"? (Your call.) You've addressed all my other issues with this patch. r=akkana
Attachment #149955 - Flags: review?(akkzilla) → review+
This patch looks like it leaks a bit -- some callers of |escape| use nsAdoptingString, and some callers of |escape| look like they leak the value. What are the potential problems not fixing this could cause, and how risky do you think it is? Is this code used by default in mail, or does it require the user to change a pref? This seems like the kind of thing that we'd want to land on the trunk before taking on the branch.
This sounds like it requires a bunch of risky changes and it's not going to make it for 1.7.
Flags: blocking1.7? → blocking1.7-
dbaron, the "Comprehensive fix, v2" is not really risky.
Flags: blocking1.7- → blocking1.7?
or I could produce a patch that matches v3, but without dependence on bug 245399, by implementing the escape function inside sanitizer, like v2 did.
dbaron: > What are the potential problems not fixing this could cause, and how risky do > you think it is? See comment 5. The bug, if triggered, circumvents the protection that the sanitizer is supposed to offer. THe sanitizer is supposed to protect against - possible future security problems in Gecko - mail tracking (e.g. remote image load by spammers) - formatting with bad taste (usability, not security) Risks: - Users which rely on the additional protection against our security bugs might not be protected when combined with another hole, i.e. less safe. - If spammers find out about this, they can track people despite the mode > Is this code used by default in mail, or does it require the > user to change a pref? Not used by default (for all msgs) in mozilla.org products, enabled via menu View|Body|Simple HTML. Used automatically and by default for detected spam in Thunderbird and soon Seamonkey.
Changes to v3: - using nsString::Adopt() instead of = to fix probably leak (thanks dbaron) - Removed comment which is probably bogus That's what I'll check into trunk.
Attachment #149955 - Attachment is obsolete: true
Attachment #150102 - Attachment description: Comprehensive fix, v3. (For trunk, needs fix for 245399.) → Comprehensive fix, v4. (For trunk, needs fix for 245399.)
Checked into trunk, marking FIXED. Thanks for the reviews. Still requesting checkin of version 4 for 1.7, modified to escape in class (like in version 2) instead of using nsEscapeHTML2().
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Whiteboard: needed-aviary1.0
Attachment #149643 - Flags: review?(akkzilla) → review-
too late for 1.7.
Flags: blocking1.7? → blocking1.7-
Great. It wasn't too late when I requested the 1.7 green-light. I tried hard to create a fix in time for 1.7, putting other projects aside.
Tracking regression in bug 246505, "simple html displays the ; from entities".
Product: MailNews → Core
Product: Core → MailNews Core
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: