Closed Bug 243040 Opened 21 years ago Closed 21 years ago

HTML sanitizer misinterprets attribute values

Categories

(MailNews Core :: MIME, defect, P1)

Product:

Component:

Platform:

x86

All

Type:

defect

Priority:

P1

Severity:

critical

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: bugzilla.mozilla.org-3, Assigned: BenB)

Details

(Keywords: testcase, Whiteboard: needed-aviary1.0)

Attachments

(3 files, 2 obsolete files)

test case 21 years ago Christian Schmidt 821 bytes, message/rfc822		Details
Small fix, v1 21 years ago Ben Bucksch (:BenB) 1.37 KB, patch		Details \| Diff \| Splinter Review
Comprehensive fix, v2 21 years ago Ben Bucksch (:BenB) 8.27 KB, patch	akkzilla : review- jst : superreview+	Details \| Diff \| Splinter Review
Comprehensive fix, v3. (Needs fix for 245399.) 21 years ago Ben Bucksch (:BenB) 9.95 KB, patch	akkzilla : review+	Details \| Diff \| Splinter Review
Comprehensive fix, v4. (For trunk, needs fix for 245399.) 21 years ago Ben Bucksch (:BenB) 9.83 KB, patch		Details \| Diff \| Splinter Review

Christian Schmidt

Reporter

Description

•

21 years ago

The HTML sanitizer does not properly handle the contents of attribute values. This makes it possible to trick it to show remote images despite that I configured the HTML sanitizer to ignore the src attribute of img elements. Steps to reproduce: 1. Send the testcase to yourself, e.g. by making a manual connection to an SMTP server with telnet (the HTML sanitizer appears not to work with messages opened with Thunderbird's File -> Open Saved Message) 2. Choose View -> Message body as -> Original HTML 3. Choose View -> Message body as -> Simple HTML Actual results: Some text and an image that was not visible in the Original HTML view is displayed. The image is loaded from a remote server. Expected results: The Simple HTML view should be the same as the original HTML view, except that the alt text of the <img> should be displayed instead of the image itself. The alt text should be interpreted as plain text, not HTML. This happens both when the mailnews.display.html_sanitizer.allowed_tags pref is not set, and when the value contains "img(alt,title,longdesc,width,height)" (i.e. src is omitted). I see this using Thunderbird 0.6 on both WinXP and Linux.

Christian Schmidt

Reporter

Comment 1

•

21 years ago

Attached file test case — Details

Christian Schmidt

Reporter

Updated

•

21 years ago

Keywords: testcase

Ben Bucksch (:BenB)

Assignee

Updated

•

21 years ago

Attachment #147997 - Attachment mime type: text/plain → message/rfc822

Ben Bucksch (:BenB)

Assignee

Comment 2

•

21 years ago

Oh. And I specifically stripped quotes to avoid stuff like this. Thanks for finding this! It's not a real hole, but it bypasses the Simple HTML mode, making it equal in security to the Original HTML mode. It seems anything is unescaping the attribute values. nsAutoString value(mParserNode->GetValueAt(i)); // SanitizeAttrValue() modifies |value| if (NS_SUCCEEDED(SanitizeAttrValue(type, key, value))) { // Write out Write(" " + key + "=\"") + value + "\""); // cutted down SanitizeAttrValue() doesn't seem to cause this either. So, it seems the generic HTML parser unescapes the attributes for us, and we'll have to re-escape them.

Severity: normal → major

Priority: -- → P1

Ben Bucksch (:BenB)

Assignee

Comment 3

•

21 years ago

BTW, Christian, did you try the same tricks at other places? I guess we should escape <>& in text nodes, too.

Christian Schmidt

Reporter

Comment 4

•

21 years ago

I only tried it in the three contexts mentioned in the test case, i.e. <img src=>, <a href=> and <foo bar=>. The first two showed the issue, though <a href=> was allowed in the pref, and <img src=> was not.

Stephen Donner [:stephend] Not actively reading bugmail

Updated

•

21 years ago

Summary: HTML sanitizer misinterprets atrribute values → HTML sanitizer misinterprets attribute values

Ben Bucksch (:BenB)

Assignee

Updated

•

21 years ago

Assignee: sspitzer → ben.bucksch

Ben Bucksch (:BenB)

Assignee

Updated

•

21 years ago

Group: security

Severity: major → critical

Ben Bucksch (:BenB)

Assignee

Updated

•

21 years ago

Group: security

Ben Bucksch (:BenB)

Assignee

Updated

•

21 years ago

Flags: blocking1.7?

Ben Bucksch (:BenB)

Assignee

Comment 5

•

21 years ago

It turns out, I was paranoid enough to protect against this, but I failed because I didn't set the inout parameter :(. I have a small fix which just does that and fixes another small bug. I also checked, if there may be other places where the same problem may appear, e.g. title, charset etc.. I tried to fix all these, and changed a few debugging statements etc.. I think this is a better and more comprehensive fix, but larger. I think this larger fix should be used for both trunk and branches, because the smaller one may leave loopholes. akk, can you review, please? asa, this is a security bug, it makes it possible to circumvent the additional protection that Simple HTML is supposed to offer. It is not a severe bug, it's still at least as secure as the Original HTML mode. Still, requesting that this goes into 1.7. I don't know how the AVIARY branch (Thunderbird) is maintained, but Thunderbird releases should take this, too, IMO.

Ben Bucksch (:BenB)

Assignee

Comment 6

•

21 years ago

Attached patch Small fix, v1 — Details — Splinter Review

This gives the following HTML output (after sanitizing): <html><body> <html> <title>Test</title> <body> Foo <img alt="1 1 1 <b>2 2 2</b> 3 3 3 <img src=http://www.mozilla.org/images/ico-tb.gif"> <a href="1 1 1 <b>2 2 2</b> 3 3 3 <img src=http://www.mozilla.org/images/ico-tb.gif">foo</a> bar </body></html> which displays as ("foo" is a link): Foo 1 1 1 <b>2 2 2</b> 3 3 3 <img src=http://www.mozilla.org/images/ico-tb.gif foo bar

Ben Bucksch (:BenB)

Assignee

Comment 7

•

21 years ago

Attached patch Comprehensive fix, v2 (obsolete) — Details — Splinter Review

This gives the following HTML output: <html><body> <html> <title>Test</title> <body> Foo <img alt=""1 1 1 <b>2 2 2</b> 3 3 3 <img src="http://www.mozilla.org/images/ico-tb.gif"> <a href=""1 1 1 <b>2 2 2</b> 3 3 3 <img src="http://www.mozilla.org/images/ico-tb.gif">foo</a> bar </body></html> which displays as ("foo" is a link): Foo "1 1 1 <b>2 2 2</b> 3 3 3 <img src="http://www.mozilla.org/images/ico-tb.gif foo bar

Ben Bucksch (:BenB)

Assignee

Updated

•

21 years ago

Attachment #149643 - Flags: review?(akkzilla)

Robert Accettura [:raccettura]

Comment 8

•

21 years ago

Asa's out in Hawaii for 10 days or so (according to his blog), so perhaps some other driver can block 1.7. I second Ben's reasoning. This is a good patch to include in 1.7. It's enough of a security problem to include. Especially if 1.7 is going to have a long life, and be used by Distro's (Netscape et. al.) Also should make AVIARY.

David :Bienvenu

Comment 9

•

21 years ago

Cc'ing bzbarsky to see if he has any input. I'm confused why you need to escape single quotes, but nsEscapeHTML2 doesn't have to ...

Ben Bucksch (:BenB)

Assignee

Comment 10

•

21 years ago

> I'm confused why you need to escape single quotes, but nsEscapeHTML2 > doesn't have to ... me too. In my confusion, I am tending to suspect that nsEscapeHTML2 is faulty. Although I havn't tested it, reading the spec <http://www.w3.org/TR/REC-html40/intro/sgmltut.html#h-3.2.2>, I'd think that the same exploit as the attaches testcase would work, if you used single quotes (both directly and as entities). Even if it doesn't happen to work, I am still trying to be on the paranoid side with this sanitizer.

David :Bienvenu

Comment 11

•

21 years ago

> me too. In my confusion, I am tending to suspect that nsEscapeHTML2 is faulty. that's what I was wondering - if so, we should fix that routine and use it.

Boris Zbarsky [:bzbarsky]

Comment 12

•

21 years ago

Yeah, nsEscapeHTML2 is broken here, imo.

David :Bienvenu

Comment 13

•

21 years ago

OK, Boris, is it risky to fix it for 1.7? If it's not too bad, then, Ben, can you attach a patch that fixes it, and uses it, instead of creating your own escape method?

Boris Zbarsky [:bzbarsky]

Comment 14

•

21 years ago

I don't really know. I'm not overly familiar with any of the callers of nsEscapeHTML2.... If it breaks something, the something was making some bogus assumptions... but that doesn't really say much. ;)

Ben Bucksch (:BenB)

Assignee

Comment 15

•

21 years ago

> Yeah, nsEscapeHTML2 is broken here, imo. OK. Filed bug 245399. > is it risky to fix it for 1.7? Is it certainly *far* more risky (and possibly far more dangerous hole) than this bug.

Status: NEW → ASSIGNED

Johnny Stenback (:jst)

Comment 16

•

21 years ago

Comment on attachment 149643 [details] [diff] [review] Comprehensive fix, v2 @@ -46,6 +48,8 @@ parser can decode during execution. E.g. there are these dreaded data: and javascript URLs and base64 encoding (which I don't really understand how it alloies + + Development note: You need to recompile content/ *and* layout/ */ I don't see how this comment is needed in this file, the same applies for every file in content/ and dom/ (and gfx?). I'd vote for removing it. +// can't use xpcom/io/nsEscape.*, need to escape ' (single quote) too Add a reference to the bug about nsEscape. +static void escape(nsAString& source) +{ + nsAutoString result; + + nsAString::const_iterator i; + source.BeginReading(i); + nsAString::const_iterator end; + source.EndReading(end); + for (; i != end; ++i) + { + PRUnichar ch = *i; + if (ch == '<') + result += NS_LITERAL_STRING("<"); + else if (ch == '>') + result += NS_LITERAL_STRING(">"); + else if (ch == '&') + result += NS_LITERAL_STRING("&"); + else if (ch == '"') + result += NS_LITERAL_STRING("""); + else if (ch == '\'') + result += NS_LITERAL_STRING("‘"); + else + result += ch; + } + + source = result; +} This is not all that efficient (i.e. adding character by character, etc), do we care? @@ -360,7 +394,9 @@ // missing </title> tag won't result in everything // being eaten up as the title. Write(NS_LITERAL_STRING("<title>")); - Write(aValue); + nsAutoString value(aValue); + escape(value); + Write(value); Again, inefficient. Copy aValue into value, copy value into the local result in escape(), and then copy it back into value. Do we care? This would be a bit more efficient if you'd separate intput from output in the escape() signature (i.e. have a const nsAString& input, and a nsAString& output) to avoid copying before making the escape() call. - In mozSanitizingHTMLSerializer::SetDocumentCharset(): + nsAutoString charset = NS_ConvertASCIItoUCS2(aCharset); + escape(charset); Here too you could save a copy if escape() took a separate input. - nsAutoString text(aText); + nsString text(aText); // may be long -> heap Can't see the benefit of this change. If it's long, nsAutoString will malloc, just as nsString would, w/o any real difference in performance. But if the string does fit in an nsAutoString, we'll save a malloc. sr=jst with those changes.

Attachment #149643 - Flags: superreview+

Comment 17

•

21 years ago

My main issues have to do with entity conversion and wheel reinventing. How does escape() differ from the intl entity encoder, i.e. NS_ENTITYCONVERTER_CONTRACTID as used in nsHTMLContentSink.cpp? I didn't know about nsEscape. How does that relate to the intl converters? It looks like maybe nsEscape is intended specifically for urls, though, not body text. So the intl one is probably the right one for this use. If single quotes are the issue with using one or the other of those methods, how about calling them then having a special case just for single quote? You could still wrap that in a method called escape() if you want, I just would rather not see the rest of the code duplicated. Do we need to escape the characters that you list, but not all the extra unicode and smartquote characters that display just the same? (I commonly see many different single and double quote characters, and a couple different angle bracket pairs, in mail and web pages of various charsets.) I may not be understanding your entity-handling code properly anyway, though. For example: @@ -529,11 +567,13 @@ if (type == eHTMLTag_whitespace || type == eHTMLTag_newline) { - Write(aText); + nsAutoString text(aText); + escape(text); // necessary? not? sure? + Write(text); Why does whitespace need to be escaped? else if (type == eHTMLTag_entity) { Write(NS_LITERAL_STRING("&")); - Write(aText); + nsAutoString text(aText); + escape(text); // see above + Write(text); + Write(NS_LITERAL_STRING(";")); If the DOM already knows it's an entity, then it won't test as being equal to any of those plain ascii chars you're testing for, will it? I agree with jst about the comment on what to recompile, although I also agree with Ben that this is very confusing and I wish it were noted somewhere. But this probably isn't the best place to do it. Perhaps a README in both top-level directories? Or Makefile rules so that building at the top level of one calls the other and the developer doesn't need to keep track? - nsString& aValue /*inout*/) + nsString& value /*inout*/) Aren't arguments supposed to start with a even if they're out params? But you're really the owner of this file, so I won't argue if you actively don't want to use that convention.

Ben Bucksch (:BenB)

Assignee

Comment 18

•

21 years ago

Just a few comments offhand: > I don't see how this comment is needed in this file, the same applies > for every file in content/ and dom/ (and gfx?). I'd vote for removing it. I realize how this is totally pointless from your standpoint, but I don't usually hack content/ and found myself wondering for the second time why my changes don't get used. It's just a reminder to self, I should call it that, not "Development note". > If it's long, nsAutoString will malloc Ah, nsAutoString's ctor already knows the length. Will change back. > How does escape() differ from the intl entity encoder I don't know the latter, but I don't care about intl here (using wide strings anyways, so shouldn't be a problem), only HTML/SGML syntax (<>&"'). > It looks like maybe nsEscape is intended specifically for urls, though, > not body text. Slight incorrectness on my part in the summary of the other bug (biesi corrected it). We mean nsEscapeHTML* (which does HTML escaping), not NS_EscapeURL(which does URL escaping, which is completely different). Not sure what exactly nsEscape() does. > Do we need to escape the characters that you list, but not all the > extra unicode and smartquote characters that display just the same? Concern is not the display, but the HTML parser (quotes that delimit attribute values). I'd *hope* that the HTML parser doesn't and never will treat smart quotes as attribute delimiters. If there's any risk of such "smartness" in Gecko in the future, we should escape them, too, but I hope there isn't such risk. > I may not be understanding your entity-handling code properly anyway, though. > Why does whitespace need to be escaped? > If the DOM already knows it's an entity... :-) No, you understood that correctly. It's indeed complete nonsense to escape whitespace. However, can we be 100% sure that Gecko will only *ever* feed us whitespace there? > Aren't arguments supposed to start with a even if they're out params? I just didn't want to change all (3-4) the references to "value" in the function.

Ben Bucksch (:BenB)

Assignee

Comment 19

•

21 years ago

Attached patch Comprehensive fix, v3. (Needs fix for 245399.) (obsolete) — Details — Splinter Review

- Fixed review comments, incl. using nsEscapeHTML2 now, depends on bug 245399. - Removed comment about JS in attributes, because I think that quotes will now pass (be decoded before execution). Of course, I still remove javascript: URLs and the prefs removes eventhandler attributes etc.. - removed base64 blocker which is probably pointless - little code cleanup

Ben Bucksch (:BenB)

Assignee

Updated

•

21 years ago

Attachment #149643 - Attachment is obsolete: true

Serge Gautherie (:sgautherie)

Updated

•

21 years ago

Attachment #149955 - Flags: review?(akkzilla)

Comment 20

•

21 years ago

Comment on attachment 149955 [details] [diff] [review] Comprehensive fix, v3. (Needs fix for 245399.) I'm not familiar with nsAdoptingString, so I'll trust you on that usage. Line 554: > Write(aText); // sure to safe? Typo, "to be"? (Your call.) You've addressed all my other issues with this patch. r=akkana

Attachment #149955 - Flags: review?(akkzilla) → review+

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 21

•

21 years ago

This patch looks like it leaks a bit -- some callers of |escape| use nsAdoptingString, and some callers of |escape| look like they leak the value. What are the potential problems not fixing this could cause, and how risky do you think it is? Is this code used by default in mail, or does it require the user to change a pref? This seems like the kind of thing that we'd want to land on the trunk before taking on the branch.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 22

•

21 years ago

This sounds like it requires a bunch of risky changes and it's not going to make it for 1.7.

Flags: blocking1.7? → blocking1.7-

Ben Bucksch (:BenB)

Assignee

Comment 23

•

21 years ago

dbaron, the "Comprehensive fix, v2" is not really risky.

Flags: blocking1.7- → blocking1.7?

Ben Bucksch (:BenB)

Assignee

Comment 24

•

21 years ago

or I could produce a patch that matches v3, but without dependence on bug 245399, by implementing the escape function inside sanitizer, like v2 did.

Ben Bucksch (:BenB)

Assignee

Comment 25

•

21 years ago

dbaron: > What are the potential problems not fixing this could cause, and how risky do > you think it is? See comment 5. The bug, if triggered, circumvents the protection that the sanitizer is supposed to offer. THe sanitizer is supposed to protect against - possible future security problems in Gecko - mail tracking (e.g. remote image load by spammers) - formatting with bad taste (usability, not security) Risks: - Users which rely on the additional protection against our security bugs might not be protected when combined with another hole, i.e. less safe. - If spammers find out about this, they can track people despite the mode > Is this code used by default in mail, or does it require the > user to change a pref? Not used by default (for all msgs) in mozilla.org products, enabled via menu View|Body|Simple HTML. Used automatically and by default for detected spam in Thunderbird and soon Seamonkey.

Ben Bucksch (:BenB)

Assignee

Comment 26

•

21 years ago

Attached patch Comprehensive fix, v4. (For trunk, needs fix for 245399.) — Details — Splinter Review

Changes to v3: - using nsString::Adopt() instead of = to fix probably leak (thanks dbaron) - Removed comment which is probably bogus That's what I'll check into trunk.

Ben Bucksch (:BenB)

Assignee

Updated

•

21 years ago

Attachment #149955 - Attachment is obsolete: true

Ben Bucksch (:BenB)

Assignee

Updated

•

21 years ago

Attachment #150102 - Attachment description: Comprehensive fix, v3. (For trunk, needs fix for 245399.) → Comprehensive fix, v4. (For trunk, needs fix for 245399.)

Ben Bucksch (:BenB)

Assignee

Comment 27

•

21 years ago

Checked into trunk, marking FIXED. Thanks for the reviews. Still requesting checkin of version 4 for 1.7, modified to escape in class (like in version 2) instead of using nsEscapeHTML2().

Status: ASSIGNED → RESOLVED

Closed: 21 years ago

Resolution: --- → FIXED

Ben Bucksch (:BenB)

Assignee

Updated

•

21 years ago

Whiteboard: needed-aviary1.0

Updated

•

21 years ago

Attachment #149643 - Flags: review?(akkzilla) → review-

Asa Dotzler [:asa]

Comment 28

•

21 years ago

too late for 1.7.

Flags: blocking1.7? → blocking1.7-

Ben Bucksch (:BenB)

Assignee

Comment 29

•

21 years ago

Great. It wasn't too late when I requested the 1.7 green-light. I tried hard to create a fix in time for 1.7, putting other projects aside.

Comment 30

•

21 years ago

Tracking regression in bug 246505, "simple html displays the ; from entities".

Myk Melez [:myk] [@mykmelez]

Updated

•

21 years ago

Product: MailNews → Core

Nobody; OK to take it and work on it

Updated

•

17 years ago

Product: Core → MailNews Core

You need to log in before you can comment on or make changes to this bug.