Closed
Bug 243040
Opened 21 years ago
Closed 21 years ago
HTML sanitizer misinterprets attribute values
Categories
(MailNews Core :: MIME, defect, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: bugzilla.mozilla.org-3, Assigned: BenB)
Details
(Keywords: testcase, Whiteboard: needed-aviary1.0)
Attachments
(3 files, 2 obsolete files)
821 bytes,
message/rfc822
|
Details | |
1.37 KB,
patch
|
Details | Diff | Splinter Review | |
9.83 KB,
patch
|
Details | Diff | Splinter Review |
The HTML sanitizer does not properly handle the contents of attribute values.
This makes it possible to trick it to show remote images despite that I
configured the HTML sanitizer to ignore the src attribute of img elements.
Steps to reproduce:
1. Send the testcase to yourself, e.g. by making a manual connection to an SMTP
server with telnet (the HTML sanitizer appears not to work with messages opened
with Thunderbird's File -> Open Saved Message)
2. Choose View -> Message body as -> Original HTML
3. Choose View -> Message body as -> Simple HTML
Actual results:
Some text and an image that was not visible in the Original HTML view is
displayed. The image is loaded from a remote server.
Expected results:
The Simple HTML view should be the same as the original HTML view, except that
the alt text of the <img> should be displayed instead of the image itself. The
alt text should be interpreted as plain text, not HTML.
This happens both when the mailnews.display.html_sanitizer.allowed_tags pref is
not set, and when the value contains "img(alt,title,longdesc,width,height)"
(i.e. src is omitted).
I see this using Thunderbird 0.6 on both WinXP and Linux.
Reporter | ||
Comment 1•21 years ago
|
||
Assignee | ||
Updated•21 years ago
|
Attachment #147997 -
Attachment mime type: text/plain → message/rfc822
Assignee | ||
Comment 2•21 years ago
|
||
Oh. And I specifically stripped quotes to avoid stuff like this.
Thanks for finding this! It's not a real hole, but it bypasses the Simple HTML
mode, making it equal in security to the Original HTML mode.
It seems anything is unescaping the attribute values.
nsAutoString value(mParserNode->GetValueAt(i));
// SanitizeAttrValue() modifies |value|
if (NS_SUCCEEDED(SanitizeAttrValue(type, key, value)))
{
// Write out
Write(" " + key + "=\"") + value + "\""); // cutted down
SanitizeAttrValue() doesn't seem to cause this either.
So, it seems the generic HTML parser unescapes the attributes for us, and we'll
have to re-escape them.
Severity: normal → major
Priority: -- → P1
Assignee | ||
Comment 3•21 years ago
|
||
BTW, Christian, did you try the same tricks at other places?
I guess we should escape <>& in text nodes, too.
Reporter | ||
Comment 4•21 years ago
|
||
I only tried it in the three contexts mentioned in the test case, i.e. <img
src=>, <a href=> and <foo bar=>.
The first two showed the issue, though <a href=> was allowed in the pref, and
<img src=> was not.
Updated•21 years ago
|
Summary: HTML sanitizer misinterprets atrribute values → HTML sanitizer misinterprets attribute values
Assignee | ||
Updated•21 years ago
|
Assignee: sspitzer → ben.bucksch
Assignee | ||
Updated•21 years ago
|
Group: security
Severity: major → critical
Assignee | ||
Updated•21 years ago
|
Group: security
Assignee | ||
Updated•21 years ago
|
Flags: blocking1.7?
Assignee | ||
Comment 5•21 years ago
|
||
It turns out, I was paranoid enough to protect against this, but I failed
because I didn't set the inout parameter :(. I have a small fix which just does
that and fixes another small bug.
I also checked, if there may be other places where the same problem may appear,
e.g. title, charset etc.. I tried to fix all these, and changed a few debugging
statements etc.. I think this is a better and more comprehensive fix, but
larger. I think this larger fix should be used for both trunk and branches,
because the smaller one may leave loopholes.
akk, can you review, please?
asa, this is a security bug, it makes it possible to circumvent the additional
protection that Simple HTML is supposed to offer. It is not a severe bug, it's
still at least as secure as the Original HTML mode. Still, requesting that this
goes into 1.7.
I don't know how the AVIARY branch (Thunderbird) is maintained, but Thunderbird
releases should take this, too, IMO.
Assignee | ||
Comment 6•21 years ago
|
||
This gives the following HTML output (after sanitizing):
<html><body>
<html>
<title>Test</title>
<body>
Foo <img alt="1 1 1 <b>2 2 2</b> 3 3 3 <img
src=http://www.mozilla.org/images/ico-tb.gif">
<a href="1 1 1 <b>2 2 2</b> 3 3 3 <img
src=http://www.mozilla.org/images/ico-tb.gif">foo</a>
bar
</body></html>
which displays as ("foo" is a link):
Foo 1 1 1 <b>2 2 2</b> 3 3 3 <img src=http://www.mozilla.org/images/ico-tb.gif
foo bar
Assignee | ||
Comment 7•21 years ago
|
||
This gives the following HTML output:
<html><body>
<html>
<title>Test</title>
<body>
Foo <img alt=""1 1 1 <b>2 2 2</b> 3 3 3 <img
src="http://www.mozilla.org/images/ico-tb.gif">
<a href=""1 1 1 <b>2 2 2</b> 3 3 3 <img
src="http://www.mozilla.org/images/ico-tb.gif">foo</a>
bar
</body></html>
which displays as ("foo" is a link):
Foo "1 1 1 <b>2 2 2</b> 3 3 3 <img
src="http://www.mozilla.org/images/ico-tb.gif foo bar
Assignee | ||
Updated•21 years ago
|
Attachment #149643 -
Flags: review?(akkzilla)
Comment 8•21 years ago
|
||
Asa's out in Hawaii for 10 days or so (according to his blog), so perhaps some
other driver can block 1.7.
I second Ben's reasoning. This is a good patch to include in 1.7. It's enough
of a security problem to include. Especially if 1.7 is going to have a long
life, and be used by Distro's (Netscape et. al.)
Also should make AVIARY.
![]() |
||
Comment 9•21 years ago
|
||
Cc'ing bzbarsky to see if he has any input. I'm confused why you need to escape
single quotes, but nsEscapeHTML2 doesn't have to ...
Assignee | ||
Comment 10•21 years ago
|
||
> I'm confused why you need to escape single quotes, but nsEscapeHTML2
> doesn't have to ...
me too. In my confusion, I am tending to suspect that nsEscapeHTML2 is faulty.
Although I havn't tested it, reading the spec
<http://www.w3.org/TR/REC-html40/intro/sgmltut.html#h-3.2.2>, I'd think that the
same exploit as the attaches testcase would work, if you used single quotes
(both directly and as entities). Even if it doesn't happen to work, I am still
trying to be on the paranoid side with this sanitizer.
![]() |
||
Comment 11•21 years ago
|
||
> me too. In my confusion, I am tending to suspect that nsEscapeHTML2 is faulty.
that's what I was wondering - if so, we should fix that routine and use it.
![]() |
||
Comment 12•21 years ago
|
||
Yeah, nsEscapeHTML2 is broken here, imo.
![]() |
||
Comment 13•21 years ago
|
||
OK, Boris, is it risky to fix it for 1.7? If it's not too bad, then, Ben, can
you attach a patch that fixes it, and uses it, instead of creating your own
escape method?
![]() |
||
Comment 14•21 years ago
|
||
I don't really know. I'm not overly familiar with any of the callers of
nsEscapeHTML2....
If it breaks something, the something was making some bogus assumptions... but
that doesn't really say much. ;)
Assignee | ||
Comment 15•21 years ago
|
||
> Yeah, nsEscapeHTML2 is broken here, imo.
OK. Filed bug 245399.
> is it risky to fix it for 1.7?
Is it certainly *far* more risky (and possibly far more dangerous hole) than
this bug.
Status: NEW → ASSIGNED
Comment 16•21 years ago
|
||
Comment on attachment 149643 [details] [diff] [review]
Comprehensive fix, v2
@@ -46,6 +48,8 @@
parser can decode during execution.
E.g. there are these dreaded data: and javascript URLs and
base64 encoding (which I don't really understand how it alloies
+
+ Development note: You need to recompile content/ *and* layout/
*/
I don't see how this comment is needed in this file, the same applies for every
file in content/ and dom/ (and gfx?). I'd vote for removing it.
+// can't use xpcom/io/nsEscape.*, need to escape ' (single quote) too
Add a reference to the bug about nsEscape.
+static void escape(nsAString& source)
+{
+ nsAutoString result;
+
+ nsAString::const_iterator i;
+ source.BeginReading(i);
+ nsAString::const_iterator end;
+ source.EndReading(end);
+ for (; i != end; ++i)
+ {
+ PRUnichar ch = *i;
+ if (ch == '<')
+ result += NS_LITERAL_STRING("<");
+ else if (ch == '>')
+ result += NS_LITERAL_STRING(">");
+ else if (ch == '&')
+ result += NS_LITERAL_STRING("&");
+ else if (ch == '"')
+ result += NS_LITERAL_STRING(""");
+ else if (ch == '\'')
+ result += NS_LITERAL_STRING("‘");
+ else
+ result += ch;
+ }
+
+ source = result;
+}
This is not all that efficient (i.e. adding character by character, etc), do we
care?
@@ -360,7 +394,9 @@
// missing </title> tag won't result in everything
// being eaten up as the title.
Write(NS_LITERAL_STRING("<title>"));
- Write(aValue);
+ nsAutoString value(aValue);
+ escape(value);
+ Write(value);
Again, inefficient. Copy aValue into value, copy value into the local result in
escape(), and then copy it back into value. Do we care? This would be a bit
more efficient if you'd separate intput from output in the escape() signature
(i.e. have a const nsAString& input, and a nsAString& output) to avoid copying
before making the escape() call.
- In mozSanitizingHTMLSerializer::SetDocumentCharset():
+ nsAutoString charset = NS_ConvertASCIItoUCS2(aCharset);
+ escape(charset);
Here too you could save a copy if escape() took a separate input.
- nsAutoString text(aText);
+ nsString text(aText); // may be long -> heap
Can't see the benefit of this change. If it's long, nsAutoString will malloc,
just as nsString would, w/o any real difference in performance. But if the
string does fit in an nsAutoString, we'll save a malloc.
sr=jst with those changes.
Attachment #149643 -
Flags: superreview+
Comment 17•21 years ago
|
||
My main issues have to do with entity conversion and wheel reinventing.
How does escape() differ from the intl entity encoder, i.e.
NS_ENTITYCONVERTER_CONTRACTID as used in nsHTMLContentSink.cpp?
I didn't know about nsEscape. How does that relate to the intl converters? It
looks like maybe nsEscape is intended specifically for urls, though, not body
text. So the intl one is probably the right one for this use.
If single quotes are the issue with using one or the other of those methods, how
about calling them then having a special case just for single quote? You could
still wrap that in a method called escape() if you want, I just would rather not
see the rest of the code duplicated.
Do we need to escape the characters that you list, but not all the extra unicode
and smartquote characters that display just the same? (I commonly see many
different single and double quote characters, and a couple different angle
bracket pairs, in mail and web pages of various charsets.)
I may not be understanding your entity-handling code properly anyway, though.
For example:
@@ -529,11 +567,13 @@
if (type == eHTMLTag_whitespace ||
type == eHTMLTag_newline)
{
- Write(aText);
+ nsAutoString text(aText);
+ escape(text); // necessary? not? sure?
+ Write(text);
Why does whitespace need to be escaped?
else if (type == eHTMLTag_entity)
{
Write(NS_LITERAL_STRING("&"));
- Write(aText);
+ nsAutoString text(aText);
+ escape(text); // see above
+ Write(text);
+ Write(NS_LITERAL_STRING(";"));
If the DOM already knows it's an entity, then it won't test as being equal to
any of those plain ascii chars you're testing for, will it?
I agree with jst about the comment on what to recompile, although I also agree
with Ben that this is very confusing and I wish it were noted somewhere. But
this probably isn't the best place to do it. Perhaps a README in both top-level
directories? Or Makefile rules so that building at the top level of one calls
the other and the developer doesn't need to keep track?
- nsString& aValue /*inout*/)
+ nsString& value /*inout*/)
Aren't arguments supposed to start with a even if they're out params? But
you're really the owner of this file, so I won't argue if you actively don't
want to use that convention.
Assignee | ||
Comment 18•21 years ago
|
||
Just a few comments offhand:
> I don't see how this comment is needed in this file, the same applies
> for every file in content/ and dom/ (and gfx?). I'd vote for removing it.
I realize how this is totally pointless from your standpoint, but I don't
usually hack content/ and found myself wondering for the second time why my
changes don't get used. It's just a reminder to self, I should call it that, not
"Development note".
> If it's long, nsAutoString will malloc
Ah, nsAutoString's ctor already knows the length. Will change back.
> How does escape() differ from the intl entity encoder
I don't know the latter, but I don't care about intl here (using wide strings
anyways, so shouldn't be a problem), only HTML/SGML syntax (<>&"').
> It looks like maybe nsEscape is intended specifically for urls, though,
> not body text.
Slight incorrectness on my part in the summary of the other bug (biesi corrected
it). We mean nsEscapeHTML* (which does HTML escaping), not NS_EscapeURL(which
does URL escaping, which is completely different). Not sure what exactly
nsEscape() does.
> Do we need to escape the characters that you list, but not all the
> extra unicode and smartquote characters that display just the same?
Concern is not the display, but the HTML parser (quotes that delimit attribute
values). I'd *hope* that the HTML parser doesn't and never will treat smart
quotes as attribute delimiters. If there's any risk of such "smartness" in Gecko
in the future, we should escape them, too, but I hope there isn't such risk.
> I may not be understanding your entity-handling code properly anyway, though.
> Why does whitespace need to be escaped?
> If the DOM already knows it's an entity...
:-) No, you understood that correctly. It's indeed complete nonsense to escape
whitespace. However, can we be 100% sure that Gecko will only *ever* feed us
whitespace there?
> Aren't arguments supposed to start with a even if they're out params?
I just didn't want to change all (3-4) the references to "value" in the function.
Assignee | ||
Comment 19•21 years ago
|
||
- Fixed review comments, incl.
using nsEscapeHTML2 now, depends on bug 245399.
- Removed comment about JS in attributes, because I think that quotes
will now pass (be decoded before execution). Of course, I still
remove javascript: URLs and the prefs removes eventhandler attributes etc..
- removed base64 blocker which is probably pointless
- little code cleanup
Assignee | ||
Updated•21 years ago
|
Attachment #149643 -
Attachment is obsolete: true
Updated•21 years ago
|
Attachment #149955 -
Flags: review?(akkzilla)
Comment 20•21 years ago
|
||
Comment on attachment 149955 [details] [diff] [review]
Comprehensive fix, v3. (Needs fix for 245399.)
I'm not familiar with nsAdoptingString, so I'll trust you on that usage.
Line 554:
> Write(aText); // sure to safe?
Typo, "to be"? (Your call.)
You've addressed all my other issues with this patch.
r=akkana
Attachment #149955 -
Flags: review?(akkzilla) → review+
This patch looks like it leaks a bit -- some callers of |escape| use
nsAdoptingString, and some callers of |escape| look like they leak the value.
What are the potential problems not fixing this could cause, and how risky do
you think it is? Is this code used by default in mail, or does it require the
user to change a pref?
This seems like the kind of thing that we'd want to land on the trunk before
taking on the branch.
This sounds like it requires a bunch of risky changes and it's not going to make
it for 1.7.
Flags: blocking1.7? → blocking1.7-
Assignee | ||
Comment 23•21 years ago
|
||
dbaron, the "Comprehensive fix, v2" is not really risky.
Flags: blocking1.7- → blocking1.7?
Assignee | ||
Comment 24•21 years ago
|
||
or I could produce a patch that matches v3, but without dependence on bug
245399, by implementing the escape function inside sanitizer, like v2 did.
Assignee | ||
Comment 25•21 years ago
|
||
dbaron:
> What are the potential problems not fixing this could cause, and how risky do
> you think it is?
See comment 5. The bug, if triggered, circumvents the protection that the
sanitizer is supposed to offer. THe sanitizer is supposed to protect against
- possible future security problems in Gecko
- mail tracking (e.g. remote image load by spammers)
- formatting with bad taste (usability, not security)
Risks:
- Users which rely on the additional protection against our security bugs
might not be protected when combined with another hole, i.e. less safe.
- If spammers find out about this, they can track people despite the mode
> Is this code used by default in mail, or does it require the
> user to change a pref?
Not used by default (for all msgs) in mozilla.org products, enabled via menu
View|Body|Simple HTML. Used automatically and by default for detected spam in
Thunderbird and soon Seamonkey.
Assignee | ||
Comment 26•21 years ago
|
||
Changes to v3:
- using nsString::Adopt() instead of = to fix probably leak
(thanks dbaron)
- Removed comment which is probably bogus
That's what I'll check into trunk.
Assignee | ||
Updated•21 years ago
|
Attachment #149955 -
Attachment is obsolete: true
Assignee | ||
Updated•21 years ago
|
Attachment #150102 -
Attachment description: Comprehensive fix, v3. (For trunk, needs fix for 245399.) → Comprehensive fix, v4. (For trunk, needs fix for 245399.)
Assignee | ||
Comment 27•21 years ago
|
||
Checked into trunk, marking FIXED. Thanks for the reviews.
Still requesting checkin of version 4 for 1.7, modified to escape in class (like
in version 2) instead of using nsEscapeHTML2().
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Assignee | ||
Updated•21 years ago
|
Whiteboard: needed-aviary1.0
Updated•21 years ago
|
Attachment #149643 -
Flags: review?(akkzilla) → review-
Assignee | ||
Comment 29•21 years ago
|
||
Great. It wasn't too late when I requested the 1.7 green-light. I tried hard to
create a fix in time for 1.7, putting other projects aside.
Comment 30•21 years ago
|
||
Tracking regression in bug 246505, "simple html displays the ; from entities".
Updated•21 years ago
|
Product: MailNews → Core
Updated•17 years ago
|
Product: Core → MailNews Core
You need to log in
before you can comment on or make changes to this bug.
Description
•