inline SVG parsing rules allow bypassing server XSS filters

RESOLVED WONTFIX

Status

()

Core
HTML: Parser
RESOLVED WONTFIX
7 years ago
a year ago

People

(Reporter: dveditz, Unassigned)

Tracking

({sec-low})

Trunk
x86
Mac OS X
sec-low
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [sg:low])

(Reporter)

Description

7 years ago
In a recent talk Mario Heiderich points out there's an impedance mismatch between the parsing rules within inline <svg> and the rest of HTML, which may allow XSS filter bypasses. Will have to think some about whether this is likely to be useful in practice, but in theory places that might successfully block HTML script injection could be fooled by wrapping the script in <svg></svg>

No alert:
data:text/html,<script>&#x61lert(1)</script>

Alert fires
data:text/html,<svg><script>&#x61lert(1)</script></svg>

Slides:
http://www.owasp.org/images/0/03/Mario_Heiderich_OWASP_Sweden_The_image_that_called_me.pdf

Obviously in such a simple example one hopes the server would filter on the <script> tag itself, but it does demonstrate that unescaping is done prior to handing the contents off to the script compiler. If parts of the script were generated from user-content such escapes could lead to injection.
I proposed a long time ago to make the parsing rules for SVG <script>s and HTML <script>s the same, but so far this hasn't happened.

This ads another reason for doing that.
(In reply to comment #0)
> Obviously in such a simple example one hopes the server would filter on the
> <script> tag itself, but it does demonstrate that unescaping is done prior to
> handing the contents off to the script compiler. If parts of the script were
> generated from user-content such escapes could lead to injection.

Indeed, a typical XSS filter would prohibit the inclusion of all script elements, so parsing (or not) the NCRs in script content shouldn't be an issue when the whole script should get dropped by the filter anyway.

However, if a filter wants to inspect scripts with more granularity than dropping all elements whose local name is "script", the filter needs to contain an implementation of HTML parsing algorithm. Anything less will have bugs.

I'd recommend simpler XSS filters to use a full implementation of the HTML parsing algorithm anyway, because even in the absence of an <svg> tag anywhere in the document, figuring out where an HTML script ends is non-trivial. Also, if a filter is sophisticated enough to have a JS compiler in it, having an HTML parser, too, shouldn't be too big a deal.

In general, I think HTML XSS filter that doesn't parse the input into a tree (or a stream of SAX-like events), drop all unknown elements in the parsed representation (including "svg" if the filter predates HTML5) and reserialize is doomed to failure. So here a properly written pre-HTML5 XSS filter should drop the svg element and a properly written post-HTML5 XSS filter should know how to tokenize SVG script content per spec.

(In reply to comment #1)
> I proposed a long time ago to make the parsing rules for SVG <script>s and HTML
> <script>s the same, but so far this hasn't happened.

As discussed earlier, that's not workable. Making SVG script content parse like HTML scripts would fail to meet the requirement of XML-serialized SVG being pasteable into text/html. Making HTML scripts parse like SVG scripts wouldn't be compatible with existing content.

Chrome has shipped with this behavior. IE9 will ship with this behavior tomorrow. We'll ship with this behavior very soon. Opera already has an implementation. I think we're way past the point of making changes of the magnitude of unifying HTML and SVG script parsing to the HTML parsing algorithm.

I'd treat this as WONTFIX as in "known, foreseen and accepted as a tradeoff in face of conflicting requirements at the time the spec was developed".
FWIW, sg:moderate seems too high a rating to me, since the threat scenario assumes an implausible combination of bogosity (either failing to drop <svg> if SVG-unaware or failing to tokenize SVG scripts correctly if SVG-aware) and sophistication (analysis of script text for safety as opposed to dropping <script> categorically) and yet another twist of bogosity (the script analyzer then considering scripts parse errors safe while passing through the original text instead of reserializing the AST as a safety measure) in the XSS filter.

Updated

7 years ago
Blocks: 301375

Updated

7 years ago
Whiteboard: [sg:moderate] → [sg:low]
Hixie, I've pointed a number of times that the SVG WG is ok with not following the requirement strictly in all cases. If we can make parsing of HTML and SVG <script>s the same for the case when the SVG <script> is wrapped in <![CDATA[]]>, then that likely will be enough for the vast majority of the existing SVG contents out there.

I confirmed this by actually asking the SVG WG during one of their F2F meetings.
Henri, the perceived security risk here isn't that someone would accidentally let a <script> through. The risk is that they'd intentionally put a <script> through, but assume that user-generated-content injected into it would be parsed the same in SVG scripts as in HTML scripts, such that if the UGC contained numeric character references, it would close the string and do dastardly things. This is compounded by the complication that if a page puts UGC into a <script> safely today, but has an earlier part of the page that allows the unescaped string "<svg>", even if both of these filters are completely safe and never let anything through, and even if the page never uses SVG, the attacker still can end up causing the <script> to be parsed differently.
So do I understand correctly that the threat scenario is the following?

Site allows one piece of UGC in HTML content and later another piece of UGC on the page in a string literal in an inline script. The HTML content part is sanitized by blacklist so that the attacker can have <svg> inserted to the page. There are none of the break-out tags between the point where <svg> can be injected and the script that follows. When escaping UGC for the string literal in the script, only < and " are escaped but & isn't, since it's not necessary to escape & in HTML scripts, so the attacker can inject &#x22; into the string literal and terminate the literal leaving the rest of the UCG run as code.

If that's the scenario, the site would already fail at security when the first piece of UGC is able to contain <svg> due to a blacklist-based sanitizer.

Even if we accept that pages that use blacklist-based (i.e. bogus) sanitizers need to be protected from surprises, the break-out tags are a pretty big mitigating factor. The attack won't work if the page contains any of the following start tags after the UGC <svg> but before the script that contains more UGC in a string literal: "b", "big", "blockquote", "body", "br", "center", "code", "dd", "div", "dl", "dt", "em", "embed", "h1", "h2", "h3", "h4", "h5", "h6", "head", "hr", "i", "img", "li", "listing", "menu", "meta", "nobr", "ol", "p", "pre", "ruby", "s", "small", "span", "strong", "strike", "sub", "sup", "table", "tt", "u", "ul", "var". In particular, <div> and <p> are so common, that there's a very good chance that either of those occurs between the first UGC injection point and the second.

If I understand Hixie's threat scenario correctly, exploiting SVG scripts would still require the page to fail at previously-known security best practice (i.e. it would have to allow an unmatched <svg> through in the first UGC injection point which no non-bogus SVG-unaware sanitizer should be allowing) *and* the page would have to be unlucky enough not to contain any of the very common HTML start tags before the script that hosts the other piece of UGC. I, therefore, still think that the risk is very low.

As for trying to change parsing I think there are problems:

 * Giving up on the SVG WG's design requirements (the requirements thay posed before they changed their mind per comment 4) and tokenizing SVG script content the way HTML script content is tokenized would make it reasonable for authors to assume that the content of the style element is tokenized like the content of the HTML style element. If we made the content of SVG style tokenize as in XML while tokenizing the content of SVG script as in HTML, we'd end of with a result that'd be even more confusing to authors than what we have now. If we changed SVG style to tokenize like HTML style, we'd most likely end up breaking the kind of copy&paste scenarios that the SVG WG really wanted to work in the first place.

 * Even if we considered the replacement cycles of Firefox and Chrome to be fast enough that Firefox or Chrome legacy didn't matter, Microsoft has deployed this is IE. (Apple has deployed the parsing in Safari, but Safari doesn't seem to run SVG scripts. Chrome does run them. Weird.) Microsoft says IE9 will be supported until January 2020. Even if IE9 doesn't have a large active userbase all the way until 2020, I think authors who try to make Web content that works and is secure would be worse of if we had a period of even a couple of years with browsers that support SVG-in-text/html tokenizing script content substantially differently. (Which would be the case if we changed the tokenization but MS didn't agree to issue such a drastic change as a security patch for IE9.)

 * I think telling people to use CDATA sections in order to secure their code is a bad idea, because it makes it harder to write correct and secure serializers. Without CDATA sections, you can make a serializer secure by making the decision whether to escape a given character in an attribute value or in a text node by examining the character in isolation. If you have to emit CDATA sections, you have to start looking for ]]> sequences to avoid cases where UGC tries to break out of the CDATA section.

So due to the above reasons, assuming that I understand the threat scenario correctly, I think we shouldn't make SVG script content tokenize like HTML script content. However, if security experts disagree with WONTFIXing this outright, I'd rather make character references that expand to " and ' expand to the REPLACEMENT CHARACTER when the current node is an SVG script element than make SVG scripts tokenize like HTML scripts.

Comment 7

5 years ago
marking wontfix.
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → WONTFIX

Updated

3 years ago
Group: core-security → core-security-release
(Reporter)

Updated

a year ago
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.