HTML Parser ambiguity with inline svg and nested styles
Categories
(Core :: DOM: HTML Parser, defect)
Tracking
()
People
(Reporter: freddy, Assigned: hsivonen)
References
()
Details
Attachments
(2 files, 1 obsolete file)
I am not sure how this should parse according to the standard but it seems mightily odd:
This piece of HTML/SVG in itself execute JavaScript:
> <svg><p><style><img src="</style><img src=x onerror=alert(40)//">
(also attached as test.html for your convenience)
But the same piece of code does *not* execute JavaScript when it is assigned to a node using innerHTML!
Mario, the author of DOMPurify reported that this was tripping his sanitization library because the image element wasn't properly seen with a NodeIterator on a HTMLDocument.
He now changed his code to use a DOMParser, which seems to work.
Maybe this is a bug in our parser? In any case, the ambiguity when using the file in itself vs. using innerHTML seems noteworthy.
(This bug is in so far related to mxss as we appear to see a mutation when assigning to innerHTML, funnily the mutation prevents the execution of JavaScript, when in "mxss" it *creates* an unexpected JS execution)
Reporter | ||
Updated•9 years ago
|
Comment 1•9 years ago
|
||
What do you mean by DOMParser seems to work? That shouldn't execute script. Do you have a testcase that shows it does?
Reporter | ||
Comment 2•9 years ago
|
||
Maybe that was poorly explained. To quote from the release notes: *snip**snip**snip* <script> // This is SAFE document.body.innerHTML='<svg><p><style><img src="</style><img src=x onerror=alert(1)//">' </script> <script> // This is UNSAFE document.write('<svg><p><style><img src="</style><img src=x onerror=alert(1)//">') </script> *snip**snip**snip*
Comment 3•9 years ago
|
||
Yes, innerHTML does not run scripts. Did you expect that it would?
Comment 5•9 years ago
|
||
Per http://www.w3.org/TR/2008/WD-html5-20080610/dom.html#innerhtml0 Note: script elements inserted using innerHTML do not execute when they are inserted.
Reporter | ||
Comment 6•9 years ago
|
||
The spec note is about script elements, not about event handlers, which really work with innerHTML! But the alert is only a symptom, not the cause of what I am trying to describe! The bug is about the fact that the innerHTML assignment creates a different DOM tree than the document.write call: The DOM tree created for document.write is > svg > p > style > img (all nested. The img within style isn't parsed as HTML of course) Using innerHTML, the DOM tree is: > svg > p > style > img (svg, p and img all as children of the same parent node) This is easily observed using the Developer Tools' DOM Inspector on the file I am just attaching.
Comment 7•9 years ago
|
||
We have some details on why this turned out to be come a security issue here: https://github.com/cure53/DOMPurify/releases/tag/0.6.7 Our library created the DOM tree using doc.body.outerHTML. That however generated a DOM that was missing the offending element. And that only happened on Firefox. Other browsers generate the correct DOM tree. That why we (DOMPurify) could see the offending element and stripped it. In FF we did not see it - and when someone chose to doc.write() perfectly well-sanitized HTML, an XSS would be the consequence.
Comment 8•9 years ago
|
||
With document.write you're going to create a legacy non-html5 output aren't you but with innertHTML it will be html5 if the original document is html5 because you're just replacing the children of the existing body. Does the following create a safe DOM? document.write('<!DOCTYPE><svg><p><style><img src="</style><img src=x onerror=alert(1)//">')
Reporter | ||
Comment 9•5 years ago
|
||
(In reply to Robert Longson [:longsonr] from comment #8)
With document.write you're going to create a legacy non-html5 output aren't
you but with innertHTML it will be html5 if the original document is html5
because you're just replacing the children of the existing body.Does the following create a safe DOM?
document.write('<!DOCTYPE><svg><p><style><img src="</style><img src=x
onerror=alert(1)//">')
It doesn't. The doctype doesn't seem to change things.
Reporter | ||
Comment 11•4 years ago
|
||
This has been unfixed for so long. Do you think you can help with this, Henri?
I think it might cause to website XSS, similar to the other parser problems we've had recently.
Comment 12•4 years ago
|
||
Henri is still occupied by Fission. Alphan will help investigation here first.
Reporter | ||
Comment 13•4 years ago
|
||
Alphan asked out of bands which issues I was referring to in comment 11.
It was mostly meant for context to judge the severity. I'm not entirely sure how relevant they will be for solving this particular issue. But for the record, I found HTML parsing security issues that we had in 2019 and in 2020 (until today):
Comment 14•4 years ago
|
||
Comment 15•4 years ago
|
||
(In reply to Frederik Braun [:freddy] from comment #0)
(This bug is in so far related to mxss as we appear to see a mutation when
assigning to innerHTML, funnily the mutation prevents the execution of
JavaScript, when in "mxss" it creates an unexpected JS execution)
(In reply to Robert Longson [:longsonr] from comment #5)
Per http://www.w3.org/TR/2008/WD-html5-20080610/dom.html#innerhtml0
Note: script elements inserted using innerHTML do not execute when they are
inserted.
When opening test-new.html(attachment 9138928 [details]), you can see both "innerHTML" and "document.write()".
It means FF doesn't prevent the execution of JavaScript when assigning the code into innerHTML.
Currently, I don't know the reason for the different parsing results(svg id 1 and svg id 2) when opening test2.html(attachment 9138928 [details]).
I will keep investigating it.
Compare the results of Chrome and Firefox:
[test2.html]
Firefox: only show "/document.write/"
Chrome: show "/innerHTML/" and "/document.write/"
[test-new.html]
Firefox: show "innerHTML" and "document.write()"
Chrome: show "innerHTML" and "document.write()"
Comment 16•4 years ago
|
||
Summarize current status, the reason why Firefox seems to be safer when setting .innerHTML is we have different DOM trees after parsing.
Hi Freddy,
I would like to know the main purpose of this bug.
Are we focusing on the reason for different parsing results(DOM trees) or security issue here?
From the bug title, I suppose that we are focusing on the reason here.
Reporter | ||
Comment 17•4 years ago
•
|
||
Yes, Alphan. I agree that the parsed trees are different, which links back to the comment 6:
The DOM tree created for document.write is
svg
p
style
img
(all nested. The img within style isn't parsed as HTML of course)
Using innerHTML, the DOM tree is:
svg
p
style
img
(svg, p and img all as children of the same parent node)
The bug was filed due to a security issue in a popular web security library, called DOMPurify. The library has worked around the issue, but we should still track this as a compliance/interoperability issue with other browsers.
Comment 18•4 years ago
|
||
For innerHTML, we have a specific parser called nsHtml5StringParser.
https://searchfox.org/mozilla-central/rev/7fd1c1c34923ece7ad8c822bee062dd0491d64dc/dom/base/FragmentOrElement.cpp#1959
https://searchfox.org/mozilla-central/rev/7fd1c1c34923ece7ad8c822bee062dd0491d64dc/dom/base/nsContentUtils.cpp#4835
For document.write(), we use normal parser called nsHtml5Parser.
https://searchfox.org/mozilla-central/rev/7fd1c1c34923ece7ad8c822bee062dd0491d64dc/dom/base/Document.cpp#9115
Here are some related bugs about nsHtml5StringParser and nsHtml5Parser
In bug 714777, we move fragment parsing out of nsHtml5Parser and create nsHtml5StringParser.
In bug 959150, we make innerHTML faster by not having to deal with tree ops. Here we introduce a class implements a minimal subclass of nsHtml5DocumentBuilder called nsHtml5OplessBuilder. Therefore, nsHtml5TreeOpExecutor is not used.
For nsHtml5StringParser, we have mBuilder in nsHtml5TreeBuilder but mOpSink is nullptr.
For nsHtml5StringParser, we have mOpSink in nsHtml5TreeBuilder but mBuilder is nullptr.
Normally, we just have corresponding behavior according to what we have(mBuilder or mOpSink).
But there are still some different behaviors in some cases.
Comment 19•4 years ago
|
||
For the symptom of this bug, the biggest difference is what will the treeBuilder do when there is an error in "HtmlStartTagInForeignContext".
https://searchfox.org/mozilla-central/rev/7fd1c1c34923ece7ad8c822bee062dd0491d64dc/parser/html/nsHtml5TreeBuilder.cpp#727
For document.write(), "fragement" is false.
We will popForeign(). But "fragement" is true when setting innHtml.
This causes two different DOM tree layouts as below.
[Use doc.body.innerHTML="..."]
<svg id="1">
= <p>
== <style>
=== <img src="</style><img src=x onerror=alert(/innerHTML/)//">
== </style>
= <p>
</svg>
[Use document.write("...")]
<svg id="2"></svg>
<p>
= <style><img src="</style>
= <img src=x onerror=alert(/innerHTML/)//">
</p>
Comment 20•4 years ago
•
|
||
[Use document.write("...")]
<svg id="2"></svg>
<p>
= <style><img src="</style>
= <img src=x onerror=alert(/innerHTML/)//">
</p>
[Tree build process]
nsHtml5TreeBuilder::createElement (aNamespace = kNameSpaceID_SVG)
nsHtml5TreeBuilder::appendElement
nsHtml5TreeBuilder::elementPushed
nsHtml5TreeBuilder startTag(name=p)
--- call errEndTagDidNotMatchCurrentOpenElement ()
--- fragment=false call popForeign ---------------------------------------------> main difference
nsHtml5TreeBuilder::popForeign
nsHtml5TreeBuilder::markMalformedIfScript
nsHtml5TreeBuilder::elementPopped
nsHtml5TreeBuilder::createElement (p) (aNamespace = kNameSpaceID_XHTML)
nsHtml5TreeBuilder::appendElement
nsHtml5TreeBuilder::elementPushed
nsHtml5TreeBuilder::createElement (style) (aNamespace = kNameSpaceID_XHTML)
nsHtml5TreeBuilder::appendElement
nsHtml5TreeBuilder::elementPushed
nsHtml5TreeBuilder::appendCharacters
nsHtml5TreeBuilder::elementPopped(style)
nsHtml5TreeBuilder::createElement (img)
nsHtml5TreeBuilder::createElement (img) (aNamespace = kNameSpaceID_XHTML)
nsHtml5TreeBuilder::appendElement
nsHtml5TreeBuilder::elementPushed
nsHtml5TreeBuilder::elementPopped(img)
Comment 21•4 years ago
|
||
[Use doc.body.innerHTML="..."]
<svg id="1">
= <p>
== <style>
=== <img src="</style><img src=x onerror=alert(/innerHTML/)//">
== </style>
= <p>
</svg>
Since we don't popForeign, the namespace is always in kNameSpaceID_SVG.
nsHtml5TreeBuilder::createElement(svg)(aNamespace = kNameSpaceID_SVG)
nsHtml5TreeBuilder::appendElement
nsHtml5TreeBuilder::elementPushed
nsHtml5TreeBuilder startTag(name=p)
--- call errHtmlStartTagInForeignContext
nsHtml5TreeBuilder::errHtmlStartTagInForeignContext
--- fragment=true --------------------------------------------> main difference
nsHtml5TreeBuilder::createElement(p)(aNamespace = kNameSpaceID_SVG)
nsHtml5TreeBuilder::appendElement
nsHtml5TreeBuilder::elementPushed
nsHtml5TreeBuilder::createElement(style)(aNamespace = kNameSpaceID_SVG)
nsHtml5TreeBuilder::appendElement
nsHtml5TreeBuilder::elementPushed
nsHtml5TreeBuilder startTag(name=img)
--- call errHtmlStartTagInForeignContext
nsHtml5TreeBuilder::errHtmlStartTagInForeignContext
--- fragment=true
nsHtml5TreeBuilder::createElement(img)(aNamespace = kNameSpaceID_SVG)
nsHtml5TreeBuilder::appendElement
nsHtml5TreeBuilder::elementPushed
call nsHtml5TreeBuilder::popOnEof
nsHtml5TreeBuilder::markMalformedIfScript
nsHtml5TreeBuilder::elementPopped(img)
call nsHtml5TreeBuilder::popOnEof
nsHtml5TreeBuilder::markMalformedIfScript
nsHtml5TreeBuilder::elementPopped(style)
--- call mBuilder->UpdateStyleSheet
call nsHtml5TreeBuilder::popOnEof
nsHtml5TreeBuilder::markMalformedIfScript
nsHtml5TreeBuilder::elementPopped(p)
call nsHtml5TreeBuilder::popOnEof
nsHtml5TreeBuilder::markMalformedIfScript
nsHtml5TreeBuilder::elementPopped(svg)
--- call nsHtml5TreeOperation::SvgLoad()
Comment 22•4 years ago
|
||
From comment 18-21, we know it is our design which makes the difference.
However, I think that it makes sense that we have the same parsing result when doing document.write() and setting document.body.innerHTML.
Hello Henri, what is your opinion?
Assignee | ||
Comment 23•4 years ago
•
|
||
Thanks for tracing this. This is working as designed: https://html.spec.whatwg.org/#parsing-main-inforeign
In retrospect, it's debatable whether it's a good idea for the rule not to apply in the fragment case, but it would also be kinda weird to break out in the fragment case.
The backstory for why we break out in the non-fragment case is that when Hixie was writing this part of the spec, he ran a query over the Google index and found random Web pages that had incomplete SVG fragments in them. Before HTML had SVG support, these fragments didn't have user-visible effect. However, without the break-out rule, the part of these pages after the incomplete SVG fragment would have become invisible. Therefore, browsers would have had to have taken a Web compat hit in order to ship SVG-in-HTML. This way, there wasn't a Web compat impediment to shipping SVG-in-HTML.
document.write()
is consistent with parsing from network.
Marking INVALID in the sense that the allegation of behavior is true, but the behavior is not a bug per spec.
Sites that care about security should parse, then apply an allow-list, and then reserialize. It's unfortunate that "should parse" isn't a single thing but contextual whether the target is fragment parsing or not.
Assignee | ||
Comment 24•4 years ago
|
||
That however generated a DOM that was missing the offending element. And that only happened on Firefox.
Except it looks like Chrome has a bug: It appears to apply the break-out behavior in the fragment case.
Reopening for assessing if we should change the spec to match Chrome's failure to implement the spec.
Anne, what do you think?
Comment 25•4 years ago
|
||
Since Safari is aligned with Chrome, I would suggest we align and update the HTML Standard.
Assignee | ||
Comment 26•4 years ago
|
||
It's even on file already: https://github.com/whatwg/html/issues/5117
Assignee | ||
Comment 27•4 years ago
|
||
The special case in the spec came from: https://www.w3.org/Bugs/Public/show_bug.cgi?id=17924
Reporter | ||
Comment 28•3 years ago
|
||
The HTML standard update is going to happen in https://github.com/whatwg/html/pull/6399.
I think it's time for us to make the change.
Comment 29•3 years ago
|
||
wpt tests updated in https://github.com/web-platform-tests/wpt/pull/27799 which was just merged.
The 6399 spec change introduced a regression for foster parenting; see https://github.com/whatwg/html/pull/6455
Reporter | ||
Comment 30•3 years ago
|
||
Henri, with the HTML spec change above, can you take this on?
Assignee | ||
Comment 31•3 years ago
|
||
(In reply to Frederik Braun [:freddy] from comment #30)
Henri, with the HTML spec change above, can you take this on?
Assigning to self.
Updated•2 years ago
|
Description
•