Open Bug 1205631 Opened 9 years ago Updated 2 years ago

HTML Parser ambiguity with inline svg and nested styles

Categories

(Core :: DOM: HTML Parser, defect)

defect

Tracking

()

ASSIGNED

People

(Reporter: freddy, Assigned: hsivonen)

References

()

Details

Attachments

(2 files, 1 obsolete file)

Attached file test.html (obsolete) —
I am not sure how this should parse according to the standard but it seems mightily odd:

This piece of HTML/SVG in itself execute JavaScript:
> <svg><p><style><img src="</style><img src=x onerror=alert(40)//">
(also attached as test.html for your convenience)

But the same piece of code does *not* execute JavaScript when it is assigned to a node using innerHTML!

Mario, the author of DOMPurify reported that this was tripping his sanitization library because the image element wasn't properly seen with a NodeIterator on a HTMLDocument.


He now changed his code to use a DOMParser, which seems to work.

Maybe this is a bug in our parser? In any case, the ambiguity when using the file in itself vs. using innerHTML seems noteworthy.

(This bug is in so far related to mxss as we appear to see a mutation when assigning to innerHTML, funnily the mutation prevents the execution of JavaScript, when in "mxss" it *creates* an unexpected JS execution)
What do you mean by DOMParser seems to work? That shouldn't execute script. Do you have a testcase that shows it does?
Maybe that was poorly explained. To quote from the release notes:

*snip**snip**snip*
<script>
// This is SAFE
document.body.innerHTML='<svg><p><style><img src="</style><img src=x onerror=alert(1)//">'
</script>


<script>
// This is UNSAFE
document.write('<svg><p><style><img src="</style><img src=x onerror=alert(1)//">')
</script>
*snip**snip**snip*
Yes, innerHTML does not run scripts. Did you expect that it would?
I.e. what exactly is the issue with comment 2 that you're reporting here?
Per http://www.w3.org/TR/2008/WD-html5-20080610/dom.html#innerhtml0

Note: script elements inserted using innerHTML do not execute when they are inserted.
Attached file test2.html
The spec note is about script elements, not about event handlers, which really work with innerHTML! But the alert is only a symptom, not the cause of what I am trying to describe!

The bug is about the fact that the innerHTML assignment creates a different DOM tree than the document.write call:


The DOM tree created for document.write is
> svg
>   p
>     style
>      img
(all nested. The img within style isn't parsed as HTML of course)

Using innerHTML, the DOM tree is:
> svg
> p
>   style
> img
(svg, p and img all as children of the same parent node)

This is easily observed using the Developer Tools' DOM Inspector on the file I am just attaching.
Attachment #8662349 - Attachment is obsolete: true
We have some details on why this turned out to be come a security issue here:

https://github.com/cure53/DOMPurify/releases/tag/0.6.7

Our library created the DOM tree using doc.body.outerHTML. That however generated a DOM that was missing the offending element. And that only happened on Firefox.

Other browsers generate the correct DOM tree. That why we (DOMPurify) could see the offending element and stripped it. In FF we did not see it - and when someone chose to doc.write() perfectly well-sanitized HTML, an XSS would be the consequence.
With document.write you're going to create a legacy non-html5 output aren't you but with innertHTML it will be html5 if the original document is html5 because you're just replacing the children of the existing body.

Does the following create a safe DOM?

document.write('<!DOCTYPE><svg><p><style><img src="</style><img src=x onerror=alert(1)//">')

(In reply to Robert Longson [:longsonr] from comment #8)

With document.write you're going to create a legacy non-html5 output aren't
you but with innertHTML it will be html5 if the original document is html5
because you're just replacing the children of the existing body.

Does the following create a safe DOM?

document.write('<!DOCTYPE><svg><p><style><img src="</style><img src=x
onerror=alert(1)//">')

It doesn't. The doctype doesn't seem to change things.

This has been unfixed for so long. Do you think you can help with this, Henri?
I think it might cause to website XSS, similar to the other parser problems we've had recently.

Flags: needinfo?(hsivonen)

Henri is still occupied by Fission. Alphan will help investigation here first.

Flags: needinfo?(hsivonen)

Alphan asked out of bands which issues I was referring to in comment 11.
It was mostly meant for context to judge the severity. I'm not entirely sure how relevant they will be for solving this particular issue. But for the record, I found HTML parsing security issues that we had in 2019 and in 2020 (until today):

Attached file test-new.html

(In reply to Frederik Braun [:freddy] from comment #0)

(This bug is in so far related to mxss as we appear to see a mutation when
assigning to innerHTML, funnily the mutation prevents the execution of
JavaScript, when in "mxss" it creates an unexpected JS execution)

(In reply to Robert Longson [:longsonr] from comment #5)

Per http://www.w3.org/TR/2008/WD-html5-20080610/dom.html#innerhtml0

Note: script elements inserted using innerHTML do not execute when they are
inserted.

When opening test-new.html(attachment 9138928 [details]), you can see both "innerHTML" and "document.write()".
It means FF doesn't prevent the execution of JavaScript when assigning the code into innerHTML.

Currently, I don't know the reason for the different parsing results(svg id 1 and svg id 2) when opening test2.html(attachment 9138928 [details]).
I will keep investigating it.

Compare the results of Chrome and Firefox:
[test2.html]
Firefox: only show "/document.write/"
Chrome: show "/innerHTML/" and "/document.write/"
[test-new.html]
Firefox: show "innerHTML" and "document.write()"
Chrome: show "innerHTML" and "document.write()"

Summarize current status, the reason why Firefox seems to be safer when setting .innerHTML is we have different DOM trees after parsing.

Hi Freddy,
I would like to know the main purpose of this bug.
Are we focusing on the reason for different parsing results(DOM trees) or security issue here?
From the bug title, I suppose that we are focusing on the reason here.

Flags: needinfo?(fbraun)

Yes, Alphan. I agree that the parsed trees are different, which links back to the comment 6:

The DOM tree created for document.write is

svg
    p
      style
      img

(all nested. The img within style isn't parsed as HTML of course)

Using innerHTML, the DOM tree is:

svg
 p
   style
 img

(svg, p and img all as children of the same parent node)

The bug was filed due to a security issue in a popular web security library, called DOMPurify. The library has worked around the issue, but we should still track this as a compliance/interoperability issue with other browsers.

Flags: needinfo?(fbraun)

For innerHTML, we have a specific parser called nsHtml5StringParser.
https://searchfox.org/mozilla-central/rev/7fd1c1c34923ece7ad8c822bee062dd0491d64dc/dom/base/FragmentOrElement.cpp#1959
https://searchfox.org/mozilla-central/rev/7fd1c1c34923ece7ad8c822bee062dd0491d64dc/dom/base/nsContentUtils.cpp#4835
For document.write(), we use normal parser called nsHtml5Parser.
https://searchfox.org/mozilla-central/rev/7fd1c1c34923ece7ad8c822bee062dd0491d64dc/dom/base/Document.cpp#9115

Here are some related bugs about nsHtml5StringParser and nsHtml5Parser
In bug 714777, we move fragment parsing out of nsHtml5Parser and create nsHtml5StringParser.
In bug 959150, we make innerHTML faster by not having to deal with tree ops. Here we introduce a class implements a minimal subclass of nsHtml5DocumentBuilder called nsHtml5OplessBuilder. Therefore, nsHtml5TreeOpExecutor is not used.

For nsHtml5StringParser, we have mBuilder in nsHtml5TreeBuilder but mOpSink is nullptr.
For nsHtml5StringParser, we have mOpSink in nsHtml5TreeBuilder but mBuilder is nullptr.
Normally, we just have corresponding behavior according to what we have(mBuilder or mOpSink).
But there are still some different behaviors in some cases.

For the symptom of this bug, the biggest difference is what will the treeBuilder do when there is an error in "HtmlStartTagInForeignContext".
https://searchfox.org/mozilla-central/rev/7fd1c1c34923ece7ad8c822bee062dd0491d64dc/parser/html/nsHtml5TreeBuilder.cpp#727
For document.write(), "fragement" is false.
We will popForeign(). But "fragement" is true when setting innHtml.
This causes two different DOM tree layouts as below.

[Use doc.body.innerHTML="..."]
<svg id="1">
= <p>
== <style>
=== <img src="</style><img src=x onerror=alert(/innerHTML/)//">
== </style>
= <p>
</svg>

[Use document.write("...")]
<svg id="2"></svg>
<p>
= <style><img src="</style>
= <img src=x onerror=alert(/innerHTML/)//">
</p>

[Use document.write("...")]
<svg id="2"></svg>
<p>
= <style><img src="</style>
= <img src=x onerror=alert(/innerHTML/)//">
</p>

[Tree build process]
nsHtml5TreeBuilder::createElement (aNamespace = kNameSpaceID_SVG)
nsHtml5TreeBuilder::appendElement
nsHtml5TreeBuilder::elementPushed
nsHtml5TreeBuilder startTag(name=p)
--- call errEndTagDidNotMatchCurrentOpenElement ()
--- fragment=false call popForeign ---------------------------------------------> main difference
nsHtml5TreeBuilder::popForeign
nsHtml5TreeBuilder::markMalformedIfScript
nsHtml5TreeBuilder::elementPopped
nsHtml5TreeBuilder::createElement (p) (aNamespace = kNameSpaceID_XHTML)
nsHtml5TreeBuilder::appendElement
nsHtml5TreeBuilder::elementPushed
nsHtml5TreeBuilder::createElement (style) (aNamespace = kNameSpaceID_XHTML)
nsHtml5TreeBuilder::appendElement
nsHtml5TreeBuilder::elementPushed
nsHtml5TreeBuilder::appendCharacters
nsHtml5TreeBuilder::elementPopped(style)
nsHtml5TreeBuilder::createElement (img)
nsHtml5TreeBuilder::createElement (img) (aNamespace = kNameSpaceID_XHTML)
nsHtml5TreeBuilder::appendElement
nsHtml5TreeBuilder::elementPushed
nsHtml5TreeBuilder::elementPopped(img)

[Use doc.body.innerHTML="..."]
<svg id="1">
= <p>
== <style>
=== <img src="</style><img src=x onerror=alert(/innerHTML/)//">
== </style>
= <p>
</svg>

Since we don't popForeign, the namespace is always in kNameSpaceID_SVG.

nsHtml5TreeBuilder::createElement(svg)(aNamespace = kNameSpaceID_SVG)
nsHtml5TreeBuilder::appendElement
nsHtml5TreeBuilder::elementPushed
nsHtml5TreeBuilder startTag(name=p)
--- call errHtmlStartTagInForeignContext
nsHtml5TreeBuilder::errHtmlStartTagInForeignContext
--- fragment=true --------------------------------------------> main difference
nsHtml5TreeBuilder::createElement(p)(aNamespace = kNameSpaceID_SVG)
nsHtml5TreeBuilder::appendElement
nsHtml5TreeBuilder::elementPushed
nsHtml5TreeBuilder::createElement(style)(aNamespace = kNameSpaceID_SVG)
nsHtml5TreeBuilder::appendElement
nsHtml5TreeBuilder::elementPushed
nsHtml5TreeBuilder startTag(name=img)
--- call errHtmlStartTagInForeignContext
nsHtml5TreeBuilder::errHtmlStartTagInForeignContext
--- fragment=true
nsHtml5TreeBuilder::createElement(img)(aNamespace = kNameSpaceID_SVG)
nsHtml5TreeBuilder::appendElement
nsHtml5TreeBuilder::elementPushed
call nsHtml5TreeBuilder::popOnEof
nsHtml5TreeBuilder::markMalformedIfScript
nsHtml5TreeBuilder::elementPopped(img)
call nsHtml5TreeBuilder::popOnEof
nsHtml5TreeBuilder::markMalformedIfScript
nsHtml5TreeBuilder::elementPopped(style)
--- call mBuilder->UpdateStyleSheet
call nsHtml5TreeBuilder::popOnEof
nsHtml5TreeBuilder::markMalformedIfScript
nsHtml5TreeBuilder::elementPopped(p)
call nsHtml5TreeBuilder::popOnEof
nsHtml5TreeBuilder::markMalformedIfScript
nsHtml5TreeBuilder::elementPopped(svg)
--- call nsHtml5TreeOperation::SvgLoad()

From comment 18-21, we know it is our design which makes the difference.
However, I think that it makes sense that we have the same parsing result when doing document.write() and setting document.body.innerHTML.

Hello Henri, what is your opinion?

Flags: needinfo?(hsivonen)

Thanks for tracing this. This is working as designed: https://html.spec.whatwg.org/#parsing-main-inforeign

In retrospect, it's debatable whether it's a good idea for the rule not to apply in the fragment case, but it would also be kinda weird to break out in the fragment case.

The backstory for why we break out in the non-fragment case is that when Hixie was writing this part of the spec, he ran a query over the Google index and found random Web pages that had incomplete SVG fragments in them. Before HTML had SVG support, these fragments didn't have user-visible effect. However, without the break-out rule, the part of these pages after the incomplete SVG fragment would have become invisible. Therefore, browsers would have had to have taken a Web compat hit in order to ship SVG-in-HTML. This way, there wasn't a Web compat impediment to shipping SVG-in-HTML.

document.write() is consistent with parsing from network.

Marking INVALID in the sense that the allegation of behavior is true, but the behavior is not a bug per spec.

Sites that care about security should parse, then apply an allow-list, and then reserialize. It's unfortunate that "should parse" isn't a single thing but contextual whether the target is fragment parsing or not.

Status: NEW → RESOLVED
Closed: 4 years ago
Flags: needinfo?(hsivonen)
Resolution: --- → INVALID

That however generated a DOM that was missing the offending element. And that only happened on Firefox.

Except it looks like Chrome has a bug: It appears to apply the break-out behavior in the fragment case.

Reopening for assessing if we should change the spec to match Chrome's failure to implement the spec.

Anne, what do you think?

Status: RESOLVED → REOPENED
Flags: needinfo?(annevk)
Resolution: INVALID → ---

Since Safari is aligned with Chrome, I would suggest we align and update the HTML Standard.

Flags: needinfo?(annevk)

The special case in the spec came from: https://www.w3.org/Bugs/Public/show_bug.cgi?id=17924

Blocks: 1669945

The HTML standard update is going to happen in https://github.com/whatwg/html/pull/6399.
I think it's time for us to make the change.

wpt tests updated in https://github.com/web-platform-tests/wpt/pull/27799 which was just merged.

The 6399 spec change introduced a regression for foster parenting; see https://github.com/whatwg/html/pull/6455

Henri, with the HTML spec change above, can you take this on?

Flags: needinfo?(hsivonen)

(In reply to Frederik Braun [:freddy] from comment #30)

Henri, with the HTML spec change above, can you take this on?

Assigning to self.

Assignee: nobody → hsivonen
Status: REOPENED → ASSIGNED
Flags: needinfo?(hsivonen)
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: