1205631 - HTML Parser ambiguity with inline svg and nested styles

Reporter

Description

•

9 years ago

Attached file test.html (obsolete) — Details

I am not sure how this should parse according to the standard but it seems mightily odd:

This piece of HTML/SVG in itself execute JavaScript:
> <svg><p><style><img src="</style><img src=x onerror=alert(40)//">
(also attached as test.html for your convenience)

But the same piece of code does *not* execute JavaScript when it is assigned to a node using innerHTML!

Mario, the author of DOMPurify reported that this was tripping his sanitization library because the image element wasn't properly seen with a NodeIterator on a HTMLDocument.


He now changed his code to use a DOMParser, which seems to work.

Maybe this is a bug in our parser? In any case, the ambiguity when using the file in itself vs. using innerHTML seems noteworthy.

(This bug is in so far related to mxss as we appear to see a mutation when assigning to innerHTML, funnily the mutation prevents the execution of JavaScript, when in "mxss" it *creates* an unexpected JS execution)

Frederik Braun [:freddy]

Reporter

Updated

•

9 years ago

URL: https://github.com/cure53/DOMPurify/r...

Robert Longson [:longsonr]

Comment 1

•

9 years ago

What do you mean by DOMParser seems to work? That shouldn't execute script. Do you have a testcase that shows it does?

Frederik Braun [:freddy]

Reporter

Comment 2

•

9 years ago

Maybe that was poorly explained. To quote from the release notes:

*snip**snip**snip*
<script>
// This is SAFE
document.body.innerHTML='<svg><p><style><img src="</style><img src=x onerror=alert(1)//">'
</script>


<script>
// This is UNSAFE
document.write('<svg><p><style><img src="</style><img src=x onerror=alert(1)//">')
</script>
*snip**snip**snip*

Robert Longson [:longsonr]

Comment 3

•

9 years ago

Yes, innerHTML does not run scripts. Did you expect that it would?

Robert Longson [:longsonr]

Comment 4

•

9 years ago

I.e. what exactly is the issue with comment 2 that you're reporting here?

Robert Longson [:longsonr]

Comment 5

•

9 years ago

Per http://www.w3.org/TR/2008/WD-html5-20080610/dom.html#innerhtml0

Note: script elements inserted using innerHTML do not execute when they are inserted.

Frederik Braun [:freddy]

Reporter

Comment 6

•

9 years ago

Attached file test2.html — Details

The spec note is about script elements, not about event handlers, which really work with innerHTML! But the alert is only a symptom, not the cause of what I am trying to describe!

The bug is about the fact that the innerHTML assignment creates a different DOM tree than the document.write call:


The DOM tree created for document.write is
> svg
>   p
>     style
>      img
(all nested. The img within style isn't parsed as HTML of course)

Using innerHTML, the DOM tree is:
> svg
> p
>   style
> img
(svg, p and img all as children of the same parent node)

This is easily observed using the Developer Tools' DOM Inspector on the file I am just attaching.

Attachment #8662349 - Attachment is obsolete: true

Mario Heiderich

Comment 7

•

9 years ago

We have some details on why this turned out to be come a security issue here:

https://github.com/cure53/DOMPurify/releases/tag/0.6.7

Our library created the DOM tree using doc.body.outerHTML. That however generated a DOM that was missing the offending element. And that only happened on Firefox.

Other browsers generate the correct DOM tree. That why we (DOMPurify) could see the offending element and stripped it. In FF we did not see it - and when someone chose to doc.write() perfectly well-sanitized HTML, an XSS would be the consequence.

Robert Longson [:longsonr]

Comment 8

•

9 years ago

With document.write you're going to create a legacy non-html5 output aren't you but with innertHTML it will be html5 if the original document is html5 because you're just replacing the children of the existing body.

Does the following create a safe DOM?

document.write('<!DOCTYPE><svg><p><style><img src="</style><img src=x onerror=alert(1)//">')

Frederik Braun [:freddy]

Reporter

Comment 9

•

5 years ago

(In reply to Robert Longson [:longsonr] from comment #8)

With document.write you're going to create a legacy non-html5 output aren't
you but with innertHTML it will be html5 if the original document is html5
because you're just replacing the children of the existing body.

Does the following create a safe DOM?

document.write('<!DOCTYPE><svg><p><style><img src="</style><img src=x
onerror=alert(1)//">')

It doesn't. The doctype doesn't seem to change things.

Frederik Braun [:freddy]

Reporter

Comment 11

•

4 years ago

This has been unfixed for so long. Do you think you can help with this, Henri?
I think it might cause to website XSS, similar to the other parser problems we've had recently.

Flags: needinfo?(hsivonen)

Hsin-Yi Tsai (she/her) [:hsinyi]

Comment 12

•

4 years ago

Henri is still occupied by Fission. Alphan will help investigation here first.

Flags: needinfo?(hsivonen)

Frederik Braun [:freddy]

Reporter

Comment 13

•

4 years ago

Alphan asked out of bands which issues I was referring to in comment 11.
It was mostly meant for context to judge the severity. I'm not entirely sure how relevant they will be for solving this particular issue. But for the record, I found HTML parsing security issues that we had in 2019 and in 2020 (until today):

Alphan Chen [:alchen]

Comment 14

•

4 years ago

Attached file test-new.html — Details

Alphan Chen [:alchen]

Comment 15

•

4 years ago

(In reply to Frederik Braun [:freddy] from comment #0)

(This bug is in so far related to mxss as we appear to see a mutation when
assigning to innerHTML, funnily the mutation prevents the execution of
JavaScript, when in "mxss" it creates an unexpected JS execution)

(In reply to Robert Longson [:longsonr] from comment #5)

Per http://www.w3.org/TR/2008/WD-html5-20080610/dom.html#innerhtml0

Note: script elements inserted using innerHTML do not execute when they are
inserted.

When opening test-new.html(attachment 9138928 [details]), you can see both "innerHTML" and "document.write()".
It means FF doesn't prevent the execution of JavaScript when assigning the code into innerHTML.

Currently, I don't know the reason for the different parsing results(svg id 1 and svg id 2) when opening test2.html(attachment 9138928 [details]).
I will keep investigating it.

Compare the results of Chrome and Firefox:
[test2.html]
Firefox: only show "/document.write/"
Chrome: show "/innerHTML/" and "/document.write/"
[test-new.html]
Firefox: show "innerHTML" and "document.write()"
Chrome: show "innerHTML" and "document.write()"

Alphan Chen [:alchen]

Comment 16

•

4 years ago

Summarize current status, the reason why Firefox seems to be safer when setting .innerHTML is we have different DOM trees after parsing.

Hi Freddy,
I would like to know the main purpose of this bug.
Are we focusing on the reason for different parsing results(DOM trees) or security issue here?
From the bug title, I suppose that we are focusing on the reason here.

Flags: needinfo?(fbraun)

Frederik Braun [:freddy]

Reporter

Comment 17

•

4 years ago

•

Edited

Yes, Alphan. I agree that the parsed trees are different, which links back to the comment 6:

The DOM tree created for document.write is

svg
    p
      style
      img

(all nested. The img within style isn't parsed as HTML of course)

Using innerHTML, the DOM tree is:

svg
 p
   style
 img

(svg, p and img all as children of the same parent node)

The bug was filed due to a security issue in a popular web security library, called DOMPurify. The library has worked around the issue, but we should still track this as a compliance/interoperability issue with other browsers.

Flags: needinfo?(fbraun)

Alphan Chen [:alchen]

Comment 18

•

4 years ago

For innerHTML, we have a specific parser called nsHtml5StringParser.
https://searchfox.org/mozilla-central/rev/7fd1c1c34923ece7ad8c822bee062dd0491d64dc/dom/base/FragmentOrElement.cpp#1959
https://searchfox.org/mozilla-central/rev/7fd1c1c34923ece7ad8c822bee062dd0491d64dc/dom/base/nsContentUtils.cpp#4835
For document.write(), we use normal parser called nsHtml5Parser.
https://searchfox.org/mozilla-central/rev/7fd1c1c34923ece7ad8c822bee062dd0491d64dc/dom/base/Document.cpp#9115

Here are some related bugs about nsHtml5StringParser and nsHtml5Parser
In bug 714777, we move fragment parsing out of nsHtml5Parser and create nsHtml5StringParser.
In bug 959150, we make innerHTML faster by not having to deal with tree ops. Here we introduce a class implements a minimal subclass of nsHtml5DocumentBuilder called nsHtml5OplessBuilder. Therefore, nsHtml5TreeOpExecutor is not used.

For nsHtml5StringParser, we have mBuilder in nsHtml5TreeBuilder but mOpSink is nullptr.
For nsHtml5StringParser, we have mOpSink in nsHtml5TreeBuilder but mBuilder is nullptr.
Normally, we just have corresponding behavior according to what we have(mBuilder or mOpSink).
But there are still some different behaviors in some cases.

Alphan Chen [:alchen]

Comment 19

•

4 years ago

For the symptom of this bug, the biggest difference is what will the treeBuilder do when there is an error in "HtmlStartTagInForeignContext".
https://searchfox.org/mozilla-central/rev/7fd1c1c34923ece7ad8c822bee062dd0491d64dc/parser/html/nsHtml5TreeBuilder.cpp#727
For document.write(), "fragement" is false.
We will popForeign(). But "fragement" is true when setting innHtml.
This causes two different DOM tree layouts as below.

[Use doc.body.innerHTML="..."]
<svg id="1">
= <p>
== <style>
=== <img src="</style><img src=x onerror=alert(/innerHTML/)//">
== </style>
= <p>
</svg>

[Use document.write("...")]
<svg id="2"></svg>
<p>
= <style><img src="</style>
= <img src=x onerror=alert(/innerHTML/)//">
</p>

Alphan Chen [:alchen]

Comment 20

•

4 years ago

•

Edited

[Use document.write("...")]
<svg id="2"></svg>
<p>
= <style><img src="</style>
= <img src=x onerror=alert(/innerHTML/)//">
</p>

[Tree build process]
nsHtml5TreeBuilder::createElement (aNamespace = kNameSpaceID_SVG)
nsHtml5TreeBuilder::appendElement
nsHtml5TreeBuilder::elementPushed
nsHtml5TreeBuilder startTag(name=p)
--- call errEndTagDidNotMatchCurrentOpenElement ()
--- fragment=false call popForeign ---------------------------------------------> main difference
nsHtml5TreeBuilder::popForeign
nsHtml5TreeBuilder::markMalformedIfScript
nsHtml5TreeBuilder::elementPopped
nsHtml5TreeBuilder::createElement (p) (aNamespace = kNameSpaceID_XHTML)
nsHtml5TreeBuilder::appendElement
nsHtml5TreeBuilder::elementPushed
nsHtml5TreeBuilder::createElement (style) (aNamespace = kNameSpaceID_XHTML)
nsHtml5TreeBuilder::appendElement
nsHtml5TreeBuilder::elementPushed
nsHtml5TreeBuilder::appendCharacters
nsHtml5TreeBuilder::elementPopped(style)
nsHtml5TreeBuilder::createElement (img)
nsHtml5TreeBuilder::createElement (img) (aNamespace = kNameSpaceID_XHTML)
nsHtml5TreeBuilder::appendElement
nsHtml5TreeBuilder::elementPushed
nsHtml5TreeBuilder::elementPopped(img)

Alphan Chen [:alchen]

Comment 21

•

4 years ago

[Use doc.body.innerHTML="..."]
<svg id="1">
= <p>
== <style>
=== <img src="</style><img src=x onerror=alert(/innerHTML/)//">
== </style>
= <p>
</svg>

Since we don't popForeign, the namespace is always in kNameSpaceID_SVG.

nsHtml5TreeBuilder::createElement(svg)(aNamespace = kNameSpaceID_SVG)
nsHtml5TreeBuilder::appendElement
nsHtml5TreeBuilder::elementPushed
nsHtml5TreeBuilder startTag(name=p)
--- call errHtmlStartTagInForeignContext
nsHtml5TreeBuilder::errHtmlStartTagInForeignContext
--- fragment=true --------------------------------------------> main difference
nsHtml5TreeBuilder::createElement(p)(aNamespace = kNameSpaceID_SVG)
nsHtml5TreeBuilder::appendElement
nsHtml5TreeBuilder::elementPushed
nsHtml5TreeBuilder::createElement(style)(aNamespace = kNameSpaceID_SVG)
nsHtml5TreeBuilder::appendElement
nsHtml5TreeBuilder::elementPushed
nsHtml5TreeBuilder startTag(name=img)
--- call errHtmlStartTagInForeignContext
nsHtml5TreeBuilder::errHtmlStartTagInForeignContext
--- fragment=true
nsHtml5TreeBuilder::createElement(img)(aNamespace = kNameSpaceID_SVG)
nsHtml5TreeBuilder::appendElement
nsHtml5TreeBuilder::elementPushed
call nsHtml5TreeBuilder::popOnEof
nsHtml5TreeBuilder::markMalformedIfScript
nsHtml5TreeBuilder::elementPopped(img)
call nsHtml5TreeBuilder::popOnEof
nsHtml5TreeBuilder::markMalformedIfScript
nsHtml5TreeBuilder::elementPopped(style)
--- call mBuilder->UpdateStyleSheet
call nsHtml5TreeBuilder::popOnEof
nsHtml5TreeBuilder::markMalformedIfScript
nsHtml5TreeBuilder::elementPopped(p)
call nsHtml5TreeBuilder::popOnEof
nsHtml5TreeBuilder::markMalformedIfScript
nsHtml5TreeBuilder::elementPopped(svg)
--- call nsHtml5TreeOperation::SvgLoad()

Alphan Chen [:alchen]

Comment 22

•

4 years ago

From comment 18-21, we know it is our design which makes the difference.
However, I think that it makes sense that we have the same parsing result when doing document.write() and setting document.body.innerHTML.

Hello Henri, what is your opinion?

Flags: needinfo?(hsivonen)

Henri Sivonen (:hsivonen)

Assignee

Comment 23

•

4 years ago

•

Edited

Thanks for tracing this. This is working as designed: https://html.spec.whatwg.org/#parsing-main-inforeign

In retrospect, it's debatable whether it's a good idea for the rule not to apply in the fragment case, but it would also be kinda weird to break out in the fragment case.

The backstory for why we break out in the non-fragment case is that when Hixie was writing this part of the spec, he ran a query over the Google index and found random Web pages that had incomplete SVG fragments in them. Before HTML had SVG support, these fragments didn't have user-visible effect. However, without the break-out rule, the part of these pages after the incomplete SVG fragment would have become invisible. Therefore, browsers would have had to have taken a Web compat hit in order to ship SVG-in-HTML. This way, there wasn't a Web compat impediment to shipping SVG-in-HTML.

document.write() is consistent with parsing from network.

Marking INVALID in the sense that the allegation of behavior is true, but the behavior is not a bug per spec.

Sites that care about security should parse, then apply an allow-list, and then reserialize. It's unfortunate that "should parse" isn't a single thing but contextual whether the target is fragment parsing or not.

Status: NEW → RESOLVED

Closed: 4 years ago

Flags: needinfo?(hsivonen)

Resolution: --- → INVALID

Henri Sivonen (:hsivonen)

Assignee

Comment 24

•

4 years ago

That however generated a DOM that was missing the offending element. And that only happened on Firefox.

Except it looks like Chrome has a bug: It appears to apply the break-out behavior in the fragment case.

Reopening for assessing if we should change the spec to match Chrome's failure to implement the spec.

Anne, what do you think?

Status: RESOLVED → REOPENED

Flags: needinfo?(annevk)

Resolution: INVALID → ---

Anne (:annevk)

Comment 25

•

4 years ago

Since Safari is aligned with Chrome, I would suggest we align and update the HTML Standard.

Flags: needinfo?(annevk)

Henri Sivonen (:hsivonen)

Assignee

Comment 26

•

4 years ago

It's even on file already: https://github.com/whatwg/html/issues/5117

See Also: → https://github.com/whatwg/html/issues/5117

Henri Sivonen (:hsivonen)

Assignee

Comment 27

•

4 years ago

The special case in the spec came from: https://www.w3.org/Bugs/Public/show_bug.cgi?id=17924

Frederik Braun [:freddy]

Reporter

Updated

•

4 years ago

Blocks: 1669945

Frederik Braun [:freddy]

Reporter

Comment 28

•

3 years ago

The HTML standard update is going to happen in https://github.com/whatwg/html/pull/6399.
I think it's time for us to make the change.

See Also: → https://github.com/whatwg/html/pull/6399

Simon Pieters [:zcorpan]

Comment 29

•

3 years ago

wpt tests updated in https://github.com/web-platform-tests/wpt/pull/27799 which was just merged.

The 6399 spec change introduced a regression for foster parenting; see https://github.com/whatwg/html/pull/6455

Frederik Braun [:freddy]

Reporter

Comment 30

•

3 years ago

Henri, with the HTML spec change above, can you take this on?

Flags: needinfo?(hsivonen)

Henri Sivonen (:hsivonen)

Assignee

Comment 31

•

3 years ago

(In reply to Frederik Braun [:freddy] from comment #30)

Henri, with the HTML spec change above, can you take this on?

Assigning to self.

Assignee: nobody → hsivonen

Status: REOPENED → ASSIGNED

Flags: needinfo?(hsivonen)

BMO Automation

Updated

•

2 years ago

Severity: normal → S3

test.html 9 years ago Frederik Braun [:freddy] 67 bytes, text/html		Details
test2.html 9 years ago Frederik Braun [:freddy] 297 bytes, text/html		Details
test-new.html 4 years ago Alphan Chen [:alchen] 305 bytes, text/html		Details