Closed Bug 1669945 Opened 4 years ago Closed 3 years ago

Sanitizer bypass if the sanitized markup is assigned to srcdoc

Categories

(Core :: DOM: Security, defect, P2)

defect

Tracking

()

RESOLVED WONTFIX

People

(Reporter: michal.bentkowski, Assigned: freddy)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Keywords: reporter-external, sec-moderate, Whiteboard: [reporter-external] [client-bounty-form][domsecurity-active])

I've found a bug in Firefox sanitizer that can be exploited if the result of sanitizeToString is used in a sink that doesn't parse the HTML using fragment parsing algorithm (examples being: iframe.srcdoc, data:text/html,markup etc.)

Proof of concept

Here's the proof-of-concept:

<iframe id=ifr></iframe>
<script>
    const bypass = `<svg><font color><title><u rel="</title><img src onerror=alert(document.domain)>">`;
    
    ifr.srcdoc = new Sanitizer().sanitizeToString(bypass);
    
</script>

Make sure to test on Firefox Nightly. An alert is shown despite using the sanitizer.

Explanation

There's an interesting difference in HTML parsing in HTML spec between fragment parsing algorithm and document parsing algorithm in foreign content.

Suppose we have the following markup:

<svg><p>Test

If we use document parsing algorithm (via srcdoc for instance), it creates the following DOM tree:

├ <svg svg>
└ <html p>

(I'm using a notation that tag name is prepended with a namespace name)

On the other hand, if we use fragment parsing algorithm (via innerHTML), it creates the following DOM tree:

└ <svg svg>
  └ <svg p>

This is correct per HTML spec and rules of parsing tokens in foreign content (however, interestingly, Chrome parses both cases the same way).

According to the specification, if the parses is in document parsing mode, then certain tokens (<p> included) "escape" the foreign content. The same doesn't happen in fragment parsing.

Another interesting tag that "escapes" foreign content is <font> because it does so only if it has any attributes named color, face or size.

So in document parsing mode the markup <svg><font> creates svg font, while <svg><font face> creates html font.

Now consider the following markup (that was the bypass):

<svg>
<font color>
<title>
<u rel="</title>
<img src onerror=alert(document.domain)>">

Internally, Firefox sanitizer makes use of fragment parsing algorithm. This means that the markup is parsed into the following DOM tree:

└ <svg svg>
  └ <svg font color="">
    └ <svg title>
      └ <html u rel="</title><img src onerror=alert(1)>">

All of these elements and attributes are allow-listed by the sanitizer. When authors use sanitizeToString, the markup is serialized to:

<svg>
<font color="">
<title>
<u rel="</title>
<img src onerror=alert(document.domain)>"></u></title></font></svg>

Now, when this result is assigned to srcdoc (or other places with document parsing mode), it is parsed into the following DOM tree:

├ <svg svg>
└ <html font color="">
  ├ <html title>
  │ └ #text: <u rel="
  ├ <html img src="" onerror="alert(document.domain)"/>
  └ #text: ">

So because of the difference in parsing between document and fragment, <font color> "escaped" foreign content and thus title is now parsed in HTML namespace, not in SVG namespace, which means that it is closed on first instance of </title> leading to the bypass.

Flags: sec-bounty?
Assignee: nobody → fbraun
Group: firefox-core-security → core-security
Status: UNCONFIRMED → ASSIGNED
Component: Security → DOM: Security
Ever confirmed: true
Product: Firefox → Core
Whiteboard: [reporter-external] [client-bounty-form] [verif?] → [reporter-external] [client-bounty-form]

OK, this is a real shortcoming of the API. Currently, we can't tell if the result is going to be re-parsed using a document parser or the fragment parser.
This could all be avoided for fragments only if developers avoided the re-parsing roundtrip by just using the API bits that return a DocumentFragment like domElement.append(sanitize()). That won't help the document parsing case though.

Makes me wonder if we need to re-evaluate the API and enforce a combination of Sanitization + DOM insertion.
I'm making stuff up here, but an example would be domNode.appendSanitized(dirty) and newSanitizedDocument(dirty).
That would rid us off sanitization options, which we are currently holding using the constructor pattern in new Sanitizer.

Ugh. :)

There's a very relevant spec discussion in https://github.com/WICG/sanitizer-api/issues/37 btw.

Maybe we should think about adjusting the spec? I have no idea why the distinction between fragment and non-fragment exists in the specification in the first place (maybe someone will greater knowledge will let us know) but I do believe that it shouldn't exist.

Case in point: both Safari and Chromium parses foreign content the same way no matter if it's a fragment or not (it's parsed as if it wasn't a fragment) so they ignore this part of the spec.

I realized that there's already a spec issue about it: https://github.com/whatwg/html/issues/5117

Freddy: although the DOM API is a new way to reach the sanitizer, is this likely to be actually the same underlying problem as bug 1666300 and that whole cluster?

Group: core-security → dom-core-security
Type: task → defect
Keywords: sec-moderate
Flags: needinfo?(fbraun)

No, this is tied to another problem.
The issue here is more of a parsing/serialization quirk. The sanitizer is creating a string to be consumed by a fragment parser, but the .srcdoc assignment is using a document parser instead.

Flags: needinfo?(fbraun)
Severity: -- → S4
Priority: -- → P2
Whiteboard: [reporter-external] [client-bounty-form] → [reporter-external] [client-bounty-form][domsecurity-active]
Depends on: 1205631

I don't think I can solve this without our HTML parser aligning with WebKit & Blink. Anne says, this ought to be happening, but priorities are unclear. I'm poking their team lead.

Flags: sec-bounty? → sec-bounty+

Freddy, is it now time to make the change (https://bugzilla.mozilla.org/show_bug.cgi?id=1205631#c28) :) ?

Flags: needinfo?(fbraun)

This has been super valuable report, but instead of fixing this single bug we are fixing this issue by changing the whole API.
i.e., there just isn't going to be a sanitize() function that accepts a html string without contextual parsing anymore.
The decision came out of standards discussions in https://github.com/WICG/sanitizer-api/issues/42

I think we can mark this as WONTFIX. Work on the new API shall happen in another bug, which I will link to shortly.

Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Flags: needinfo?(fbraun)
Resolution: --- → WONTFIX

(In reply to Frederik Braun [:freddy] from comment #9)

This has been super valuable report, but instead of fixing this single bug we are fixing this issue by changing the whole API.
i.e., there just isn't going to be a sanitize() function that accepts a html string without contextual parsing anymore.
The decision came out of standards discussions in https://github.com/WICG/sanitizer-api/issues/42

I think we can mark this as WONTFIX. Work on the new API shall happen in another bug, which I will link to shortly.

So could we open this bug? I think it'll be a useful reference point in certain discussions about Sanitizer API.

Yes, I agree it would be a useful reference.
Dan, are you OK with opening this up?
The issue is still unfixed but the sanitizer is also behind a pref..

Flags: needinfo?(dveditz)

As per comment 4 it seems to me this is really a duplicate of bug 1205631. The API change wouldn't really help with this and there might still be differences between document and fragment parsing that could be exploited in this way. Hard to say until someone does a formal proof of sorts.

Really? But the new API pushes back n the notion of a context-free parser.
The sanitizeFor or setSanitizedHTML (naming TBD) functions wouldn't accept the srcdoc usecase explained here.
(That is unless we hack various other use cases into the sanitizeFor function, i.e., not just accepting element names as the context node for a fragment parser but also things like a 'document' or 'srcdoc' parameter, that invokes a totally different parameter. But that seems odd, to say the least.)

For this specific issue here, I'm hoping people would use the sanitize() function (that operates on a fragment or a document) instead.
I acknowledge it might be far fetched to just expect everyone using the right parser..

I think it's fair to say that no longer exposing a serializer helps a lot in terms of guiding people toward avoiding this pitfall.

Group: dom-core-security
Flags: needinfo?(dveditz)
You need to log in before you can comment on or make changes to this bug.