Closed Bug 1510114 (CVE-2018-18500) Opened 2 years ago Closed 2 years ago

Heap write-after-free in Custom Elements / HTML5 Parser

Categories

(Core :: DOM: HTML Parser, defect)

defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla66
Tracking Status
firefox-esr60 65+ verified
firefox63 --- wontfix
firefox64 --- wontfix
firefox65 + verified
firefox66 + verified

People

(Reporter: yaniv.frank, Assigned: hsivonen)

References

Details

(Keywords: csectype-uaf, regression, sec-critical, Whiteboard: [post-critsmash-triage][adv-main65+][adv-esr60.5+][coordinate disclosure with bug 1524711])

Attachments

(5 files)

While parsing an HTML5 stream, a microtask checkpoint is performed when a custom element is being created.
By using a mutation observer callback it's possible to run arbitrary javascript code inside this checkpoint, allowing us to cause the destruction of the currently running nsHtml5StreamParser object by navigating to another page.
When the checkpoint finishes, a write to freed heap memory (a 0x1000 sized allocation previously belonging to nsHtml5StreamParser) occurs.
The content being written to freed memory is a pointer to a newly created nsIContent object, and it is written at a controllable offset (determined by the number of elements in the stream).

In the attached POC, a WebGLBuffer object is allocated in place of the freed object and the bug is used to overwrite its mByteLength field, potentially allowing for further disclosure and corruption of memory. tested successfully on Firefox 63.0.3 - Windows 7.
$ python delay_http_server.py 8080 &
$ firefox http://127.0.0.1:8080/customelements_uaf_poc.html
Attached file delay_http_server.py
Henri, can you make sure someone takes a look at this? Thanks.
Flags: needinfo?(hsivonen)
Attachment #9027785 - Attachment mime type: text/x-log → text/plain
So the key part of the code looks like this (in nsHtml5TreeOperation.cpp):

    case eTreeOpCreateHTMLElementNetwork:
    case eTreeOpCreateHTMLElementNotNetwork: {
      nsIContent** target = mOne.node; // Now "target" points into "this"
...
      *target = CreateHTMLElement(name, ....) // Runs script, destroys "this".

Script runs under the CreateHTMLElement call.  There are provisions for that there.  The script in this example basically does:

		location.replace("data:text/html,");
		delayed_xml.async = false;
		delayed_xml.load('delay.xml');

to start a load and spin the event loop, but it could use sync XHR as well (to work around us fixing bug 332175).

I would assume we have some setup for preventing this problem when the nsHtml5TreeOperation would run an inline <script>, right?  We probably need to use a similar setup here or something.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Group: core-security → dom-core-security
A non-asyc/defer <script> is always the last operation of a tree op flush. It's been a while since I've thought about custom element, but I'd have expected microtasks not to be run until the end of the tree of flush.
(In reply to Henri Sivonen (:hsivonen) from comment #5)
> A non-asyc/defer <script> is always the last operation of a tree op flush.
> It's been a while since I've thought about custom element, but I'd have
> expected microtasks not to be run until the end of the tree of flush.

s/tree of flush/tree op flush/
> I'd have expected microtasks not to be run until the end of the tree of flush

So per spec, there's a microtask checkpoint performed right before creating an element with a custom element definition.  See https://html.spec.whatwg.org/multipage/parsing.html#create-an-element-for-the-token step 6.  That's basically what we implement, looks like; there's this code in nsHtml5TreeOperation::CreateHTMLElement:

    {
      nsAutoMicroTask mt;
    }

which will do a checkpoint.  Furthermore, step 9 of the same algorithm will invoke custom element reactions (which is running script directly; no microtask checkpoints involved) and those invocations will do their own microtask checkpoints.
OK. After refreshing my memory of this code, https://searchfox.org/mozilla-central/source/parser/html/nsHtml5AutoPauseUpdate.h#16 is supposed to keep the builder alive, which is supposed to keep the op queue, including the current op alive. nsHtml5TreeOpExecutor::ClearOpQueue(), nsHtml5TreeOpExecutor::MoveOpsFrom(nsTArray<nsHtml5TreeOperation>& aOpQueue),  and nsHtml5TreeOpExecutor::RemoveFromStartOfOpQueue(size_t aNumberOfOpsToRemove) all MOZ_RELEASE_ASSERT that it's OK to modify the op queue. nsHtml5AutoFlush is supposed to manage the state that the MOZ_RELEASE_ASSERTs assert on.

https://searchfox.org/mozilla-central/source/parser/html/nsHtml5TreeOpStage.cpp#24 is supposed to be protected by mMutex.

So what I said in comment 5 about expecting microtasks not to run was wrong and instead something is broken in the above-described mechanisms.
We do have a nsHtml5AutoPauseUpdate in CreateHTMLElement.  But it's going out of scope before we return from the call, and the treebuilder dies at that point.

Would it make sense to move the scoping for that nsHtml5AutoPauseUpdate out to nsHtml5TreeOperation::Perform in the eTreeOpCreateHTMLElement* cases?

I don't know whether we'd want something in nsHtml5TreeBuilder::createElement too (that calls CreateHTMLElement).
(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #9)
> Would it make sense to move the scoping for that nsHtml5AutoPauseUpdate out
> to nsHtml5TreeOperation::Perform in the eTreeOpCreateHTMLElement* cases?

Making the doc update have a larger scope might be bad in terms of performance, since we'd close and reopen doc updates when not strictly necessary. (Though I'm not sure if Stylo has changed the cost from what it used to be.)

As for keeping the builder alive, we are also supposed to have outer stack-based pinning:
https://searchfox.org/mozilla-central/source/parser/html/nsHtml5TreeOpExecutor.cpp#579
https://searchfox.org/mozilla-central/source/parser/html/nsHtml5TreeOpExecutor.cpp#392

So when the use after free in the first stack in the ASAN report runs, there should be nsHtml5FlushLoopGuard guard(this); in nsHtml5TreeOpExecutor::RunFlushLoop() keeping the nsHtml5TreeOpExecutor alive. And, yet, the cycle collector running from the nested event loop is the second ASAN stack feels eligible to destroy the nsHtml5TreeOpExecutor...

> I don't know whether we'd want something in
> nsHtml5TreeBuilder::createElement too (that calls CreateHTMLElement).

In the innerHTML case, which is different and is what I had in mind in comment 5.
I don't think CC is destroying the nsHtml5TreeOpExecutor.

It _is_ destroying the nsHtml5TreeBuilder.  The asan log attached claims that the op is being destroyed from ~nsHtml5TreeBuilder() on nsHtml5TreeBuilderCppSupplement.h line 98, which is right after the "mOpQueue.Clear()" in ~nsHtml5TreeBuilder().

That said, it looks to me like the tree op executor has a separate mOpQueue, right?  And that's the one we're going through in RunFlushLoop.  So it's not clear to me why destroying the nsHtml5TreeBuilder would affect that, offhand.
(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #11)
> I don't think CC is destroying the nsHtml5TreeOpExecutor.
> 
> It _is_ destroying the nsHtml5TreeBuilder.

Oops. Right.

Specifically, it is destroyed by nsHtml5StreamParser::~nsHtml5StreamParser(), so it's the stream parsing nsHtml5TreeBuilder that's getting destroyed.

> That said, it looks to me like the tree op executor has a separate mOpQueue,
> right?  And that's the one we're going through in RunFlushLoop.  So it's not
> clear to me why destroying the nsHtml5TreeBuilder would affect that, offhand.

It appears that at some point, nsTArray has changed and as a side effect of the changes, the transfer of tree ops across threads has started using C++ moves applied to nsTArray. I guess the next step is auditing if moving from an nsTArray works the way the parser code expects.
Mostly a note to self: This needs e10s enabled to reproduce under ASAN.
(In reply to Henri Sivonen (:hsivonen) from comment #12)
> It appears that at some point, nsTArray has changed and as a side effect of
> the changes, the transfer of tree ops across threads has started using C++
> moves applied to nsTArray. I guess the next step is auditing if moving from
> an nsTArray works the way the parser code expects.

FWIW, as a matter of general C++ advice of "don't operate on moved-from values", the introduction of moves to op queue handling is wrong, since the parser will operated on a moved-from queue expecting it to have the post-move semantics of an empty nsTArray.

(I still haven't located the actual concrete problem, though. At least superficially, move seems to leave the moved-from nsTArray in the empty state.)
A local ASAN debug build has fewer symbols optimized out and points to the real reason. It's not the op queue that gets deleted from underneath the executor (i.e. it's not a matter of moves with nsTArray). Instead, what gets deleted is mOldHandles and mHandles on the nsHtml5TreeBuilder.
Ah, mOne.node is a pointer into mHandles, right.

So making sure we keep the nsHtml5StreamParser alive on the stack should help, right?
Assignee: nobody → hsivonen
Status: NEW → ASSIGNED
Flags: needinfo?(hsivonen)
(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #16)
> So making sure we keep the nsHtml5StreamParser alive on the stack should
> help, right?

Yes, it seems to be sufficient to fix this. Normally the stream parser is dropped at the end of the stream, but with the patch, it's kept alive via the stack if the parser is terminated from within a custom element constructor.
Comment on attachment 9029299 [details]
Bug 1510114.

[Security Approval Request]

How easily could an exploit be constructed based on the patch?: Not easy. The patch makes it obvious that we have a use-after-free involving nsHtml5StreamParser but does not make it obvious how to trigger it.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No

Which older supported branches are affected by this flaw?: All, but caniuse.com suggests not by default for ESR

If not all supported branches, which bug introduced the flaw?: Bug 1378079

Do you have backports for the affected branches?: No

If not, how different, hard to create, and risky will they be?: Backports are the same patch, possible with whitespace changes.

How likely is this patch to cause regressions; how much testing does it need?: Extremely unlikely. It's probably good to have a person other than me check with an ASAN build that the steps to reproduce no longer reproduce the ASAN failure.
Attachment #9029299 - Flags: sec-approval?
Comment on attachment 9029299 [details]
Bug 1510114.

sec-approval+ for trunk. Please create and nominate patches for beta and ESR60 as well.
Attachment #9029299 - Flags: sec-approval? → sec-approval+
(In reply to Al Billings [:abillings] from comment #20)
> sec-approval+ for trunk.

https://hg.mozilla.org/integration/autoland/rev/69649540a938

> Please create and nominate patches for beta and ESR60 as well.

The same patch applies. I'll nominate once the trunk landing has made it to Nightly.
https://hg.mozilla.org/integration/autoland/rev/69649540a938d1f3d9f145f34d074932a3feff4e
https://hg.mozilla.org/mozilla-central/rev/69649540a938
Group: dom-core-security → core-security-release
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla66
Comment on attachment 9029299 [details]
Bug 1510114.

[Beta/Release Uplift Approval Request]

Feature/Bug causing the regression: Bug 1378079

User impact if declined: The usual assumed consequences of use-after-free.

Is this code covered by automated tests?: No

Has the fix been verified in Nightly?: Yes

Needs manual test from QE?: Yes

If yes, steps to reproduce: See comment 0 for the steps (to be performed in an ASAN build).

List of other uplifts needed: None

Risk to taking this patch: Low

Why is the change risky/not risky? (and alternatives if risky): This patch merely holds an existing object alive for slightly longer (until the end of a method scope) than before. Making an object cycle collection-eligible slightly later poses a very low risk.

String changes made/needed: None

[ESR Uplift Approval Request]

If this is not a sec:{high,crit} bug, please state case for ESR consideration: 

User impact if declined: The usual assumed consequences of use-after-free.

Fix Landed on Version: 66

Risk to taking this patch: Low

Why is the change risky/not risky? (and alternatives if risky): This patch merely holds an existing object alive for slightly longer (until the end of a method scope) than before. Making an object cycle collection-eligible slightly later poses a very low risk.

String or UUID changes made by this patch: None
Attachment #9029299 - Flags: approval-mozilla-esr60?
Attachment #9029299 - Flags: approval-mozilla-beta?
Comment on attachment 9029299 [details]
Bug 1510114.

[Triage Comment]
Fixes a sec-crit. Approved for 65.0b6 and 60.5.0esr.
Attachment #9029299 - Flags: approval-mozilla-esr60?
Attachment #9029299 - Flags: approval-mozilla-esr60+
Attachment #9029299 - Flags: approval-mozilla-beta?
Attachment #9029299 - Flags: approval-mozilla-beta+
Flags: sec-bounty?
Alias: CVE-2018-18500
Flags: qe-verify+
Whiteboard: [post-critsmash-triage]
In order to verify this issue, I will need more explicit steps. Thanks.
(In reply to Stefan [:StefanG_QA] from comment #27)
> In order to verify this issue, I will need more explicit steps. Thanks.

1. Use 64-bit Linux
2. Download attachment 9027787 [details] and attachment 9027788 [details] and put both in the same directory.
3. cd into that directory
4. Run python delay_http_server.py 8080
5. In another terminal window, start an ASAN-enabled Firefox build.
6. In the Firefox build navigate to http://127.0.0.1:8080/customelements_uaf_poc.html
7. Check the terminal output from Firefox.

If ASAN didn't report an error as part of the terminal output from Firefox, things should be OK.
Flags: sec-bounty? → sec-bounty+
Whiteboard: [post-critsmash-triage] → [post-critsmash-triage][adv-main65+][adv-esr60.5+]

Mozilla/5.0 (X11; Linux x86_64; rv:65.0) Gecko/20100101 Firefox/65.0 (20181219203225)

I've tested this issue on Ubuntu 18.10 using the provided steps in comment 28. During execution, I experienced several tab crashes and two browser crashes. Attached is the output from the terminal after a browser crash.

Flags: needinfo?(hsivonen)

On surface, attachment 9040239 [details] looks unrelated to the Custom Elements bug. Also, the Custom Elements bug with this POC was expected to only crash the tab, not the whole browser.

However, the POC uses WebGLBuffer, so given "GraphicsCriticalError" in the ASAN report, it's possible that the same POC also reveals a graphics bug now that the parser bug is no longer in the way. Could you, please, file a new graphics security bug with the steps you took?

Flags: needinfo?(hsivonen)

Based on comment 30 I will mark this issue as Verified fixed since the encountered error is not related to the issue presented here.

Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0 (20181220134820)

Status: RESOLVED → VERIFIED
Flags: qe-verify+

(In reply to Henri Sivonen (:hsivonen) from comment #30)

However, the POC uses WebGLBuffer, so given "GraphicsCriticalError" in the ASAN report, it's possible that the same POC also reveals a graphics bug now that the parser bug is no longer in the way. Could you, please, file a new graphics security bug with the steps you took?

This is now bug 1524711.

Whiteboard: [post-critsmash-triage][adv-main65+][adv-esr60.5+] → [post-critsmash-triage][adv-main65+][adv-esr60.5+][coordinate disclosure with bug 1524711]
See Also: → 1524711
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.