Closed Bug 1854076 (CVE-2023-6205) Opened 2 years ago Closed 2 years ago

heap-use-after-free messagechannel/MessagePort.cpp:574 in mozilla::dom::MessagePort::Entangled

Categories

(Core :: DOM: postMessage, defect, P2)

defect

Tracking

()

VERIFIED FIXED
121 Branch
Tracking Status
firefox-esr102 --- wontfix
firefox-esr115 120+ verified
firefox117 --- wontfix
firefox118 --- wontfix
firefox119 --- wontfix
firefox120 + verified
firefox121 + verified

People

(Reporter: m.cooolie, Assigned: asuth)

References

Details

(Keywords: csectype-uaf, reporter-external, sec-high, Whiteboard: [adv-main120+][adv-esr115.5+][reporter-external] [client-bounty-form] [verif?][post-critsmash-triage])

Attachments

(5 files)

Attached file poc.zip

#Summary
heap-use-after-free messagechannel/MessagePort.cpp:574 in mozilla::dom::MessagePort::Entangled

#Reproduce
OS:Win10 X64
119.0a1 (2023-09-20) (64-bit)

step:

  1. python -m http.server 1337
  2. python -m ffpuppet firefox.exe -p prefs.js -d -u http://localhost:1337/poc_heavy_but_stable.html

NOTE!
poc_heavy_but_stable.html (Samples before minimization can reproduce the issue stably)
poc_mini_not_stable.html (Minimizing the sample may be helpful for analysis, but it cannot be stably reproduced)

#Type of crash
Tab process

#Analysis
MessagePortChild uses mPort to hold the original pointer of MessagePort.
The comment explains that it is because "this child is owned by this MessagePort", but this comment seems incorrect.
Searching the code found that MessagePortChild was created at [2] and assigned to MessagePort at [3].

class MessagePortChild final : public PMessagePortChild {
  friend class PMessagePortChild;

 public:
  NS_INLINE_DECL_REFCOUNTING(MessagePortChild)

  MessagePortChild();

  void SetPort(MessagePort* aPort) { mPort = aPort; }
---CUT---

  // This is a raw pointer because this child is owned by this MessagePort.
  MessagePort* mPort;
};

[2]
https://searchfox.org/mozilla-central/source/ipc/glue/BackgroundChildImpl.cpp#441
RefPtr<dom::MessagePortChild> agent = new dom::MessagePortChild(); // found in mozilla::ipc::BackgroundChildImpl::AllocPMessagePortChild

[3]
https://searchfox.org/mozilla-central/source/dom/messagechannel/MessagePort.cpp#794
mActor = static_cast<MessagePortChild*>(actor); // found in mozilla::dom::MessagePort::ConnectToPBackground

#FIX Suggestion
If the above analysis is correct,RefPtr can be used to resolve the issue.

Flags: sec-bounty?
Group: firefox-core-security → dom-core-security
Component: Security → DOM: postMessage
Keywords: csectype-uaf
Product: Firefox → Core

Andrew, do you know who might be able to look at this issue? Thanks.

Flags: needinfo?(bugmail)
Assignee: nobody → bugmail
Status: NEW → ASSIGNED
Flags: needinfo?(bugmail)

I'll guess this is old because it looks like MessagePort hasn't changed much recently.

Haven't confirmed this, but assuming it can be triggered reliably this sounds like a sec-high issue.

Keywords: sec-high
Severity: -- → S3
Priority: -- → P2

The severity field for this bug is set to S3. However, the bug is flagged with the sec-high keyword.
:asuth, could you consider increasing the severity of this security bug?

For more information, please visit BugBot documentation.

Flags: needinfo?(bugmail)
Severity: S3 → S2
Flags: needinfo?(bugmail)

Is the security issue of 'sec-high' only a P2 priority? I didn't find a clear description in the relevant documentation, but it's clear that it is a P1 priority in the Chromium project.

Our tracking is centered more around the Severity field than the Priority field (as you can see from the bot that complained when this was S3). S2 issues, particularly sec-high ones, we try to fix in the next release or two. S1 is reserved for only the most severe issues, like an actively exploited zero day or if WebExtensions don't work for any Firefox user.

More specifically to what I expect you are getting at, I spoke to asuth about this issue yesterday, and despite the lack of activity here he has been actively looking into it. I think he got bogged down a bit because he was making sure there were not any other similar lifetime issues with these MessagePort classes, which are a bit complicated. Hopefully he'll be able to post a fix soon.

Thank you for your detailed explanation.

The problem here seems to be more subtle than an ownership problem; the confusing state machine around entangling and UpdateMustKeepAlive which maintains a self-induced strong-ref through at least entangling means that the state machine is largely correct. And changing the reference held by the MessagePortChild to a strong reference inherently would interfere with cycle collection[1]. That said, the raw pointer is inherently dangerous and the destructor and cycle collection unlink logic does not go far enough to ensure that the dangerous raw pointer is cleaned up.

By inspection, I suspect the issue here is that if we fail to create the WorkerRef we directly advanced the state machine to eStateDisentangled and call UpdateMustKeepAlive but without taking any action to ensure the actor is detached/cleaned up. UpdateMustKeepAlive will drop our strong ref, and setting the state to disentangled defeats any actions that might be taken to clean up the actor properly.

I don't believe this is possible for the normal Worker.postMessage, but this is possible due to an unfortunate timing of events related to our implementation of RTCRtpScriptTransform which this POC uses. This is the WorkerRun implementation of EventWithOptionsRunnable that was forked from MessageEventRunnable for RTCRtpScriptTransform. This is the WorkerRun implementation of MessageEventRunnable. They do not look the same. They should look the same. This is what is missing from EventWithOptionsRunnable that would have prevented the MessagePort from being deserialized when the WorkerRef could not be created. The problem is the code was forked just prior to the landing of the changes to MessageEventRunnable.

So there's a couple fixes here for tomorrow:

  • Bring EventWithOptionsRunnable up-to-date.
  • Add failsafes to MessagePort so that we drop the actor no matter what in our destructor, asserting very hard if we have to do that since it means there was a state machine problem.
    • Ensure the actor can handle the MessagePort not existing. (It may already do this.)
  • Correct the logic for when we can't create the WorkerRef to shut down the actor correctly/safely. We may also want to diagnostic assert a bit in this case because it should no longer be possible at this point to attempt to create a MessagePort after we've reached cancellation thanks to Eden's other cleanups.

In terms of follow-ups:

  • We likely need to revisit how we're handling cycle collection for the MessagePort. Right now we keep it alive entirely if it's entangled, but we really should be switching to doing KeepAliveIfHasListenersFor(nsGkAtoms::onmessage); like we do for BroadcastChannel once we've reached the entangled stage. (But we absolutely do need to keep the extra reference until the entanglement state machine is completed based on how it's currently structured.)

In terms of specific regressing versions, this specific thing we're seeing will have started in v117 with the landing of bug 1631263 and will not be present in ESR. It's not clear if a different variation would be possible in versions prior to that. The cancellation refactor bug 1800659 landed in v116 and would have eliminated prior variants. In v115 and earlier cancellation would be likely to make it very hard to create the problem, although not impossible, so I expect an ESR uplift of everything but changes to EventWithOptionsRunnable would be appropriate.

1: As noted above, the current cycle collection is not particularly cycle-collecty, so in a follow-up we should probably improve that.

Blocks: 1858809
Attachment #9359937 - Attachment description: Bug 1854076 - Improve MessagePort state machine. r=#dom-workers! → Bug 1854076 - ESR115 Improve MessagePort state machine. r=#dom-workers!
Attachment #9359937 - Attachment description: Bug 1854076 - ESR115 Improve MessagePort state machine. r=#dom-workers! → Bug 1854076 - Improve MessagePort state machine. r=#dom-workers!
Attachment #9360076 - Flags: approval-mozilla-esr115?

Comment on attachment 9359937 [details]
Bug 1854076 - Improve MessagePort state machine. r=#dom-workers!

Security Approval Request

  • How easily could an exploit be constructed based on the patch?: Moderate concern:
  • Patch clarity: The patch does not identify the exact steps required to make the UAF happen, but it's fairly clear that there's something going on with worker shutdown from the non-EventWithOptionsRunnable steps, and for EventWithOptionsRunnable a trivial investigation into the history of the class or its class hierarchy should make it clear RTCRtpScriptTransform is involved.
  • Exploitability:
    • From comment 0 it sounds like it's not entirely trivial to cause the detected UAF to occur. But attempts to cause the UAF to occur will not cause a crash or destabilize the process, so one could keep at it for a while.
    • However, in the event the freed MessagePort memory has not been repurposed, we expect the UAF to result in a write to the freed, poisoned area, followed by a crash as we try and deref a poisoned nsTArray header pointer. So failure to repurpose the memory quite possibly would crash the process immediately in a way we'd see in crash-stats.
    • Timing of when the free and UAF happen would probably have a wide variability.
  • Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: Unknown
  • Which older supported branches are affected by this flaw?: ESR has the same underlying MessagePort state machine problem, but it should be significantly less exploitable because bug 1631263's introduction of EventWithOptionsRunnable or the changes to worker cancellation
  • If not all supported branches, which bug introduced the flaw?: Bug 1631263
  • Do you have backports for the affected branches?: Yes
  • If not, how different, hard to create, and risky will they be?:
  • How likely is this patch to cause regressions; how much testing does it need?: This is low risk because all changes relate to edge cases where a bad thing is already happening if the new logic comes into play.
  • Is Android affected?: Yes
Attachment #9359937 - Flags: sec-approval?
Attachment #9360076 - Flags: sec-approval?

Comment on attachment 9359937 [details]
Bug 1854076 - Improve MessagePort state machine. r=#dom-workers!

Approved to land

Attachment #9359937 - Flags: sec-approval? → sec-approval+

Comment on attachment 9360076 [details]
Bug 1854076 - ESR115 Improve MessagePort state machine. r=#dom-workers!

sec-approved

Attachment #9360076 - Flags: sec-approval? → sec-approval+
Pushed by bugmail@asutherland.org: https://hg.mozilla.org/integration/autoland/rev/b6d6a9c84257 Improve MessagePort state machine. r=dom-worker-reviewers,edenchuang
Group: dom-core-security → core-security-release
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 121 Branch

The patch landed in nightly and beta is affected.
:asuth, is this bug important enough to require an uplift?

  • If yes, please nominate the patch for beta approval.
  • If no, please set status-firefox120 to wontfix.

For more information, please visit BugBot documentation.

Flags: needinfo?(bugmail)

Comment on attachment 9359937 [details]
Bug 1854076 - Improve MessagePort state machine. r=#dom-workers!

Beta/Release Uplift Approval Request

  • User impact if declined: See comment 12.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Quoting comment 12:

This is low risk because all changes relate to edge cases where a bad thing is already happening if the new logic comes into play.

  • String changes made/needed:
  • Is Android affected?: Yes
Flags: needinfo?(bugmail)
Attachment #9359937 - Flags: approval-mozilla-beta?

I should also say I'm not entirely clear on whether I would push my ESR fix directly to ESR or use lando given the recent announcements about uplift requests via phabricator/lando. (The announcement came after I uploaded the patch.)

Comment on attachment 9359937 [details]
Bug 1854076 - Improve MessagePort state machine. r=#dom-workers!

Approved for 120.0b5

Attachment #9359937 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment on attachment 9360076 [details]
Bug 1854076 - ESR115 Improve MessagePort state machine. r=#dom-workers!

Approved for 115.5esr

Attachment #9360076 - Flags: approval-mozilla-esr115? → approval-mozilla-esr115+
Flags: sec-bounty? → sec-bounty+
Whiteboard: [reporter-external] [client-bounty-form] [verif?] → [adv-main120+][adv-esr115.5+][reporter-external] [client-bounty-form] [verif?]
Attached file advisory.txt
Alias: CVE-2023-6205
Flags: qe-verify+
Whiteboard: [adv-main120+][adv-esr115.5+][reporter-external] [client-bounty-form] [verif?] → [adv-main120+][adv-esr115.5+][reporter-external] [client-bounty-form] [verif?][post-critsmash-triage]
QA Whiteboard: [qa-triaged]

HI @Andrew, I tried to reproduce this issue on my end with a Windows 10 machine and Firefox Nightly 119.0a1 (2023-09-20) but I keep getting the"
Pass
assert_equals("registered", "registered")
/poc_heavy_but_stable.html:51:34

There is no crash on my end.

I copied everything from the poc.zip file in the Firefox installation folder, started the server from cmd with the "python -m http.server 1337" command and then ran the python -m ffpuppet firefox.exe -p prefs.js -d -u http://localhost:1337/poc_heavy_but_stable.html where it would start Firefox and go to that page, but no crashes occurs on my side, do I need a debug build or something similar ? am I missing some steps ?

Flags: needinfo?(bugmail)

@m.coolie can you please take a look in our latest ESR build 115.5 you can find it here:
https://ftp.mozilla.org/pub/firefox/candidates/115.5.0esr-candidates/build1/win64/en-US/

as well as our latest Beta 121.0b4:
https://ftp.mozilla.org/pub/firefox/candidates/121.0b4-candidates/build1/win64/en-US/

and see if the issue still occurs with these builds ?
Or maybe a different test case in order to reproduce this issue on our end ?

Flags: needinfo?(m.cooolie)

I don't get crash on latest ESR build 115.5.
But if it is expected to crash, you may need the ASAN version to confirm.

Flags: needinfo?(m.cooolie)

That was the issue, I was able to reproduce this issue with asan builds and I was able to verify the fix. Thanks @m.coolie.

Status: RESOLVED → VERIFIED
QA Whiteboard: [qa-triaged]
Flags: qe-verify+
Flags: needinfo?(bugmail)

Bulk-unhiding security bugs fixed in Firefox 119-121 (Fall 2023). Use "moo-doctrine-subsidy" to filter

Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: