Closed Bug 1848454 (CVE-2023-5174) Opened 7 months ago Closed 6 months ago

Double-delete UAF in BrokerServicesBase::SpawnTarget()

Categories

(Core :: Security: Process Sandboxing, defect, P1)

Firefox 116
Unspecified
Windows
defect

Tracking

()

RESOLVED FIXED
118 Branch
Tracking Status
firefox-esr102 --- wontfix
firefox-esr115 118+ fixed
firefox116 --- wontfix
firefox117 --- wontfix
firefox118 + fixed

People

(Reporter: mozillabugs, Assigned: bobowen)

References

Details

(Keywords: sec-moderate, Whiteboard: [adv-main118+][adv-esr115.3+])

Attachments

(4 files)

BrokerServicesBase::SpawnTarget() (security/sandbox/chromium/sandbox/win/src/broker_services.cc) can cause a double-delete UAF if Windows cannot duplicate a handle [1]. The bug is in the !job.IsValid() case (see lines 662 et seq below). If AddTargetPeerInternal() fails (which can occur if the Windows API DuplicateHandle() returns error), then SpawnTarget() calls SpawnCleanup(target), which does delete target. But line 638 in SpawnTarget() already called policy_base->AddTarget(target), which adds target to policy_base->targets_, which is a list of raw pointers. When policy_base's reference count goes to 0, its destructor PolicyBase::~PolicyBase() (security/sandbox/chromium/sandbox/win/src/sandbox_policy_base.cc) does:

   122: TargetSet::iterator it;
   123: for (it = targets_.begin(); it != targets_.end(); ++it) {
   124:    TargetProcess* target = (*it);
   125:    delete target; // BOOM!
   126: }

thus causing a double-delete UAF.

   423: ResultCode BrokerServicesBase::SpawnTarget(...
   ...
   426:                                            scoped_refptr<TargetPolicy> policy...) {
   ...
   457:   scoped_refptr<PolicyBase> policy_base(static_cast<PolicyBase*>(policy.get()));
   ...
   637:   // Now the policy is the owner of the target.
   638:   result = policy_base->AddTarget(target);
   ...
   646:   if (job.IsValid()) {
   ...
   662:   } else {
   663:     result = AddTargetPeerInternal(process_info.process_handle(),
   664:                                    process_info.process_id(),
   665:                                    policy_base, last_error);
   666:     if (result != SBOX_ALL_OK) {
   667:       // This may fail in the same way as Job associated processes.
   668:       // crbug.com/480639.
   669:       SpawnCleanup(target);
   670:       return result;
   671:     }
   672:   }
   ...
   675:   return result;
   676: }
   688: ResultCode BrokerServicesBase::AddTargetPeerInternal(
   ...
   695:   HANDLE tmp_process_handle = INVALID_HANDLE_VALUE;
   696:   if (!::DuplicateHandle(::GetCurrentProcess(), peer_process_handle,
   697:                          ::GetCurrentProcess(), &tmp_process_handle,
   698:                          SYNCHRONIZE, false, 0 /*no options*/)) {
   699:     *last_error = ::GetLastError();
   700:     return SBOX_ERROR_CANNOT_DUPLICATE_PROCESS_HANDLE;
   701:   }
   ...
   714: }

You can demonstrate the UAF using the debugger. To do so:

  1. Start FF and attach a debugger to it.
  2. Set a BP on broker_services.cc line 663. Also set a BP on PolicyBase()::~PolicyBase()
  3. Open any webpage in FF.
  4. When the line 663 BP fires, step into AddTargetPeerInternal(). Simulate an error from DuplicateHandle() by skipping lines 696-98, setting control to line 700, setting *lastError to 5 (or anything else except 0, really), and stepping out of the function.
  5. Step into SpawnCleanup() and notice that it does delete target. Note the value of target and open a memory window there.
  6. Proceed. Eventually the reference count of the object that was pointed to by policy_base will reach 0, and something will call PolicyBase()::~PolicyBase() in the chrome process.
  7. Step the loop and notice that ~PolicyBase() calls delete on the address you noted in step 5, which, of course, the code no longer owns.
  8. Step further and watch the mayhem.

Interestingly, BrokerServicesBase::SpawnTarget()'s author noticed that this UAF was possible (lines 667-68), but never fixed it.

I will report this bug upstream.

[1] An attacker might be able to cause this situation by, e.g., using a script to do fetches from many URLs concurrently.

Flags: sec-bounty?

I found this bug after an AST matcher I had written identified the class sandbox::TargetProcess (of which target, above, is an instance) as (1) being a reference-counted class with a public destructor; and (2) having an instance deleteed outside of its Release() method. The matcher identified the deletes in SpawnCleanup() (broker_services.cc) and PolicyBase::~PolicyBase() (sandbox_policy_base.cc) as likely to be problematic, as well as an additional delete in PolicyBase::OnJobEmpty().

It was not obvious that these deletes were actually problematic, but the following lines from target_process.h added to my suspicions:

50:  // TODO(cpu): Currently there does not seem to be a reason to implement
51:  // reference counting for this class since is internal, but kept the
52:  // the same interface so the interception framework does not need to be
53:  // touched at this point.
54:  void AddRef() {}
55:  void Release() {}

so I began reviewing the code, which is how I located the bug, which was reconfirmed by the curious lines:

667:     // This may fail in the same way as Job associated processes.
668:     // crbug.com/480639.

. The bug probably wouldn't have existed if sandbox::TargetProcess had been properly handled as a reference-counted class; in particular, not if (1) there had been no deletes outside of Release() and (2) all pointers to class objects had been reference-counting smart pointers.

The pattern that the AST matcher identifies (reference-counted class with public destructor and delete of object of such classes not inside of class::Release()) is questionable practice likely to cause bugs, since the point of reference counting is to prevent UAFs and leaks by (semi-)automating object lifetimes. A public destructor is not harmful in itself, but it creates the opportunity for non-class::Release() code to delete a class object, thus potentially destroying an object with a nonzero reference count.

The AST matcher has identified several reference-counting classes with public destructors, and 1 other non-test-code instance of a delete outside of class::Release():

txResultRecycler::recycle() (dom/xslt/xpath/txResultRecycler.cpp)

I have not examined thisdelete instance for bugginess yet.

I have integrated the AST matcher into FF's build procedure, and will attach the relevant files in a later comment.

I developed the matcher in clang-query. There are 3 queries: one that identifies reference-counting classes (basically any class that has or inherits members void AddRef() and void Release()), one that identifies "suspect" classes, which are reference-counting classes with public destructors; and one that identifies delete expressions on suspect class objects that don't occur within a suspect class's Release() method. The queries, in order, are:

	let refcountingClassMatcher cxxRecordDecl(isSameOrDerivedFrom(allOf(cxxRecordDecl(has(cxxMethodDecl(hasName("AddRef"),returns(asString("void")),parameterCountIs(0)))), cxxRecordDecl(has(cxxMethodDecl(hasName("Release"),returns(asString("void")),parameterCountIs(0)).bind("releaseMethods")))))).bind("refcountingClasses")

	let suspectClassMatcher cxxRecordDecl(allOf(refcountingClassMatcher, has(allOf(cxxDestructorDecl(), isPublic())))).bind("suspectClasses")

	let deleteMatcher cxxDeleteExpr(allOf(cxxDeleteExpr(hasDescendant(implicitCastExpr(hasImplicitDestinationType(pointsTo(qualType(hasDeclaration(suspectClassMatcher))))))), unless(hasAncestor(cxxMethodDecl(equalsBoundNode("releaseMethods"))))))

Attached is a sample program (test3.cpp) that you can load in clang-query to see how the above queries work. The first query matches RefCnterBase, RefCnterBase2, and all the classes that inherit from them, either directly or indirectly. Curiously, RefCnterBase and RefCnterBase2 match more than once, which I don't understand.

The second query matches only RefCnterBase, C, and E, apparently because these are the only reference-counting classes of which the sample program actually creates objects (it doesn't match D, even though it's a subobject of E, because D has a protected destructor). This result is suboptimal; I'd like to catch all suspect classes, even if nothing creates objects of those types.

The third query matches all the bad deletes in the test program. It does not, however, match the acceptable deletes:

  1. in foocontainer()::f() (since foocontainer isn't reference-counting);
  2. delete pi in f5() (since pi isa int *); and
  3. deletes in RefCnterBase::Release() (since they are inside a reference-counting class's Release() method).

Note that the matchers do not check for more than one delete of the same object in the same scope, so

delete pe;
delete (E*) pe;

(and the same thing in RefCnterBase::Release()) doesn't elicit a warning. These matchers weren't designed to catch this particular bug.

Attached file test3.cpp
Group: core-security → dom-core-security

I think we have a static analysis that rejects refcounted classes with public dtors, but maybe this directory is exempt because it is third party code.

(In reply to Andrew McCreight [:mccr8] from comment #3)

I think we have a static analysis that rejects refcounted classes with public dtors, but maybe this directory is exempt because it is third party code.

Hmm, if that matcher exists, it doesn't seem to be run by default in --enable-clang-plugin builds. For example, the widely-used refcounting classes mozilla::RefCounted<T> (mfbt/RefCounted.h line 292 (116.0.2) / line 296 (trunk)) and mozilla::AtomicRefCounted (same module, line 309 (116.0.2) / line 313 (trunk) have public destructors, which my matcher flags, but the default matchers don't. Oddly, their base class mozilla::RefCounted<T, Atomicity>(same module) has a public destructor in release builds, but a protected destructor in debug builds (??) I didn't do a debug build, so maybe that's why I didn't see any warnings on the base class?

My matcher also turns up, e.g., class WeakReference (mfbt/WeakPtr.h), the derived classes GenericRefCounted and GenericAtomicRefCounted (gfx/2d/GenericRefCounted.h), etc.

Attached is my matcher, with diffs for existing files versus the 116.0.2 release baseline. Note that my matcher discovered some issues in other matchers' test files, so I had to add expected-warning tags to those files.

The current upstream broker_services.cc (https://chromium.googlesource.com/chromium/src.git/+/refs/heads/main/sandbox/win/src/broker_services.cc ) is very different from the FF trunk version, so I have not reported this bug upstream.

interesting that crbug 480639 is still private despite being pretty old.

Andi: is this something our checkers should have found, or could be improved to find?

Flags: needinfo?(bpostelnicu)
Keywords: sec-moderate
Assignee: nobody → bobowencode
Severity: -- → S2
Priority: -- → P1

I think that the number of installs that might be vulnerable to this is going to be a fairly small percentage.

Normally all of the child processes that are using this code will be going through the job.IsValid() path.

However, on Windows 7 if the process is already in a job then we can't use nested jobs and have to use JOB_NONE.
Examples are using runas to start Firefox puts it in a job and also certain remote sessions use a job.

The upstream code has removed JOB_NONE entirely.

Looks like we just want to call Terminate on the target, instead of SpawnCleanup.

Status: NEW → ASSIGNED

(In reply to Bob Owen (:bobowen) from comment #9)

I think that the number of installs that might be vulnerable to this is going to be a fairly small percentage.

Normally all of the child processes that are using this code will be going through the job.IsValid() path.

However, on Windows 7 if the process is already in a job then we can't use nested jobs and have to use JOB_NONE.
Examples are using runas to start Firefox puts it in a job and also certain remote sessions use a job.

The upstream code has removed JOB_NONE entirely.

Looks like we just want to call Terminate on the target, instead of SpawnCleanup.

Hmm, I got the non-job path in Windows 10 when running FF under a limited account that was not the login account. That is, I started a cmd prompt using runas a limited account, then started FF in the cmd prompt. Is that what you mean by FF already being in a job?

(In reply to mozillabugs from comment #10)

(In reply to Bob Owen (:bobowen) from comment #9)
...
Hmm, I got the non-job path in Windows 10 when running FF under a limited account that was not the login account. That is, I started a cmd prompt using runas a limited account, then started FF in the cmd prompt. Is that what you mean by FF already being in a job?

Strange, are you logged into the machine remotely or possibly using compatibility mode?

Flags: needinfo?(mozillabugs)

Ah wait, we were using JOB_NONE for the gpu process until recently.

Flags: needinfo?(mozillabugs)

(In reply to Bob Owen (:bobowen) from comment #12)

Ah wait, we were using JOB_NONE for the gpu process until recently.

Although I imagine that makes it much less likely that DuplicateHandle will fail at this point.

(In reply to Bob Owen (:bobowen) from comment #11)

(In reply to mozillabugs from comment #10)

(In reply to Bob Owen (:bobowen) from comment #9)
...
Hmm, I got the non-job path in Windows 10 when running FF under a limited account that was not the login account. That is, I started a cmd prompt using runas a limited account, then started FF in the cmd prompt. Is that what you mean by FF already being in a job?

Strange, are you logged into the machine remotely or possibly using compatibility mode?

No. I am logged in locally and not using compatibility mode. The process being started is a content process, not a GPU process (it has the -contentproc and tab command line arguments).

Pushed by bobowencode@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/9a3b35d59182
Ensure an attempt is made to terminate the child when SpawnTarget fails and there is no job. r=handyman

(In reply to mozillabugs from comment #0)
...

Interestingly, BrokerServicesBase::SpawnTarget()'s author noticed that this UAF was possible (lines 667-68), but never fixed it.

Incidentally, it's difficult to know because I don't think any of us have access to that bug, but my guess is the comment is saying that the termination of the process might fail. Not necessarily that there was a known theoretical UAF (if a current processDuplicateHandle happened to fail for some reason).

Comment on attachment 9349479 [details]
Bug 1848454: Ensure an attempt is made to terminate the child when SpawnTarget fails and there is no job. r=handyman!

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: Even though failure of DuplicateHandle for the same process is probably very rare, the only way we would normally be using this code is for windows 7.
    As they will only have ESR from now on, it seems sensible to uplift.
  • User impact if declined: Theoretical double delete UAF, although we don't have a PoC that can trigger it.
  • Fix Landed on Version: 118
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Simple change that effectively removes the offending extra delete.
Attachment #9349479 - Flags: approval-mozilla-esr115?
Group: dom-core-security → core-security-release
Status: ASSIGNED → RESOLVED
Closed: 6 months ago
Resolution: --- → FIXED
Target Milestone: --- → 118 Branch

The AST matcher has identified...1 other non-test-code instance of a delete outside of class::Release():

txResultRecycler::recycle() (dom/xslt/xpath/txResultRecycler.cpp)

I have not examined thisdelete instance for bugginess yet.

I have examined this usage, and it looks OK.

(In reply to Daniel Veditz [:dveditz] from comment #8)

interesting that crbug 480639 is still private despite being pretty old.

Andi: is this something our checkers should have found, or could be improved to find?

No, we don't have something that is targeting this specific scenario. We have more generic checkers.

Flags: needinfo?(bpostelnicu)

Comment on attachment 9349479 [details]
Bug 1848454: Ensure an attempt is made to terminate the child when SpawnTarget fails and there is no job. r=handyman!

Approved for 115.3esr.

Attachment #9349479 - Flags: approval-mozilla-esr115? → approval-mozilla-esr115+
Flags: sec-bounty? → sec-bounty+
QA Whiteboard: [post-critsmash-triage]
Flags: qe-verify-
Whiteboard: [adv-main118+]
Attached file advisory.txt
Whiteboard: [adv-main118+] → [adv-main118+][adv-esr115.3+]
Group: core-security-release
Alias: CVE-2023-5174
You need to log in before you can comment on or make changes to this bug.