Closed Bug 1839703 (CVE-2023-3600) Opened 1 year ago Closed 1 year ago

Crash in [@ mozilla::ipc::DataPipeReceiver::AsyncWait::<T>::operator()]

Categories

(Core :: DOM: Workers, defect, P2)

Unspecified
Windows 11
defect

Tracking

()

RESOLVED FIXED
116 Branch
Tracking Status
firefox-esr102 --- unaffected
firefox-esr115 116+ fixed
firefox114 --- unaffected
firefox115 + wontfix
firefox116 + fixed

People

(Reporter: mccr8, Assigned: jesup)

References

Details

(4 keywords, Whiteboard: [adv-main115.0.2+][adv-esr115.0.2+])

Crash Data

Attachments

(1 file)

Crash report: https://crash-stats.mozilla.org/report/index/85986019-c55b-471d-a851-e951d0230621

Reason: EXCEPTION_ACCESS_VIOLATION_READ

Top 10 frames of crashing thread:

0  xul.dll  mozilla::ipc::DataPipeReceiver::AsyncWait::<lambda_8>::operator const  ipc/glue/DataPipe.cpp:660
0  xul.dll  NS_NewCancelableRunnableFunction<`lambda at /builds/worker/checkouts/gecko/ipc/glue/DataPipe.cpp:656:23'>::FuncCancelableRunnable::Run  xpcom/threads/nsThreadUtils.h:667
1  xul.dll  mozilla::dom::  dom/workers/WorkerPrivate.cpp:202
2  xul.dll  mozilla::dom::WorkerRunnable::Run  dom/workers/WorkerRunnable.cpp:377
3  xul.dll  nsThread::ProcessNextEvent  xpcom/threads/nsThread.cpp:1234
3  xul.dll  NS_ProcessNextEvent  xpcom/threads/nsThreadUtils.cpp:479
4  xul.dll  mozilla::dom::WorkerPrivate::DoRunLoop  dom/workers/WorkerPrivate.cpp:3287
5  xul.dll  mozilla::dom::workerinternals::  dom/workers/RuntimeService.cpp:2149
6  xul.dll  nsThread::ProcessNextEvent  xpcom/threads/nsThread.cpp:1234
6  xul.dll  NS_ProcessNextEvent  xpcom/threads/nsThreadUtils.cpp:479

13 crashes on this signature, mostly on poison values. All of them are on workers.

The crash is on this line: callback->OnInputStreamReady(self);

Both callback and self are strong references held by the callback, so I'm not sure how we can have a UAF here.

Also odd is that this crash is only showing up on 115b7 and 115b8.

Nika, any ideas? Maybe we're actually crashing inside OnInputStreamReady somehow and it isn't showing up in the stack?

By looking for UAF crashes with ExternalRunnableWrapper in the proto signature, I found a few similar crashes. They are much rarer than this one.

bp-f8a735c1-927a-43f8-a1c8-e966e0230621
bp-82fd3618-b172-4faf-b16f-d74ef0230620

Hi Eden, did we uplift something to 115.0b7 that might be relevant?

Flags: needinfo?(echuang)
Duplicate of this bug: 1840136

No, we did not upflit something to 115.0b7.

Flags: needinfo?(echuang)
Severity: -- → S3
Priority: -- → P2

It is possible the compiler is just doing something different so this top frame actually shows up correctly now. Also present on 115b9 but still nowhere else.

The severity field for this bug is set to S3. However, the bug is flagged with the sec-high keyword.
:janv, could you consider increasing the severity of this security bug?

For more information, please visit BugBot documentation.

Flags: needinfo?(jvarga)
Severity: S3 → S2
Flags: needinfo?(jvarga)

I'm seeing another signature that looks very similar, [@ mozilla::ipc::DataPipeReceiver::AsyncWait::<T>::~ ]: bp-74c98a3e-dcdf-4c7c-bedb-0b2560230628

Crash Signature: [@ mozilla::ipc::DataPipeReceiver::AsyncWait::<T>::operator()] → [@ mozilla::ipc::DataPipeReceiver::AsyncWait::<T>::operator()] [@ mozilla::ipc::DataPipeReceiver::AsyncWait::<T>::~ ]

Two more variations (rarer)
[@ mozilla::ipc::DataPipeReceiver::AsyncWait::$::~$_0 ] bp-cc5073a5-87e7-410b-a543-be7450230625
[@ mozilla::ipc::DataPipeReceiver::AsyncWait::$::operator() ] bp-3791f9ad-0ccd-4372-87e6-5887d0230627

Crash Signature: [@ mozilla::ipc::DataPipeReceiver::AsyncWait::<T>::operator()] [@ mozilla::ipc::DataPipeReceiver::AsyncWait::<T>::~ ] → [@ mozilla::ipc::DataPipeReceiver::AsyncWait::<T>::operator()] [@ mozilla::ipc::DataPipeReceiver::AsyncWait::<T>::~ ] [@ mozilla::ipc::DataPipeReceiver::AsyncWait::$::~$_0 ] [@ mozilla::ipc::DataPipeReceiver::AsyncWait::$::operator() ]

I won't add the signature because it doesn't look exactly the same, but these [@ CallbackHolder::CallbackHolder::<T>::operator() ] crashes look similar: showing up heavily on 115, on a worker thread, happening on the access of a strong pointer in the closure of a lambda on a callback.

bp-1dae6e84-81a5-4113-91a5-8cdef0230628

See Also: → 1840922

I filed a separate bug for the CallbackHolder crashes.

It is a bit hard to get helpful information from the crash stack.

I am not sure if there could be a race between creating the CancelableRunnable and releasing the nsIInputStreamCallback

Duplicate of this bug: 1841123

I've found one place where we could end up with a crash like this so far. In the InputStreamHolder constructor we attempt to create a StrongWorkerRef which take sa strong reference to this: https://searchfox.org/mozilla-central/rev/3b707c8fd7e978eebf24279ee51ccf07895cfbcb/dom/streams/UnderlyingSourceCallbackHelpers.cpp#171-179. If this fails for whatever reason, then we will end up destroying the callback aCallback when leaving the constructor, causing the object to be delete-ed during the constructor. This could easily lead to a UAF like the one we're seeing here.

I'm not sure if it's the root cause of the issue here, but I think it would cause these symptoms.

To fix this we'd want to move the logic which creates the strong worker ref out of the InputStreamHolder constructor, to a separate function, so that we can make sure to already be holding a reference before it has a chance to fail.

Oh yes, and any error in the constructor should not be silently ignored after all. Nice catch.

Should RefPtr{this} be statically prevented in the constructor?

Assignee: nobody → rjesup
Status: NEW → ASSIGNED

Comment on attachment 9341791 [details]
Bug 1839703: Move worker create into an Init() method r=nika

Security Approval Request

  • How easily could an exploit be constructed based on the patch?: Pretty hard, even if you know it's a sec issue. The relationship to workers is there; you'd have to guess that it was due to possible failure to Create. And then you'd have to figure out how to trigger it. Not impossible, though.
  • Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
  • Which older supported branches are affected by this flaw?: 115 only it appears, though 114 in theory might be affected
  • If not all supported branches, which bug introduced the flaw?: None
  • Do you have backports for the affected branches?: No
  • If not, how different, hard to create, and risky will they be?: Trivial to create if needed. We have no crashes in 114, so I think it might happen to be immune. 116 is unaffected, though the latent bug is there
  • How likely is this patch to cause regressions; how much testing does it need?: Very unlikely to regress
  • Is Android affected?: Yes

Beta/Release Uplift Approval Request

  • User impact if declined: We'll want this in 115 release next week, or in 115.0.1
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce: Twitch has turned off the WebTransport code that triggered this. (It was left on accidentally)
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Low risk
  • String changes made/needed: none
  • Is Android affected?: Yes
Attachment #9341791 - Flags: sec-approval?
Attachment #9341791 - Flags: approval-mozilla-release?

(In reply to Jens Stutte [:jstutte] from comment #3)

Hi Eden, did we uplift something to 115.0b7 that might be relevant?

I believe beta 7 is when all the EARLY_BETA_OR_EARLIER features switch off. Things might be going great up to that point and that's when we find out someone merged something that depends on "nightly-only" code changes and doesn't work with the code/settings we're actually shipping. EARLY_BETA_OR_EARLIER is a cursed temptation: use sparingly

Comment on attachment 9341791 [details]
Bug 1839703: Move worker create into an Init() method r=nika

sec-approval+ = dveditz

Please work w/Release managers on uplift. OK to land on nightly 116 now either way.

Attachment #9341791 - Flags: approval-mozilla-release? → approval-mozilla-release+
Attachment #9341791 - Flags: sec-approval?
Attachment #9341791 - Flags: sec-approval+
Attachment #9341791 - Flags: approval-mozilla-release?
Attachment #9341791 - Flags: approval-mozilla-release+
Group: dom-core-security → core-security-release
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 116 Branch

The patch landed in nightly and beta is affected.
:jesup, is this bug important enough to require an uplift?

  • If yes, please nominate the patch for beta approval.
  • If no, please set status-firefox115 to wontfix.

For more information, please visit BugBot documentation.

Flags: needinfo?(rjesup)
Flags: needinfo?(rjesup)
Duplicate of this bug: 1840922
See Also: → 1842190

Copying crash signatures from duplicate bugs.

Crash Signature: [@ mozilla::ipc::DataPipeReceiver::AsyncWait::<T>::operator()] [@ mozilla::ipc::DataPipeReceiver::AsyncWait::<T>::~ ] [@ mozilla::ipc::DataPipeReceiver::AsyncWait::$::~$_0 ] [@ mozilla::ipc::DataPipeReceiver::AsyncWait::$::operator() ] → [@ mozilla::ipc::DataPipeReceiver::AsyncWait::<T>::operator()] [@ mozilla::ipc::DataPipeReceiver::AsyncWait::<T>::~ ] [@ mozilla::ipc::DataPipeReceiver::AsyncWait::$::~$_0 ] [@ mozilla::ipc::DataPipeReceiver::AsyncWait::$::operator() ] [@ CallbackHolder::…

Comment on attachment 9341791 [details]
Bug 1839703: Move worker create into an Init() method r=nika

Approved for 115.0.2
Approved for 115.0.2esr

Attachment #9341791 - Flags: approval-mozilla-release?
Attachment #9341791 - Flags: approval-mozilla-release+
Attachment #9341791 - Flags: approval-mozilla-esr115+
Crash Signature: [@ mozilla::ipc::DataPipeReceiver::AsyncWait::<T>::operator()] [@ mozilla::ipc::DataPipeReceiver::AsyncWait::<T>::~ ] [@ mozilla::ipc::DataPipeReceiver::AsyncWait::$::~$_0 ] [@ mozilla::ipc::DataPipeReceiver::AsyncWait::$::operator() ] [@ CallbackHolder::… → [@ mozilla::ipc::DataPipeReceiver::AsyncWait::<T>::operator()] [@ mozilla::ipc::DataPipeReceiver::AsyncWait::<T>::~ ] [@ mozilla::ipc::DataPipeReceiver::AsyncWait::$::~$_0 ] [@ mozilla::ipc::DataPipeReceiver::AsyncWait::$::operator() ] [@ CallbackHolder:…
Alias: CVE-2023-3600

The only one of these signature that appears before 115 is CallbackHolder::CallbackHolder::<T>::operator(), with 5 crashes in 114, 6 crashes in 113, and 9 crashes in 107-111. Almost all of those were on the Main Thread while all the >=115.0b7 crashes are on the DOM Worker thread. Safe to call 114 "unaffected" rather than imply this was a danger there.

Whiteboard: [adv-main115.0.2+][adv-esr115.0.2+]
Flags: qe-verify-

:jesup there are still some reports from 115.0.2 which includes the patch

Flags: needinfo?(rjesup)

I looked through the classes on m-c that have nsCOMPtr<nsIInputStreamCallback> and didn't see any misuses of this in their ctor. I guess I should check subclasses of nsIInputStreamCallback, too. Also looking at release specifically might be better, because maybe the relevant class got deleted or incidentally changed and that's why it went away.

I looked through the subclasses of nsIInputStreamCallback on release and I didn't see any other dodgy uses of this in ctors.

I'm on PTO, though I'll try to look at this as I have time. NI'ing some other people who might have ideas what's causing this. Certainly what we fixed and uplifted here should have fixed any problem from that constructor, which had seemed like a reasonable guess as to a cause (it might cause something like this), but these continuing reports imply that's NOT the source of the problem (or not the only source).

Flags: needinfo?(smaug)
Flags: needinfo?(rjesup)
Flags: needinfo?(nika)
Flags: needinfo?(krosylight)

Reopening, since it's still happening.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

The bug is linked to topcrash signatures, which match the following criterion:

  • Top 10 content process crashes on release

For more information, please visit BugBot documentation.

Keywords: topcrash
Status: REOPENED → ASSIGNED
Target Milestone: 116 Branch → ---

Have we seen this on 116 or 117?

krosylight, I think some of UnderlyingSourceCallbackHelpers.cpp changed recently.
Is it possible that we now keep mInput alive longer or something when calling some method on it and on 115 we somehow release mInput at wrong time? (just pondering about options here)

Flags: needinfo?(smaug)

Oh, indeed it's not in 116 beta, interesting. Per the commit list:

Today's beta: https://hg.mozilla.org/releases/mozilla-beta/log/878cb82fefe45fbd753bf9df19c7b44691234fdc/dom/streams/UnderlyingSourceCallbackHelpers.cpp
Today's release: https://hg.mozilla.org/releases/mozilla-release/log/6a1bd37c214e0d0bb321ea0f7244583f45c81a63/dom/streams/UnderlyingSourceCallbackHelpers.cpp

Only bug 1811882 and bug 1811440. IMO bug 1811882 shouldn't affect DataPipeReceiver usage so it's only bug 1811440, which did change some mInput handling, specifically it added a cautious if (mInput) check before calling AsyncWait(): https://searchfox.org/mozilla-central/rev/2d06b7d5fbfa7a31776389da87c5f895b80df8d9/dom/streams/UnderlyingSourceCallbackHelpers.cpp#306-316

But not sure that matters here as we are crashing on e5e5 and not a null pointer. 🤔

Flags: needinfo?(krosylight)

(In reply to Kagami [:saschanaz] from comment #37)

Only bug 1811882 and bug 1811440. IMO bug 1811882 shouldn't affect DataPipeReceiver usage so it's only bug 1811440, which did change some mInput handling, specifically it added a cautious if (mInput) check before calling AsyncWait(): https://searchfox.org/mozilla-central/rev/2d06b7d5fbfa7a31776389da87c5f895b80df8d9/dom/streams/UnderlyingSourceCallbackHelpers.cpp#306-316

But not sure that matters here as we are crashing on e5e5 and not a null pointer. 🤔

Looking at bug 11811882, that patch does end up removing BodyStream, which does implement nsIInputStreamCallback (https://searchfox.org/mozilla-release/rev/9eef5f03dfa0edffe746121c05e51583c88439fd/dom/base/BodyStream.h#75), so it's definitely a potential culprit if there's some reference counting issue which could lead to the BodyStream type being destroyed early while there's still an outstanding reference from AsyncWait. I don't quite know what that situation would be off the top of my head though unfortunately, and I don't see any sketchy looking reference counting in the class on first glance, unless it is used from the wrong thread in some error case.

I also did a quick scan over the changes in bug 1811440, and didn't see anything which stuck out like a sore thumb, though it might just be that the specific code path which is triggering the issue ended up changing to using some other mechanism which isn't vulnerable to the same issue. I'm unsure.

My understanding is that DataPipeReceiver is specifically for WebTransport and can only be used there..............

Oh wait, one can pass DataPipeReceiver-based WebTransportReceiveStream to new Response() and then Fetch will extract the pipe here (which was here in pre-116) and then BodyStream would get the access to DataPipeReceiver.

But yeah, not sure what could be a problem in BodyStream. Something to take a good look at.

(But wouldn't we get tons of similar errors from other nsIInputStreams if BodyStream was the culprit? It can extract nsIInputStream from blob for example and there's a good established pattern to do so e.g. new Response(blob).text().)

One notable difference between BodyStream and InputToReadableStreamAlgorithms is that worker shutdown won't close the DOM stream (although both close the corresponding nsIInputStream). Still not sure what could happen by that...

One of the big differences between DataPipeReceiver and other stream types might be when callbacks like AsyncWait happen. Unlike other streams, the data is made available by a completely different process at arbitrary times, so if we don't close down the DOM stream (as you note) we could e.g. receive notifications much later during the object's lifecycle than it would normally expect for an in-process stream? That being said, I don't know if that's the culprit, and I haven't seen code which could lead to this kind of issue.

Maybe we have some issue with the logic in BodyStream::ReleaseObjects (https://searchfox.org/mozilla-release/rev/9eef5f03dfa0edffe746121c05e51583c88439fd/dom/base/BodyStream.cpp#553-554) which destroys both a reference to this and mStreamHolder before returning from the function? I don't know how we'd end up setting up an AsyncWait after this point though, especially given that we'd need to have the mInputStream reference around to call AsyncWait with (https://searchfox.org/mozilla-release/rev/9eef5f03dfa0edffe746121c05e51583c88439fd/dom/base/BodyStream.cpp#218).

(In reply to Kagami [:saschanaz] from comment #39)

My understanding is that DataPipeReceiver is specifically for WebTransport and can only be used there..............

No, DataPipeReceiver is an IPC primitive type used for all kinds of nsIInputStream IPC. We've been focused on WebTransport because of the more recent changes there, and it using DataPipeReceiver in a bare fashion, but it's also used to send normal IPCStream data between processes (https://searchfox.org/mozilla-central/rev/7a4c08f2c3a895c9dc064734ada320f920250c1f/ipc/glue/InputStreamParams.ipdlh#56), as the fallback case for sending non-serializable pipes.

It's unfortunately quite possible for just about any listener to be potentially listening to a DataPipeReceiver if they are dealing with a stream sent over IPC. We definitely shouldn't be limiting our search to only WebTransport code.

Flags: needinfo?(nika)
See Also: → 1811882

Kagami -- given we believe your change in 116 makes this go away, can you look at what it might be about that patch that either fixes or avoids this failure? If it only avoids it, we may have a latent bug still there waiting for some future change to show up again. And we can't avoid doing the bad thing in the future (or fix the underlying bug) if we don't know what causes this.

Thanks!

Flags: needinfo?(krosylight)

Still have no idea.

What I can newly note is that BodyStream::Close could skip closing any streams at all: https://searchfox.org/mozilla-release/source/dom/base/BodyStream.cpp#504-507

So it was possible for an nsIInputStream to call on a closed BodyStream, but that doesn't really explain UAF.

The only possible owner of BodyStream is nsIInputStreams and BodyStreamHolder, which is FetchBody. Can FetchBody be UAF-ed like InputStreamHolder once was? 🤷🏻‍♀️

Flags: needinfo?(krosylight)

Crashes all continue to be 115.*

One thing we should be careful about: making sure 116 didn't change the signature of problems. Quite likely it didn't, but since we don't have a string candidate for the actual bug, it's smart to be cautious

Calling this fixed by bug 1811882 as we haven't seen any crashes in 116.* or 115.1esr since they shipped. This bug has already gotten pretty messy for tracking, so I would suggest that any further work happen in a new bug for the sake of better tracking.

Status: ASSIGNED → RESOLVED
Closed: 1 year ago1 year ago
Depends on: 1811882
Resolution: --- → FIXED
See Also: 1811882
Target Milestone: --- → 116 Branch
Crash Signature: [@ mozilla::ipc::DataPipeReceiver::AsyncWait::<T>::operator()] [@ mozilla::ipc::DataPipeReceiver::AsyncWait::<T>::~ ] [@ mozilla::ipc::DataPipeReceiver::AsyncWait::$::~$_0 ] [@ mozilla::ipc::DataPipeReceiver::AsyncWait::$::operator() ] [@ CallbackHolder:… → [@ mozilla::ipc::DataPipeReceiver::AsyncWait::<T>::operator()] [@ mozilla::ipc::DataPipeReceiver::AsyncWait::<T>::~ ] [@ mozilla::ipc::DataPipeReceiver::AsyncWait::$::~$_0 ] [@ mozilla::ipc::DataPipeReceiver::AsyncWait::$::~$_7 ] [@ mozilla::ipc::DataP…
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: