Crash in [@ mozilla::ipc::DataPipeReceiver::AsyncWait::<T>::operator()]
Categories
(Core :: DOM: Workers, defect, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr102 | --- | unaffected |
firefox-esr115 | 116+ | fixed |
firefox114 | --- | unaffected |
firefox115 | + | wontfix |
firefox116 | + | fixed |
People
(Reporter: mccr8, Assigned: jesup)
References
Details
(4 keywords, Whiteboard: [adv-main115.0.2+][adv-esr115.0.2+])
Crash Data
Attachments
(1 file)
48 bytes,
text/x-phabricator-request
|
dmeehan
:
approval-mozilla-release+
dmeehan
:
approval-mozilla-esr115+
RyanVM
:
sec-approval+
|
Details | Review |
Crash report: https://crash-stats.mozilla.org/report/index/85986019-c55b-471d-a851-e951d0230621
Reason: EXCEPTION_ACCESS_VIOLATION_READ
Top 10 frames of crashing thread:
0 xul.dll mozilla::ipc::DataPipeReceiver::AsyncWait::<lambda_8>::operator const ipc/glue/DataPipe.cpp:660
0 xul.dll NS_NewCancelableRunnableFunction<`lambda at /builds/worker/checkouts/gecko/ipc/glue/DataPipe.cpp:656:23'>::FuncCancelableRunnable::Run xpcom/threads/nsThreadUtils.h:667
1 xul.dll mozilla::dom:: dom/workers/WorkerPrivate.cpp:202
2 xul.dll mozilla::dom::WorkerRunnable::Run dom/workers/WorkerRunnable.cpp:377
3 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1234
3 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:479
4 xul.dll mozilla::dom::WorkerPrivate::DoRunLoop dom/workers/WorkerPrivate.cpp:3287
5 xul.dll mozilla::dom::workerinternals:: dom/workers/RuntimeService.cpp:2149
6 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1234
6 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:479
13 crashes on this signature, mostly on poison values. All of them are on workers.
The crash is on this line: callback->OnInputStreamReady(self);
Both callback
and self
are strong references held by the callback, so I'm not sure how we can have a UAF here.
Also odd is that this crash is only showing up on 115b7 and 115b8.
Reporter | ||
Comment 1•1 year ago
|
||
Nika, any ideas? Maybe we're actually crashing inside OnInputStreamReady somehow and it isn't showing up in the stack?
Reporter | ||
Updated•1 year ago
|
Reporter | ||
Comment 2•1 year ago
|
||
By looking for UAF crashes with ExternalRunnableWrapper in the proto signature, I found a few similar crashes. They are much rarer than this one.
bp-f8a735c1-927a-43f8-a1c8-e966e0230621
bp-82fd3618-b172-4faf-b16f-d74ef0230620
Comment 3•1 year ago
|
||
Hi Eden, did we uplift something to 115.0b7 that might be relevant?
Updated•1 year ago
|
Reporter | ||
Comment 6•1 year ago
|
||
It is possible the compiler is just doing something different so this top frame actually shows up correctly now. Also present on 115b9 but still nowhere else.
Comment 7•1 year ago
|
||
The severity field for this bug is set to S3. However, the bug is flagged with the sec-high
keyword.
:janv, could you consider increasing the severity of this security bug?
For more information, please visit BugBot documentation.
Updated•1 year ago
|
Reporter | ||
Comment 8•1 year ago
|
||
I'm seeing another signature that looks very similar, [@ mozilla::ipc::DataPipeReceiver::AsyncWait::<T>::~ ]: bp-74c98a3e-dcdf-4c7c-bedb-0b2560230628
Reporter | ||
Comment 9•1 year ago
|
||
Two more variations (rarer)
[@ mozilla::ipc::DataPipeReceiver::AsyncWait::$::~$_0 ] bp-cc5073a5-87e7-410b-a543-be7450230625
[@ mozilla::ipc::DataPipeReceiver::AsyncWait::$::operator() ] bp-3791f9ad-0ccd-4372-87e6-5887d0230627
Reporter | ||
Comment 10•1 year ago
|
||
I won't add the signature because it doesn't look exactly the same, but these [@ CallbackHolder::CallbackHolder::<T>::operator() ] crashes look similar: showing up heavily on 115, on a worker thread, happening on the access of a strong pointer in the closure of a lambda on a callback.
Reporter | ||
Comment 11•1 year ago
|
||
I filed a separate bug for the CallbackHolder crashes.
Comment 12•1 year ago
•
|
||
It is a bit hard to get helpful information from the crash stack.
I am not sure if there could be a race between creating the CancelableRunnable and releasing the nsIInputStreamCallback
Comment 14•1 year ago
|
||
I've found one place where we could end up with a crash like this so far. In the InputStreamHolder
constructor we attempt to create a StrongWorkerRef
which take sa strong reference to this
: https://searchfox.org/mozilla-central/rev/3b707c8fd7e978eebf24279ee51ccf07895cfbcb/dom/streams/UnderlyingSourceCallbackHelpers.cpp#171-179. If this fails for whatever reason, then we will end up destroying the callback aCallback
when leaving the constructor, causing the object to be delete
-ed during the constructor. This could easily lead to a UAF like the one we're seeing here.
I'm not sure if it's the root cause of the issue here, but I think it would cause these symptoms.
Comment 15•1 year ago
|
||
To fix this we'd want to move the logic which creates the strong worker ref out of the InputStreamHolder
constructor, to a separate function, so that we can make sure to already be holding a reference before it has a chance to fail.
Comment 16•1 year ago
|
||
Oh yes, and any error in the constructor should not be silently ignored after all. Nice catch.
Should RefPtr{this}
be statically prevented in the constructor?
Assignee | ||
Comment 17•1 year ago
|
||
Updated•1 year ago
|
Assignee | ||
Comment 18•1 year ago
|
||
Comment on attachment 9341791 [details]
Bug 1839703: Move worker create into an Init() method r=nika
Security Approval Request
- How easily could an exploit be constructed based on the patch?: Pretty hard, even if you know it's a sec issue. The relationship to workers is there; you'd have to guess that it was due to possible failure to Create. And then you'd have to figure out how to trigger it. Not impossible, though.
- Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
- Which older supported branches are affected by this flaw?: 115 only it appears, though 114 in theory might be affected
- If not all supported branches, which bug introduced the flaw?: None
- Do you have backports for the affected branches?: No
- If not, how different, hard to create, and risky will they be?: Trivial to create if needed. We have no crashes in 114, so I think it might happen to be immune. 116 is unaffected, though the latent bug is there
- How likely is this patch to cause regressions; how much testing does it need?: Very unlikely to regress
- Is Android affected?: Yes
Beta/Release Uplift Approval Request
- User impact if declined: We'll want this in 115 release next week, or in 115.0.1
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: No
- Needs manual test from QE?: No
- If yes, steps to reproduce: Twitch has turned off the WebTransport code that triggered this. (It was left on accidentally)
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Low risk
- String changes made/needed: none
- Is Android affected?: Yes
Comment 19•1 year ago
|
||
(In reply to Jens Stutte [:jstutte] from comment #3)
Hi Eden, did we uplift something to 115.0b7 that might be relevant?
I believe beta 7 is when all the EARLY_BETA_OR_EARLIER features switch off. Things might be going great up to that point and that's when we find out someone merged something that depends on "nightly-only" code changes and doesn't work with the code/settings we're actually shipping. EARLY_BETA_OR_EARLIER is a cursed temptation: use sparingly
Comment 20•1 year ago
|
||
Comment on attachment 9341791 [details]
Bug 1839703: Move worker create into an Init() method r=nika
sec-approval+ = dveditz
Please work w/Release managers on uplift. OK to land on nightly 116 now either way.
Updated•1 year ago
|
Updated•1 year ago
|
Comment 21•1 year ago
|
||
Move worker create into an Init() method r=nika
https://hg.mozilla.org/integration/autoland/rev/878cb82fefe45fbd753bf9df19c7b44691234fdc
https://hg.mozilla.org/mozilla-central/rev/878cb82fefe4
Comment 22•1 year ago
|
||
The patch landed in nightly and beta is affected.
:jesup, is this bug important enough to require an uplift?
- If yes, please nominate the patch for beta approval.
- If no, please set
status-firefox115
towontfix
.
For more information, please visit BugBot documentation.
Assignee | ||
Updated•1 year ago
|
Comment 24•1 year ago
|
||
Copying crash signatures from duplicate bugs.
Comment 25•1 year ago
|
||
Comment on attachment 9341791 [details]
Bug 1839703: Move worker create into an Init() method r=nika
Approved for 115.0.2
Approved for 115.0.2esr
Comment 26•1 year ago
|
||
uplift |
Updated•1 year ago
|
Updated•1 year ago
|
Comment 27•1 year ago
|
||
uplift |
Comment 28•1 year ago
|
||
The only one of these signature that appears before 115 is CallbackHolder::CallbackHolder::<T>::operator()
, with 5 crashes in 114, 6 crashes in 113, and 9 crashes in 107-111. Almost all of those were on the Main Thread while all the >=115.0b7 crashes are on the DOM Worker thread. Safe to call 114 "unaffected" rather than imply this was a danger there.
Updated•1 year ago
|
Updated•1 year ago
|
Comment 29•1 year ago
|
||
uplift |
Updated•1 year ago
|
Comment 30•1 year ago
|
||
:jesup there are still some reports from 115.0.2 which includes the patch
Reporter | ||
Comment 31•1 year ago
|
||
I looked through the classes on m-c that have nsCOMPtr<nsIInputStreamCallback>
and didn't see any misuses of this
in their ctor. I guess I should check subclasses of nsIInputStreamCallback, too. Also looking at release specifically might be better, because maybe the relevant class got deleted or incidentally changed and that's why it went away.
Reporter | ||
Comment 32•1 year ago
|
||
I looked through the subclasses of nsIInputStreamCallback on release and I didn't see any other dodgy uses of this
in ctors.
Assignee | ||
Comment 33•1 year ago
|
||
I'm on PTO, though I'll try to look at this as I have time. NI'ing some other people who might have ideas what's causing this. Certainly what we fixed and uplifted here should have fixed any problem from that constructor, which had seemed like a reasonable guess as to a cause (it might cause something like this), but these continuing reports imply that's NOT the source of the problem (or not the only source).
Assignee | ||
Comment 34•1 year ago
|
||
Reopening, since it's still happening.
Comment 35•1 year ago
|
||
The bug is linked to topcrash signatures, which match the following criterion:
- Top 10 content process crashes on release
For more information, please visit BugBot documentation.
Updated•1 year ago
|
Comment 36•1 year ago
|
||
Have we seen this on 116 or 117?
krosylight, I think some of UnderlyingSourceCallbackHelpers.cpp changed recently.
Is it possible that we now keep mInput alive longer or something when calling some method on it and on 115 we somehow release mInput at wrong time? (just pondering about options here)
Comment 37•1 year ago
|
||
Oh, indeed it's not in 116 beta, interesting. Per the commit list:
Today's beta: https://hg.mozilla.org/releases/mozilla-beta/log/878cb82fefe45fbd753bf9df19c7b44691234fdc/dom/streams/UnderlyingSourceCallbackHelpers.cpp
Today's release: https://hg.mozilla.org/releases/mozilla-release/log/6a1bd37c214e0d0bb321ea0f7244583f45c81a63/dom/streams/UnderlyingSourceCallbackHelpers.cpp
Only bug 1811882 and bug 1811440. IMO bug 1811882 shouldn't affect DataPipeReceiver usage so it's only bug 1811440, which did change some mInput handling, specifically it added a cautious if (mInput)
check before calling AsyncWait()
: https://searchfox.org/mozilla-central/rev/2d06b7d5fbfa7a31776389da87c5f895b80df8d9/dom/streams/UnderlyingSourceCallbackHelpers.cpp#306-316
But not sure that matters here as we are crashing on e5e5 and not a null pointer. 🤔
Comment 38•1 year ago
|
||
(In reply to Kagami [:saschanaz] from comment #37)
Only bug 1811882 and bug 1811440. IMO bug 1811882 shouldn't affect DataPipeReceiver usage so it's only bug 1811440, which did change some mInput handling, specifically it added a cautious
if (mInput)
check before callingAsyncWait()
: https://searchfox.org/mozilla-central/rev/2d06b7d5fbfa7a31776389da87c5f895b80df8d9/dom/streams/UnderlyingSourceCallbackHelpers.cpp#306-316But not sure that matters here as we are crashing on e5e5 and not a null pointer. 🤔
Looking at bug 11811882, that patch does end up removing BodyStream
, which does implement nsIInputStreamCallback
(https://searchfox.org/mozilla-release/rev/9eef5f03dfa0edffe746121c05e51583c88439fd/dom/base/BodyStream.h#75), so it's definitely a potential culprit if there's some reference counting issue which could lead to the BodyStream
type being destroyed early while there's still an outstanding reference from AsyncWait
. I don't quite know what that situation would be off the top of my head though unfortunately, and I don't see any sketchy looking reference counting in the class on first glance, unless it is used from the wrong thread in some error case.
I also did a quick scan over the changes in bug 1811440, and didn't see anything which stuck out like a sore thumb, though it might just be that the specific code path which is triggering the issue ended up changing to using some other mechanism which isn't vulnerable to the same issue. I'm unsure.
Comment 39•1 year ago
|
||
My understanding is that DataPipeReceiver is specifically for WebTransport and can only be used there..............
Oh wait, one can pass DataPipeReceiver-based WebTransportReceiveStream to new Response()
and then Fetch will extract the pipe here (which was here in pre-116) and then BodyStream would get the access to DataPipeReceiver.
But yeah, not sure what could be a problem in BodyStream. Something to take a good look at.
Comment 40•1 year ago
|
||
(But wouldn't we get tons of similar errors from other nsIInputStreams if BodyStream was the culprit? It can extract nsIInputStream from blob for example and there's a good established pattern to do so e.g. new Response(blob).text()
.)
Comment 41•1 year ago
|
||
One notable difference between BodyStream and InputToReadableStreamAlgorithms is that worker shutdown won't close the DOM stream (although both close the corresponding nsIInputStream). Still not sure what could happen by that...
Comment 42•1 year ago
|
||
One of the big differences between DataPipeReceiver
and other stream types might be when callbacks like AsyncWait
happen. Unlike other streams, the data is made available by a completely different process at arbitrary times, so if we don't close down the DOM stream (as you note) we could e.g. receive notifications much later during the object's lifecycle than it would normally expect for an in-process stream? That being said, I don't know if that's the culprit, and I haven't seen code which could lead to this kind of issue.
Maybe we have some issue with the logic in BodyStream::ReleaseObjects
(https://searchfox.org/mozilla-release/rev/9eef5f03dfa0edffe746121c05e51583c88439fd/dom/base/BodyStream.cpp#553-554) which destroys both a reference to this
and mStreamHolder
before returning from the function? I don't know how we'd end up setting up an AsyncWait
after this point though, especially given that we'd need to have the mInputStream
reference around to call AsyncWait
with (https://searchfox.org/mozilla-release/rev/9eef5f03dfa0edffe746121c05e51583c88439fd/dom/base/BodyStream.cpp#218).
(In reply to Kagami [:saschanaz] from comment #39)
My understanding is that DataPipeReceiver is specifically for WebTransport and can only be used there..............
No, DataPipeReceiver
is an IPC primitive type used for all kinds of nsIInputStream
IPC. We've been focused on WebTransport
because of the more recent changes there, and it using DataPipeReceiver
in a bare fashion, but it's also used to send normal IPCStream
data between processes (https://searchfox.org/mozilla-central/rev/7a4c08f2c3a895c9dc064734ada320f920250c1f/ipc/glue/InputStreamParams.ipdlh#56), as the fallback case for sending non-serializable pipes.
It's unfortunately quite possible for just about any listener to be potentially listening to a DataPipeReceiver
if they are dealing with a stream sent over IPC. We definitely shouldn't be limiting our search to only WebTransport
code.
Assignee | ||
Comment 43•1 year ago
|
||
Kagami -- given we believe your change in 116 makes this go away, can you look at what it might be about that patch that either fixes or avoids this failure? If it only avoids it, we may have a latent bug still there waiting for some future change to show up again. And we can't avoid doing the bad thing in the future (or fix the underlying bug) if we don't know what causes this.
Thanks!
Assignee | ||
Updated•1 year ago
|
Comment 44•1 year ago
|
||
Still have no idea.
What I can newly note is that BodyStream::Close could skip closing any streams at all: https://searchfox.org/mozilla-release/source/dom/base/BodyStream.cpp#504-507
So it was possible for an nsIInputStream to call on a closed BodyStream, but that doesn't really explain UAF.
The only possible owner of BodyStream is nsIInputStreams and BodyStreamHolder, which is FetchBody. Can FetchBody be UAF-ed like InputStreamHolder once was? 🤷🏻♀️
Assignee | ||
Comment 45•1 year ago
|
||
Crashes all continue to be 115.*
Assignee | ||
Comment 46•1 year ago
|
||
One thing we should be careful about: making sure 116 didn't change the signature of problems. Quite likely it didn't, but since we don't have a string candidate for the actual bug, it's smart to be cautious
Comment 47•1 year ago
|
||
Calling this fixed by bug 1811882 as we haven't seen any crashes in 116.* or 115.1esr since they shipped. This bug has already gotten pretty messy for tracking, so I would suggest that any further work happen in a new bug for the sake of better tracking.
Updated•1 year ago
|
Updated•10 months ago
|
Description
•